
Avoid double bus_dmamap_load() in ioat(4).
Abandoned · Public

Authored by mav on Nov 11 2019, 9:58 PM.
Details

Reviewers
tychon
cem
Summary

After r345813, ioat(4) started to call _bus_dmamap_load_phys() for every address passed to it. But for code that already uses bus_dma(9) to do virtual-to-physical address translation, this causes double loading. The proposed patch introduces a mechanism to optionally delegate the bus_dma(9) mapping to the caller, supplying it with the proper parent bus_dma(9) tag to use and additional flags to declare which arguments are already mapped. I think a single bus_dma(9) load/unload call per I/O may be faster than one per segment, and it is logically cleaner.
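As a rough illustration of the delegation the summary describes, the sketch below has a consumer derive its own tag from the engine's parent tag, load each buffer once per I/O, and then tell ioat(4) that both addresses are already mapped. `ioat_get_dma_tag()`, `DMA_SRC_MAPPED`, and `DMA_DST_MAPPED` are hypothetical names invented here to match the summary's description; only bus_dma(9) and the existing `ioat_copy()` KPI are real.

```c
#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>
#include <dev/ioat/ioat.h>

/* Hypothetical additions from the proposal (not in the committed ioat.h). */
bus_dma_tag_t	ioat_get_dma_tag(bus_dmaengine_t eng);
#define	DMA_SRC_MAPPED	0x10000
#define	DMA_DST_MAPPED	0x20000

/* Callback that just records the single mapped segment's bus address. */
static void
one_seg_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	bus_addr_t *addrp = arg;

	if (error == 0 && nseg == 1)
		*addrp = segs[0].ds_addr;
}

static int
caller_mapped_copy(bus_dmaengine_t eng, void *src, void *dst, bus_size_t len)
{
	bus_dma_tag_t tag;
	bus_dmamap_t smap, dmap;
	bus_addr_t sba = 0, dba = 0;
	int error;

	/*
	 * Derive a tag from the engine's parent tag so the caller's
	 * mappings satisfy the device's constraints.
	 */
	error = bus_dma_tag_create(ioat_get_dma_tag(eng), 1, 0,
	    BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL,
	    len, 1, len, 0, NULL, NULL, &tag);
	if (error != 0)
		return (error);

	/* One load per buffer per I/O instead of one per segment. */
	bus_dmamap_create(tag, 0, &smap);
	bus_dmamap_create(tag, 0, &dmap);
	bus_dmamap_load(tag, smap, src, len, one_seg_cb, &sba, BUS_DMA_NOWAIT);
	bus_dmamap_load(tag, dmap, dst, len, one_seg_cb, &dba, BUS_DMA_NOWAIT);

	/* Declare both arguments as already-mapped bus addresses. */
	(void)ioat_copy(eng, dba, sba, len, NULL, NULL,
	    DMA_SRC_MAPPED | DMA_DST_MAPPED);

	/* The caller, not ioat(4), later unloads smap/dmap. */
	return (0);
}
```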

While there, add a KPI call for getting the NUMA domain to which a specific DMA engine belongs. It may be useful for some performance optimizations.
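A consumer of such a KPI might look like the sketch below, assuming a signature along the lines of `int ioat_get_domain(bus_dmaengine_t, int *)`; the name and shape are inferred from the summary, not taken from a committed interface. The allocation helpers (`malloc_domainset()`, `DOMAINSET_PREF()`) are the standard FreeBSD ones.

```c
#include <sys/param.h>
#include <sys/domainset.h>
#include <sys/malloc.h>
#include <dev/ioat/ioat.h>

/* Hypothetical KPI from the summary: NUMA domain of the engine. */
int	ioat_get_domain(bus_dmaengine_t eng, int *domain);

static void *
alloc_near_engine(bus_dmaengine_t eng, size_t size)
{
	int domain;

	if (ioat_get_domain(eng, &domain) != 0)
		domain = -1;

	/* Prefer memory local to the DMA engine when the domain is known. */
	return (malloc_domainset(size, M_DEVBUF,
	    domain >= 0 ? DOMAINSET_PREF(domain) : DOMAINSET_RR(),
	    M_WAITOK | M_ZERO));
}
```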

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 27457

Event Timeline

ioat.c
851–855

Are these not an issue for this scheme? It seems like the unloading should be left to the caller if they want to manage busdma loading. These calls take the domain lock per invocation on the DMAR iommu, and presumably should be avoided if the caller owns load/unload.

(The changes look good to me, aside from the question.)

ioat.c
851–855

They are ugly regardless of this scheme, but I don't see how this change would make them an issue. If the loads are done by the caller, the unloads should be done by it as well, and the unloads here will be a NOP. I can barely see them in the profiler with the default bounce backend, but in the case of DMAR, looking at the code, I agree that the additional locking may be expensive. I'll take a look at that.
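A minimal sketch of what "a NOP for caller-owned loads" could look like in the completion path, guarding the unloads behind the hypothetical already-mapped flags from the sketch above so the DMAR domain lock is never taken; the descriptor layout and field names here are illustrative, not the driver's actual ones:

```c
/*
 * Sketch of the completion path under discussion.  The structure and
 * flag names are hypothetical; the point is that the unloads (and the
 * DMAR domain-lock acquisitions behind them) are skipped entirely when
 * the caller owns the maps.
 */
struct sketch_desc {
	uint32_t	flags;		/* DMA_SRC_MAPPED etc. (hypothetical) */
	bus_dmamap_t	src_dmamap;
	bus_dmamap_t	dst_dmamap;
};

static void
sketch_complete(bus_dma_tag_t data_tag, struct sketch_desc *desc)
{
	/* Only unload maps that ioat(4) itself loaded. */
	if ((desc->flags & DMA_SRC_MAPPED) == 0)
		bus_dmamap_unload(data_tag, desc->src_dmamap);
	if ((desc->flags & DMA_DST_MAPPED) == 0)
		bus_dmamap_unload(data_tag, desc->dst_dmamap);
}
```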

After another look I found this change to be useless for me, since I need to load the memory into two different DMA engines at the same time, and that is easier to do without this functionality. So I'll leave this idea to somebody else who may actually need it. I've committed some changes that slightly optimize the area in other ways.
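For completeness, the situation that made the change moot is easy to picture: the same buffer must be visible to two engines at once, which is just two independent loads through each engine's own tag and map, so no caller-delegation KPI is needed. The callback is `one_seg_cb` from the earlier sketch; all other names are illustrative.

```c
/*
 * One buffer, two engines: load the same KVA range once through each
 * engine's own tag/map pair, recording a bus address for each engine.
 */
static void
load_for_two_engines(bus_dma_tag_t tag0, bus_dmamap_t map0,
    bus_dma_tag_t tag1, bus_dmamap_t map1, void *buf, bus_size_t len,
    bus_addr_t *ba0, bus_addr_t *ba1)
{
	bus_dmamap_load(tag0, map0, buf, len, one_seg_cb, ba0,
	    BUS_DMA_NOWAIT);
	bus_dmamap_load(tag1, map1, buf, len, one_seg_cb, ba1,
	    BUS_DMA_NOWAIT);
	/* ...issue a copy on each engine with its own bus address... */
}
```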