
Avoid double bus_dmamap_load() in ioat(4).
Abandoned · Public

Authored by mav on Nov 11 2019, 9:58 PM.

Details

Reviewers
tychon
cem
Summary

After r345813, ioat(4) started calling _bus_dmamap_load_phys() for every address passed to it. But for code that already uses bus_dma(9) to do virtual-to-physical address translation, this causes double loading. The proposed patch introduces a mechanism to optionally delegate the bus_dma(9) mapping to the caller, supplying it with the proper parent bus_dma(9) tag to use and additional flags to declare which arguments are already mapped. I think a single bus_dma(9) load/unload call per I/O may be faster than one per segment, and is logically cleaner.
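From the caller's side, the proposed delegation might look roughly like the sketch below. This is illustrative pseudocode only, not the actual patch: `ioat_get_dma_tag()` and the `IOAT_SRC_MAPPED` flag are invented stand-in names for the proposed "parent tag + already-mapped flags" interface; only the bus_dma(9) calls and `ioat_copy()` are existing KPI.

```c
/*
 * Hypothetical caller-side use of the proposed delegation.
 * ioat_get_dma_tag() and IOAT_SRC_MAPPED are invented names.
 */
static void
xfer_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	struct my_xfer *xp = arg;

	/* One load per I/O: the address is already bus-mapped here. */
	xp->src = segs[0].ds_addr;
}

	/* Derive the caller's tag from the engine's parent tag. */
	bus_dma_tag_create(ioat_get_dma_tag(dmaengine),	/* hypothetical KPI */
	    1, 0, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL,
	    MAXPHYS, 1, MAXPHYS, 0, NULL, NULL, &tag);
	bus_dmamap_create(tag, 0, &map);
	bus_dmamap_load(tag, map, buf, len, xfer_cb, &xfer, BUS_DMA_NOWAIT);

	/* Declare to ioat(4) that the source address is already mapped. */
	(void)ioat_copy(dmaengine, dst, xfer.src, len, done_cb, &xfer,
	    DMA_INT_EN | IOAT_SRC_MAPPED);	/* flag is hypothetical */
	/* ... on completion ... */
	bus_dmamap_unload(tag, map);	/* once per I/O, not per segment */
```

With this shape, the driver would skip its internal _bus_dmamap_load_phys() for arguments flagged as already mapped, which is where the "single load/unload per I/O" saving comes from.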

While there, add a KPI call for getting the NUMA domain to which a specific DMA engine belongs. It may be useful for some performance optimizations.
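A NUMA-aware caller might use such a KPI roughly as sketched below; `ioat_get_domain()` is a placeholder name, since the review does not show the final KPI spelling, while `malloc_domainset()` and `DOMAINSET_PREF()` are the existing domain-aware allocation interfaces.

```c
/*
 * Sketch: allocate per-channel buffers on the engine's own NUMA domain.
 * ioat_get_domain() is a hypothetical name for the proposed KPI.
 */
int domain = ioat_get_domain(dmaengine);

buf = malloc_domainset(size, M_DEVBUF, DOMAINSET_PREF(domain), M_WAITOK);
```

Keeping descriptor and bounce memory on the same domain as the DMA engine avoids cross-socket traffic on every transfer, which is the performance optimization the summary alludes to.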

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 27457

Event Timeline

mav created this revision. Nov 11 2019, 9:58 PM
cem added inline comments. Wed, Nov 13, 6:28 PM
ioat.c
851–855

Are these not an issue for this scheme? It seems like the unloading should be left to the caller if they want to manage busdma loading. These calls take the domain lock per invocation on the DMAR iommu, and should(?) be avoided if the caller owns load/unload.

cem added a comment. Wed, Nov 13, 6:28 PM

(The changes look good to me, aside from the question.)

mav added inline comments. Wed, Nov 13, 8:10 PM
ioat.c
851–855

They are ugly, but unrelated to this scheme; I don't see how this change would make them an issue. If the loads are done by the caller, the unloads should be done by it also, and these unloads here will be a NOP. I can barely see them in the profiler with the default bounce backend, while in the case of DMAR, looking at the code, I agree that the additional locking may be expensive. I'll take a look at that.

mav abandoned this revision. Mon, Nov 25, 6:44 PM

After another look I found this change to be useless for me, since I need to load the same memory into two different DMA engines at the same time, and that is easier to do without this functionality. So I'll leave this idea to somebody else who may actually need it. I've committed some changes that slightly optimize the area in other ways.