Avoid double bus_dmamap_load() in ioat(4).
Abandoned · Public

Authored by mav on Nov 11 2019, 9:58 PM.

Details

Reviewers

tychon
cem

Summary

After r345813, ioat(4) started to call _bus_dmamap_load_phys() for every address passed to it. For callers that already use bus_dma(9) to do virtual-to-physical address translation, this results in double loading. The proposed patch introduces a mechanism to optionally delegate bus_dma(9) mapping to the caller: the caller is given the proper parent bus_dma(9) tag to use, plus additional flags to declare which arguments are already mapped. I think that a single bus_dma(9) load/unload call per I/O may be faster than one per segment, and it is also logically cleaner.

While there, add a KPI call to get the NUMA domain to which a specific DMA engine belongs. It may be useful for some performance optimizations.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 27457

Event Timeline

mav created this revision. · Nov 11 2019, 9:58 PM

Herald added a subscriber: imp. · Nov 11 2019, 9:58 PM

Harbormaster completed remote builds in B27457: Diff 64208. · Nov 11 2019, 9:58 PM

cem added inline comments. · Nov 13 2019, 6:28 PM

ioat.c
Lines 851–855: Are these not an issue for this scheme? It seems like unloading should be left to the caller if it wants to manage busdma loading. On a DMAR IOMMU these calls take the domain lock on every invocation, and should probably be avoided if the caller owns load/unload.

(The changes look good to me, aside from the question.)

mav added inline comments. · Nov 13 2019, 8:10 PM

ioat.c
Lines 851–855: They are ugly regardless of this scheme, but I don't see how this change would make them an issue. If the loads are done by the caller, the unloads should be done by the caller too, and these unloads here will be NOPs. I can barely see them in the profiler with the default bounce backend; with DMAR, though, looking at the code, I agree that the additional locking may be expensive. I'll take a look at that.

After another look, I found this change to be useless for me, since I need to load the memory into two different DMA engines at the same time, and it is easier to do that without this functionality. So I'll leave this idea to somebody else who may actually need it. I've committed some changes that slightly optimize this area in other ways.