After r345813, ioat(4) calls _bus_dmamap_load_phys() for every address passed to it. In code that already uses bus_dma(9) for virtual-to-physical address translation, this causes double loading. The proposed patch introduces a mechanism to optionally delegate the bus_dma(9) mapping to the caller, supplying the caller with the proper parent bus_dma(9) tag to use and with additional flags to declare which arguments are already mapped. I think a single bus_dma(9) load/unload call per I/O may be faster than one per segment, and it is logically cleaner.
While there, add a KPI call for getting the NUMA domain to which a specific DMA engine belongs. It may be useful for some performance optimizations.