Iflib has a complex mechanism to choose between using busdma and raw pmap_kextract() at runtime. This added complexity makes the code harder to maintain, and arguably hides bugs.
The stated purpose of having the raw pmap_kextract() path alongside busdma was to improve performance. However, on my setup (dual ixl 40GbE interfaces on a Haswell-based E5-2697 v3), I'm unable to measure any meaningful difference in either packet forwarding rate or packet drop rate between this patch and the stock tree. We run a less extensive version of this patch at Netflix and have noticed no performance issues from using busdma in our CDN workload.
While working on this patch, I uncovered several pre-existing issues, mostly centered on failing to call bus_dmamap_unload(), and on unneeded bus_dmamap_load() / pmap_kextract() calls in _iflib_fl_refill() for clusters which have not been reallocated. Note that these are not fixed here; I plan to tackle those in a separate review.
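To make the two classes of issue concrete, here is a userland toy model of the invariant I have in mind. It is only a sketch and not iflib code: struct toy_slot, toy_map_load(), toy_map_unload(), toy_slot_consume() and toy_slot_refill() are hypothetical names, and the counters stand in for real bus_dmamap_load() / bus_dmamap_unload() calls. It assumes a slot either keeps its cluster (mapping still valid, nothing to do) or receives a new one (exactly one load), and that every load is eventually paired with one unload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one free-list slot; NOT an iflib structure. */
struct toy_slot {
	void *cluster;	/* buffer currently attached to the slot */
	bool  loaded;	/* true while a simulated DMA mapping is live */
};

static int toy_loads;	/* simulated bus_dmamap_load() calls */
static int toy_unloads;	/* simulated bus_dmamap_unload() calls */

static void
toy_map_load(struct toy_slot *s)
{
	s->loaded = true;
	toy_loads++;
}

static void
toy_map_unload(struct toy_slot *s)
{
	s->loaded = false;
	toy_unloads++;
}

/* Hand the cluster up the stack, releasing its mapping first. */
static void *
toy_slot_consume(struct toy_slot *s)
{
	void *c;

	assert(s != NULL);
	c = s->cluster;
	if (s->loaded)
		toy_map_unload(s);	/* pair every load with an unload */
	s->cluster = NULL;
	return (c);
}

/* Refill the slot; load a mapping only if the cluster is new. */
static void
toy_slot_refill(struct toy_slot *s, void *new_cluster)
{
	assert(s != NULL);
	if (s->cluster != NULL)
		return;			/* cluster not reallocated: skip the load */
	s->cluster = new_cluster;
	toy_map_load(s);
}
```

In this model, refilling a slot that still holds its cluster is a no-op, and the load and unload counters stay balanced once every slot has been consumed.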
Note that you may want to hold off on reviewing until Olivier can verify that it does no harm on his forwarding setup and until I can test it on a Netflix workload. I think there may be several revisions of this patch.