This is a WIP branch; I published it for discussion of the DMAR code refactoring.
I think either all should be moved, or none. I am fine with the move.
That said, after this merge I no longer like the sys/iommu.h idea. I now think the MI files should go in sys/dev/iommu.
I'm a bit surprised at how this code obtains the hpa. If not for the fact that the memory is wired, what follows would be racy. And more to my real point, this code already knows about vm objects, so why doesn't it just iterate over the vm object's resident page list? Then, it could look for m->psind > 0 to increase the granularity of the mapping changes.
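To make the suggestion concrete, here is a minimal userland sketch of walking a resident page list and letting psind pick the mapping size. It is not kernel code: struct sketch_page, sketch_pagesizes, and the sketch_* functions are hypothetical stand-ins, the list is modeled as an array sorted by pindex, psind is assumed to be set on the first page of a fully populated superpage, and the guest physical address is assumed to be simply pindex times the base page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for the few vm_page fields the sketch needs. */
struct sketch_page {
	uint64_t pindex;	/* offset within the object, in base pages */
	uint64_t phys_addr;	/* host physical address of this page */
	int	 psind;		/* > 0 on the first page of a superpage (assumed) */
};

/* Bytes covered by each psind value; amd64-like sizes, assumed. */
static const uint64_t sketch_pagesizes[] = { 4096, 2097152 };
#define SKETCH_NPSIND (sizeof(sketch_pagesizes) / sizeof(sketch_pagesizes[0]))

/* Hypothetical callback that would install one IOMMU mapping. */
typedef void (*sketch_map_fn)(uint64_t gpa, uint64_t hpa, uint64_t len,
    void *arg);

/* Example callback: only accumulates the number of bytes mapped. */
static void
sketch_count_bytes(uint64_t gpa, uint64_t hpa, uint64_t len, void *arg)
{
	(void)gpa;
	(void)hpa;
	*(uint64_t *)arg += len;
}

/*
 * Walk pages[] (sorted by pindex) and issue one mapping per base page,
 * or one mapping per superpage when psind > 0.  Returns the number of
 * mapping operations issued.
 */
static size_t
sketch_map_resident(const struct sketch_page *pages, size_t npages,
    sketch_map_fn map, void *arg)
{
	size_t i = 0, nmaps = 0;

	while (i < npages) {
		const struct sketch_page *m = &pages[i];
		uint64_t len, end;

		assert(m->psind >= 0 && (size_t)m->psind < SKETCH_NPSIND);
		len = sketch_pagesizes[m->psind];
		map(m->pindex * SKETCH_PAGE_SIZE, m->phys_addr, len, arg);
		nmaps++;

		/* Skip the pages already covered by this mapping. */
		end = m->pindex + len / SKETCH_PAGE_SIZE;
		while (i < npages && pages[i].pindex < end)
			i++;
	}
	return (nmaps);
}
```

With a superpage head at pindex 0 followed by a single base page at pindex 512, the walk issues two mappings instead of 513.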
But the pages must be wired for pass-through to work, at least given the current state of the driver and the level of hardware we support. In principle, the newer VT-d spec allows hardware to report non-fatal faults, which the DMAR driver could handle. In fact, Mellanox/Nvidia ConnectX networking cards have similar facilities as well, but our RDMA/IB stack does not use them.
Looking into objects would make an assumption about the structure of the vmspace; for instance, there must be no shadowing. This is true right now, but I suspect that features like snapshots invalidate the assumption (I am not sure what state the snapshot code is in).
Do you want to resume work on this, and finish some pieces and commit them?
Elaborating on my comment, I was surprised that the code wires the pages a second time to get the hpa. Note how the second wiring is immediately released before the hpa is calculated.
With regard to Mellanox/Nvidia and linuxkpi, I fear that we'll have to support MMU notifiers at some point, which will require changes sprinkled throughout the virtual memory system.
I hope so, but I am not sure I can do anything that large ATM. Do you have specific proposals for pieces that would already be useful on their own?
You mean the mere call to vm_fault_quick_hold_pages(), which several lines later ends in vm_page_unhold_pages(), after iommu_create_mapping()? And the first wiring is due to the vm being used for pass-through?
I think this is a minor overhead, in fact.
WRT MMU notifiers, do you mean some kind of callback when the page becomes invalid or reclaimed?
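If that reading is right, the shape of such a callback could look roughly like the following userland sketch. Everything here is hypothetical and invented for illustration (sketch_notifier, sketch_addrspace, sketch_dev_mapping, and the sketch_* functions); it is not an existing FreeBSD or linuxkpi interface, and it ignores locking entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical notifier: called when [start, end) is about to go away. */
struct sketch_notifier {
	void (*invalidate_range)(struct sketch_notifier *n, uint64_t start,
	    uint64_t end);
	struct sketch_notifier *next;
};

/* Hypothetical address space that keeps a list of registered notifiers. */
struct sketch_addrspace {
	struct sketch_notifier *notifiers;
};

static void
sketch_notifier_register(struct sketch_addrspace *as,
    struct sketch_notifier *n)
{
	assert(n->invalidate_range != NULL);
	n->next = as->notifiers;
	as->notifiers = n;
}

/* The VM side would call this before reclaiming or unmapping a range. */
static void
sketch_invalidate(struct sketch_addrspace *as, uint64_t start, uint64_t end)
{
	struct sketch_notifier *n;

	for (n = as->notifiers; n != NULL; n = n->next)
		n->invalidate_range(n, start, end);
}

/* Hypothetical consumer: a device mapping covering [start, end). */
struct sketch_dev_mapping {
	struct sketch_notifier notifier;	/* must be the first member */
	uint64_t start;
	uint64_t end;
	int	 valid;
};

static void
sketch_dev_invalidate(struct sketch_notifier *n, uint64_t start, uint64_t end)
{
	/* Valid because notifier is the first member of the struct. */
	struct sketch_dev_mapping *dm = (struct sketch_dev_mapping *)n;

	/* Drop the device mapping if the ranges overlap. */
	if (start < dm->end && dm->start < end)
		dm->valid = 0;
}
```

A real version would need the invalidation call placed at every point where the VM system can reclaim or remap a page, which is where the changes spread throughout the virtual memory system would come from.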
While nothing will use it at the instant that you commit it, I am going to suggest that you peel off iommu_gas_remove() and commit it. Then, as I make further changes to iommu_gas, it will be easier for me to take into account the parts for eventually supporting bhyve.