bhyve, DMAR: integrate
Needs ReviewPublic

Authored by kib on Jul 14 2020, 8:13 PM.

Details

Reviewers
br
Group Reviewers
bhyve
Summary

This is a WIP branch; I published it for discussion of the DMAR code refactoring.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

kib requested review of this revision. Jul 14 2020, 8:14 PM
sys/x86/iommu/intel_dmar.h
322

Some of these flags are used by the generic IOMMU busdma backend:
IOMMU_MF_CANWAIT/CANSPLIT are passed to iommu_map().

Should we move the flags to sys/iommu.h?

sys/x86/iommu/intel_dmar.h
322

I think either all should be moved, or none. I am fine with the move.

That said, after this merge I do not like the sys/iommu.h idea. I now think the MI files should be put in sys/dev/iommu.

sys/amd64/vmm/vmm.c
941–943

I'm a bit surprised at how this code obtains the hpa. If not for the fact that the memory is wired, what follows would be racy. And more to my real point, this code already knows about vm objects, so why doesn't it just iterate over the vm object's resident page list? Then, it could look for m->psind > 0 to increase the granularity of the mapping changes.
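To make the suggestion concrete, here is a minimal userland sketch of the idea, not kernel code. It assumes a hypothetical `struct model_page` standing in for the few `struct vm_page` fields involved, a resident page array that is sorted and contiguous in pindex, a two-entry page size table (4 KB and 2 MB), and that only the first page of a superpage carries a nonzero psind. All `model_*` names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096ULL
#define MODEL_NPAGESIZES  2

/* Sizes indexed by psind; index 1 models a 2 MB superpage (assumption). */
static const uint64_t model_pagesizes[MODEL_NPAGESIZES] = {
	MODEL_PAGE_SIZE, 512 * MODEL_PAGE_SIZE
};

/* Hypothetical stand-in for the struct vm_page fields used here. */
struct model_page {
	uint64_t pindex;    /* offset within the object, in base pages */
	uint64_t phys_addr; /* host physical address */
	int psind;          /* superpage size index; 0 means base page */
};

/*
 * Walk a resident page array (sorted and contiguous in pindex) and
 * count the mapping operations needed when a page with psind > 0 is
 * mapped as one large entry instead of many base-page entries.
 */
static size_t
model_count_mappings(const struct model_page *pages, size_t npages)
{
	size_t i = 0, nmaps = 0;

	while (i < npages) {
		int ps = pages[i].psind;
		uint64_t span;

		assert(ps >= 0 && ps < MODEL_NPAGESIZES);
		span = model_pagesizes[ps] / MODEL_PAGE_SIZE;

		/* Take the large mapping only if aligned and fully present. */
		if (ps > 0 &&
		    pages[i].phys_addr % model_pagesizes[ps] == 0 &&
		    pages[i].pindex % span == 0 &&
		    i + span <= npages)
			i += span;
		else
			i++;
		nmaps++;
	}
	return (nmaps);
}
```

In this model, a range backed by one aligned 2 MB superpage followed by one base page needs two mapping operations rather than 513. The shadowing and snapshot concerns are outside what the sketch covers.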

sys/amd64/vmm/vmm.c
941–943

But the pages must be wired for pass-through to work, at least in the current state of the driver and at the level of supported hardware. In principle, the newer VT-d spec allows hardware to report non-fatal faults, which the DMAR driver can handle. In fact, Mellanox/Nvidia ConnectX networking cards have similar facilities as well, but our RDMA/IB stack does not use them.

Looking into objects would make an assumption about the structure of the vmspace; for instance, we must not have shadowing. This is true right now, but I suspect that stuff like snapshots invalidates the assumption (I am not sure what state the snapshot code is in).

Do you want to resume work on this, and finish some pieces and commit them?

sys/amd64/vmm/vmm.c
941–943

Elaborating on my comment, I was surprised that the code wires the pages a second time to get the hpa. Note how the second wiring is immediately released before the hpa is calculated.

With regard to Mellanox/Nvidia and linuxkpi, I fear that we'll have to support MMU notifiers at some point, which will require changes sprinkled throughout the virtual memory system.

kib marked 3 inline comments as done. Fri, Jul 29, 11:16 AM
In D25672#816160, @alc wrote:

Do you want to resume work on this, and finish some pieces and commit them?

I hope so, but I am not sure I can do anything that large ATM. Do you have specific proposals about already useful pieces?

sys/amd64/vmm/vmm.c
941–943

You mean the mere call to vm_fault_quick_hold_pages(), which several lines later ends in vm_page_unhold_pages(), after iommu_create_mapping()? And the first wire is due to the vm being used for pass-through?

I think this is a minor overhead, in fact.

WRT MMU notifiers, do you mean some kind of callback when the page becomes invalid or reclaimed?
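For illustration only, the kind of callback being asked about might look like the userland sketch below. It is a guess at the general shape, not the linuxkpi or Linux interface: every `model_*` name is hypothetical, and locking and unregistration are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: the range [start, end) is about to become invalid. */
typedef void (*model_inval_cb)(void *arg, uint64_t start, uint64_t end);

struct model_notifier {
	model_inval_cb cb;
	void *arg;
	struct model_notifier *next;
};

struct model_notifier_list {
	struct model_notifier *head;
};

/* Register a consumer, e.g. a driver holding device mappings. */
static void
model_notifier_register(struct model_notifier_list *l,
    struct model_notifier *n, model_inval_cb cb, void *arg)
{
	assert(cb != NULL);
	n->cb = cb;
	n->arg = arg;
	n->next = l->head;
	l->head = n;
}

/*
 * Called by the (modelled) VM layer before pages in [start, end) are
 * invalidated or reclaimed. Returns the number of consumers notified.
 */
static size_t
model_notify_invalidate(const struct model_notifier_list *l,
    uint64_t start, uint64_t end)
{
	size_t called = 0;

	for (const struct model_notifier *n = l->head; n != NULL;
	    n = n->next) {
		n->cb(n->arg, start, end);
		called++;
	}
	return (called);
}

/* Example consumer: accumulate the number of bytes it must unmap. */
static void
model_count_unmapped(void *arg, uint64_t start, uint64_t end)
{
	*(uint64_t *)arg += end - start;
}
```

The point of the sketch is only that the VM side must call out at every place where a mapping can go away, which is why such support would touch many parts of the virtual memory system.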

In D25672#817019, @kib wrote:
In D25672#816160, @alc wrote:

Do you want to resume work on this, and finish some pieces and commit them?

I hope so, but I am not sure I can do anything that large ATM. Do you have specific proposals about already useful pieces?

While nothing will use it at the instant that you commit it, I am going to suggest that you peel off iommu_gas_remove() and commit it. Then, as I make further changes to iommu_gas, it will be easier for me to take into account the parts for eventually supporting bhyve.

kib marked an inline comment as done.

Rebase after iommu_gas_remove() commit.