
Enable I/O MMU when PCI pass through is first used.
Closed, Public

Authored by jhb on Aug 8 2016, 11:35 PM.

Details

Summary

Enable I/O MMU when PCI pass through is first used.

Rather than enabling the I/O MMU when the vmm module is loaded,
defer initialization until the first attempt to pass a PCI device
through to a guest. If the I/O MMU fails to initialize or is not
present, then fail the attempt to pass a PCI device through to a
guest.

The hw.vmm.force_iommu tunable has been removed since the I/O MMU is
no longer enabled during boot. However, the I/O MMU support can be
disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent
use of the I/O MMU on any systems where it is buggy.

Test Plan
  • Tested PCI passthrough of a Chelsio T5 VF and verified that the IOMMU was enabled when it was passed through (verified via dmardump tool from other review).
  • Verified IOMMU was not enabled if only "plain" VMs without passthrough were started.
  • Set hw.vmm.iommu.enable=0 in kenv before loading vmm.ko and verified starting a VM with a pass through device failed. (The error claimed the device wasn't using ppt, but it still failed.)

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jhb retitled this revision to "Enable I/O MMU when PCI pass through is first used."
jhb updated this object.
jhb edited the test plan for this revision.
jhb added reviewers: grehan, neel.
jhb added a subscriber: np.

One thing I thought of just as I was uploading this to arc is that we could be calling an 'iommu_present()' function from ppt's attach routine. It would be a wrapper around iommu_init() itself (the added logic in iommu_create_domain() would move there). One advantage of that approach is that we could have the ppt driver do something like:

    if (!iommu_present()) {
        device_printf(dev, "I/O MMU is not present or is disabled\n");
        return (ENXIO);
    }

That would give a nice message in dmesg and the failure message from bhyve would then be more "accurate" as there wouldn't be a ppt device. However, I'm fine with either approach.
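The idea can be tried out in user space. What follows is only a minimal sketch under stated assumptions, not the code in the diff: `iommu_present()` is the wrapper proposed above, while `ppt_attach_model()`, `present_model_reset()`, and the boolean flags are hypothetical stand-ins for the real attach routine, the hw.vmm.iommu.enable tunable, and the hardware probe.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins, not the real vmm state. */
static bool tunable_enable = true;	/* models hw.vmm.iommu.enable */
static bool hw_detected;		/* models the hardware probe result */
static bool init_done;
static bool init_ok;

/* Demo helper (hypothetical): reset the model with a given configuration. */
void
present_model_reset(bool enable, bool detected)
{
	tunable_enable = enable;
	hw_detected = detected;
	init_done = false;
	init_ok = false;
}

/* Initialize at most once, on first use, and report the outcome. */
bool
iommu_present(void)
{
	if (!init_done) {
		init_done = true;
		init_ok = tunable_enable && hw_detected;
	}
	return (init_ok);
}

/* Modeled attach routine: refuse to attach without a usable I/O MMU. */
int
ppt_attach_model(const char *name)
{
	assert(name != NULL);
	if (!iommu_present()) {
		fprintf(stderr, "%s: I/O MMU is not present or is disabled\n",
		    name);
		return (ENXIO);
	}
	return (0);
}
```

In this model the attach fails with ENXIO both when the tunable is 0 and when no hardware is detected, so no ppt device would exist for bhyve to open in either case.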

Another issue exposed by the more dynamic nature of ppt's now is that the host domain needs to be updated when a device is moved to/from the ppt state (perhaps ppt detach ?).

sys/amd64/vmm/io/iommu.c
231 ↗(On Diff #19144)

May need to change iommu_init() to have an error return to avoid calling through to create here if it fails.

sys/amd64/vmm/io/iommu.c
231 ↗(On Diff #19144)

... and also remove the panic() call in iommu_init() and return an error.

Another issue exposed by the more dynamic nature of ppt's now is that the host domain needs to be updated when a device is moved to/from the ppt state (perhaps ppt detach ?).

Hmm, this is almost handled now by ppt_unassign_device(). I think either iommu_remove_device() should move the device back into the host domain, or we should explicitly call iommu_add_device() with the host_domain from ppt.c. Since host_domain is static, the simplest fix would be to have iommu_remove_device() actually call iommu_add_device() with the host_domain after IOMMU_REMOVE_DEVICE() (though that breaks the 1:1 mapping of wrapper functions to IOMMU driver "methods").

Also, even though we don't remove ppt devices if bhyve crashes via a devfs_cdevpriv dtor callback, 'bhyvectl destroy' triggers ppt_unassign_all which will DTRT.

sys/amd64/vmm/io/iommu.c
231 ↗(On Diff #19144)

We can remove the panic, but this already handles iommu_init() failing. If it fails, then it will leave 'ops' as NULL which in turn results in IOMMU_CREATE_DOMAIN() returning NULL.
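The failure path described here can be illustrated with a small user-space model. This is a sketch under assumptions, not the contents of iommu.c: the `struct iommu_ops_model` table, `probe_succeeds`, `fake_create_domain()`, and `create_model_reset()` are hypothetical, and `unsigned long` stands in for whatever address type the real function takes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical version of the driver method table. */
struct iommu_ops_model {
	void *(*create_domain)(unsigned long maxaddr);
};

static struct iommu_ops_model *ops;	/* stays NULL if init fails */
static bool init_tried;
static bool probe_succeeds;		/* hypothetical probe result */

static void *
fake_create_domain(unsigned long maxaddr)
{
	static int domain;	/* any non-NULL token will do */

	(void)maxaddr;
	return (&domain);
}

static struct iommu_ops_model fake_ops = { fake_create_domain };

/* Demo helper (hypothetical): reset the model. */
void
create_model_reset(bool probe_ok)
{
	ops = NULL;
	init_tried = false;
	probe_succeeds = probe_ok;
}

/* On failure, return without panicking and leave 'ops' as NULL. */
static void
iommu_init(void)
{
	if (probe_succeeds)
		ops = &fake_ops;
}

/* Method wrapper: a missing driver yields NULL rather than a crash. */
static void *
IOMMU_CREATE_DOMAIN(unsigned long maxaddr)
{
	if (ops == NULL)
		return (NULL);
	assert(ops->create_domain != NULL);
	return (ops->create_domain(maxaddr));
}

/* First caller triggers initialization; later callers reuse the result. */
void *
iommu_create_domain(unsigned long maxaddr)
{
	if (!init_tried) {
		init_tried = true;
		iommu_init();
	}
	return (IOMMU_CREATE_DOMAIN(maxaddr));
}
```

With the probe failing, `iommu_create_domain()` returns NULL on every call, which is the signal a caller can use to fail the passthrough attempt.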

In D7448#155974, @jhb wrote:

Another issue exposed by the more dynamic nature of ppt's now is that the host domain needs to be updated when a device is moved to/from the ppt state (perhaps ppt detach ?).

Hmm, this is almost handled now by ppt_unassign_device(). I think either iommu_remove_device() should move the device back into the host domain, or we should explicitly call iommu_add_device() with the host_domain from ppt.c. Since host_domain is static, the simplest fix would be to have iommu_remove_device() actually call iommu_add_device() with the host_domain after IOMMU_REMOVE_DEVICE() (though that breaks the 1:1 mapping of wrapper functions to IOMMU driver "methods").

Also, even though we don't remove ppt devices if bhyve crashes via a devfs_cdevpriv dtor callback, 'bhyvectl destroy' triggers ppt_unassign_all which will DTRT.

I missed 'iommu_host_domain()', so we could just patch ppt.c to call iommu_add_device with that. In some ways I do think it's more robust to handle it in iommu_remove_device() if we want to avoid ever having a device not be in a valid domain.
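The "handle it in iommu_remove_device()" option might look roughly like the model below. It is a sketch only: the real functions take a domain pointer and a PCI RID, whereas the `struct domain_model` and `struct device_model` types here are hypothetical simplifications that just track which domain a device is in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping types; the real code tracks devices by RID. */
struct domain_model {
	int id;
};

struct device_model {
	struct domain_model *dom;	/* NULL means "in no domain" */
};

static struct domain_model host_domain = { 0 };

struct domain_model *
iommu_host_domain(void)
{
	return (&host_domain);
}

void
iommu_add_device(struct domain_model *dom, struct device_model *dev)
{
	assert(dev->dom == NULL);
	dev->dom = dom;
}

void
iommu_remove_device(struct domain_model *dom, struct device_model *dev)
{
	assert(dev->dom == dom);
	dev->dom = NULL;	/* stands in for IOMMU_REMOVE_DEVICE() */

	/*
	 * Leaving a guest domain returns the device to the host domain.
	 * Leaving the host domain itself does not, since the caller is
	 * about to add the device to a guest domain.
	 */
	if (dom != &host_domain)
		iommu_add_device(&host_domain, dev);
}
```

The trade-off noted above shows up directly: the wrapper no longer maps 1:1 onto a single driver method, but a device removed from a guest never lingers outside a valid domain.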

jhb edited edge metadata.
  • Don't panic if creation of the host domain fails.
grehan edited edge metadata.

This is fine as it stands. The potential issues with ppt devices can be addressed separately.

This revision is now accepted and ready to land. Aug 26 2016, 2:16 PM
This revision was automatically updated to reflect the committed changes.