bhyve: do not remove VM from IOMMU host domain
ClosedPublic

Authored by bz on Mar 18 2022, 8:49 PM.

Details

Summary

When we run bhyve without passthru, host devices can DMA to all VMs
without restrictions.
When we add a passthru device that is no longer the case, as we are
removing the wired VM memory from the host domain.

Now if we are using physical host devices from bhyve user mode,
e.g., physical disks /dev/ada<n> or /dev/da<n> with AHCI or over USB (umass),
then the guest physical address (GPA) is passed to the bhyve user-mode
process, which in turn translates it to its virtual mapping in the
host address space and passes it to the preadv syscall. The syscall then goes to,
e.g., ahci, which uses the then-mapped address for the physical device.
That address is a guest-mapped physical address which is no longer valid
in the host domain (as we removed the wired VM mappings from it), and
as a result the DMA will fail.
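
As a rough illustration of the path described above, the sketch below translates a guest physical address into a pointer inside the bhyve process and hands that pointer straight to preadv(). The struct guest_mem type and the gpa_to_hva() and guest_read() names are hypothetical stand-ins, not bhyve's actual helpers, and the sketch assumes a single contiguous guest memory mapping.

```c
#define _DEFAULT_SOURCE	/* only needed to expose preadv() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical description of guest memory as mapped into the process. */
struct guest_mem {
	uint8_t	*host_base;	/* host virtual address of the mapping */
	uint64_t gpa_base;	/* first guest physical address covered */
	size_t	 len;		/* size of the mapping in bytes */
};

/* Translate [gpa, gpa + len) to a host pointer; NULL if out of range. */
void *
gpa_to_hva(const struct guest_mem *gm, uint64_t gpa, size_t len)
{
	uint64_t off;

	assert(gm != NULL);
	if (gpa < gm->gpa_base)
		return (NULL);
	off = gpa - gm->gpa_base;
	if (off > gm->len || len > gm->len - off)
		return (NULL);
	return (gm->host_base + off);
}

/* Read from fd at offset directly into guest memory (no extra copy). */
ssize_t
guest_read(const struct guest_mem *gm, int fd, uint64_t gpa, size_t len,
    off_t offset)
{
	struct iovec iov;

	iov.iov_base = gpa_to_hva(gm, gpa, len);
	if (iov.iov_base == NULL)
		return (-1);
	iov.iov_len = len;
	return (preadv(fd, &iov, 1, offset));
}
```

When fd refers to a raw disk, the host driver may DMA straight into the pages behind that pointer, which is why those pages have to remain valid in the IOMMU host domain.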

This will not result in an error in AHCI or in the syscall return value seen by
bhyve, but the passed buffer will stay untouched, essentially resulting
in non-working IO in the guest.
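
Because the failure is silent, one way to spot it while debugging is to fill the buffer with a known pattern before the read and check afterwards whether the pattern is still there. The following is only a heuristic sketch and not part of this change; the buf_poison() and buf_untouched() names and the SENTINEL value are made up for illustration, and the check gives a false positive if the data on disk happens to match the pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	SENTINEL	0xA5	/* arbitrary fill byte, assumed unlikely on disk */

/* Fill the buffer with the sentinel pattern before issuing the read. */
void
buf_poison(void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	memset(buf, SENTINEL, len);
}

/* Return true if every byte still holds the sentinel after the read. */
bool
buf_untouched(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++) {
		if (p[i] != SENTINEL)
			return (false);
	}
	return (true);
}
```

A read that reports the full length while buf_untouched() still returns true suggests that the data never reached the buffer.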

It is unclear why initially the mapped guest address space was removed
from the host domain, but it does not seem to give any extra security
for the host or guest compared to a non-passthru VM.

In conclusion, rather than adding an extra bounce layer (as initially
drafted for a proof of concept in D34535), keep the GPA mappings valid
in the host domain and allow IO to work. That solves a long-standing
problem when using passthru devices and physical disks in the same VM.

With lots of help and patience from: grehan
PR: 260178

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

bz requested review of this revision. Mar 18 2022, 8:49 PM

@grehan might be able to explain this more simply and better than I can. This is here for more eyes and discussion. Please ask questions or leave comments as you see fit.

Removing guest pages from the host domain has the appearance of offering some additional security, but VM control structures are still accessible (EPT, VMCS etc). Also, it only protects guests that have ppt devices configured, and doesn't do anything for other guests.

Since it results in the inability to use any form of zero-copy i/o, I'm in favour of bz's change for now.

(I should add: IOMMU protection of a guest does seem like a very useful feature, though it requires a lot more work)

This revision is now accepted and ready to land. Mar 19 2022, 10:52 PM

Some suggestions on the wording:

bhyve: Do not remove guest physical addresses from IOMMU host domain

This permits I/O devices on the host to directly access wired memory
dedicated to guests using passthru devices.  Note that wired memory
belonging to guests that do not use passthru devices has always been
accessible by I/O devices on the host.

bhyve maps guest physical addresses into the user address space of the bhyve
process by mmap'ing /dev/vmm/<vmname>.  Device models pass pointers
derived from this mapping directly to system calls such as preadv() to minimize
copies when emulating DMA.  If the backing store for a device model is a raw
host device (e.g. when exporting a raw disk device such as /dev/ada<n> as a
drive in the guest), the host device driver (e.g. ahci for /dev/ada<n>) can itself
use DMA on the host directly to the guest's memory.  However, if the guest's
memory is not present in the host IOMMU domain, these DMA requests by
the host device will fail without raising an error visible to the host device driver
or to the guest, resulting in non-working IO in the guest.

It is unclear why guest addresses were removed from the IOMMU host domain
initially, especially only for VMs with a passthru device, as the host IOMMU
domain does not affect the permissions of passthru devices, only devices on
the host.

I considered an alternative of using bounce buffers instead (D34535 is a proof of
concept), but that adds additional overhead for unclear benefit.

This solves a long-standing problem when using passthru devices and physical disks in the same VM.
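
For comparison, a bounce-buffer read might look roughly like the sketch below: the data first lands in ordinary host memory and is then copied into the guest's buffer. This is a generic illustration and not the code from D34535; bounce_read() is a hypothetical name, and the per-call malloc() is only there to keep the example short.

```c
#define _DEFAULT_SOURCE	/* only needed to expose pread() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read into a temporary host buffer, then copy into guest memory.
 * The extra allocation and memcpy() are the overhead that the
 * zero-copy path avoids.
 */
ssize_t
bounce_read(int fd, void *guest_buf, size_t len, off_t offset)
{
	void *bounce;
	ssize_t n;

	assert(guest_buf != NULL);
	if (len == 0)
		return (0);
	bounce = malloc(len);
	if (bounce == NULL)
		return (-1);
	n = pread(fd, bounce, len, offset);
	if (n > 0)
		memcpy(guest_buf, bounce, (size_t)n);
	free(bounce);
	return (n);
}
```

With this approach the host device only ever does DMA to host-owned memory, at the cost of one extra copy per request.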

The original log message had a few typos you may want to check for in the log message you end up using (s/behyve/bhyve/, s/weird/wired/, s/mappin/mapping/; there might be other typos I didn't see).

I've applied this patch on FreeBSD 13R-p8 and it worked in my scenario.