Page MenuHomeFreeBSD

bhyve: Map the MSI-X table unconditionally when passthrough is enabled
ClosedPublic

Authored by markj on Oct 7 2021, 7:57 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Oct 21, 10:24 PM
Unknown Object (File)
Sun, Oct 19, 12:14 AM
Unknown Object (File)
Sun, Oct 19, 12:13 AM
Unknown Object (File)
Sun, Oct 19, 12:12 AM
Unknown Object (File)
Sun, Oct 19, 12:11 AM
Unknown Object (File)
Wed, Oct 15, 10:18 PM
Unknown Object (File)
Tue, Oct 14, 7:41 PM
Unknown Object (File)
Sun, Oct 12, 3:29 AM

Details

Summary

Right now we have to map the PBA in case it shares a page with the MSI-X
table. It turns out that some devices (newer Intel wifi devices in
particular) may map registers in the same page as the MSI-X table, in
violation of the PCI base spec. To handle both cases, and to avoid
having to use /dev/mem at all, use PCIOCBARMMAP to map the MSI-X BAR and
handle all accesses outside the table like we do today for the PBA.

After some discussion with jhb, I first tried implementing this by
having each trapped BAR access use the new PCIOCBARIO ioctl to actually
perform the access. This ends up being simple to implement, but the
downside is that the ioctl activates and deactivates the resource for
each call, which in this case means that each access will trigger a
global TLB shootdown. In my testing with an Intel wifi device, these
accesses may be quite frequent, and so I think this overhead is too high
since it also perturbs the rest of the system by generating IPIs. I
believe there isn't really much of a downside to unconditionally mapping
the MSI-X table, so I went with that approach instead.

Test Plan

Tested passthrough with a realtek NIC where the PBA and MSI-X table
are co-located, and an Intel wifi NIC where the driver writes to registers
in the same page as the MSI-X table.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 42009
Build 38897: arc lint + arc unit

Event Timeline

markj requested review of this revision.Oct 7 2021, 7:57 PM

This is meant to supersede both D31241 and D31534.

usr.sbin/bhyve/pci_passthru.c
319

You could perhaps keep the 'return (data)' here so that the else case below remains unindented. I don't really mind either way, but the write function below keeps the table case unindented and it seems useful to keep the two functions consistent in their general layout?

450

Minor pedanticism since the PBA can be in a different BAR.

markj marked 2 inline comments as done.

Address feedback.

This revision is now accepted and ready to land.Oct 8 2021, 8:30 PM
val_packett.cool added inline comments.
usr.sbin/bhyve/pci_emul.h
161

The removal of this field broke BHYVE_SNAPSHOT.

/usr/src/usr.sbin/bhyve/pci_emul.c:2082:36: error: no member named 'pba_page_offset' in 'struct pci_devinst::(unnamed at /usr/src/usr.sbin/bhyve/pci_emul.h:150:2)'; did you mean 'pba_offset'?
        SNAPSHOT_VAR_OR_LEAVE(pi->pi_msix.pba_page_offset, meta, ret, done);
                                          ^~~~~~~~~~~~~~~
                                          pba_offset
usr.sbin/bhyve/pci_emul.h
161

Sorry for the breakage. See https://reviews.freebsd.org/D32523 . Is it expected that snapshots saved by one version of bhyve(8) are restorable by a later one?

usr.sbin/bhyve/pci_emul.h
161

I have no idea. I just have the option enabled but I haven't ever actually used the functionality yet

andy_omniosce.org added inline comments.
usr.sbin/bhyve/pci_passthru.c
446

I'm in the process of porting this change to illumos and, looking at this, should it be

pi->pi_msix.table_bar

?

usr.sbin/bhyve/pci_passthru.c
446

I think you're right, thanks. An earlier version would only create this mapping if the PBA and MSI-X table resided in the same BAR, and that was true for the devices I was testing with. :(