Page MenuHomeFreeBSD

bhyve: Map the MSI-X table unconditionally when passthrough is enabled

Authored by markj on Oct 7 2021, 7:57 PM.
Referenced Files
F58883106: D32359.diff
Sat, Apr 1, 8:27 PM
Unknown Object (File)
Tue, Mar 7, 1:57 PM
Unknown Object (File)
Feb 26 2023, 9:49 PM
Unknown Object (File)
Jan 7 2023, 4:42 AM
Unknown Object (File)
Dec 18 2022, 7:55 AM
Unknown Object (File)
Dec 9 2022, 4:21 PM
Unknown Object (File)
Dec 4 2022, 7:47 AM
Unknown Object (File)
Nov 28 2022, 6:19 PM



Right now we have to map the PBA in case it shares a page with the MSI-X
table. It turns out that some devices (newer Intel wifi devices in
particular) may map registers in the same page as the MSI-X table, in
violation of the PCI base spec. To handle both cases, and to avoid
having to use /dev/mem at all, use PCIOCBARMMAP to map the MSI-X BAR and
handle all accesses outside the table like we do today for the PBA.

After some discussion with jhb, I first tried implementing this by
having each trapped BAR access use the new PCIOCBARIO ioctl to actually
perform the access. This ends up being simple to implement, but the
downside is that the ioctl activates and deactivates the resource for
each call, which in this case means that each access will trigger a
global TLB shootdown. In my testing with an Intel wifi device, these
accesses may be quite frequent, and so I think this overhead is too high
since it also perturbs the rest of the system by generating IPIs. I
believe there isn't really much of a downside to unconditionally mapping
the MSI-X table, so I went with that approach instead.

Test Plan

Tested passthrough with a realtek NIC where the PBA and MSI-X table
are co-located, and an Intel wifi NIC where the driver writes to registers
in the same page as the MSI-X table.

Diff Detail

rS FreeBSD src repository - subversion
Lint Passed
No Test Coverage
Build Status
Buildable 42009
Build 38897: arc lint + arc unit

Event Timeline

markj requested review of this revision.Oct 7 2021, 7:57 PM

This is meant to supersede both D31241 and D31534.


You could perhaps keep the 'return (data)' here so that the else case below remains unindented. I don't really mind either way, but the write function below keeps the table case unindented and it seems useful to keep the two functions consistent in their general layout?


Minor pedanticism since the PBA can be in a different BAR.

markj marked 2 inline comments as done.

Address feedback.

This revision is now accepted and ready to land.Oct 8 2021, 8:30 PM added inline comments.

The removal of this field broke BHYVE_SNAPSHOT.

/usr/src/usr.sbin/bhyve/pci_emul.c:2082:36: error: no member named 'pba_page_offset' in 'struct pci_devinst::(unnamed at /usr/src/usr.sbin/bhyve/pci_emul.h:150:2)'; did you mean 'pba_offset'?
        SNAPSHOT_VAR_OR_LEAVE(pi->pi_msix.pba_page_offset, meta, ret, done);

Sorry for the breakage. See . Is it expected that snapshots saved by one version of bhyve(8) are restorable by a later one?


I have no idea. I just have the option enabled but I haven't ever actually used the functionality yet added inline comments.

I'm in the process of porting this change to illumos and, looking at this, should it be




I think you're right, thanks. An earlier version would only create this mapping if the PBA and MSI-X table resided in the same BAR, and that was true for the devices I was testing with. :(