
bhyve/pci_emul: Use vmem to track BAR allocations
Needs Review · Public

Authored by bnovkov on Jan 11 2026, 1:32 PM.

Details

Reviewers
None
Group Reviewers
bhyve
Summary

This patch replaces bhyve's PCI BAR bump allocator with libuvmem(3).
This allows us to allocate PCI BARs at runtime, which is a
prerequisite for PCI device hotplugging.

Under this resource management scheme, each virtual PCI bus manages its
IO, MEM32, and MEM64 BARs using separate vmem_t arenas, with each BAR
type's address space evenly distributed across all virtual PCI buses.
For example, consider a hypothetical virtual machine with two PCI buses whose
PCIBAR_IO BARs can be allocated from [0x1000, 0x3000). Under this BAR
management scheme, a vmem_t arena for PCI bus 0 manages the
[0x1000, 0x2000) range, while the PCI bus 1 arena manages the
[0x2000, 0x3000) range.

All BARs are allocated using vmem's M_BESTFIT flag to match the
previous allocator's fragmentation prevention policy.

update_bar_address was also changed to use vmem_free and
vmem_xalloc to handle guest PCI BAR reprogramming.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

markj added inline comments.
usr.sbin/bhyve/pci_emul.c
821

alloc should be a bool.

885

Why do we need to specify M_NOWAIT?

1106

Does this comment get addressed somehow later in the series?

1119

Same here.

1119

Shouldn't the upper limit be ~0ul? The value you passed there gives a 32-bit limit, no?

1719

Do these pci_emul_* variables still need to be global?

1723

Where do these values come from? It looks like you're partitioning the address space based on the number of pci buses, but why are you using the size of each partition as the vmem quantum?

Oh, these aren't the quantum sizes, it's just the size of the initial address range. These variables are misnamed I think.

bnovkov marked 5 inline comments as done.

Address @markj 's comments.

usr.sbin/bhyve/pci_emul.c
885

Thanks for catching this, it was not intentional.

1106

Ah, I knew I'd forgotten to address something.

I wasn't sure how PCIBAR_ROM behaved initially, but looking at the code now I think we can just remove that comment. It's effectively a noop since the actual allocation is handled by pci_emul_alloc_rom.

1119

I'm not sure how vmem_addr_t behaves here, but I'll throw in ul to be safe.

1719

Thanks for pointing this out, only the rombase variables should remain global since they're handled by pci_emul_alloc_rom.

1723

Sorry for the awkward phrasing, I forgot that 'quantum' is vmem(9) jargon. I've changed the names of the variables; hopefully it's a bit clearer now.

rew added inline comments.
usr.sbin/bhyve/pci_emul.c
100

PCIBAR_MAX isn't declared as an enum member in this review

i see it's added later on in D54645 but I'm guessing you meant to include it in this review/commit

Address @rew 's comments.

usr.sbin/bhyve/pci_emul.c
100

Yes, I don't know how PCIBAR_MAX managed to slip into D54645, thanks for catching this!

jhb added inline comments.
usr.sbin/bhyve/pci_emul.c
1119

In the kernel we just use '~0' for rman_res_t (e.g. for bus_alloc_resource_any()) and the compiler DTRT based on the argument type from the function prototype.

Have you tested with PCI passthru configured? With just this patch applied, I get an assertion failure in update_bar_address(). Same result if I test your github branch.

usr.sbin/bhyve/pci_emul.c
1076–1077

These assignments can be removed.

bnovkov marked 2 inline comments as done.

Address @markj 's comments.

Have you tested with PCI passthru configured? With just this patch applied, I get an assertion failure in update_bar_address(). Same result if I test your github branch.

I have not, thank you for testing that part.
I'll try to fix the issue over the weekend.

Have you tested with PCI passthru configured? With just this patch applied, I get an assertion failure in update_bar_address(). Same result if I test your github branch.

So, this turned out to be a real headache.

The assertion failure you were seeing was caused by the fact that PCIBAR_MEM64 allocations will sometimes dip into the PCIBAR_MEM32 pool. The reasoning for this was given in a comment at line 913:

[[snip]]
		/*
		 * XXX
		 * Some drivers do not work well if the 64-bit BAR is allocated
		 * above 4GB. Allow for this by allocating small requests under
		 * 4GB unless then allocation size is larger than some arbitrary
		 * number (128MB currently).
		 */
[[snip]]

This problem and the assertion failure are trivially fixable, but there's another catch here - PCIBAR_MEM64 BARs are constructed from two 32-bit writes, meaning that update_bar_address gets called four times before a BAR is ready to be allocated from the appropriate vmem arena.
Once again, that problem can be solved (albeit not so trivially; a state machine tracking the 64-bit BAR's state should do the job), but the combination of the two introduces a lot of edge cases.

Circling back to the comment I quoted - git blame traces it back to when bhyve was first added to the tree in 2011, and I expect that it might not be true anymore since things have substantially changed in the last 15 years.
Removing the whole "dipping into the 32-bit BAR pool" behavior would make tracking BAR allocations far less convoluted, and I am tempted to do so in this patch.
Do you happen to know of any cases where the comment might still hold true? The only emulated PCI devices that use PCIBAR_MEM64 are the PPT and nvme devices; I haven't had the time to test my theory yet.


I guess this mostly relates to support for legacy and 32-bit guest operating systems? I tend to think it's undesirable to break support for them, given that virtualizing legacy systems is a major reason to use bhyve in the first place, at least in principle.

Does it simplify things at all to use a single vmem arena for both 32-bit and 64-bit memory BARs, and apply constraints as needed when calling vmem_xalloc()?

Fix PCI passthrough:

  • Introduce a state machine that tracks a BAR's address. This allows us to properly release and allocate BAR addresses from the appropriate pool when the guest starts migrating BARs
  • Comply with the remark in pci_emul_assign_bar and allocate small MEM64 BARs from the MEM32 pool
  • Allow the guest to move an already allocated BAR to a new address

I guess this mostly relates to support for legacy and 32-bit guest operating systems? I tend to think it's undesirable to break support for them, given that virtualizing legacy systems is a major reason to use bhyve in the first place, at least in principle.

Does it simplify things at all to use a single vmem arena for both 32-bit and 64-bit memory BARs, and apply constraints as needed when calling vmem_xalloc()?

I managed to find a way of keeping the 32-bit allocation behavior in the end (although it's a bit verbose).

I have tested this patch on a Linux and FreeBSD guest, both with and without a passthrough device and I encountered no issues while booting or hotplugging additional devices.

usr.sbin/bhyve/pci_emul.c
885

Reopening this to add context - M_NOWAIT is now needed here to prevent vmem_xalloc from hanging if the target address is already occupied.