bhyve: do not intercept PCI BAR probe addresses
Abandoned, Public

Authored by crowston_protonmail.com on Apr 10 2021, 8:14 PM.

Details

Reviewers
jhb
grehan
Group Reviewers
bhyve
Summary

During PCI initialization, all ones (a string of 'f's) are written to
each BAR register to determine the size of the region the device
decodes. The register is then immediately overwritten with the true
BAR address. However, bhyve still tries to register the memory range
for the temporary probe address. If that region happens to be occupied
already, an assertion fails in mem.c with EEXIST.
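
To make the failure mode concrete, here is a toy sketch in C. It is not bhyve's mem.c: the names `toy_registry` and `toy_register` are hypothetical and exist only for this illustration. It assumes a registry that refuses overlapping ranges with EEXIST, which is the behaviour the summary describes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RANGES 8    /* arbitrary capacity for the illustration */

struct toy_range {
    uint64_t base;
    uint64_t size;
};

struct toy_registry {
    struct toy_range r[TOY_MAX_RANGES];
    size_t n;
};

/*
 * Hypothetical helper: claim [base, base + size - 1].
 * Returns 0 on success, EINVAL for an empty range, EEXIST if the range
 * overlaps one that is already registered, ENOSPC if the table is full.
 */
static int
toy_register(struct toy_registry *reg, uint64_t base, uint64_t size)
{
    uint64_t last;
    size_t i;

    if (size == 0)
        return (EINVAL);
    /* Inclusive end, so a range ending at the top of the address space
     * does not wrap around to zero. */
    last = base + size - 1;

    for (i = 0; i < reg->n; i++) {
        uint64_t t_last = reg->r[i].base + reg->r[i].size - 1;

        if (base <= t_last && reg->r[i].base <= last)
            return (EEXIST);
    }
    if (reg->n == TOY_MAX_RANGES)
        return (ENOSPC);
    reg->r[reg->n].base = base;
    reg->r[reg->n].size = size;
    reg->n++;
    return (0);
}
```

For instance, a 16 MiB BAR written with all ones reads back as address 0xFF000000 and a 4 KiB BAR as 0xFFFFF000. If both were registered while being probed, the second `toy_register` call would return EEXIST, and an `assert(error == 0)` on that result would abort.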

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 38491
Build 35380: arc lint + arc unit

Event Timeline

Urgh, I realised my handling of the high 64 bit BAR is incorrect.

What problem are you trying to solve? Competent OSes should disable decoding so that 'decode' is false here. If a busted OS leaves it enabled, bhyve should faithfully brick the guest just as real hardware would (and real hardware does: doing this for the frame buffer BAR without disabling decoding is a great way to lock up real hardware). Ignoring writes of all ones would leave the BAR registered at its old address. That would cause confusion if a subsequent write moved the BAR to a new address after it was sized, leaving the BAR active in two places. Barring a really good reason, I think this is probably a bad idea.
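
As a rough sketch of the decode check being discussed (not bhyve's code; the macro and function names below are local stand-ins chosen for this example), the PCI command register carries an I/O space enable in bit 0 and a memory space enable in bit 1. A BAR write only has to move a live mapping when the matching enable bit is set:

```c
#include <stdbool.h>
#include <stdint.h>

/* Command register enable bits from the PCI specification
 * (local names, chosen for this example). */
#define CMD_IO_ENABLE   0x0001u /* bit 0: device responds to I/O space */
#define CMD_MEM_ENABLE  0x0002u /* bit 1: device responds to memory space */

enum bar_kind { BAR_IO, BAR_MEM };

/* True if the device currently decodes accesses for this kind of BAR. */
static bool
bar_decode_enabled(uint16_t cmd, enum bar_kind kind)
{
    uint16_t bit = (kind == BAR_IO) ? CMD_IO_ENABLE : CMD_MEM_ENABLE;

    return ((cmd & bit) != 0);
}

/*
 * Hypothetical helper: does a BAR write change an active mapping?
 * With decoding disabled (as the spec requires during sizing) the
 * answer is always no, so a probe value of all ones claims nothing.
 */
static bool
bar_write_moves_mapping(uint16_t cmd, enum bar_kind kind,
    uint64_t old_addr, uint64_t new_addr)
{
    return (bar_decode_enabled(cmd, kind) && old_addr != new_addr);
}
```

With a command value of 0x0004 (bus mastering only), writing a probe value to a memory BAR moves nothing. With 0x0006 the same write would relocate an active mapping, which is the case that is hazardous on real hardware.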

In D29698#666736, @jhb wrote:

What problem are you trying to solve? Competent OSes should disable decoding so that 'decode' is false here. If a busted OS leaves it enabled, bhyve should faithfully brick the guest just as real hardware would (and real hardware does: doing this for the frame buffer BAR without disabling decoding is a great way to lock up real hardware). Ignoring writes of all ones would leave the BAR registered at its old address. That would cause confusion if a subsequent write moved the BAR to a new address after it was sized, leaving the BAR active in two places. Barring a really good reason, I think this is probably a bad idea.

This bug occurs before OS startup; I can trip the failed assert when passing through a GPU without specifying a bootable OS disk. It would also happen again during PCI enumeration by Windows and by Linux.

No, both Linux and Windows should be disabling the bits in the command register before sizing BARs or they too would brick on real hardware. I'm quite confident that Linux disables decoding while sizing from the last time I looked at Linux's PCI bus driver. The PCI spec does not say that there is anything magic about not decoding for values of all ones. Can you describe your failure case in more detail? What assertion is failing, and what startup environment are you using (UEFI rom or some other loader like bhyveload or grub2-bhyve, and what stage of OS startup)?

In fact, the spec is explicit that decoding must be disabled before sizing a BAR. In the PCI 2.3 spec there is an Implementation Note in the section on BARs (6.2.5.1) which says:

Implementation Note: Sizing a 32-bit Base Address Register Example

Decode (I/O or memory) of a register is disabled via the command register before sizing a Base Address register. Software saves the original value of the Base Address register, writes 0FFFFFFFFh to the register, then reads it back. Size calculation can be done from the 32-bit value read by first clearing encoding information bits (bit 0 for I/O, bits 0-3 for memory), inverting all 32 bits (logical NOT), then incrementing by 1. The resultant 32-bit value is the memory/I/O range size decoded by the register. Note that the upper 16 bits of the result is ignored if the Base Address register is for I/O and bits 16-31 returned zero upon read. The original value in the Base Address register is restored before re-enabling decode in the command register of the device.

64-bit (memory) Base Address registers can be handled the same, except that the second 32-bit register is considered an extension of the first; i.e., bits 32-63. Software writes 0FFFFFFFFh to both registers, reads them back, and combines the result into a 64-bit value. Size calculation is done on the 64-bit value.
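
The arithmetic in the note above can be sketched in C as follows. This is only an illustration of the size calculation: the function names `bar32_size` and `bar64_size` are made up for this example, and the config-space accesses themselves (disable decode, save, write all ones, read back, restore) are assumed to have happened elsewhere.

```c
#include <stdint.h>

/*
 * Size of a 32-bit BAR, given the value read back after writing
 * 0xFFFFFFFF. is_io is nonzero for an I/O BAR. Returns 0 for an
 * unimplemented BAR (read-back of 0).
 */
static uint32_t
bar32_size(uint32_t readback, int is_io)
{
    /* Encoding bits: bit 0 for I/O, bits 0-3 for memory. */
    uint32_t v = readback & ~(is_io ? 0x1u : 0xFu);
    uint32_t size = ~v + 1;     /* invert all 32 bits, then add 1 */

    /* For I/O BARs that return zero in bits 16-31, the upper 16 bits
     * of the result are ignored. */
    if (is_io && (readback >> 16) == 0)
        size &= 0xFFFFu;
    return (size);
}

/*
 * Size of a 64-bit memory BAR: the second register supplies bits 32-63,
 * and the same calculation is done on the combined 64-bit value.
 */
static uint64_t
bar64_size(uint32_t readback_lo, uint32_t readback_hi)
{
    uint64_t v = ((uint64_t)readback_hi << 32) | readback_lo;

    v &= ~(uint64_t)0xF;        /* clear memory encoding bits 0-3 */
    return (~v + 1);
}
```

For example, a memory BAR that reads back 0xFF000008 has a size of 0x01000000 (16 MiB), an I/O BAR that reads back 0x0000FF01 has a size of 0x100 (256 bytes), and a 64-bit BAR that reads back 0xF000000C / 0xFFFFFFFF has a size of 0x10000000 (256 MiB).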

This problem went away after I updated uefi-edk2-bhyve from 0.2_1,1 to g20210214,2.

Sorry for the noise!

Hello.

I'm trying to pass through my NVIDIA RTX 2080 Ti from FreeBSD 13R to Ubuntu 21.04. Unfortunately, a black screen appears as soon as I launch the bhyve command, which prevents me from completing the task. The author of this review seems to have had the same problem, and I see a patch here. I applied it, but it didn't work for me. I would like to know what to do to fix my problem. Below I explain what happens:

First of all, here is the full PCI configuration of my PC:

Code:

root@marietto:/home/marietto # pciconf -v -l

hostb0@pci0:0:0:0: class=0x060000 rev=0x0d hdr=0x00 vendor=0x8086 device=0x3e30 subvendor=0x1458 subdevice=0x5000

vendor     = 'Intel Corporation'
device     = '8th/9th Gen Core 8-core Desktop Processor Host Bridge/DRAM Registers [Coffee Lake S]'
class      = bridge
subclass   = HOST-PCI

pcib1@pci0:0:1:0: class=0x060400 rev=0x0d hdr=0x01 vendor=0x8086 device=0x1901 subvendor=0x1458 subdevice=0x5000

vendor     = 'Intel Corporation'
device     = '6th-10th Gen Core Processor PCIe Controller (x16)'
class      = bridge
subclass   = PCI-PCI

pcib2@pci0:0:1:1: class=0x060400 rev=0x0d hdr=0x01 vendor=0x8086 device=0x1905 subvendor=0x1458 subdevice=0x5000

vendor     = 'Intel Corporation'
device     = 'Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8)'
class      = bridge
subclass   = PCI-PCI

vgapci2@pci0:0:2:0: class=0x030000 rev=0x02 hdr=0x00 vendor=0x8086 device=0x3e98 subvendor=0x1458 subdevice=0xd000

vendor     = 'Intel Corporation'
device     = 'CoffeeLake-S GT2 [UHD Graphics 630]'
class      = display
subclass   = VGA

none0@pci0:0:18:0: class=0x118000 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa379 subvendor=0x1458 subdevice=0x8888

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH Thermal Controller'
class      = dasp

xhci1@pci0:0:20:0: class=0x0c0330 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa36d subvendor=0x1458 subdevice=0x5007

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH USB 3.1 xHCI Host Controller'
class      = serial bus
subclass   = USB

none1@pci0:0:20:2: class=0x050000 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa36f subvendor=0x8086 subdevice=0x7270

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH Shared SRAM'
class      = memory
subclass   = RAM

none2@pci0:0:22:0: class=0x078000 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa360 subvendor=0x1458 subdevice=0x1c3a

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH HECI Controller'
class      = simple comms

ahci0@pci0:0:23:0: class=0x010601 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa352 subvendor=0x1458 subdevice=0xb005

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH SATA AHCI Controller'
class      = mass storage
subclass   = SATA

pcib3@pci0:0:27:0: class=0x060400 rev=0xf0 hdr=0x01 vendor=0x8086 device=0xa340 subvendor=0x1458 subdevice=0x5001

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH PCI Express Root Port'
class      = bridge
subclass   = PCI-PCI

pcib4@pci0:0:28:0: class=0x060400 rev=0xf0 hdr=0x01 vendor=0x8086 device=0xa338 subvendor=0x1458 subdevice=0x5001

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH PCI Express Root Port'
class      = bridge
subclass   = PCI-PCI

pcib5@pci0:0:28:5: class=0x060400 rev=0xf0 hdr=0x01 vendor=0x8086 device=0xa33d subvendor=0x1458 subdevice=0x5001

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH PCI Express Root Port'
class      = bridge
subclass   = PCI-PCI

pcib6@pci0:0:29:0: class=0x060400 rev=0xf0 hdr=0x01 vendor=0x8086 device=0xa330 subvendor=0x1458 subdevice=0x5001

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH PCI Express Root Port'
class      = bridge
subclass   = PCI-PCI

isab0@pci0:0:31:0: class=0x060100 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa305 subvendor=0x1458 subdevice=0x5001

vendor     = 'Intel Corporation'
device     = 'Z390 Chipset LPC/eSPI Controller'
class      = bridge
subclass   = PCI-ISA

hdac2@pci0:0:31:3: class=0x040300 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa348 subvendor=0x1458 subdevice=0xa0c3

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH cAVS'
class      = multimedia
subclass   = HDA

ichsmb0@pci0:0:31:4: class=0x0c0500 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa323 subvendor=0x1458 subdevice=0x5001

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH SMBus Controller'
class      = serial bus
subclass   = SMBus

none3@pci0:0:31:5: class=0x0c8000 rev=0x10 hdr=0x00 vendor=0x8086 device=0xa324 subvendor=0x8086 subdevice=0x7270

vendor     = 'Intel Corporation'
device     = 'Cannon Lake PCH SPI Controller'
class      = serial bus

em0@pci0:0:31:6: class=0x020000 rev=0x10 hdr=0x00 vendor=0x8086 device=0x15bc subvendor=0x1458 subdevice=0xe000

vendor     = 'Intel Corporation'
device     = 'Ethernet Connection (7) I219-V'
class      = network
subclass   = ethernet

vgapci0@pci0:1:0:0: class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1e04 subvendor=0x19da subdevice=0x2503

vendor     = 'NVIDIA Corporation'
device     = 'TU102 [GeForce RTX 2080 Ti]'
class      = display
subclass   = VGA

hdac0@pci0:1:0:1: class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x10f7 subvendor=0x19da subdevice=0x2503

vendor     = 'NVIDIA Corporation'
device     = 'TU102 High Definition Audio Controller'
class      = multimedia
subclass   = HDA

xhci0@pci0:1:0:2: class=0x0c0330 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1ad6 subvendor=0x19da subdevice=0x2503

vendor     = 'NVIDIA Corporation'
device     = 'TU102 USB 3.1 Host Controller'
class      = serial bus
subclass   = USB

none4@pci0:1:0:3: class=0x0c8000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1ad7 subvendor=0x19da subdevice=0x2503

vendor     = 'NVIDIA Corporation'
device     = 'TU102 USB Type-C UCSI Controller'
class      = serial bus

vgapci1@pci0:2:0:0: class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1c02 subvendor=0x19da subdevice=0x2438

vendor     = 'NVIDIA Corporation'
device     = 'GP106 [GeForce GTX 1060 3GB]'
class      = display
subclass   = VGA

hdac1@pci0:2:0:1: class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x10f1 subvendor=0x19da subdevice=0x2438

vendor     = 'NVIDIA Corporation'
device     = 'GP106 High Definition Audio Controller'
class      = multimedia
subclass   = HDA

nvme0@pci0:3:0:0: class=0x010802 rev=0x03 hdr=0x00 vendor=0xc0a9 device=0x5403 subvendor=0xc0a9 subdevice=0x2100

vendor     = 'Micron/Crucial Technology'
class      = mass storage
subclass   = NVM

xhci2@pci0:5:0:0: class=0x0c0330 rev=0x03 hdr=0x00 vendor=0x1912 device=0x0014 subvendor=0x1912 subdevice=0x0015

vendor     = 'Renesas Technology Corp.'
device     = 'uPD720201 USB 3.0 Host Controller'
class      = serial bus
subclass   = USB

Then, following the wiki (https://wiki.freebsd.org/bhyve/pci_passthru), I masked the graphics card's PCI devices in /boot/loader.conf like this:

Code:

/boot/loader.conf

pptdevs="1/0/0 1/0/1 1/0/2 1/0/3"

I rebooted the PC and saw that all the relevant PCI devices had been masked correctly:

Code:

ppt0@pci0:1:0:0: class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1e04 subvendor=0x19da subdevice=0x2503

vendor     = 'NVIDIA Corporation'
device     = 'TU102 [GeForce RTX 2080 Ti]'
class      = display
subclass   = VGA

ppt1@pci0:1:0:1: class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x10f7 subvendor=0x19da subdevice=0x2503

vendor     = 'NVIDIA Corporation'
device     = 'TU102 High Definition Audio Controller'
class      = multimedia
subclass   = HDA

ppt2@pci0:1:0:2: class=0x0c0330 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1ad6 subvendor=0x19da subdevice=0x2503

vendor     = 'NVIDIA Corporation'
device     = 'TU102 USB 3.1 Host Controller'
class      = serial bus
subclass   = USB

ppt3@pci0:1:0:3: class=0x0c8000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1ad7 subvendor=0x19da subdevice=0x2503

vendor     = 'NVIDIA Corporation'
device     = 'TU102 USB Type-C UCSI Controller'
class      = serial bus

So I tried to run the Ubuntu virtual machine with this command:

Code:

bhyve -S -c 4 -m 8G -w -H \
-s 0,hostbridge \
-s 1,virtio-blk,/mnt/da1p1/vms/os/ubuntu-budgie-gpu/ubuntu-2104-gpu.img \
-s 2,passthru,1/0/0 \
-s 2:1,passthru,1/0/1 \
-s 2:2,passthru,1/0/2 \
-s 2:3,passthru,1/0/3 \
-s 6,virtio-net,tap0 \
-s 29,fbuf,tcp=0.0.0.0:5900,w=1440,h=900 \
-s 30,xhci,tablet \
-s 31,lpc -l com1,stdio \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
vm0

I also tried passing through only the second graphics card (the GTX 1060), since it is older and may be better supported, with these slots:

-s 2,passthru,2/0/0 \
-s 2:1,passthru,2/0/1 \

Either way, Ubuntu is not able to boot.

VM:vm0 is not created.
fbuf frame buffer base: 0xb04600000 [sz 16777216]
Assertion failed: (error == 0), function modify_bar_registration, file /usr/src/usr.sbin/bhyve/pci_emul.c, line 501.
Abort trap

root@marietto:/home/marietto/Desktop/Files/bhyve # ./os-uefi-hirsute.sh
vm_open: Invalid argument

root@marietto:/home/marietto/Desktop/Files/bhyve # ./os-uefi-hirsute.sh

VM:vm0 is not created.
fbuf frame buffer base: 0xb04600000 [sz 16777216]

And boom: black screen. Nothing happens any more, but I can hear the PC's fan spinning up, which tells me it is trying to do something that, for some reason, it can't complete.

Hi,

I've created a similar patch (D28277), which is necessary for GPU passthrough of an integrated Intel GPU to work properly. Maybe you could give that patch a try?
Additionally, add some logs to modify_bar_registration to see what's done by the guest:

diff --git a/usr.sbin/bhyve/pci_emul.c b/usr.sbin/bhyve/pci_emul.c
index 78390182fba..4ace2f51a97 100644
--- a/usr.sbin/bhyve/pci_emul.c
+++ b/usr.sbin/bhyve/pci_emul.c
@@ -490,6 +490,15 @@ pci_emul_alloc_resource(uint64_t *baseptr, uint64_t limit, uint64_t size,
 static void
 modify_bar_registration(struct pci_devinst *pi, int idx, int registration)
 {
+       printf("%s\n\r", __func__);
+       printf("  bdf : %d/%d/%d\n\r", pi->pi_bus, pi->pi_slot, pi->pi_func);
+       printf("  idx : %16x\n\r", idx);
+       printf("  reg : %16x\n\r", registration);
+       printf("  addr: %16lx\n\r", pi->pi_bar[idx].addr);
+       printf("  size: %16lx\n\r", pi->pi_bar[idx].size);
+       printf("  type: %16x\n\r", pi->pi_bar[idx].type);
+       printf("  cmd : %16x\n\r", pci_get_cfgdata16(pi, PCIR_COMMAND));
+
        struct pci_devemu *pe;
        int error;
        struct inout_port iop;

But does your patch work for my NVIDIA graphics card? I want to pass through the NVIDIA card rather than the Intel GPU, because the NVIDIA card is powerful and the integrated Intel GPU is not.

For Intel GPUs it requires some more patches to work properly.
D28277 just changes the behaviour of PCI BAR emulation. It will work for your nvidia card too.
However, it would be helpful to get more information about why modify_bar_registration fails. To that end, you could add the logging I mentioned before.

To do this more precisely, can I invite you to work directly inside my FreeBSD installation? You can make all the modifications you want and gather all the information you need, so we don't lose time. Can I open an AnyDesk account for you?

Okay.

Would you give me your email so I can send you the credentials and we can keep them secret? Thanks. Or even better, reply to me here: marietto2008@gmail.com

c.koehne@beckhoff.com