Page MenuHomeFreeBSD

ppt: Fix panic when configuring unavailable MSI-X vector
ClosedPublic

Authored by kgalazka on Feb 3 2025, 2:27 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Oct 10, 3:20 AM
Unknown Object (File)
Fri, Oct 3, 6:56 PM
Unknown Object (File)
Thu, Oct 2, 8:00 AM
Unknown Object (File)
Wed, Oct 1, 1:47 PM
Unknown Object (File)
Tue, Sep 30, 8:18 AM
Unknown Object (File)
Wed, Sep 17, 10:21 PM
Unknown Object (File)
Tue, Sep 16, 9:57 AM
Unknown Object (File)
Tue, Sep 16, 2:00 AM

Details

Summary

In some cases VM may have different idea about number
of available MSI-X vectors then PPT driver. Return
an error when VM requests setup for more vectors
than expected.

It was observed while using SR-IOV on an Intel E810 Ethernet adapter.
VF driver in a VM sees a correct number of available MSI-X vectors,
which depends on num-queues assigned in iovctl.conf, while
pci_msix_count in the PPT driver always returns 1.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I'm not sure why pci_msix_count returns incorrect number. pciconf output looks fine:

ppt0@pci0:216:0:8:      class=0x020000 rev=0x02 hdr=0x00 vendor=0x8086 device=0x1889 subvendor=0x8086 subdevice=0x0000
    cap 11[70] = MSI-X supports 5 messages
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
jhb added a subscriber: jhb.

I believe pci_msix_count() is returning a cached value from when the VF was first created in pci_read_cap() called from pci_fill_devinfo() called from pci_add_iov_child() called from PCI_CREATE_IOV_CHILD. If the MSI-X count is then adjusted by the driver in the PCI_IOV_ADD_VF callback, the cached value will be wrong.

I think the fix for that is probably we should not be caching the msi_ctrl and msix_ctrl registers except across suspend/resume and should otherwise be reading the values from the registers each time pci_count_* is called.

The bounds check is a good fix regardless.

This revision is now accepted and ready to land.Feb 4 2025, 3:48 PM
In D48812#1113788, @jhb wrote:

I believe pci_msix_count() is returning a cached value from when the VF was first created in pci_read_cap() called from pci_fill_devinfo() called from pci_add_iov_child() called from PCI_CREATE_IOV_CHILD. If the MSI-X count is then adjusted by the driver in the PCI_IOV_ADD_VF callback, the cached value will be wrong.

Yup, that's exactly the case. MSI-X count for VFs of E800 adapters is set in PCI_IOV_ADD_VF callback according to number of assigned queues.

I think the fix for that is probably we should not be caching the msi_ctrl and msix_ctrl registers except across suspend/resume and should otherwise be reading the values from the registers each time pci_count_* is called.

I tried something simpler and added:

pci_read_cap(device_get_parent(bus), &vfinfo->cfg);

call in pci_iov.c after PCI_IOV_ADD_VF, and it seems to work. Do you think it's worth to use that as a temporary solution? I can submit a patch.

In D48812#1113894, @krzysztof.galazka_intel.com wrote:
In D48812#1113788, @jhb wrote:

I believe pci_msix_count() is returning a cached value from when the VF was first created in pci_read_cap() called from pci_fill_devinfo() called from pci_add_iov_child() called from PCI_CREATE_IOV_CHILD. If the MSI-X count is then adjusted by the driver in the PCI_IOV_ADD_VF callback, the cached value will be wrong.

Yup, that's exactly the case. MSI-X count for VFs of E800 adapters is set in PCI_IOV_ADD_VF callback according to number of assigned queues.

I think the fix for that is probably we should not be caching the msi_ctrl and msix_ctrl registers except across suspend/resume and should otherwise be reading the values from the registers each time pci_count_* is called.

I tried something simpler and added:

pci_read_cap(device_get_parent(bus), &vfinfo->cfg);

call in pci_iov.c after PCI_IOV_ADD_VF, and it seems to work. Do you think it's worth to use that as a temporary solution? I can submit a patch.

I'm worried that this might have other side effects for things like EA caps. However, I think it's also true that caching the MSI counts is probably wrong in general. It's not just VFs, in theory writing to some other register in a PF can change the value of the MSI control registers and we should just always read them.

In D48812#1114102, @jhb wrote:

I'm worried that this might have other side effects for things like EA caps. However, I think it's also true that caching the MSI counts is probably wrong in general. It's not just VFs, in theory writing to some other register in a PF can change the value of the MSI control registers and we should just always read them.

I put a draft on GIthub: https://github.com/freebsd/freebsd-src/pull/1593
I left msix_msgnum in the struct to let pci_alloc_msix_method use it. I'm not sure if it is safe to assume that the count won't decrease between pci_msix_count and pci_alloc_msix calls, but current implementation also does not care about such case.

kgalazka added subscribers: kbowling, erj.

@erj, @kbowling Could you please take a look?

Do we still need this after fixing the PCI bus driver to stop caching the message count?

In D48812#1148812, @jhb wrote:

Do we still need this after fixing the PCI bus driver to stop caching the message count?

You did say:

The bounds check is a good fix regardless.

Maybe this could break in some interesting new way in the future?

Oh yes, and I even accepted this when I said it. :)