Page MenuHomeFreeBSD

Implement a PCI bus rescan method.

Authored by jhb on Apr 20 2016, 2:21 AM.



Implement a PCI bus rescan method.

Rescanning a PCI bus uses the following steps:

  • Fetching the current set of child devices and save it in the 'devlist' array.
  • Allocate a parallel array 'unchanged' initalized with NULL pointers.
  • Scan the bus checking each slot (and each function on slots with a multifunction device).
  • If a valid function is found, look for a matching device in the 'devlist' array. If a device is found, save the pointer in the 'unchanged' array. If a device is not found, add a new device.
  • After the scan has finished, walk the 'devlist' array deleting any devices that do not have a matching pointer in the 'unchanged' array.
  • Finally, fetch an updated set of child devices and explicitly attach any devices that are not present in the 'unchanged' array.

This builds on the previous changes to move subclass data management into
pci_alloc_devinfo(), pci_child_added(), and bus_child_deleted().

Subclasses of the PCI bus use custom rescan logic explicitly override the
rescan method to disable rescans.

Test Plan
  • Use devctl rescan after insertion and removal of a card in an expresscard slot.

Diff Detail

rS FreeBSD src repository - subversion
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jhb retitled this revision from to Implement a PCI bus rescan method..
jhb updated this object.
jhb edited the test plan for this revision. (Show Details)
jhb added reviewers: imp, adrian.
This revision was automatically updated to reflect the committed changes. added inline comments.

Typo... It should be 0xffff.



When device_delete_child() is invoked, the device has done.

Is this still safe -- device_delete_child() -> device_detach()? The detach method of the PCI device driver can't access the device's registers at all.


My typo -- I meant "When device_delete_child() is invoked, the device has *gone*".


Shouldn't we have a #define for PCIR_VENDOR_INVALID instead? It's used in a number of places now and magic numbers are bad.


There's a problem with misbehaving hardware that's not contemplated here. There's several CardBus cards (and a few PCI counterparts) that behave badly if they report that their function 0 is invalid, they don't expect you to access the other functions and return garbage instead of ffff's for the VENDOR. In the CardBus code I have (or maybe had) code that says if function 0 has a bad VENDOR tag (or a bad DEVICE ID too), then assume that something isn't quite right in the HDRTYPE register and reject the device. I have a vague notion that there was a tweak needed for 'happy meal' devices on Sparc64 because they did something odd as well, but that may have been a separate issue.


We ran into this problem in CardBus years ago. You have to call the detach routine or you have an orphaned device. It's the responsibility of the detach routine to know if the hardware is present and do the right thing if it has gone. There's also a race with the interrupt handler because some card departure events could lead to an interrupt (or a shared interrupt could fire before it could be detached) and interrupt handlers are required to cope with that as well. And there's various other races with other parts of the system that need to be coped with too: storage devices have outstanding bios at the low layer and associated bufs at the high level; network devices have outstanding mbufs in flight; usb may have transactions in flight, etc. All of these conditions must be handled even when the underlying hardware has suddenly disappeared. Well written drivers can cope with their hardware going away and will cancel what they can. The only minor snag that we have to worry about here is that the detach routines block until all this stuff is drained from the upper half of the kernel.


I've fixed the typo directly for now, but yes, we should add a constant.


That should already work correctly. If function 0 reports 0xffff we continue early on (line 3954) above. This means we won't bother examining any additional functions and will leave unchanged[i] NULL for all of the functions causing the loop at 3986 to remove them all.


Now I understand the points here.
Thank you very much for the detailed explanation, imp!