Page MenuHomeFreeBSD

Implement a PCI bus rescan method.
ClosedPublic

Authored by jhb on Apr 20 2016, 2:21 AM.

Details

Summary

Implement a PCI bus rescan method.

Rescanning a PCI bus uses the following steps:

  • Fetching the current set of child devices and save it in the 'devlist' array.
  • Allocate a parallel array 'unchanged' initalized with NULL pointers.
  • Scan the bus checking each slot (and each function on slots with a multifunction device).
  • If a valid function is found, look for a matching device in the 'devlist' array. If a device is found, save the pointer in the 'unchanged' array. If a device is not found, add a new device.
  • After the scan has finished, walk the 'devlist' array deleting any devices that do not have a matching pointer in the 'unchanged' array.
  • Finally, fetch an updated set of child devices and explicitly attach any devices that are not present in the 'unchanged' array.

This builds on the previous changes to move subclass data management into
pci_alloc_devinfo(), pci_child_added(), and bus_child_deleted().

Subclasses of the PCI bus use custom rescan logic explicitly override the
rescan method to disable rescans.

Test Plan
  • Use devctl rescan after insertion and removal of a card in an expresscard slot.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jhb updated this revision to Diff 15360.Apr 20 2016, 2:21 AM
jhb retitled this revision from to Implement a PCI bus rescan method..
jhb updated this object.
jhb edited the test plan for this revision. (Show Details)
jhb added reviewers: imp, adrian.
This revision was automatically updated to reflect the committed changes.
decui_microsoft.com added inline comments.
head/sys/dev/pci/pci.c
3962

Typo... It should be 0xffff.

head/sys/dev/pci/pci.c
3989

Question:

When device_delete_child() is invoked, the device has done.

Is this still safe -- device_delete_child() -> device_detach()? The detach method of the PCI device driver can't access the device's registers at all.

head/sys/dev/pci/pci.c
3989

My typo -- I meant "When device_delete_child() is invoked, the device has *gone*".

imp added inline comments.Sep 21 2016, 1:49 PM
head/sys/dev/pci/pci.c
3962

Shouldn't we have a #define for PCIR_VENDOR_INVALID instead? It's used in a number of places now and magic numbers are bad.

3963

There's a problem with misbehaving hardware that's not contemplated here. There's several CardBus cards (and a few PCI counterparts) that behave badly if they report that their function 0 is invalid, they don't expect you to access the other functions and return garbage instead of ffff's for the VENDOR. In the CardBus code I have (or maybe had) code that says if function 0 has a bad VENDOR tag (or a bad DEVICE ID too), then assume that something isn't quite right in the HDRTYPE register and reject the device. I have a vague notion that there was a tweak needed for 'happy meal' devices on Sparc64 because they did something odd as well, but that may have been a separate issue.

3989

We ran into this problem in CardBus years ago. You have to call the detach routine or you have an orphaned device. It's the responsibility of the detach routine to know if the hardware is present and do the right thing if it has gone. There's also a race with the interrupt handler because some card departure events could lead to an interrupt (or a shared interrupt could fire before it could be detached) and interrupt handlers are required to cope with that as well. And there's various other races with other parts of the system that need to be coped with too: storage devices have outstanding bios at the low layer and associated bufs at the high level; network devices have outstanding mbufs in flight; usb may have transactions in flight, etc. All of these conditions must be handled even when the underlying hardware has suddenly disappeared. Well written drivers can cope with their hardware going away and will cancel what they can. The only minor snag that we have to worry about here is that the detach routines block until all this stuff is drained from the upper half of the kernel.

jhb added inline comments.Sep 21 2016, 4:57 PM
head/sys/dev/pci/pci.c
3962

I've fixed the typo directly for now, but yes, we should add a constant.

3963

That should already work correctly. If function 0 reports 0xffff we continue early on (line 3954) above. This means we won't bother examining any additional functions and will leave unchanged[i] NULL for all of the functions causing the loop at 3986 to remove them all.

head/sys/dev/pci/pci.c
3989

Now I understand the points here.
Thank you very much for the detailed explanation, imp!