Page MenuHomeFreeBSD

Support hotplug of PCI devices on EC2.
ClosedPublic

Authored by jhb on Oct 8 2019, 9:38 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 24, 4:56 PM
Unknown Object (File)
Fri, Jan 24, 2:37 PM
Unknown Object (File)
Sat, Jan 18, 10:01 PM
Unknown Object (File)
Fri, Jan 17, 3:36 PM
Unknown Object (File)
Wed, Jan 15, 5:23 AM
Unknown Object (File)
Tue, Jan 14, 1:22 PM
Unknown Object (File)
Sat, Dec 28, 4:56 AM
Unknown Object (File)
Sat, Dec 28, 1:32 AM

Details

Summary

Install notify handlers for child handles of ACPI PCI buses.
Only handles with an _EJ0 method are considered hotpluggable.

Test Plan
  • waiting for Colin to test this. I sent an earlier patch but it was incomplete as it was missing the changes in sys/dev/pci

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 26963
Build 25263: arc lint + arc unit

Event Timeline

Re-add the bus notify handler to see if that helps Scott's use case.

Running with

sysctl debug.acpi.level="ACPI_LV_INFO"
sysctl debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS"

when I attach a new disk I see in /var/log/messages:

Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446
Oct 8 22:37:20 freebsd kernel: ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler:
Oct 8 22:37:20 freebsd kernel: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler:
Oct 8 22:37:20 freebsd kernel: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler:
Oct 8 22:37:20 freebsd kernel: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446
Oct 8 22:37:20 freebsd kernel: ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler:
Oct 8 22:37:20 freebsd kernel: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446
Oct 8 22:37:20 freebsd kernel: ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446
Oct 8 22:37:20 freebsd kernel: ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler:
Oct 8 22:37:20 freebsd kernel: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446
Oct 8 22:37:20 freebsd kernel: ExSystemIoSpaceHandler:
Oct 8 22:37:20 freebsd kernel: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler:
Oct 8 22:37:20 freebsd kernel: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: evmisc-0267 EvQueueNotifyRequest : No notify handler for Notify, ignoring (S1E_, 1) node 0xfffff80003f14ec0
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE00
Oct 8 22:37:20 freebsd kernel: exregion-0446 ExSystemIoSpaceHandler: System-IO (width 32) R/W 0 Address=000000000000AE04

The important part here I assume is "EvQueueNotifyRequest : No notify handler for Notify, ignoring (S1E_, 1)" -- the notify handler doesn't seem to be installed. I have the instance running; how should I debug this?

sys/dev/acpica/acpi_pci.c
378

Maybe add a printf here, something like:

printf("%s: installing notify handler on %s\n", __func__,
    __func__, acpi_name(handle));

If it never prints, then maybe trace up above where we fail if _EJ0 isn't found to see if that check is always failing.

With an added printf I see:

Oct 8 23:14:20 freebsd kernel: acpi_pci_install_device_notify_handler: installing handler on \_SB_.PCI0.S1E_
...
Oct 8 23:18:59 freebsd kernel: evmisc-0267 EvQueueNotifyRequest : No notify handler for Notify, ignoring (S1E_, 1) node 0xfffff80003d15ec0

So... it looks like we're at least trying to install a handler here...

Use ACPI_SYSTEM_NOTIFY instead of ACPI_DEVICE_NOTIFY.

Thanks, with that fix the Notify handler is getting called:

evmisc-0290 EvQueueNotifyRequest  : Dispatching Notify on [S1E_] (Device) Value 0x01 (Device Check) Node 0xfffff80003e42ec0

panic: mutex Giant not owned at /usr/src/sys/kern/subr_bus.c:2919
cpuid = 3
time = 1570578746
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001afe80b0
vpanic() at vpanic+0x19d/frame 0xfffffe001afe8100
panic() at panic+0x43/frame 0xfffffe001afe8160
mtx_assert() at mtx_assert+0xb4/frame 0xfffffe001afe8170
device_probe_and_attach() at device_probe_and_attach+0x2a/frame 0xfffffe001afe81a0
pci_rescan_method() at pci_rescan_method+0x4f5/frame 0xfffffe001afe8220
AcpiEvNotifyDispatch() at AcpiEvNotifyDispatch+0x50/frame 0xfffffe001afe8240
acpi_task_execute() at acpi_task_execute+0x10/frame 0xfffffe001afe8260
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe001afe82c0
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe001afe82f0
fork_exit() at fork_exit+0x84/frame 0xfffffe001afe8330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001afe8330

  • trap 0, rip = 0, rsp = 0, rbp = 0 ---

... maybe not exactly the right way, though.

Lock Giant around BUS_RESCAN.

Looking giant around bus scan is, sadly, still required. Yea asserts I added :)

The newbus side of this looks good

This revision is now accepted and ready to land.Oct 9 2019, 12:36 AM

I can now attach a disk:

Oct 9 00:51:33 freebsd kernel: evmisc-0290 EvQueueNotifyRequest : Dispatching Notify on [S1E_] (Device) Value 0x01 (Device Check) Node 0xfffff80003d04ec0
Oct 9 00:51:33 freebsd kernel: nvme2: <Generic NVMe Device> at device 30.0 on pci0
Oct 9 00:51:34 freebsd kernel: nvme2: temperature threshold not supported
Oct 9 00:51:34 freebsd kernel: nvd2: <Amazon Elastic Block Store> NVMe namespace
Oct 9 00:51:34 freebsd kernel: nvd2: 102400MB (209715200 512 byte sectors)

But when I try to detach it:
Oct 9 00:52:46 freebsd kernel: EvQueueNotifyRequest : Dispatching Notify on [S1E_] (Device) Value 0x01 (Device Check) Node 0xfffff80003d04ec0
Oct 9 00:52:46 freebsd kernel: EvQueueNotifyRequest : Dispatching Notify on [S1E_] (Device) Value 0x03 (Eject Request) Node 0xfffff80003d04ec0
Oct 9 00:52:46 freebsd kernel: pci0: unknown notify 0x3

Which suggests that (a) we need to handle 0x03 (as expected), and also that maybe we need to do something to acknowledge the notification, since the disk *attachment* notification seems to be arriving a second time here.

The patch works as advertised right now. A couple of observations for my case:

  • Alpine Ridge TBT3 adapter, USB-A dongle with USB memory stick attached
  • acpi_pci_bus_notify_handler() gets the DEVICE_BUS_CHECK notify events from the hotplug SMI for both insertion and removal, as expected.
  • acpi_pci_device_notify_handler() gets no notify events
  • No "EJECT_REQUEST" events are delivered

The relevant ASL (I think) is here:

Method (HPME, 0, Serialized)
{
    If (((VDID != 0xFFFFFFFF) && (PMSX == 0x01)))
    {
        Notify (PXSX, 0x02) // Device Wake
        PMSX = 0x01
        PSPX = 0x01
    }
}

Add an in-kernel eject handler. We might replace this with a devctl_notify
that triggers events in userland instead, but this might work sufficiently
for now.

This revision now requires review to proceed.Oct 9 2019, 7:50 PM

Patch does not compile.

sys/dev/acpica/acpi_pci.c
367

This should be ACPI_NOTIFY_EJECT_REQUEST.

368

There is no "handle" in scope. Should it be "h" instead?

With the above changes (s/ACPI_NOTIFY_DEVICE_EJECT_REQUEST/ACPI_NOTIFY_EJECT_REQUEST/ and s/handle/h/) I can attach a disk and detach it, but I get

Oct 9 20:44:48 freebsd kernel: ACPI Warning: \_SB.PCI0.S1E._EJ0: Insufficient arguments - Caller passed 0, ACPI requires 1 (20190703/nsarguments-414)

and after detaching a disk I can't reattach it (even devctl rescan pci0 doesn't work) -- not sure if this is because the failed call to _EJ0 broke something, or if the way we're detaching the device is preventing it from appearing in rescan after it is reattached.

  • Fix compile issues
  • Pass the value 1 to _EJ0 after checking the ACPI 6.2 spec. Also, the spec says that for devices with _ADR (PCI), the bus driver should verify the device is removed after _EJ0 returns, so just trigger a BUS_RESCAN to delete the child device after _EJ0.

This is now working perfectly in my tests on EC2. When I attach a disk it shows up immediately; when I detach a disk it goes away from FreeBSD and EC2 recognizes that it has successfully detached (and without needing a "force detach"); and I can cycle the same disk attached/detached many times without issues.

As far as I'm concerned, this is ready for commit. My only regret is that it might be too late for FreeBSD 12.1.

This revision is now accepted and ready to land.Oct 9 2019, 10:14 PM

I wonder... if we add a "hw.pci.acpi_hotplug" boot tunable and check it in acpi_pci_{attach,detach} before doing anything, could we get this merged for 12.1? I would turn that tunable on in EC2 AMIs (since I've tested it there) but we could leave it turned off by default elsewhere.

For committing purposes I will probably split this up into a few commits:

  1. expose pci_attach/pci_detach
  2. add the "bus" notifications that work for Scott's use case
  3. add the "device" notifications using _EJ0 for EC2
This revision was automatically updated to reflect the committed changes.
jhb marked 2 inline comments as done.Oct 15 2019, 7:00 PM