Page MenuHomeFreeBSD

acpi_pci: Add quirk for PSTAT_PME-before-detach
ClosedPublic

Authored by cperciva on Feb 27 2025, 12:34 AM.
Tags
None
Referenced Files
F133241990: D49146.diff
Fri, Oct 24, 6:28 AM
F133214824: D49146.id151541.diff
Fri, Oct 24, 1:15 AM
Unknown Object (File)
Sat, Oct 18, 9:02 PM
Unknown Object (File)
Fri, Oct 17, 6:03 PM
Unknown Object (File)
Mon, Oct 13, 11:48 AM
Unknown Object (File)
Mon, Oct 13, 9:23 AM
Unknown Object (File)
Mon, Oct 13, 9:23 AM
Unknown Object (File)
Tue, Oct 7, 2:29 AM
Subscribers

Details

Summary

In order to signal to on Graviton [123] systems that a device is ready
to be "ejected" (after a detach request is made via the EC2 API) we
need to set PCIM_PSTAT_PME to 1 and PCIM_PSTAT_PMEENABLE to 0. We are
not aware of any rationale for this requirement beyond "another OS
kernel happens to do this", i.e. this is effectively bug-for-bug
compatibility.

Arguably this should be done by the ACPI _EJ0 method on these systems,
but it is not.

Create a new ACPI_Q_PME_ON_DETACH quirk and set it in EC2 AMIs, and
add the PCI register write to acpi_pci_device_notify_handler when that
quirk is set.

MFC after: 1 month
Sponsored by: Amazon

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 62646
Build 59530: arc lint + arc unit

Event Timeline

I'm not sure that the combination of asserting PME and clearing PMEENABLE makes sense, but it's what works. Hoping for feedback from someone who knows more about our PCI code than I do.

Also I'm not 100% certain that this belongs here as opposed to, say, in device detach code; but I can't make an argument for it to *not* belong here either. Again, looking for expert feedback.

And yes, "power off the device immediately before it is unplugged" is... a choice. Seems like it's probably harmless though? I'd like to MFC this to stable/14, and I'd be perfectly fine with hiding it behind a loader tunable if there's any concern about it potentially causing problems on non-Amazonian hardware.

The commit log isn't very clear. Traditionally power management in PCI is mostly about "device states" (D0, D1, D2, D3) and the PCIY_PMG registers can be used to place devices into power states. D3 is almost completely off (you can write to a D3 device to bring it back to D0, but that's all you can do to a device in D3), and D0 is fully on. D1/D2 are some in-between states where you might save less power, but be able to power back on sooner. Also, some things might work in D1/D2 (like link detection in a NIC port) that don't work in D3 so that you can still get an interrupt and wake the device back up to D0.

PME# is a separate thing. PME# is a signal a PCI device can assert when the entire system has been placed into suspend to trigger a wake from suspend. If a device does trigger a PME#, it's supposed to set the PCM_PSTAT_PME bit so that the OS can figure out which device triggered the wake from sleep.

That said, I think there is a larger bug here. In the copy I have access to of the PCI power management spec, there is this section on the PME bits:

Because a PCI bus RST# assertion does not necessarily clear all functions’ PME
Context (functions that support PME# from D3cold), the system software is required to
explicitly initialize all PME Context, including the PME support bits, for all functions
during initial operating system load. In terms of the PMCSR, this means that during the
initial operating system load each function’s PME_En bit must be written with a “0”,
and each function’s PME_Status bit must be written with a “1” by system software as
part of the process of initializing the system.

For now I think the first step is to do these writes much earlier, and perhaps we need to tweak the language a bit to say that we are writing 1 to PCIM_PSTATE_PME to clear any pending PME# and clearing PCIM_PSTATE_PMEENABLE to disable generation of PME# interrupts. We might need to do some testing that this doesn't break suspend/resume on some existing hardware though.

cperciva retitled this revision from acpi_pci: Poweroff PCI devices before detach to acpi_pci: Add quirk for PSTAT_PME-before-detach.Mar 5 2025, 5:09 AM
cperciva edited the summary of this revision. (Show Details)
cperciva edited the summary of this revision. (Show Details)

This generally looks good to me. I have a suggestion, but it is approved regardless.

sys/dev/acpica/acpivar.h
233 ↗(On Diff #151887)

Maybe call this ACPI_Q_CLEAR_PME_ON_DETACH?

This revision is now accepted and ready to land.Mar 5 2025, 5:32 PM
This revision now requires review to proceed.Mar 5 2025, 5:52 PM
sys/dev/acpica/acpivar.h
233 ↗(On Diff #151887)

Done.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 5 2025, 8:28 PM
This revision was automatically updated to reflect the committed changes.