Page MenuHomeFreeBSD

acpi_spmc: Add SPMC (system power management controller) driver
Needs RevisionPublic

Authored by obiwac on Jan 9 2025, 12:15 AM.
Tags
None
Referenced Files
F141942789: D48387.id168737.diff
Tue, Jan 13, 8:07 PM
F141912369: D48387.diff
Mon, Jan 12, 10:28 AM
F141911017: D48387.diff
Mon, Jan 12, 9:58 AM
Unknown Object (File)
Sun, Jan 11, 1:22 PM
Unknown Object (File)
Sat, Jan 10, 4:16 PM
Unknown Object (File)
Sat, Jan 10, 3:32 PM
Unknown Object (File)
Sat, Jan 10, 5:33 AM
Unknown Object (File)
Sat, Jan 10, 5:11 AM

Details

Reviewers
jkim
jhb
imp
olce
Summary

Add SPMC (system power management controller) driver as acpi_spmc. This is the device which provides the LPI device D-state constraints and allows for OSPM to send S0ix/modern standby entry/exit notifications. This supports the original Intel DSM (https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf, untested), the AMD DSM (tested), and the Microsoft DSM (partially tested, getting constraints seems to only work for the AMD DSM on my machine).

For entry/exit, all the notifications supported by the platform are called. I'm not sure of which ones are necessary and on which systems exactly, but my system needs at least one of the AMD or Microsoft regular entry notifications and the Microsoft modern standby one. I don't think it's necessarily an issue to do this, and Linux adopts the same strategy.

Before entry, acpi_spmc_check_constraints is called to notify of any violated power constraints. This will use acpi_pwr_get_state added in D48386 to get current device D-states, but the code for doing that is currently disabled because the D-state code changes had to be backed out.

This is a prerequisite to s2idle on AMD, as the EC (which we can't mask out the GPE's of because it notifies us of important wake events, such as the lid opening or the power button being pressed) is very noisy until the entry notifications are called: https://patchwork.kernel.org/project/linux-pm/patch/2279758.LZ0rCGQtcH@aspire.rjw.lan/ It still is very noisy (albeit less so) but hopefully that's something a driver for the AMD PMC will solve.

This is sort of a parallel to @bwidawsk 's D17676 revision, which went with device_post_suspend/resume device methods to call SPMC's entry/exit functions. Since we'll probably also need an AMD-specific PMC that runs after device_post_suspend in the near future, I decided against doing it this way and instead simply added acpi_spmc_enter/exit function pointers on the ACPI softc which acpi_spmc sets in probe to be called in a future patch before idling in acpi_EnterSleepState.

Another solution (what I believe Linux does after a cursory check) is to make acpi_spmc responsible for calling the entry/exit functions for any other PMC's there may be. My concern for doing this is if a system has an AMD PMC but no SPMC, if such a system exists.

Sponsored by: The FreeBSD Foundation

Test Plan

I have tested this on the Framework 13 AMD Ryzen 7040 series. I am able to get device constraints correctly and enter/exit from modern standby when idling the CPU and the device constraints are respected. Other devices (especially Intel-based) will need testing.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 61590
Build 58474: arc lint + arc unit

Event Timeline

obiwac created this revision.

On my laptop with Intel i7-1355U CPU I see this while device attached:

Jan 29 15:19:07 thinkpad kernel: acpi_spmc0: <Low Power S0 Idle (DSM sets 0x3)> on acpi0
Jan 29 15:19:07 thinkpad kernel: ACPI Error: Incorrect return type from \_SB_.PEPD._DSM - received [Buffer], requested [Package] (20241212/nsxfeval-274)
Jan 29 15:19:07 thinkpad kernel: acpi_spmc0: failed to call DSM 1 (acpi_spmc_get_constraints)

On my laptop with Intel i7-1355U CPU I see this while device attached:

Jan 29 15:19:07 thinkpad kernel: acpi_spmc0: <Low Power S0 Idle (DSM sets 0x3)> on acpi0
Jan 29 15:19:07 thinkpad kernel: ACPI Error: Incorrect return type from \_SB_.PEPD._DSM - received [Buffer], requested [Package] (20241212/nsxfeval-274)
Jan 29 15:19:07 thinkpad kernel: acpi_spmc0: failed to call DSM 1 (acpi_spmc_get_constraints)

Thanks for testing this out! Indeed, I believe this is due to me using the Microsoft UUID instead of the Intel one for getting device constraints on Intel. I will let you know when I fix this.

  • Use Intel GET_DEVICE_CONSTRAINTS DSM by default (Microsoft doesn't have this)
  • Comment for entry/exit notification functions

@shuriku_shurik.kiev.ua Intel (might) work now. I have not validated the acpi_spmc_get_constraints_spec function at all however, as I don't have an Intel laptop. So there are likely to be other problems :)

Print out message when all device power constraints are respected

In D48387#1111264, @obiwac_gmail.com wrote:

@shuriku_shurik.kiev.ua Intel (might) work now. I have not validated the acpi_spmc_get_constraints_spec function at all however, as I don't have an Intel laptop. So there are likely to be other problems :)

There are another errors now:

Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: <Low Power S0 Idle (DSM sets 0x3)> on acpi0
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.EMMC
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.PSDC
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.SAT0.VOL0
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.GLAN
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.VMD0
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.RP25
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.RP26
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.RP27
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.RP28
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.PUF0
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.PUF1
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC00.TXDC
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC01.TRP0
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC01.TRP1
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC01.TRP2
Jan 30 06:19:20 thinkpad kernel: acpi_spmc0: failed to get handle for \_SB.PC01.TRP3

I found some time and tried it on Intel Core 8th gen laptop.

acpi_spmc0: <Low Power S0 Idle (DSM sets 0x1)> on acpi0
acpi_spmc0: failed to get handle for \134_SB.PCI0.PSDC

Thanks for trying this out @jkim & @shuriku_shurik.kiev.ua :)

Could you send me your ASL dumps by email (obiwac@freebsd.org):

acpidump -dt

Fix parens in KASSERTs, add comment for entry/exit notif functions.

Haven't reviewed the constraints code yet, and some of the device plumbing.

sys/dev/acpica/acpi_spmc.c
117–118

Not sure about the usefulness of having a separate function for the revision. Actually, the revision may command the actual functions available and what they do (this is what some verbatim in section 9.1.1 of the ACPI spec says).

I would prefer that, instead of passing a struct uuid around, you define, e.g., a new struct dsm_itf containing both a UUID and a revision ID, and then there's no need for rev_for_uuid(). Additionally, we could then enrich that structure to provide methods for the common operations (enter, exit), this could make the code clearer and more flexible, allowing to avoid specifying the right constants in encapsulated ways. With more refactoring (e.g., having an array of struct dsm_itf), in the acpi_spmc*_notif() functions we would get rid of the ifs and impose a common ordering functions (see also inline comments there).

Regarding which revisions to use:

  • The Intel documentation mentions only revision 0.
  • But the Linux code for Intel has been using revision 1 from the start, without any explanation. The commit introducing it however links to the document mentioning 0 (!).
  • Then AMD _DSM code was introduced.
  • And finally, MS _DSM, using the same revision as used for the CPU-vendor _DSM except for the initial enumeration (function index 0). In other words, it uses 0 on an AMD CPU and 1 on an Intel CPU for the same MS _DSM (except, as said, for enumeration, where it's 0 in both cases). I'm undecided on whether that's a feature or a bug in Linux, and on the first case a bug in the Intel platform... :-)

So I would just try on Intel hardware (if you have not done so already):

  • The current code, checking that the MS methods actually succeed (i.e., do not produce a "failed to call DSM" error). If they don't, then it's an Intel platform bug, and we'll have to implement the Linux's weird behavior described above. If they do, then the weird behavior is just a bug. Not sure if already reported tests in this revision are conclusive, as they list onl a few error lines with other errors.
  • Try revision 0 for Intel _DSM. If that works, we should probably just drop revision 1 entirely, even if that means not doing exactly the same as Linux. The refactoring evoked in the first paragraph would, among others, allow strategies where we try both revisions in turn.

Perhaps amend the comment below with what I said about the revision used with MS' _DSM in Linux.

I should be able to help test with my Intel-based Framework laptop.

174–175

Style.

497–502

Linux has the ordering: AMD, then MS, then Intel. Not a request to change the one here, but something to keep in mind in case of testing problems.

528–529

MS' "modern entry" has to be first according to MS' doc (https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-firmware-notifications), and this is also what Linux does.

538–547

This is the same ordering as in acpi_spmc_entry_notif(). Maybe this doesn't matter (or perhaps would even make things worse) but I'd find it more logical that the order is reversed. Linux's logic has the same property (flaw?).

obiwac added inline comments.
sys/dev/acpica/acpi_spmc.c
117–118

This sounds like a good idea and a cleaner solution indeed. is the itf in dsm_itf as in interface?

for the revisions - I will change this to just use 0 on all platforms at the moment, and, when I get around to testing on intel, I can fix this if this needs fixing.

528–529

yeah you're right, I have all the correct ordering in a subsequent local commit which I hadn't squashed and updated this revision with yet.

538–547

I think this is to be expected

sys/dev/acpica/acpi_spmc.c
117–118

This sounds like a good idea and a cleaner solution indeed.

Just to be clear, you don't *absolutely* need to do it before commit, especially if you have already done a lot of tests, but if you are going to do more of them, then yes, would be nice to do that.

We should also probably add some knobs to control part of the process to ease testing and feedback if people have problems, but that can be done later while the code is in the tree (e.g., rule out some DSM, tweak the ordering, change the revisions, etc.).

is the itf in dsm_itf as in interface?

Yes. That was just an example name, I'm sure we can find a better one.

for the revisions - I will change this to just use 0 on all platforms at the moment, and, when I get around to testing on intel, I can fix this if this needs fixing.

That's the cleanest approach I think.

538–547

I'm not so sure, but if that works, then it's likely the ordering just doesn't matter (at least for now on tested machines). A debug knob tweaking that might be useful (for later).

obiwac marked 3 inline comments as done.

Remove rev_for_uuid for now and just use revision 0 everywhere.

olce requested changes to this revision.Mon, Jan 12, 8:57 PM

I'm pretty new to device management in FreeBSD, but it looks to me that allocating some memory + acpi_set_private() in DEVICE_PROBE() (here, acpi_spmc_probe()) and acpi_get_private() + deallocation in DEVICE_ATTACH() is a bad pattern, as it will cause memory leaks and in fact even panics in most cases if there are multiple candidate drivers. Since there is for now only a single driver, could you please instead fill up the acpi_spmc_softc directly and have acpi_spmc_probe() return 0 on success (which guarantees preservation of the softc up to DEVICE_ATTACH())? Let's also avoid possible bad pattern spreading by copy-pasting (incidentally, I see the same pattern in acpi_ec.c).

I'll finish the review tomorrow morning (didn't have time to check some stuff).

sys/dev/acpica/acpi_spmc.c
158–159

private can't be NULL on allocation with M_WAITOK.

257–263

Please check here that the revision is indeed zero, and just ignore the constraint if it is not (format unknown). You'll have to update sc->constraint_count to that end (tweaking the malloc() is most probably not worth the trouble).

359–368

Not too sure about the ordering here. If both AMD and MS are supported, wouldn't it be better to use MS as I assume it respects the spec? Did you do specific tests around these?

Err, in fact, looking at Microsoft's document, constraints are simply not supported. I'd just drop the DSM_SET_MS case here, unless you have experimental evidence that these are supported?

As a consequence, and contrarily to what I said in the first paragraph above, if you prefer to still keep the MS case, it may be better to use it only as a last resort.

374

(result.Pointer == NULL here is probably redundant, but that's fine.)

375–376

I'd print AMD/Intel/MS so that we immediately know which exact DSM failed.

428

Since you added #ifdef notyet, which kind of problems did you stumble on? E.g., are there cases where the message is printed but suspend works? Is this linked to not matching lpi_uid?

Ideally, we should do this match, at least to be able to print which deeper LPIT state we will not be able to enter, since that does not preclude from entering shallower ones. If some constraint is not fulfilled even for the shallower state, we might want to simple abort. These are just ideas, nothing to fix here at this point.

460
  • Style (please wrap comment line).
  • Possibly relax the check (failure controlled only by status), and just call AcpiOsFree() on result.Pointer if non-NULL. That would be slightly different than Linux's way but a priori seems to make more sense.
528–529

Exit notifications should be in reverse order compared to entry ones as hinted in Microsoft's document, in particular the diagram and the fact that DSM_EXIT_NOTIF is supposed to indicate exit from lowest power state (possibly transient indication) whereas DSM_MODERN_EXIT_NOTIF is a global exit from "suspend".

This revision now requires changes to proceed.Mon, Jan 12, 8:57 PM
obiwac added inline comments.
sys/dev/acpica/acpi_spmc.c
359–368

This was a while ago for me now but as I recall the AMD get device constraints DSM was necessary and the MS one didn't have this. I have no idea what the situation is like on Intel though so in the meantime I defaulted to the MS one.

428

The #ifdef notyet is because the necessary commits for acpi_pwr_get_state had to be reverted, as they were the D-state changes that broke older machines with working S3. I will add this function back at some point, but not a super high priority as this is just an extra debugging check which doesn't seem to affect S0i3 entry (at least on AMD machines).

We should also avoid the ugly pollution of struct acpi_softc (I completely understand this was done to get something working more quickly). Rather, let's define a new interface, called something like acpi_lpi (defined in a new acpi_lpi_if.m file), with two methods, acpi_lpi_enter() and acpi_lpi_exit(), which would be called before/after entering suspend-to-idle in acpi_EnterSleepState() (e.g., on all direct children of acpi; we'll see that in D48735). There are examples of interfaces under sys/dev/acpica, acpi_if.m and acpi_bus_if.m. The .m files are processed by sys/tools/makeobjops.awk. There's a bit more doc on kobj(9) (which device(9) relies on) in the Architecture Handbook.

That doesn't seem much complicated, but could be done as well as a separate revision.

sys/dev/acpica/acpi_spmc.c
359–368

That makes sense. But since their doc doesn't mention any function to get constraints, I still think we should put it last at least (and remove/disable it later entirely depending on testing feedback; it would be great to print if the MS DSM is used).

428

Yes, that doesn't affect functionality (on systems where S0i3 indeed works), so there's no urgency.

That said, this is a needed piece for debugging when S0i3 is not entered or if it is but in some degraded fashion that still consumes an unexpected amount of power.

In general, for power and suspend issues, I think we'll never be able to test things on enough hardware, so we'd better prepare for information gathering and steps tweaking in the field to be able to actually diagnose and even bypass problems.

It seems a good idea to keep the constraints parsing code enabled even if it is functionally completely useless as long as the #ifdef notyet stays, so that we can already have feedbacks on how it fares.