acpi: Implement s2idle loop
Needs ReviewPublic
Actions

Authored by obiwac on Dec 29 2025, 9:03 PM.

Details

Reviewers

jkim
olce
markj

Summary

When entering s2idle, we might still receive spurious SCIs from devices we can't mask GPEs from (well, we can mask them but that would mean masking out wake GPEs too). To solve this, put everything in a loop where every CPU except BSP is idled, and, when BSP is woken up, check if we are meant to break out of this loop and wake all other CPUs or if we should immediately go back to sleep.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 70153
Build 67036: arc lint + arc unit

Event Timeline

obiwac created this revision.Dec 29 2025, 9:03 PM

Herald added a subscriber: imp. · View Herald TranscriptDec 29 2025, 9:03 PM

obiwac requested review of this revision.Dec 29 2025, 9:03 PM

Harbormaster completed remote builds in B69514: Diff 168743.Dec 29 2025, 9:03 PM

obiwac added parent revisions: D54409: x86: New (un)idle IPIs for setting idle flag in scheduler, D48734: acpi: Suspend-to-idle support (s2idle).Dec 29 2025, 9:04 PM

emaste added a subscriber: emaste.Jan 6 2026, 4:52 PM

obiwac added a parent revision: D54406: acpi: Make taskqueue only run on BSP.Jan 7 2026, 12:38 PM

Don't use new IPI_IDLE/IPI_UNIDLE, but use smp_rendezvous_cpus() calling sched_do_idle() instead.

Harbormaster completed remote builds in B69920: Diff 169842.Jan 16 2026, 2:20 PM

obiwac edited the summary of this revision. (Show Details)Jan 16 2026, 2:20 PM

Remove unused acpi_s2idle_looping

Harbormaster completed remote builds in B69921: Diff 169844.Jan 16 2026, 2:22 PM

olce mentioned this in D54407: sched: Flag to force scheduler to always choose idle thread.Jan 19 2026, 9:21 PM

This approach looks weird, and I'm still unclear on what is really needed. More generally, I'd like to know why we have to do that, and I question its safety.

So I have some questions:

Which kind of interrupts do you still see that prevent staying in S0ix? I think you mentioned some IPIs at some point, but which ones? Can we determine their cause?

(In passing, I think we should settle on a common terminology and use it everywhere; S0ix instead of s2idle? something else? the ACPI spec calls it Low-Power (S0) Idle.)

What does control routing of these interrupts to the different CPUs? Are we sure these are routed to CPU 0/the BSP?
In general, shouldn't we service interrupts quite quickly after they are triggered? Could delaying interrupt servicing for arbitrary amount of times cause problems (hardware going mad, possibly bringing down the whole platform; driver queues becoming full and the driver exploding at wakeup)?

Additionally, forcing execution of the idle threads on other CPUs, as we discussed offline, might lead to strange problems where kernel threads hold mutexes that might be needed by GPE servicing.

sys/dev/acpica/acpi.c
3609–3611	You could simplify by comparing directly `pc_cpuid` with `curcpu` and getting rid of `other_cpus`.

Which kind of interrupts do you still see that prevent staying in S0ix? I think you mentioned some IPIs at some point, but which ones? Can we determine their cause?

There's two parts to this: what's interrupting the BSP, and what's interrupting the other cores.

The s2idle loop is necessary for working around the BSP being interrupted. We mask out only SCIs, and we try to mask out the (wake) GPEs we care about, but unfortunately some wake events are put under the same GPE number as noisy devices, such as the battery on Framework laptops. This seems to be expected behaviour and the same thing happens on Linux. One thing entering S0i3 does on the Framework laptops is reduce the frequency of battery events, to once a second to once a minute (except if plugged in), so we're not interrupting S0i3 that often. I believe the actual code to do this is in common/charge_state_v2.c's charger_task() function. update_base_battery_info() is called in the infinite loop there, which in turn calls host_set_single_event(EC_HOST_EVENT_BATTERY), which is one of the things that can trigger an SCI according to SCI_HOST_EVENT_MASK in board/hx30/board.h (Framework board).

The actual code that decides the timing is the following:

/* How long to sleep? */
if (problems_exist)
	/* If there are errors, don't wait very long. */
	sleep_usec = CHARGE_POLL_PERIOD_SHORT;
else if (sleep_usec <= 0) {
	/* default values depend on the state */
	if (!curr.ac &&
		 (curr.state == ST_IDLE ||
		 curr.state == ST_DISCHARGE)) {
#ifdef CONFIG_CHARGER_OTG
		int output_current = curr.output_current;
#else
		int output_current = 0;
#endif
		/*
		 * If AP is off and we do not provide power, we
		 * can sleep a long time.
		 */
		if (chipset_in_state(CHIPSET_STATE_ANY_OFF |
					  CHIPSET_STATE_ANY_SUSPEND)
				&& output_current == 0)
			sleep_usec =
				CHARGE_POLL_PERIOD_VERY_LONG;
		else
			/* Discharging, not too urgent */
			sleep_usec = CHARGE_POLL_PERIOD_LONG;
	} else {
		/* AC present, so pay closer attention */
		sleep_usec = CHARGE_POLL_PERIOD_CHARGE;
	}
}

And this tracks with what's observed (CHARGE_POLL_PERIOD_VERY_LONG is a minute).

(Disclaimer that what I'm saying may not be entirely accurate; I have not spent much time with the EC code.)

I'm not getting IPIs I'm pretty sure. Verified by comparing IPI counts before and after the s2idle loops.

(In passing, I think we should settle on a common terminology and use it everywhere; S0ix instead of s2idle? something else? the ACPI spec calls it Low-Power (S0) Idle.)

Well s2idle and S0ix are 2 different things really. s2idle is just the idea that you're suspending the CPUs to an idle state, and in theory can be supported on any machine, so long as it can idle its CPUs. Whereas S0ix are states the firmware puts the whole CPU in when it detects the CPUs are idle + some other very platform-specific conditions (on modern AMD CPUs, the only relevant state is S0i3). So you're "doing" s2idle (+ some other stuff) in order to "enter" an S0ix state.

And low-power idle is kind of too generic a term for what we're attempting to support here, as in there's a lot of very S0ix-specific stuff that we're doing so I think its worth sticking with that name.

What does control routing of these interrupts to the different CPUs? Are we sure these are routed to CPU 0/the BSP?

For SCIs, yes. I seem to remember reading this somewhere in the ACPI docs but can't find it at the minute. Either way this can be experimentally shown and those should be the only enabled interrupts.

In general, shouldn't we service interrupts quite quickly after they are triggered? Could delaying interrupt servicing for arbitrary amount of times cause problems (hardware going mad, possibly bringing down the whole platform; driver queues becoming full and the driver exploding at wakeup)?

I guess by nature of suspend we should always put the system in a state where none of this matters anymore before disabling interrupts, which a priori we are already doing. Otherwise our CPUs would constantly exit idle and thus the package would constantly exit S0i3.

Additionally, forcing execution of the idle threads on other CPUs, as we discussed offline, might lead to strange problems where kernel threads hold mutexes that might be needed by GPE servicing.

Absolutely. Though no idea what the solution to this could be except for making sure our GPE service routines don't try to acquire any of these resources.

Small style changes.

Harbormaster completed remote builds in B70153: Diff 170483.Jan 26 2026, 7:04 PM

obiwac marked an inline comment as done.Jan 26 2026, 7:05 PM

obiwac added inline comments.Sat, Feb 7, 8:47 PM