Page MenuHomeFreeBSD

stop and restart kernel event timers in the suspend / resume cycle
ClosedPublic

Authored by avg on May 13 2018, 12:07 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 9 2023, 7:19 PM
Unknown Object (File)
Dec 9 2023, 7:13 PM
Unknown Object (File)
Dec 9 2023, 6:39 PM
Unknown Object (File)
Nov 19 2023, 12:58 PM
Unknown Object (File)
Nov 19 2023, 8:54 AM
Unknown Object (File)
Nov 19 2023, 1:48 AM
Unknown Object (File)
Nov 17 2023, 9:08 AM
Unknown Object (File)
Nov 17 2023, 4:39 AM
Subscribers

Details

Summary

I have a system that is very unstable after resuming from suspend-to-RAM
but only if HPET is used as the event timer. The theory is that SMM
code / firmware could be enabling HPET for its own uses and unexpected
interrupts cause a trouble for it. Originally I wanted to solve the
problem in hpet_suspend() method, but that was insuficient as the event
timer could get reprogrammed again.

So, it's better, for my case and in general, to stop the event timer(s)
before entering the hadrware suspend.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

fix a misplaced call to resumeclock(), it wouldn't get called if entering
a sleep state failed

sys/dev/acpica/acpi.c
2961 ↗(On Diff #42474)

I wonder if we should do this before DEVICE_SUSPEND() (and resume after DEVICE_RESUME()) so that the clock drivers can save and restore device state while in a "stable" stopped state.

sys/dev/acpica/acpi.c
2961 ↗(On Diff #42474)

If I interpret correctly what I saw in my tests then some drivers depend on working timers in their suspend routines. I haven't done a deep analysis, but it looked like storage drivers and USB were affected.
I will re-test this as my original change was to stop both HPET event timer and timecounter, so maybe it was more about the timecounter.

avg added inline comments.
sys/dev/acpica/acpi.c
2961 ↗(On Diff #42474)

Just tried stopping the event timer (but not the timecounter) before the device suspend and it gets stuck here:

db> bt 100117
Tracing pid 748 tid 100117 td 0xfffff80034a63000
sched_switch() at sched_switch+0x942/frame 0xfffffe0000531340
mi_switch() at mi_switch+0x18c/frame 0xfffffe0000531370
sleepq_switch() at sleepq_switch+0x10d/frame 0xfffffe00005313b0
sleepq_wait() at sleepq_wait+0x43/frame 0xfffffe00005313e0
_cv_wait() at _cv_wait+0x1e7/frame 0xfffffe0000531450
usb_proc_mwait() at usb_proc_mwait+0x80/frame 0xfffffe0000531480
usb_suspend() at usb_suspend+0x99/frame 0xfffffe00005314b0
bus_generic_suspend_child() at bus_generic_suspend_child+0x44/frame 0xfffffe00005314d0
bus_generic_suspend() at bus_generic_suspend+0x67/frame 0xfffffe0000531510
bus_generic_suspend_child() at bus_generic_suspend_child+0x44/frame 0xfffffe0000531530
pci_suspend_child() at pci_suspend_child+0x2d/frame 0xfffffe0000531550
bus_generic_suspend() at bus_generic_suspend+0x67/frame 0xfffffe0000531590
bus_generic_suspend_child() at bus_generic_suspend_child+0x44/frame 0xfffffe00005315b0
bus_generic_suspend() at bus_generic_suspend+0x67/frame 0xfffffe00005315f0
bus_generic_suspend_child() at bus_generic_suspend_child+0x44/frame 0xfffffe0000531610
bus_generic_suspend() at bus_generic_suspend+0x67/frame 0xfffffe0000531650
acpi_suspend() at acpi_suspend+0x2f/frame 0xfffffe0000531670
bus_generic_suspend_child() at bus_generic_suspend_child+0x44/frame 0xfffffe0000531690
bus_generic_suspend() at bus_generic_suspend+0x67/frame 0xfffffe00005316d0
bus_generic_suspend_child() at bus_generic_suspend_child+0x44/frame 0xfffffe00005316f0
bus_generic_suspend() at bus_generic_suspend+0x67/frame 0xfffffe0000531730
acpi_EnterSleepState() at acpi_EnterSleepState+0x3cf/frame 0xfffffe00005317b0
acpi_AckSleepState() at acpi_AckSleepState+0x147/frame 0xfffffe00005317e0
devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfffffe0000531830
VOP_IOCTL_APV() at VOP_IOCTL_APV+0xd9/frame 0xfffffe0000531860
vn_ioctl() at vn_ioctl+0x124/frame 0xfffffe0000531970
devfs_ioctl_f() at devfs_ioctl_f+0x1f/frame 0xfffffe0000531990
kern_ioctl() at kern_ioctl+0x2b9/frame 0xfffffe00005319f0
sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe0000531ac0
amd64_syscall() at amd64_syscall+0x28c/frame 0xfffffe0000531bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0000531bf0

Employing the power of remote kgdb I can see the USB thread being waited upon:

(kgdb) bt
#0  sched_switch (td=0xfffff80034190580, newtd=0xfffff80003250580, flags=<optimized out>) at /usr/devel/svn/head/sys/kern/sched_ule.c:2115
#1  0xffffffff80b9b3ac in mi_switch (flags=260, newtd=0x0) at /usr/devel/svn/head/sys/kern/kern_synch.c:438
#2  0xffffffff80be5f8d in sleepq_switch (wchan=0xffffffff81dfbd63 <pause_wchan+3>, pri=0) at /usr/devel/svn/head/sys/kern/subr_sleepqueue.c:613
#3  0xffffffff80be66e0 in sleepq_timedwait (wchan=0xffffffff81dfbd63 <pause_wchan+3>, pri=0) at /usr/devel/svn/head/sys/kern/subr_sleepqueue.c:727
#4  0xffffffff80b9abdd in _sleep (ident=0xffffffffffffffff, lock=<optimized out>, priority=<optimized out>, wmesg=0xffffffff8120896d "USBWAIT", sbt=<optimized out>, pr=0, flags=256) at /usr/devel/svn/head/sys/kern/kern_synch.c:213
#5  0xffffffff80b9b164 in pause_sbt (wmesg=0xffffffff8120896d "USBWAIT", sbt=34359736, pr=0, flags=256) at /usr/devel/svn/head/sys/kern/kern_synch.c:332
#6  0xffffffff809a2a2b in ehci_hcreset (sc=0xfffffe00290f6000) at /usr/devel/svn/head/sys/dev/usb/controller/ehci.c:211
#7  0xffffffff809be087 in usb_bus_suspend (pm=<optimized out>) at /usr/devel/svn/head/sys/dev/usb/controller/usb_controller.c:505
#8  0xffffffff809d9595 in usb_process (arg=0xfffffe00290fce08) at /usr/devel/svn/head/sys/dev/usb/usb_process.c:178
#9  0xffffffff80b50594 in fork_exit (callout=0xffffffff809d94b0 <usb_process>, arg=0xfffffe00290fce08, frame=0xfffffe000048cc00) at /usr/devel/svn/head/sys/kern/kern_fork.c:1039

So, it's obvious that pause cannot work with the kernel event timer.

Hi,

You'll need to add a suspending check to pause() so that it uses DELAY() instead of _sleep().

--HPS

You'll need to add a suspending check to pause() so that it uses DELAY() instead of _sleep().

It seems that at present we do not have a global suspending state like we have for the early boot or the panic.
In any case, I did not intend to start doing anything about that now, I just wanted to show that there is an issue.

Ugh ok. Probably we need to use bus passes for suspend and resume to truly fix this issue in the future, but your change is fine for now.

I agree with John that this is better then nothing, but the proper way I think would be: first suspend third-party devices, then stop clock, and then suspend timer devices, resume -- in opposite order.

I agree. Although, in practice, it seems that at present the timer devices (on x86, at least) don't have much, if any, state to save or restore.
It's sufficient that the resume code puts them in workable state and then the clocksource code just programs them for the next event.

This revision was not accepted when it landed; it landed in state Needs Review.May 21 2018, 8:23 PM
This revision was automatically updated to reflect the committed changes.