
HPET-based NMI (debug) watchdog
Needs ReviewPublic

Authored by avg on May 30 2018, 9:16 PM.

Details

Reviewers
jhb
kib
mav
imp
Summary

This change adds a watchdog capability, activated with the NMI_WATCHDOG kernel option, that does not depend on any specialized hardware (in the x86 world).
The watchdog delivers a non-maskable interrupt when it fires.
This is useful for debugging system lock-ups where regular interrupts cannot be delivered and, thus, SW_WATCHDOG does not help.
The new watchdog does not help with recovering from hardware issues (of course).

The change consists of several logical parts.
First, the event timer interface is extended with methods to configure a timer for the NMI mode and to check whether the NMI timer has fired.
Second, an NMI watchdog driver is provided that can use any NMI-capable event timer as a backend.
Third, the HPET timer driver has gained NMI mode support on x86 configurations with APIC support. The NMI mode is provided only if the HPET supports the FSB / MSI interrupt mode.
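
To make the split between the first two parts concrete, here is a minimal userland sketch of how a watchdog could pick an NMI-capable timer as its backend and later ask whether an NMI came from it. All names here (struct mock_timer, nmi_start, nmi_check, wd_find_backend, wd_pat, wd_nmi_is_ours) are hypothetical stand-ins for illustration; they are not the identifiers used in the patch or in the kernel's eventtimer(9) interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an event timer with NMI hooks. */
struct mock_timer {
	const char *name;
	/* Arm the timer to raise an NMI after 'ticks'; NULL if unsupported. */
	int (*nmi_start)(struct mock_timer *t, uint64_t ticks);
	/* Report and clear the "NMI fired" state; NULL if unsupported. */
	bool (*nmi_check)(struct mock_timer *t);
	uint64_t deadline;	/* mock state: ticks until the NMI */
	bool fired;		/* mock state: set when the NMI was raised */
};

/* Mock implementation of the "configure for NMI mode" method. */
static int
mock_nmi_start(struct mock_timer *t, uint64_t ticks)
{
	assert(ticks > 0);
	t->deadline = ticks;
	t->fired = false;
	return (0);
}

/* Mock implementation of the "has the NMI timer fired?" method. */
static bool
mock_nmi_check(struct mock_timer *t)
{
	bool fired = t->fired;

	t->fired = false;
	return (fired);
}

/* Pick the first timer that implements both NMI methods. */
static struct mock_timer *
wd_find_backend(struct mock_timer *timers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (timers[i].nmi_start != NULL && timers[i].nmi_check != NULL)
			return (&timers[i]);
	}
	return (NULL);
}

/* "Pat" the watchdog: re-arm the backend so that the NMI is postponed. */
static int
wd_pat(struct mock_timer *backend, uint64_t timeout_ticks)
{
	return (backend->nmi_start(backend, timeout_ticks));
}

/* NMI handler side: was this NMI caused by the watchdog timer? */
static bool
wd_nmi_is_ours(struct mock_timer *backend)
{
	return (backend->nmi_check(backend));
}
```

In this sketch a timer without the two methods is simply skipped, which mirrors the idea that only NMI-capable timers are eligible backends. The "check" method matters because an NMI can have other sources, so the handler needs a way to attribute it to the watchdog before panicking.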

Additionally, the x86 NMI handling code has been reorganized to support NMI_WATCHDOG.
Also, I added a knob for a quirk of AMD-based hardware. For IO-APIC and MSI interrupts there is no translation of the delivery mode from the APIC format to the HyperTransport format.
So, on such hardware the delivery mode has to be specified in the HyperTransport format.
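
As an illustration of what the quirk means for the delivery-mode field, here is a small sketch. It assumes the field sits in bits 10:8 of the MSI data word (as in the APIC format), that the APIC encoding of NMI is 100b, and that the HyperTransport message type for NMI is 011b. The macro and function names are made up for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Delivery mode field in the APIC-format MSI data word (bits 10:8). */
#define	DELMODE_SHIFT		8
#define	DELMODE_MASK		(0x7u << DELMODE_SHIFT)

/* APIC encoding of the NMI delivery mode (100b). */
#define	APIC_DELMODE_NMI	0x4u
/* Assumed HyperTransport message type for NMI (011b). */
#define	HT_MSGTYPE_NMI		0x3u

/*
 * Hypothetical helper: rewrite the delivery mode of an MSI data word so
 * that the interrupt is delivered as an NMI.  With the quirk enabled the
 * field is filled with the HyperTransport encoding, because the hardware
 * passes it through without translation.
 */
static uint32_t
msi_data_for_nmi(uint32_t data, bool ht_quirk)
{
	uint32_t mode;

	assert((data >> 16) == 0);	/* MSI data is a 16-bit value */
	mode = ht_quirk ? HT_MSGTYPE_NMI : APIC_DELMODE_NMI;
	data &= ~DELMODE_MASK;
	return (data | (mode << DELMODE_SHIFT));
}
```

For example, a data word of 0x0030 would become 0x0430 without the quirk and 0x0330 with it. Note that 011b is a reserved delivery mode in the APIC format, so using the wrong encoding on the wrong hardware cannot be expected to work, which is why the knob exists.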

I wrote this code for my own use and didn't really intend to upstream it.
That is reflected in the design and the code quality.
Most of all, I don't like how the NMI mode is configured for HPET FSB interrupt delivery.
I am not sure if it would be worth adding another bus method for that.
Maybe it would be sufficient to expose a direct interface to the MSI code.

I am creating this review request to solicit feedback on the general usefulness of the new facility and to get comments on the design and the code.

Test Plan

Works for me on AMD-based hardware.

hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
hpet0: vendor 0x4353, rev 0x10, 14318180Hz, 3 timers, legacy route
hpet0:  t0: irqs 0x00c00000 (0), MSI, periodic
hpet0:  t1: irqs 0x00c00000 (0), MSI, periodic
hpet0:  t2: irqs 0x00c00000 (0), MSI, periodic
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 450
Event timer "HPET1" frequency 14318180 Hz quality 450
Event timer "HPET2" frequency 14318180 Hz quality 450
random: harvesting attach, 8 bytes (4 bits) from hpet0
NMI watchdog: found timer HPET1
NMI watchdog: using timer HPET1

Also, this is how it looks when the watchdog "barks":

panic: NMI watchdog
cpuid = 0
time = 1527700411
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff81fbc930
vpanic() at vpanic+0x1a3/frame 0xffffffff81fbc990
panic() at panic+0x43/frame 0xffffffff81fbc9f0
nmi_call_kdb() at nmi_call_kdb+0x80/frame 0xffffffff81fbca20
nmi_call_kdb_smp() at nmi_call_kdb_smp+0x47/frame 0xffffffff81fbca50
trap() at trap+0x246/frame 0xffffffff81fbcb60
nmi_calltrap() at nmi_calltrap+0x8/frame 0xffffffff81fbcb60
--- trap 0x13, rip = 0xffffffff811cf8d6, rsp = 0xfffffe00413bea50, rbp = 0xfffffe00413bea50 ---
acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe00413bea50
acpi_cpu_idle() at acpi_cpu_idle+0x2ee/frame 0xfffffe00413beaa0
cpu_idle_acpi() at cpu_idle_acpi+0x3f/frame 0xfffffe00413beac0
cpu_idle() at cpu_idle+0x95/frame 0xfffffe00413beae0
sched_idletd() at sched_idletd+0x517/frame 0xfffffe00413bebb0
fork_exit() at fork_exit+0x84/frame 0xfffffe00413bebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00413bebf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 11 tid 100003 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 17621
Build 17428: arc lint + arc unit

Event Timeline

Can you split this into per-topic patches?

sys/dev/acpica/acpi_hpet.c
305

This requires coordination with the interrupt remapping code in the dmar driver.

sys/x86/x86/io_apic.c
152

Is the user supposed to set the tunable manually? How would they know to do so?

I guess I can split up the patch. Maybe when committing it, if that ever happens.
But while the change consists of several logical parts that can be isolated, each is useless without the others.
Only the small improvements in sys/x86/x86/cpu_machdep.c (extending sysctl descriptions, etc.) are completely independent of the rest of the changes.

sys/dev/acpica/acpi_hpet.c
305

Yeah, I guess. As I noted in the request message, this code directly messes with MSI internals.
But I couldn't come up with a proper interface for requesting the NMI mode.
I am not sure if it should be a bus method akin to BUS_CONFIG_INTR (and if it should eventually support configuring IO-APIC interrupts for NMI mode too) or if the MSI code could expose a function that would allow for a more direct configuration of the NMI mode for MSI interrupts only.

sys/x86/x86/io_apic.c
152

At this point it's completely user controlled.
I do not know of a reliable way to automatically detect the quirk.
The problem appears to be in the chipsets (south bridge?), both external and integrated with processors.
Maybe checking if a vendor of the Host Bridge is AMD would be a good approximation?

By the way, I still don't know if Zen-based systems (their supporting chipsets) are still affected.
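
The host bridge idea could look roughly like the sketch below. It is only an approximation under stated assumptions: the caller has already read the first configuration dword of the host bridge (PCI device 0:0:0), the vendor ID sits in its low 16 bits, and 0x1022 is AMD's PCI vendor ID. The function name and the three-state tunable convention (-1 auto, 0 off, 1 on) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	PCI_VENDOR_AMD	0x1022u

/*
 * Hypothetical policy helper.
 *   tunable: -1 = auto-detect, 0 = force off, 1 = force on.
 *   host_bridge_id: first config dword of the host bridge
 *                   (vendor ID in bits 15:0, device ID in bits 31:16).
 * Returns true if the HyperTransport delivery-mode format should be used.
 */
static bool
use_ht_delivery_mode(int tunable, uint32_t host_bridge_id)
{
	assert(tunable >= -1 && tunable <= 1);
	if (tunable != -1)
		return (tunable == 1);	/* an explicit user setting wins */
	return ((host_bridge_id & 0xffffu) == PCI_VENDOR_AMD);
}
```

Keeping the tunable as an override would let users of affected or unaffected systems correct a wrong guess, which seems prudent given that it is unclear which chipsets actually have the quirk.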

This patch didn't apply cleanly anymore for me; here's my tweak: https://github.com/skunkwerks/freebsd/commit/22cff7130664edce262f9eb00adef5799baaf205.patch. Also, to confirm, it works for me: I just used it to get a crash dump from a system freeze. Many thanks!

In D15630#339073, @dch wrote:

also to confirm it works for me - just used it to get a crash dump from a system freeze. many thanks!

Thank you for testing!