Page MenuHomeFreeBSD

Fix handling of NMIs from unknown sources (BMC, hypervisor)
ClosedPublic

Authored by vangyzen on Apr 24 2020, 4:45 PM.
Tags
None
Referenced Files
F108266117: D24558.id70945.diff
Thu, Jan 23, 6:45 AM
F108266048: D24558.id70983.diff
Thu, Jan 23, 6:43 AM
Unknown Object (File)
Wed, Jan 8, 9:41 AM
Unknown Object (File)
Wed, Jan 8, 9:23 AM
Unknown Object (File)
Tue, Jan 7, 8:04 AM
Unknown Object (File)
Dec 11 2024, 4:40 PM
Unknown Object (File)
Nov 12 2024, 1:46 AM
Unknown Object (File)
Nov 3 2024, 10:40 AM
Subscribers

Details

Summary

Release kernels have no KDB backends enabled, so they discard an NMI
if it is not due to a hardware failure. This includes NMIs from
IPMI BMCs and hypervisors.

Furthermore, the interaction of panic_on_nmi, kdb_on_nmi, and
debugger_on_panic is confusing.

Respond to all NMIs according to panic_on_nmi and debugger_on_panic.
Remove kdb_on_nmi. Expand the meaning of panic_on_nmi by making
it a bitfield. There are currently two bits, one for NMIs due to
hardware failure, and one for all others. Leave room for more.

If panic_on_nmi and debugger_on_panic are both true, don't actually panic,
but directly enter the debugger, to allow someone to leave the debugger
and [hopefully] resume normal execution.

Test Plan

I tested on bhyve with bhyvectl --inject-nmi using all combinations
of the relevant sysctls.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 30701
Build 28432: arc lint + arc unit

Event Timeline

sys/x86/x86/cpu_machdep.c
829

0xff is unexplainable, why not 3 ?

864

Why keep this line under #ifdef KDB? IMO it would be cleaner have if (!claimed && panic_on_nmi != 0) panic(); statement unconditional at the very end of the function.

vangyzen added inline comments.
sys/x86/x86/cpu_machdep.c
829

I'm allowing for future expansion without further churn in the interface. For example, we can query the IPMI BMC to determine whether an NMI is due to a watchdog timeout. We could use another bit to distinguish those.

If an admin wants the box to panic for all NMIs, even those we don't distinguish yet, he/she can set this to 0xff.

864

Effectively, this is the final !claimed && panic_on_nmi case. I suppose I could remove the & 2 from the condition.

If I move the !claimed after the #endif, I can imagine some analysis tools complaining that the test is useless in the !KDB case, which it is.

sys/x86/x86/cpu_machdep.c
864

Well, no, I like it with & 2, and I think it should stay this way.

IMO a deficiency in the analysis tool should not preclude us from structuring code in the natural way. This pattern of compiling out something that makes the code looks strange after processing all ifdefs is too common.

vangyzen marked 2 inline comments as done.
  • follow kib's suggestion
sys/x86/x86/cpu_machdep.c
866

I meant to put this (!claimed && panic_on_nmi !=) after the block for '2', i.e. after the closing brace at line 867. In this form indeed it is somewhat strange.

vangyzen added inline comments.
sys/x86/x86/cpu_machdep.c
866

Like this? It looks about the same to me, but if you prefer this, I'm fine with it.

This revision is now accepted and ready to land.Apr 25 2020, 6:20 PM