Page MenuHomeFreeBSD

Fix handling of NMIs from unknown sources (BMC, hypervisor)

Authored by vangyzen on Apr 24 2020, 4:45 PM.



Release kernels have no KDB backends enabled, so they discard an NMI
if it is not due to a hardware failure. This includes NMIs from
IPMI BMCs and hypervisors.

Furthermore, the interaction of panic_on_nmi, kdb_on_nmi, and
debugger_on_panic is confusing.

Respond to all NMIs according to panic_on_nmi and debugger_on_panic.
Remove kdb_on_nmi. Expand the meaning of panic_on_nmi by making
it a bitfield. There are currently two bits, one for NMIs due to
hardware failure, and one for all others. Leave room for more.

If panic_on_nmi and debugger_on_panic are both true, don't actually panic,
but directly enter the debugger, to allow someone to leave the debugger
and [hopefully] resume normal execution.

Test Plan

I tested on bhyve with bhyvectl --inject-nmi using all combinations
of the relevant sysctls.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

829 ↗(On Diff #70945)

0xff is unexplainable, why not 3 ?

864 ↗(On Diff #70945)

Why keep this line under #ifdef KDB? IMO it would be cleaner have if (!claimed && panic_on_nmi != 0) panic(); statement unconditional at the very end of the function.

vangyzen added inline comments.
829 ↗(On Diff #70945)

I'm allowing for future expansion without further churn in the interface. For example, we can query the IPMI BMC to determine whether an NMI is due to a watchdog timeout. We could use another bit to distinguish those.

If an admin wants the box to panic for all NMIs, even those we don't distinguish yet, he/she can set this to 0xff.

864 ↗(On Diff #70945)

Effectively, this is the final !claimed && panic_on_nmi case. I suppose I could remove the & 2 from the condition.

If I move the !claimed after the #endif, I can imagine some analysis tools complaining that the test is useless in the !KDB case, which it is.

864 ↗(On Diff #70945)

Well, no, I like it with & 2, and I think it should stay this way.

IMO a deficiency in the analysis tool should not preclude us from structuring code in the natural way. This pattern of compiling out something that makes the code looks strange after processing all ifdefs is too common.

vangyzen marked 2 inline comments as done.
  • follow kib's suggestion
865 ↗(On Diff #70983)

I meant to put this (!claimed && panic_on_nmi !=) after the block for '2', i.e. after the closing brace at line 867. In this form indeed it is somewhat strange.

vangyzen added inline comments.
865 ↗(On Diff #70983)

Like this? It looks about the same to me, but if you prefer this, I'm fine with it.

This revision is now accepted and ready to land.Apr 25 2020, 6:20 PM