
Provide option to panic when the IPMI creates an NMI
Accepted, Public

Authored by jtl on Jun 1 2018, 5:05 PM.
Tags
None

Details

Summary

In our environment, we have an IPMI which will create an NMI, setting the timer 2 bit of system control port B. Currently, we have the option to drop to the debugger when this occurs (machdep.kdb_on_nmi), but it would be nice to also have the option to make this cause a panic.

Test Plan

Tested on a machine which behaves as indicated in the description. The machine does not panic when the NMI is raised and the sysctl is set to 0, but does panic when the NMI is raised and the sysctl is set to 1.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 16965
Build 16836: arc lint + arc unit

Event Timeline

I don't know ISA well. I'm open to switching this to be an option to always panic on any NMI, instead of picking NMI_TIMER2 for special treatment.

In D15646#330558, @jtl wrote:

I don't know ISA well. I'm open to switching this to be an option to always panic on any NMI, instead of picking NMI_TIMER2 for special treatment.

All this stuff is actually quite chipset-specific. At most I'd consider doing a panic mask.
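To make the "panic mask" idea concrete, here is a minimal user-space sketch, not the code under review. It assumes the mask is compared against the status bits sampled from system control port B (I/O port 0x61). NMI_TIMER2 is the name used in this review; the other two bit names, the `nmi_panic_mask` variable, and `nmi_should_panic()` are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bits of system control port B (I/O port 0x61). */
#define	NMI_PARITY	0x80u	/* bit 7: parity / system error */
#define	NMI_IOCHAN	0x40u	/* bit 6: I/O channel check */
#define	NMI_TIMER2	0x20u	/* bit 5: timer 2 output status */

/*
 * Hypothetical tunable: the set of status bits that should escalate
 * an NMI to a panic.  Zero means "never panic because of these bits".
 */
uint8_t nmi_panic_mask = 0;

/* Return true when the sampled port B value matches the panic mask. */
bool
nmi_should_panic(uint8_t port_b)
{
	return ((port_b & nmi_panic_mask) != 0);
}
```

With a mask like this, the behavior proposed in the summary would correspond to setting the mask to NMI_TIMER2, while a chipset that signals through a different bit could use a different mask value.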

This revision is now accepted and ready to land. Jun 1 2018, 7:48 PM

Added avg@ as he has made some recent NMI-related changes and might have some thoughts. The clean way to handle this would be to let the BMC driver itself hook into the NMI path to determine whether its watchdog triggered the NMI, but that much code in the NMI handler might not be safe. (It would have to run lockless and poll the BMC, etc.)

I also have a review request for a new kind of NMI watchdog, D15630.
In that case I do invoke a new callback to check whether the watchdog recognizes the NMI as caused by its hardware.
I'm not sure whether that would be overkill in this case.

In general, I feel that we have a rising need for a mechanism to register and invoke NMI handlers.
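As a rough illustration of what a mechanism to register and invoke NMI handlers could look like, here is a small user-space sketch. It is not taken from D15630 or from the kernel; every name in it (`nmi_register_handler`, `nmi_dispatch`, `example_flag_handler`, the fixed-size table) is hypothetical. It also assumes all registration happens before any dispatch, which sidesteps the lockless-access problem a real NMI path would have to solve.

```c
#include <stdbool.h>
#include <stddef.h>

#define	NMI_MAX_HANDLERS	8

/* A handler returns true if it recognizes the NMI as caused by its hardware. */
typedef bool (*nmi_handler_t)(void *arg);

struct nmi_slot {
	nmi_handler_t	func;
	void		*arg;
};

struct nmi_slot nmi_handlers[NMI_MAX_HANDLERS];
size_t nmi_handler_count = 0;

/* Append a handler to the table; returns 0 on success, -1 on failure. */
int
nmi_register_handler(nmi_handler_t func, void *arg)
{
	if (func == NULL || nmi_handler_count == NMI_MAX_HANDLERS)
		return (-1);
	nmi_handlers[nmi_handler_count].func = func;
	nmi_handlers[nmi_handler_count].arg = arg;
	nmi_handler_count++;
	return (0);
}

/* Call every handler; report whether at least one claimed the NMI. */
bool
nmi_dispatch(void)
{
	bool claimed = false;
	size_t i;

	for (i = 0; i < nmi_handler_count; i++) {
		if (nmi_handlers[i].func(nmi_handlers[i].arg))
			claimed = true;
	}
	return (claimed);
}

/* Example handler: claims the NMI when the pointed-to flag is set. */
bool
example_flag_handler(void *arg)
{
	return (*(bool *)arg);
}
```

The dispatcher deliberately calls every handler rather than stopping at the first claim, since more than one source may have raised the NMI; an unclaimed NMI would then fall through to whatever default policy (debugger, panic, or ignore) is configured.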

Finally, I am open to the idea of amending the behavior of kdb_on_nmi to call panic() instead of kdb_trap() when KDB_UNATTENDED is defined.
It does not make sense that we ignore that setting in this situation.

I did a couple of experiments, and bit 0x20 of I/O register 0x61 seems to be controlled entirely by the chipset's 8254 timer counter 2 functionality. You can reset it to zero with "outb(0x43, 0xb0)" and check the effect on inb(0x61). I don't think it is possible for the IPMI/BMC to have any control over it, so it might not be a suitable bit for discriminating the possible origin of an NMI.
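For readers who want to see why that particular write clears the bit, the following sketch decodes an 8254 control word as written to port 0x43. It is a user-space illustration only and performs no port I/O; the struct and function names are hypothetical. The field layout follows the standard 8254 control word format (counter select in bits 7-6, access mode in bits 5-4, operating mode in bits 3-1, BCD flag in bit 0).

```c
#include <stdbool.h>
#include <stdint.h>

struct pit_control {
	unsigned	counter;	/* 0-2 select a counter; 3 is read-back */
	unsigned	access;		/* 0 latch, 1 low byte, 2 high byte, 3 low then high */
	unsigned	mode;		/* operating mode 0-5 */
	bool		bcd;		/* true for BCD counting, false for binary */
};

/* Split an 8254 control word into its fields. */
struct pit_control
pit_decode_control(uint8_t cw)
{
	struct pit_control c;

	c.counter = (cw >> 6) & 0x3;
	c.access = (cw >> 4) & 0x3;
	c.mode = (cw >> 1) & 0x7;
	/* Modes 2 and 3 ignore the top mode bit, so 6 and 7 alias them. */
	if (c.mode > 5)
		c.mode -= 4;
	c.bcd = (cw & 0x1) != 0;
	return (c);
}
```

Decoding 0xb0 this way gives counter 2, low-then-high byte access, mode 0, binary counting. In mode 0 the counter's output goes low once the control word is written and stays low until the count expires, and bit 5 of port 0x61 reflects that output, which would be consistent with the bit reading back as zero after the write.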