In our environment, we have an IPMI that generates an NMI and sets the timer 2 bit of system control port B. Currently, we have the option to drop to the debugger when this occurs (machdep.kdb_on_nmi), but it would be nice to also have an option to panic instead.
Details
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Lint Passed
- Unit: No Test Coverage
- Build Status: Buildable 16965; Build 16836: arc lint + arc unit
Event Timeline
I don't know ISA well. I'm open to switching this to be an option to always panic on any NMI, instead of picking NMI_TIMER2 for special treatment.
All this stuff is actually quite chipset specific.... At most I'd consider doing a panic mask....
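For illustration, a panic mask could look roughly like the sketch below. This is a userspace model, not kernel code. The bit values follow the conventional PC/AT layout of system control port B (0x61); `nmi_panic_mask`, `nmi_classify`, and the `nmi_action` enum are hypothetical names made up for this example, not existing FreeBSD symbols.

```c
#include <stdint.h>

/* Status bits of system control port B (I/O port 0x61), PC/AT layout. */
#define NMI_PARITY  0x80u   /* bit 7: memory parity / system error */
#define NMI_IOCHAN  0x40u   /* bit 6: I/O channel check */
#define NMI_TIMER2  0x20u   /* bit 5: 8254 counter 2 output */

enum nmi_action { NMI_IGNORE, NMI_DEBUGGER, NMI_PANIC };

/* Hypothetical tunable: which port B bits escalate an NMI to a panic. */
uint8_t nmi_panic_mask = NMI_TIMER2;

/*
 * Decide what to do with an NMI given the port B value read at NMI time.
 * A masked bit wins over the debugger setting.
 */
enum nmi_action
nmi_classify(uint8_t portb, int kdb_on_nmi)
{
    if ((portb & nmi_panic_mask) != 0)
        return (NMI_PANIC);
    if (kdb_on_nmi)
        return (NMI_DEBUGGER);
    return (NMI_IGNORE);
}
```

With the default mask above, `nmi_classify(0x20, 1)` selects the panic path, while a parity NMI (`0x80`) still goes to the debugger. Setting the mask to `0xe0` would panic on any of the three sources.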
Added avg@ as he has made some recent NMI-related changes and might have some thoughts. The clean way to handle this would be to let the BMC driver itself hook into the NMI path to determine if its watchdog triggered the NMI, but that much code in the NMI handler might not be safe. (It would have to run lockless and poll the BMC, etc.)
I also have a review request for a new kind of NMI watchdog, D15630.
In that case I do invoke a new callback to check whether the watchdog recognizes the NMI as caused by its hardware.
Not sure if that would be overkill in this case.
In general, I feel that we have a rising need for a mechanism to register and invoke NMI handlers.
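As a rough sketch of what such a register-and-invoke mechanism might look like, here is a small userspace model. All names (`nmi_register_handler`, `nmi_dispatch`, `example_watchdog_check`) are hypothetical and not taken from D15630 or the FreeBSD tree. It also ignores the hard part raised earlier in this thread: real NMI-context code would have to run lockless.

```c
#include <stddef.h>

#define NMI_MAX_HANDLERS 4

/* A handler returns nonzero if it recognizes the NMI as its own. */
typedef int (*nmi_handler_t)(void *arg);

struct nmi_slot {
    nmi_handler_t func;
    void *arg;
};

static struct nmi_slot nmi_handlers[NMI_MAX_HANDLERS];

/* Register a handler; returns 0 on success, -1 if the table is full. */
int
nmi_register_handler(nmi_handler_t func, void *arg)
{
    for (size_t i = 0; i < NMI_MAX_HANDLERS; i++) {
        if (nmi_handlers[i].func == NULL) {
            nmi_handlers[i].func = func;
            nmi_handlers[i].arg = arg;
            return (0);
        }
    }
    return (-1);
}

/* Call handlers in slot order; return 1 as soon as one claims the NMI. */
int
nmi_dispatch(void)
{
    for (size_t i = 0; i < NMI_MAX_HANDLERS; i++) {
        if (nmi_handlers[i].func != NULL &&
            nmi_handlers[i].func(nmi_handlers[i].arg) != 0)
            return (1);
    }
    return (0);     /* unclaimed: fall back to generic NMI handling */
}

/* Example handler: claims the NMI when the flag behind arg is set. */
int
example_watchdog_check(void *arg)
{
    return (*(int *)arg != 0);
}
```

A watchdog driver would register its check once at attach time, and the NMI path would call `nmi_dispatch()` before falling back to the generic port B decoding.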
Finally, I am open to the idea of amending the behavior of kdb_on_nmi to call panic() instead of kdb_trap() when KDB_UNATTENDED is defined.
It does not make sense that we ignore that setting in this situation.
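The proposed change in behavior can be modeled as below. This is only a sketch of the decision, assuming the choice is made at compile time on `KDB_UNATTENDED`; the enum and `nmi_choose_response` are hypothetical names, and the real code would call kdb_trap() or panic() directly rather than return a value.

```c
enum nmi_response { NMI_RESP_NONE, NMI_RESP_KDB_TRAP, NMI_RESP_PANIC };

/*
 * Model of the kdb_on_nmi decision: with KDB_UNATTENDED defined there is
 * nobody at the console, so panic instead of entering the debugger.
 */
enum nmi_response
nmi_choose_response(int kdb_on_nmi)
{
    if (!kdb_on_nmi)
        return (NMI_RESP_NONE);
#ifdef KDB_UNATTENDED
    return (NMI_RESP_PANIC);
#else
    return (NMI_RESP_KDB_TRAP);
#endif
}
```

Building the sketch with `-DKDB_UNATTENDED` flips the second branch to the panic response.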
I did a couple of experiments, and bit 0x20 of I/O register 0x61 seems to be entirely controlled by the chipset's 8254 timer counter 2. You can reset it to zero with "outb(0x43, 0xb0)" and check the effect with inb(0x61). I don't think it is possible for the IPMI/BMC to have any control over it, so it might not be a suitable bit for telling apart the possible origins of an NMI.
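To make the 0xb0 write easier to read, the sketch below decodes an 8254 control word into its fields, following the standard 8254 control word layout. The struct and function names are made up for this example. It does no port I/O, so it can be run anywhere.

```c
#include <stdint.h>

struct i8254_ctl {
    unsigned counter;   /* bits 7-6: counter select (3 = read-back command) */
    unsigned access;    /* bits 5-4: 0 latch, 1 LSB, 2 MSB, 3 LSB then MSB */
    unsigned mode;      /* bits 3-1: operating mode 0-5 */
    unsigned bcd;       /* bit 0: 1 = BCD counting, 0 = binary */
};

/* Split a control word written to port 0x43 into its fields. */
struct i8254_ctl
i8254_decode(uint8_t cw)
{
    struct i8254_ctl c;

    c.counter = (cw >> 6) & 0x3;
    c.access = (cw >> 4) & 0x3;
    c.mode = (cw >> 1) & 0x7;
    if (c.mode > 5)
        c.mode -= 4;    /* encodings 6 and 7 alias modes 2 and 3 */
    c.bcd = cw & 0x1;
    return (c);
}
```

Decoding 0xb0 gives counter 2, LSB-then-MSB access, mode 0, binary counting. In mode 0 the counter's output goes low as soon as the control word is written, which is consistent with the bit reading back as zero after the outb().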