Handle broadcast NMIs.
On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs
reporting hardware errors are broadcasted to all CPUs.
When kernel is configured to enter kdb on NMI, the outcome is
problematic, because each CPU tries to enter kdb. All CPUs are
executing NMI handlers, which set the latches disabling the nested NMI
delivery; this means that stop_cpus_hard(), used by kdb_enter() to
stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work. One
indication of this is the harmless but annoying diagnostic "timeout
stopping cpus".
Much more harming behaviour is that because all CPUs try to enter kdb,
and if ddb is used as debugger, all CPUs issue prompt on console and
race for the input, not to mention the simultaneous use of the ddb
shared state.
Try to fix this by introducing a pseudo-lock for simultaneous attempts
to handle NMIs. If one core happens to enter NMI trap handler, other
cores see it and simulate reception of the IPI_STOP_HARD. More,
generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting
for the acknowledgement, relying on the nmi handler on other cores
suspending and then restarting the CPU.
Since it is impossible to detect at runtime whether some stray NMI is
broadcast or unicast, add a knob for administrator (really developer)
to configure debugging NMI handling mode.
The updated patch was debugged with the help from Andrey Gapon (avg)
and discussed with him.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D8249