Several sources of NMIs are configured for broadcast mode. In particular, all Intel machines I had access to implement 'ipmitool power diag' as a broadcast NMI. Chipset RAS subsystems also report hardware errors as broadcast NMIs; I noticed this while developing the DMAR driver.
The kernel's typical reaction to a hardware-generated NMI is to enter the debugger. Since the NMI is delivered to each core, each core tries to enter the debugger. Because the interruption is synchronous, often all cores (or at least more than one) are in NMI handlers simultaneously. As a result, IPI_STOP_HARD, which is itself delivered as an NMI, never reaches those other cores, since the NMI is latched and is unblocked only by IRET. In this case we end up with a hard-locked machine even after 'power diag'.
The patch addresses the problem by introducing a pseudo-lock for simultaneous attempts to handle NMIs. If one core has already entered the NMI trap handler, the other cores see this and simulate reception of IPI_STOP_HARD. In addition, generic_stop_cpus() avoids sending IPI_STOP_HARD, relying on the NMI handlers to report the stop.
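The pseudo-lock idea can be illustrated with a small userland sketch. This is not the code from the patch: the names nmi_owner_sim, stopped_cpus_sim, nmi_enter_sim() and nmi_exit_sim() are hypothetical, and C11 atomics stand in for the kernel's atomic primitives. The first core to win the compare-and-swap becomes the owner and would enter the debugger; every other core records itself as stopped, as if it had received IPI_STOP_HARD.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER_SIM (-1)

enum nmi_role_sim { NMI_OWNER_SIM, NMI_STOPPED_SIM };

/* Hypothetical pseudo-lock: id of the core that handles the NMI. */
static atomic_int nmi_owner_sim = NO_OWNER_SIM;
/* Hypothetical bitmask of cores that reported themselves stopped. */
static atomic_uint stopped_cpus_sim;

/*
 * Called by each simulated core on NMI entry.  Exactly one caller
 * wins the compare-and-swap; the rest mark themselves stopped
 * instead of waiting for an IPI that cannot be delivered.
 */
static enum nmi_role_sim
nmi_enter_sim(int cpu)
{
	int expected = NO_OWNER_SIM;

	if (atomic_compare_exchange_strong(&nmi_owner_sim, &expected, cpu))
		return (NMI_OWNER_SIM);
	atomic_fetch_or(&stopped_cpus_sim, 1u << cpu);
	return (NMI_STOPPED_SIM);
}

/* Called by the owner when it leaves the debugger. */
static void
nmi_exit_sim(int cpu)
{
	/* Only the current owner may release the pseudo-lock. */
	assert(atomic_load(&nmi_owner_sim) == cpu);
	atomic_store(&stopped_cpus_sim, 0);
	atomic_store(&nmi_owner_sim, NO_OWNER_SIM);
}
```

In a real handler the cores that get NMI_STOPPED_SIM would then spin until the owner releases them, and the owner would wait for the stopped mask to fill in rather than send IPI_STOP_HARD. Both waits are left out of the sketch.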
Since it is impossible to detect at runtime whether a stray NMI is broadcast or unicast, the patch adds a knob for the administrator (really, the developer) to configure the NMI handling mode used for debugging.
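How the knob could gate the stop logic is sketched below. This is an illustration under assumptions, not the patch itself: the enum names and stop_action_sim() are hypothetical, and the real knob's name and default are not given here. In broadcast mode a core that is already in the NMI handler relies on the other handlers to report the stop; in unicast mode it sends IPI_STOP_HARD as before.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values of the administrator-visible knob. */
enum nmi_mode_sim { NMI_MODE_UNICAST_SIM = 0, NMI_MODE_BROADCAST_SIM = 1 };

/* What the stop path should do for the other cores. */
enum stop_action_sim { SEND_STOP_IPI_SIM, RELY_ON_NMI_HANDLERS_SIM };

static enum stop_action_sim
stop_action_sim(enum nmi_mode_sim mode, bool in_nmi_handler)
{
	/* Reject values the knob is not supposed to take. */
	assert(mode == NMI_MODE_UNICAST_SIM || mode == NMI_MODE_BROADCAST_SIM);

	/*
	 * Only skip the IPI when every core is assumed to have received
	 * the same NMI; otherwise the other cores would never stop.
	 */
	if (mode == NMI_MODE_BROADCAST_SIM && in_nmi_handler)
		return (RELY_ON_NMI_HANDLERS_SIM);
	return (SEND_STOP_IPI_SIM);
}
```

Setting the mode wrongly has a cost either way: treating a unicast NMI as broadcast leaves the other cores running, while treating a broadcast NMI as unicast brings back the hang described above.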