This is a power saving feature. We tend to have a lot of machines and VMs in a panic state at any given time. Especially for overprovisioned VM infrastructure, it's nice to save some cycles. This reduces spinning from all cores to only one core in the panic state. We do not expect to desire to restart secondary CPUs after a panic has occurred.
Details
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
sys/amd64/amd64/mp_machdep.c | ||
---|---|---|
1463 | Why is this a loop rather than an if statement? | |
1464 | Is this actually needed? IPI_STOP_HARD is implemented by raising an NMI, so interrupts should already be disabled here. And IPI_STOP is handled through an interrupt gate, so interrupts will be disabled here in that case too. |
IMO this is a wrong directiion. This means that CPU cannot be directed to make any further work. There are two potential issues:
First, it is claimed that reboot on some (older ?) platforms can only be handled by the BSP. Otherwise, some bugs in ACPI (?) prevent reliable reboot. I had patches to make reboot from ddb to migrate to BSP before issuing reset. The patches are de-facto abandoned, and the worry might be classified as the speculation.
Second, more interesting, is the other set of my patches, which allow to migrate ddb session from the current core to arbitrary another core. This is needed to inspect local CPU resources, like MSRs and LAPIC registers. which by definition cannot be done remotely. I have the patches for some time, but I did not productized them due to lazyness.
The change is incompatible with both set of patches, since halted cpu with disabled maskable interrupts and NMI can be only revived by INIT IPI.
I am not sure what to do with this. Might be, either execute hlt if ddb is not compiled into the kernel. Or add a tunable to set for the hlt to be enabled.
sys/amd64/amd64/mp_machdep.c | ||
---|---|---|
1464 | The NMI IDT entry is set as the interrupt gate in the hammer_time(). The type of gate makes the CPU implicitly disable maskable interrupts on entry. NMI itself is implicitly disabled by the internal latch upon NMI deliver; the latch i cleared only by iretq. |
That's the idea. We're panicing.
There are two potential issues:
First, it is claimed that reboot on some (older ?) platforms can only be handled by the BSP. Otherwise, some bugs in ACPI (?) prevent reliable reboot. I had patches to make reboot from ddb to migrate to BSP before issuing reset. The patches are de-facto abandoned, and the worry might be classified as the speculation.
This sounds like a non-issue unless we can find such hardware. And if we do, we can of course fix it at that time.
Second, more interesting, is the other set of my patches, which allow to migrate ddb session from the current core to arbitrary another core. This is needed to inspect local CPU resources, like MSRs and LAPIC registers. which by definition cannot be done remotely. I have the patches for some time, but I did not productized them due to lazyness.
The change is incompatible with both set of patches, since halted cpu with disabled maskable interrupts and NMI can be only revived by INIT IPI.
I am not sure what to do with this. Might be, either execute hlt if ddb is not compiled into the kernel.
No, that would entirely defeat the purpose of the patch. We would like DDB to continue on one core, but the others don't seem to do anything useful after a panic.
Or add a tunable to set for the hlt to be enabled.
Seems like a non-issue too. When you eventually productize your out-of-tree patches, you could consider adding such a tunable. Until then, the patch seems fine as-is and the tunable is unnecessary.
sys/amd64/amd64/mp_machdep.c | ||
---|---|---|
1464 | Thanks. |