FreeBSD

Add an option for entering KDB on recursive panics
Closed, Public

Authored by mhorne on Nov 18 2020, 9:04 PM.
Tags
None

Details

Summary

There are many cases where one would choose to avoid entering the debugger
on a normal panic, opting instead to reboot and possibly save a kernel
dump. However, recursive kernel panics are an unusual case that might
warrant attention from a human, so provide a secondary tunable,
debug.debugger_on_recursive_panic, to allow entering the debugger only
when this occurs.

For simplicity, and to maintain existing behaviour, the tunable
defaults to zero.

Test Plan

Insert a secondary panic in the vpanic code path, and check that, with debug.debugger_on_panic=0 and debug.debugger_on_recursive_panic=1, the debugger is entered on the second (recursive) panic.

Diff Detail

Lint: Passed
Unit: No Test Coverage
Build Status: Buildable 34891, Build 31912: arc lint + arc unit

Event Timeline

mhorne created this revision.

The idea seems good to me. It seems like the implementation may also be tripped by concurrent panic, not just recursive panic. I've usually (only?) seen concurrent panics in the context of unhandled MCA/MCE exceptions, and maybe that was an oddity of that particular FreeBSD derivative. I don't think that's a significant (or new) problem with this change.

A few minor man page suggestions.

share/man/man4/ddb.4

Line 29: Don't forget to bump .Dd :-)

Line 95: I think .Dv is more canonical here.

Line 96: Maybe s/visible//?

This revision is now accepted and ready to land. (Nov 18 2020, 9:36 PM)

In D27271#609086, @cem wrote:

The idea seems good to me. It seems like the implementation may also be tripped by concurrent panic, not just recursive panic. I've usually (only?) seen concurrent panics in the context of unhandled MCA/MCE exceptions, and maybe that was an oddity of that particular FreeBSD derivative. I don't think that's a significant (or new) problem with this change.

stop_cpus_hard() is supposed to provide serialization of concurrent panics, but I'm not sure it can work reliably in the face of NMIs.

Yeah, seems like a tricky problem to handle completely. I suppose an NMI could be triggered on any CPU, including the panic CPU? As long as newpanic = 0 executes, the recursive panic case will be triggered, which I would think is desirable for the unusual case of a concurrent panic.

This revision now requires review to proceed. (Nov 19 2020, 3:32 PM)
This revision is now accepted and ready to land. (Nov 19 2020, 5:47 PM)
This revision was automatically updated to reflect the committed changes.