
Add an option for entering KDB on recursive panics
Closed, Public

Authored by mhorne on Nov 18 2020, 9:04 PM.

Details

Summary

There are many cases where one would choose to avoid entering the debugger
on a normal panic, opting instead to reboot and possibly save a kernel
dump. However, recursive kernel panics are an unusual case that might
warrant attention from a human, so provide a secondary tunable,
debug.debugger_on_recursive_panic, to allow entering the debugger only
when this occurs.

For simplicity in maintaining existing behaviour, the tunable
defaults to zero.
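
The interaction between the two tunables can be sketched as a small userspace model. This is an illustration of the behaviour described above, not the diff itself: the enum, its labels, and the function name are hypothetical, and only the two knob names and the idea of a "new" versus recursive panic come from this revision.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this model; not kernel identifiers. */
enum panic_action {
	ACTION_CONTINUE,	/* no debugger: go on to dump and/or reboot */
	ACTION_KDB_PANIC,	/* debugger entered because of debugger_on_panic */
	ACTION_KDB_REPANIC	/* debugger entered only because the panic is recursive */
};

/*
 * Model of the decision described in the summary.  'newpanic' is true for
 * the first panic and false for a panic raised while one is already in
 * progress.  The two int arguments stand in for the tunables
 * debug.debugger_on_panic and debug.debugger_on_recursive_panic.
 */
static enum panic_action
panic_debugger_action(bool newpanic, int debugger_on_panic,
    int debugger_on_recursive_panic)
{
	if (debugger_on_panic)
		return (ACTION_KDB_PANIC);
	if (!newpanic && debugger_on_recursive_panic)
		return (ACTION_KDB_REPANIC);
	return (ACTION_CONTINUE);
}
```

With both knobs at zero the model always returns ACTION_CONTINUE, which matches the stated goal of keeping existing behaviour by default.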

Test Plan

Insert a secondary panic in the vpanic code path, and check that the debugger is entered only on the second panic when debug.debugger_on_panic=0 and debug.debugger_on_recursive_panic=1.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

mhorne created this revision.

The idea seems good to me. It seems like the implementation may also be tripped by concurrent panic, not just recursive panic. I've usually (only?) seen concurrent panics in the context of unhandled MCA/MCE exceptions, and maybe that was an oddity of that particular FreeBSD derivative. I don't think that's a significant (or new) problem with this change.

A few minor man page suggestions.

share/man/man4/ddb.4
29 ↗(On Diff #79727)

Don't forget to bump .Dd :-)

95 ↗(On Diff #79727)

I think .Dv is more canonical here.

96 ↗(On Diff #79727)

Maybe s/visible//?

This revision is now accepted and ready to land. Nov 18 2020, 9:36 PM
In D27271#609086, @cem wrote:

The idea seems good to me. It seems like the implementation may also be tripped by concurrent panic, not just recursive panic. I've usually (only?) seen concurrent panics in the context of unhandled MCA/MCE exceptions, and maybe that was an oddity of that particular FreeBSD derivative. I don't think that's a significant (or new) problem with this change.

stop_cpus_hard() is supposed to provide serialization of concurrent panics, but I'm not sure it can work reliably in the face of NMIs.

Yeah, seems like a tricky problem to handle completely. I suppose an NMI could be triggered on any CPU, including the panic CPU? As long as newpanic = 0 executes then the recursive panic case will be triggered, which I would think is desirable for the unusual case of a concurrent panic.
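
A minimal sequential model may help show why a concurrent panic ends up on the recursive path. It assumes, as the discussion implies, that a panic counts as "new" only if no earlier panic has been recorded. The names model_panicstr and model_vpanic are hypothetical, and the model ignores real concurrency, locking, and NMIs.

```c
#include <stddef.h>

/* Model state: set once by the first panic and never cleared. */
static const char *model_panicstr = NULL;

/*
 * Returns 1 if this call is the first panic, 0 otherwise.  The 'cpu'
 * argument is deliberately unused: the check looks only at whether a
 * panic was already recorded, so it cannot tell a recursive panic on the
 * same CPU from a concurrent panic on another one.
 */
static int
model_vpanic(int cpu, const char *msg)
{
	int newpanic;

	(void)cpu;
	newpanic = 0;
	if (model_panicstr == NULL) {
		model_panicstr = msg;
		newpanic = 1;
	}
	return (newpanic);
}
```

In this model, a second call returns 0 whether it comes from CPU 0 again or from CPU 1, so either case would be treated as a recursive panic.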

This revision now requires review to proceed. Nov 19 2020, 3:32 PM
This revision is now accepted and ready to land. Nov 19 2020, 5:47 PM
This revision was automatically updated to reflect the committed changes.