Page MenuHomeFreeBSD

Avoid a double-panic caused by KASSERTs
ClosedPublic

Authored by jtl on Nov 3 2017, 12:15 AM.
Tags
None
Referenced Files
Unknown Object (File)
Oct 24 2024, 4:57 PM
Unknown Object (File)
Oct 24 2024, 4:32 PM
Unknown Object (File)
Oct 10 2024, 9:56 PM
Unknown Object (File)
Oct 3 2024, 12:34 PM
Unknown Object (File)
Sep 20 2024, 11:00 AM
Unknown Object (File)
Sep 19 2024, 10:29 AM
Unknown Object (File)
Sep 9 2024, 3:15 AM
Unknown Object (File)
Sep 8 2024, 5:06 AM
Subscribers

Details

Summary

When running with INVARIANTS, the kernel contains extra checks. However, these assumptions may not hold true once we've panic'd. Therefore, the checks hold less value after a panic. Additionally, if one of the checks fails while we are already panic'd, this creates a double-panic which can interfere with debugging the original panic.

Therefore, this change suppresses a response to KASSERT checks once we've panic'd.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 12421
Build 12697: arc lint + arc unit

Event Timeline

I have doubts that this is useful. Generally, code is not prepared to continue execution after the panic call. I agree with the opinion that a panic inside the panic could be some secondary violation caused by the first issue, but still it means that code is too broken to continue.

Did you considered either looping unconditionally, or under some tunable/sysctl control ? Another quirk that would be useful there, I suspect, is to (re-)enter the debugger if present, instead of plain return.

In D12920#268350, @kib wrote:

Generally, code is not prepared to continue execution after the panic call.

I agree that the code is not prepared to continue execution after the panic call. That is one of the reasons for skipping KASSERT checks in these circumstances. Once we have panic'd, the operation of the system has changed so much from the normal operation that it probably doesn't make sense to continue checking assertions.

I agree with the opinion that a panic inside the panic could be some secondary violation caused by the first issue, but still it means that code is too broken to continue.

It isn't just secondary violations (where one failed assertion leads to another failed assertion). The entire operation of the system has changed, and not all assertions are prepared to correctly check conditions both in normal operations and in panics. (And, in fairness, I don't think they should need to be prepared to operate correctly in both modes.)

I also don't think a failed KASSERT after the system has already panic'd necessarily means the system is too broken to continue taking a core dump. A second panic() may indicate the system is too broken to continue, and I don't propose changing that. But, KASSERTs are different. We sometimes add KASSERTS for things that "shouldn't happen", but will be fine for the limited operation during a panic. I don't think it makes sense to panic the system a second time because another assertion failed.

I have doubts that this is useful.

The whole reason for this change is that it is useful to us. We regularly see cases where two assertions fail. Once the second assertion fails, the system aborts taking a core dump and we lose our debugging information.

Did you considered either looping unconditionally, or under some tunable/sysctl control ? Another quirk that would be useful there, I suspect, is to (re-)enter the debugger if present, instead of plain return.

I did consider conditioning this on a tunable/sysctl (which defaulted to ignoring KASSERTs after a panic), just in case someone was truly trying to debug something related to core dumps or panics. I'm willing to add that, if you think it is important/useful. (If the second panic is allowed, the logic would be identical to that used for any other second panic. I believe re-entry to DDB is allowed, if configured through a sysctl; core dumps are not automatically taken.)

In D12920#268393, @jtl wrote:

I did consider conditioning this on a tunable/sysctl (which defaulted to ignoring KASSERTs after a panic), just in case someone was truly trying to debug something related to core dumps or panics. I'm willing to add that, if you think it is important/useful. (If the second panic is allowed, the logic would be identical to that used for any other second panic. I believe re-entry to DDB is allowed, if configured through a sysctl; core dumps are not automatically taken.)

At least I saw the re-entry several times. I do believe that the control would be not a waste of time to add it.

Added a sysctl to control whether KASSERTs are suppressed after a panic.

In D12920#268423, @kib wrote:

At least I saw the re-entry several times. I do believe that the control would be not a waste of time to add it.

Done. Thanks for the suggestion!

This revision is now accepted and ready to land.Nov 3 2017, 3:20 PM
This revision was automatically updated to reflect the committed changes.