Avoid a double-panic caused by KASSERTs
ClosedPublic
Actions

Authored by jtl on Nov 3 2017, 12:15 AM.

Details

Reviewers

jhb
kib

Commits

rS332860: When running with INVARIANTS, the kernel contains extra checks. However,

Summary

When running with INVARIANTS, the kernel contains extra checks. However, these assumptions may not hold true once we've panic'd. Therefore, the checks hold less value after a panic. Additionally, if one of the checks fails while we are already panic'd, this creates a double-panic which can interfere with debugging the original panic.

Therefore, this change suppresses a response to KASSERT checks once we've panic'd.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

jtl created this revision.Nov 3 2017, 12:15 AM

Harbormaster completed remote builds in B12410: Diff 34699.Nov 3 2017, 12:15 AM

I have doubts that this is useful. Generally, code is not prepared to continue execution after the panic call. I agree with the opinion that a panic inside the panic could be some secondary violation caused by the first issue, but still it means that code is too broken to continue.

Did you considered either looping unconditionally, or under some tunable/sysctl control ? Another quirk that would be useful there, I suspect, is to (re-)enter the debugger if present, instead of plain return.

In D12920#268350, @kib wrote:

Generally, code is not prepared to continue execution after the panic call.

I agree that the code is not prepared to continue execution after the panic call. That is one of the reasons for skipping KASSERT checks in these circumstances. Once we have panic'd, the operation of the system has changed so much from the normal operation that it probably doesn't make sense to continue checking assertions.

I agree with the opinion that a panic inside the panic could be some secondary violation caused by the first issue, but still it means that code is too broken to continue.

It isn't just secondary violations (where one failed assertion leads to another failed assertion). The entire operation of the system has changed, and not all assertions are prepared to correctly check conditions both in normal operations and in panics. (And, in fairness, I don't think they should need to be prepared to operate correctly in both modes.)

I also don't think a failed KASSERT after the system has already panic'd necessarily means the system is too broken to continue taking a core dump. A second panic() may indicate the system is too broken to continue, and I don't propose changing that. But, KASSERTs are different. We sometimes add KASSERTS for things that "shouldn't happen", but will be fine for the limited operation during a panic. I don't think it makes sense to panic the system a second time because another assertion failed.

I have doubts that this is useful.

The whole reason for this change is that it is useful to us. We regularly see cases where two assertions fail. Once the second assertion fails, the system aborts taking a core dump and we lose our debugging information.

Did you considered either looping unconditionally, or under some tunable/sysctl control ? Another quirk that would be useful there, I suspect, is to (re-)enter the debugger if present, instead of plain return.

I did consider conditioning this on a tunable/sysctl (which defaulted to ignoring KASSERTs after a panic), just in case someone was truly trying to debug something related to core dumps or panics. I'm willing to add that, if you think it is important/useful. (If the second panic is allowed, the logic would be identical to that used for any other second panic. I believe re-entry to DDB is allowed, if configured through a sysctl; core dumps are not automatically taken.)

In D12920#268393, @jtl wrote:

I did consider conditioning this on a tunable/sysctl (which defaulted to ignoring KASSERTs after a panic), just in case someone was truly trying to debug something related to core dumps or panics. I'm willing to add that, if you think it is important/useful. (If the second panic is allowed, the logic would be identical to that used for any other second panic. I believe re-entry to DDB is allowed, if configured through a sysctl; core dumps are not automatically taken.)

At least I saw the re-entry several times. I do believe that the control would be not a waste of time to add it.

Added a sysctl to control whether KASSERTs are suppressed after a panic.

Harbormaster completed remote builds in B12421: Diff 34719.Nov 3 2017, 3:01 PM

In D12920#268423, @kib wrote:

At least I saw the re-entry several times. I do believe that the control would be not a waste of time to add it.

Done. Thanks for the suggestion!

kib accepted this revision.Nov 3 2017, 3:20 PM

This revision is now accepted and ready to land.Nov 3 2017, 3:20 PM

Closed by commit rS332860: When running with INVARIANTS, the kernel contains extra checks. However, (authored by jtl). · Explain WhyApr 21 2018, 5:05 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

Path

Size

head/

sys/

kern/

kern_shutdown.c

9 lines

Diff 41711

View Options

head/sys/kern/kern_shutdown.c

	Show First 20 Lines • Show All 117 Lines • ▼ Show 20 Lines
	size_t __elfN(populate_note)(int, void , void , size_t, void **);			size_t __elfN(populate_note)(int, void , void , size_t, void **);
	int __elfN(freebsd_copyout_auxargs)(struct image_params *, uintptr_t);			int __elfN(freebsd_copyout_auxargs)(struct image_params *, uintptr_t);
	void __elfN(puthdr)(struct thread , void , size_t, int, size_t, int);			void __elfN(puthdr)(struct thread , void , size_t, int, size_t, int);
	void __elfN(prepare_notes)(struct thread , struct note_info_list ,			void __elfN(prepare_notes)(struct thread , struct note_info_list ,
	size_t *);			size_t *);
	void __elfN(size_segments)(struct thread , struct sseg_closure , int);			void __elfN(size_segments)(struct thread , struct sseg_closure , int);
	size_t __elfN(register_note)(struct thread , struct note_info_list ,			size_t __elfN(register_note)(struct thread , struct note_info_list ,
	int, outfunc_t, void *);			int, outfunc_t, void *);
	bool __elfN(parse_notes)(const struct image_params , Elf_Note ,			bool __elfN(parse_notes)(const struct image_params , const Elf_Note ,
	const char , const Elf_Phdr ,			const char , const Elf_Phdr ,
	bool ()(const Elf_Note , void , bool ), void *);			bool ()(const Elf_Note , void , bool ), void *);

	/* Machine specific function to dump per-thread information. */			/* Machine specific function to dump per-thread information. */
	void __elfN(dump_thread)(struct thread , void , size_t *);			void __elfN(dump_thread)(struct thread , void , size_t *);

	extern int __elfN(fallback_brand);			extern int __elfN(fallback_brand);
	extern Elf_Brandnote __elfN(freebsd_brandnote);			extern Elf_Brandnote __elfN(freebsd_brandnote);
	extern Elf_Brandnote __elfN(kfreebsd_brandnote);			extern Elf_Brandnote __elfN(kfreebsd_brandnote);
	#endif /* _KERNEL */			#endif /* _KERNEL */

	#endif /* !_SYS_IMGACT_ELF_H_ */			#endif /* !_SYS_IMGACT_ELF_H_ */

Avoid a double-panic caused by KASSERTsClosedPublicActions