Details

Reviewers

kib
jhb
markj
jhibbits
nwhitehorn
rwatson

Commits

rS332752: set kdb_why to "trap" when calling kdb_trap from trap_fatal

Summary

This will allow to hook a ddb script to "kdb.enter.trap" event.
Previously there was no specific name for this event, so it could only
be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks.
Both are very unspecific.

Having a specific event is useful because the fatal trap condition is
very similar to panic but it has an additional property that the current
stack frame is the frame where the trap occurred. So, both a register
dump and a stack bottom dump have additional information that can help
analyze the problem.

I have added the event only on architectures that have trap_fatal()
function defined. I haven't looked at other architectures.
Their maintainers can add support for the event later.

Sample script:
kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20

Test Plan

Tested on amd64.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

avg created this revision.Apr 16 2018, 1:32 PM

Herald added a subscriber: imp. · View Herald TranscriptApr 16 2018, 1:32 PM

Harbormaster completed remote builds in B16161: Diff 41518.Apr 16 2018, 1:32 PM

kib accepted this revision.Apr 16 2018, 1:54 PM

kib added inline comments.

sys/amd64/amd64/trap.c
816 ↗	(On Diff #41518)	Please move locals at the start of the function (under #ifdef). Also, one line is enough, they are both bool.

This revision is now accepted and ready to land.Apr 16 2018, 1:54 PM

style change suggested by kib

This revision now requires review to proceed.Apr 16 2018, 2:13 PM

Harbormaster completed remote builds in B16164: Diff 41525.Apr 16 2018, 2:13 PM

markj accepted this revision.Apr 16 2018, 2:13 PM

This revision is now accepted and ready to land.Apr 16 2018, 2:13 PM

fix up the previous update, new variables should be under KDB

This revision now requires review to proceed.Apr 16 2018, 2:19 PM

Harbormaster completed remote builds in B16165: Diff 41526.Apr 16 2018, 2:19 PM

jhb added inline comments.Apr 16 2018, 5:53 PM

sys/powerpc/powerpc/trap.c
460 ↗	(On Diff #41526)	The renter logic is probably excessive since kdb_trap() just returns immediately. I'd argue that we should fix the calling code here to only call kdb_trap if !kdb_active. That would permit using a simpler code flow: if (debugger_on_panic && kdb_active == 0) { kdb_why = KDB_WHY_TRAP; handled = kdb_trap(...); kdb_why = KDB_WHY_UNSET; if (handled) return; } BTW, I wonder if this feature is why one can get stuck sometimes where 'c' from kdb keeps reporting the same panic (because handled is true so we retry the same instruction and never get around to actually panic'ing and writing out the dump)? In that case it seems like a bit of a misfeature.

avg added inline comments.Apr 17 2018, 5:25 AM

sys/powerpc/powerpc/trap.c
460 ↗	(On Diff #41526)	Thank you for pointing this out. I didn't pay attention to `kdb_trap` vs `kdb_active`. From a further code inspection it seems that `kdb_active` should never be set in this code path. If a trap happens while already in kdb, then a check in `trap()` should catch that and divert to `kdb_reenter`. And if we were not in kdb and kdb entry was initiated in a different context, then `kdb_active` is set only after the world is frozen (via `stop_cpus_hard`). So, I do not see how `kdb_active` could possibly be set when this code runs. I guess I'll just remove the check?

avg added inline comments.Apr 17 2018, 5:28 AM

sys/powerpc/powerpc/trap.c
460 ↗	(On Diff #41526)	BTW, I wonder if this feature is why one can get stuck sometimes where 'c' from kdb keeps reporting the same panic (because handled is true so we retry the same instruction and never get around to actually panic'ing and writing out the dump)? I am not sure if that's directly related. Actually, the current behavior seems reasonable. What would you expect kdb to do differently in that scenario?

jhb added inline comments.Apr 17 2018, 3:43 PM

sys/powerpc/powerpc/trap.c
460 ↗	(On Diff #41526)	Normally if you call panic() from anywhere else, kdb_trap is invoked as a result of a breakpoint() in kdb_enter(). When the trap handler returns after kdb_trap() returns, execution resumes post-breakpoint so it continues inside of panic() and writes out the dump. However, if you have a fatal trap, then kdb is entered here before panic is called. If you continue from here, the trap handler returns without calling panic and just retries the instruction so you are stuck in an infinite loop with no way to break out unless you manually modify $PC in ddb before continuing or some other evil. You have to explicitly call 'dump' to get a working dump unlike every other panic() in the kernel. That is what I find unreasonable and inconsistent. Just compare what happens if you do 'sysctl debug.kdb.panic=1' and then continue with doing 'sysctl debug.kdb.trap=1'.

avg added inline comments.Apr 17 2018, 4:34 PM

sys/powerpc/powerpc/trap.c
460 ↗	(On Diff #41526)	I see now... But what if, while in kdb, you modified memory and fixed the cause the trap and you want 'continue' to actually continue and not panic? Quite far fetched, I know. In any case, if we want to change / fix that behavior that would be a separate change. I'll handle the `kdb_active` check in this review.

jhb added inline comments.Apr 17 2018, 5:30 PM

sys/powerpc/powerpc/trap.c
460 ↗	(On Diff #41526)	Yes, I think whether or not to keep the kdb_trap is a topic for a different review. My thoughts on that are: I tried having a more generic version of that via 'options RESTARTABLE_PANICS' but I ended up removing it as I didn't really find any compelling use cases. Of course, there is the really gross kassert_panic() hack in the tree instead but it just blindly continues rather than requiring user intervention. In practice, I don't know that anyone is really going to be fixing fatal faults on the fly in ddb, so I would probably favor just removing it. I think if we keep it we should add a separate knob to enable it's use (debugger_on_trap or the like) instead of overloading debugger_on_panic so that the default behavior out of the box is consistent. If someone wants to try to fix fatal traps in ddb, they can enable the new sysctl and deal with the unusual behavior.

Don't check for kdb reentry in trap_fatal(), it's impossible.
trap() checks for it earlier and calls kdb_reentry().

Harbormaster completed remote builds in B16211: Diff 41620.Apr 18 2018, 3:26 PM

rebase on the latest head

Harbormaster completed remote builds in B16213: Diff 41628.Apr 18 2018, 4:27 PM

jhb accepted this revision.Apr 18 2018, 7:19 PM

This revision is now accepted and ready to land.Apr 18 2018, 7:19 PM

Closed by commit rS332752: set kdb_why to "trap" when calling kdb_trap from trap_fatal (authored by avg). · Explain WhyApr 19 2018, 5:07 AM

This revision was automatically updated to reflect the committed changes.

set kdb_why to "trap" when calling kdb_trap from trap_fatal
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 41646

head/sys/amd64/amd64/trap.c

head/sys/i386/i386/trap.c

head/sys/powerpc/powerpc/trap.c

head/sys/sys/kdb.h

set kdb_why to "trap" when calling kdb_trap from trap_fatalClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 41646

head/sys/amd64/amd64/trap.c

head/sys/i386/i386/trap.c

head/sys/powerpc/powerpc/trap.c

head/sys/sys/kdb.h

set kdb_why to "trap" when calling kdb_trap from trap_fatal
ClosedPublic
Actions

Revision Contents
Changeset List