Add stack_save_td_running(9).
ClosedPublic
Actions

Authored by markj on Jul 31 2015, 1:22 AM.

Details

Reviewers

kib
jhb
bdrewery

Commits

rS287645: Add stack_save_td_running(), a function to trace the kernel stack of a

Summary

This function allows the caller to capture the kernel stack of a running
thread. It is currently only implemented on i386 and amd64; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(9), it may
fail, e.g. if the target thread is running in user mode.

Have kern.proc.kstack use this function for running threads.

Test Plan

procstat -kk shows stacks for running threads on amd64.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Event Timeline

markj updated this revision to Diff 7537.Jul 31 2015, 1:22 AM

markj retitled this revision from to Add stack_save_td_running(9)..

markj edited the test plan for this revision. (Show Details)

markj updated this object.

Herald added subscribers: emaste, andrew, imp. · View Herald TranscriptJul 31 2015, 1:22 AM

markj edited the test plan for this revision. (Show Details)Jul 31 2015, 1:25 AM

markj added reviewers: bdrewery, jhb, kib.

kib accepted this revision.Jul 31 2015, 3:35 AM

kib edited edge metadata.

kib added inline comments.

sys/amd64/amd64/trap.c
217	The only minor note I have, would it be safer to process both pmc and stack callbacks, instead of ignoring stack if the pmc consumed the interrupt ? Hmm, this might be even required for pmc and stack coexistence.

This revision is now accepted and ready to land.Jul 31 2015, 3:35 AM

markj added inline comments.Aug 1 2015, 12:58 AM

sys/amd64/amd64/trap.c
217	From my reading of the PMC interrupt handlers, the NMI will not be consumed unless at least one counter is found to have raised an interrupt. That is, an NMI raised by the stack code may be consumed by pmc, but in this case a second NMI will be pending, so the stack handler should eventually run. I may be misunderstanding your concern though. I'll do some testing with PMCs enabled.

kib added inline comments.Aug 1 2015, 5:30 AM

sys/amd64/amd64/trap.c
217	I think that you understood my concern correctly.

jhb added inline comments.Aug 1 2015, 5:07 PM

sys/amd64/amd64/trap.c
207	I have talked with Scott Long about adding a better NMI handler and moving all NMI handling to one place (right now some NMIs are handled here, and others are in duplicate cases in the switch statement later in trap()). Just FYI.

I think this is probably fine. Originally I had figured that doing precise runtime stacks would be expensive and was going to ask for a runtime sysctl to enable/disable it. Now I think it may not be all that expensive.

Much better than the other routes.

sys/amd64/amd64/trap.c
207	I've thought a bit about adding an x86 KPI that allows drivers/subsystems to register an NMI handler. When an NMI is received, each handler would get a chance to taste the NMI until one of them decides to consume it. (And if none of them do, panic.) I can try implementing that, but it'd be nice to have some more use-cases in mind.

jhb added inline comments.Aug 5 2015, 4:16 AM

sys/amd64/amd64/trap.c
207	I talked with Scott about this a bit and the problem is that there is an explicit order in the handlers. It's probably a bit simpler to just hardcode that order in an NMI handler, but we can at least move the logic out of trap() and avoid duplicating it.

markj added inline comments.Aug 7 2015, 1:02 AM

sys/amd64/amd64/trap.c
207	That makes sense to me. Are you referring to the duplication between i386 and amd64? If so, how about moving it into x86/intr_machdep.c? Or maybe create x86/nmi.c.

jhb added inline comments.Aug 8 2015, 4:02 PM

sys/amd64/amd64/trap.c
207	Well, there's duplication in trap() itself where the switch statement for both user and kernel cases has some of the NMI handling. It certainly might make sense to move the nmi handler to sys/x86 as well though. Maybe the cpu_machdep.c file kib added would make sense? I do think as part of this that trap should start with a switch statement here for "global" traps including NMI and MCE. The call to kdb_reenter() does make this a bit messy though. I think MCE should not trigger a kdb reenter so should move up. We can make the NMI handler handle kdb_reenter explicitly if there are some NMIs that shouldn't interrupt the debugger. However T_RESERVED traps probably should not interrupt the debugger. /* Handle traps for hardware faults. / switch (type) { case T_MCHK: mca_intr(); goto out; case T_NMI: nmi_intr(); goto out; } / kdb_reenter / / T_RESERVED / / PSL_I check and rest of trap */