Details

Reviewers

manu
kib
mjg

Group Reviewers

manpages

Commits

rS357334: Reimplement stack capture of running threads on i386 and amd64.

Summary

Change stack_save_td() to support stack capture for running threads, and
introduce a return value.

Reimplement stack capture for running threads on x86 using
smp_rendezvous instead of an NMI. It has become too difficult to deal
with all of the possible scheduler states, and we cannot capture a stack
while interrupts are disabled anyway.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

markj created this revision. Jan 24 2020, 10:37 PM

Herald added a reviewer: manu. Jan 24 2020, 10:37 PM

Herald added subscribers: jhibbits, emaste, andrew.

Harbormaster completed remote builds in B28903: Diff 67266. Jan 24 2020, 10:37 PM

markj added reviewers: kib, mjg. Jan 24 2020, 10:38 PM

Hasn't the code to capture the stack trace become MI?

sys/amd64/amd64/trap.c
55 ↗	(On Diff #67266)	Is this line still needed?
sys/x86/x86/stack_machdep.c
109 ↗	(On Diff #67266)	I think this would leave out almost all threads. The issue is that we set up td_frame e.g. on syscall entry and keep it there. Fast interrupts and some MD handlers e.g. local timer when calling into MI code, set up td_intr_frame but do not touch td_frame. Also td_frame could be invalid or NULL for kernel threads that never trap from userspace.

markj added inline comments. Jan 25 2020, 8:14 PM

sys/x86/x86/stack_machdep.c
109 ↗	(On Diff #67266)	I meant to write td_intr_frame, but even that is not enough since we do not set it for rendezvous interrupts. One solution would be to re-add IPI_TRACE and handle it from ipi_bitmap_handler(), but then the code would no longer be MI. Another possibility is to modify MD callers of smp_rendezvous_action() to pass a trapframe, and have smp_rendezvous_action() use it to set td_intr_frame. Do you see any problems with the second approach?

kib added inline comments. Jan 26 2020, 8:02 PM

sys/x86/x86/stack_machdep.c
109 ↗	(On Diff #67266)	I think #2 would work, but I somewhat irrationally do not like it. That said, why do you need to check for usermode interruption? The difference is displaying something like <empty> or <usermode> vs. a one- or two-frame stack.

markj marked an inline comment as done. Jan 27 2020, 1:36 AM

markj added inline comments.

sys/x86/x86/stack_machdep.c
109 ↗	(On Diff #67266)	It is not really necessary to check, since we validate that the frame pointer lies within the kernel stack. But, we still need a way to get the interrupted thread's frame pointer. The use of td_pcb below is also wrong for running threads. The validation of the FP is MD and I do not think that all implementations of stack_capture() check this. So I suspect now that it is preferable to keep the implementation x86-specific and use a new IPI vector.
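
The frame pointer validation mentioned above can be illustrated in userspace. This is only a sketch under stated assumptions, not the code from sys/x86/x86/stack_machdep.c: the names `sketch_stack`, `sketch_stack_capture` and `SKETCH_STACK_DEPTH` are hypothetical, and a frame record is assumed to be two machine words (saved frame pointer, then return address), as with the usual x86 frame layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STACK_DEPTH 18	/* hypothetical cap on recorded frames */

/* Hypothetical result buffer; not the kernel's struct stack. */
struct sketch_stack {
	int depth;
	uintptr_t pcs[SKETCH_STACK_DEPTH];
};

/*
 * Walk a frame-pointer chain starting at fp and record return addresses.
 * Each frame record is assumed to be two words: [0] saved FP, [1] return PC.
 * The walk stops as soon as a record would not lie entirely inside
 * [stack_lo, stack_hi), is misaligned, or the chain stops moving toward
 * higher addresses.
 */
static void
sketch_stack_capture(struct sketch_stack *st, uintptr_t fp,
    uintptr_t stack_lo, uintptr_t stack_hi)
{
	const uintptr_t *rec;
	uintptr_t next;

	assert(stack_lo <= stack_hi);
	st->depth = 0;
	while (st->depth < SKETCH_STACK_DEPTH) {
		if (fp < stack_lo || fp >= stack_hi ||
		    stack_hi - fp < 2 * sizeof(uintptr_t) ||
		    fp % sizeof(uintptr_t) != 0)
			break;
		rec = (const uintptr_t *)fp;
		st->pcs[st->depth++] = rec[1];
		next = rec[0];
		if (next <= fp)
			break;
		fp = next;
	}
}
```

With a bounds check of this kind, a frame pointer that belongs to user mode or is otherwise bogus yields an empty trace rather than a stray memory access, which is why a separate user-mode check is not strictly required for safety.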

kib added inline comments. Jan 27 2020, 3:22 PM

sys/x86/x86/stack_machdep.c
109 ↗	(On Diff #67266)	Well, ok.

mjg added inline comments. Jan 28 2020, 8:47 AM

sys/x86/x86/stack_machdep.c
143 ↗	(On Diff #67266)	Is there anything apart from the thread lock being held here which causes the stack_save_td caller to run with interrupts disabled? I think there are 2 issues here. First, callers should be required not to have interrupts disabled, except perhaps for the thread lock; then there is no need to check spinlock_count in the first place. Second, since the thread lock is dropped, nothing prevents the target thread from exiting and getting reallocated. I don't see a good way to either prevent it or verify it did not happen. The least that can be done is validating in the dumping handler that the thread belongs to the expected process (by that I mean both the pointer and process start time match -- see microuptime(&p2->p_stats->p_start) in fork).
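
The identity check suggested above (process pointer plus process start time) could look roughly like the following. This is a userspace sketch only: every type and function name here (`sketch_proc`, `sketch_thread`, `sketch_target`, `sketch_target_record`, `sketch_target_still_valid`) is hypothetical and merely mirrors the shape of the idea. It also assumes the thread structure itself stays readable after the thread exits, which the sketch does not establish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_timeval { long tv_sec; long tv_usec; };
struct sketch_proc { struct sketch_timeval p_start; };
struct sketch_thread { struct sketch_proc *td_proc; };

/* What the requester remembers about the target before dropping its lock. */
struct sketch_target {
	struct sketch_thread *td;
	struct sketch_proc *proc;
	struct sketch_timeval start;
};

static struct sketch_target
sketch_target_record(struct sketch_thread *td)
{
	struct sketch_target t;

	assert(td != NULL && td->td_proc != NULL);
	t.td = td;
	t.proc = td->td_proc;
	t.start = td->td_proc->p_start;
	return (t);
}

/*
 * Run by the capturing side: accept the thread only if it still belongs to
 * the same process object AND that process has the same start time, so a
 * recycled process structure at the same address is rejected.
 */
static bool
sketch_target_still_valid(const struct sketch_target *t)
{
	const struct sketch_proc *p = t->td->td_proc;

	return (p == t->proc &&
	    p->p_start.tv_sec == t->start.tv_sec &&
	    p->p_start.tv_usec == t->start.tv_usec);
}
```

Comparing the start time in addition to the pointer matters because a freed process structure may be reused at the same address.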

mjg added inline comments. Jan 28 2020, 9:03 AM

sys/x86/x86/stack_machdep.c
143 ↗	(On Diff #67266)	I just realized the caller is already required to not hold any other spinlocks since smp_rendezvous_cpu must be called with interrupts enabled. Therefore the spinlock_count check can be just removed as it is.

mjg added inline comments. Jan 28 2020, 9:05 AM

sys/x86/x86/stack_machdep.c
124 ↗	(On Diff #67266)	Is this really assertable? Can the thread become swapped after the lock is dropped below? It looks like this should just be checked.

markj added inline comments. Jan 28 2020, 2:52 PM

sys/x86/x86/stack_machdep.c
124 ↗	(On Diff #67266)	As noted below, the caller prevents this either by holding the proc lock, or by ensuring that the thread is not already running, in which case the thread lock ensures that it cannot transition to TDS_SWAPPED.
143 ↗	(On Diff #67266)	I made a lot of mistakes in this diff. You are right that we do not need to check spinlock_count. I meant to check curthread's spinlock count, not td's, but it is not needed anyway. In cases where the thread may be running, we hold the process lock, which prevents both swapping and exiting. I will add an assertion for this.

Switch to using ipi_cpu(). This requires more synchronization that was previously provided by smp_rendezvous.
Distinguish between empty stacks (e.g., for interrupt threads that are in the scheduler but have never executed) and threads running in user mode. I think this distinction is useful in procstat output.
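
The distinction described in this update can be sketched as a three-way classification. The enum, struct and function names below are hypothetical and do not reflect the actual return values of stack_save_td(); the labels reuse the `<empty>` and `<usermode>` placeholders from the discussion above and are not claimed to be real procstat output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for a capture attempt. */
enum sketch_result { SKETCH_CAPTURED, SKETCH_EMPTY, SKETCH_USERMODE };

/* Hypothetical summary of what the capture handler observed. */
struct sketch_capture {
	bool from_usermode;	/* interrupted context was user mode */
	int depth;		/* number of kernel frames recorded */
};

static enum sketch_result
sketch_classify(const struct sketch_capture *c)
{
	assert(c != NULL && c->depth >= 0);
	if (c->from_usermode)
		return (SKETCH_USERMODE);	/* no kernel stack to show */
	if (c->depth == 0)
		return (SKETCH_EMPTY);		/* e.g. a thread that never ran */
	return (SKETCH_CAPTURED);
}

/* Map an outcome to a display label for the consumer. */
static const char *
sketch_label(enum sketch_result r)
{
	switch (r) {
	case SKETCH_CAPTURED:
		return ("<captured>");
	case SKETCH_EMPTY:
		return ("<empty>");
	case SKETCH_USERMODE:
		return ("<usermode>");
	}
	return ("<unknown>");
}
```

Keeping the user-mode case separate from the empty case lets a consumer report why no frames are shown, instead of printing an empty trace for both.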

Harbormaster completed remote builds in B28990: Diff 67422. Jan 28 2020, 2:56 PM

kib added inline comments. Jan 28 2020, 4:59 PM

sys/x86/x86/stack_machdep.c
103 ↗	(On Diff #67422)	I think you do not need intr_success, it is enough to check that the stack was indeed captured in stack_save_td().
134 ↗	(On Diff #67422)	Wouldn't it be better to gather all asserts at the start of the function, so that the consumer does not miss anything in casual testing?
143 ↗	(On Diff #67422)	Don't you need to clear intr_pending before the IPI? Also I would suggest prefixing all static vars in this file with e.g. stack_, since they are visible in the debugger.

Apply kib's suggestions.

Harbormaster completed remote builds in B29001: Diff 67435. Jan 28 2020, 9:34 PM

kib accepted this revision. Jan 28 2020, 10:08 PM

This revision is now accepted and ready to land. Jan 28 2020, 10:08 PM

This removes the public kernel function stack_save() from some of the platforms.

I don't think this is intended.

Remove just stack_save_td_running(), not everything between it and EOF.

This revision now requires review to proceed. Jan 28 2020, 11:53 PM

Harbormaster completed remote builds in B29005: Diff 67441. Jan 28 2020, 11:53 PM

kib accepted this revision. Jan 29 2020, 11:32 AM

This revision is now accepted and ready to land. Jan 29 2020, 11:32 AM

jhibbits added inline comments. Jan 29 2020, 2:58 PM

sys/powerpc/powerpc/stack_machdep.c
111 ↗	(On Diff #67441)	Don't you want to remove stack_save_td_running() and keep stack_save()?

Closed by commit rS357334: Reimplement stack capture of running threads on i386 and amd64. (authored by markj). Jan 31 2020, 3:43 PM

This revision was automatically updated to reflect the committed changes.

markj added a commit: rS357334: Reimplement stack capture of running threads on i386 and amd64.

Herald added a reviewer: manpages. Jan 31 2020, 3:43 PM

Herald added subscribers: rpokala, imp.

Reimplement stack capture of running threads.
Closed, Public

Changeset List

Diff 67552

head/share/man/man9/stack.9

head/sys/amd64/amd64/trap.c

head/sys/arm/arm/stack_machdep.c

head/sys/arm64/arm64/stack_machdep.c

head/sys/i386/i386/trap.c

head/sys/kern/kern_proc.c

head/sys/kern/subr_kdb.c

head/sys/kern/subr_sleepqueue.c

head/sys/kern/tty_info.c

head/sys/mips/mips/stack_machdep.c

head/sys/powerpc/powerpc/stack_machdep.c

head/sys/riscv/riscv/stack_machdep.c

head/sys/sparc64/sparc64/stack_machdep.c

head/sys/sys/stack.h

head/sys/x86/include/apicvar.h

head/sys/x86/include/stack.h

head/sys/x86/x86/mp_x86.c

head/sys/x86/x86/stack_machdep.c
