Page MenuHomeFreeBSD

Discard the correct thread event reported for a ptrace stop.
ClosedPublic

Authored by jhb on Oct 26 2017, 9:58 AM.
Tags
None
Referenced Files
F106064582: D12794.diff
Tue, Dec 24, 7:15 PM
Unknown Object (File)
Sun, Dec 8, 2:46 PM
Unknown Object (File)
Mon, Nov 25, 6:30 PM
Unknown Object (File)
Mon, Nov 25, 12:21 PM
Unknown Object (File)
Nov 20 2024, 11:24 PM
Unknown Object (File)
Nov 1 2024, 2:20 PM
Unknown Object (File)
Oct 21 2024, 9:58 AM
Unknown Object (File)
Oct 13 2024, 5:44 PM
Subscribers

Details

Summary

When multiple threads wish to report a tracing event to a debugger,
both threads call ptracestop() and one thread will win the race to be
the reporting thread (p->p_xthread). The debugger uses PT_LWPINFO
with the process ID to determine which thread / LWP is reporting an
event and the details of that event. This event is cleared as a side
effect of the subsequent ptrace event that resumed the process
(PT_CONTINUE, PT_STEP, etc.). However, ptrace() was clearing the
event identified by the LWP ID passed to the resume request even if
that wasn't the 'p_xthread'. This could result in clearing an event
that had not yet been observed by the debugger and leaving the
existing event for 'p_thread' pending so that it was reported a second
time.

Specifically, if the debugger stopped due to a software breakpoint in
one thread, but then switched to another thread that was used to
resume (e.g. if the user switched to a different thread and issued a
step), the resume request (PT_STEP) cleared a pending event (if any)
for the thread being stepped. However, the process immediately
stopped and the first thread reported it's breakpoint event a second
time. The debugger decremented the PC for "both" breakpoint events
which resulted in the PC now pointing into the middle of an
instruction (on x86) and a SIGILL fault when the process was resumed a
second time.

To fix, always clear the pending event for 'p_xthread' when resuming a
process. ptrace() still honors the requested LWP ID when enabling
single-stepping (PT_STEP) or setting a different PC (PT_CONTINUE).

Test Plan
  • The gdb.threads/continue-pending-status.exp test no longer fails with SIGILL which it reliably did before. (It now fails later on for a different reason, but not with a SIGILL.)
  • Attached test case also fails on old kernels and works on new ones.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 12237
Build 12529: arc lint + arc unit

Event Timeline

sys/kern/sys_process.c
789

I wonder should we do

td2 = p->p_xthread;
if (td2 == NULL)
     td2 = FIRST_THREAD_IN_PROC(p);

there ? Can we get into sendsig: label without P_STOPPED_TRACE set ?

1139

The trace line should be removed for commit, it is amd64-MD.

1327

Is this related ?

sys/kern/sys_process.c
789

Hmm, possibly. I think that is probably better than just 'first thread'.

1139

Oh sorry, that is definitely a leak (I have some other debugging in other files I need to drop as well).

1327

Oops, no, it is probably a separate commit. I might should send it to secteam@ actually.

sys/kern/sys_process.c
1327

I have this patch which I am sitting on for around 2 or 3 months: D12796. I was asked by secteam about some other patch about the same thing, and since your bzero() appears to be the third patch for the same issue, it should be closed some way or another.

jhb marked an inline comment as done.
  • Remove debugging.
sys/kern/sys_process.c
789

I mean, wouldn't the new line after sendsig: not needed if we compute td2 this way here.

sys/kern/sys_process.c
789

Actually, no. (I had misread the context). While we can get into sendsig: without P_STOPPED_TRACE perhaps (if you PT_DETACH while it isn't stopped), in that case td2 doesn't matter as you only clear TDB_XSIG (which is what clears the thread event) if P_STOPPED_TRACE is set.

789

No, the actual bug case in both gdb and the added test case is that T1 reports a stop but we want to resume the process by doing a PT_STEP of T2. Currently when we do ptrace(PT_STEP, T2, ..), we clear any pending event (TDB_XSIG) for T2 even though the debugger hasn't seen it yet (it just used ptrace(PT_LWPINFO, pid, ...) which returned a ptrace_lwpinfo with pl_lwpid == T1), and we fail to clear the event for T1, so T1 loops around in ptracestop() and sees TDB_XSIG still set and tries again, so the debugger again sees an event from T1. I feel that in the case of PT_STEP of T2, we should clear the event for T1 that triggered the stop and was reported to the debugger. And in particular in the case of PT_STEP T2, I want td2 to be T2 for the ptrace_single_step(), but only switch to p_xthread when we are going to clear TDB_XSIG in sendsig:.

This revision is now accepted and ready to land.Oct 26 2017, 5:02 PM
This revision was automatically updated to reflect the committed changes.