Page MenuHomeFreeBSD

thread_single(9): decline external requests for traced or debugger-stopped procs
ClosedPublic

Authored by kib on Mar 27 2024, 12:33 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 30, 10:05 PM
Unknown Object (File)
Sat, Nov 30, 10:04 PM
Unknown Object (File)
Sat, Nov 30, 10:04 PM
Unknown Object (File)
Sat, Nov 30, 9:44 PM
Unknown Object (File)
Fri, Nov 8, 3:57 AM
Unknown Object (File)
Fri, Nov 8, 1:07 AM
Unknown Object (File)
Oct 22 2024, 7:03 AM
Unknown Object (File)
Oct 22 2024, 7:03 AM
Subscribers

Details

Summary
    Debugger has the powers to cause unbound delay in single-threading,
    which then blocks the threaded taskqueue. The reproducer is
`truss -f timeout 2 sleep 10`.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Mar 27 2024, 12:33 PM
sys/kern/kern_thread.c
1249

So, the problem is that reap_kill_proc_locked() can block here while single-threading the debuggee? Then, shouldn't the checks come before this loop? Otherwise I do not quite understand the problem.

sys/kern/kern_thread.c
1249

This check must be atomic WRT setting the P_TOTAL_STOP flag. If done before the loop, which checks for other single-threading reasons, it would allow the race.

After the loop, we know that no other single-threading request can be active (it is blocked waiting for the process lock).

Then, the patch checks for P_TRACED so that debugger attempt to suspend all process threads (which is very similar but not covered by thread_single(), see sig_suspend_threads()) does not mess with our single-threading. This confusion, when two different suspensions (sig_suspend_threads() vs. P_TOTAL_STOP) act simultaneously, is the root cause of the reported hang.

This revision is now accepted and ready to land.Mar 30 2024, 2:20 PM
sys/kern/kern_thread.c
1249

I'm sorry, I just realized that my test isn't quite right, and looking again, I still don't understand the patch.

reap_kill_proc_locked() only calls thread_single() if P_STOPPED is not set in the target proc. So in particular, P_STOPPED_SINGLE is not set, so thread_single() will not drop the proc lock in the loop. I don't see how it can fix the problem reported by mjg.

sys/kern/kern_thread.c
1249

Why it is important is P_STOPPED_SINGLE set or not?

Let me formulate the goal of the patch differently: do not allow 'global' singlethreading (AKA singlethreading outside the process context) if the process is being debugged or stopped. The attributes of process being a debuggee is either P_TRACED or P_STOPPED_SIG flag.

sys/kern/kern_thread.c
1249

Might be, I understand your question. Mjg' issue is not a hang in the 'thrsgl' state. It is very different thing, which I called single-threading confusion between sig_suspend_threads() and P_TOTAL_STOP. Both requests utilize the same threads' state and same reporting (thread_suspend_one()), so if both come simultaneously, we might get some threads suspended due to ptracestop(), and other suspended due to single-threading. So my decision is to prevent P_STOP_ALL when the process is debugged.

Note that this is impossible to get into this state while single threading request comes from the process context.

sys/kern/kern_thread.c
1249

Thank you, I understand the bug better now and fixed my test case correspondingly.

Although I spent quite some time to understand the signal and ptrace machinery (earlier; I have spent some more for this review), I still don't understand it globally (and probably some details as well). I have a couple of remarks and questions that might improve our common understanding (at worst, hoping answers and info will at least improve mine).

Here are some details about the deadlock situation. This descriptions applies in exactly the same way on 13-STABLE, 14-STABLE, and CURRENT without the changes in this revision, unless otherwise noted.

  • "thread taskq"
    • Stack: mi_switch thread_suspend_switch thread_single reap_kill_proc_work taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline.
  • "truss"
    • Stack: mi_switch sleepq_catch_signals sleepq_wait_sig _sleep kern_wait6 sys_wait6 amd64_syscall fast_syscall_common
    • Process flags:
      • On 14, P: 0x10004002: P_INMEM, P_EXEC, P_CONTROLT, P2: 0
      • On 13, most probably the same, but didn't write it down.
    • Wchan: wait
    • Thread flags (CURRENT): TDF: 0x4c: TDF_CANSWAP, TDF_SINTR and TDF_INMEM, TDP: 0x100: TDP_SIGFASTBLOCK.
  • "timeout"
    • Stack: mi_switch _sleep reap_kill kern_procctl sys_procctl amd64_syscall fast_syscall_common
    • Process flags:
      • On 13, P: 0x10004002: P_INMEM, P_EXEC, P_CONTROLT, P2: 0
      • On 14 and CURRENT, same with additional: P: 0x800: P_TRACED
    • Thread flags (CURRENT): TDF: 0x44: TDF_CANSWAP, TDF_INMEM, TDP: 0x100: TDP_SIGFASTBLOCK.
    • Wchan: reapst
  • "sleep"
    • Stack: mi_switch thread_suspend_switch ptracestop amd64_syscall fast_syscall_common
    • Process flags:
      • On 13, P: 0x12084002: P_INMEM, P_TOTAL_STOP, P_STOPPED_SINGLE, P_EXEC, P_CONTROLT, P2: 0x80000: P2_REAPKILLED
      • On 14 and CURRENT, same with additional: P: 0x800: P_TRACED
    • Thread flags (CURRENT):
      • TDF: 0x244: TDF_ALLPROCSUSP, TDF_CANSWAP, TDF_INMEM
      • TDP: 0x100: TDP_SIGFASTBLOCK
      • TDP2: 0
      • TDB: 0xc010: TDB_BOUNDARY, TDB_SSWITCH, TDB_SCX.
      • TDI: 0x1: TDI_SUSPENDED
      • TDA: 0

We can see that "sleep" is blocked in ptracestop(), which switched it out of context. I'm assuming this is due to truss wanting to get signaled at system call entry or exit. If we follow the logic of ptracestop(), the (only) thread of "sleep" is switched out. It seems that we didn't go into the if block that sets P_STOPPED_SIG and P_STOPPED_TRACE and calls sig_suspend_threads(), because these flags are unset as can be seen from the report above (else something cleared them). However, although I wouldn't be surprised that (td->td_dbgflags & TDB_FSTP) != 0 is false (not at a first stop on attach), I'd expect the second part of the if guard, ((p->p_flag2 & P2_PTRACE_FSTP) == 0 && p->p_xthread == NULL) to be true, because p_xthread should generally be NULL . The following call to thread_suspend_switch() is where, according to the rest of the code in the function, I inferred "signaling" the debugger should happen, meaning waking it up while it waits in wait6() or similar. This last situation is exactly that in which the "truss" process above finds itself. So how can it still be blocked in wait6()? The only place where thread_suspend_switch() actually tries to wake the parent up is in a call to thread_stopped(), which checks that P_STOPPED_SIG is set and all threads (not including the calling one) are suspended before triggering the wakeup by calling childproc_stopped(). So no wake up may have been performed if P_STOPPED_SIG was indeed unset at this point (we only know for sure that it is so later, at the point of deadlock). Examining p_xthread for "sleep" at the point of deadlock gives NULL.

In the same process, p_singlethread points to "thread taskq" as expected. Additionally, presence of P_TOTAL_STOP and P_STOPPED_SINGLE suggests that reap_kill_proc_locked() indeed called thread_single() on "sleep" (which matches the call stack above for "thread taskq"). The loop inside called ast_sched_locked(td2, TDA_SUSPEND) with td2 being the only thread for "sleep" and then called thread_suspend_switch(), waiting to be resumed by "sleep" on ast() (via the code in thread_suspend_check()). But "sleep" never returned to userland, because it called ptracestop() and then suspended, waiting for the debugger to take action. Even if "sleep" was made to return to userland, it wouldn't execute ast_suspend(), as the call to thread_suspend_switch() cleared TDA_SUSPEND (which explains why TDA for "sleep" as reported above is 0): This is the first conflation problem. I think that part is relatively clear (if I may say so after all this text...).

The thing that isn't clear to me, however, is why the debugger ("truss") hasn't been notified by ptracestop(). In fact, my current theory is that it was, and then continued to executing ptrace(PT_SYSCALL), which in the end called ptrace_unsuspend() (after clearing p_xthread), which cleared P_STOPPED_TRACE and P_STOPPED_SIG of "sleep" and then called thread_unsuspend(). But because P_STOPPED_SINGLE is set, and the thread_single() comes from another process, thread_unsuspend() currently doesn't do anything to the debuggee ("sleep"). After this call, "truss" proceeds to wait6() again (exactly waitid() in eventloop()), hence the deadlock because "sleep" hasn't been resumed (and will never be).

Concerning the committed fix, I can see why the change works as expected for the reap_kill_proc_locked() caller (process is already stopped by the debugger, and we hold the proc lock, so the latter cannot restart the former in the meantime). However, I'm a little worried it is really correct for the other caller: stop_all_proc(). If I read the situation correctly, a stopped debuggee will now just prevent a machine from going into sleep. Getting 1 from thread_single() seems to mean that you can't suspend because some other suspension (or exit) is in progress. Before this change, I think getting 1 implied that the suspension would finish and then would be lifted (IIUC the change to have reap_kill() actually post the work to "thread taskq" instead of doing itself was in small part to preserve this invariant), but with the new code this doesn't appear to be true anymore.

What do you think?

Although I spent quite some time to understand the signal and ptrace machinery (earlier; I have spent some more for this review), I still don't understand it globally (and probably some details as well). I have a couple of remarks and questions that might improve our common understanding (at worst, hoping answers and info will at least improve mine).

Here are some details about the deadlock situation. This descriptions applies in exactly the same way on 13-STABLE, 14-STABLE, and CURRENT without the changes in this revision, unless otherwise noted.

  • "thread taskq"
    • Stack: mi_switch thread_suspend_switch thread_single reap_kill_proc_work taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline.
  • "truss"
    • Stack: mi_switch sleepq_catch_signals sleepq_wait_sig _sleep kern_wait6 sys_wait6 amd64_syscall fast_syscall_common
    • Process flags:
      • On 14, P: 0x10004002: P_INMEM, P_EXEC, P_CONTROLT, P2: 0
      • On 13, most probably the same, but didn't write it down.
    • Wchan: wait
    • Thread flags (CURRENT): TDF: 0x4c: TDF_CANSWAP, TDF_SINTR and TDF_INMEM, TDP: 0x100: TDP_SIGFASTBLOCK.
  • "timeout"
    • Stack: mi_switch _sleep reap_kill kern_procctl sys_procctl amd64_syscall fast_syscall_common
    • Process flags:
      • On 13, P: 0x10004002: P_INMEM, P_EXEC, P_CONTROLT, P2: 0
      • On 14 and CURRENT, same with additional: P: 0x800: P_TRACED
    • Thread flags (CURRENT): TDF: 0x44: TDF_CANSWAP, TDF_INMEM, TDP: 0x100: TDP_SIGFASTBLOCK.
    • Wchan: reapst
  • "sleep"
    • Stack: mi_switch thread_suspend_switch ptracestop amd64_syscall fast_syscall_common
    • Process flags:
      • On 13, P: 0x12084002: P_INMEM, P_TOTAL_STOP, P_STOPPED_SINGLE, P_EXEC, P_CONTROLT, P2: 0x80000: P2_REAPKILLED
      • On 14 and CURRENT, same with additional: P: 0x800: P_TRACED
    • Thread flags (CURRENT):
      • TDF: 0x244: TDF_ALLPROCSUSP, TDF_CANSWAP, TDF_INMEM
      • TDP: 0x100: TDP_SIGFASTBLOCK
      • TDP2: 0
      • TDB: 0xc010: TDB_BOUNDARY, TDB_SSWITCH, TDB_SCX.
      • TDI: 0x1: TDI_SUSPENDED
      • TDA: 0

We can see that "sleep" is blocked in ptracestop(), which switched it out of context. I'm assuming this is due to truss wanting to get signaled at system call entry or exit. If we follow the logic of ptracestop(), the (only) thread of "sleep" is switched out. It seems that we didn't go into the if block that sets P_STOPPED_SIG and P_STOPPED_TRACE and calls sig_suspend_threads(), because these flags are unset as can be seen from the report above (else something cleared them). However, although I wouldn't be surprised that (td->td_dbgflags & TDB_FSTP) != 0 is false (not at a first stop on attach), I'd expect the second part of the if guard, ((p->p_flag2 & P2_PTRACE_FSTP) == 0 && p->p_xthread == NULL) to be true, because p_xthread should generally be NULL . The following call to thread_suspend_switch() is where, according to the rest of the code in the function, I inferred "signaling" the debugger should happen, meaning waking it up while it waits in wait6() or similar. This last situation is exactly that in which the "truss" process above finds itself. So how can it still be blocked in wait6()? The only place where thread_suspend_switch() actually tries to wake the parent up is in a call to thread_stopped(), which checks that P_STOPPED_SIG is set and all threads (not including the calling one) are suspended before triggering the wakeup by calling childproc_stopped(). So no wake up may have been performed if P_STOPPED_SIG was indeed unset at this point (we only know for sure that it is so later, at the point of deadlock). Examining p_xthread for "sleep" at the point of deadlock gives NULL.

In the same process, p_singlethread points to "thread taskq" as expected. Additionally, presence of P_TOTAL_STOP and P_STOPPED_SINGLE suggests that reap_kill_proc_locked() indeed called thread_single() on "sleep" (which matches the call stack above for "thread taskq"). The loop inside called ast_sched_locked(td2, TDA_SUSPEND) with td2 being the only thread for "sleep" and then called thread_suspend_switch(), waiting to be resumed by "sleep" on ast() (via the code in thread_suspend_check()). But "sleep" never returned to userland, because it called ptracestop() and then suspended, waiting for the debugger to take action. Even if "sleep" was made to return to userland, it wouldn't execute ast_suspend(), as the call to thread_suspend_switch() cleared TDA_SUSPEND (which explains why TDA for "sleep" as reported above is 0): This is the first conflation problem. I think that part is relatively clear (if I may say so after all this text...).

The thing that isn't clear to me, however, is why the debugger ("truss") hasn't been notified by ptracestop(). In fact, my current theory is that it was, and then continued to executing ptrace(PT_SYSCALL), which in the end called ptrace_unsuspend() (after clearing p_xthread), which cleared P_STOPPED_TRACE and P_STOPPED_SIG of "sleep" and then called thread_unsuspend(). But because P_STOPPED_SINGLE is set, and the thread_single() comes from another process, thread_unsuspend() currently doesn't do anything to the debuggee ("sleep"). After this call, "truss" proceeds to wait6() again (exactly waitid() in eventloop()), hence the deadlock because "sleep" hasn't been resumed (and will never be).

That seems like a likely explanation.

Concerning the committed fix, I can see why the change works as expected for the reap_kill_proc_locked() caller (process is already stopped by the debugger, and we hold the proc lock, so the latter cannot restart the former in the meantime). However, I'm a little worried it is really correct for the other caller: stop_all_proc(). If I read the situation correctly, a stopped debuggee will now just prevent a machine from going into sleep.

I think you are right, and this represents a bug in the patch.

Getting 1 from thread_single() seems to mean that you can't suspend because some other suspension (or exit) is in progress. Before this change, I think getting 1 implied that the suspension would finish and then would be lifted (IIUC the change to have reap_kill() actually post the work to "thread taskq" instead of doing itself was in small part to preserve this invariant), but with the new code this doesn't appear to be true anymore.

What do you think?

The fundamental problem with P_TOTAL_STOP is that it is external to the stopped process but uses the same mechanism (suspending threads) as any kind of internal stop (single threading, SIGSTOP, ptracestop()). So having a thread suspended by P_TOTAL_STOP would make internal stop logic to consider the thread handled, and in reverse.

From your note, I agree that stop_all_proc() also has the same/similar issue. Skipping traced processes there is less desirable since it does allow for them to issue more activities which we want to surpress, while in the REAP_KILL case all bets are already off due to debugger. But still, it is probably a good enough solution.