Handle the first SIGSTOP from the PT_ATTACH specially: instead of waking up some thread in the target process, execute sig_suspend_threads() in the debugger. This is equivalent to the action of the xthread, except that all threads are suspended without interruption.
Details
- Reviewers
jhb markj - Commits
- rG1f8d845d1cdf: ptrace_test PT_SC_REMOTE: fix a race
rGa6b7d5cddacd: kern_sig.c: add chicken bit for old way of SIGSTOP handling on PT_ATTACH
rGecc662c749b1: PT_ATTACH: do not interrupt interruptible sleeps
rG4048ccc6ae37: kern_sig.c: extract the first stop handling into a helper
This passes all tests in ptrace_stop.c.
Also, the test program https://gist.github.com/kostikbel/77bbc3a78a19beed144dd9ef57d0df22 works as expected: the debugged child sleeps uninterruptible for 10 secs.
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
Test program I used for debugging https://gist.github.com/kostikbel/31ba4480542349d77e503ddcb2a10fa1
I will convert this to ATF after the discussion.
sys/kern/kern_sig.c | ||
---|---|---|
3299 | Can we call it sig_handle_first_stop()? | |
sys/kern/subr_sleepqueue.c | ||
729 ↗ | (On Diff #153198) | |
732 ↗ | (On Diff #153198) | Propagating the aborted sleep to the userspace boundary causes spurious EINTR returns from syscalls. |
733 ↗ | (On Diff #153198) | I guess it should be "handle signal processing here"? |
734 ↗ | (On Diff #153198) |
I think the general idea is ok. I wonder if there are some places where we are holding a lock across interruptible sleep though.
sys/kern/subr_sleepqueue.c | ||
---|---|---|
678 ↗ | (On Diff #153376) | Please add a comment pointing the reader to the larger comment in sleepq_timedwait_sig(). Or, can we move the SIGSTOP handling into a subroutine? static int sleepq_check_sigstop(struct thread *td, int rvals) { if ((td->td_flags & TDF_SBDRY) != 0 ...) return (rvals); if (sig_handle_sigstop(td)) { td->td_intrval = 0; sleepq_lock(wchan); return (0); } return (rvals); } and then rvals = sleepq_check_signals(); rvals = sleepq_check_sigstop(td, rvals); if (rvals != 0) return (rvals); |
Would we do that, it causes lock cascades. It was the significant problem with the interruptible nfs mounts, which prompted the SBDRY machinery.
To be clear, I think this will ensure that attaching while a thread is sleeping in nanosleep() will still finish the entire sleep before returning from the system call? That will fix several failures in gdb's testsuite if so. We can also remove the BUGS entry from the truss manpage.
This is quite a bit simpler than the approach I had partially implemented.
For the purposes of the ATF test, you could maybe alter the return value of the child process to return non-zero if it sees EINTR, and maybe fetch the time before/after and fail if the sleep is too short as well with a non-zero exit code? Then in the parent you can do a wait() with a timeout long enough for the child sleep() and check the exit value.
Put the stopped thread back to the sleepq when handling SIGSTOP after the wake.
There is a problem: we do not own the sleepq lock, so there is a window where the wake signal could be ignored. I did not convinced myself that this is safe (just because we are stopped anyway). Right now I do not see a way to fix it in the current approach because when the thread is woken, it is already removed from the queue.
John, what was your alternative approach for the fix?
I was wrong about the PT_SC_REMOTE test. There is the pre-existing bug that the child does PT_TRACE_ME and the parent does PT_ATTACH. They ops races, one of them is redundand.
Test becomes race-free and works after removal of extra PT_ATTACH.
Oh hmm. I thought both were necessary. It's not entirely clear from the documentation.
sys/kern/kern_sig.c | ||
---|---|---|
180 | Isn't this behaviour specific to SIGSTOP from ptrace()? |
Oh, it was tied up in a bigger batch of changes I had that I might still pursue to add support for non-stop debugging where only individual threads would stop instead of the entire process.
https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:freebsd:spurious_EINTR
Note I haven't rebased it in a long while, and there is an earlier branch point there just to add a queue of threads. At the time the change just to switch to a queue broke one of the ptrace tests you added in a way I couldn't figure out.
I find that the stress2/ptrace5.sh test crashes almost immediately, with a failed KASSERT here, in sys_pcocess.c, line 934:
if (tid == 0) { if ((p->p_flag & P_STOPPED_TRACE) != 0) { KASSERT(p->p_xthread != NULL, ("NULL p_xthread"));
and has done so beginning with this commit:
commit ecc662c749b11434c63a0d3578fc40df6b4798ec (HEAD) Author: Konstantin Belousov <kib@FreeBSD.org> Date: Wed Apr 16 03:09:44 2025 +0300 PT_ATTACH: do not interrupt interruptible sleeps