Page MenuHomeFreeBSD

PT_ATTACH: do not interrupt interruptible sleeps
ClosedPublic

Authored by kib on Apr 6 2025, 3:08 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Oct 16, 6:24 AM
Unknown Object (File)
Thu, Oct 16, 5:06 AM
Unknown Object (File)
Thu, Oct 16, 12:15 AM
Unknown Object (File)
Thu, Sep 25, 5:55 AM
Unknown Object (File)
Wed, Sep 24, 7:33 AM
Unknown Object (File)
Sep 21 2025, 9:56 AM
Unknown Object (File)
Sep 21 2025, 2:51 AM
Unknown Object (File)
Sep 20 2025, 8:51 PM

Details

Summary

Handle the first SIGSTOP from the PT_ATTACH specially: instead of waking up some thread in the target process, execute sig_suspend_threads() in the debugger. This is equivalent to the action of the xthread, except that all threads are suspended without interruption.

Test Plan

This passes all tests in ptrace_stop.c.

Also, the test program https://gist.github.com/kostikbel/77bbc3a78a19beed144dd9ef57d0df22 works as expected: the debugged child sleeps uninterruptible for 10 secs.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Apr 6 2025, 3:08 AM

Test program I used for debugging https://gist.github.com/kostikbel/31ba4480542349d77e503ddcb2a10fa1
I will convert this to ATF after the discussion.

sys/kern/kern_sig.c
3299

Can we call it sig_handle_first_stop()?

sys/kern/subr_sleepqueue.c
729 ↗(On Diff #153198)
732 ↗(On Diff #153198)

Propagating the aborted sleep to the userspace boundary causes spurious EINTR returns from syscalls.

733 ↗(On Diff #153198)

I guess it should be "handle signal processing here"?

734 ↗(On Diff #153198)
kib marked 5 inline comments as done.Apr 9 2025, 12:20 AM
kib added inline comments.
sys/kern/kern_sig.c
3299

I called it sig_prep_first_dbg_stop()

sys/kern/subr_sleepqueue.c
748 ↗(On Diff #153198)

This will be removed for commit, of course

kib marked an inline comment as done.

Rename function, edit comments.

I think the general idea is ok. I wonder if there are some places where we are holding a lock across interruptible sleep though.

sys/kern/subr_sleepqueue.c
678 ↗(On Diff #153376)

Please add a comment pointing the reader to the larger comment in sleepq_timedwait_sig().

Or, can we move the SIGSTOP handling into a subroutine?

static int
sleepq_check_sigstop(struct thread *td, int rvals)
{
    if ((td->td_flags & TDF_SBDRY) != 0 ...)
        return (rvals);
    if (sig_handle_sigstop(td)) {
        td->td_intrval = 0;
        sleepq_lock(wchan);
        return (0);
    }
    return (rvals);
}

and then

rvals = sleepq_check_signals();
rvals = sleepq_check_sigstop(td, rvals);
if (rvals != 0)
    return (rvals);

I think the general idea is ok. I wonder if there are some places where we are holding a lock across interruptible sleep though.

Would we do that, it causes lock cascades. It was the significant problem with the interruptible nfs mounts, which prompted the SBDRY machinery.

sleepq_handle_sigstop(()

This revision is now accepted and ready to land.Apr 11 2025, 9:33 PM

To be clear, I think this will ensure that attaching while a thread is sleeping in nanosleep() will still finish the entire sleep before returning from the system call? That will fix several failures in gdb's testsuite if so. We can also remove the BUGS entry from the truss manpage.

This is quite a bit simpler than the approach I had partially implemented.

For the purposes of the ATF test, you could maybe alter the return value of the child process to return non-zero if it sees EINTR, and maybe fetch the time before/after and fail if the sleep is too short as well with a non-zero exit code? Then in the parent you can do a wait() with a timeout long enough for the child sleep() and check the exit value.

kib planned changes to this revision.Apr 13 2025, 7:38 AM

Put the stopped thread back to the sleepq when handling SIGSTOP after the wake.
There is a problem: we do not own the sleepq lock, so there is a window where the wake signal could be ignored. I did not convinced myself that this is safe (just because we are stopped anyway). Right now I do not see a way to fix it in the current approach because when the thread is woken, it is already removed from the queue.

John, what was your alternative approach for the fix?

kib planned changes to this revision.Apr 14 2025, 12:22 PM
kib edited the summary of this revision. (Show Details)
kib edited the test plan for this revision. (Show Details)
kib retitled this revision from SIGSTOP: hide interruption of the interruptible sleeps to PT_ATTACH: do not interrupt interruptible sleeps.

I was wrong about the PT_SC_REMOTE test. There is the pre-existing bug that the child does PT_TRACE_ME and the parent does PT_ATTACH. They ops races, one of them is redundand.
Test becomes race-free and works after removal of extra PT_ATTACH.

kib edited the test plan for this revision. (Show Details)

Rename the helper. Add chicken bit for this variant as well.

In D49678#1136718, @kib wrote:

I was wrong about the PT_SC_REMOTE test. There is the pre-existing bug that the child does PT_TRACE_ME and the parent does PT_ATTACH. They ops races, one of them is redundand.

Oh hmm. I thought both were necessary. It's not entirely clear from the documentation.

sys/kern/kern_sig.c
180

Isn't this behaviour specific to SIGSTOP from ptrace()?

This revision is now accepted and ready to land.Apr 17 2025, 5:08 PM
kib marked an inline comment as done.

Update the chicken bit sysctl name and description.

This revision now requires review to proceed.Apr 17 2025, 6:18 PM
In D49678#1135845, @kib wrote:

Put the stopped thread back to the sleepq when handling SIGSTOP after the wake.
There is a problem: we do not own the sleepq lock, so there is a window where the wake signal could be ignored. I did not convinced myself that this is safe (just because we are stopped anyway). Right now I do not see a way to fix it in the current approach because when the thread is woken, it is already removed from the queue.

John, what was your alternative approach for the fix?

Oh, it was tied up in a bigger batch of changes I had that I might still pursue to add support for non-stop debugging where only individual threads would stop instead of the entire process.

https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:freebsd:spurious_EINTR

Note I haven't rebased it in a long while, and there is an earlier branch point there just to add a queue of threads. At the time the change just to switch to a queue broke one of the ptrace tests you added in a way I couldn't figure out.

I find that the stress2/ptrace5.sh test crashes almost immediately, with a failed KASSERT here, in sys_pcocess.c, line 934:

	if (tid == 0) {
		if ((p->p_flag & P_STOPPED_TRACE) != 0) {
			KASSERT(p->p_xthread != NULL, ("NULL p_xthread"));

and has done so beginning with this commit:

commit ecc662c749b11434c63a0d3578fc40df6b4798ec (HEAD)
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Wed Apr 16 03:09:44 2025 +0300

    PT_ATTACH: do not interrupt interruptible sleeps

I find that the stress2/ptrace5.sh test crashes almost immediately, with a failed KASSERT here, in sys_pcocess.c, line 934:

	if (tid == 0) {
		if ((p->p_flag & P_STOPPED_TRACE) != 0) {
			KASSERT(p->p_xthread != NULL, ("NULL p_xthread"));

and has done so beginning with this commit:

commit ecc662c749b11434c63a0d3578fc40df6b4798ec (HEAD)
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Wed Apr 16 03:09:44 2025 +0300

    PT_ATTACH: do not interrupt interruptible sleeps

The fix was just committed. See D49961