Page MenuHomeFreeBSD

Do not leak attaching SIGSTOP after after PT_DETACH.
ClosedPublic

Authored by kib on Jul 20 2016, 5:09 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Mar 28, 1:30 AM
Unknown Object (File)
Feb 28 2024, 4:36 AM
Unknown Object (File)
Jan 18 2024, 3:44 PM
Unknown Object (File)
Jan 8 2024, 9:38 PM
Unknown Object (File)
Jan 8 2024, 9:38 PM
Unknown Object (File)
Jan 8 2024, 9:38 PM
Unknown Object (File)
Jan 8 2024, 9:37 PM
Unknown Object (File)
Jan 8 2024, 9:37 PM
Subscribers

Details

Summary

When a debugger attaches to the process, SIGSTOP is sent to the target. Due to way issignal() selects the next signal to deliver and report, if the simultaneous or already pending another signal exists, that signal will be reported by next waitpid(2). This causes minor annoyance for debuggers, which must be prepared to take any signal as the first event, then filter SIGSTOP later.

More importantly, for tools like gcore(1), which attach and then detach without processing events, SIGSTOP might leak to be delivered after PT_DETACH. AFAIK, this was the situation which bitten Mark.

The solution is to force SIGSTOP to be the first signal reported after the attach. Attach code sets P2_PTRACE_FSTP to indicate that the attaching ritual was not yet finished, and issignal() prefers SIGSTOP in that condition. Also, the thread which handles P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first waitpid(2). All that ensures that SIGSTOP is consumed first.

Additionally, if P2_PTRACE_FSTP is still set on detach, which means that waitpid(2) was not called at all, SIGSTOP is removed from the queue, ensuring that the process is resumed on detach.

In issignal(), when acting on STOPing signals, remove the signal from queue before suspending. Otherwise parallel attach could
result in ptracestop() acting on that STOP as if it was the STOP signal from the attach. Then SIGSTOP from attach leaks again.

As a minor refactoring, some bits of the common attach code is moved to proc_set_traced().

Test Plan

Two tests by Mark, one for single-threaded, another for multi-threaded case, were used during patch development.

The tests/sys/kern/ptrace_test.c works.

Peter Holm tested the patch with stress2, including the Mark' tests.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib retitled this revision from to Do not leak attaching SIGSTOP after after PT_DETACH..
kib updated this object.
kib edited the test plan for this revision. (Show Details)
kib added reviewers: jhb, markj, emaste.
kib set the repository for this revision to rS FreeBSD src repository - subversion.
kib added a subscriber: pho.
markj edited edge metadata.

Should the loop in exit1() that clears TDB_SUSPEND also clear TDB_XSIG? I don't think the behaviour changes either way, since exit1() by the parent also clears the child's P_TRACED before sending SIGKILL and that's sufficient to break out of the ptracestop() loop, but I don't see any reason to be different from the PT_DETACH handler.

Otherwise this makes sense to me - thank you!

I still don't understand why stopevent() sets p_xthread to NULL, but aside from that function, it seems that proc_next_xthread() isn't needed. Is there any major consumer of the procfs interface?

sys/kern/kern_sig.c
2529 ↗(On Diff #18569)

Perhaps there should be a comment here to replace the one that was deleted?

This revision is now accepted and ready to land.Jul 20 2016, 7:11 AM
kib marked an inline comment as done.Jul 20 2016, 7:48 AM
In D7256#150891, @markj wrote:

Should the loop in exit1() that clears TDB_SUSPEND also clear TDB_XSIG? I don't think the behaviour changes either way, since exit1() by the parent also clears the child's P_TRACED before sending SIGKILL and that's sufficient to break out of the ptracestop() loop, but I don't see any reason to be different from the PT_DETACH handler.

Indeed, I did not noticed that during testing due to SIGKILL. I added the cleaning TDB_XSIG there too, thanks.

I still don't understand why stopevent() sets p_xthread to NULL, but aside from that function, it seems that proc_next_xthread() isn't needed. Is there any major consumer of the procfs interface?

I am not aware of any consumers of procfs, except jdk, also I do not think that jdk uses the stopevents part of the interface.

Note that I dropped proc_next_xthread() for now, making it working is somewhat not trivial and not needed for the issue at hands. I took the ptracestop() bits from that patch in the simplified form. I may revisit this if needed.

kib edited edge metadata.

Handle Mark' input.

Clear TDB_XSIG on the debugger exit. Re-add removed comment, reworded (corrections to the language are very welcome).

This revision now requires review to proceed.Jul 20 2016, 7:49 AM
kib edited edge metadata.

For consistency, also clear P2_PTRACE_FSTP and TDB_FSTP before SIGKILL. Should be nop.

sys/kern/kern_sig.c
2914 ↗(On Diff #18585)

extraneous ;?

sys/kern/kern_sig.c
2914 ↗(On Diff #18585)

A label in C is part of the statement, perhaps empty. Usually you can get away without ; since there is usually a statement following, e.g. return();. In this case, the label preceeds the closing brace for a block, so compiler would just barf at syntax error without an empty statement.

sys/kern/kern_sig.c
2914 ↗(On Diff #18585)

Indeed, somehow I missed that it immediately preceded the closing brace.

sys/kern/kern_sig.c
2538 ↗(On Diff #18585)
  • In the first sentence, "to" should be removed.
  • The second comma in the second sentence should be removed. Also, "decided" implies that the thread has control over this, but the recipient of SIGSTOP is determined by the tracer. "which was decided" seems more accurate to me, since the decision is made by a different thread.
  • In the third sentence, "avoid overwriting another thread's assignment to p_xthread"
  • In the fourth sentence, maybe "If another thread has already set p_xthread, the current thread will get a chance to report itself upon the next iteration."
sys/kern/kern_sig.c
2538 ↗(On Diff #18585)

Thank you.

kib edited edge metadata.

Mark' fixes to my language.

markj edited edge metadata.
This revision is now accepted and ready to land.Jul 20 2016, 4:23 PM

I see a bug I need to fix in ptrace(2) in that I claimed PL_FLAG_CHILD was posted as part of SIGTRAP instead of SIGSTOP (in general I claimed all the "extra" events were SIGTRAP events. I think the rest of them are (e.g. SCX/SCE are otherwise always SIGTRAP).

sys/kern/kern_fork.c
1077 ↗(On Diff #18597)

Technically this doesn't need the FSTP stuff since the child process is always single-threaded so the race cannot occur. However, this doesn't hurt.

sys/kern/sys_process.c
1100 ↗(On Diff #18597)

Should this be testing TDB_FSTP instead? Can there be a pending SIGSTOP in a thread's td_sigqueue that isn't due to ptrace? (I guess the real question is if you can have multiple SIGSTOP's pending, each for a different thread? If so, I wonder if you can "lose" a SIGSTOP that was queued to a thread due to !ptrace?)

sys/kern/kern_fork.c
1077 ↗(On Diff #18597)

The race can occur with single-threaded process as well, Mark provided the test program. The race there would be not among threads, but among signals for selection in issignal(). If there is numerically lower than SIGSTOP signal pending, it would selected before SIGSTOP, and reported first.

sys/kern/sys_process.c
1100 ↗(On Diff #18597)

No, I do not think that TDP_FSTP check would be correct there.

First, SIGSTOP (and SIGKILL) are never queued as siginfo, they are exceptional signals only set in the sigqueue bitmask. See the very beginning of the sigqueue_add().

Second, kern_psignal() pass td == NULL to tdsendsignal(), so tdsendsignal() selects process sigqueue.

Third, note that I only remove SIGSTOP from the process sigqueue (bitmask).

If there is a non-consumed SIGSTOP due to PT_ATTACH, P2_PTRACE_FSTP flag must be set, and the removal only from the process sigqueue is (almost) correct. If any SIGSTOP was queued to a thread and not to the process, then this sigqueue_delete() would not affect it.

I said 'almost' above, because there is a situation with two simultaneous SIGSTOPs, one from PT_ATTACH, another from kill(2), which would collapsed in the single bit in sigqueue sq_kill mask. Then the removal prevents a stop. This is the only situation where I see an issue. I think it can be hand-waved around by saying that kill(SIGSTOP) raced and was queued before PT_ATTACH.

sys/kern/sys_process.c
1100 ↗(On Diff #18597)

The only thing I'm somewhat confused on then is the statement:

"Third, note that I only remove SIGSTOP from the process sigqueue (bitmask)."

Yet, you are calling 'sigqueue_delete(&td3->td_sigqueue, SIGSTOP)' on each thread in this loop? Is the sigqueue_delete a nop? Or are you saying that the only possible per-thread SIGSTOP's are those from a ptracestop()?

kib marked an inline comment as done.Jul 27 2016, 4:59 PM
kib added inline comments.
sys/kern/sys_process.c
1100 ↗(On Diff #18597)

Sure, it is me being stupid.

kib edited edge metadata.
kib marked an inline comment as done.

Fix the bug pointed by jhb. Do not dequeue SIGSTOP from the thread queue if TDB_FSTP is not set.

Arguably, PT_DETACH cannot catch the thread in this state for the current code, since the process mutex is not dropped in issignal() between selection of the attach leader thread and clearing of the flag in ptracestop(), but this is not the feature I want to enforce.

This revision now requires review to proceed.Jul 27 2016, 5:03 PM
jhb edited edge metadata.
This revision is now accepted and ready to land.Jul 27 2016, 5:56 PM
markj edited edge metadata.
This revision was automatically updated to reflect the committed changes.