Page MenuHomeFreeBSD

Ignore debugger-injected signals left after detaching
ClosedPublic

Authored by kib on Jan 8 2022, 9:56 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 6 2024, 12:37 PM
Unknown Object (File)
Jan 25 2024, 8:04 AM
Unknown Object (File)
Jan 24 2024, 10:55 AM
Unknown Object (File)
Jan 13 2024, 12:51 AM
Unknown Object (File)
Jan 12 2024, 4:20 PM
Unknown Object (File)
Dec 23 2023, 3:25 PM
Unknown Object (File)
Dec 20 2023, 6:15 AM
Unknown Object (File)
Dec 16 2023, 9:20 AM
Subscribers

Details

Summary

PR: 261010
Reported by: Martin Simmons <martin@lispworks.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Jan 8 2022, 9:56 AM

So the problem is: if a PTRACE_SCE event (triggered by SIGINT) raises SIGTRAP, one of the target threads sets td->td_xsig to SIGTRAP and suspends, and after PT_DETACH the SIGTRAP is delivered to the target. And if truss is able to detach before the target calls ptracestop(), then there is no problem.

So certainly this change will fix the problem, but can't it cause legitimate signals to be lost?

So the problem is: if a PTRACE_SCE event (triggered by SIGINT) raises SIGTRAP,

Almost all events from ptracestop() are delivered as SIGTRAP, SCE is not specific. It can be SCX.

one of the target threads sets td->td_xsig to SIGTRAP and suspends, and after PT_DETACH the SIGTRAP is delivered to the target. And if truss is able to detach before the target calls ptracestop(), then there is no problem.

So certainly this change will fix the problem, but can't it cause legitimate signals to be lost?

My take is that anything arriving into td_xsig, was put there by ptrace/ptracestop action. As such, it must be cleared when tracing is stopped.

You mean that the following scenario is possible: external signal is arriving to the debuggee, it is catched by debugger, and forwarded to the debuggee e.g. by ptrace(PT_CONTINUE, orig_signal). Then orig_signal ends up in td_xsig and we could clear it, while it is really desirable not to. This is what you consider problematic, am I right?

I do not see how could kernel distinguish desirable vs. not desirable in this case. The signal was pushed by ptrace, so it should be cleared if debugger is not careful enough to ensure that detach is only done after all its desired actions are completed. I do agree that it is hard (or even outright impossible) for debugger to know that the signal was indeed delivered before PT_DETACH, but IMO not leaking past PT_DETACH is more important for correctness.

In D33787#764429, @kib wrote:

You mean that the following scenario is possible: external signal is arriving to the debuggee, it is catched by debugger, and forwarded to the debuggee e.g. by ptrace(PT_CONTINUE, orig_signal). Then orig_signal ends up in td_xsig and we could clear it, while it is really desirable not to. This is what you consider problematic, am I right?

Hmm, not quite. PT_DETACH requires the debuggee to be stopped, so the debugger must send SIGSTOP somehow. Suppose it does this, and an external signal arrives concurrently. Then, I believe it is possible for multiple threads to be stopped in ptracestop() with different td_xsig values, one for SIGSTOP and one for orig_signal. With this patch, PT_DETACH can "cancel" both signals, no?

Even if the debugger is careful to verify that the debuggee received SIGSTOP (truss does not even check this, see detach_proc()), PT_DETACH may still clear orig_signal. I wonder if it'd be sufficient to clear td->td_xsig only if td->td_xsig == SIGTRAP || td->td_proc->p_xthread == td.

I do not see how could kernel distinguish desirable vs. not desirable in this case. The signal was pushed by ptrace, so it should be cleared if debugger is not careful enough to ensure that detach is only done after all its desired actions are completed. I do agree that it is hard (or even outright impossible) for debugger to know that the signal was indeed delivered before PT_DETACH, but IMO not leaking past PT_DETACH is more important for correctness.

Ok. I agree that the PT_DETACH bug is significant, so I don't really object to the change. OTOH we've had bugs in the past where ptrace would swallow signals (I remember debugging one that motivated the addition of the PTRACE_FSTP machinery) and that kind of problem is quite painful to isolate.

In D33787#764429, @kib wrote:

You mean that the following scenario is possible: external signal is arriving to the debuggee, it is catched by debugger, and forwarded to the debuggee e.g. by ptrace(PT_CONTINUE, orig_signal). Then orig_signal ends up in td_xsig and we could clear it, while it is really desirable not to. This is what you consider problematic, am I right?

Hmm, not quite. PT_DETACH requires the debuggee to be stopped, so the debugger must send SIGSTOP somehow. Suppose it does this, and an external signal arrives concurrently. Then, I believe it is possible for multiple threads to be stopped in ptracestop() with different td_xsig values, one for SIGSTOP and one for orig_signal. With this patch, PT_DETACH can "cancel" both signals, no?

But in this case, the external signal is intercepted by ptracestop(), with the purpose of reporting it to debugger. At this moment the signal is no longer queued for the debuggee [*] and it is up to debugger to care about it. If PT_DETACH comes while we have this external signal coming, this is debugger problem that I mentioned in the second paragraph of my previous response ('impossible to handle').

  • And this cause siginfo_t for the signal to be lost, which is one of the shortcomings of the ptrace interface.

Even if the debugger is careful to verify that the debuggee received SIGSTOP (truss does not even check this, see detach_proc()), PT_DETACH may still clear orig_signal. I wonder if it'd be sufficient to clear td->td_xsig only if td->td_xsig == SIGTRAP || td->td_proc->p_xthread == td.

But this is not seemingly enough, exactly because other threads' td_xsigs would be acted on unsuspend. Should we also add SIGSTOP to the list of cleared signals? And of course we really do not know where the signals come from, so we might clear either real debugger traces, or something that was sent externally.

I do not see how could kernel distinguish desirable vs. not desirable in this case. The signal was pushed by ptrace, so it should be cleared if debugger is not careful enough to ensure that detach is only done after all its desired actions are completed. I do agree that it is hard (or even outright impossible) for debugger to know that the signal was indeed delivered before PT_DETACH, but IMO not leaking past PT_DETACH is more important for correctness.

Ok. I agree that the PT_DETACH bug is significant, so I don't really object to the change. OTOH we've had bugs in the past where ptrace would swallow signals (I remember debugging one that motivated the addition of the PTRACE_FSTP machinery) and that kind of problem is quite painful to isolate.

I suspect the best we can do right now is to explain the abortive behavior of PT_DETACH and possible lost of signals.

Ideal solution would mean passing ksiginfo both to and from debuggers, and using the origin info from ksiginfo to determine what to do with the signal.

In D33787#764682, @kib wrote:
In D33787#764429, @kib wrote:

You mean that the following scenario is possible: external signal is arriving to the debuggee, it is catched by debugger, and forwarded to the debuggee e.g. by ptrace(PT_CONTINUE, orig_signal). Then orig_signal ends up in td_xsig and we could clear it, while it is really desirable not to. This is what you consider problematic, am I right?

Hmm, not quite. PT_DETACH requires the debuggee to be stopped, so the debugger must send SIGSTOP somehow. Suppose it does this, and an external signal arrives concurrently. Then, I believe it is possible for multiple threads to be stopped in ptracestop() with different td_xsig values, one for SIGSTOP and one for orig_signal. With this patch, PT_DETACH can "cancel" both signals, no?

But in this case, the external signal is intercepted by ptracestop(), with the purpose of reporting it to debugger. At this moment the signal is no longer queued for the debuggee [*] and it is up to debugger to care about it. If PT_DETACH comes while we have this external signal coming, this is debugger problem that I mentioned in the second paragraph of my previous response ('impossible to handle').

  • And this cause siginfo_t for the signal to be lost, which is one of the shortcomings of the ptrace interface.

Sorry, but I'm missing something. Suppose sigprocess() calls ptracestop() with the external signal, and suspends (due to a pending SIGSTOP raised by the debugger). Then the debugger detaches. ptracestop() returns, and, if td_xsig is not cleared, sigprocess() will observe that P_TRACED is clear and will re-queue the external signal with the original ksiginfo_t. Why is it up to the debugger to care about it?

Even if the debugger is careful to verify that the debuggee received SIGSTOP (truss does not even check this, see detach_proc()), PT_DETACH may still clear orig_signal. I wonder if it'd be sufficient to clear td->td_xsig only if td->td_xsig == SIGTRAP || td->td_proc->p_xthread == td.

But this is not seemingly enough, exactly because other threads' td_xsigs would be acted on unsuspend.

But that's what I expect: PT_DETACH should clear td_xsig in p_xthread (the thread that reported the stop signal to the debugger) and it should clear signals that might have been injected by ptrace (SIGTRAP, though of course this could also be raised externally.

Should we also add SIGSTOP to the list of cleared signals?

If the debugger checks WSTOPSIG() == SIGSTOP, then we wouldn't need to explicitly clear SIGSTOP.

And of course we really do not know where the signals come from, so we might clear either real debugger traces, or something that was sent externally.

It is certainly not perfect.

I do not see how could kernel distinguish desirable vs. not desirable in this case. The signal was pushed by ptrace, so it should be cleared if debugger is not careful enough to ensure that detach is only done after all its desired actions are completed. I do agree that it is hard (or even outright impossible) for debugger to know that the signal was indeed delivered before PT_DETACH, but IMO not leaking past PT_DETACH is more important for correctness.

Ok. I agree that the PT_DETACH bug is significant, so I don't really object to the change. OTOH we've had bugs in the past where ptrace would swallow signals (I remember debugging one that motivated the addition of the PTRACE_FSTP machinery) and that kind of problem is quite painful to isolate.

I suspect the best we can do right now is to explain the abortive behavior of PT_DETACH and possible lost of signals.

Ideal solution would mean passing ksiginfo both to and from debuggers, and using the origin info from ksiginfo to determine what to do with the signal.

kib retitled this revision from ptrace(PT_DETACH): clear thread-local signals left after detaching to Ignore debugger-injected signals left after detaching.

We already mark debugger-injected signals, so use it to filter them when the process is no longer traced.

In fact, I see the race that matches what you described: truss() does kill(STOP); wait(); ptrace(PT_DETACH); kill(CONT); when detaching. And in mt process it is quite possible that wait() fetches an event from some thread instead of the stop due to our signal. Then again, it is possible that SIGCONT is ignored because STOP was not yet processing. As result, the process is stopped after truss is detached.

In D33787#764908, @kib wrote:

In fact, I see the race that matches what you described: truss() does kill(STOP); wait(); ptrace(PT_DETACH); kill(CONT); when detaching. And in mt process it is quite possible that wait() fetches an event from some thread instead of the stop due to our signal. Then again, it is possible that SIGCONT is ignored because STOP was not yet processing. As result, the process is stopped after truss is detached.

Right, exactly. gcore's elf_detach() is a bit more careful. Do you plan to fix truss next?

This revision is now accepted and ready to land.Jan 11 2022, 2:38 PM