Page MenuHomeFreeBSD

Ease the life of PT_TO_SCE/PT_TO_SCX users when debuggee sleeps in sigsuspend(2)/sig{timed,}wait(2).
ClosedPublic

Authored by kib on Mon, Nov 25, 4:31 PM.

Details

Summary

If PT_TO_SCE or PT_TO_SCX were issued after the target already entered sleep in sigsuspend(2) or sigtimedwait(2), there is no debugging events generated until a relevant signal is delivered. As result, debugger hangs waiting for an event.

Introduce spurious EINTR returned in that situations to help debugger to gain control earlier.

Requested and tested by: kevans

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib created this revision.Mon, Nov 25, 4:31 PM
kevans added inline comments.Mon, Nov 25, 4:32 PM
sys/kern/kern_sig.c
1330 ↗(On Diff #64852)

After looking around a bit more, should we use the PTRACE_SYSCALL spelling in these spots?

kib added inline comments.Mon, Nov 25, 4:54 PM
sys/kern/kern_sig.c
1330 ↗(On Diff #64852)

IMO it is some obfuscation, esp. for implementation side, but let use it.

kib updated this revision to Diff 64856.Mon, Nov 25, 4:54 PM
kib marked an inline comment as done.

Use PTRACE_SYSCALL.

kevans accepted this revision.Mon, Nov 25, 6:24 PM

I think this looks good to me, thanks! It does behave differently now in that truss(1) will cause us to exit these syscalls if it attaches in the middle -- but every application I can think of using them will swiftly return to sigsuspend() state once it realizes nothing productive happened.

This revision is now accepted and ready to land.Mon, Nov 25, 6:24 PM
markj added a comment.Tue, Nov 26, 4:18 PM

I think this looks good to me, thanks! It does behave differently now in that truss(1) will cause us to exit these syscalls if it attaches in the middle -- but every application I can think of using them will swiftly return to sigsuspend() state once it realizes nothing productive happened.

Sorry, I am confused. truss(1) attach should generate a SIGSTOP that interrupts the system call anyway.

Why do we interrupt only these two distinguished system calls?

sys/kern/kern_sig.c
1330 ↗(On Diff #64856)

Should it be conditional on error == 0?

I think this looks good to me, thanks! It does behave differently now in that truss(1) will cause us to exit these syscalls if it attaches in the middle -- but every application I can think of using them will swiftly return to sigsuspend() state once it realizes nothing productive happened.

Sorry, I am confused. truss(1) attach should generate a SIGSTOP that interrupts the system call anyway.

The SIGSTOP interrupts, then zsh returns to sigsuspend(2) following that. The application then goes to trace syscall entry and we're stuck in sigsuspend(2) at that point.

Why do we interrupt only these two distinguished system calls?

There are perhaps others, but sigsuspend was the first candidate because it's the most problematic one and sigwait/sigtimedwait are a natural extension to this.

markj added a comment.Tue, Nov 26, 5:39 PM

I think this looks good to me, thanks! It does behave differently now in that truss(1) will cause us to exit these syscalls if it attaches in the middle -- but every application I can think of using them will swiftly return to sigsuspend() state once it realizes nothing productive happened.

Sorry, I am confused. truss(1) attach should generate a SIGSTOP that interrupts the system call anyway.

The SIGSTOP interrupts, then zsh returns to sigsuspend(2) following that. The application then goes to trace syscall entry and we're stuck in sigsuspend(2) at that point.

Why do we interrupt only these two distinguished system calls?

There are perhaps others, but sigsuspend was the first candidate because it's the most problematic one and sigwait/sigtimedwait are a natural extension to this.

I think it's because kern_sigsuspend() and kern_sigtimedwait() call cursig() and thus may block in ptracestop() before returning to the usermode boundary.

So to be clear, the problem is that a debugger is expecting to see a system call entry following a PT_CONTINUE following the attach?

kib marked an inline comment as done.EditedTue, Nov 26, 5:43 PM

I think it's because kern_sigsuspend() and kern_sigtimedwait() call cursig() and thus may block in ptracestop() before returning to the usermode boundary.
So to be clear, the problem is that a debugger is expecting to see a system call entry following a PT_CONTINUE following the attach?

This is not quite a problem, but rather the inconvenience. In this situation debugger needs to understand that the process is waiting for a signal in a syscall that does not return until the signal is generated, and do something. It is quite involved for the debugger to infer this data, and much simpler for the kernel to fabricate a spurious EINTR instead.

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

kib updated this revision to Diff 64895.Tue, Nov 26, 5:49 PM

Only inject EINTR in sigtimedwait() when error == 0.

This revision now requires review to proceed.Tue, Nov 26, 5:49 PM
In D22546#493292, @kib wrote:

I think it's because kern_sigsuspend() and kern_sigtimedwait() call cursig() and thus may block in ptracestop() before returning to the usermode boundary.
So to be clear, the problem is that a debugger is expecting to see a system call entry following a PT_CONTINUE following the attach?

This is not quite a problem, but rather the inconvenience. In this situation debugger needs to understand that the process is waiting for a signal in a syscall that does not return until the signal is generated, and do something. It is quite involved for the debugger to infer this data, and much simpler for the kernel to fabricate a spurious EINTR instead.
Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

I also suspect the new behavior is consistent with how Linux handles this situation, but I haven't dug too far to confirm. The application I'm working with has no problems with Linux+zsh, but it follows the same model: platform-independent layer calls into its platform-specific ptrace layer to attach (PT_ATTACH + SIGCONT + TRACEFORK, this is close enough in both Linux/FreeBSD versions), then PI layer proceeds to hook to syscall entry and call a syscall of its own in the target. The platform layer doesn't know at attach time what the caller's intent is, because it may just be doing an ATTACH/DETACH to make sure it's permissible to do so.

markj added a comment.Tue, Nov 26, 6:44 PM
In D22546#493292, @kib wrote:

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

I agree it is more consistent than before, but from my reading:

  • it does not result in EINTR for a mere ptrace attach, and
  • it will cause a spurious EINTR for every invocation of the system call after PTRACE_SYSCALL is set. That is, I cannot see how the comment fragment "after userspace entered the syscall" is honoured.
kib added a comment.Tue, Nov 26, 8:14 PM
In D22546#493292, @kib wrote:

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

I agree it is more consistent than before, but from my reading:

  • it does not result in EINTR for a mere ptrace attach, and

For PT_ATTACH it is not needed, I believe. There a SIGSTOP is generated and we would do all the magic inside ptracestop().

  • it will cause a spurious EINTR for every invocation of the system call after PTRACE_SYSCALL is set. That is, I cannot see how the comment fragment "after userspace entered the syscall" is honoured.

Note that the check for PTRACE_SYSCALL is only performed after the syscall code slept. In other words, we enter the syscall, check for signals, then either perform plain sleep or thread_suspend_check(), and for the new code to trigger, this sleep must be cancelled. It can happen only due to a signal delivered or due to the debugging action. Then we set error/increment has_sig, but still restart the loop and if an actual signal was delivered, we process it instead of our spurious action.

Let me reformulate the above explanation this way: I did not added more wakeups, only a terminating condition for the loop.

markj added a comment.Wed, Nov 27, 3:52 PM
In D22546#493348, @kib wrote:
In D22546#493292, @kib wrote:

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

I agree it is more consistent than before, but from my reading:

  • it does not result in EINTR for a mere ptrace attach, and

For PT_ATTACH it is not needed, I believe. There a SIGSTOP is generated and we would do all the magic inside ptracestop().

Yes, but a plain PT_ATTACH/PT_CONTINUE in kern_sigsuspend() will not terminate the loop, since the SIGSTOP is consumed by the debugger, whereas a PT_ATTACH/CONTINUE while e.g., select(2) or kevent(2) is sleeping will cause EINTR to be returned to user mode due to the interrupted sleep. I suspect that this inconsistency is not very important though.

  • it will cause a spurious EINTR for every invocation of the system call after PTRACE_SYSCALL is set. That is, I cannot see how the comment fragment "after userspace entered the syscall" is honoured.

Note that the check for PTRACE_SYSCALL is only performed after the syscall code slept. In other words, we enter the syscall, check for signals, then either perform plain sleep or thread_suspend_check(), and for the new code to trigger, this sleep must be cancelled. It can happen only due to a signal delivered or due to the debugging action. Then we set error/increment has_sig, but still restart the loop and if an actual signal was delivered, we process it instead of our spurious action.
Let me reformulate the above explanation this way: I did not added more wakeups, only a terminating condition for the loop.

Ok, so kern_sigsuspend() will behave the same as before except in the case where the debugger intercepts and discards a signal. That seems ok to me.

sys/kern/kern_sig.c
1321 ↗(On Diff #64895)

If the timeout elapsed, we will clear error and return EAGAIN in the next loop iteration, but the new logic will convert it to EINTR if a debugger is attached and PTRACE_SYSCALL is set.

kib updated this revision to Diff 64942.Wed, Nov 27, 4:17 PM
kib marked an inline comment as done.

Report spurious EINTR from sigtimedwait() only if there is no other errors to return.

markj accepted this revision.Wed, Nov 27, 4:18 PM
This revision is now accepted and ready to land.Wed, Nov 27, 4:18 PM
This revision was automatically updated to reflect the committed changes.