Details

Reviewers

jhb
kevans
markj

Commits

rS355146: Requested and tested by: kevans

Summary

If PT_TO_SCE or PT_TO_SCX were issued after the target already entered sleep in sigsuspend(2) or sigtimedwait(2), there is no debugging events generated until a relevant signal is delivered. As result, debugger hangs waiting for an event.

Introduce spurious EINTR returned in that situations to help debugger to gain control earlier.

Requested and tested by: kevans

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Nov 25 2019, 4:31 PM

Herald added a subscriber: imp. · View Herald TranscriptNov 25 2019, 4:31 PM

Harbormaster completed remote builds in B27759: Diff 64852.Nov 25 2019, 4:31 PM

kevans added inline comments.Nov 25 2019, 4:32 PM

sys/kern/kern_sig.c
1330 ↗	(On Diff #64852)	After looking around a bit more, should we use the PTRACE_SYSCALL spelling in these spots?

kib added inline comments.Nov 25 2019, 4:54 PM

sys/kern/kern_sig.c
1330 ↗	(On Diff #64852)	IMO it is some obfuscation, esp. for implementation side, but let use it.

Use PTRACE_SYSCALL.

Harbormaster completed remote builds in B27761: Diff 64856.Nov 25 2019, 4:55 PM

I think this looks good to me, thanks! It does behave differently now in that truss(1) will cause us to exit these syscalls if it attaches in the middle -- but every application I can think of using them will swiftly return to sigsuspend() state once it realizes nothing productive happened.

This revision is now accepted and ready to land.Nov 25 2019, 6:24 PM

In D22546#493033, @kevans wrote:

I think this looks good to me, thanks! It does behave differently now in that truss(1) will cause us to exit these syscalls if it attaches in the middle -- but every application I can think of using them will swiftly return to sigsuspend() state once it realizes nothing productive happened.

Sorry, I am confused. truss(1) attach should generate a SIGSTOP that interrupts the system call anyway.

Why do we interrupt only these two distinguished system calls?

sys/kern/kern_sig.c
1330 ↗	(On Diff #64856)	Should it be conditional on `error == 0`?

In D22546#493273, @markj wrote:

In D22546#493033, @kevans wrote:

I think this looks good to me, thanks! It does behave differently now in that truss(1) will cause us to exit these syscalls if it attaches in the middle -- but every application I can think of using them will swiftly return to sigsuspend() state once it realizes nothing productive happened.

Sorry, I am confused. truss(1) attach should generate a SIGSTOP that interrupts the system call anyway.

The SIGSTOP interrupts, then zsh returns to sigsuspend(2) following that. The application then goes to trace syscall entry and we're stuck in sigsuspend(2) at that point.

Why do we interrupt only these two distinguished system calls?

There are perhaps others, but sigsuspend was the first candidate because it's the most problematic one and sigwait/sigtimedwait are a natural extension to this.

In D22546#493275, @kevans wrote:

In D22546#493273, @markj wrote:

In D22546#493033, @kevans wrote:

I think this looks good to me, thanks! It does behave differently now in that truss(1) will cause us to exit these syscalls if it attaches in the middle -- but every application I can think of using them will swiftly return to sigsuspend() state once it realizes nothing productive happened.

Sorry, I am confused. truss(1) attach should generate a SIGSTOP that interrupts the system call anyway.

The SIGSTOP interrupts, then zsh returns to sigsuspend(2) following that. The application then goes to trace syscall entry and we're stuck in sigsuspend(2) at that point.

Why do we interrupt only these two distinguished system calls?

There are perhaps others, but sigsuspend was the first candidate because it's the most problematic one and sigwait/sigtimedwait are a natural extension to this.

I think it's because kern_sigsuspend() and kern_sigtimedwait() call cursig() and thus may block in ptracestop() before returning to the usermode boundary.

So to be clear, the problem is that a debugger is expecting to see a system call entry following a PT_CONTINUE following the attach?

In D22546#493291, @markj wrote:

I think it's because kern_sigsuspend() and kern_sigtimedwait() call cursig() and thus may block in ptracestop() before returning to the usermode boundary.

So to be clear, the problem is that a debugger is expecting to see a system call entry following a PT_CONTINUE following the attach?

This is not quite a problem, but rather the inconvenience. In this situation debugger needs to understand that the process is waiting for a signal in a syscall that does not return until the signal is generated, and do something. It is quite involved for the debugger to infer this data, and much simpler for the kernel to fabricate a spurious EINTR instead.

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

Only inject EINTR in sigtimedwait() when error == 0.

This revision now requires review to proceed.Nov 26 2019, 5:49 PM

Harbormaster completed remote builds in B27782: Diff 64895.Nov 26 2019, 5:49 PM

In D22546#493292, @kib wrote:

In D22546#493291, @markj wrote:

I think it's because kern_sigsuspend() and kern_sigtimedwait() call cursig() and thus may block in ptracestop() before returning to the usermode boundary.

So to be clear, the problem is that a debugger is expecting to see a system call entry following a PT_CONTINUE following the attach?

This is not quite a problem, but rather the inconvenience. In this situation debugger needs to understand that the process is waiting for a signal in a syscall that does not return until the signal is generated, and do something. It is quite involved for the debugger to infer this data, and much simpler for the kernel to fabricate a spurious EINTR instead.

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

I also suspect the new behavior is consistent with how Linux handles this situation, but I haven't dug too far to confirm. The application I'm working with has no problems with Linux+zsh, but it follows the same model: platform-independent layer calls into its platform-specific ptrace layer to attach (PT_ATTACH + SIGCONT + TRACEFORK, this is close enough in both Linux/FreeBSD versions), then PI layer proceeds to hook to syscall entry and call a syscall of its own in the target. The platform layer doesn't know at attach time what the caller's intent is, because it may just be doing an ATTACH/DETACH to make sure it's permissible to do so.

In D22546#493292, @kib wrote:

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

I agree it is more consistent than before, but from my reading:

it does not result in EINTR for a mere ptrace attach, and
it will cause a spurious EINTR for every invocation of the system call after PTRACE_SYSCALL is set. That is, I cannot see how the comment fragment "after userspace entered the syscall" is honoured.

In D22546#493318, @markj wrote:

In D22546#493292, @kib wrote:

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

I agree it is more consistent than before, but from my reading:

it does not result in EINTR for a mere ptrace attach, and

For PT_ATTACH it is not needed, I believe. There a SIGSTOP is generated and we would do all the magic inside ptracestop().

it will cause a spurious EINTR for every invocation of the system call after PTRACE_SYSCALL is set. That is, I cannot see how the comment fragment "after userspace entered the syscall" is honoured.

Note that the check for PTRACE_SYSCALL is only performed after the syscall code slept. In other words, we enter the syscall, check for signals, then either perform plain sleep or thread_suspend_check(), and for the new code to trigger, this sleep must be cancelled. It can happen only due to a signal delivered or due to the debugging action. Then we set error/increment has_sig, but still restart the loop and if an actual signal was delivered, we process it instead of our spurious action.

Let me reformulate the above explanation this way: I did not added more wakeups, only a terminating condition for the loop.

In D22546#493348, @kib wrote:

In D22546#493318, @markj wrote:

In D22546#493292, @kib wrote:

Also, I believe it is consistent with behavior of other sleeping syscalls like select(2), where the situation causes spurious EINTR.

I agree it is more consistent than before, but from my reading:

it does not result in EINTR for a mere ptrace attach, and

For PT_ATTACH it is not needed, I believe. There a SIGSTOP is generated and we would do all the magic inside ptracestop().

Yes, but a plain PT_ATTACH/PT_CONTINUE in kern_sigsuspend() will not terminate the loop, since the SIGSTOP is consumed by the debugger, whereas a PT_ATTACH/CONTINUE while e.g., select(2) or kevent(2) is sleeping will cause EINTR to be returned to user mode due to the interrupted sleep. I suspect that this inconsistency is not very important though.

it will cause a spurious EINTR for every invocation of the system call after PTRACE_SYSCALL is set. That is, I cannot see how the comment fragment "after userspace entered the syscall" is honoured.

Note that the check for PTRACE_SYSCALL is only performed after the syscall code slept. In other words, we enter the syscall, check for signals, then either perform plain sleep or thread_suspend_check(), and for the new code to trigger, this sleep must be cancelled. It can happen only due to a signal delivered or due to the debugging action. Then we set error/increment has_sig, but still restart the loop and if an actual signal was delivered, we process it instead of our spurious action.

Let me reformulate the above explanation this way: I did not added more wakeups, only a terminating condition for the loop.

Ok, so kern_sigsuspend() will behave the same as before except in the case where the debugger intercepts and discards a signal. That seems ok to me.

sys/kern/kern_sig.c
1321 ↗	(On Diff #64895)	If the timeout elapsed, we will clear `error` and return EAGAIN in the next loop iteration, but the new logic will convert it to EINTR if a debugger is attached and PTRACE_SYSCALL is set.

Report spurious EINTR from sigtimedwait() only if there is no other errors to return.

Harbormaster completed remote builds in B27804: Diff 64942.Nov 27 2019, 4:17 PM

markj accepted this revision.Nov 27 2019, 4:18 PM

This revision is now accepted and ready to land.Nov 27 2019, 4:18 PM

Closed by commit rS355146: Requested and tested by: kevans (authored by kib). · Explain WhyNov 27 2019, 8:34 PM

This revision was automatically updated to reflect the committed changes.

kib added a commit: rS355146: Requested and tested by: kevans.

Ease the life of PT_TO_SCE/PT_TO_SCX users when debuggee sleeps in sigsuspend(2)/sig{timed,}wait(2).
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 64975

head/sys/kern/kern_sig.c

Ease the life of PT_TO_SCE/PT_TO_SCX users when debuggee sleeps in sigsuspend(2)/sig{timed,}wait(2).ClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 64975

head/sys/kern/kern_sig.c

Ease the life of PT_TO_SCE/PT_TO_SCX users when debuggee sleeps in sigsuspend(2)/sig{timed,}wait(2).
ClosedPublic
Actions

Revision Contents
Changeset List