PROC_REAP_KILL must guarantee that all reaper descendants are signalled. In particular, it must ensure that forked but not yet fully linked descendants cannot escape killing. Currently, proc_reap() fullfils the guarantee by single-threading stopping the target process, which moves the target to the userspace boundary, so the target cannot fork while the signal is sent. Single-threading has undesirable effect of sometimes terminating sleeps with EINTR. Since the time that the bug with PROC_REAP_KILL was fixed, we grow the pg_killsx mechanism that is similarly used by the process group signalling to ensure that no member of the process group escapes. Reuse pg_killsx for PROC_REAP_KILL as well. Besides the functional change of no longer causing spurious EINTR, not single-threading the target means that we no longer need to delegate the work to the taskqueue. PR: 290844 reap_kill_subtree_once(): reap_kill_proc_work() might drop proctree_lock Due to this, restart the iteration over the p_reapsiblings if the lock was dropped.
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Simplify more: no need to single-thread itself, pg_killsx should be enough there as well.
I have tested both versions of the patch and am no longer able to provoke a spurious EINTR.
Without the patch, I could provoke one almost instantly by just holding down Ctrl-T while running a program that does nothing but wait for input on stdin:
% time timeout 60 ./r load: 72.65 cmd: r 90089 [ttyin] 0.26r 0.00u 0.00s 0% 920k load: 72.65 cmd: r 90089 [ttyin] 0.75r 0.00u 0.00s 0% 920k load: 72.65 cmd: r 90089 [ttyin] 0.77r 0.00u 0.00s 0% 920k load: 72.65 cmd: timeout 90088 [runnable] 0.77r 0.00u 0.00s 0% 2412k load: 72.65 cmd: r 90089 [ttyin] 0.79r 0.00u 0.00s 0% 920k load: 72.65 cmd: timeout 90088 [runnable] 0.79r 0.00u 0.00s 0% 2412k r: stdin: Interrupted system call timeout 60 ./r 0.00s user 0.00s system 0% cpu 0.794 total
pg_killsx is before proctree_lock, handle this.
Add a fix for a pre-existing problem of unsafe iteration over p_reapsiblings list.
| sys/kern/kern_procctl.c | ||
|---|---|---|
| 262–263 | Why is it called error1 instead of error? | |
| 271 | The procctl manual says, "The rk_fpid field is set to the pid of the first process for which signal delivery failed, e.g., due to permission problems", but here we only set it upon ESRCH. Shouldn't it be set for any error from cr_cansignal()? | |
| 283 | Can we call it reap_kill_proc() instead? The _work suffix made more sense when this was a deferred task. | |
| 295 | If I understand correctly, we check for P2_WEXIT because it's not possible to single-thread an exiting process. But now that we're not single-threading the target anymore, does it matter if P2_WEXIT is set? | |
| sys/kern/kern_procctl.c | ||
|---|---|---|
| 271 | I believe it is intentional, For instance, the visibility of the process from other uid, or another jail, would be compromised if we returned an error from cr_cansignal (which is different from ESRCH in these cases). | |
| 295 | It gives a nice optimization of not waiting for the exiting process to single-thread, and avoids taking the process group killpg lock. Also it potentially reduces the scope for the proctree_lock. I left it there because of that. | |
| sys/kern/kern_procctl.c | ||
|---|---|---|
| 295 | The assertions in reap_kill_proc() and reap_kill_proc_locked() make it look like this is required for correctness, but if it's just an optimization then IMO the assertions should be removed and comment should explain why we check for P2_WEXIT. | |
| sys/kern/kern_procctl.c | ||
|---|---|---|
| 271 | I misread before, here we are testing *w->error == ESRCH, not error == ESRCH. *w->error is initialized to ESRCH in reap_kill(), so here we are indeed returning the PID of the first process that is not visible. However, as soon as some process is successfully signaled, we set *w->error = 0, so we report rk_fpid only if no process was successfully signaled so far. | |
| 298 | ||
| sys/kern/kern_procctl.c | ||
|---|---|---|
| 271 | I think that the reason I mentioned in the (already dropped) comment is worth considering. | |
| sys/kern/kern_procctl.c | ||
|---|---|---|
| 271 | So, cr_bsd_visible() returns ESRCH if the target process is not visible. That way, plain kill(<random PID>, 0) cannot be used to distinguish between unused and "invisible" PIDs. Here, is it sufficient to ignore ESRCH from cr_cansignal()? That is, only update rk_fpid if 1) no error was reported already, i.e., *w->error == ESRCH, and 2) error != ESRCH. | |