Page MenuHomeFreeBSD

kern_exit.c: do not leak reaper bitmap busy bits
ClosedPublic

Authored by kib on Wed, Oct 1, 5:42 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Oct 12, 8:20 PM
Unknown Object (File)
Sun, Oct 12, 8:20 PM
Unknown Object (File)
Sun, Oct 12, 8:20 PM
Unknown Object (File)
Sun, Oct 12, 8:20 PM
Unknown Object (File)
Sun, Oct 12, 8:44 AM
Unknown Object (File)
Wed, Oct 1, 9:52 PM
Unknown Object (File)
Wed, Oct 1, 6:26 PM
Unknown Object (File)
Wed, Oct 1, 2:35 PM
Subscribers

Details

Summary
When re-assigning p_reapsubtree value, we must check if the result of
obliterating the old value would free the corresponding pid in the
proc_id_reap bitmap, and indeed free it when needed.

PR:     289917

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision.Wed, Oct 1, 5:42 AM

I do not know the code well enough to review, but I can test.

Not sure if caused by this change or if some other issue:

root@freebsd:~ # while true ; do ./timeout143 0.001 ./timeout-test2.sh ; done
^C
root@freebsd:~ #

The shell is no longer responsive after interrupting the loop command.

This is what ps reports:

0  1247  1244 0   0  0 25780 11040 select   S     -  0:00.05 |   `-- sshd-session: root@pts/0 (sshd-session)
0  1248  1247 0  59  0 14856  3680 ttybg    Ts    0  0:00.68 |     `-- -sh (sh)
0 27752  1248 0  86  0     0     0 -        Z+    0  0:00.00 |       `-- <defunct>

The shell state, just in case:

 PID    TID COMM                TDNAME              KSTACK
1248 100094 sh                  -                   mi_switch+0x172 thread_suspend_check+0xbd sig_ast_checksusp+0x3d sleepq_catch_signals+0x11d sleepq_wait_sig+0x9 _cv_wait_sig+0x165 tty_wait_background+0x29c ttydisc_read+0x190 ttydev_read+0x6c devfs_read_f+0xe4 dofileread+0x80 sys_read+0x127 amd64_syscall+0x169 fast_syscall_common+0xf8

P.S.
This was on a single-CPU VM (I run it with a single vCPU by accident).
Not sure if that matters.

Good news:

(kgdb) bitset_count proc_id_reapmap 100000
$1 = 14

after running the test.

The stuck shell issue cannot be reliably reproduced on subsequent attempts.
Happens about once in 10 attempts.
That's with timeout from 14.3, so there can be some bugs from it.

markj added inline comments.
sys/kern/kern_exit.c
136

BTW, this must be XLOCKED, I believe, since we are mutating a proc's reap list.

164

Before this assignment, we must have p2->p_reapsubtree != p->p_reapsubtree, right? Would it be worth adding an assertion to that effect?

This revision is now accepted and ready to land.Wed, Oct 1, 10:15 AM
kib marked an inline comment as done.Wed, Oct 1, 10:29 AM
kib added inline comments.
sys/kern/kern_exit.c
164

I do not understand why would it be true. As a corner case, we should find p among the set of p2, and then the assert would fail.

Tighten lock assert in reaper_clear()

This revision now requires review to proceed.Wed, Oct 1, 10:30 AM
markj added inline comments.
sys/kern/kern_exit.c
164

Ok, I misunderstood something before.

This revision is now accepted and ready to land.Wed, Oct 1, 11:06 AM
In D52814#1206750, @avg wrote:

That's with timeout from 14.3, so there can be some bugs from it.

That seems to be the case.
With timeout from main the "stuck shell" does not happen at all.

This revision was automatically updated to reflect the committed changes.