Page MenuHomeFreeBSD

reapkill: handle possible pid reuse after the pid was recorded as signalled
ClosedPublic

Authored by kib on May 14 2023, 1:53 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Dec 25, 4:16 AM
Unknown Object (File)
Fri, Dec 13, 9:34 PM
Unknown Object (File)
Dec 1 2024, 12:07 AM
Unknown Object (File)
Nov 24 2024, 9:18 PM
Unknown Object (File)
Nov 23 2024, 9:02 PM
Unknown Object (File)
Nov 23 2024, 5:20 AM
Unknown Object (File)
Nov 10 2024, 6:22 PM
Unknown Object (File)
Nov 10 2024, 4:25 PM
Subscribers

Details

Summary
Nothing prevents the signalled process from exiting, and then other
process among eligible for signalling to reuse the exited process pid.
In this case, presence of the pid in the 'pids' unr set prevents it from
getting the deserved signal.

Handle it by marking each process with the new flag P2_REAPKILLED when
we are about to send the signal.  If the process pid is present in the
pids unr, but the struct proc is not marked with P2_REAPKILLED, we must
send signal to the pid again.

The use of the flag relies on the global sapblk preventing parallel
reapkills.

The pids unr must be used to clear the flags to all signalled processes.
unr(9): add iterator interface
unr(9) iterator: add naive test

To use, compile userspace code e.g. into the subr_unit binary, then do
        $ while ./subr_unit -iv >|/tmp/subr_unit.log ; do :; done
The loop should be left run for as long as possible.

Also minor commits fix style, move is_bitmap() around and use it in one more place.
Document clean_unrhdr().

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.May 14 2023, 1:53 AM

Document first/last special ranges

uh->low is not neccessary 0

More fixes for uh->low != 0 first run iteration.
Add test.

share/man/man9/unr.9
104
107
sys/kern/kern_procctl.c
428

I would use atomic_load_int() here to indicate that this is a (safe) unlocked read.

520

How does this work with respect to parallel calls to PROC_REAP_KILL? AFAICS they are not serialized (only the shared proctree lock is used), so isn't it possible for this loop to clear flags that were set by a different caller?

kib marked 3 inline comments as done.May 15 2023, 6:22 PM
kib added inline comments.
sys/kern/kern_procctl.c
520

From the commit message `The use of the flag relies on the global sapblk preventing parallel
reapkills.` sx_stop_all_proc_blocker is taken around REAP_KILL for the case when the whole subtree is killed, thus requiring single-threading target processes. We cannot single-thread the same process in parallel.

Then I reused the existing mechanism.

Grammar in man pages.
Update comments about unlocked read of p_flag2 and sapblk.
Use atomic_load for unlocked p_flag2 accesses.

share/man/man9/unr.9
118
147
150
153
154
157
158
159
168
173
sys/kern/kern_procctl.c
446
sys/kern/subr_unit.c
266

Suppose I initialize a unr header and then try to iterate over it before allocating anything. In this case, uh->first == 0 and uh->head is empty, but here we assume uh->head is not empty.

293
325

Why iter_next_unrl() instead of next_iter_unr(), to match the pattern used everywhere else?

kib marked 15 inline comments as done.

Reviewer notes.
Fix grammar in the man page.
Correctly handle first iteration for an empty unit.

This revision is now accepted and ready to land.May 24 2023, 3:05 PM

Bug fixes.

  • Check for P2_WEXITED when deciding to skip the specific pid. Since P2_REAPKILLED is not set for such process, we might deadlock.

Several fixes for unr iterator implementation.

  • Do not use % to calculate the clipped bit index in the current bitmap, simply substract the upos_first_item idx.
  • Initialize upos_first_item for the first iteration where there is no compacted run at the beginning.

Add ddb show commands for unrhdr and unrhdr_iter.

This revision now requires review to proceed.May 26 2023, 5:01 PM