Different fix for the killpg race, part2
ClosedPublic
Actions

Authored by kib on Jul 21 2023, 9:47 AM.

Details

Reviewers

markj
jhb

Commits

rG232b922cb363: killpg(): close a race with fork(), part 2
rGdfe172484d0e: sigtd(): prefer non-stopped thread as a target for signal queue
rGaaa924138a31: Revert "killpg(): close a race with fork(), part 2"

Summary

Note: the revert is not included into the differential because it is trivial and only hinders readability.

Revert "killpg(): close a race with fork(), part 2"

This reverts commits 81a37995c757b4e3ad8a5c699864197fd1ebdcf5 and
565a343ae3a30bc2973182ff8dfd2fa37d7f615f.

There is still a leakage of the p_killpg_cnt, some but not all sources
of which were identified.

Second, and more important, is that there is a fundamental issue with
blocked signals having KSI_KILLPG flag set.  Queueing of such signal
increments p_killpg_cnt, but it cannot be decremented until the signal
is delivered.  If, for instance, a single-threaded process with blocked
signal receives killpg-kill and executes fork(2), the fork enter check
returns with ERESTART.  And since signal is blocked, the condition
cannot be cleared.

sigtd(): prefer non-stopped thread as a target for signal queue

This should improve signal delivery latency and better expose the
process state to the executing threads.

killpg(): close a race with fork(), part 2

When we are sending terminating signal to the group, killpg() needs
to guarantee that all group members are to be terminated (it does not
need to ensure that they are terminated on return from killpg()).  The
pg_killsx change eliminates the largest window there, but still, if
a multithreaded process is signalled, the following could happen:
- thread 1 is selected for the signal delivery and gets descheduled
- thread 2 waits for pg_killsx lock, obtains it and forks
- thread 1 continue executing and terminates the process
This scenario allows the child to escape still.

Fix it by single-threading forking parent if a conflict with pg_killsx
is noted.  We try to lock pg_killsx without sleeping, and failure to
acquire it means that a parallel killpg(2) is executed.  Then, stop
other threads for running and in particular, receive signals, to avoid
the situation explained above.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Jul 21 2023, 9:47 AM

Herald added a subscriber: imp. · View Herald TranscriptJul 21 2023, 9:47 AM

kib requested review of this revision.Jul 21 2023, 9:47 AM

markj added inline comments.Jul 24 2023, 2:57 PM

sys/kern/kern_fork.c
954	This comment should move down, I think, and a comment should explain why we are single-threading.
957	This is a racy load, right?
1077	I suggest putting this hunk at the end of the function, with an unconditional `goto cleanup` before the `fail0` label.

kib marked 3 inline comments as done.Jul 24 2023, 7:57 PM

kib added inline comments.

sys/kern/kern_fork.c
957	Yes, it is same as in many other places. It is actually safe because p_numthreads cannot become > 1 if it is == 1 right now.

Changes: move cleanup, update comments.

Upload the right patch, that includes the comments changes.

markj added inline comments.Jul 25 2023, 1:48 PM

sys/kern/kern_fork.c
955
956	Suppose the current thread has the signal masked. Then, won't sigtd() still potentially choose a "random" stopped thread, allowing the child to escape?
958

kib marked 3 inline comments as done.Jul 25 2023, 3:51 PM

kib added inline comments.

sys/kern/kern_fork.c
956	If sigtd() is tasked with selecting some thread, then we must already block on pg_killsx. Now, if the signal to be delivered can be masked, we are in the same situation as with any non-threaded process that masked signal generated by killpg(2). There is nothing to guarantee there, because process actively interacts with the signal.