Page MenuHomeFreeBSD

killpg1: Fix a race between fork(2) and killpg1()
AbandonedPublic

Authored by dchagin on Apr 26 2023, 12:46 PM.
Tags
None
Referenced Files
Unknown Object (File)
Jan 8 2025, 9:20 AM
Unknown Object (File)
Dec 30 2024, 7:23 AM
Unknown Object (File)
Nov 1 2024, 11:52 AM
Unknown Object (File)
Nov 1 2024, 11:51 AM
Unknown Object (File)
Nov 1 2024, 11:40 AM
Unknown Object (File)
Oct 4 2024, 3:41 PM
Unknown Object (File)
Oct 1 2024, 3:12 AM
Unknown Object (File)
Sep 11 2024, 5:18 PM
Subscribers

Details

Reviewers
kib
Summary

We might not yes see a new child when signalling a process. Ensure that
this cannot happen by stoping all corresponding processes, which ensures
that the child is not inside a syscall, in particular fork(2).

The patch modeled after the series of PROC_REAP_KILL commits.

Test Plan

Note that pgsignal also should be modified, I propose to fix it when killpg1 is ready

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 51208
Build 48099: arc lint + arc unit

Event Timeline

dchagin added a reviewer: kib.

Can this code be somewhat unified with reaper kill, instead of copied?

In D39830#906945, @kib wrote:

Can this code be somewhat unified with reaper kill, instead of copied?

Well, I tried that from the beginning, but there are too many differences in the main part (subtree). I can try to unify taskqueue part, is it ok to move it to kern_sig.c?

Note that stopping process could cause spurious EINTRs returned from some syscalls.
Another thing I am curious about, is the overhead of this approach for more common operation killpg(2) then the reap kill. Are there any consumers that would note the difference?

Did you consider other approaches? For instance, we can add a generation counter to the struct pgrp and increment it on the process addition. I am not sure that this could be made into proper fix for all races, but might be worth trying.

sys/kern/kern_sig.c
1806

The code from this function, starting from this point, up to the end, except two lines, can be shared with reap_kill_proc_locked(). I see that the block after cr_cansignal() cannot be easily shared.

In D39830#907696, @kib wrote:

Note that stopping process could cause spurious EINTRs returned from some syscalls.

Thats sad, then a differ approach is needed indeed.

Another thing I am curious about, is the overhead of this approach for more common operation killpg(2) then the reap kill. Are there any consumers that would note the difference?

I think that broadcast case is most important here as it used by jail subsystem which is widely used.

Did you consider other approaches? For instance, we can add a generation counter to the struct pgrp and increment it on the process addition. I am not sure that this could be made into proper fix for all races, but might be worth trying.

Due to broadcast case of killpg1() we need another approach, for now i dont have a good idea. What if we going to mark signals as broadcast in killpg1() and then ‘copy' such signals to the newly created child during dofork()?