Details

Reviewers

jhb
markj
dchagin

Commits

rG81a37995c757: killpg(): close a race with fork(), part 2
rG3360b48525fc: killpg(2): close a race with fork(2), part1
rG4b59d1724b76: killpg1(): update the herald comment

Summary

If a process group member performs fork(), the child could escape
signalling from killpg(). Prevent this by introducing an sx process group
lock, pg_killsx, which is taken shared and interruptibly around fork. If
there is a pending signal, take the trip through userspace with ERESTART
to handle signal ASTs. The lock is taken exclusively during killpg().

The lock is also taken exclusively when a process changes its group
membership, so that a signal cannot be escaped by this means: the
process group is guaranteed to be stable during fork.

Note that the new lock is ordered before the proctree lock, so in some
situations we can only trylock to obtain it.

This relatively simple approach cannot work for REAP_KILL, because a
process can belong to more than one reaper tree by having
sub-reapers.

Reported by:    dchagin

killpg(): close a race with fork(), part 2

When we are sending a terminating signal to the group, killpg() needs to
guarantee that all group members will be terminated (it does not need
to ensure that they are already terminated on return from killpg()).  The
pg_killsx change eliminates the largest window there, but still, if a
multithreaded process is signalled, the following could happen:
- thread 1 is selected for the signal delivery and gets descheduled
- thread 2 waits for the pg_killsx lock, obtains it and forks
- thread 1 continues executing and terminates the process
This scenario still allows the child to escape.

To fix it, count the number of signals sent to the process with
killpg(2) in the p_killpg_cnt variable, which is incremented in killpg()
and decremented after the signal handler frame is created or in exit1()
after single-threading.  This way we avoid forking if termination is
due.
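
The counting scheme can be sketched as a tiny single-threaded state model. It is an illustration under assumptions, not the committed code: the kp_ names are hypothetical, the killpg_cnt field mirrors p_killpg_cnt, and the exiting flag is an assumption standing in for the process having been single-threaded in exit1(). The underflow assertion follows the request made later in the review.

```c
#include <assert.h>
#include <stdbool.h>

#define KP_ERESTART	(-1)	/* stand-in for the kernel's ERESTART */

/* Hypothetical stand-in for the relevant part of struct proc. */
struct kp_proc {
	int killpg_cnt;		/* models p_killpg_cnt */
	bool exiting;		/* assumption: set once exit single-threads */
};

/* killpg() posted a signal to this process. */
static void
kp_killpg(struct kp_proc *p)
{
	p->killpg_cnt++;
}

/* The signal was caught and its handler frame has been created. */
static void
kp_handler_frame_created(struct kp_proc *p)
{
	assert(p->killpg_cnt > 0);	/* guard against underflow */
	p->killpg_cnt--;
}

/*
 * The process is exiting.  The model drops one count only if a killpg
 * signal is outstanding; the exact kernel condition is an assumption.
 */
static void
kp_exit1(struct kp_proc *p)
{
	p->exiting = true;
	if (p->killpg_cnt > 0)
		p->killpg_cnt--;
}

/* fork() is refused while a killpg signal is still unaccounted for. */
static int
kp_fork(const struct kp_proc *p)
{
	if (p->killpg_cnt > 0 || p->exiting)
		return (KP_ERESTART);
	return (0);
}
```

Replaying the three-step scenario above against this model: after kp_killpg(), thread 2's kp_fork() returns KP_ERESTART even though thread 1 has not run yet, so no child is created before the process terminates. If the signal is caught instead, kp_handler_frame_created() brings the count back to zero and forking is allowed again.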

Noted by:       markj

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

kib created this revision. · Jun 12 2023, 7:55 AM

Herald added a subscriber: imp. · Jun 12 2023, 7:55 AM

kib requested review of this revision. · Jun 12 2023, 7:55 AM

dchagin mentioned this in D39830: killpg1: Fix a race between fork(2) and killpg1(). · Jun 12 2023, 8:48 AM

Just tested, it works. However, our fork became even slower: before the patch it averaged 0.00008 ns, after the patch it averaged 0.0001 ns. JFYI.

markj added inline comments. · Jun 12 2023, 2:44 PM

sys/kern/kern_fork.c
952	I'd add a comment here:
	/*
	 * Atomically check for signals and block threads from sending
	 * a signal to our process group until the child is visible.
	 */
953	Suppose:
	- thread T1 calls fork(), starts executing sys_fork()
	- thread T2 locks the pgrp, delivers a fatal signal to the pgrp, drops the lock
	- T1 acquires pg_killsx without blocking
	Then T1's proc has a fatal signal pending, but it will still create a child that escapes the signal. In other words, sx_slock_sig() doesn't atomically check for pending signals, but isn't that what you need in order to fully close the race?
956

In D40493#922019, @dchagin wrote:

Just tested, it works. However, our fork became even slower: before the patch it averaged 0.00008 ns, after the patch it averaged 0.0001 ns. JFYI.

Do you mean seconds instead of nanoseconds? Yes, the patch adds two atomics to the fork() path, but I do not see a way around it, even if using something similar to seqlocks for fork.

Make pg_killsx locking recheck the pending signals.
Elaborate comments.

In D40493#922222, @kib wrote:

Do you mean seconds instead of nanoseconds?

Sure, seconds, sorry. This is an average of kdump relative timestamps.

Yes, the patch adds two atomics to the fork() path, but I do not see a way around it, even if using something similar to seqlocks for fork.

I understand, thank you for fixing this.

Just for the record, there are 11 tests in the Glibc test suite that use tons of fork() calls. They occasionally fail due to timeout, but with this patch it's the first time they all fail at the same time.
The timeout is hardcoded, so apparently it's a problem with my slow hw. However, I checked anyway and decided to write.
These tests are fixed in the awaited Glibc 2.38, where the number of fork calls will be reduced.

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

What exactly is the race that this patch is fixing? Is it a Linux test, or some artifact of the test framework?

It is already impossible to signal a process group reliably if one of the target processes is malicious (a process can simply change its pgrp), so to be reliable we already need to make some assumptions about the behaviour of the target. I wonder what assumptions/guarantees Linux has here?

In D40493#922440, @markj wrote:

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

If another thread claimed the signal, then we are fine.

What exactly is the race that this patch is fixing? Is it a Linux test, or some artifact of the test framework?

It is already impossible to signal a process group reliably if one of the target processes is malicious (a process can simply change its pgrp), so to be reliable we already need to make some assumptions about the behaviour of the target. I wonder what assumptions/guarantees Linux has here?

Assume that there is no process actively changing its process group. In this case, e.g., killpg(SIGKILL) must reliably destroy the whole group in the presence of spawning processes. Right now this is not true.

In D40493#922446, @kib wrote:

If other thread claimed the signal, then we are fine.

Suppose:

- a thread T1 observes SIGKILL on the queue, clears it, calls sigexit(), releases the proc lock
- the forking thread T2 checks the queue, sees it is empty, and proceeds with the fork
- the signaled thread executes exit1(), single-threads the process

AFAIU T2 will not suspend itself until after the child is created. Is there something which prevents the child from escaping?

In D40493#922454, @markj wrote:

Suppose:

- a thread T1 observes SIGKILL on the queue, clears it, calls sigexit(), releases the proc lock
- the forking thread T2 checks the queue, sees it is empty, and proceeds with the fork
- the signaled thread executes exit1(), single-threads the process

AFAIU T2 will not suspend itself until after the child is created. Is there something which prevents the child from escaping?

For T2 to check the queue, it needs to already own pg_killsx. Then the SIGKILL taken by T1 cannot have been sent from killpg(), because the signalling thread is blocked on the same pg_killsx.

In D40493#922675, @kib wrote:

For T2 to check the queue, it needs to already own pg_killsx. Then SIGKILL taken by T1 cannot be sent from killpg() because signalling thread is blocked on the same pg_killsx.

The signalling thread can post a signal and release the pg_killsx lock. Then T1 checks the queue and claims the signal. Then T2 acquires pg_killsx, checks the queue, observes it is empty, and so proceeds with fork().

In D40493#923482, @markj wrote:

The signalling thread can post a signal, release the pg_killsx lock. Then T1 checks the queue and claims the signal. Then T2 acquires pg_killsx and checks the queue, and observes it is empty, so proceeds with fork().

And that is fine as well. If the signal does not end up with the terminate action, the situation is no different from the victim process having started fork() after receiving the signal. If the signal is terminating, T2 checks for the suspend action and returns from fork with ERESTART, allowing the AST handler to exit the thread.

In D40493#923486, @kib wrote:

And it is fine as well. If signal does not end up with the terminate action, the situation is not different than the victim process started fork() after receiving signal. If the signal is terminating, T2 checks for the suspend action and returns from fork with ERESTART, allowing ast handler to exit the thread.

If the signal is terminating, T2 checks for the suspend action, yes, but this is not synchronized with sigexit(). It's possible for T2 to check for suspension before T1 starts single-threading the process.

kib updated this revision to Diff 123333. · Jun 16 2023, 11:14 AM

kib edited the summary of this revision.

Now that we have KSI_KILLPG, is it possible to have a scheme where a forking thread copies a pending pgrp signal into the child process? That is, rather than blocking signal delivery with a mutex, it might be possible to copy pending signals that arrive during a fork.

sys/kern/kern_exit.c
314	We should assert against underflow.
sys/kern/kern_sig.c
1993–1994	Presumably this should also set KSI_KILLPG?
3108	We should assert against underflow.

In D40493#924588, @markj wrote:

Now that we have KSI_KILLPG, is it possible to have a scheme where a forking thread copies a pending pgrp signal into the child process? That is, rather than blocking signal delivery with a mutex, it might be possible to copy pending signals that arrive during a fork.

I hoped so when I first imagined KSI_KILLPG, but the problem is that I need to block the arrival of new signals and interlock that with the check for already queued signals. sigqueue ps_mtx is too late.

Assert that p_killpg_cnt does not underflow.

markj accepted this revision. · Jun 23 2023, 4:02 PM

This revision is now accepted and ready to land. · Jun 23 2023, 4:02 PM

Closed by commit rG4b59d1724b76: killpg1(): update the herald comment (authored by kib). · Jul 4 2023, 3:44 AM

This revision was automatically updated to reflect the committed changes.

kib added a commit: rG4b59d1724b76: killpg1(): update the herald comment.

kib added a commit: rG3360b48525fc: killpg(2): close a race with fork(2), part1.

kib added a commit: rG81a37995c757: killpg(): close a race with fork(), part 2.

kib added a reverting change: D41128: Different fix for the killpg race, part2. · Jul 21 2023, 9:47 AM

kib added a reverting change: rGaaa924138a31: Revert "killpg(): close a race with fork(), part 2". · Jul 26 2023, 3:22 PM