Page MenuHomeFreeBSD

Handle a race between fork(2) and killpg(2)
ClosedPublic

Authored by kib on Jun 12 2023, 7:55 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Oct 22, 4:54 AM
Unknown Object (File)
Tue, Oct 21, 6:57 AM
Unknown Object (File)
Mon, Oct 13, 7:14 PM
Unknown Object (File)
Mon, Oct 13, 3:23 AM
Unknown Object (File)
Sun, Oct 12, 10:10 PM
Unknown Object (File)
Sun, Oct 12, 5:57 PM
Unknown Object (File)
Sun, Oct 12, 5:56 PM
Unknown Object (File)
Sun, Oct 12, 5:56 PM
Subscribers

Details

Summary
If the process group member performs fork(), the child could escape
signalling from killpg(). Prevent it by introducing an sx process group
lock pg_killsx which is taken interruptibly shared around fork. If there
is a pending signal, do the trip through userspace with ERESTART to
handle signal ASTs. The lock is taken exclusively during killpg().

The lock is also locked exclusive when the process changes group
membership, to avoid escaping a signal by this means, by ensuring that
the process group is stable during fork.

Note that the new lock is before proctree lock, so in some situations we
could only do trylocking to obtain it.

This relatively simple approach cannot work for REAP_KILL, because
process potentially belongs to more than one reaper tree by having
sub-reapers.

Reported by:    dchagin
killpg(): close a race with fork(), part 2

When we are sending terminating signal to the group, killpg() needs to
guarantee that all group members are to be terminated (it does not need
to ensure that they are terminated on return from killpg()).  The
pg_killsx change eliminates the largest window there, but still, if a
multithreaded process is signalled, the following could happen:
- thread 1 is selected for the signal delivery and gets descheduled
- thread 2 waits for pg_killsx lock, obtains it and forks
- thread 1 continue executing and terminates the process
This scenario allows the child to escape still.

To fix it, count the number of signals sent to the process with
killpg(2), in p_killpg_cnt variable, which is incremented in killpg()
and decremented after signal handler frame is created or in exit1()
after single-threading.  This way we avoid forking if the termination is
due.

Noted by:       markj

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Jun 12 2023, 7:55 AM

Just tested, it works, however, our fork became even slower, before the patch it averaged 0.00008 ns, after the patch it averaged 0.0001, JFYI

sys/kern/kern_fork.c
952

I'd add a comment here:

/*
 * Atomically check for signals and block threads from sending a signal to our process group until the child is visible.
 */
953

Suppose:

  • thread T1 calls fork(), starts executing sys_fork()
  • thread T2 locks the pgrp, delivers a fatal signal to the pgrp, drops the lock
  • T1 acquires pg_killsx without blocking

Then T1's proc has a fatal signal pending, but it will still create a child that escapes the signal. In other words, sx_slock_sig() doesn't atomically check for pending signals, but isn't that what you need in order to fully close the race?

956
kib marked 3 inline comments as done.Jun 12 2023, 9:21 PM

Just tested, it works, however, our fork became even slower, before the patch it averaged 0.00008 ns, after the patch it averaged 0.0001, JFYI

Do you mean seconds instead of nanoseconds. Yes, the patch adds two atomics to the fork() path, but I do not see a way around it. Even if using something similar to seqlocks for fork.

Make pg_killsx locking recheck the pending signals.
Elaborate comments.

In D40493#922222, @kib wrote:

Just tested, it works, however, our fork became even slower, before the patch it averaged 0.00008 ns, after the patch it averaged 0.0001, JFYI

Do you mean seconds instead of nanoseconds.

Sure, seconds, sorry. This is a avg of kdump relative timestamps.

Yes, the patch adds two atomics to the fork() path, but I do not see a way around it. Even if using something similar to seqlocks for fork.

I understand, thank you for fixing this.

Just for the note, there are 11 tests in the Glibc test suite that use tons of fork() calls, they occasionally fails due to timeout, but with this patch it's the first time they all fails at the same time.
The timeout is hardcoded, so apparently it's a problem with my slow hw. However, I checked anyway and decided to write.
These tests are fixed in the awaited 2.38 Glibc, the amount of fork calls will be reduced.

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

What exactly is the race that this patch is fixing? Is it a Linux test, or some artifact of the test framework?

It is already impossible to signal a process group reliably if one of the target processes is malicious (a process can simply change its pgrp), so to be reliable we already need to make some assumptions about the behaviour of the target. I wonder what assumptions/guarantees Linux has here?

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

If other thread claimed the signal, then we are fine.

What exactly is the race that this patch is fixing? Is it a Linux test, or some artifact of the test framework?

It is already impossible to signal a process group reliably if one of the target processes is malicious (a process can simply change its pgrp), so to be reliable we already need to make some assumptions about the behaviour of the target. I wonder what assumptions/guarantees Linux has here?

Assume that there is no process actively changing it's process group. In this case, eg. killpg(SIGKILL) must reliably destroy whole group in presence of spawning processes. Right now this is not true.

In D40493#922446, @kib wrote:

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

If other thread claimed the signal, then we are fine.

Suppose:

  • a thread T1 observes SIGKILL on the queue, clears it, calls sigexit(), releases the proc lock
  • the forking thread T2 checks the queue, sees it is empty, and proceeds with the fork
  • the signaled thread executes exit1(), single-threads the process

AFAIU T2 will not suspend itself until after the child is created. Is there something which prevents the child from escaping?

What exactly is the race that this patch is fixing? Is it a Linux test, or some artifact of the test framework?

It is already impossible to signal a process group reliably if one of the target processes is malicious (a process can simply change its pgrp), so to be reliable we already need to make some assumptions about the behaviour of the target. I wonder what assumptions/guarantees Linux has here?

Assume that there is no process actively changing it's process group. In this case, eg. killpg(SIGKILL) must reliably destroy whole group in presence of spawning processes. Right now this is not true.

In D40493#922446, @kib wrote:

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

If other thread claimed the signal, then we are fine.

Suppose:

  • a thread T1 observes SIGKILL on the queue, clears it, calls sigexit(), releases the proc lock
  • the forking thread T2 checks the queue, sees it is empty, and proceeds with the fork
  • the signaled thread executes exit1(), single-threads the process

AFAIU T2 will not suspend itself until after the child is created. Is there something which prevents the child from escaping?

For T2 to check the queue, it needs to already own pg_killsx. Then SIGKILL taken by T1 cannot be sent from killpg() because signalling thread is blocked on the same pg_killsx.

In D40493#922675, @kib wrote:
In D40493#922446, @kib wrote:

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

If other thread claimed the signal, then we are fine.

Suppose:

  • a thread T1 observes SIGKILL on the queue, clears it, calls sigexit(), releases the proc lock
  • the forking thread T2 checks the queue, sees it is empty, and proceeds with the fork
  • the signaled thread executes exit1(), single-threads the process

AFAIU T2 will not suspend itself until after the child is created. Is there something which prevents the child from escaping?

For T2 to check the queue, it needs to already own pg_killsx. Then SIGKILL taken by T1 cannot be sent from killpg() because signalling thread is blocked on the same pg_killsx.

The signalling thread can post a signal, release the pg_killsx lock. Then T1 checks the queue and claims the signal. Then T2 acquires pg_killsx and checks the queue, and observes it is empty, so proceeds with fork().

In D40493#922675, @kib wrote:
In D40493#922446, @kib wrote:

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

If other thread claimed the signal, then we are fine.

Suppose:

  • a thread T1 observes SIGKILL on the queue, clears it, calls sigexit(), releases the proc lock
  • the forking thread T2 checks the queue, sees it is empty, and proceeds with the fork
  • the signaled thread executes exit1(), single-threads the process

AFAIU T2 will not suspend itself until after the child is created. Is there something which prevents the child from escaping?

For T2 to check the queue, it needs to already own pg_killsx. Then SIGKILL taken by T1 cannot be sent from killpg() because signalling thread is blocked on the same pg_killsx.

The signalling thread can post a signal, release the pg_killsx lock. Then T1 checks the queue and claims the signal. Then T2 acquires pg_killsx and checks the queue, and observes it is empty, so proceeds with fork().

And it is fine as well. If signal does not end up with the terminate action, the situation is not different than the victim process started fork() after receiving signal. If the signal is terminating, T2 checks for the suspend action and returns from fork with ERESTART, allowing ast handler to exit the thread.

In D40493#923486, @kib wrote:
In D40493#922675, @kib wrote:
In D40493#922446, @kib wrote:

Hmm, another thread in the forking process could have claimed the signal before the sig_intr() check, so this is still racy.

If other thread claimed the signal, then we are fine.

Suppose:

  • a thread T1 observes SIGKILL on the queue, clears it, calls sigexit(), releases the proc lock
  • the forking thread T2 checks the queue, sees it is empty, and proceeds with the fork
  • the signaled thread executes exit1(), single-threads the process

AFAIU T2 will not suspend itself until after the child is created. Is there something which prevents the child from escaping?

For T2 to check the queue, it needs to already own pg_killsx. Then SIGKILL taken by T1 cannot be sent from killpg() because signalling thread is blocked on the same pg_killsx.

The signalling thread can post a signal, release the pg_killsx lock. Then T1 checks the queue and claims the signal. Then T2 acquires pg_killsx and checks the queue, and observes it is empty, so proceeds with fork().

And it is fine as well. If signal does not end up with the terminate action, the situation is not different than the victim process started fork() after receiving signal. If the signal is terminating, T2 checks for the suspend action and returns from fork with ERESTART, allowing ast handler to exit the thread.

If the signal is terminating, T2 checks for the suspend action, yes, but this is not synchronized with sigexit(). It's possible for T2 to check for suspension before T1 starts single-threading the process.

kib edited the summary of this revision. (Show Details)

Now that we have KSI_KILLPG, is it possible to have a scheme where a forking thread copies a pending pgrp signal into the child process? That is, rather than blocking signal delivery with a mutex, it might be possible to copy pending signals that arrive during a fork.

sys/kern/kern_exit.c
314

We should assert against underflow.

sys/kern/kern_sig.c
1993–1994

Presumably this should also set KSI_KILLPG?

3108

We should assert against underflow.

kib marked 3 inline comments as done.Jun 20 2023, 10:02 AM

Now that we have KSI_KILLPG, is it possible to have a scheme where a forking thread copies a pending pgrp signal into the child process? That is, rather than blocking signal delivery with a mutex, it might be possible to copy pending signals that arrive during a fork.

I hoped so when I first imagined KSI_KILLPG, but the problem is that I need to block new signals arrival, and interlock it with the check for already queued signals. sigqueue ps_mtx is too late.

Assert that p_killpg_cnt does not underflow.

This revision is now accepted and ready to land.Jun 23 2023, 4:02 PM