Page MenuHomeFreeBSD

posix_spawn(3): handle potential signal issues with vfork

Authored by kevans on Feb 1 2019, 7:25 PM.



Described in [1], signal handlers running in a vfork child have opportunities to corrupt the parent's state. Address this by adding a new rfork(2) flag, RFSPAWN, that has vfork(2) semantics but also resets signal handlers in the child during creation.

x86 uses rfork_thread(3) instead of a direct rfork(2) because rfork w/ RFMEM/RFSPAWN cannot work when the return address is stored on the stack -- further information about this problem is described under RFMEM in the rfork(2) man page.

Addressing this has been identified as a prerequisite to using posix_spawn in subprocess on FreeBSD [2].


Diff Detail

rS FreeBSD src repository
Lint Skipped
Unit Tests Skipped
Build Status
Buildable 26438

Event Timeline

kevans created this revision.Feb 1 2019, 7:25 PM
kib added a reviewer: jilles.Feb 1 2019, 7:42 PM
kib added a comment.EditedFeb 1 2019, 8:03 PM

Did you considered adding a flag for rfork(2) like RFSPAWN which would act as vfork+clear signals disposition ? (Note that vfork(2) behavior cannot be requested from rfork(2) using flags).

55 ↗(On Diff #53521)

There is no need in the underscore prefix. The symbol is static.

Why you cannot use sigset_t ?

SIGCANCEL should be added unconditionally to the set.

76 ↗(On Diff #53521)

SIG_DFL can be ignored as well. You may also clear bits when SIG_IGN and SIG_DFL are specified.

jilles requested changes to this revision.Feb 1 2019, 8:45 PM

The general approach seems good.

I contemplated pushing the signal reset bits further down into the vfork child and doing them as we're processing spawnattr (and something similar to the current approach if sa == NULL), but I wasn't sure if the added complexity was worth the trade-off.

This would be possible but not required. Note that processing file actions (which may block and may benefit from signals with default action and an open signal mask) happens after processing spawnattr (which may reset signals to the default action and change the signal mask).


In a threaded process, this will call libthr's __thr_sigprocmask which leaves SIGCANCEL/SIGTHR unmasked. Perhaps this should be __sigprocmask which is an unexported weak alias for __sys_sigprocmask.

63 ↗(On Diff #53521)

sa_flags and sa_mask should be initialized as well.

66 ↗(On Diff #53521)

Calling libthr's version prevents resetting SIGCANCEL/SIGTHR and does additional sigprocmask and rwlock calls.

75–77 ↗(On Diff #53521)

Since libthr's setting of SIGCANCEL/SIGTHR does not pass through here, it needs to be accounted for differently.

This revision now requires changes to proceed.Feb 1 2019, 8:45 PM
jilles added inline comments.Feb 1 2019, 8:58 PM
76 ↗(On Diff #53521)

Clearing bits may be incorrect when two threads issue sigaction for the same signal at the same time, one setting a handler and one setting the default action. libthr does not guarantee that the previously installed handler cannot be called after a sigaction with SIG_DFL returns, but does appear to lock the change properly otherwise.

kevans added inline comments.Feb 1 2019, 9:18 PM
55 ↗(On Diff #53521)

I hadn't spent any time considering any concurrency issues that may happen here, so I opted for the bare minimum uint32[_SIG_WORDS] that I can operate on atomically -- a luxury not provided by sigset operations last I had looked.

63 ↗(On Diff #53521)

I thought I had a memset here -- must have been lost in transit across machines. =( Will address this in another revision.

76 ↗(On Diff #53521)

The initial concern I had is invalid when I address your concerns about using libthr versions of sig* operations, but I did not want to clear these initially because I didn't want to clobber _used_sigs in the vfork parent when the child resets everything.

kevans added inline comments.Feb 1 2019, 9:23 PM
75–77 ↗(On Diff #53521)

Is there any reason I shouldn't have __libc_used_signal_reset just reset these ones unconditionally?

kevans updated this revision to Diff 53528.Feb 1 2019, 10:58 PM

New approach, because it looks a lot nicer- add an RFSPAWN flag for rfork(2) that uses vfork(2) semantics but doesn't inherit signal actions.

My understanding from a brief read-through of this stuff is that we can simply use the newly allocated sigacts and don't copy from the parent, but perhaps there's more required?

RFSPAWN is currently mutually exclusive with other rfork(2) options.

kib added inline comments.Feb 1 2019, 11:40 PM

It might be wise to fall back to vfork() on EINVAL.


You should check what you stated. If RFSPAWN is specified, all other flags must be cleared.


if (fr->fr_flags2 & FR2_SPAWN) == 0)

That said, I do not think it is correct. You must copy and then reset sigactions to have any sane semantics. Most important, the already queued signals should be dropped, if any.


Better call the flag basing on its functionality, not use.


/* kernel: parent sleeps ...


/* user: vfork(2) ...

kevans added inline comments.Feb 2 2019, 1:57 AM

Right, also going to revert the KERNELFLAG change and special case it here FOR RFPSPAWN

kevans updated this revision to Diff 53532.Feb 2 2019, 4:01 AM
kevans marked 4 inline comments as done.

(Ignore that "KERNELFLAGS" comment -- that was referring to RFKERNELONLY, which for some reason I had thought was used somewhere else that might care... nope)

First stab at fixing rfork(2) version:

  • Fall back to vfork(2) rfork errors out with EINVAL
  • Validate flags, RFSPAWN mutually exclusive with others
  • Rename to FR2_DROPSIG_CAUGHT (? seemed sensible)
  • Lifted the bits to reset caught signals from execsigs -- it was unclear to me if we want to drop the entire sigqueue or just for the signals we're resetting, but the latter seemed more correct
kib added a comment.Feb 2 2019, 5:31 PM

I think that the approach is basically sane but this is a good moment to step back and try to evaluate do we need any other assist from the kernel for posix_spawn(3). My point is that we want all that functionality to be put under RFSPAWN to avoid allocate one more flag.


Perhaps explain that this happens on old kernels.


I prefer to put {} around branches consistently: if one branch requires it, then the other also get {}.


Your sig_drop_caught() requires p2->ps_mtx locked around the call.


Add asserts that both proc lock and ps_mtx are locked.

jilles added a comment.Feb 3 2019, 3:54 PM

It is indeed important to ensure that this is the complete kernel assist for posix_spawn, so we do not leave multiple versions of this around, each with their own flag.


I'm surprised that this works at all on (among others) amd64: the child pops the return address and overwrites it with new return addresses for calls it makes; how can the parent's return (the ret instruction in rfork.S from SYS.h) go to the right place? Note that vfork has a hand-written stub that deals with this problem.

On architectures such as sparc64, arm and mips where subroutine return addresses are stored in a register and system call stubs do not use the stack, this should not be an issue.

kib added inline comments.Feb 3 2019, 6:17 PM

Indeed, perhaps rfork_thread(3) should be used. I hope that it still works.

kevans added inline comments.Feb 3 2019, 7:11 PM

I'll double check my test system and rework this

jilles added inline comments.Feb 3 2019, 9:12 PM

Using rfork_thread(3) to implement something vfork-like does not seem a violation of its deprecation. The deprecation is about using rfork_thread(3) to create threads.

rfork_thread(3)'s API is deficient in specifying the stack: it only takes an initial stack pointer, where I would expect a base and a size. Perhaps because of that or because it is deprecated, it is only available on i386 and amd64.

Using a separate stack for the child process does avoid GCC bugs like

kib added inline comments.Feb 3 2019, 9:24 PM

By 'hope' I mean that we do not corrupt userspace registers in the syscall for rfork_thread() to work. Note that fast syscall on amd64 does not restore all registers as an optimization.

Fortunately, child return from fork sets PCB_FULL_IRET in cpu_fork(), so all registers are restored. From my reading of the rfork_thread() code, it does not matter for parent.

But I did not verified it by a test.

jilles added inline comments.Feb 3 2019, 11:33 PM

rfork_thread(RFPROC | RFFDG | RFMEM, ...) appears to work still (on stable/11 amd64 and head amd64).

rfork_thread.S only depends on %rsi being preserved in the child, but vfork.S depends on %rsi being preserved in both parent and child. The fast return path of fast_syscall additionally preserves %rdi and %rsi, calling that a bonus in comments. I don't think it is a bonus -- vfork.S depends on it and has no good alternative location to store the parent's return address (thread local storage? yuck).

kib added inline comments.Feb 3 2019, 11:41 PM

From my reading, the child dependencies are %rsi, %r12, and %rbx. But they are handler by iret path.

trasz added a subscriber: trasz.Feb 4 2019, 12:46 PM
kevans updated this revision to Diff 61840.Sep 9 2019, 1:55 PM
kevans marked 11 inline comments as done.

Apologies for the delay... here's another iteration that I think addresses all of the concerns. From what I can glean, i386/amd64 are the only arch we support where rfork(RFMEM) cannot work.

kevans updated this revision to Diff 61841.Sep 9 2019, 2:23 PM

Fix last minute typos in locking/lock assertions... retesting RFSPAWN path, but it should be OK.

kib added inline comments.Sep 10 2019, 7:53 PM

You do not need both defines.

On amd64, there is so called red zone, an area in the stack below %rsp that compiler is free to use. Red zone size is 128 bytes, which leaves 192 - 128 = 64 bytes to use until formal overflow. IMO it is too optimistic.

kevans added inline comments.Sep 10 2019, 11:43 PM

Ah, that's good to know- 192 was a rough approximation based on measured-once...

do_posix_spawn -> process_spawnattr -> (syscalls)/sigismember

do_posix_spawn -> process_file_actions -> process_file_actions_entry -> (syscalls)

So accounting for 3/4-deep, perhaps 768 as a conservative estimate? Even 1k wouldn't exactly hurt anything, that's 'small potatoes' as far as posix_spawn memory requirements go, I would think.

kevans updated this revision to Diff 61998.Sep 12 2019, 6:47 PM
kevans marked an inline comment as done.
  • _RFORK_THREAD_STACK_SIZE implies the need for rfork_thread()
  • Use a more conservative estimate for rfork_thread stack size requirements
jilles added inline comments.Sep 12 2019, 8:10 PM

Various exported functions such as _execve() are called, which might cause rtld and/or libthr to be involved and stack usage to go up greatly. In principle, it's possible to do the necessary work with only unexported aliases and/or do preparations like getenv("PATH") in the parent process.


Do you mean "must not return"?

kib added inline comments.Sep 13 2019, 5:58 AM

Use 1U << 31 while there.

kevans marked an inline comment as done.Sep 13 2019, 8:18 PM
kevans added inline comments.

Hmm... at least most of this should be successfully avoiding libthr, as it's "in an undefined state after the vfork" according to it.

What specifically would you propose here? Switching _execve over to the unwrapped version and using an even more conservative 1k/2k/4k for the stack size to catch any edge cases?


Hmmm... I caught that once, but it looks like I had only fixed it locally in the diff that I uploaded (61840)... fixed for next update.

kevans marked an inline comment as done.Sep 13 2019, 8:45 PM
kevans added inline comments.

Alternatively, how bad of an idea would it be to fetch the stack pointer, subtract past the red zone, and use that for the rfork_thread stack? This is only x86 that has this problem, so it's really tempting.

kib added inline comments.Sep 14 2019, 9:01 AM

I would go forward with 2K or even 4K stack and claim that the issue is resolved.

kevans updated this revision to Diff 62099.Sep 14 2019, 5:22 PM
kevans edited the summary of this revision. (Show Details)
  • 768 -> 4k stack
  • Revised commit message, because it was half-wrong
kib accepted this revision.Sep 14 2019, 5:33 PM
kib added inline comments.

'This is used by .Xr posix_spawn 3 implementation in libc.'


() around p are not needed.

jilles accepted this revision.Sep 15 2019, 11:33 AM
jilles added inline comments.

What could be a problem is if the child is killed forcibly while holding a lock, since this would likely cause the parent to deadlock later. This could happen a signal is sent to the process group for which the parent has a handler that does little and the default action is to terminate the process. (Getting this to happen with SIGKILL would be very unlikely since it is hard to send a signal to the child process only.) With rtld, this can happen only with the first posix_spawn call using particular functionality in the process, and libthr seems to be successfully avoided.

This seems rare enough not to block this review on it.

This revision is now accepted and ready to land.Sep 15 2019, 11:33 AM
This revision was automatically updated to reflect the committed changes.
kevans marked 2 inline comments as done.