Page MenuHomeFreeBSD

Prevent a jail_attach race between threads
AcceptedPublic

Authored by jamie on Thu, Jun 25, 11:07 PM.

Details

Reviewers
markj
kib
Summary

Attaching to a jail changes its root directory and its process credentials. These operations both require unlocking the jail, and also need allprison_lock unlocked. That means that if two threads are trying to attach to different jails at the same time, it's possible for the process to end up with one jail's root directory but the other jail's credentials.

Solve this by forcing the process into single-threaded mode during the do_jail_attach operation. Since the problem is only intra-process, it's not necessary to try adding a new lock that would be higher in the order than the current vnode and process locks.

Test Plan

Add a new test, jail_thread, that tries to create the races by repeatedly setting two threads to attach to different jails. On my single test machine (4 cores of byhve on a 12-core Ryzen), this happens about a quarter of the time. I make 10000 tries to make sure something is tripped, which is still very quick since nothing like filesystem is involved.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

sys/kern/kern_jail.c
3228

Do not do this when holding any locks, esp. sleepable.

Also be aware that other threads get spurious EINTRs if sleeping in the slow syscalls.

3230

This causes spurious EAGAIN returned to the caller sometimes. Why not return ERESTART?

sys/kern/kern_jail.c
3228

I could drop allprison_lock before this, or move this test to the front of the system calls (especially considering the ERESTART stuff mentioned below).

Unless "do this" means the entire single-threaded section between here and and thread_single_end shouldn't be sleepable. If that's the case, I'll have to find a completely different strategy.

3230

Does ERESTART make it to the kernel/user boundary? I couldn't tell from a quick look through.

For jail_attach and jail_attach_jd that would work. jail_set(JAIL_CREATE | JAIL_ATTACH) is more complicated since restarting the system call would fail on attempting to re-create the jail. The solution may be to make this test in kern_jail_set before starting anything.

sys/kern/kern_jail.c
3228

Single threading should be done at the beginning of the syscall, if possible. After single threading is established, there is no limitation on the synchronization operations that the executing thread can perform. Almost no limitations, the existing shenanigans are related to threading and debugging, which should be not relevant there.

3230

It makes it back to userspace, and the syscall instruction is re-issued.

Move the thread_single and thread_single_end calls from do_jail_attach to the system call level (kern_jail_set, sys_jail_attach, sys_jail_attach_jd). Return ERESTART instead of EAGAIN for lost races.

jamie added inline comments.
sys/kern/kern_jail.c
3228

I'm content for spurious EINTR to be the caller's problem. This situation is unlikely to happen in real code, and the main focus is to keep things out of an invalid state (including possible security concerns, though no specific issue has been noted).

This revision is now accepted and ready to land.Fri, Jun 26, 5:10 AM

BTW, is the problem is in just two parallel jail attaches occurring? I.e. if other thread executes chdir() it is not a problem, am I correct?
If yes, then I think you do not need the single-threading there, it is too big hammer. You could have a single sx that is taken in exclusive mode under the same conditions as you currently do singlethreading. It is both simple and has less complicated side effects.