Page MenuHomeFreeBSD

linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILES
ClosedPublic

Authored by cem on Oct 29 2020, 10:03 PM.

Details

Summary

The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1().

I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag). This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.

In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.

The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).

Test Plan

Trace from Chrome:

  701: close(15)
  701: close(14)
  701: linux_socketpair(0x1,0x1,0x0,0x7fffffffd7d8) = 0 (0x0)
// i.e., the new socketpair is fds 14 and 15.

  701: linux_clone(0x211,0x0,0x0,0x0,0x20)       = 702 (0x2be)
// 0x11 is signal mask, 0x200 => LINUX_CLONE_FS (not FILES)
  702: <new process>

  701: close(14)                                 = 0 (0x0)
// Closes fd 14 in shared table
  701: linux_prctl(0x4,0x0,0x0,0x0,0x0)          = 0 (0x0)

  702: linux_setrlimit(0x7,0x10289d0)            = 0 (0x0)
  702: close(15)                                 = 0 (0x0)
  702: read(14,0x7fffffffd820,1)                 ERR#-9 'Bad file descriptor'
// Attempts to access 14 in shared table, which has been closed by 701.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem requested review of this revision.Oct 29 2020, 10:03 PM
cem created this revision.
This revision is now accepted and ready to land.Oct 29 2020, 10:16 PM

@trasz Do we have a good regression suite I can try? How difficult / big is running something like LTP?

The reason we handle CLONE_FS by setting RFFDG is because a process' cwd (and chroot dir and umask) is stored in the fd table. I'm not sure how we could implement CLONE_FS properly, it'd require some modification to the core kernel.

With this diff CLONE_FS is effectively ignored. I don't have any objection to that, but I'd be surprised if it doesn't break something.

The reason we handle CLONE_FS by setting RFFDG is because a process' cwd (and chroot dir and umask) is stored in the fd table.

Yeah, I got that.

I'm not sure how we could implement CLONE_FS properly, it'd require some modification to the core kernel.

Yes, we’d have to break those settings out like Linux presumably does.

With this diff CLONE_FS is effectively ignored. I don't have any objection to that, but I'd be surprised if it doesn't break something.

Yeah, that is effectively the change, and I’m also concerned about what it might break. I think expecting the FS semantics would be extremely strange, but I haven’t investigated why Chrome does it. Certainly there are other clone() calls in chrome that do not include that flag. I also wonder if LTP covers this flag.

If the original comment is to be believed, maybe many FS users also pass FILES, and wouldn’t be broken here.

Datapoint: glibc does not appear to use CLONE_FS anywhere internally (or rather, only once, in tandem with CLONE_FILES, for creating threads): https://github.com/bminor/glibc/search?q=CLONE_FS

Chromium uses it (without CLONE_FILES) in a few places. Ignore seccomp and nacl for now. https://github.com/chromium/chromium/search?q=CLONE_FS The uses in tandem with SIGCHLD (0x11 on Linux) matches the 0x211 I observed.

One use is this mechanism, which, wow: https://github.com/chromium/chromium/blob/ba5cf7b3d18065b934e9a55e2380450f28ad585a/sandbox/linux/services/credentials.cc#L48-L108

Chrome spawns a child process to chroot both child and parent into a pseudo-directory specific to the child process, which then exits. So, uh, wow. CLONE_FS is definitely intended there. On the other hand, VFORK also means it is safe (sort of) to skip RFFDG for CLONE_FS. So I think we could handle this case safely with something like:

/* Do not create independent fd table copy if CLONE_FILES is set, or if (CLONE_FS | CLONE_VFORK) is set. */
if ((args->flags & LINUX_CLONE_FILES) == 0 &&
    (args->flags & (LINUX_CLONE_FS | LINUX_CLONE_VFORK)) != (LINUX_CLONE_FS | LINUX_CLONE_VFORK))
        ff |= RFFDG;

which is a bit ugly, but does not require implementing Linux's CLONE_FS semantics.

This use is much the same, except it requires the parent and child to run contemporaneously, because the child does not chroot until it receives a write on the shared socketpair: https://github.com/chromium/chromium/blob/e65f318c604d2159e65edda605863b98d36e2073/sandbox/linux/suid/sandbox.c#L72-L162 So our vfork hack would not work here, and the parent would remain un-chrooted. The API which actually uses this socketpair to chroot itself is in another file: https://github.com/chromium/chromium/blob/2ca8c5037021c9d2ecc00b787d58a31ed8fc8bcb/sandbox/linux/suid/client/setuid_sandbox_client.cc#L95-L132 I think Chrome might work anyway without correct sandboxing, but obviously it is not ideal. Anyway, Chrome can be run with --no-sandbox, which appears to avoid execution of this path (and possibly the earlier one as well).

All other uses of CLONE_FS in chrome appear to be either unittests or in tandem with CLONE_FILES.

That second use dates to, effectively, 2013, where it was added without any non-test consumers: https://github.com/chromium/chromium/blob/486efc8695c31bd2084db07bd47f104451553cc6/sandbox/linux/services/credentials.cc#L90-L134 At the time, it spawned a normal thread rather than using clone() directly, and subsequently unshare()'d CLONE_FILES, giving us essentially the same problem: Chrome wants to chroot from a subprocess without shared fds, and then have the subprocess die.

This revision now requires review to proceed.Nov 1 2020, 11:45 PM
cem retitled this revision from linux(4) clone(2): Do not share fd table if !CLONE_FILES to linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILES.Nov 1 2020, 11:45 PM
cem edited the summary of this revision. (Show Details)
This revision was not accepted when it landed; it landed in state Needs Review.Nov 17 2020, 9:20 PM
This revision was automatically updated to reflect the committed changes.