Create new fexecve() variant with explicit interpreter
AbandonedPublic

Authored by jonathan on Jan 3 2017, 2:34 PM.

Details

Reviewers
rwatson
brooks
Summary

The ffexecve(2) system call allows a run-time interpreter to be specified explicitly rather than relying on the default logic (e.g., a path baked into an ELF program header).

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 9233
Build 9683: arc lint + arc unit
jonathan retitled this revision from to First attempt at a new fexecve() variant that allows the interpreter to be specified explicitly with a file descriptor..Jan 3 2017, 2:34 PM
jonathan updated this object.
jonathan edited the test plan for this revision. (Show Details)
jonathan added a reviewer: rwatson.
emaste added a subscriber: emaste.Mar 17 2017, 3:37 PM
allanjude added inline comments.
sys/kern/init_sysent.c
6 ↗(On Diff #23573)

looks mangled

sys/kern/syscalls.c
6 ↗(On Diff #23573)

mangled

sys/sys/syscall.h
6 ↗(On Diff #23573)

mangled

sys/sys/syscall.mk
4 ↗(On Diff #23573)

mangled

sys/sys/sysproto.h
6 ↗(On Diff #23573)

mangled

kib added a subscriber: kib.Mar 17 2017, 3:44 PM

Please remove generated files from the diff.

lib/libc/sys/Symbol.map
333 ↗(On Diff #23573)

Wrong version.

emaste added inline comments.Mar 17 2017, 4:06 PM
sys/kern/init_sysent.c
6 ↗(On Diff #23573)

This is just an artifact of using old makesyscalls.sh in a git working dir (which has unexpanded $FreeBSD$ tags in the source files). The good news is that as of rS313564 these lines are not emitted by makesyscalls.sh.

jonathan updated this revision to Diff 26360.Mar 17 2017, 4:57 PM
  • Remove auto-generated files from diff.

Updating D9030: First attempt at a new fexecve() variant that allows the interpreter to be

specified explicitly with a file descriptor.

jonathan updated this revision to Diff 26361.Mar 17 2017, 5:04 PM

Rebase onto -CURRENT

Updating D9030: First attempt at a new fexecve() variant that allows the interpreter to be

specified explicitly with a file descriptor.

jonathan updated this revision to Diff 26365.Mar 17 2017, 6:13 PM
  • Fix MLINKS for fldexec.

Updating D9030: First attempt at a new fexecve() variant that allows the interpreter to be

specified explicitly with a file descriptor.

jonathan updated this revision to Diff 28238.May 11 2017, 1:54 PM
  • Merge 'origin/master' into arcpatch-D9030
  • Add !have_interp for consistency.
  • Rename fldexec to ffexecve.
  • Add ffexecve to execve(2) man page.

Updating D9030: First attempt at a new fexecve() variant that allows the interpreter to be

specified explicitly with a file descriptor.

jonathan retitled this revision from First attempt at a new fexecve() variant that allows the interpreter to be specified explicitly with a file descriptor. to Create new fexecve() variant with explicit interpreter.May 11 2017, 11:50 PM
jonathan edited the summary of this revision. (Show Details)
jonathan updated this revision to Diff 28258.May 11 2017, 11:51 PM
  • Disable ffexecve(2) with non-ELF image activators.
brooks accepted this revision.May 12 2017, 4:51 PM
brooks added a subscriber: brooks.

LGTM.

FWIW, I'm not a fan of the NULL terminated argv and envv arguments as they make argument validation in CheriABI harder, but I mostly think that ship has sailed. If I were to fix it I'd make the syscall a into a __ffexecve() that takes lengths and have ffexecve() be a user space wrapper that counts the lengths of the vectors and passes them down.

sys/kern/kern_exec.c
275

Given that sys_ffexecve and sys_fexecve appear to be identical except for this line, I'm tempted suggest a kern_ffexecve that you pass -1 to from sys_fexecve

This revision is now accepted and ready to land.May 12 2017, 4:51 PM
kib added a comment.May 12 2017, 5:10 PM

As is, the thing is unbelievable security hole. Unprivileged user can execute suid binary with arbitrary code injected instead of official interpreter. Even for the static suid binary, it seems. Please correct me.

Even ignoring this problem, I highly dislike the idea. If binary requires some specific interpeter, what possible process state could be get into if using wrong interpreter ?

I would suggest, instead, that for cap-mode execve(2), make kernel verify that the interpreter path in the binary PT_INTERP is equal to the ABI brandinfo interp_path and execute namei() for interp_path in non-cap mode. If pathes are not equal, return ENOCAP.

FWIW, I'm not a fan of the NULL terminated argv and envv arguments as they make argument validation in CheriABI harder, but I mostly think that ship has sailed. If I were to fix it I'd make the syscall a into a __ffexecve() that takes lengths and have ffexecve() be a user space wrapper that counts the lengths of the vectors and passes them down.

Hmmm. Might it make sense to convert all of the exec syscalls in that way together by modifying exec_copyin_args? I suppose that'd preclude an MFC to 11, but maybe that's still the best way to go?

sys/kern/kern_exec.c
275

That's a good point...

In D9030#221707, @kib wrote:

As is, the thing is unbelievable security hole. Unprivileged user can execute suid binary with arbitrary code injected instead of official interpreter. Even for the static suid binary, it seems. Please correct me.

Code review to the rescue! I knew there was something I'd forgotten. :)

We could add a check for credential_changing && imgp->interpreter != -1 at kern_exec.c:548 and disallow the use of user-specified interpreters on setuid/setgid binaries.

Even ignoring this problem, I highly dislike the idea. If binary requires some specific interpeter, what possible process state could be get into if using wrong interpreter ?

I would suggest that's the responsibility of the caller to deal with: if you use the non-default run-time linker, the consequences are yours to deal with.

I would suggest, instead, that for cap-mode execve(2), make kernel verify that the interpreter path in the binary PT_INTERP is equal to the ABI brandinfo interp_path and execute namei() for interp_path in non-cap mode. If pathes are not equal, return ENOCAP.

That would help with the problem of executing things from capability mode, but to me, this feels like a general-purpose functionality that a kernel ought to be able to provide. It would be helpful when, for example, making changes to the run-time linker itself: I could try executing binaries using the normal linker specified by PT_INTERP and then compare their behaviour to that of my alternate linker (which hasn't been installed into /libexec lest it break everything on the system). In a related use case, I'd like to be able to execute LLVM fat binaries using a linker that understands JIT, but that sort of thing would be very high-risk to add to /libexec/ld-elf.so.1. So, going beyond the previously-mentioned rtld maintenance use case, I might like to have an entirely separate linker than can optionally be used to run a binary with extra smarts. I still want the fallback of the regular linker, however, so PT_INTERP would still be set to /libexec/ld-elf.so.1 for normal execution.

Does that sound plausible?

kib added a comment.May 12 2017, 8:15 PM
In D9030#221707, @kib wrote:

As is, the thing is unbelievable security hole. Unprivileged user can execute suid binary with arbitrary code injected instead of official interpreter. Even for the static suid binary, it seems. Please correct me.

Code review to the rescue! I knew there was something I'd forgotten. :)

We could add a check for credential_changing && imgp->interpreter != -1 at kern_exec.c:548 and disallow the use of user-specified interpreters on setuid/setgid binaries.

Even ignoring this problem, I highly dislike the idea. If binary requires some specific interpeter, what possible process state could be get into if using wrong interpreter ?

I would suggest that's the responsibility of the caller to deal with: if you use the non-default run-time linker, the consequences are yours to deal with.

I would suggest, instead, that for cap-mode execve(2), make kernel verify that the interpreter path in the binary PT_INTERP is equal to the ABI brandinfo interp_path and execute namei() for interp_path in non-cap mode. If pathes are not equal, return ENOCAP.

That would help with the problem of executing things from capability mode, but to me, this feels like a general-purpose functionality that a kernel ought to be able to provide. It would be helpful when, for example, making changes to the run-time linker itself: I could try executing binaries using the normal linker specified by PT_INTERP and then compare their behaviour to that of my alternate linker (which hasn't been installed into /libexec lest it break everything on the system). In a related use case, I'd like to be able to execute LLVM fat binaries using a linker that understands JIT, but that sort of thing would be very high-risk to add to /libexec/ld-elf.so.1. So, going beyond the previously-mentioned rtld maintenance use case, I might like to have an entirely separate linker than can optionally be used to run a binary with extra smarts. I still want the fallback of the regular linker, however, so PT_INTERP would still be set to /libexec/ld-elf.so.1 for normal execution.

Does that sound plausible?

Let's split two things. I thought that your issue at hand was the conflict between the nature of capability mode disallowing implicit root and absolute lookups, badly interfering with the typical absolute path specification for ELF interpreters. Completely different issue is the development environment for rtld and possible override of the binary interpreter. For the later, I can assure you that I did not need this functionality while hacking on FreeBSD rtld for almost ten years. I usually relink the test binary with -I path linker switch, as needed. But if you consider the ability to switch interpreter without changing the binary important, I suggest to look at the glibc solution there: their ld.so is executable and the first argument is the name of the starting binary. I.e., you can do ld.so.1 <ld.so options> binary <binary args> and run specific instance of ld.so. It is very easy to add this functionality to FreeBSD, it just that nobody needed it.

That said, would making ld-elf.so.1 executable (adj) help your intended capability-related issue ? I suspect that yes, and also I think that my proposal of checking brandelf interpreter path against binary interpreter path should close the hole.

In D9030#221740, @kib wrote:

Let's split two things. I thought that your issue at hand was the conflict between the nature of capability mode disallowing implicit root and absolute lookups, badly interfering with the typical absolute path specification for ELF interpreters.

Fair enough: that was, indeed, the original motivation. However, as soon as we start building more sophisticated compartments out of sandboxed-from-inception processes, I'm going to want to explore more interesting uses like the LLVM fat binary case (in which we will want a different linker from the one specified by ELF brandinfo). I see three mutually-compatible requirements:

  • must be able to execute dynamically-linked applications from capability mode,
  • must be able to choose a non-standard run-time linker and
  • should not require root privilege to do it.

I think that an ffexecve(2) system call is a clean, fairly simple mechanism for supporting these and other requirements: it's a policy-agnostic mechanism that one might use to explore many different ideas in sandboxing. The approach that you suggested ("make kernel verify that the interpreter path in the binary PT_INTERP is equal to the ABI brandinfo interp_path and execute namei() for interp_path in non-cap mode") makes the ABI more complicated for non-expert developers to understand and also (this is the part that makes me nervous) deliberately violates Rule #1 of capability mode: no access to global namespaces.

Completely different issue is the development environment for rtld and possible override of the binary interpreter. For the later, I can assure you that I did not need this functionality while hacking on FreeBSD rtld for almost ten years. I usually relink the test binary with -I path linker switch, as needed. But if you consider the ability to switch interpreter without changing the binary important, I suggest to look at the glibc solution there: their ld.so is executable and the first argument is the name of the starting binary. I.e., you can do ld.so.1 <ld.so options> binary <binary args> and run specific instance of ld.so. It is very easy to add this functionality to FreeBSD, it just that nobody needed it.

I can see how that works outside of capability mode, but that won't help within capability mode, correct? It seems like the approaches you've described either require root privilege (to put the linker-under-test into a location described by ELF brandinfo in the kernel) or else can't work in capability mode.

That said, would making ld-elf.so.1 executable (adj) help your intended capability-related issue ?

I don't think that would help with the capability-mode part: an executable ld-elf.so.1as you've described would still need to look up binaries by pathname.

I think that my proposal of checking brandelf interpreter path against binary interpreter path should close the hole.

It would, indeed, close the setuid hole that you mentioned earlier, but it would also shut down the possibility of unprivileged capability-mode tinkering with run-time linkers (unless I'm misunderstanding something). Is there an issue with the setuid solution that I proposed earlier (only honour setuid if imgp->interpreter == -1, just as we currently only honour setuid if CRED_FLAG_CAPMODE is not found)?

jonathan updated this revision to Diff 28282.May 13 2017, 1:35 AM
  • Don't honour setuid/setgid with custom interpreter.
This revision now requires review to proceed.May 13 2017, 1:35 AM
jonathan updated this revision to Diff 28283.May 13 2017, 1:40 AM
  • Don't honour setuid/setgid with custom interpreter.

One aspect I've been struggling with in this approach is duplication of the logic to find run-time linkers -- i.e., shifting responsibility for ELF header parsing from the kernel to userspace, which seems generally undesirable. One possibility might be to pass a capability to a directory relative to which the kernel should look for the interpreter. This would fail to address the "use a run-time linker other than the one in the binary" use case, but would allow the kernel to continue to own ELF header processing (and similar for non-ELF binaries).

There's a similar question in my mind about whether there is a good way to handle script interpreters more generally, but my imagination is getting a bit of a bumpy ride there.

kib added a comment.May 13 2017, 12:23 PM
In D9030#221740, @kib wrote:

Let's split two things. I thought that your issue at hand was the conflict between the nature of capability mode disallowing implicit root and absolute lookups, badly interfering with the typical absolute path specification for ELF interpreters.

Fair enough: that was, indeed, the original motivation. However, as soon as we start building more sophisticated compartments out of sandboxed-from-inception processes, I'm going to want to explore more interesting uses like the LLVM fat binary case (in which we will want a different linker from the one specified by ELF brandinfo). I see three mutually-compatible requirements:

  • must be able to execute dynamically-linked applications from capability mode,
  • must be able to choose a non-standard run-time linker and
  • should not require root privilege to do it.

    I think that an ffexecve(2) system call is a clean, fairly simple mechanism for supporting these and other requirements: it's a policy-agnostic mechanism that one might use to explore many different ideas in sandboxing. The approach that you suggested ("make kernel verify that the interpreter path in the binary PT_INTERP is equal to the ABI brandinfo interp_path and execute namei() for interp_path in non-cap mode") makes the ABI more complicated for non-expert developers to understand and also (this is the part that makes me nervous) deliberately violates Rule #1 of capability mode: no access to global namespaces.

My point is that ffexecve(2) -like syscall is not needed.

Completely different issue is the development environment for rtld and possible override of the binary interpreter. For the later, I can assure you that I did not need this functionality while hacking on FreeBSD rtld for almost ten years. I usually relink the test binary with -I path linker switch, as needed. But if you consider the ability to switch interpreter without changing the binary important, I suggest to look at the glibc solution there: their ld.so is executable and the first argument is the name of the starting binary. I.e., you can do ld.so.1 <ld.so options> binary <binary args> and run specific instance of ld.so. It is very easy to add this functionality to FreeBSD, it just that nobody needed it.

I can see how that works outside of capability mode, but that won't help within capability mode, correct? It seems like the approaches you've described either require root privilege (to put the linker-under-test into a location described by ELF brandinfo in the kernel) or else can't work in capability mode.

Why ? fexecve(2) is enough to run ld.so which can be run standalone. What is slightly less obvious is how to pass the binary to ld.so, but: ELF format specified a way to run binaries by filedescriptor for long time. Look at the AT_EXECFD aux vector, we still have this code in rtld.c (it might somewhat rotten, but this is easily fixable). Then the standalone-runnable ld.so only needs to grow an option to specify which fd points to the binary, in addition to AT_EXECFD.

That said, would making ld-elf.so.1 executable (adj) help your intended capability-related issue ?

I don't think that would help with the capability-mode part: an executable ld-elf.so.1as you've described would still need to look up binaries by pathname.

See above.

I think that my proposal of checking brandelf interpreter path against binary interpreter path should close the hole.

It would, indeed, close the setuid hole that you mentioned earlier, but it would also shut down the possibility of unprivileged capability-mode tinkering with run-time linkers (unless I'm misunderstanding something). Is there an issue with the setuid solution that I proposed earlier (only honour setuid if imgp->interpreter == -1, just as we currently only honour setuid if CRED_FLAG_CAPMODE is not found)?

I did not thought about sufficiency of the proposed fix hard, it might be enough. I do not like the whole approach of exposing the detail of the ELF binary format into the public syscall layer. As Robert rightfully noted, coupling of binary and interpreter is mostly the duty of the toolchain, there is no need to expose that to user (in the sense, app programmer). More, I do not expect that app programmers could do anything useful with this.

In D9030#221824, @kib wrote:

What is slightly less obvious is how to pass the binary to ld.so, but: ELF format specified a way to run binaries by filedescriptor for long time. Look at the AT_EXECFD aux vector, we still have this code in rtld.c (it might somewhat rotten, but this is easily fixable). Then the standalone-runnable ld.so only needs to grow an option to specify which fd points to the binary, in addition to AT_EXECFD.

Hmmm, I see. I suppose the question is whether we want to change the ABI of the kernel by adding a new system call or change the ABI of ld.so by adding a way to pass a file descriptor to it when it runs in standalone mode. I prefer the former because it makes the new argument explicit and visible to the compiler rather than implicit within an environment variables or the like. I think that we are changing an ABI either way, and I also think that the sort of people who write application frameworks (things like Gtk and Qt are good targets for Capsicumization) are more likely to be comfortable with new system calls than with new run-time linker semantics. Still, it's a discussion worth having... does the following pseudocode express the various possible approaches fairly?

# Using a system call with linker argument:
ld = find_the_linker_for(binary)
ffexecve(ld, binary, args, env = [])

# Using an executable linker:
ld = find_the_linker_for(binary)
fexecve(ld, args, env = [ sprintf("LD_BINARY=%d", binary) ])

# Using a system call with linker dir argument:
ld_dir = find_the_linker_directory()   # usually just /libexec, but not in our experimental world
something_execve(ld_dir, binary, args, env = [])

# Using implicit paths:
fexecve(binary, args, env = [])        # requires linker in /libexec, violates capmode rules

I do not like the whole approach of exposing the detail of the ELF binary format into the public syscall layer. As Robert rightfully noted, coupling of binary and interpreter is mostly the duty of the toolchain, there is no need to expose that to user (in the sense, app programmer). More, I do not expect that app programmers could do anything useful with this.

In the pseudocode above, the first two approaches still require the programmer to get a descriptor from somewhere for the run-time linker. I agree that the find_the_linker_for function should be implemented by some part of the toolchain, but the latter two approaches don't really hide the details of ELF binary interpretation either: approach 3 requires creating a directory with linker(s) having specific names for specific ELF brands and approach 4 requires the same but also using root privilege to install the linker(s) to /libexec (and it also violates capability-mode rules). It seems to me that a "find me the right linker for this sort of file" function could provide the greatest information hiding while also allowing different implementations of the abstraction (something in the ELF toolchain for normal use, a Casper service for a Capsicum world, an LLVM-oriented search algorithm for the world of fat binaries, etc.).

kib added a comment.EditedMay 14 2017, 1:16 PM

Well, both approaches change ABI by extending it, we do not usuall call this 'changing the ABI'. But important point is that new syscall extends kernel ABI, and by its nature, this change is with us forever. If we start consider the approach wrong and better approach emerges, we still must maintain the syscall, due to the backward compatibility guarantees.

The syscall itself exposes toolchain internals to userspace API, which was never needed before. Using the direct execution mode with ld-elf.so.1 demonstrates that we can avoid touching kernel at all for this task. Not needing new code in kernel, not needing to expose new interfaces from kernel, localizing new code in something that is very specific to ELF and deals with interprets anyway (it is interpreter) feel as good properties of the knowingly intermediate solution.

I do not have a good proposal for find_the_linker_for(binary) utility right now. I suspect that actual user-visible API would need to be much more high-level, like cap_fexecve(binary_fd, args...), completely hiding the app code from any peculiarities of the binary format (ELF interpreter is very low level detail of ELF). Otherwise API is inherently non-portable, e.g. to COFF or MachO platforms. E.g. interpreter could be found, approved and passed by fd back by some helper, as is done for many features already. Then, API implementation could be designed free from the user-API considerations, and avoiding new interface in kernel is possible without much efforts (as demonstrated by direct exec mode for ld.so), which IMO alone is enough argument to considered it as nice possibility and not creating new kernel interface.

As you see, I do not like the proposed syscall, I do not see it as necessary, and I am concerned about the syscall quickly becoming unused fossil which we still have to maintain.

In D9030#222200, @kib wrote:

But important point is that new syscall extends kernel ABI, and by its nature, this change is with us forever. If we start consider the approach wrong and better approach emerges, we still must maintain the syscall, due to the backward compatibility guarantees.

Indeed, that's true, but would the same not apply to a new mechanism for passing an FD into ld-elf.so.1 (e.g., LD_BINARY_FD=xxx)? I had been operating under the assumption that ABI backwards-compatibility would include the mechanisms used to control the run-time linker like environment variables. If anything, I had thought the linker's ABI would be harder to maintain, as it doesn't benefit from ABI checking tools and the linker doesn't have explicit compat mechanisms like the kernel. However, perhaps my assumption was incorrect?

The syscall itself exposes toolchain internals to userspace API, which was never needed before. Using the direct execution mode with ld-elf.so.1 demonstrates that we can avoid touching kernel at all for this task. Not needing new code in kernel, not needing to expose new interfaces from kernel, localizing new code in something that is very specific to ELF and deals with interprets anyway (it is interpreter) feel as good properties of the knowingly intermediate solution.

True, some of those things do feel good in an intermediate solution, but to really get sandboxed code executing, a new interface and some new code has to go somewhere very low-level (either imgact code or the linker).

I do not have a good proposal for find_the_linker_for(binary) utility right now. I suspect that actual user-visible API would need to be much more high-level, like cap_fexecve(binary_fd, args...), completely hiding the app code from any peculiarities of the binary format (ELF interpreter is very low level detail of ELF).

That is certainly true: however we end up resolving this, the end goal ought to be a library that doesn't require use of this low-level interface (ffexecve(2) or LD_BINARY_FD) by high-level application programmers.

As you see, I do not like the proposed syscall, I do not see it as necessary, and I am concerned about the syscall quickly becoming unused fossil which we still have to maintain.

I definitely hear your concern. I suppose that, in the long run, I see the system call as being relatively easy to maintain because it shares so much of its code with existing exec-like syscalls and because the kernel has existing mechanisms for dealing with compatibility. Personally I find ld-elf.so.1 to be more subtle and intimidating to understand and modify, but as I mentioned above, if we don't have to worry about backwards-compatibility there, then it does indeed seem like an obvious way to go.

jonathan abandoned this revision.May 15 2017, 11:38 PM

I've just opened D10751 as a more rtld-centric way of achieving (hopefully) the same effect, now that D10701 has landed and ld-elf.so.1 is directly executable.