Capsicum vs the Pathnames, a PoC
Needs Review (Public)

Authored by trasz on Mar 15 2024, 12:48 PM.

Details

Reviewers
brooks
val_packett.cool
jonathan
Group Reviewers
capsicum
Summary

This is a proof of concept implementation of some changes to how Capsicum
handles path names. It's in some ways similar to D38351 by Val Packett,
but implemented quite differently. The primary motivation is to make it possible
to execute binaries in capability mode from the start, without having to trust them.

The way this works now is that absolute path lookups are prohibited,
and relative lookups are only allowed with an explicitly provided directory
descriptor.

The way it works with the patch is that both are allowed, but only
if the process - or its ancestor - called fchdir(2) and fchroot(2)
to set the descriptors the (newly allowed) lookups are relative to.
Calling cap_enter(2) clears both descriptors again.
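
The rule described above can be written down as a small userspace model. This is only a sketch of one reading of the summary, not code from the patch: every name in it is hypothetical, and it assumes that absolute lookups depend only on the root descriptor, that lookups relative to the working directory depend only on the cwd descriptor, and that lookups with an explicit directory descriptor stay allowed as before.

```c
#include <stdbool.h>

/* Hypothetical model state; none of these names come from the patch. */
struct lookup_state {
	bool in_capmode;	/* cap_enter(2) has been called */
	bool root_set;		/* fchroot(2) called since the last cap_enter(2) */
	bool cwd_set;		/* fchdir(2) called since the last cap_enter(2) */
};

enum lookup_kind {
	LOOKUP_ABSOLUTE,	/* path starts with '/' */
	LOOKUP_CWD_RELATIVE,	/* relative path, no explicit directory fd */
	LOOKUP_DIRFD_RELATIVE	/* relative path with an explicit directory fd */
};

/* Entering capability mode clears both lookup descriptors. */
void
model_cap_enter(struct lookup_state *s)
{
	s->in_capmode = true;
	s->root_set = false;
	s->cwd_set = false;
}

/* Setting the root descriptor re-enables absolute lookups (assumed). */
void
model_fchroot(struct lookup_state *s)
{
	s->root_set = true;
}

/* Setting the cwd descriptor re-enables cwd-relative lookups (assumed). */
void
model_fchdir(struct lookup_state *s)
{
	s->cwd_set = true;
}

/* Decide whether a lookup of the given kind is permitted in state s. */
bool
lookup_allowed(const struct lookup_state *s, enum lookup_kind kind)
{
	if (!s->in_capmode)
		return (true);
	switch (kind) {
	case LOOKUP_ABSOLUTE:
		return (s->root_set);
	case LOOKUP_CWD_RELATIVE:
		return (s->cwd_set);
	case LOOKUP_DIRFD_RELATIVE:
		return (true);
	}
	return (false);
}
```

Since the state would be inherited by children, copying the struct stands in for fork(2) in this model. A second model_cap_enter() call returns the process to the strict behaviour.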

There is a (pretty terrible, and obviously temporary) hack
to the chroot(8) utility to run binaries in capability mode "by hand":

$ chroot -Cdn 5 /bin/sh 5< /

Regarding the Capsicum security model, I believe the lookup change leaves it intact.
The directory descriptors for lookups still need to be provided by the process,
like before; it's just that now it can ask the kernel to use them for absolute
and relative lookups instead of having to explicitly pass them to APIs like openat(2).

Sponsored by: Innovate UK

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 56625
Build 53513: arc lint + arc unit

Event Timeline

I worry somewhat about interactions with dlopen, which was previously disabled in capability mode by virtue of breaking open(2). It's true that fdlopen existed, but that's a somewhat different beast and I suspect its users are more likely to be audited.

I kind of want to disallow fchroot in capability mode and have a cap_enter2 that takes a root fd and a flags argument that includes flags to disable this functionality, but that also feels like it adds complexity.

sys/kern/kern_mib.c
102

Not obviously related to the rest of this patch. Seems generally fine though.

sys/kern/syscalls.master
146

At first glance, I find myself wanting a separate flag from SYF_CAPENABLED so we can potentially deny these syscalls in syscallenter if both curdir and root are ecapmodevp. I'm not sure this is actually a good idea, but it's easier to make the annotations in syscalls.master different now.

160

Leakage from D44372?

usr.bin/procstat/procstat_files.c
148

Old binaries will still use this so might as well keep it until we're ready to completely remove API support.

Can you describe the dlopen threat model a bit more? My assumption is, a typical Capsicum-aware app wouldn't be setting the rootdir/curdir at all. Or, if it does, it could call cap_enter(2) again before calling dlopen(3), clearing those vnodes.

Also, do you think it makes sense to split off fchroot(2) and get that bit committed first?

sys/kern/kern_mib.c
102

Yeah, bmake refuses to work without it. I suppose it should be fixed in bmake and not here though; this often contains personal information (builder's hostname and login).

sys/kern/syscalls.master
146

I was thinking about something similar - a separate flag to be used by an explicit sysctl to switch back to the old semantics in case of a security bug. It did make the patch quite a bit larger though.

160

Yup.

usr.bin/procstat/procstat_files.c
148

Makes sense.

> Can you describe the dlopen threat model a bit more? My assumption is, a typical Capsicum-aware app wouldn't be setting the rootdir/curdir at all. Or, if it does, it could call cap_enter(2) again before calling dlopen(3), clearing those vnodes.

My worry is dlopen calls the developer is unaware of (or not thinking about) suddenly working. For example, things that look like iconv or nss that didn't use to work and now could be coerced to work. I'm not sure how serious an issue this is.

> Also, do you think it makes sense to split off fchroot(2) and get that bit committed first?

It probably does make sense to commit separately. This review is pretty large.

sys/kern/kern_mib.c
102

Hmm, could we make it a SYSCTL_PROC and output something more reserved in capability mode?