Page MenuHomeFreeBSD

mac_curtain infrastructure: everything
Needs ReviewPublic

Authored by sigsys_gmail.com on Tue, May 3, 7:53 AM.

Details

Summary

This new diff is everything needed to support mac_curtain (https://github.com/Math2/freebsd-pledge).

(Previous diff was D34761)

Sysfils are gone. Now syscalls have only one category ("syscat") and it's independent from the Capsicum flag. The general idea behind the categorization is still the same, I just broke them down a bit more.

Some notes on the kernel changes:

Not all syscats are needed for pledge() compatibility (and some of them might not be worth the noise it adds to the code).

The mac_vnode_walk_*() functions must be called when unveil permissions might need to propagate to another vnode (e.g. after VOP_LOOKUP()/VOP_CREATE(), or when traversing a vnode's mount points, etc), or whenever bringing in a new vnode for which unveil permissions must be determined. They update the thread's "unveil tracker" which the mac_vnode_check_*() functions use. fget() and namei() fill the tracker automatically and that's enough for nearly all callers.

The change in linux_file_stat() is needed to make sandboxed DRM work. It's consistent with how vn_statfile() does it.

The changes in zfs_vnops_os.c aren't just so that the unveil permissions carry down to the extattr sub-files, it's also to stop mac_curtain from trying to find the "covering" directory (by ascending the directory hierarchy, when needed) because namei() is called while holding a vnode lock on the parent.

mac_cred_trim() needs to be called when a ucred is no longer being referenced from processes or threads (mac_curtain needs to free some stuff then).

I added a mac_socket_check_create_pair() separate from mac_socket_check_create() (like was suggested in an existing code comment). It's useful for pledge() compat. This could be a compatibility issue with existing MAC policies, but the stock policies don't use it.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

hselasky added a subscriber: hselasky.

LinuxKPI bits are OK.

This revision is now accepted and ready to land.Tue, May 3, 9:05 AM

Overall this change seems generally well contained for something that touches most syscalls. There are a few things that seem stray that I've noted.

I'm a bit worried about adding an on-by-default (we enable MAC in GENERIC for mac_ntpd) branch and function call to many, if not most, syscalls. Should the SY_CAP bits be under MAC or under a separate define? Presumably your goal is to move toward something usable with GENERIC so that would argue for not adding a separate knob.

sys/kern/kern_descrip.c
2922

fdp is unused?

sys/kern/kern_prot.c
2091

This looks like it might just be a bug that should be fixed in a separate submission or have the locking rules changed?

sys/kern/kern_sysctl.c
66

It's not clear to me where the event handler changes are being used if anywhere.

sys/kern/uipc_shm.c
1808–1809

This weird duplicate if should probably be removed in a separate commit.

1836

It seems a bit odd to use off here when it's always zero.

sys/security/mac/mac_framework.h
426

I think this function is never called unless MAC is defined so the #ifdef seems redundant. Am I missing something?

sys/sys/syscat.h
103–106

We tend to put static asserts in .c files so they only evaluate once. I'm not completely sold on this practice.

sys/sys/sysent.h
315

What does _HLP_ stand for?

I've also noticed that syscall_helper_register is only called with SY_THR_STATIC|SY_HLP_PRESERVE_SYFLAGS which seems a bit absurd. I wonder if we should break its API accordingly and make those flags internal.

sys/tools/makesyscalls.lua
1231
1241
usr.bin/diff/diffreg.c
478

This seems unrelated

usr.bin/m4/eval.c
852

this also looks unrelated.

Overall this change seems generally well contained for something that touches most syscalls. There are a few things that seem stray that I've noted.

I'm a bit worried about adding an on-by-default (we enable MAC in GENERIC for mac_ntpd) branch and function call to many, if not most, syscalls. Should the SY_CAP bits be under MAC or under a separate define? Presumably your goal is to move toward something usable with GENERIC so that would argue for not adding a separate knob.

Hopefully the new branches are predictable. Maybe a __predict_false() around the CRED_FLAG_SYSFILT checks would be better. This ucred flag is only set for processes explicitly using mac_curtain (or their child processes) so the actual policy calls for most of those new MAC checks should never happen for other processes.

And yeah my goal would be that mac_curtain works with GENERIC, then get the rest of it in ports and eventually get firefox/chromium/webkit to use it internally (when it's available, without a hard dependency) with just package installs and enabling it in a config file.

sys/kern/kern_prot.c
2091

This is the last reference to the ucred, so IIUC it shouldn't matter when the mutex is destroyed. But this was causing witness warnings so I just moved the destroy up a little bit...

sys/kern/kern_sysctl.c
66

WELL, I know of a module that uses it... ;)

When you give permissions to use sysctls, mac_curtain holds references directly to sysctl_oid objects so it needs to know when they're going away. It also uses eventhandler for some process/thread events too (to maintain the unveil caches/trackers).

sys/kern/uipc_shm.c
1808–1809

There used to be some synchronization code between those two checks at some point.

1836

Yeah... it's weird. I'll remove it.

sys/security/mac/mac_framework.h
426

Not sure. It's not called but it's still processed by the compiler (mac_framework.h still gets included in other source files even with MAC disabled) and it feels wrong to have the inline functions refer to other functions that aren't going to be defined (the *_impl() ones), but it doesn't cause any errors somehow...

sys/sys/syscat.h
103–106

Hmm I'd have to find *somewhere* to put it.

sys/sys/sysent.h
315

Yeah seems like a good thing to extract from this patch.

HLP is for "helper".

Seems like SY_HLP_PRESERVE_SYFLAGS could be done by default. Some modules register their sysents with SYF_CAPENABLED set. This duplicates the information already in syscalls.master. Right now I can't see why overriding the existing sysent's sy_flags would ever be needed unless syscalls.master doesn't have the right flags. Which could happen with external kmods register their syscalls in some free slot I guess (like the 210-219 range that's for that). Maybe it could always OR the SYF_CAPENABLED bits between the two.

IIUC SY_THR_STATIC_KLD being conditionally defined is what lets syscall_helper_register() know how its caller was compiled (as a kld or not).

usr.bin/m4/eval.c
852

Yeah could be separated.

That's to make them work in a sandbox. curtain(1) runs things with a separate $TMPDIR by default (and no access to /tmp directly) so anything ignoring $TMPDIR is a problem. Without those changes to diff/m4 you can't run a sandboxed buildworld for example. You can give the sandbox access to /tmp (including a special limited access that works like pledge("tmppath")) but it's less safe and I try to make things work without it.

I would argue that ignoring $TMPDIR is generally wrong even without sandboxing issues. You might point your $TMPDIR to somewhere encrypted for example and programs shouldn't leak their temporary data elsewhere.

sigsys_gmail.com marked an inline comment as done.

Minor cleanups.

This revision now requires review to proceed.Wed, May 4, 3:15 AM

I realized I probably shouldn't update the diff just for small changes... it's disruptive. Especially for a large diffs. Sorry about that!

(And I missed some suggested changes to makesyscalls.lua at first but I saved them now).