Page MenuHomeFreeBSD

Jail descriptors
Needs ReviewPublic

Authored by jamie on Feb 1 2024, 12:23 AM.
Tags
None
Referenced Files
Unknown Object (File)
Nov 23 2024, 1:52 PM
Unknown Object (File)
Nov 22 2024, 4:33 AM
Unknown Object (File)
Nov 14 2024, 2:35 PM
Unknown Object (File)
Nov 5 2024, 12:35 PM
Unknown Object (File)
Oct 31 2024, 3:51 PM
Unknown Object (File)
Oct 16 2024, 4:19 PM
Unknown Object (File)
Oct 2 2024, 9:01 PM
Unknown Object (File)
Sep 26 2024, 1:36 AM

Details

Reviewers
brooks
bz
dch
Group Reviewers
Jails
security
Summary

Similar in spirit to process descriptors, jail descriptors are a reference to a jail using the file interface.

Jail descriptors can be created along with the underlying prison with jail_set(2), or fetched later with jail_get(2). They can then be used instead of the jid when a jail is used, via the new system calls jd_attach(2) and jd_remove(2), and also via the "desc" parameter in jail_get(2) and jail_set(2).

Note that jail_set and jail_get can both create a new descriptor or reference an existing one. Passing a negative file number in the "desc" parameter indicates a request to create and return the descriptor in that parameter, and a zero/positive file number is expected to refer to an existing descriptor (any thing else is an error).

Jail descriptors eliminate the race problem where a jail with a given ID may be destroyed and re-created, and not be the same jail that was referred to before. A descriptor always refers to a single jail over (and beyond) its lifetime, and it is an error to try use an existing descriptor to create a jail. So when a the jail on a descriptor claims to be dead, you can be sure it stays that way.

Permission for jail operations using descriptors is checked against the credentials of the process that first created the descriptor. This allows passing of jail descriptors to non-privileged users who could then for instance attach to a jail without being root.

Future plans, not yet done:

  • A kevent interface for jail operations. At least jail death, possibly also attach, set, and child creation, as may be found useful.
  • A mode where jail descriptor lifetime controls the prison itself, i.e. the prison is removed on the last close. This is how all process descriptors already work.
  • Finer controls (i.e. further constraint) on what a descriptor owner can do, via file properties (user/group and permissions) and capsicum flags.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

jamie requested review of this revision.Feb 1 2024, 12:23 AM

If you have reason to upload a new diff (e.g. addressing some feedback) please upload with full context - e.g. git diff -U99999 or git show -U99999 <hash>. Full details at https://wiki.freebsd.org/Phabricator

bz requested changes to this revision.Feb 1 2024, 1:56 PM
bz added a subscriber: bz.

Also, can we please have a man page?

sys/sys/jail.h
119

Can we spell these out as jail_desc_attach() or jaildesc_attach() or something similar but at least preserve the full jail* prefix?
I am just picking this place but obviously that pull through the changes.

This revision now requires changes to proceed.Feb 1 2024, 1:56 PM

I read the description, but I'm still not sure how a jail descriptor would be used. How about some pseudo-code, to illustrate the concept please?

In D43696#996669, @dvl wrote:

I read the description, but I'm still not sure how a jail descriptor would be used. How about some pseudo-code, to illustrate the concept please?

From just looking at the diff in it's current form I assume the following:

  • jail_set() with the JAIL_CREATE flag can be used to create a jail like before, but you can pass the new "desc" argument with a negative number (e.g. -1) as argument. This would cause jail_set() to attempt to create a new jail descriptor bound to the the jail and put into the calling process. The file descriptor number is returned by overwriting the previously negative int argument.
  • jail_get() can be used the same way to get a jail descriptor for an already existing jail.
  • jail_set() can than be used with the "desc" argument and a non-negative file descriptor number instead of the jail id in which case the permission checks are performed against the credentials used to create the jail file descriptor instead of the calling processes current permissions.
  • Or it could be required to use the new syscalls for this.
  • It's planned, but not yet implemented, to register a jail descriptor with kqueue using a new filter to select which types of events to notify (a variable was modified, a process attached, the jail was destroyed, etc.).
  • Not mentioned but also missing to allow safe delegation to unprivileged users is a way to restrict the capability represented by a jail descriptor e.g. attach only, destroy only, watch only.
lib/libc/sys/Symbol.map
420

Syscalls for 15.0 go in FBSD_1.8

sys/kern/kern_jail.c
2825

Is the i a typo?

sys/kern/kern_jaildesc.c
121

Blank line no longer required by style(9)

sys/sys/user.h
455

Why a spare? New entires seem not to include them and old ones put a huge block at the top.

sys/kern/kern_jail.c
2825

Yes - copy/paste from the jail_attach_args comment.

sys/sys/user.h
455

Yes, it does seem out of place. It preserves the 64-bit alignment, but that's not necessary.

sys/sys/jail.h
119

Sure. I went with "jd" from similarity with the "pd" calls, but "jaildesc" works.

sys/sys/jail.h
119

Maybe jail_attach_jd() or jail_attach_desc()? I somewhat prefer jd to desc.

markj added inline comments.
sys/sys/jail.h
119

Or jaild_ or jailfd_? :)

We have a generic __specialfd system call that can be used to return descriptors of different types. I just bring it up to show that there's some precedent for an "fd" suffix.

The __specialfd() system call is good to know about, but on its own doesn't provide a way to atomically create a configured jail with its lifetime tied to the file descriptor unlike passing a negative integer as "desc" argument to jail_set. There would either be a window where the jail would exist without a file descriptor associated with it which would reintroduce races and resource leak or the file descriptor has to start out unattached to a jail and be passed to the jail_set(JAIL_CREATE) call. The later design choice would get rid of the awkwardness of (ab-)using negative file descriptor numbers to request a file descriptor via an in-out parameter.

sys/sys/jail.h
119

Maybe jail_attach_jd() or jail_attach_desc()? I somewhat prefer jd to desc.

Yes, that jail_xxx_jd sounds good. What about the parameter - would a "jd" parameter be better than the current "desc"? My only concern is its similarity to "jid".

In D43696#996617, @bz wrote:

Also, can we please have a man page?

Of course I'll need to do that sometime ;-). The parameters to get/set the descriptor will have their place in jail(8) where all the other parameters live, though they're kind of strange there because it's very much not part of the "(8)" side of things. Also jail_attach_jd and jail_remove_jd (or whatever we settle on) could be added to jail(2) which has the other jail-related calls.

For how the whole thing works, I suppose a jaildesc(4) works, again as a counterpart to procdesc(4).

Here's the latest diff to address concerns so far (except those that request proper documentation).

jamie marked 5 inline comments as done.Feb 5 2024, 4:28 AM

The syscall bits look good (module feedback on a flag argument which you can take or leave). I've not reviewed the descriptor bits in great detail so another review by someone more familiar with that code would be good.

sys/kern/kern_jail.c
2702–2705

Should there be a KASSERT that only one flag is set?

sys/kern/syscalls.master
3338

Is there any possibility one might want flags in the future? An unused, must-be-zero flags argument is cheap.

sys/kern/kern_jaildesc.c
121

Perhaps assert that jd->jd_prison == NULL before the assignment?

142

The opening brace should be on the previous line.

sys/kern/syscalls.master
3341

How do jail descriptors fit into rights(4)? Should we define fine-grained rights on a jail descriptor, akin to CAP_PDGETPID, CAP_PDKILL, etc.?

sys/sys/jaildesc.h
50

It looks like this list is unused. Do you have some future plans for it?

sys/kern/kern_jail.c
2702–2705

Perhaps - this pattern is used a lot in kern_jail.c, and the flags are only local to this file. There are likely a number of places where I could assert such.

sys/kern/kern_jaildesc.c
142

Indeed. I often miss this because it's my own coding style for non-FreeBSD stuff.

sys/kern/syscalls.master
3338

I think not, because such a flag would mostly likely apply to the non-descriptor versions of both of these calls. Though I have to admit if there were a flag word passed to the old calls, I could have made these calls into a flag on them.

3341

Yes, that's what I'm planning as part of this. Capabilities for set, attach, remove.

sys/sys/jaildesc.h
50

Yes, it is for kevent. When a reportable event happens to a jail, it will traverse this list to report it.

dch requested changes to this revision.Mar 2 2024, 1:31 PM
dch added inline comments.
lib/libc/sys/Symbol.map
420

does this now move to lib/libsys/Symbol.sys.map after libc -> libsys changes?

recent CURRENT doesn't have Symbol.map anymore.

This revision now requires changes to proceed.Mar 2 2024, 1:31 PM

@jamie this not longer builds against CURRENT, if possible can you massage it a little, and then commit it so we don't get any more rot? I fixed symbol.map changes in https://git.sr.ht/~dch/src/commit/9051eedd52fc but that's not sufficient.

Diff updated for libsys and other recent changes.

lib/libc/sys/Symbol.map
420

That's correct. Most likely, just rebasing the patch will do the right thing today.

lib/libsys/Symbol.sys.map
382–383 ↗(On Diff #138628)
sys/kern/kern_jail.c
1071

This function seems to mostly favor comparing error to 0.

sys/sys/jail.h
425–426

These seem not to exist?

sys/kern/kern_jail.c
1071

Indeed. I can change this to match.

sys/sys/jail.h
425–426

Indeed - I found there was no need for those, since there's little overlap before do_jail_attach of prison_deref is called. I'll remove these lines.