Page MenuHomeFreeBSD

Dynamically add jail-enabled filesystems
ClosedPublic

Authored by jamie on Mar 13 2018, 3:58 PM.

Details

Summary

To make a filesystem work in a jail, you currently have to touch a number of kernel code places: you need to add the allow.jail.foofs jail parameter which is in jail.h and two places in kern_jail.c, and then you need to add the VFCF_JAIL flag to your filesystem, and elsewhere test the prison's PR_ALLOW_MOUNT_FOOFS flag.

This patch puts all the work in the jail and vfs framework, leaving the only per-filesystem work adding the single VFCF_JAIL flag to mark the filesystem as wanting to be mountable with a dynamically added jail parameter.

Of course it's still the programmer's responsibility to know that the filesystem in question is actually a good one to use with jails.

I've given this a run-through, but it touches some parts of the codebase I'm not as familiar with, and I'd like to be sure I'm doing it right. In particular, it has security implications since I'm replacing some permissions tests with others, often in different places.

Test Plan

Everything should work as before; in particular...
The combination of allow.mount and enforce_statfs<2 prevents all in-jail mounts.
No filesystems without the VFCF_JAIL flag (e.g. ufs) are mountable in a jail.
Jail-aware filesystems (like procfs) are only mountable when their parameter (allow.mount.procfs) is set.
The same holds true for unmounting and updating mounts.

Unlike before, the allow.mount.foofs parameters don't all exist in advance. kldloading a jail-aware filesystem automatically adds the allow.mount.* parameter, and kldloading a non-jail filesystem does not.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

allanjude added inline comments.
sys/jail.h
230 ↗(On Diff #40231)

Is it safe to change the value of RESERVED_PORTS from 8000 to 80 here? Would it be better to leave the unused bits for now?

(This is a question for my own edification, not a request to change the patch)

sys/jail.h
230 ↗(On Diff #40231)

I suppose theoretically a module could be using that flag, which means I should bump __FreeBSD_version. The reason for the change was to keep pw_allow_names clean (not requiring the e.g. [7] = "..." that pw_flag_names has), but adding that for PR_ALLOW_RESERVED_PORTS would be less bother than a version bump.

I'm not quite a fan of the manual sysctl tree walking. I also don't think you need to worry about pre-creating sysctls if they are written to. I think it is perfectly reasonable to only create the sysctl when the VFS module is loaded (and that's more typical). I think it is cleaner instead of allocating pr_allow bits on the fly, to instead use a flag in the 'struct vfsconf' to be the jail permission. You could just add a new 'VFCF_JAIL_ALLOW' which is a dynamic flag that the sysctl knobs turn on/off. The sysctl node would be a SYSCTL_PROC handler and it can take a pointer to the 'struct vfsconf' as its arg2 value. This avoids concerns about running out of bits, etc. For this you would want to change prison_check_vfs() to take a pointer to 'struct vfsconf' instead of the name.

kern/kern_jail.c
188 ↗(On Diff #40231)

Any reason to not let the compiler autocompute the size still? 'NBBY * NBPW' doesn't seem to make sense for setting the number of pointers in the array either.

In D14681#308495, @jhb wrote:

I'm not quite a fan of the manual sysctl tree walking. I also don't think you need to worry about pre-creating sysctls if they are written to. I think it is perfectly reasonable to only create the sysctl when the VFS module is loaded (and that's more typical). I think it is cleaner instead of allocating pr_allow bits on the fly, to instead use a flag in the 'struct vfsconf' to be the jail permission. You could just add a new 'VFCF_JAIL_ALLOW' which is a dynamic flag that the sysctl knobs turn on/off. The sysctl node would be a SYSCTL_PROC handler and it can take a pointer to the 'struct vfsconf' as its arg2 value. This avoids concerns about running out of bits, etc. For this you would want to change prison_check_vfs() to take a pointer to 'struct vfsconf' instead of the name.

I could replace the sysctl tree walking with a search through the pr_allow_names array (which I could protect under proson0.pr_mtx since I take it already). I got the sysctl-walking idea from existing VFS code that happened to be sitting near what I was working on.

The pre-creating of the parameter sysctls is a bit of a mess (I think you're referring to the start of sysctl_jail_default_allow). In fact, those jail_default_allow sysctls are a mess anyway: they're an ill-fitting leftover from the pre-jail-parameter days and have caused a good bit of confusion since the changeover. I have another patch in the works to finally obsolete them, but that's a story for another time.

I don't think the VFCF_JAIL_ALLOW thing would work. The problem is that would only allow a filesystem that one permission bit, not the current per-jail permission. I need (numjail * numfs) permission buts somewhere.

kern/kern_jail.c
188 ↗(On Diff #40231)

The array needs to reflect the full 32 permission bits which could be dynamically set. If I don't go the dynamic pr_allow route this wouldn't be necessary, but see my comment on that.

In D14681#308495, @jhb wrote:

... You could just add a new 'VFCF_JAIL_ALLOW' which is a dynamic flag that the sysctl knobs turn on/off. The sysctl node would be a SYSCTL_PROC handler and it can take a pointer to the 'struct vfsconf' as its arg2 value.

OK, I think I didn't quite understood this the first time around. So it's a matter of replacing the likes of security.jail.mount_devfs_allowed with a dynamic node? The trouble is that would require (I think) a new parent node to put them under since security.jail.* has a lot of other stuff, and they only exist at all for back-compat reasons.

You can hang dynamic nodes off of a static node. All dynamic nodes eventually have a static node as a parent. That said, I had missed that the "allow" flag is per-jail. I think if we want to allow legacy sysctls that enable/disable entire filesystems across all jails, VFCF_JAIL_ALLOW is still the right approach for those particular "all-jails" settings. You can still check both that flag and the per-jail flag in prison_check_vfs() in that case.

For per-jail settings I would still be tempted to not try to reserve space in the names, but instead perhaps have a separate "allow" mask just for VFS, and parse mount parameters explicitly. Eventually I think for HEAD I would suggest expanding 'struct vfsconf' to add a new field that holds the mask. You can then avoid having to lookup the string name in prison_check_vfs() and just check 'prison->pr_allow_vfs & vfc->vfc_prison_mask' or the like. MFC'ing a new field is a KBI change though, so you'd have to start out doing the name-based lookup instead and just do the vfc_prison_mask in HEAD as a followup. I would also perhaps avoid using the sysctl tree to manage the mappings of VFS name to index and just have a simple 'char *prison_vfs_perms[32];' A VFS would claim a bit in the 'allow_vfs' mask by setting an pointer in that array. You would need your own lock for that array to handle lookups, but I think it would be cheaper than (ab)using the sysctl tree. (You could perhaps abuse prison0.pr_mtx for this lock.)

One more thought: if you use the value from 'vfc_name' as the pointer you set in the array, you don't have to do actual string comparisons but can just do pointer compares to find the matching index (and thus bit) in the array in prison_check_vfs(). The jail parameter logic would still have to do string compares though.

In D14681#308541, @jhb wrote:

For per-jail settings I would still be tempted to not try to reserve space in the names, but instead perhaps have a separate "allow" mask just for VFS, and parse mount parameters explicitly.

Yes, I could do that - there are a few unused spots in struct prison to make it easy. Then I could leave pr_allow_names pretty much alone (except removing the old static allow.mount.*).

Eventually I think for HEAD I would suggest expanding 'struct vfsconf' to add a new field that holds the mask. You can then avoid having to lookup the string name in prison_check_vfs() and just check 'prison->pr_allow_vfs & vfc->vfc_prison_mask' or the like. MFC'ing a new field is a KBI change though, so you'd have to start out doing the name-based lookup instead and just do the vfc_prison_mask in HEAD as a followup.

Actually, I had considered that but thought of the sysctl search as a way of avoiding a KBI change. I guess one man's clever workaround is another man's bad hack ;-). Since this is starting in HEAD, would I then just start with the vfs_prison_mask field, and then put the string search in an MFC? I hadn't really considered MFC at this point.

I think if we want to allow legacy sysctls that enable/disable entire filesystems across all jails...

Yeah - if. It looks like it's time to submit my patch for taking those out, and perhaps resolve that issue first.

In D14681#308570, @jhb wrote:

One more thought: if you use the value from 'vfc_name' as the pointer you set in the array, you don't have to do actual string comparisons but can just do pointer compares to find the matching index (and thus bit) in the array in prison_check_vfs(). The jail parameter logic would still have to do string compares though.

I could give that a try, as it avoids the KBI change without doing a sysctl walk. I keep the jail parameters around on modunload, but since I'm just looking at pointer values it shouldn't matter if they're invalid. Well, theoretically you could load foofs, unload it, then load barfs, and the strings "foofs" and "barfs" could happen to start at the exact same address - but it seems somewhat unlikely.

I've updated the diff to:

  • put the flags in a new pr_allow_mount instead of in pr_allow.
  • leave the existing PR_ALLOW_XXX values alone.
  • add pr_allow_mount_vfc[], which makes prison_check_vfs() not need string compare or locking, while still keeping struct vfsconf the same.
  • pass the vfsconf to prison_add_vfs_param and prison_check_vfs, instead of just the fstype.
  • make the security.jail.mount_xxxfs_allowed sysctls dynamic (like I did with security.jail.param.allow.mout.foofs).

The only one I'm unsure about is the first. There's really nothing that requires that the allow.mount.* flag bits live in a different place than the rest of the allow.mount.* bits. And it requires duplicating a lot of code to operate on pr_allow, and then operate identically in pr_allow_mount (including the function sysctl_jail_default_allow_mount, a near copy of sysctl_jail_default_allow).

kib added inline comments.
kern/kern_jail.c
3855 ↗(On Diff #40274)

Why not use asprintf(9) instead of manually reimplement it and calculating the string lengths ?

3895 ↗(On Diff #40274)

This verbosity should be avoided if NO_SYSCTL_DESCR kernel option is configured.

3902 ↗(On Diff #40274)

This one too.

3920 ↗(On Diff #40274)

I would just store the bit number or bit mask for the pr_allow_mount in a member of struct vfsconf, allocated on the fs module load (you already do this). Then this function becomes trivial.

kern/kern_jail.c
3855 ↗(On Diff #40274)

For whatever reason, asprintf uses M_NOWAIT, while strdup uses M_WAITOK. Not that running out of memory for such a small allocation is at all likely, but I would want to handle failure (probably just return, same as no available bit).

asprintf may also be the better choice than having these hand-sized allowstr and global_allowstr arrays.

3895 ↗(On Diff #40274)

Indeed - I didn't know about that option. I'll put something in.

3920 ↗(On Diff #40274)

Yeah, jhb preferred it that way too. I could do that (and bump __FreeBSD_version), and the current version becomes the kind of thing that would go into an MFC.

The latest changes:

  • Put the bits back into pr_allow. Adding pr_allow_mount only served to duplicate code.
  • Make the KBI change: put a prison flag in struct vfsconf.
  • Replace prison_check_vfs with a call to prison_allow (another advantage of using pr_allow).
  • Use asprintf in prison_add_vfs, instead of sprintf/strdup.
  • Do the right thing ifdef NO_SYSCTL_DESCR.
kern/kern_jail.c
208 ↗(On Diff #40400)

This is weird. I must admit.

3774 ↗(On Diff #40400)

If NO_SYSCTL_DESCR is defined, the descr variable is not. Then you should get the compilation error.

3813 ↗(On Diff #40400)

/* should be on the separate line. Also, there should be a blank line before multi-line comment.

kern/kern_jail.c
208 ↗(On Diff #40400)

Yes, there's this gap now before PR_ALLOW_RESERVED_PORTS that requires this. Since I had to bump __FreeBSD_version anyway, I might as well close that gap (change the to defines past the mount points).

I'm not really happy with the way these arrays are presented anyway. I may fix them up in a separate commit.

3774 ↗(On Diff #40400)

The SYSCTL_ADD_PROC macro disregards descr when NO_SYSCTL_DESCR is defined, so it actually compiles without a problem.

3813 ↗(On Diff #40400)

Oops - I let my own coding style slip in when I wasn't paying attention. I'll fix that.

I you might want to bump VFS_VERSION instead of __FreeBSD_version.

This revision is now accepted and ready to land.Mar 18 2018, 7:08 PM
In D14681#309678, @kib wrote:

I you might want to bump VFS_VERSION instead of __FreeBSD_version.

Perhaps - I'll look into that.

Te latest diff, to account for the changes I committed to neaten the somewhat messy pr_allow_names/pr_allow_nonames array pairs. And I also went with bumping VFS_VERSION instead of __FreeBSD_version.

This revision now requires review to proceed.Mar 22 2018, 4:07 AM

I have another revision in the works, D14791, which removes those deprecated global permission parameters. Since this patch works with those parameters, I would naturally adjust whichever one goes in last (provided I get away with the parameter removal).

This revision was not accepted when it landed; it landed in state Needs Review.May 4 2018, 8:54 PM
This revision was automatically updated to reflect the committed changes.