As I had hoped, it took away the expected problem of attaching to a jail (when the process doesn't have its own visible cpuset) and ending up with the process still having its old root cpuset (though under a new anonymous masked bit). That was the real problem I saw with the current setup (not that the ones I didn't see aren't problems as well, but at least this was something I noticed ;-).
Mon, Nov 23
Fri, Nov 13
It looks to be working correctly on a quick run-through.
Thu, Nov 12
It looks like you can just include "security", which will get the security team's attention. But first, I suggest you at least put in what r261266 had (the allow.kmem privilege).
I don't think this will go anywhere. There was an attempt to do this a while ago, with a new jail parameter allow.kmem (default not allowed) to not let it happen accidentally. Even with that, it fell flat - see commits r261266 and r261326. While I'm not against it myself, I don't wear a security hat, and I defer to those that do.
Mon, Nov 9
BTW I used allow.suser in my example, but of course it could as easily be allow.suser_enabled. To me the "allow" and "enabled" sound redundant together, but allow.suser_enabled still might be a better name because it's the same as security.bsd.suser_enabled. So either choice is fine.
Sorry, I really hope I'm not being annoying. Correct me if I'm wrong; I'm very new to this code.
The allow.* flags are things that we enable in a jail. For example, we enable raw sockets.
In the case of suser (at least in my code), suser is enabled by default (set to 1), and we would like to have the option to explicitly disable it.
I think this is additional protection that a user may want, but it can really limit the usability of jails (a lot of things don't work, such as setuid, chroot, initgroups, etc.).
So, at least for me, the allow.* flags suggest that we are giving additional permission, but actually we would like to disallow something.
This is why I was looking more into 'securelevel' than the 'allow.*' flags.
Or are you suggesting that suser be disabled by default, and that we enable it per jail? But wouldn't that be too annoying?
On the subject of a jail being able to clear (and only clear) its own suser_enabled bit via sysctl, I think the static suser_enabled variable in kern_priv.c is redundant. You already have something in prison0 which does the same job. Removing the redundant variable would add a touch of complexity, in that the sysctl would need code to change the child jails. But I think that's cleaner than having a similar (but not quite the same) value in two different places.
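To make the "sysctl would need code to change the child jails" part concrete, here is a minimal sketch of clearing a bit in a jail and all of its descendants. The `toy_prison` structure, its fields, and `TOY_ALLOW_SUSER` are stand-ins invented for illustration; they are not the kernel's `struct prison` or its real bit values, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical permission bit, for illustration only. */
#define TOY_ALLOW_SUSER 0x1u

/* Hypothetical stand-in for a jail; not the kernel's struct prison. */
struct toy_prison {
	unsigned int allow;		/* permission bits */
	struct toy_prison *child;	/* first child jail */
	struct toy_prison *sibling;	/* next jail with the same parent */
};

/* Clear (never set) the suser bit in pr and in every descendant of pr. */
static void
toy_clear_suser(struct toy_prison *pr)
{
	struct toy_prison *c;

	assert(pr != NULL);
	pr->allow &= ~TOY_ALLOW_SUSER;
	for (c = pr->child; c != NULL; c = c->sibling)
		toy_clear_suser(c);
}
```

With a single source of truth per jail, the host-level sysctl becomes the same operation applied to the root of the tree.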
If I understand correctly, allow.* and suser have inverted values: you can disable suser, which is enabled by default. I wanted to make it exactly the same as the sysctl on the host system, but I don't have a strong opinion here.
I'm not sure I understand. Are you suggesting an allow.suser that allows you to change the suser sysctl?
There should be no possibility of getting the suser privilege back from inside the jail.
In the scenario I tested, you can give or take back suser from the host.
Can you elaborate on that rare case of wanting to regain suser ability?
It may be handy to allow jailed root to control its own security.bsd.suser_enabled.
I would prefer allow.suser instead of suser_enabled. It's the logical place for such flags, and you can take advantage of existing code that manages disallowing adding a permission the parent lacks, and passing the restriction on to child jails. Then much of this diff can be collapsed into adding to pr_flag_allow, JAIL_DEFAULT_ALLOW, and SYSCTL_JAIL_PARAM(_allow, ...).
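The rule that a jail cannot be given a permission its parent lacks can be sketched roughly as follows. The function name and bit values are hypothetical, chosen only to show the shape of the check; the real logic lives in the jail parameter-setting code and is more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bit values, for illustration only. */
#define TOY_ALLOW_RAW_SOCKETS	0x1u
#define TOY_ALLOW_SUSER		0x2u

/*
 * Store the requested permission bits for a child jail, refusing the
 * request if it contains any bit the parent does not hold itself.
 */
static int
toy_set_allow(unsigned int parent_allow, unsigned int requested,
    unsigned int *child_allow)
{
	assert(child_allow != NULL);
	if (requested & ~parent_allow)
		return (EPERM);		/* parent lacks a requested bit */
	*child_allow = requested;
	return (0);
}
```

An allow.suser bit would get this behavior for free, which is the main argument for putting it in pr_allow.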
Oct 26 2020
Yes, I imagine that's all it needs.
Or is it significant enough to just fix a syscall? There's no good reason to attach to a jail while not being inside its directory structure, and I don't know of any program that depends on such a misfeature.
Oct 14 2020
You've added an optional permission bit, but there's no option to change it. If it would make sense to allow it to all jails, there's no need for PR_ALLOW_ICMP_ACCESS. If it would make sense to restrict to some jails, there needs to be a matching jail.allow parameter, as defined in kern_jail.c's pr_flag_allow array and SYSCTL_JAIL_PARAM(_allow, ...).
Sep 4 2020
Aug 30 2020
The patch looks good to me, but I'm unable to get that LOR on an unpatched system. Has something been fixed in the meantime?
Aug 29 2020
Aug 27 2020
Aug 26 2020
May 29 2020
I'll try the jail_set approach.
Considering jail_set(2) can also attach with the JAIL_ATTACH flag, it would be handy to put these new flags in the same space, with a JAIL_ATTACH_MASK including them. Then the attaching done by jail_set can also do the right thing if it chooses.
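As a rough sketch of what sharing the flag space could look like: the two `TOY_JAIL_ATTACH_OPT_*` flags below are hypothetical placeholders for the new flags, and all values are illustrative rather than taken from sys/jail.h.

```c
#include <assert.h>

/* Illustrative values; the two OPT flags are hypothetical. */
#define TOY_JAIL_CREATE		0x01
#define TOY_JAIL_UPDATE		0x02
#define TOY_JAIL_ATTACH		0x04
#define TOY_JAIL_ATTACH_OPT_A	0x10	/* hypothetical attach modifier */
#define TOY_JAIL_ATTACH_OPT_B	0x20	/* hypothetical attach modifier */
#define TOY_JAIL_ATTACH_MASK \
	(TOY_JAIL_ATTACH | TOY_JAIL_ATTACH_OPT_A | TOY_JAIL_ATTACH_OPT_B)

/*
 * Extract the attach-related bits from a jail_set-style flags word.
 * Returns 0 when no attach was requested, and -1 when a modifier was
 * given without the attach flag itself.
 */
static int
toy_attach_flags(int flags)
{
	int aflags;

	assert(flags >= 0);
	aflags = flags & TOY_JAIL_ATTACH_MASK;
	if (aflags != 0 && (aflags & TOY_JAIL_ATTACH) == 0)
		return (-1);
	return (aflags);
}
```

The masked value could then be handed to whatever common attach routine both syscalls end up using.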
May 14 2020
May 13 2020
Might it also be useful to include something similar on the stop side, run after taking down the IP addresses? I can't think of a use offhand, unless one wants to leave absolutely no trace of jails, and remove everything that was added in exec.prepare.
It would make sense for this to be two separate commits - one for the reordering of IP_*.
May 7 2020
There's a "right thing" to do on SIGINT, though it's not immediately obvious what that is. In the jail creation process, it would make sense to clean up a partially created jail, probably in conjunction with letting the jailed processes handle their own SIGINT. But that's not quite the same as just ignoring it, because there are other cases:
May 1 2020
If jail_attach(2) leaves a process insufficiently jailed, to the point that it can be used for a jailbreak, that's a bug in jail_attach that should be fixed.
Apr 24 2020
Apr 23 2020
It may be a better user interface if "-j" automatically did the right thing - limit output to the specified jail for traditional jails, and attach to vnet jails. So after the jail_getid(), a jail_get() or jail_getv() to fetch the jail's vnet parameter. Then the jail_attach() can happen if the vnet is JAIL_SYS_NEW.
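The decision itself is small. A sketch, assuming the jail's vnet parameter has already been fetched as an integer; the `TOY_` constants and the enum are stand-ins for illustration, not the real definitions.

```c
#include <assert.h>

/* Stand-ins for the vnet parameter values; illustrative only. */
#define TOY_JAIL_SYS_NEW	1
#define TOY_JAIL_SYS_INHERIT	2

/* What "-j" should do for a given jail (hypothetical names). */
enum toy_action {
	TOY_FILTER_BY_JID,	/* traditional jail: limit output to it */
	TOY_ATTACH		/* vnet jail: attach, then run as usual */
};

/* Pick the behavior from the jail's vnet parameter. */
static enum toy_action
toy_dash_j_action(int vnet)
{
	assert(vnet >= 0);
	return (vnet == TOY_JAIL_SYS_NEW ? TOY_ATTACH : TOY_FILTER_BY_JID);
}
```

The user then never has to know which kind of jail is being inspected.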
Apr 14 2020
Apr 6 2020
Sep 5 2019
Aside from a few trailing whitespaces, this all looks good to me.
Jun 18 2019
May 23 2019
Tested with ps and jexec, passing jail name and jid, and with numerically and non-numerically named jails. All's good :-).
Nov 27 2018
New jails are now created with the unprivileged_proc_debug bit inherited from the parent unless otherwise specified.
Huh - so I can. I didn't know of (or even suspect) such a possibility.
Sorry for a post-acceptance note, but on trying it out I noticed that jails are created by default with allow.nounprivileged_proc_debug. That's an easy fix - the bit needs to be added to JAIL_DEFAULT_ALLOW in kern_jail.c. I'm apparently unable to change the diff in this revision, so instead of creating a new revision I'll just mention that's what I'll be committing.
Nov 24 2018
Because of this, having the check in sys/kern/kern_priv.c is the right place. There's no real need to duplicate the logic to prison_priv_check. I can still add it, if you want, but I believe it would be a waste of cycles.
priv_check_cred() in kern_priv.c isn't the right place to make the check; prison_priv_check() in kern_jail.c is. PRIV_DEBUG_UNPRIV is already in that function's list, in the part that lets jails do things, and it needs to be moved to the bottom part of the function, where you'll see a number of other cases in which a certain privilege checks a certain pr_allow bit.
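The pattern in the bottom part of that function looks roughly like the sketch below. Everything here is a simplified stand-in (`toy_jail`, the `TOY_` names, the enum values); it only illustrates a privilege being granted according to one allow bit, not the real function body.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical privilege identifiers and allow bit. */
enum toy_priv { TOY_PRIV_DEBUG_UNPRIV, TOY_PRIV_OTHER };
#define TOY_ALLOW_UNPRIV_DEBUG	0x1u

/* Hypothetical stand-in for a jail; not the kernel's struct prison. */
struct toy_jail {
	unsigned int allow;	/* permission bits */
};

/* Return 0 if the jail may exercise priv, or EPERM otherwise. */
static int
toy_prison_priv_check(const struct toy_jail *pr, enum toy_priv priv)
{
	assert(pr != NULL);
	switch (priv) {
	case TOY_PRIV_DEBUG_UNPRIV:
		/* Granted only when the jail's allow bit is set. */
		if (pr->allow & TOY_ALLOW_UNPRIV_DEBUG)
			return (0);
		return (EPERM);
	default:
		/* Anything not listed is denied inside a jail. */
		return (EPERM);
	}
}
```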
Then the jail must simply obey the existing state of unprivileged process debugging. We could go that route, but I wanted to make it flexible. I think setting CTLFLAG_SECURE is a good compromise between flexibility and security.
OK, if the jail needs to have that bit set before anything is run, then yes it needs to be a parameter.
Nov 23 2018
Since this bit is under the full control of the prison itself, does it belong in pr_allow? On the plus side, that lets the system create a jail with this turned on, but that can be just as easily done in the jail's sysctl.conf. It's something of a departure from the idea of this being something the jail is or isn't allowed to do. If you forgo the ability to set it as a jail parameter, then the bit can go into pr_flags and you won't have to bother noting which PR_ALLOW bits are allowed to be set.
Nov 10 2018
It's allow.mount.nofusefs. It should (currently) work once the kld is loaded, but the new strcmp will need to be added to make it work before it's loaded.
It works with allow.mount.fusefs, but not with allow.mount.nofusefs (which will still try to kldload "fusefs"). Comparing the fs name to both "fusefs" and "nofusefs" should be all that's needed.
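In other words, something along these lines, where `toy_is_fusefs_param` is a hypothetical helper name and its argument is assumed to be the last component of the allow.mount.* parameter:

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the parameter component refers to fusefs in either its
 * positive ("fusefs") or negated ("nofusefs") form, 0 otherwise.
 */
static int
toy_is_fusefs_param(const char *name)
{
	assert(name != NULL);
	return (strcmp(name, "fusefs") == 0 ||
	    strcmp(name, "nofusefs") == 0);
}
```

Either spelling would then map to the "fusefs" module name before any attempt to load it.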
Oct 20 2018
Oct 18 2018
Oct 17 2018
Oct 6 2018
Aug 20 2018
Aug 16 2018
Aug 15 2018
Aug 14 2018
I'm keeping the sysctls around, though without COMPAT_FREEBSD11 (or with BURN_BRIDGES), they're read-only. This preserves the expected behavior for programs that want to find out what they're allowed to do before attempting it (e.g. rc.d/hostname and rc.d/zfs). But they will no longer be used to set global permissions for jails.
OK, looks good with one last-minute nit: there are spaces in the jailp.h line where a tab should be.
Aug 12 2018
Yes, this is a need that has gone unanswered for a while now.
Jul 30 2018
One more thing to do: jail(8) should mention the flag. There's a section about module-specific flags where I think it would fit better than the main allow.* section.
Jul 20 2018
It's good to see that jail-enabling a filesystem is indeed easier now than it used to be!
Jul 19 2018
Jul 6 2018
Jul 5 2018
I've added D16146, which makes a new allow.* bit easy:
In addition to the question of where to check the permissions, there's also the issue that the allow.vmm parameter shouldn't exist in a non-VMM system. This means the SYSCTL_JAIL_PARAM should be defined in vmm_dev.c or some other vmm-related file; that way, if VMM is loaded as a module, the parameter would be attached to that module.
Jul 4 2018
Jul 3 2018
Jun 28 2018
Jun 18 2018
May 24 2018
What does this buy us? What I can come up with is: