This patch makes possible managing network stack of vnet enabled jail from the prison0.
This is especially usable for minimalistic jails that don't have full base installed, hence no ifconfig and route.
Details
[+] Create minimalistic jail: mkdir /j cp /rescue/ping /j ifconfig bridge0 create ifconfig epair0 create ifconfig epair0a up ifconfig bridge0 addm epair0a up ifconfig bridge0 10.10.10.1/24 jail -c name=test path=/j vnet vnet.interface=epair0b persist [+] Configure network inside jail ifconfig -j tets lo0 127.0.0.1/8 up ifconfig -j test epair0b 10.10.10.10/24 up route -j test add default 10.10.10.1 [+] Test network access from jexec test /ping 10.10.10.1 [+] Observe network stack in jail ifconfig -j test route -j test get default
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
sbin/ifconfig/ifconfig.8 | ||
---|---|---|
31 | Please bump .Dd. | |
2906 | s/may be/is/ | |
2908 | Missing an article. Also, please put new sentences on new lines. The target jail specification can be a jail name or jid. Automatic loading of modules is not supported in this mode. That functionality is only supported with vnet-enabled jails using a virtual network stack. Which "mode" is "this mode"? If it's ifconfig -j, that's not so much a mode as a usage. |
A couple of minor things. Incidentally, this is a pretty cool feature.
sbin/ifconfig/ifconfig.8 | ||
---|---|---|
2908 | "Target jail is name or jid." is still missing an article. Or maybe do it this way: .Ar jail is a jail name or jid. | |
2910 | Avoid the aside, which is an interruption in the train of thought: This functionality is supported only with vnet enabled jails using a virtual network stack. |
I think the jail_attach interface is fundamentally unsuitable for this purpose. The problem is the process appears in the jail.
Instead, there should be a way to restrict your actions to ones only affecting the target jail while not entering it and alter back away from it after you are done. This could be accomplished by providing e.g. jail-based file descriptors. You obtain jail fds for jails you want to modify and for the jail you are in right now so that you can go back.
pseudo-code wise this would be:
curjailfd = jail_current();
jaillfd = jail_lookup(name);
jail_restrict(jailfld);
/* do work here */
jail_leave(curjailfd);
I don't think I got big enough flamethrower for this one though.
There's a way to do this without entering the jail, and also without adding a whole new jail descriptor interface. You can move the interface to/from the jail in question, as is already done in the "ifconfig vnet" option. The only problem is the possibility that the jail's interface name conflicts with an existing interface in the current vnet (also an unsolved problem with "jail vnet").
I'm not sure whether this would be better than the jail_attach solution here - just another possibility.
Rather than the "jail does not exist" error message for a jid of -1, you can use jail_errmsg which is already set by jail_getid. See setifvnet() in ifconfig.c for an example. And while the second "jail does not exist" message for EINVAL is admirably complete, it seems a touch overkill for the race diction of a jail going away between the "-j" parsing and the jail_attach().
Guys, since there was a discussion over how the patch does what it does, can I get one more approval before I commit this, Just In Case™?
jail_attach() is not the correct approach, because it makes the actions of the administrator on the host visible within the jail.
I don't know if I understand. jexec uses exactly the same syscall, jail_attach to achieve the same effect. Putting ifconfig and route inside a jail allows even more actions to be taken by the in-jail admin.
Also, there is no obvious other way to achieve this with current API.
I see two other options: additional API to manage jails and vnet without attaching process to the jail, or finding a way to avoid duplicated interface names but adding some sort of unique id(?).
I think an interface like mjg described is the best approach.
Alternatively, you might be able to get away with just using the net/vnet.h macros
CURVNET_SET(vnet_pointer);
do work
CURVNET_RESTORE();
I was told 2.5 years ago that jail_attach(2) is insecure and I shouldn't use it. Apparently it's no longer the case - see for example rS360356. Can I please have it accepted now?
Apparently I have the following comment not posted from 2.5 years old discussion:
If ifconfig(8) would need to open any dynamic module, or kldload an interface driver module, the operation will fail after the jail_attach(). Perhaps there is still no dyn modules support in ifconfig (but I am surprised that so far there is no), while kldload is definitely there.
Also, what about name resolution, is everything resolved before the process entered the jail ?
That said, what is insecure about using jail_attach() there ? I do not get 'jail would see host admin action' at all.
I think it is even more obvious in the following way: suppose that the host admin takes the patch from this review and compiles its own ifconfig1(8) and route1(8) binaries from FreeBSD sources with the patch applied. How would this compromise 'security', and what is the 'security' there ?
I still do not get it. Or rather, I do not believe in the promise that after jail_attach(2) the process is completely jailed. Even if we disable ptrace(2), this does not make so determined host root any less powerfull.
And I still do not see anything wrong with userspace using existing syscall.
It's not about host root but jailed root.
Should jailed root attach with itself with ptrace, it can perform relative path lookups outside of the jail and in particular read/write files it normally would not have access to. For example consider a case where cwd is rootvnode. Then ptrace-injected open("root/.ssh/authorized_keys", O_RDWR) is one way to escalate your privileges.
This may seem like an unlikely problem to run into since jailed root should not be able to predict when ifconfig would get jail_attached (nor what pid could it possibly get), but the attack vector against the syscall is very practical -- e.g., there are several jail_attach + exec calls when spawning the jail during boot. If one of the execs spawns the exploit, there is a small pid range to continuously try to ptrace attach to (and in fact the exact pid can be computed if jails are spawned in serialized manner).
While this is not strictly a showstopper for this change (since the problem is already there), but I don't believe more consumers should be added without plugging the problem first. And preferably a better approach would be implemented in the first place, I commented on it in aforementioned review.
I cannot convince myself that what you describe is a reasonable treat for the uses of the feature. I would think that adding some explanation of the scenario to jail(2) and to the -j flag description in this patch would be enough, leaving the decision to worry or not to the user.
Adding this without plugging the problem lends itself towards introduction of more bugs, for example the new feature can be added to startup scripts and then the aforementioned attack vector is back on the table.
There is no fundamental difficulty addressing the immediate bug and it should be done first.
What do you mean by 'addresing the immediate bug' ? Do you mean disabling ptrace attachment to the jail_attach-ed process ? This is more of a policy decision then a bug fix, user can already control this with proccontrol -m trace -s disable route -j <id> .... It should be noted in the man page, I agree.
If jail_attach(2) doesn't leave a process sufficiently jailed to the point that it can be used for jailbreak, that's a bug in jail_attach that should be fixed.
We could add P2_NOTRACE on jail_attach. Or it may be a good idea to include an automatic re-root as part of jail_attach(), which fixes the immediate problem discussed. Doing so may have POLA problems, but not doing so still leaves open the possibility that a real-rooted jailed process will exec and drop the flag.
Either way, that would leave room to use jail_attach here.
Guys, thanks for the constructive comments.
I agree with @mjg that this is a potential security issue that, ideally, should be addressed by changing the jail_attach(2) behaviour. The simples patch I came with is adding the P2_NOTRACE in do_jail_attach, more or less in the form of
--- a/sys/kern/kern_jail.c +++ b/sys/kern/kern_jail.c @@ -2411,6 +2411,7 @@ do_jail_attach(struct thread *td, struct prison *pr) newcred = crget(); PROC_LOCK(p); + p->p_flag2 |= P2_NOTRACE; oldcred = crcopysafe(p, newcred); newcred->cr_prison = pr; proc_set_cred(p, newcred);
Although, this only adds the notrace flag for the currently jailed process, not the one immediately executed after, so it leaves regular jexec vulnerable but allows us to finally integrate more tools to use jail_attach.
If the change of default behaviour is not possible for POLA reasons, maybe we could add something like jail_attach2(int jid, int flags) that'd allow to set P2_NOTRACE or even P2_NOTRACE_EXEC and modify consumers to use it, and maybe even add a flag for jexec to set P2_NOTRACE_EXEC?