Page MenuHomeFreeBSD

Support bhyve within a jail
ClosedPublic

Authored by lattera-gmail.com on Jun 29 2018, 1:09 AM.

Details

Summary

This patch gives bhyve the ability to run inside a jailed environment. It introduces two new allow.* flags: allow.vmm and allow.novmm, with the default being allow.novmm.

Sponsored-by: HardenedBSD
Sponsored-by: G2, Inc

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Whoops. I just realized this version of the patch is based off of HardenedBSD's src tree. I'll update it soon based on FreeBSD's.

Rebase on FreeBSD's source code.

@lattera-gmail.com first of all thanks for the patch. I'm curious to know what guest OS have you tested inside a jail, could you please share it with me? Also let me know what devices you used, as an example: virtio-blk, virtio-net and etc..

If you can share your tests, it would be perfect.

Thanks.

lattera-gmail.com added a comment.EditedJun 29 2018, 12:31 PM

@lattera-gmail.com first of all thanks for the patch. I'm curious to know what guest OS have you tested inside a jail, could you please share it with me? Also let me know what devices you used, as an example: virtio-blk, virtio-net and etc..
If you can share your tests, it would be perfect.
Thanks.

I had a pre-installed Kali Linux VM. I used vmrun.sh to run the VM in UEFI mode:

sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 16g -t tap4 -C /dev/nmdm-kali-01-A -d /dev/zvol/rpool/bhyve/kali-01/disk-01 -E -P 5901 kali-01

My jail.conf(5) had an entry for the bhyve-01 jail:

bhyve-01 {
    $epair = "epair6";
    devfs_ruleset = 25;
    persist;
    vnet;
    allow.vmm;
}

My devfs.rules(5) file:

[devfs_rules_bhyve_jail=25]
add include $devfsrules_jail
add path vmm unhide
add path vmm/* unhide
add path tap* unhide
add path zvol/rpool/bhyve/* unhide
add path nmdm* unhide
emaste added a subscriber: emaste.Jun 29 2018, 6:53 PM
swills added a subscriber: swills.Jul 4 2018, 4:15 PM
swills added a subscriber: jamie.Jul 4 2018, 5:03 PM

Might want to add @jamie as a reviewer.

jamie added inline comments.Jul 4 2018, 7:17 PM
sys/amd64/vmm/vmm_dev.c
98 ↗(On Diff #44614)

Instead of adding a new function, you should call priv_check_cred(9), which would involve adding a new privilege to sys/priv.h, and a corresponding check in prison_priv_check(). You might also need to add something to prison_check_cred() itself, but probably not.

sys/amd64/vmm/vmm_dev.c
98 ↗(On Diff #44614)

I thought about doing that. I passed on it due to limited scope. But since you brought it up, that probably means it's a good idea. :)

I'll go ahead and do that and will update the patch once it's completed. I hope to have a new revision submitted by the end of next week. My interns at work are going to keep me _really_ busy the next few days.

jamie added a comment.Jul 5 2018, 3:02 PM

In addition to the question of where to check the permissions, there's also the issue that the allow.vmm parameter shouldn't exist in a non-VMM system. This means the SYSCTL_JAIL_PARAM should be defined in vmm_dev.c or some other vmm-related file; that way, if VMM is loaded as a module, the parameter would be attached to that module.

For an example of module-specific parameters, see compat/linux/linux_mib.c (and where the setup is called in linux_common.c). This is for the parameters linux.*, so it's not quite the same. But there's no actual requirement that your parameter be top-level, and could still be allow.vmm.

The problem is there's more code required for this than should be necessary for a simple allow bit. You don't need everything that linux_mib has, but you would need something to set up the parameter on module load. While I don't have support for dynamic allow.* parameters, I do have prison_add_vfs() in kern_jail.c, which is mostly for adding allow.mount.fsname parameters. I should change that a bit to allow for a more generic method of adding an allow.* flag. That would mean you have a dependency on me.

sys/amd64/vmm/vmm_dev.c
98 ↗(On Diff #44614)

Having established myself as someone to listen to, I will now completely reverse myself ;-). I neglected to consider the non-universal nature of this code. So it is indeed best to keep the checks entirely in this code.

In addition to the question of where to check the permissions, there's also the issue that the allow.vmm parameter shouldn't exist in a non-VMM system. This means the SYSCTL_JAIL_PARAM should be defined in vmm_dev.c or some other vmm-related file; that way, if VMM is loaded as a module, the parameter would be attached to that module.
For an example of module-specific parameters, see compat/linux/linux_mib.c (and where the setup is called in linux_common.c). This is for the parameters linux.*, so it's not quite the same. But there's no actual requirement that your parameter be top-level, and could still be allow.vmm.
The problem is there's more code required for this than should be necessary for a simple allow bit. You don't need everything that linux_mib has, but you would need something to set up the parameter on module load. While I don't have support for dynamic allow.* parameters, I do have prison_add_vfs() in kern_jail.c, which is mostly for adding allow.mount.fsname parameters. I should change that a bit to allow for a more generic method of adding an allow.* flag. That would mean you have a dependency on me.

I agree in principle. However, the way the allow.* parameters work is different than adding independent parameters. In this case, I'd rather keep using the allow.* namespace, rather than adding a whole new jail param namespace (ie, vmm.allow). Re-using the allow.* namespace allows us to piggy-back on the pr_allow field in the prison structure. Adding a new namespace adds unnecessary complexity.

If we were to add a generic and dynamic allow.* capability, that would be outside the scope of this patch. But, doing so would drastically reduce the complexity my implementation of your suggestion above.

Before I come to a final decision, let's discuss this more. I'm open to ideas. I'm just unsure the time involved in adding such complexity just to dynamically hide/show allow.vmm is worth it.

lattera-gmail.com marked 3 inline comments as done.Jul 5 2018, 3:14 PM
lattera-gmail.com added inline comments.
sys/amd64/vmm/vmm_dev.c
98 ↗(On Diff #44614)

Haha. Sounds good. I'll mark these comments as "done", then.

lattera-gmail.com marked 2 inline comments as done.Jul 5 2018, 3:14 PM
jamie added a comment.Jul 5 2018, 6:35 PM

I've added D16146, which makes a new allow.* bit easy:

flag = prison_add_allow(NULL, "foo", NULL, "Jailed user may do the foo thing");

I've added D16146, which makes a new allow.* bit easy:
flag = prison_add_allow(NULL, "foo", NULL, "Jailed user may do the foo thing");

Sounds good! I'll update this patch accordingly once D16146 lands.

Update the patch to take into account the new dynamic allow.* API.

jamie added inline comments.Jul 30 2018, 2:19 AM
sys/amd64/vmm/vmm_dev.c
110 ↗(On Diff #46006)

This check isn't necessary, if the one below is just "!(prison->pr_allow & pr_allow_flag)".

jamie added a comment.Jul 30 2018, 2:20 AM

One more thing to do: jail(8) should mention the flag. There's a section about module-specific flags where I think it would fit better than the main allow.* section.

One more thing to do: jail(8) should mention the flag. There's a section about module-specific flags where I think it would fit better than the main allow.* section.

Good call. I'll add that to the jail(8) manpage.

sys/amd64/vmm/vmm_dev.c
110 ↗(On Diff #46006)

It is to fail closed, like the comment above says. prison_add_allow can fail (and will if enough PR_ALLOW_* flags are added). This check future-proofs the code. Otherwise, the check below

#include <stdio.h>

int
main(int argc, char *argv[])
{
        unsigned int val = 0xffffffff;
        unsigned int c1 = 0;

        printf("%d\n", val & c1);
        printf("%d\n", !(val & c1));

        return (0);
}

Results in:

0
1

Your proposed solution would fail open.

sys/amd64/vmm/vmm_dev.c
110 ↗(On Diff #46006)

Too tired... I just realized the above statement was incorrect. Your proposed change works, too. I'll make it.

Address the superfluous conditional and add an entry into the jail(8) manpage.

jamie added inline comments.Jul 30 2018, 4:46 PM
usr.sbin/jail/jail.8
687 ↗(On Diff #46008)

Move that to just after allow.mount.zfs to keep the list alphabetical, and then everything looks good.

lattera-gmail.com marked 3 inline comments as done.Jul 30 2018, 4:52 PM
lattera-gmail.com added inline comments.
usr.sbin/jail/jail.8
687 ↗(On Diff #46008)

Whoops! Good catch. I'll fix that up soon and submit an updated patch by the end of today.

Put the allow.vmm documentation in the right place in the jail(8) manpage.

lattera-gmail.com marked 2 inline comments as done.Jul 30 2018, 5:34 PM
jamie accepted this revision.Jul 30 2018, 5:35 PM
This revision is now accepted and ready to land.Jul 30 2018, 5:35 PM
araujo accepted this revision.Jul 31 2018, 12:16 AM

Lgtm, nice work!

This revision was automatically updated to reflect the committed changes.