Page MenuHomeFreeBSD

Support bhyve within a jail
ClosedPublic

Authored by lattera-gmail.com on Jun 29 2018, 1:09 AM.

Details

Summary

This patch gives bhyve the ability to run inside a jailed environment. It introduces two new allow.* flags: allow.vmm and allow.novmm, with the default being allow.novmm.

Sponsored-by: HardenedBSD
Sponsored-by: G2, Inc

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Whoops. I just realized this version of the patch is based off of HardenedBSD's src tree. I'll update it soon based on FreeBSD's.

@lattera-gmail.com first of all thanks for the patch. I'm curious to know what guest OS have you tested inside a jail, could you please share it with me? Also let me know what devices you used, as an example: virtio-blk, virtio-net and etc..

If you can share your tests, it would be perfect.

Thanks.

@lattera-gmail.com first of all thanks for the patch. I'm curious to know what guest OS have you tested inside a jail, could you please share it with me? Also let me know what devices you used, as an example: virtio-blk, virtio-net and etc..

If you can share your tests, it would be perfect.

Thanks.

I had a pre-installed Kali Linux VM. I used vmrun.sh to run the VM in UEFI mode:

sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 16g -t tap4 -C /dev/nmdm-kali-01-A -d /dev/zvol/rpool/bhyve/kali-01/disk-01 -E -P 5901 kali-01

My jail.conf(5) had an entry for the bhyve-01 jail:

bhyve-01 {
    $epair = "epair6";
    devfs_ruleset = 25;
    persist;
    vnet;
    allow.vmm;
}

My devfs.rules(5) file:

[devfs_rules_bhyve_jail=25]
add include $devfsrules_jail
add path vmm unhide
add path vmm/* unhide
add path tap* unhide
add path zvol/rpool/bhyve/* unhide
add path nmdm* unhide

Might want to add @jamie as a reviewer.

sys/amd64/vmm/vmm_dev.c
98 ↗(On Diff #44614)

Instead of adding a new function, you should call priv_check_cred(9), which would involve adding a new privilege to sys/priv.h, and a corresponding check in prison_priv_check(). You might also need to add something to prison_check_cred() itself, but probably not.

sys/amd64/vmm/vmm_dev.c
98 ↗(On Diff #44614)

I thought about doing that. I passed on it due to limited scope. But since you brought it up, that probably means it's a good idea. :)

I'll go ahead and do that and will update the patch once it's completed. I hope to have a new revision submitted by the end of next week. My interns at work are going to keep me _really_ busy the next few days.

In addition to the question of where to check the permissions, there's also the issue that the allow.vmm parameter shouldn't exist in a non-VMM system. This means the SYSCTL_JAIL_PARAM should be defined in vmm_dev.c or some other vmm-related file; that way, if VMM is loaded as a module, the parameter would be attached to that module.

For an example of module-specific parameters, see compat/linux/linux_mib.c (and where the setup is called in linux_common.c). This is for the parameters linux.*, so it's not quite the same. But there's no actual requirement that your parameter be top-level, and could still be allow.vmm.

The problem is there's more code required for this than should be necessary for a simple allow bit. You don't need everything that linux_mib has, but you would need something to set up the parameter on module load. While I don't have support for dynamic allow.* parameters, I do have prison_add_vfs() in kern_jail.c, which is mostly for adding allow.mount.fsname parameters. I should change that a bit to allow for a more generic method of adding an allow.* flag. That would mean you have a dependency on me.

sys/amd64/vmm/vmm_dev.c
98 ↗(On Diff #44614)

Having established myself as someone to listen to, I will now completely reverse myself ;-). I neglected to consider the non-universal nature of this code. So it is indeed best to keep the checks entirely in this code.

In addition to the question of where to check the permissions, there's also the issue that the allow.vmm parameter shouldn't exist in a non-VMM system. This means the SYSCTL_JAIL_PARAM should be defined in vmm_dev.c or some other vmm-related file; that way, if VMM is loaded as a module, the parameter would be attached to that module.

For an example of module-specific parameters, see compat/linux/linux_mib.c (and where the setup is called in linux_common.c). This is for the parameters linux.*, so it's not quite the same. But there's no actual requirement that your parameter be top-level, and could still be allow.vmm.

The problem is there's more code required for this than should be necessary for a simple allow bit. You don't need everything that linux_mib has, but you would need something to set up the parameter on module load. While I don't have support for dynamic allow.* parameters, I do have prison_add_vfs() in kern_jail.c, which is mostly for adding allow.mount.fsname parameters. I should change that a bit to allow for a more generic method of adding an allow.* flag. That would mean you have a dependency on me.

I agree in principle. However, the way the allow.* parameters work is different than adding independent parameters. In this case, I'd rather keep using the allow.* namespace, rather than adding a whole new jail param namespace (ie, vmm.allow). Re-using the allow.* namespace allows us to piggy-back on the pr_allow field in the prison structure. Adding a new namespace adds unnecessary complexity.

If we were to add a generic and dynamic allow.* capability, that would be outside the scope of this patch. But, doing so would drastically reduce the complexity my implementation of your suggestion above.

Before I come to a final decision, let's discuss this more. I'm open to ideas. I'm just unsure the time involved in adding such complexity just to dynamically hide/show allow.vmm is worth it.

lattera-gmail.com added inline comments.
sys/amd64/vmm/vmm_dev.c
98 ↗(On Diff #44614)

Haha. Sounds good. I'll mark these comments as "done", then.

I've added D16146, which makes a new allow.* bit easy:

flag = prison_add_allow(NULL, "foo", NULL, "Jailed user may do the foo thing");

I've added D16146, which makes a new allow.* bit easy:

flag = prison_add_allow(NULL, "foo", NULL, "Jailed user may do the foo thing");

Sounds good! I'll update this patch accordingly once D16146 lands.

Update the patch to take into account the new dynamic allow.* API.

sys/amd64/vmm/vmm_dev.c
110 ↗(On Diff #46006)

This check isn't necessary, if the one below is just "!(prison->pr_allow & pr_allow_flag)".

One more thing to do: jail(8) should mention the flag. There's a section about module-specific flags where I think it would fit better than the main allow.* section.

One more thing to do: jail(8) should mention the flag. There's a section about module-specific flags where I think it would fit better than the main allow.* section.

Good call. I'll add that to the jail(8) manpage.

sys/amd64/vmm/vmm_dev.c
110 ↗(On Diff #46006)

It is to fail closed, like the comment above says. prison_add_allow can fail (and will if enough PR_ALLOW_* flags are added). This check future-proofs the code. Otherwise, the check below

#include <stdio.h>

int
main(int argc, char *argv[])
{
        unsigned int val = 0xffffffff;
        unsigned int c1 = 0;

        printf("%d\n", val & c1);
        printf("%d\n", !(val & c1));

        return (0);
}

Results in:

0
1

Your proposed solution would fail open.

sys/amd64/vmm/vmm_dev.c
110 ↗(On Diff #46006)

Too tired... I just realized the above statement was incorrect. Your proposed change works, too. I'll make it.

Address the superfluous conditional and add an entry into the jail(8) manpage.

usr.sbin/jail/jail.8
687 ↗(On Diff #46008)

Move that to just after allow.mount.zfs to keep the list alphabetical, and then everything looks good.

lattera-gmail.com added inline comments.
usr.sbin/jail/jail.8
687 ↗(On Diff #46008)

Whoops! Good catch. I'll fix that up soon and submit an updated patch by the end of today.

Put the allow.vmm documentation in the right place in the jail(8) manpage.

This revision is now accepted and ready to land.Jul 30 2018, 5:35 PM
This revision was automatically updated to reflect the committed changes.