Page MenuHomeFreeBSD

Capsicumise bhyve
ClosedPublic

Authored by kaktus on Oct 19 2016, 11:04 PM.

Details

Summary

This is initial version of a patch to capsicumise bhyve.
Right now sandbox is initialised after ACPI table has been generated and /dev/vmm.io loaded and closed.
All other modules: /dev/vmm, disk, network, rnd, uart, console and pci passthru are limited.
It should work on both 12 and 11 (tested on 12).

I tested it with bhyveload (FreeBSD), UEFI + VNC (RHEL), tap and stdio output.
In theory pci passthru, uart on nmdm and virtio_console should work but I didn't tested it (yet).

Definitely require more testing and limiting some fcntls (maybe other capabilities too!).

Sponsored by: Mysterious Code Ltd.

Test Plan

Test on all combinations of guests / vm hardware.

Maybe remove some capabilities.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kaktus updated this revision to Diff 22216.Nov 15 2016, 10:41 AM
  • bvmcons, stdin/out/err, uart are now properly limited, that means that all descriptors used should be now properly restricted
  • additionally tested with nmdm
  • style fixes
  • small rewrites to reduce capabilities
kaktus marked 2 inline comments as done.Nov 15 2016, 10:44 AM
kaktus added inline comments.
usr.sbin/bhyve/bhyverun.c
1026 ↗(On Diff #21669)

Yes, the errno != ENOSYS is a bit verbose and I'd like to see something to replace it but this is out of scope of this patch anyway.

797 ↗(On Diff #21516)

stdio limits require some more planning to assure we're limiting the correct fd (i.e. stdio or nmdm etc)

usr.sbin/bhyve/pci_virtio_console.c
284 ↗(On Diff #21669)

That was leftover from previous version.

kaktus marked 2 inline comments as done.Nov 15 2016, 10:45 AM
kaktus added inline comments.
usr.sbin/bhyve/bhyverun.c
797 ↗(On Diff #21516)

stdin/out/err are now limited.

kaktus updated this revision to Diff 22256.Nov 16 2016, 1:45 PM
kaktus marked an inline comment as done.
  • use helper instead of raw catopen
  • remove unneeded call to cap_rights_remove
  • sort capabilities alphabetically
robak edited edge metadata.Nov 16 2016, 8:39 PM

@emaste @allanjude Guys, what do you think about the latest diff? Would you consider it ready for commit?

allanjude edited edge metadata.Nov 16 2016, 8:48 PM

@grehan Based on our discussion about this at MeetBSD, can you outline how you would like to move forward with this, so we can get at least basic capsicum-ization committed relatively soon.

Does VNC disconnect/reconnect work ? (I'll try and test this out)

I think all the ENOSYS tests can be removed by testing early, and exiting if that's the case.

Does VNC disconnect/reconnect work ? (I'll try and test this out)

Yes, you can connect and reconnect without a problem. Capabilities are set properly on the accepting socket to allow this.

I think all the ENOSYS tests can be removed by testing early, and exiting if that's the case.

I don't know if I got this right. If there is no Capsicum support in kernel we still want to run, just without the sandbox.
Yes the constant check for ENOSYS is verbose but the only other option that comes to my mind now is checking early by limiting for example stderr and setting some boolean to indicate if there is kernel support for Capsicum, and then using it instead of errno != ENOSYS. Do you like that or maybe have some other idea how to handle this?

Ping after two weeks of silence.

grehan added a comment.Dec 3 2016, 4:30 PM

silence == Thanksgiving vacation.

usr.sbin/bhyve/bhyverun.c
720 ↗(On Diff #22256)

Can this definition be moved into libvmmapi ? Easier to keep in sync then.

robak added a comment.Dec 9 2016, 2:31 PM

What about this guys? How many more bhyve vulns we're going to wait till this could be committed? ;)

In D8290#181049, @robak wrote:

What about this guys? How many more bhyve vulns we're going to wait till this could be committed? ;)

It sounds like there might still be a few more items left to do. I see at least one unaddressed comment (moving some stuff to libvmmapi, for example).

I've added the current revision of this patch to a feature branch in HardenedBSD. I'll test it out over the weekend.

I've added the current revision of this patch to a feature branch in HardenedBSD. I'll test it out over the weekend.

I've tested this and have experienced no regressions. HardenedBSD 12-CURRENT/amd64 as the host with a HardenedBSD 12-CURRENT/amd64 guest started with vmrun.sh and a Win10/amd64 UEFI guest started with bhyve directly.

kaktus updated this revision to Diff 22798.Dec 10 2016, 2:53 PM
kaktus added a reviewer: ngie.

Fix regression when stdin/out/err fds are are overridden by shell.
Found by Kyua tests.

bin/dd/dd_test:io -> passed [0.038s]
bin/dd/dd_test:length -> passed [0.022s]
bin/dd/dd_test:seek -> passed [0.060s]

kaktus updated this revision to Diff 22800.Dec 10 2016, 2:54 PM
kaktus removed a reviewer: ngie.

Wops, wrong differential, sorry.

Push back correct patch again.

@lattera-gmail.com Thank you for the tests! I'd also want to appeal to anyone else who can help testing it in any configuration.

usr.sbin/bhyve/bhyverun.c
720 ↗(On Diff #22256)

Why? It's private variable, used only by this one function, the ioctls are defined outside libvmmapi, and I don't see any benefits to make it publicly available from library.
If you know something that I don't or just simply want it moved please state your opinion clearly and I'll follow :-)

grehan added inline comments.Dec 12 2016, 7:15 PM
usr.sbin/bhyve/bhyverun.c
720 ↗(On Diff #22256)

... or given that I maintain bhyve I have a stake in making sure that abstractions are kept.

There are no direct VM_* calls in bhyve, and instead all of these are wrapped in libvmmapi so that the implementation can be changed if need be. There have already been modifications to ioctl values where bhyve didn't have to change.

Stepping around this just for capsicum is a step backwards, hence my proposal to have an api call in libvmmapi to return the list of ioctl cmds.

kaktus updated this revision to Diff 22910.Dec 14 2016, 12:03 PM
  • Address @grehan comments and move table with VM_ ioctls to a separate function in libvmmapi
  • Style fixes here and there
  • Sync with r310050

Thanks for making the change. I'll try and give it a test with PCI passthru.

lib/libvmmapi/vmmapi.c
1449 ↗(On Diff #22910)

Might be best to pass in a length param - if NULL, the length would be returned but no copy done, similar to how sysctl works. This allows the caller to determine the size of it's buffer.

kaktus added inline comments.Dec 15 2016, 11:31 PM
lib/libvmmapi/vmmapi.c
1449 ↗(On Diff #22910)

Can I keep the static allocation of the table and just pass a pointer to it or would you prefer dynamic allocation and forcing caller to free() it?

grehan added inline comments.Dec 21 2016, 5:59 AM
lib/libvmmapi/vmmapi.c
1449 ↗(On Diff #22910)

Not a lot of prior art in libc :(, but my preference would be to return a const pointer (and size param).

kaktus updated this revision to Diff 23848.Jan 10 2017, 8:48 PM
kaktus added a reviewer: domagoj.stolfa_gmail.com.
  • Address @grehan comments
  • Sync with r311881
  • Small style fixes

Thanks for the update - I'll give this a tryout.

robak added a comment.Jan 23 2017, 2:51 PM

@grehan Any news? I'd love to commit it, we're just waiting for a green light on your side.

Rebuilding with CURRENT to give it a tryout - thanks for the nudge.

@grehan Any news on the testing?

grehan added inline comments.Jan 30 2017, 9:28 AM
usr.sbin/bhyve/block_if.c
468 ↗(On Diff #23848)

fsync needs to be added here, or an ahci-hd file-backed disk image will fail on FreeBSD guest shutdown with

(ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 00 00 00 40 00 00 00 00 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 00 00 00 40 00 00 00 00 00
(ada0:ahcich0:0:0:0): Error 5, Retries exhausted
(ada0:ahcich0:0:0:0): Synchronize cache failed

#ifndef WITHOUT_CAPSICUM

  • cap_rights_init(&rights, CAP_IOCTL, CAP_READ, CAP_SEEK, CAP_WRITE);
  • if (ro)

+ cap_rights_init(&rights, CAP_IOCTL, CAP_READ, CAP_SEEK, CAP_WRITE,
+ CAP_FSYNC);
+ if (ro) {

		cap_rights_clear(&rights, CAP_WRITE);

+ cap_rights_clear(&rights, CAP_FSYNC);
+ }

kaktus updated this revision to Diff 24551.Jan 30 2017, 9:43 AM

Allow fsync(2) in block devices.

robak added a comment.Jan 30 2017, 9:57 AM

@pawel.biernacki-gmail.com thanks for quick update! @grehan can we have equally fast review? :)

Still some of the test matrix to cover. Hopefully shortly.

grehan added inline comments.Feb 1 2017, 8:47 AM
usr.sbin/bhyve/pci_e82545.c
2240 ↗(On Diff #24551)

Needs an additional capability here

  • cap_rights_init(&rights, CAP_READ, CAP_WRITE);

+ cap_rights_init(&rights, CAP_EVENT, CAP_READ, CAP_WRITE);

kaktus updated this revision to Diff 24648.Feb 1 2017, 10:59 PM

Allow pci_e82545 descriptor to be used by mevent.

grehan added inline comments.Feb 4 2017, 5:54 AM
usr.sbin/bhyve/bhyverun.c
764 ↗(On Diff #24648)

I think this one should be just CAP_MMAP_RW (no executable permission needed).

kaktus added inline comments.Feb 4 2017, 11:39 AM
usr.sbin/bhyve/bhyverun.c
764 ↗(On Diff #24648)

You're right. It was leftover from some initial version.

kaktus updated this revision to Diff 24714.Feb 4 2017, 11:42 AM

Drop executable permission in mmap.

grehan added a comment.Feb 8 2017, 8:08 AM

Into the final straight now: doing a visual inspection and picking up minor style issues.

lib/libvmmapi/vmmapi.c
1423 ↗(On Diff #24714)

minor style(9) nit - an extra blank line should be inserted here (also matches other one-line functions in the file).

1426 ↗(On Diff #24714)

As above, this can be cap_ioctl_t

1452 ↗(On Diff #24714)

Minor style nit: bcopy is used in all other routines in this file.

lib/libvmmapi/vmmapi.h
166 ↗(On Diff #24714)

u_long can be changed to cap_ioctl_t here. This file already requires <sys/types.h>, which unconditionally defines that type (the man page for cap_ioctls_limit needs to be updated).

kaktus updated this revision to Diff 24856.Feb 8 2017, 9:31 AM

Address @grehan comments:

  • use bcopy instead of memcpy to be consistent with the rest of the code,
  • use cap_ioctl_t instead of unsigned long,
  • style(9) fixes.
grehan accepted this revision.Feb 12 2017, 8:08 PM
grehan edited edge metadata.

robak - this is fine to commit once the last minor changes are done.

dbgport needs some capsicum support added but that can be done later.

lib/libvmmapi/vmmapi.h
39 ↗(On Diff #24856)

Need to bump the minor number since there 2 new api calls.

usr.sbin/bhyve/bhyverun.c
58 ↗(On Diff #24856)

This is in capsicum_helpers.h so no need to be included again.

usr.sbin/bhyve/pci_ahci.c
34 ↗(On Diff #24856)

AHCI_DEBUG is for a developer custom-build so no need for capsicum protection here.

usr.sbin/bhyve/pci_virtio_rnd.c
158 ↗(On Diff #24856)

CAP_EVENT can be removed (no mevent calls).

This revision is now accepted and ready to land.Feb 12 2017, 8:08 PM
kaktus updated this revision to Diff 25077.Feb 13 2017, 1:26 PM
kaktus edited edge metadata.

@grehan Thank you for your help and the review.

I addressed your inline comments. I also added gdbport sandboxing and tested it as documented in https://wiki.freebsd.org/bhyve/DebuggingWithGdb.

This revision now requires review to proceed.Feb 13 2017, 1:26 PM
kaktus updated this revision to Diff 25078.Feb 13 2017, 1:40 PM
kaktus edited edge metadata.

Remove now empty #ifdef.

@grehan There was a minor update to the code, do you mind to tick the approve box so I could commit it with complete adherence to the process? ;)

@grehan capsicum @emaste Guys, can this be reviewed and accepted once again so it could be safely committed?

Sorry for being late to the party ;(

lib/libvmmapi/vmmapi.c
1432 ↗(On Diff #25078)

Not getting this part why we just don't do a global from that?

usr.sbin/bhyve/bhyverun.c
988 ↗(On Diff #25078)

No need to check ENOSYS.

994 ↗(On Diff #25078)

No need for this warning.

usr.sbin/bhyve/block_if.c
472 ↗(On Diff #25078)

I'm not sure if here the reverse logic would be more secure.

Approved by: emaste

kaktus updated this revision to Diff 25149.Feb 14 2017, 1:20 PM

Remove extra ENOSYS check and notification about lack of Capsicum support in the kernel.

This revision was automatically updated to reflect the committed changes.
robak added a comment.Feb 14 2017, 1:39 PM

Guys, I've committed it as discussed with @emaste, to get the code out there. Please, continue the work on polishing it, and let me know when there'll be a patch ready and agreed to commit it further. Once that done, we'll set up a MFC date so it could get included in closest -RELEASE and we'll swing the whip at Paweł to get more code.

emaste added a subscriber: jhb.Feb 14 2017, 4:57 PM
emaste added inline comments.
lib/libvmmapi/vmmapi.c
1432 ↗(On Diff #25078)

API refinement suggested by @jhb in https://lists.freebsd.org/pipermail/svn-src-all/2017-February/139587.html

I wonder if instead of doing this in the executable and exposing
vm_get_ioctls, etc. if the API shouldn't really be something like
'vm_limit_rights(ctx)' and this code should be in that function in
libvmmapi?  It would be something like (assuming you drop
vm_get_ioctls() entirely and expose vm_ioctl_cmds[] as a static
global in the library):

int
vm_limit_rights(struct vm_ctx *ctx)
{
	...
	cap_rights_init(&rights, CAP_IOCTL, CAP_MMAP_RW);
	if (cap_rights_limit(ctx->fd, &rights) == -1) {
 		if (errno == ENOSYS)
			return (0);
		else
			return (-1);
	}
	/* Shouldn't get ENOSYS here if cap_rights_limit worked. */
	return (cap_ioctls_limit(ctx->fd, vm_ioctl_cmds, nitems(vm_ioctl_cmds));
}

You wouldn't need vm_get_device_fd() either in this API.
usr.sbin/bhyve/block_if.c
472 ↗(On Diff #25078)

That is:

cap_rights_init(&rights, CAP_IOCTL, CAP_READ, CAP_SEEK);
if (!ro)
        cap_rights_set(&rights, CAP_FSYNC, CAP_WRITE);

I agree that looks preferable.

As I wrote in a mail to @jhb - the idea is worth considering. I'd like to gather some feedback from broader audience as the code is in base now and start a new review with some ideas about polishing / improvements to the code.

emaste - any suggestions for a good forum to discuss this ? Unfortunately it straddles the capsicum@ and virtualization@ lists :(

As I wrote in a mail to @jhb - the idea is worth considering. I'd like to gather some feedback from broader audience as the code is in base now and start a new review with some ideas about polishing / improvements to the code.

Indeed. I just want to have a cross-reference from here to the discussion that took place on the mailing list (and there will be another xref from here to a new review with further polishing).

@emaste - any suggestions for a good forum to discuss this ? Unfortunately it straddles the capsicum@ and virtualization@ lists :(

We don't have a FreeBSD-hosted capsicum mailing list AFAIK, only the Cambridge-hosted one. Both lists are very low traffic, so I think you could just crosspost to cl-capsicum-discuss@lists.cam.ac.uk and freebsd-virtualization@freebsd.org.