Page MenuHomeFreeBSD

Capsicumise bhyve
ClosedPublic

Authored by kaktus on Oct 19 2016, 11:04 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 8, 1:06 AM
Unknown Object (File)
Tue, Jan 7, 11:34 AM
Unknown Object (File)
Mon, Jan 6, 9:46 AM
Unknown Object (File)
Mon, Jan 6, 6:29 AM
Unknown Object (File)
Sat, Jan 4, 5:39 AM
Unknown Object (File)
Sat, Jan 4, 5:38 AM
Unknown Object (File)
Fri, Jan 3, 1:25 PM
Unknown Object (File)
Fri, Jan 3, 12:50 PM
Subscribers

Details

Summary

This is initial version of a patch to capsicumise bhyve.
Right now sandbox is initialised after ACPI table has been generated and /dev/vmm.io loaded and closed.
All other modules: /dev/vmm, disk, network, rnd, uart, console and pci passthru are limited.
It should work on both 12 and 11 (tested on 12).

I tested it with bhyveload (FreeBSD), UEFI + VNC (RHEL), tap and stdio output.
In theory pci passthru, uart on nmdm and virtio_console should work but I didn't tested it (yet).

Definitely require more testing and limiting some fcntls (maybe other capabilities too!).

Sponsored by: Mysterious Code Ltd.

Test Plan

Test on all combinations of guests / vm hardware.

Maybe remove some capabilities.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • bvmcons, stdin/out/err, uart are now properly limited, that means that all descriptors used should be now properly restricted
  • additionally tested with nmdm
  • style fixes
  • small rewrites to reduce capabilities
kaktus added inline comments.
usr.sbin/bhyve/bhyverun.c
797

stdio limits require some more planning to assure we're limiting the correct fd (i.e. stdio or nmdm etc)

1012

Yes, the errno != ENOSYS is a bit verbose and I'd like to see something to replace it but this is out of scope of this patch anyway.

usr.sbin/bhyve/pci_virtio_console.c
287

That was leftover from previous version.

kaktus added inline comments.
usr.sbin/bhyve/bhyverun.c
797

stdin/out/err are now limited.

kaktus marked an inline comment as done.
  • use helper instead of raw catopen
  • remove unneeded call to cap_rights_remove
  • sort capabilities alphabetically

@emaste @allanjude Guys, what do you think about the latest diff? Would you consider it ready for commit?

@grehan Based on our discussion about this at MeetBSD, can you outline how you would like to move forward with this, so we can get at least basic capsicum-ization committed relatively soon.

Does VNC disconnect/reconnect work ? (I'll try and test this out)

I think all the ENOSYS tests can be removed by testing early, and exiting if that's the case.

Does VNC disconnect/reconnect work ? (I'll try and test this out)

Yes, you can connect and reconnect without a problem. Capabilities are set properly on the accepting socket to allow this.

I think all the ENOSYS tests can be removed by testing early, and exiting if that's the case.

I don't know if I got this right. If there is no Capsicum support in kernel we still want to run, just without the sandbox.
Yes the constant check for ENOSYS is verbose but the only other option that comes to my mind now is checking early by limiting for example stderr and setting some boolean to indicate if there is kernel support for Capsicum, and then using it instead of errno != ENOSYS. Do you like that or maybe have some other idea how to handle this?

Ping after two weeks of silence.

silence == Thanksgiving vacation.

usr.sbin/bhyve/bhyverun.c
720

Can this definition be moved into libvmmapi ? Easier to keep in sync then.

What about this guys? How many more bhyve vulns we're going to wait till this could be committed? ;)

In D8290#181049, @robak wrote:

What about this guys? How many more bhyve vulns we're going to wait till this could be committed? ;)

It sounds like there might still be a few more items left to do. I see at least one unaddressed comment (moving some stuff to libvmmapi, for example).

I've added the current revision of this patch to a feature branch in HardenedBSD. I'll test it out over the weekend.

I've added the current revision of this patch to a feature branch in HardenedBSD. I'll test it out over the weekend.

I've tested this and have experienced no regressions. HardenedBSD 12-CURRENT/amd64 as the host with a HardenedBSD 12-CURRENT/amd64 guest started with vmrun.sh and a Win10/amd64 UEFI guest started with bhyve directly.

kaktus added a reviewer: ngie.

Fix regression when stdin/out/err fds are are overridden by shell.
Found by Kyua tests.

bin/dd/dd_test:io -> passed [0.038s]
bin/dd/dd_test:length -> passed [0.022s]
bin/dd/dd_test:seek -> passed [0.060s]

kaktus removed a reviewer: ngie.

Wops, wrong differential, sorry.

Push back correct patch again.

@lattera-gmail.com Thank you for the tests! I'd also want to appeal to anyone else who can help testing it in any configuration.

usr.sbin/bhyve/bhyverun.c
720

Why? It's private variable, used only by this one function, the ioctls are defined outside libvmmapi, and I don't see any benefits to make it publicly available from library.
If you know something that I don't or just simply want it moved please state your opinion clearly and I'll follow :-)

usr.sbin/bhyve/bhyverun.c
720

... or given that I maintain bhyve I have a stake in making sure that abstractions are kept.

There are no direct VM_* calls in bhyve, and instead all of these are wrapped in libvmmapi so that the implementation can be changed if need be. There have already been modifications to ioctl values where bhyve didn't have to change.

Stepping around this just for capsicum is a step backwards, hence my proposal to have an api call in libvmmapi to return the list of ioctl cmds.

  • Address @grehan comments and move table with VM_ ioctls to a separate function in libvmmapi
  • Style fixes here and there
  • Sync with r310050

Thanks for making the change. I'll try and give it a test with PCI passthru.

lib/libvmmapi/vmmapi.c
1449

Might be best to pass in a length param - if NULL, the length would be returned but no copy done, similar to how sysctl works. This allows the caller to determine the size of it's buffer.

lib/libvmmapi/vmmapi.c
1449

Can I keep the static allocation of the table and just pass a pointer to it or would you prefer dynamic allocation and forcing caller to free() it?

lib/libvmmapi/vmmapi.c
1449

Not a lot of prior art in libc :(, but my preference would be to return a const pointer (and size param).

kaktus added a reviewer: domagoj.stolfa_gmail.com.
  • Address @grehan comments
  • Sync with r311881
  • Small style fixes

Thanks for the update - I'll give this a tryout.

@grehan Any news? I'd love to commit it, we're just waiting for a green light on your side.

Rebuilding with CURRENT to give it a tryout - thanks for the nudge.

usr.sbin/bhyve/block_if.c
468

fsync needs to be added here, or an ahci-hd file-backed disk image will fail on FreeBSD guest shutdown with

(ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 00 00 00 40 00 00 00 00 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 04 (ABRT )
(ada0:ahcich0:0:0:0): RES: 41 04 00 00 00 40 00 00 00 00 00
(ada0:ahcich0:0:0:0): Error 5, Retries exhausted
(ada0:ahcich0:0:0:0): Synchronize cache failed

#ifndef WITHOUT_CAPSICUM

  • cap_rights_init(&rights, CAP_IOCTL, CAP_READ, CAP_SEEK, CAP_WRITE);
  • if (ro)

+ cap_rights_init(&rights, CAP_IOCTL, CAP_READ, CAP_SEEK, CAP_WRITE,
+ CAP_FSYNC);
+ if (ro) {

		cap_rights_clear(&rights, CAP_WRITE);

+ cap_rights_clear(&rights, CAP_FSYNC);
+ }

Allow fsync(2) in block devices.

@pawel.biernacki-gmail.com thanks for quick update! @grehan can we have equally fast review? :)

Still some of the test matrix to cover. Hopefully shortly.

usr.sbin/bhyve/pci_e82545.c
2240

Needs an additional capability here

  • cap_rights_init(&rights, CAP_READ, CAP_WRITE);

+ cap_rights_init(&rights, CAP_EVENT, CAP_READ, CAP_WRITE);

Allow pci_e82545 descriptor to be used by mevent.

usr.sbin/bhyve/bhyverun.c
764

I think this one should be just CAP_MMAP_RW (no executable permission needed).

usr.sbin/bhyve/bhyverun.c
764

You're right. It was leftover from some initial version.

Drop executable permission in mmap.

Into the final straight now: doing a visual inspection and picking up minor style issues.

lib/libvmmapi/vmmapi.c
1423

minor style(9) nit - an extra blank line should be inserted here (also matches other one-line functions in the file).

1426

As above, this can be cap_ioctl_t

1452

Minor style nit: bcopy is used in all other routines in this file.

lib/libvmmapi/vmmapi.h
166

u_long can be changed to cap_ioctl_t here. This file already requires <sys/types.h>, which unconditionally defines that type (the man page for cap_ioctls_limit needs to be updated).

Address @grehan comments:

  • use bcopy instead of memcpy to be consistent with the rest of the code,
  • use cap_ioctl_t instead of unsigned long,
  • style(9) fixes.
grehan edited edge metadata.

robak - this is fine to commit once the last minor changes are done.

dbgport needs some capsicum support added but that can be done later.

lib/libvmmapi/vmmapi.h
39

Need to bump the minor number since there 2 new api calls.

usr.sbin/bhyve/bhyverun.c
58

This is in capsicum_helpers.h so no need to be included again.

usr.sbin/bhyve/pci_ahci.c
34

AHCI_DEBUG is for a developer custom-build so no need for capsicum protection here.

usr.sbin/bhyve/pci_virtio_rnd.c
158

CAP_EVENT can be removed (no mevent calls).

This revision is now accepted and ready to land.Feb 12 2017, 8:08 PM
kaktus edited edge metadata.

@grehan Thank you for your help and the review.

I addressed your inline comments. I also added gdbport sandboxing and tested it as documented in https://wiki.freebsd.org/bhyve/DebuggingWithGdb.

This revision now requires review to proceed.Feb 13 2017, 1:26 PM
kaktus edited edge metadata.

Remove now empty #ifdef.

@grehan There was a minor update to the code, do you mind to tick the approve box so I could commit it with complete adherence to the process? ;)

@grehan capsicum @emaste Guys, can this be reviewed and accepted once again so it could be safely committed?

Sorry for being late to the party ;(

lib/libvmmapi/vmmapi.c
1432

Not getting this part why we just don't do a global from that?

usr.sbin/bhyve/bhyverun.c
989

No need to check ENOSYS.

995

No need for this warning.

usr.sbin/bhyve/block_if.c
472

I'm not sure if here the reverse logic would be more secure.

Remove extra ENOSYS check and notification about lack of Capsicum support in the kernel.

This revision was automatically updated to reflect the committed changes.

Guys, I've committed it as discussed with @emaste, to get the code out there. Please, continue the work on polishing it, and let me know when there'll be a patch ready and agreed to commit it further. Once that done, we'll set up a MFC date so it could get included in closest -RELEASE and we'll swing the whip at Paweł to get more code.

emaste added inline comments.
lib/libvmmapi/vmmapi.c
1432

API refinement suggested by @jhb in https://lists.freebsd.org/pipermail/svn-src-all/2017-February/139587.html

I wonder if instead of doing this in the executable and exposing
vm_get_ioctls, etc. if the API shouldn't really be something like
'vm_limit_rights(ctx)' and this code should be in that function in
libvmmapi?  It would be something like (assuming you drop
vm_get_ioctls() entirely and expose vm_ioctl_cmds[] as a static
global in the library):

int
vm_limit_rights(struct vm_ctx *ctx)
{
	...
	cap_rights_init(&rights, CAP_IOCTL, CAP_MMAP_RW);
	if (cap_rights_limit(ctx->fd, &rights) == -1) {
 		if (errno == ENOSYS)
			return (0);
		else
			return (-1);
	}
	/* Shouldn't get ENOSYS here if cap_rights_limit worked. */
	return (cap_ioctls_limit(ctx->fd, vm_ioctl_cmds, nitems(vm_ioctl_cmds));
}

You wouldn't need vm_get_device_fd() either in this API.
usr.sbin/bhyve/block_if.c
472

That is:

cap_rights_init(&rights, CAP_IOCTL, CAP_READ, CAP_SEEK);
if (!ro)
        cap_rights_set(&rights, CAP_FSYNC, CAP_WRITE);

I agree that looks preferable.

As I wrote in a mail to @jhb - the idea is worth considering. I'd like to gather some feedback from broader audience as the code is in base now and start a new review with some ideas about polishing / improvements to the code.

emaste - any suggestions for a good forum to discuss this ? Unfortunately it straddles the capsicum@ and virtualization@ lists :(

As I wrote in a mail to @jhb - the idea is worth considering. I'd like to gather some feedback from broader audience as the code is in base now and start a new review with some ideas about polishing / improvements to the code.

Indeed. I just want to have a cross-reference from here to the discussion that took place on the mailing list (and there will be another xref from here to a new review with further polishing).

@emaste - any suggestions for a good forum to discuss this ? Unfortunately it straddles the capsicum@ and virtualization@ lists :(

We don't have a FreeBSD-hosted capsicum mailing list AFAIK, only the Cambridge-hosted one. Both lists are very low traffic, so I think you could just crosspost to cl-capsicum-discuss@lists.cam.ac.uk and freebsd-virtualization@freebsd.org.