Page MenuHomeFreeBSD

vmm: Add new device file that handles creation/destruction of VMs
Needs ReviewPublic

Authored by cyril_freebsdfoundation.org on Jul 27 2021, 7:01 PM.

Details

Reviewers
markj
Group Reviewers
bhyve
Summary

This revision adds a new device file at /dev/vmmctl which can be used to create VMs / device files using new ioctl commands VMMCTL_CREATE and VMMCTL_DESTROY. When a file descriptor to /dev/vmmctl is closed, all VMs associated with the file descriptor are automatically destroyed.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

lib/libvmmapi/vmmapi.c
108
lib/libvmmapi/vmmapi.h
116
sys/amd64/vmm/vmm_dev.c
1093

I think *name can be const?

1349

Per style(9), there should be no extra indentation for case statements.

1351

op->name is a character buffer supplied by userland. Nothing guarantees that it is nul-terminated, so we should check that here, e.g., with strnlen(), and fail with EINVAL if not.

1366

Same here regarding nul termination.

1396

Perhaps loop over all VMs and fail if any are non-persistent?

1409

I don't think there's any need to define a new module here? Can't we hook into vmmdev_init() and vmmdev_cleanup()?

Fix some style issues, fix bug where string from userspace was not strnlen'd, move device file setup, cleanup code into vmmdev_init and vmmdev_cleanup (which causes some code to be shifted around in the file).

Before looking deeper into the code I picked out these first.

lib/libvmmapi/vmmapi.c
115

the last 0 is not necessary if you are not creating a file.

123

We allows vmname with the length of VM_MAX_NAMELEN.

137

We allows vmname with the length of VM_MAX_NAMELEN.

sys/amd64/include/vmm_dev.h
351

VM_MAX_NAMELEN + 1

sys/amd64/vmm/vmm_dev.c
1140

style-nit: there is no need to make one-line statement-true a compound statement.

1291
1301

The namespace is still shared between vmmctl-created vm and sysctl-created vm.

1372

VM_MAX_NAMELEN + 1

Fix some style issues, and update max length of VM name accordingly due to changes in commit df95cc76affb.

I don't understand the motivation for this change. What kind of VM objects (vm_object_t) are being created and for what use? Do you mean instead that you want to create Virtual Machines (in which case, please do not say VM objects, that means vm_object_t from <vm/vm_object.h>)? I think in general we'd like to move to using a cdev for this and retiring the sysctl entirely. However, a couple of thoughts:

  1. The create sysctl should take a structure that we can extend over time to add new fields (e.g. eventually a "max_vcpus" field is one use case folks already want). I think I would also make the 'persistent' flag be a flag passed to create rather than implicit (this would let us get rid of the sysctl entirely if bhyveload were able to create persistent VMs via /dev/vmmctl). The contract should be that userland has to 'bzero' (or memset(0)) the structure passed to create before setting specific fields. This would let us handle backward compat ABIs via the ioctl size field and knowing that unknown fields are set to 0.
  1. Somewhat orthogonal, but we probably want a way to restrict namespaces so that you can create jails that can't "see" the VMs in other jails. Perhaps in the outer jail you could see child VMs using some sort of "jail/VM" syntax for the purposes of bhyvectl. One way of doing that might be to have a VMM_OPEN that takes a pathname and allow bhyvectl to use that instead of /dev/vmm/<foo> devices directly.

I know when I've talked to @grehan in the past about using FD's to represent VMs there has been a desire for at least the non-bhyveload case to have something like VMM_CREATE return a new fd that the ioctl are then performed on without creating a /dev/vmm/<foo> entry at all, just an anon FD that destroys the VM on close.

In D31325#708727, @jhb wrote:

I don't understand the motivation for this change. What kind of VM objects (vm_object_t) are being created and for what use? Do you mean instead that you want to create Virtual Machines (in which case, please do not say VM objects, that means vm_object_t from <vm/vm_object.h>)? I think in general we'd like to move to using a cdev for this and retiring the sysctl entirely.

Right, VM here means "virtual machine." The ambiguous "VM object" terminology is my fault, we should just say "VMs" generally. In some cases one wants to refer to a struct vm specifically, which is where "VM object" came from.

This change adds a cdev that can be used to create VMs, instead of the sysctl interface. I tend to think that the sysctl interface needs to stick around for a while, as it's useful to be able to run bhyve(8) from older releases with a new kernel.

However, a couple of thoughts:

  1. The create sysctl should take a structure that we can extend over time to add new fields (e.g. eventually a "max_vcpus" field is one use case folks already want). I think I would also make the 'persistent' flag be a flag passed to create rather than implicit (this would let us get rid of the sysctl entirely if bhyveload were able to create persistent VMs via /dev/vmmctl). The contract should be that userland has to 'bzero' (or memset(0)) the structure passed to create before setting specific fields. This would let us handle backward compat ABIs via the ioctl size field and knowing that unknown fields are set to 0.
  1. Somewhat orthogonal, but we probably want a way to restrict namespaces so that you can create jails that can't "see" the VMs in other jails. Perhaps in the outer jail you could see child VMs using some sort of "jail/VM" syntax for the purposes of bhyvectl. One way of doing that might be to have a VMM_OPEN that takes a pathname and allow bhyvectl to use that instead of /dev/vmm/<foo> devices directly.

https://reviews.freebsd.org/D31156 is a step towards this. With that change, only a parent jail can see VMs in a child jail. The namespace is still global for now, so two sibling jails cannot create VMs with the same name.

I know when I've talked to @grehan in the past about using FD's to represent VMs there has been a desire for at least the non-bhyveload case to have something like VMM_CREATE return a new fd that the ioctl are then performed on without creating a /dev/vmm/<foo> entry at all, just an anon FD that destroys the VM on close.

This change is intended to be a step towards that. For now the new device file only handles creation and destruction, with destruction happening automatically when the fd is closed.

This change adds a cdev that can be used to create VMs, instead of the sysctl interface. I tend to think that the sysctl interface needs to stick around for a while, as it's useful to be able to run bhyve(8) from older releases with a new kernel.

While I think we might need to leave it around under COMPAT_FREEBSD<N> for some time, I'd like for the new interface to also support bhyveload if possible (e.g. by having persistent be a flag and then not destroying persistent VMs on close). That would let us stop using the sysctl entirely on HEAD for example.

cyril_freebsdfoundation.org retitled this revision from vmm: Add new device file that handles creation/destruction of VM objects to vmm: Add new device file that handles creation/destruction of VMs.
cyril_freebsdfoundation.org edited the summary of this revision. (Show Details)

Add a "persistent" flag to allow some vmmctl-created VMs to not be automatically destroyed when the file descriptor is closed. This allows /dev/vmmctl to be used with bhyveload as well. I also added some padding to the ioctl structs for future-proofing.