Index: head/usr.sbin/bhyve/bhyve.8 =================================================================== --- head/usr.sbin/bhyve/bhyve.8 (revision 302210) +++ head/usr.sbin/bhyve/bhyve.8 (revision 302211) @@ -1,373 +1,376 @@ .\" Copyright (c) 2013 Peter Grehan .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd April 18, 2016 +.Dd June 24, 2016 .Dt BHYVE 8 .Os .Sh NAME .Nm bhyve .Nd "run a guest operating system inside a virtual machine" .Sh SYNOPSIS .Nm .Op Fl abehuwxACHPSWY .Op Fl c Ar numcpus .Op Fl g Ar gdbport .Op Fl l Ar lpcdev Ns Op , Ns Ar conf -.Op Fl m Ar size Ns Op Ar K|k|M|m|G|g|T|t +.Op Fl m Ar memsize Ns Op Ar K|k|M|m|G|g|T|t .Op Fl p Ar vcpu:hostcpu .Op Fl s Ar slot,emulation Ns Op , Ns Ar conf .Op Fl U Ar uuid .Ar vmname .Sh DESCRIPTION .Nm is a hypervisor that runs guest operating systems inside a virtual machine. .Pp Parameters such as the number of virtual CPUs, amount of guest memory, and I/O connectivity can be specified with command-line parameters. .Pp The guest operating system must be loaded with .Xr bhyveload 8 or a similar boot loader before running .Nm . .Pp .Nm runs until the guest operating system reboots or an unhandled hypervisor exit is detected. .Sh OPTIONS .Bl -tag -width 10n .It Fl a The guest's local APIC is configured in xAPIC mode. The xAPIC mode is the default setting so this option is redundant. It will be deprecated in a future version. .It Fl A Generate ACPI tables. Required for .Fx Ns /amd64 guests. .It Fl b Enable a low-level console device supported by .Fx kernels compiled with .Cd "device bvmconsole" . This option will be deprecated in a future version. .It Fl c Ar numcpus Number of guest virtual CPUs. The default is 1 and the maximum is 16. .It Fl C Include guest memory in core file. .It Fl e Force .Nm to exit when a guest issues an access to an I/O port that is not emulated. This is intended for debug purposes. .It Fl g Ar gdbport For .Fx kernels compiled with .Cd "device bvmdebug" , allow a remote kernel kgdb to be relayed to the guest kernel gdb stub via a local IPv4 address and this port. This option will be deprecated in a future version. .It Fl h Print help message and exit. .It Fl H Yield the virtual CPU thread when a HLT instruction is detected. If this option is not specified, virtual CPUs will use 100% of a host CPU. .It Fl l Ar lpcdev Ns Op , Ns Ar conf Allow devices behind the LPC PCI-ISA bridge to be configured. The only supported devices are the TTY-class devices .Ar com1 and .Ar com2 and the boot ROM device .Ar bootrom . -.It Fl m Ar size Ns Op Ar K|k|M|m|G|g|T|t +.It Fl m Ar memsize Ns Op Ar K|k|M|m|G|g|T|t Guest physical memory size in bytes. This must be the same size that was given to .Xr bhyveload 8 . .Pp The size argument may be suffixed with one of K, M, G or T (either upper or lower case) to indicate a multiple of kilobytes, megabytes, gigabytes, or terabytes. If no suffix is given, the value is assumed to be in megabytes. +.Pp +.Ar memsize +defaults to 256M. .It Fl p Ar vcpu:hostcpu Pin guest's virtual CPU .Em vcpu to .Em hostcpu . .It Fl P Force the guest virtual CPU to exit when a PAUSE instruction is detected. .It Fl s Ar slot,emulation Ns Op , Ns Ar conf Configure a virtual PCI slot and function. .Pp .Nm provides PCI bus emulation and virtual devices that can be attached to slots on the bus. There are 32 available slots, with the option of providing up to 8 functions per slot. .Bl -tag -width 10n .It Ar slot .Ar pcislot[:function] .Ar bus:pcislot:function .Pp The .Ar pcislot value is 0 to 31. The optional .Ar function value is 0 to 7. The optional .Ar bus value is 0 to 255. If not specified, the .Ar function value defaults to 0. If not specified, the .Ar bus value defaults to 0. .It Ar emulation .Bl -tag -width 10n .It Li hostbridge | Li amd_hostbridge .Pp Provide a simple host bridge. This is usually configured at slot 0, and is required by most guest operating systems. The .Li amd_hostbridge emulation is identical but uses a PCI vendor ID of .Li AMD . .It Li passthru PCI pass-through device. .It Li virtio-net Virtio network interface. .It Li virtio-blk Virtio block storage interface. .It Li virtio-rnd Virtio RNG interface. .It Li ahci-cd AHCI controller attached to an ATAPI CD/DVD. .It Li ahci-hd AHCI controller attached to a SATA hard-drive. .It Li uart PCI 16550 serial device. .It Li lpc LPC PCI-ISA bridge with COM1 and COM2 16550 serial ports and a boot ROM. The LPC bridge emulation can only be configured on bus 0. .El .It Op Ar conf This optional parameter describes the backend for device emulations. If .Ar conf is not specified, the device emulation has no backend and can be considered unconnected. .Pp Network devices: .Bl -tag -width 10n .It Ar tapN Ns Op , Ns Ar mac=xx:xx:xx:xx:xx:xx .It Ar vmnetN Ns Op , Ns Ar mac=xx:xx:xx:xx:xx:xx .Pp If .Ar mac is not specified, the MAC address is derived from a fixed OUI and the remaining bytes from an MD5 hash of the slot and function numbers and the device name. .Pp The MAC address is an ASCII string in .Xr ethers 5 format. .El .Pp Block storage devices: .Bl -tag -width 10n .It Pa /filename Ns Oo , Ns Ar block-device-options Oc .It Pa /dev/xxx Ns Oo , Ns Ar block-device-options Oc .El .Pp The .Ar block-device-options are: .Bl -tag -width 8n .It Li nocache Open the file with .Dv O_DIRECT . .It Li direct Open the file using .Dv O_SYNC . .It Li ro Force the file to be opened read-only. .It Li sectorsize= Ns Ar logical Ns Oo / Ns Ar physical Oc Specify the logical and physical sector sizes of the emulated disk. The physical sector size is optional and is equal to the logical sector size if not explicitly specified. .El .Pp TTY devices: .Bl -tag -width 10n .It Li stdio Connect the serial port to the standard input and output of the .Nm process. .It Pa /dev/xxx Use the host TTY device for serial port I/O. .El .Pp Boot ROM device: .Bl -tag -width 10n .It Pa romfile Map .Ar romfile in the guest address space reserved for boot firmware. .El .Pp Pass-through devices: .Bl -tag -width 10n .It Ns Ar slot Ns / Ns Ar bus Ns / Ns Ar function Connect to a PCI device on the host at the selector described by .Ar slot , .Ar bus , and .Ar function numbers. .El .Pp Guest memory must be wired using the .Fl S option when a pass-through device is configured. .Pp The host device must have been reserved at boot-time using the .Va pptdev loader variable as described in .Xr vmm 4 . .El .It Fl S Wire guest memory. .It Fl u RTC keeps UTC time. .It Fl U Ar uuid Set the universally unique identifier .Pq UUID in the guest's System Management BIOS System Information structure. By default a UUID is generated from the host's hostname and .Ar vmname . .It Fl w Ignore accesses to unimplemented Model Specific Registers (MSRs). This is intended for debug purposes. .It Fl W Force virtio PCI device emulations to use MSI interrupts instead of MSI-X interrupts. .It Fl x The guest's local APIC is configured in x2APIC mode. .It Fl Y Disable MPtable generation. .It Ar vmname Alphanumeric name of the guest. This should be the same as that created by .Xr bhyveload 8 . .El .Sh SIGNAL HANDLING .Nm deals with the following signals: .Pp .Bl -tag -width indent -compact .It SIGTERM Trigger ACPI poweroff for a VM .El .Sh EXIT STATUS Exit status indicates how the VM was terminated: .Pp .Bl -tag -width indent -compact .It 0 rebooted .It 1 powered off .It 2 halted .It 3 triple fault .El .Sh EXAMPLES The guest operating system must have been loaded with .Xr bhyveload 8 or a similar boot loader before .Xr bhyve 4 can be run. .Pp To run a virtual machine with 1GB of memory, two virtual CPUs, a virtio block device backed by the .Pa /my/image filesystem image, and a serial port for the console: .Bd -literal -offset indent bhyve -c 2 -s 0,hostbridge -s 1,lpc -s 2,virtio-blk,/my/image \\ -l com1,stdio -A -H -P -m 1G vm1 .Ed .Pp Run a 24GB single-CPU virtual machine with three network ports, one of which has a MAC address specified: .Bd -literal -offset indent bhyve -s 0,hostbridge -s 1,lpc -s 2:0,virtio-net,tap0 \\ -s 2:1,virtio-net,tap1 \\ -s 2:2,virtio-net,tap2,mac=00:be:fa:76:45:00 \\ -s 3,virtio-blk,/my/image -l com1,stdio \\ -A -H -P -m 24G bigvm .Ed .Pp Run an 8GB quad-CPU virtual machine with 8 AHCI SATA disks, an AHCI ATAPI CD-ROM, a single virtio network port, an AMD hostbridge, and the console port connected to an .Xr nmdm 4 null-modem device. .Bd -literal -offset indent bhyve -c 4 \\ -s 0,amd_hostbridge -s 1,lpc \\ -s 1:0,ahci-hd,/images/disk.1 \\ -s 1:1,ahci-hd,/images/disk.2 \\ -s 1:2,ahci-hd,/images/disk.3 \\ -s 1:3,ahci-hd,/images/disk.4 \\ -s 1:4,ahci-hd,/images/disk.5 \\ -s 1:5,ahci-hd,/images/disk.6 \\ -s 1:6,ahci-hd,/images/disk.7 \\ -s 1:7,ahci-hd,/images/disk.8 \\ -s 2,ahci-cd,/images/install.iso \\ -s 3,virtio-net,tap0 \\ -l com1,/dev/nmdm0A \\ -A -H -P -m 8G .Ed .Sh SEE ALSO .Xr bhyve 4 , .Xr nmdm 4 , .Xr vmm 4 , .Xr ethers 5 , .Xr bhyvectl 8 , .Xr bhyveload 8 .Sh HISTORY .Nm first appeared in .Fx 10.0 . .Sh AUTHORS .An Neel Natu Aq Mt neel@freebsd.org .An Peter Grehan Aq Mt grehan@freebsd.org Index: head/usr.sbin/bhyve/bhyverun.c =================================================================== --- head/usr.sbin/bhyve/bhyverun.c (revision 302210) +++ head/usr.sbin/bhyve/bhyverun.c (revision 302211) @@ -1,971 +1,971 @@ /*- * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "bhyverun.h" #include "acpi.h" #include "inout.h" #include "dbgport.h" #include "fwctl.h" #include "ioapic.h" #include "mem.h" #include "mevent.h" #include "mptbl.h" #include "pci_emul.h" #include "pci_irq.h" #include "pci_lpc.h" #include "smbiostbl.h" #include "xmsr.h" #include "spinup_ap.h" #include "rtc.h" #define GUEST_NIO_PORT 0x488 /* guest upcalls via i/o port */ #define MB (1024UL * 1024) #define GB (1024UL * MB) typedef int (*vmexit_handler_t)(struct vmctx *, struct vm_exit *, int *vcpu); extern int vmexit_task_switch(struct vmctx *, struct vm_exit *, int *vcpu); char *vmname; int guest_ncpus; char *guest_uuid_str; static int guest_vmexit_on_hlt, guest_vmexit_on_pause; static int virtio_msix = 1; static int x2apic_mode = 0; /* default is xAPIC */ static int strictio; static int strictmsr = 1; static int acpi; static char *progname; static const int BSP = 0; static cpuset_t cpumask; static void vm_loop(struct vmctx *ctx, int vcpu, uint64_t rip); static struct vm_exit vmexit[VM_MAXCPU]; struct bhyvestats { uint64_t vmexit_bogus; uint64_t vmexit_reqidle; uint64_t vmexit_hlt; uint64_t vmexit_pause; uint64_t vmexit_mtrap; uint64_t vmexit_inst_emul; uint64_t cpu_switch_rotate; uint64_t cpu_switch_direct; } stats; struct mt_vmm_info { pthread_t mt_thr; struct vmctx *mt_ctx; int mt_vcpu; } mt_vmm_info[VM_MAXCPU]; static cpuset_t *vcpumap[VM_MAXCPU] = { NULL }; static void usage(int code) { fprintf(stderr, "Usage: %s [-abehuwxACHPSWY] [-c vcpus] [-g ] [-l ]\n" - " %*s [-m mem] [-p vcpu:hostcpu] [-s ] [-U uuid] \n" + " %*s [-m memsize[K|k|M|m|G|g|T|t]] [-p vcpu:hostcpu] [-s ] [-U uuid] \n" " -a: local apic is in xAPIC mode (deprecated)\n" " -A: create ACPI tables\n" " -c: # cpus (default 1)\n" " -C: include guest memory in core file\n" " -e: exit on unhandled I/O access\n" " -g: gdb port\n" " -h: help\n" " -H: vmexit from the guest on hlt\n" " -l: LPC device configuration\n" - " -m: memory size in MB\n" + " -m: memory size\n" " -p: pin 'vcpu' to 'hostcpu'\n" " -P: vmexit from the guest on pause\n" " -s: PCI slot config\n" " -S: guest memory cannot be swapped\n" " -u: RTC keeps UTC time\n" " -U: uuid\n" " -w: ignore unimplemented MSRs\n" " -W: force virtio to use single-vector MSI\n" " -x: local apic is in x2APIC mode\n" " -Y: disable MPtable generation\n", progname, (int)strlen(progname), ""); exit(code); } static int pincpu_parse(const char *opt) { int vcpu, pcpu; if (sscanf(opt, "%d:%d", &vcpu, &pcpu) != 2) { fprintf(stderr, "invalid format: %s\n", opt); return (-1); } if (vcpu < 0 || vcpu >= VM_MAXCPU) { fprintf(stderr, "vcpu '%d' outside valid range from 0 to %d\n", vcpu, VM_MAXCPU - 1); return (-1); } if (pcpu < 0 || pcpu >= CPU_SETSIZE) { fprintf(stderr, "hostcpu '%d' outside valid range from " "0 to %d\n", pcpu, CPU_SETSIZE - 1); return (-1); } if (vcpumap[vcpu] == NULL) { if ((vcpumap[vcpu] = malloc(sizeof(cpuset_t))) == NULL) { perror("malloc"); return (-1); } CPU_ZERO(vcpumap[vcpu]); } CPU_SET(pcpu, vcpumap[vcpu]); return (0); } void vm_inject_fault(void *arg, int vcpu, int vector, int errcode_valid, int errcode) { struct vmctx *ctx; int error, restart_instruction; ctx = arg; restart_instruction = 1; error = vm_inject_exception(ctx, vcpu, vector, errcode_valid, errcode, restart_instruction); assert(error == 0); } void * paddr_guest2host(struct vmctx *ctx, uintptr_t gaddr, size_t len) { return (vm_map_gpa(ctx, gaddr, len)); } int fbsdrun_vmexit_on_pause(void) { return (guest_vmexit_on_pause); } int fbsdrun_vmexit_on_hlt(void) { return (guest_vmexit_on_hlt); } int fbsdrun_virtio_msix(void) { return (virtio_msix); } static void * fbsdrun_start_thread(void *param) { char tname[MAXCOMLEN + 1]; struct mt_vmm_info *mtp; int vcpu; mtp = param; vcpu = mtp->mt_vcpu; snprintf(tname, sizeof(tname), "vcpu %d", vcpu); pthread_set_name_np(mtp->mt_thr, tname); vm_loop(mtp->mt_ctx, vcpu, vmexit[vcpu].rip); /* not reached */ exit(1); return (NULL); } void fbsdrun_addcpu(struct vmctx *ctx, int fromcpu, int newcpu, uint64_t rip) { int error; assert(fromcpu == BSP); /* * The 'newcpu' must be activated in the context of 'fromcpu'. If * vm_activate_cpu() is delayed until newcpu's pthread starts running * then vmm.ko is out-of-sync with bhyve and this can create a race * with vm_suspend(). */ error = vm_activate_cpu(ctx, newcpu); if (error != 0) err(EX_OSERR, "could not activate CPU %d", newcpu); CPU_SET_ATOMIC(newcpu, &cpumask); /* * Set up the vmexit struct to allow execution to start * at the given RIP */ vmexit[newcpu].rip = rip; vmexit[newcpu].inst_length = 0; mt_vmm_info[newcpu].mt_ctx = ctx; mt_vmm_info[newcpu].mt_vcpu = newcpu; error = pthread_create(&mt_vmm_info[newcpu].mt_thr, NULL, fbsdrun_start_thread, &mt_vmm_info[newcpu]); assert(error == 0); } static int fbsdrun_deletecpu(struct vmctx *ctx, int vcpu) { if (!CPU_ISSET(vcpu, &cpumask)) { fprintf(stderr, "Attempting to delete unknown cpu %d\n", vcpu); exit(1); } CPU_CLR_ATOMIC(vcpu, &cpumask); return (CPU_EMPTY(&cpumask)); } static int vmexit_handle_notify(struct vmctx *ctx, struct vm_exit *vme, int *pvcpu, uint32_t eax) { #if BHYVE_DEBUG /* * put guest-driven debug here */ #endif return (VMEXIT_CONTINUE); } static int vmexit_inout(struct vmctx *ctx, struct vm_exit *vme, int *pvcpu) { int error; int bytes, port, in, out; int vcpu; vcpu = *pvcpu; port = vme->u.inout.port; bytes = vme->u.inout.bytes; in = vme->u.inout.in; out = !in; /* Extra-special case of host notifications */ if (out && port == GUEST_NIO_PORT) { error = vmexit_handle_notify(ctx, vme, pvcpu, vme->u.inout.eax); return (error); } error = emulate_inout(ctx, vcpu, vme, strictio); if (error) { fprintf(stderr, "Unhandled %s%c 0x%04x at 0x%lx\n", in ? "in" : "out", bytes == 1 ? 'b' : (bytes == 2 ? 'w' : 'l'), port, vmexit->rip); return (VMEXIT_ABORT); } else { return (VMEXIT_CONTINUE); } } static int vmexit_rdmsr(struct vmctx *ctx, struct vm_exit *vme, int *pvcpu) { uint64_t val; uint32_t eax, edx; int error; val = 0; error = emulate_rdmsr(ctx, *pvcpu, vme->u.msr.code, &val); if (error != 0) { fprintf(stderr, "rdmsr to register %#x on vcpu %d\n", vme->u.msr.code, *pvcpu); if (strictmsr) { vm_inject_gp(ctx, *pvcpu); return (VMEXIT_CONTINUE); } } eax = val; error = vm_set_register(ctx, *pvcpu, VM_REG_GUEST_RAX, eax); assert(error == 0); edx = val >> 32; error = vm_set_register(ctx, *pvcpu, VM_REG_GUEST_RDX, edx); assert(error == 0); return (VMEXIT_CONTINUE); } static int vmexit_wrmsr(struct vmctx *ctx, struct vm_exit *vme, int *pvcpu) { int error; error = emulate_wrmsr(ctx, *pvcpu, vme->u.msr.code, vme->u.msr.wval); if (error != 0) { fprintf(stderr, "wrmsr to register %#x(%#lx) on vcpu %d\n", vme->u.msr.code, vme->u.msr.wval, *pvcpu); if (strictmsr) { vm_inject_gp(ctx, *pvcpu); return (VMEXIT_CONTINUE); } } return (VMEXIT_CONTINUE); } static int vmexit_spinup_ap(struct vmctx *ctx, struct vm_exit *vme, int *pvcpu) { int newcpu; int retval = VMEXIT_CONTINUE; newcpu = spinup_ap(ctx, *pvcpu, vme->u.spinup_ap.vcpu, vme->u.spinup_ap.rip); return (retval); } #define DEBUG_EPT_MISCONFIG #ifdef DEBUG_EPT_MISCONFIG #define EXIT_REASON_EPT_MISCONFIG 49 #define VMCS_GUEST_PHYSICAL_ADDRESS 0x00002400 #define VMCS_IDENT(x) ((x) | 0x80000000) static uint64_t ept_misconfig_gpa, ept_misconfig_pte[4]; static int ept_misconfig_ptenum; #endif static int vmexit_vmx(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { fprintf(stderr, "vm exit[%d]\n", *pvcpu); fprintf(stderr, "\treason\t\tVMX\n"); fprintf(stderr, "\trip\t\t0x%016lx\n", vmexit->rip); fprintf(stderr, "\tinst_length\t%d\n", vmexit->inst_length); fprintf(stderr, "\tstatus\t\t%d\n", vmexit->u.vmx.status); fprintf(stderr, "\texit_reason\t%u\n", vmexit->u.vmx.exit_reason); fprintf(stderr, "\tqualification\t0x%016lx\n", vmexit->u.vmx.exit_qualification); fprintf(stderr, "\tinst_type\t\t%d\n", vmexit->u.vmx.inst_type); fprintf(stderr, "\tinst_error\t\t%d\n", vmexit->u.vmx.inst_error); #ifdef DEBUG_EPT_MISCONFIG if (vmexit->u.vmx.exit_reason == EXIT_REASON_EPT_MISCONFIG) { vm_get_register(ctx, *pvcpu, VMCS_IDENT(VMCS_GUEST_PHYSICAL_ADDRESS), &ept_misconfig_gpa); vm_get_gpa_pmap(ctx, ept_misconfig_gpa, ept_misconfig_pte, &ept_misconfig_ptenum); fprintf(stderr, "\tEPT misconfiguration:\n"); fprintf(stderr, "\t\tGPA: %#lx\n", ept_misconfig_gpa); fprintf(stderr, "\t\tPTE(%d): %#lx %#lx %#lx %#lx\n", ept_misconfig_ptenum, ept_misconfig_pte[0], ept_misconfig_pte[1], ept_misconfig_pte[2], ept_misconfig_pte[3]); } #endif /* DEBUG_EPT_MISCONFIG */ return (VMEXIT_ABORT); } static int vmexit_svm(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { fprintf(stderr, "vm exit[%d]\n", *pvcpu); fprintf(stderr, "\treason\t\tSVM\n"); fprintf(stderr, "\trip\t\t0x%016lx\n", vmexit->rip); fprintf(stderr, "\tinst_length\t%d\n", vmexit->inst_length); fprintf(stderr, "\texitcode\t%#lx\n", vmexit->u.svm.exitcode); fprintf(stderr, "\texitinfo1\t%#lx\n", vmexit->u.svm.exitinfo1); fprintf(stderr, "\texitinfo2\t%#lx\n", vmexit->u.svm.exitinfo2); return (VMEXIT_ABORT); } static int vmexit_bogus(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { assert(vmexit->inst_length == 0); stats.vmexit_bogus++; return (VMEXIT_CONTINUE); } static int vmexit_reqidle(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { assert(vmexit->inst_length == 0); stats.vmexit_reqidle++; return (VMEXIT_CONTINUE); } static int vmexit_hlt(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { stats.vmexit_hlt++; /* * Just continue execution with the next instruction. We use * the HLT VM exit as a way to be friendly with the host * scheduler. */ return (VMEXIT_CONTINUE); } static int vmexit_pause(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { stats.vmexit_pause++; return (VMEXIT_CONTINUE); } static int vmexit_mtrap(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { assert(vmexit->inst_length == 0); stats.vmexit_mtrap++; return (VMEXIT_CONTINUE); } static int vmexit_inst_emul(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { int err, i; struct vie *vie; stats.vmexit_inst_emul++; vie = &vmexit->u.inst_emul.vie; err = emulate_mem(ctx, *pvcpu, vmexit->u.inst_emul.gpa, vie, &vmexit->u.inst_emul.paging); if (err) { if (err == ESRCH) { fprintf(stderr, "Unhandled memory access to 0x%lx\n", vmexit->u.inst_emul.gpa); } fprintf(stderr, "Failed to emulate instruction ["); for (i = 0; i < vie->num_valid; i++) { fprintf(stderr, "0x%02x%s", vie->inst[i], i != (vie->num_valid - 1) ? " " : ""); } fprintf(stderr, "] at 0x%lx\n", vmexit->rip); return (VMEXIT_ABORT); } return (VMEXIT_CONTINUE); } static pthread_mutex_t resetcpu_mtx = PTHREAD_MUTEX_INITIALIZER; static pthread_cond_t resetcpu_cond = PTHREAD_COND_INITIALIZER; static int vmexit_suspend(struct vmctx *ctx, struct vm_exit *vmexit, int *pvcpu) { enum vm_suspend_how how; how = vmexit->u.suspended.how; fbsdrun_deletecpu(ctx, *pvcpu); if (*pvcpu != BSP) { pthread_mutex_lock(&resetcpu_mtx); pthread_cond_signal(&resetcpu_cond); pthread_mutex_unlock(&resetcpu_mtx); pthread_exit(NULL); } pthread_mutex_lock(&resetcpu_mtx); while (!CPU_EMPTY(&cpumask)) { pthread_cond_wait(&resetcpu_cond, &resetcpu_mtx); } pthread_mutex_unlock(&resetcpu_mtx); switch (how) { case VM_SUSPEND_RESET: exit(0); case VM_SUSPEND_POWEROFF: exit(1); case VM_SUSPEND_HALT: exit(2); case VM_SUSPEND_TRIPLEFAULT: exit(3); default: fprintf(stderr, "vmexit_suspend: invalid reason %d\n", how); exit(100); } return (0); /* NOTREACHED */ } static vmexit_handler_t handler[VM_EXITCODE_MAX] = { [VM_EXITCODE_INOUT] = vmexit_inout, [VM_EXITCODE_INOUT_STR] = vmexit_inout, [VM_EXITCODE_VMX] = vmexit_vmx, [VM_EXITCODE_SVM] = vmexit_svm, [VM_EXITCODE_BOGUS] = vmexit_bogus, [VM_EXITCODE_REQIDLE] = vmexit_reqidle, [VM_EXITCODE_RDMSR] = vmexit_rdmsr, [VM_EXITCODE_WRMSR] = vmexit_wrmsr, [VM_EXITCODE_MTRAP] = vmexit_mtrap, [VM_EXITCODE_INST_EMUL] = vmexit_inst_emul, [VM_EXITCODE_SPINUP_AP] = vmexit_spinup_ap, [VM_EXITCODE_SUSPENDED] = vmexit_suspend, [VM_EXITCODE_TASK_SWITCH] = vmexit_task_switch, }; static void vm_loop(struct vmctx *ctx, int vcpu, uint64_t startrip) { int error, rc; enum vm_exitcode exitcode; cpuset_t active_cpus; if (vcpumap[vcpu] != NULL) { error = pthread_setaffinity_np(pthread_self(), sizeof(cpuset_t), vcpumap[vcpu]); assert(error == 0); } error = vm_active_cpus(ctx, &active_cpus); assert(CPU_ISSET(vcpu, &active_cpus)); error = vm_set_register(ctx, vcpu, VM_REG_GUEST_RIP, startrip); assert(error == 0); while (1) { error = vm_run(ctx, vcpu, &vmexit[vcpu]); if (error != 0) break; exitcode = vmexit[vcpu].exitcode; if (exitcode >= VM_EXITCODE_MAX || handler[exitcode] == NULL) { fprintf(stderr, "vm_loop: unexpected exitcode 0x%x\n", exitcode); exit(1); } rc = (*handler[exitcode])(ctx, &vmexit[vcpu], &vcpu); switch (rc) { case VMEXIT_CONTINUE: break; case VMEXIT_ABORT: abort(); default: exit(1); } } fprintf(stderr, "vm_run error %d, errno %d\n", error, errno); } static int num_vcpus_allowed(struct vmctx *ctx) { int tmp, error; error = vm_get_capability(ctx, BSP, VM_CAP_UNRESTRICTED_GUEST, &tmp); /* * The guest is allowed to spinup more than one processor only if the * UNRESTRICTED_GUEST capability is available. */ if (error == 0) return (VM_MAXCPU); else return (1); } void fbsdrun_set_capabilities(struct vmctx *ctx, int cpu) { int err, tmp; if (fbsdrun_vmexit_on_hlt()) { err = vm_get_capability(ctx, cpu, VM_CAP_HALT_EXIT, &tmp); if (err < 0) { fprintf(stderr, "VM exit on HLT not supported\n"); exit(1); } vm_set_capability(ctx, cpu, VM_CAP_HALT_EXIT, 1); if (cpu == BSP) handler[VM_EXITCODE_HLT] = vmexit_hlt; } if (fbsdrun_vmexit_on_pause()) { /* * pause exit support required for this mode */ err = vm_get_capability(ctx, cpu, VM_CAP_PAUSE_EXIT, &tmp); if (err < 0) { fprintf(stderr, "SMP mux requested, no pause support\n"); exit(1); } vm_set_capability(ctx, cpu, VM_CAP_PAUSE_EXIT, 1); if (cpu == BSP) handler[VM_EXITCODE_PAUSE] = vmexit_pause; } if (x2apic_mode) err = vm_set_x2apic_state(ctx, cpu, X2APIC_ENABLED); else err = vm_set_x2apic_state(ctx, cpu, X2APIC_DISABLED); if (err) { fprintf(stderr, "Unable to set x2apic state (%d)\n", err); exit(1); } vm_set_capability(ctx, cpu, VM_CAP_ENABLE_INVPCID, 1); } static struct vmctx * do_open(const char *vmname) { struct vmctx *ctx; int error; bool reinit, romboot; reinit = romboot = false; if (lpc_bootrom()) romboot = true; error = vm_create(vmname); if (error) { if (errno == EEXIST) { if (romboot) { reinit = true; } else { /* * The virtual machine has been setup by the * userspace bootloader. */ } } else { perror("vm_create"); exit(1); } } else { if (!romboot) { /* * If the virtual machine was just created then a * bootrom must be configured to boot it. */ fprintf(stderr, "virtual machine cannot be booted\n"); exit(1); } } ctx = vm_open(vmname); if (ctx == NULL) { perror("vm_open"); exit(1); } if (reinit) { error = vm_reinit(ctx); if (error) { perror("vm_reinit"); exit(1); } } return (ctx); } int main(int argc, char *argv[]) { int c, error, gdb_port, err, bvmcons; int max_vcpus, mptgen, memflags; int rtc_localtime; struct vmctx *ctx; uint64_t rip; size_t memsize; char *optstr; bvmcons = 0; progname = basename(argv[0]); gdb_port = 0; guest_ncpus = 1; memsize = 256 * MB; mptgen = 1; rtc_localtime = 1; memflags = 0; optstr = "abehuwxACHIPSWYp:g:c:s:m:l:U:"; while ((c = getopt(argc, argv, optstr)) != -1) { switch (c) { case 'a': x2apic_mode = 0; break; case 'A': acpi = 1; break; case 'b': bvmcons = 1; break; case 'p': if (pincpu_parse(optarg) != 0) { errx(EX_USAGE, "invalid vcpu pinning " "configuration '%s'", optarg); } break; case 'c': guest_ncpus = atoi(optarg); break; case 'C': memflags |= VM_MEM_F_INCORE; break; case 'g': gdb_port = atoi(optarg); break; case 'l': if (lpc_device_parse(optarg) != 0) { errx(EX_USAGE, "invalid lpc device " "configuration '%s'", optarg); } break; case 's': if (pci_parse_slot(optarg) != 0) exit(1); else break; case 'S': memflags |= VM_MEM_F_WIRED; break; case 'm': error = vm_parse_memsize(optarg, &memsize); if (error) errx(EX_USAGE, "invalid memsize '%s'", optarg); break; case 'H': guest_vmexit_on_hlt = 1; break; case 'I': /* * The "-I" option was used to add an ioapic to the * virtual machine. * * An ioapic is now provided unconditionally for each * virtual machine and this option is now deprecated. */ break; case 'P': guest_vmexit_on_pause = 1; break; case 'e': strictio = 1; break; case 'u': rtc_localtime = 0; break; case 'U': guest_uuid_str = optarg; break; case 'w': strictmsr = 0; break; case 'W': virtio_msix = 0; break; case 'x': x2apic_mode = 1; break; case 'Y': mptgen = 0; break; case 'h': usage(0); default: usage(1); } } argc -= optind; argv += optind; if (argc != 1) usage(1); vmname = argv[0]; ctx = do_open(vmname); if (guest_ncpus < 1) { fprintf(stderr, "Invalid guest vCPUs (%d)\n", guest_ncpus); exit(1); } max_vcpus = num_vcpus_allowed(ctx); if (guest_ncpus > max_vcpus) { fprintf(stderr, "%d vCPUs requested but only %d available\n", guest_ncpus, max_vcpus); exit(1); } fbsdrun_set_capabilities(ctx, BSP); vm_set_memflags(ctx, memflags); err = vm_setup_memory(ctx, memsize, VM_MMAP_ALL); if (err) { fprintf(stderr, "Unable to setup memory (%d)\n", errno); exit(1); } error = init_msr(); if (error) { fprintf(stderr, "init_msr error %d", error); exit(1); } init_mem(); init_inout(); pci_irq_init(ctx); ioapic_init(ctx); rtc_init(ctx, rtc_localtime); sci_init(ctx); /* * Exit if a device emulation finds an error in it's initilization */ if (init_pci(ctx) != 0) exit(1); if (gdb_port != 0) init_dbgport(gdb_port); if (bvmcons) init_bvmcons(); if (lpc_bootrom()) { if (vm_set_capability(ctx, BSP, VM_CAP_UNRESTRICTED_GUEST, 1)) { fprintf(stderr, "ROM boot failed: unrestricted guest " "capability not available\n"); exit(1); } error = vcpu_reset(ctx, BSP); assert(error == 0); } error = vm_get_register(ctx, BSP, VM_REG_GUEST_RIP, &rip); assert(error == 0); /* * build the guest tables, MP etc. */ if (mptgen) { error = mptable_build(ctx, guest_ncpus); if (error) exit(1); } error = smbios_build(ctx); assert(error == 0); if (acpi) { error = acpi_build(ctx, guest_ncpus); assert(error == 0); } if (lpc_bootrom()) fwctl_init(); /* * Change the proc title to include the VM name. */ setproctitle("%s", vmname); /* * Add CPU 0 */ fbsdrun_addcpu(ctx, BSP, BSP, rip); /* * Head off to the main event dispatch loop */ mevent_dispatch(); exit(1); } Index: head/usr.sbin/bhyveload/bhyveload.8 =================================================================== --- head/usr.sbin/bhyveload/bhyveload.8 (revision 302210) +++ head/usr.sbin/bhyveload/bhyveload.8 (revision 302211) @@ -1,180 +1,175 @@ .\" .\" Copyright (c) 2012 NetApp Inc .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd February 26, 2016 +.Dd June 24, 2016 .Dt BHYVELOAD 8 .Os .Sh NAME .Nm bhyveload .Nd load a .Fx guest inside a bhyve virtual machine .Sh SYNOPSIS .Nm .Op Fl C .Op Fl S .Op Fl c Ar cons-dev .Op Fl d Ar disk-path .Op Fl e Ar name=value .Op Fl h Ar host-path .Op Fl l Ar os-loader -.Op Fl m Ar mem-size +.Op Fl m Ar memsize Ns Op Ar K|k|M|m|G|g|T|t .Ar vmname .Sh DESCRIPTION .Nm is used to load a .Fx guest inside a .Xr bhyve 4 virtual machine. .Pp .Nm is based on .Xr loader 8 and will present an interface identical to the .Fx loader on the user's terminal. This behavior can be changed by specifying a different OS loader. .Pp The virtual machine is identified as .Ar vmname and will be created if it does not already exist. .Sh OPTIONS The following options are available: .Bl -tag -width indent .It Fl c Ar cons-dev .Ar cons-dev is a .Xr tty 4 device to use for .Nm terminal I/O. .Pp The text string "stdio" is also accepted and selects the use of unbuffered standard I/O. This is the default value. .It Fl d Ar disk-path The .Ar disk-path is the pathname of the guest's boot disk image. .It Fl e Ar name=value Set the .Fx loader environment variable .Ar name to .Ar value . .Pp The option may be used more than once to set more than one environment variable. .It Fl h Ar host-path The .Ar host-path is the directory at the top of the guest's boot filesystem. .It Fl l Ar os-loader Specify a different OS loader. By default .Nm will use .Pa /boot/userboot.so , which presents a standard .Fx loader. -.It Fl m Ar mem-size Xo -.Sm off -.Op Cm K | k | M | m | G | g | T | t -.Xc -.Sm on -.Ar mem-size +.It Fl m Ar memsize Ns Op Ar K|k|M|m|G|g|T|t +.Ar memsize is the amount of memory allocated to the guest. .Pp The -.Ar mem-size +.Ar memsize argument may be suffixed with one of .Cm K , .Cm M , .Cm G or .Cm T (either upper or lower case) to indicate a multiple of Kilobytes, Megabytes, Gigabytes or Terabytes respectively. .Pp -The default value of -.Ar mem-size -is 256M. +.Ar memsize +defaults to 256M. .It Fl C Include guest memory in the core file when .Nm dumps core. This is intended for debugging an OS loader as it allows inspection of the guest memory. .It Fl S Wire guest memory. .El .Sh EXAMPLES To create a virtual machine named .Ar freebsd-vm that boots off the ISO image .Pa /freebsd/release.iso and has 1GB memory allocated to it: .Pp .Dl "bhyveload -m 1G -d /freebsd/release.iso freebsd-vm" .Pp To create a virtual machine named .Ar test-vm with 256MB of memory allocated, the guest root filesystem under the host directory .Pa /user/images/test and terminal I/O sent to the .Xr nmdm 4 device .Pa /dev/nmdm1B .Pp .Dl "bhyveload -m 256MB -h /usr/images/test -c /dev/nmdm1B test-vm" .Sh SEE ALSO .Xr bhyve 4 , .Xr nmdm 4 , .Xr vmm 4 , .Xr bhyve 8 , .Xr loader 8 .Sh HISTORY .Nm first appeared in .Fx 10.0 , and was developed at NetApp Inc. .Sh AUTHORS .Nm was developed by .An -nosplit .An Neel Natu Aq Mt neel@FreeBSD.org at NetApp Inc with a lot of help from .An Doug Rabson Aq Mt dfr@FreeBSD.org . .Sh BUGS .Nm can only load .Fx as a guest. Index: head/usr.sbin/bhyveload/bhyveload.c =================================================================== --- head/usr.sbin/bhyveload/bhyveload.c (revision 302210) +++ head/usr.sbin/bhyveload/bhyveload.c (revision 302211) @@ -1,793 +1,793 @@ /*- * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /*- * Copyright (c) 2011 Google, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "userboot.h" #define MB (1024 * 1024UL) #define GB (1024 * 1024 * 1024UL) #define BSP 0 #define NDISKS 32 static char *host_base; static struct termios term, oldterm; static int disk_fd[NDISKS]; static int ndisks; static int consin_fd, consout_fd; static char *vmname, *progname; static struct vmctx *ctx; static uint64_t gdtbase, cr3, rsp; static void cb_exit(void *arg, int v); /* * Console i/o callbacks */ static void cb_putc(void *arg, int ch) { char c = ch; (void) write(consout_fd, &c, 1); } static int cb_getc(void *arg) { char c; if (read(consin_fd, &c, 1) == 1) return (c); return (-1); } static int cb_poll(void *arg) { int n; if (ioctl(consin_fd, FIONREAD, &n) >= 0) return (n > 0); return (0); } /* * Host filesystem i/o callbacks */ struct cb_file { int cf_isdir; size_t cf_size; struct stat cf_stat; union { int fd; DIR *dir; } cf_u; }; static int cb_open(void *arg, const char *filename, void **hp) { struct cb_file *cf; char path[PATH_MAX]; if (!host_base) return (ENOENT); strlcpy(path, host_base, PATH_MAX); if (path[strlen(path) - 1] == '/') path[strlen(path) - 1] = 0; strlcat(path, filename, PATH_MAX); cf = malloc(sizeof(struct cb_file)); if (stat(path, &cf->cf_stat) < 0) { free(cf); return (errno); } cf->cf_size = cf->cf_stat.st_size; if (S_ISDIR(cf->cf_stat.st_mode)) { cf->cf_isdir = 1; cf->cf_u.dir = opendir(path); if (!cf->cf_u.dir) goto out; *hp = cf; return (0); } if (S_ISREG(cf->cf_stat.st_mode)) { cf->cf_isdir = 0; cf->cf_u.fd = open(path, O_RDONLY); if (cf->cf_u.fd < 0) goto out; *hp = cf; return (0); } out: free(cf); return (EINVAL); } static int cb_close(void *arg, void *h) { struct cb_file *cf = h; if (cf->cf_isdir) closedir(cf->cf_u.dir); else close(cf->cf_u.fd); free(cf); return (0); } static int cb_isdir(void *arg, void *h) { struct cb_file *cf = h; return (cf->cf_isdir); } static int cb_read(void *arg, void *h, void *buf, size_t size, size_t *resid) { struct cb_file *cf = h; ssize_t sz; if (cf->cf_isdir) return (EINVAL); sz = read(cf->cf_u.fd, buf, size); if (sz < 0) return (EINVAL); *resid = size - sz; return (0); } static int cb_readdir(void *arg, void *h, uint32_t *fileno_return, uint8_t *type_return, size_t *namelen_return, char *name) { struct cb_file *cf = h; struct dirent *dp; if (!cf->cf_isdir) return (EINVAL); dp = readdir(cf->cf_u.dir); if (!dp) return (ENOENT); /* * Note: d_namlen is in the range 0..255 and therefore less * than PATH_MAX so we don't need to test before copying. */ *fileno_return = dp->d_fileno; *type_return = dp->d_type; *namelen_return = dp->d_namlen; memcpy(name, dp->d_name, dp->d_namlen); name[dp->d_namlen] = 0; return (0); } static int cb_seek(void *arg, void *h, uint64_t offset, int whence) { struct cb_file *cf = h; if (cf->cf_isdir) return (EINVAL); if (lseek(cf->cf_u.fd, offset, whence) < 0) return (errno); return (0); } static int cb_stat(void *arg, void *h, int *mode, int *uid, int *gid, uint64_t *size) { struct cb_file *cf = h; *mode = cf->cf_stat.st_mode; *uid = cf->cf_stat.st_uid; *gid = cf->cf_stat.st_gid; *size = cf->cf_stat.st_size; return (0); } /* * Disk image i/o callbacks */ static int cb_diskread(void *arg, int unit, uint64_t from, void *to, size_t size, size_t *resid) { ssize_t n; if (unit < 0 || unit >= ndisks ) return (EIO); n = pread(disk_fd[unit], to, size, from); if (n < 0) return (errno); *resid = size - n; return (0); } static int cb_diskioctl(void *arg, int unit, u_long cmd, void *data) { struct stat sb; if (unit < 0 || unit >= ndisks) return (EBADF); switch (cmd) { case DIOCGSECTORSIZE: *(u_int *)data = 512; break; case DIOCGMEDIASIZE: if (fstat(disk_fd[unit], &sb) == 0) *(off_t *)data = sb.st_size; else return (ENOTTY); break; default: return (ENOTTY); } return (0); } /* * Guest virtual machine i/o callbacks */ static int cb_copyin(void *arg, const void *from, uint64_t to, size_t size) { char *ptr; to &= 0x7fffffff; ptr = vm_map_gpa(ctx, to, size); if (ptr == NULL) return (EFAULT); memcpy(ptr, from, size); return (0); } static int cb_copyout(void *arg, uint64_t from, void *to, size_t size) { char *ptr; from &= 0x7fffffff; ptr = vm_map_gpa(ctx, from, size); if (ptr == NULL) return (EFAULT); memcpy(to, ptr, size); return (0); } static void cb_setreg(void *arg, int r, uint64_t v) { int error; enum vm_reg_name vmreg; vmreg = VM_REG_LAST; switch (r) { case 4: vmreg = VM_REG_GUEST_RSP; rsp = v; break; default: break; } if (vmreg == VM_REG_LAST) { printf("test_setreg(%d): not implemented\n", r); cb_exit(NULL, USERBOOT_EXIT_QUIT); } error = vm_set_register(ctx, BSP, vmreg, v); if (error) { perror("vm_set_register"); cb_exit(NULL, USERBOOT_EXIT_QUIT); } } static void cb_setmsr(void *arg, int r, uint64_t v) { int error; enum vm_reg_name vmreg; vmreg = VM_REG_LAST; switch (r) { case MSR_EFER: vmreg = VM_REG_GUEST_EFER; break; default: break; } if (vmreg == VM_REG_LAST) { printf("test_setmsr(%d): not implemented\n", r); cb_exit(NULL, USERBOOT_EXIT_QUIT); } error = vm_set_register(ctx, BSP, vmreg, v); if (error) { perror("vm_set_msr"); cb_exit(NULL, USERBOOT_EXIT_QUIT); } } static void cb_setcr(void *arg, int r, uint64_t v) { int error; enum vm_reg_name vmreg; vmreg = VM_REG_LAST; switch (r) { case 0: vmreg = VM_REG_GUEST_CR0; break; case 3: vmreg = VM_REG_GUEST_CR3; cr3 = v; break; case 4: vmreg = VM_REG_GUEST_CR4; break; default: break; } if (vmreg == VM_REG_LAST) { printf("test_setcr(%d): not implemented\n", r); cb_exit(NULL, USERBOOT_EXIT_QUIT); } error = vm_set_register(ctx, BSP, vmreg, v); if (error) { perror("vm_set_cr"); cb_exit(NULL, USERBOOT_EXIT_QUIT); } } static void cb_setgdt(void *arg, uint64_t base, size_t size) { int error; error = vm_set_desc(ctx, BSP, VM_REG_GUEST_GDTR, base, size - 1, 0); if (error != 0) { perror("vm_set_desc(gdt)"); cb_exit(NULL, USERBOOT_EXIT_QUIT); } gdtbase = base; } static void cb_exec(void *arg, uint64_t rip) { int error; if (cr3 == 0) error = vm_setup_freebsd_registers_i386(ctx, BSP, rip, gdtbase, rsp); else error = vm_setup_freebsd_registers(ctx, BSP, rip, cr3, gdtbase, rsp); if (error) { perror("vm_setup_freebsd_registers"); cb_exit(NULL, USERBOOT_EXIT_QUIT); } cb_exit(NULL, 0); } /* * Misc */ static void cb_delay(void *arg, int usec) { usleep(usec); } static void cb_exit(void *arg, int v) { tcsetattr(consout_fd, TCSAFLUSH, &oldterm); exit(v); } static void cb_getmem(void *arg, uint64_t *ret_lowmem, uint64_t *ret_highmem) { *ret_lowmem = vm_get_lowmem_size(ctx); *ret_highmem = vm_get_highmem_size(ctx); } struct env { const char *str; /* name=value */ SLIST_ENTRY(env) next; }; static SLIST_HEAD(envhead, env) envhead; static void addenv(const char *str) { struct env *env; env = malloc(sizeof(struct env)); env->str = str; SLIST_INSERT_HEAD(&envhead, env, next); } static const char * cb_getenv(void *arg, int num) { int i; struct env *env; i = 0; SLIST_FOREACH(env, &envhead, next) { if (i == num) return (env->str); i++; } return (NULL); } static int cb_vm_set_register(void *arg, int vcpu, int reg, uint64_t val) { return (vm_set_register(ctx, vcpu, reg, val)); } static int cb_vm_set_desc(void *arg, int vcpu, int reg, uint64_t base, u_int limit, u_int access) { return (vm_set_desc(ctx, vcpu, reg, base, limit, access)); } static struct loader_callbacks cb = { .getc = cb_getc, .putc = cb_putc, .poll = cb_poll, .open = cb_open, .close = cb_close, .isdir = cb_isdir, .read = cb_read, .readdir = cb_readdir, .seek = cb_seek, .stat = cb_stat, .diskread = cb_diskread, .diskioctl = cb_diskioctl, .copyin = cb_copyin, .copyout = cb_copyout, .setreg = cb_setreg, .setmsr = cb_setmsr, .setcr = cb_setcr, .setgdt = cb_setgdt, .exec = cb_exec, .delay = cb_delay, .exit = cb_exit, .getmem = cb_getmem, .getenv = cb_getenv, /* Version 4 additions */ .vm_set_register = cb_vm_set_register, .vm_set_desc = cb_vm_set_desc, }; static int altcons_open(char *path) { struct stat sb; int err; int fd; /* * Allow stdio to be passed in so that the same string * can be used for the bhyveload console and bhyve com-port * parameters */ if (!strcmp(path, "stdio")) return (0); err = stat(path, &sb); if (err == 0) { if (!S_ISCHR(sb.st_mode)) err = ENOTSUP; else { fd = open(path, O_RDWR | O_NONBLOCK); if (fd < 0) err = errno; else consin_fd = consout_fd = fd; } } return (err); } static int disk_open(char *path) { int err, fd; if (ndisks >= NDISKS) return (ERANGE); err = 0; fd = open(path, O_RDONLY); if (fd > 0) { disk_fd[ndisks] = fd; ndisks++; } else err = errno; return (err); } static void usage(void) { fprintf(stderr, "usage: %s [-S][-c ] [-d ] [-e ]\n" - " %*s [-h ] [-m mem-size] \n", + " %*s [-h ] [-m memsize[K|k|M|m|G|g|T|t]] \n", progname, (int)strlen(progname), ""); exit(1); } int main(int argc, char** argv) { char *loader; void *h; void (*func)(struct loader_callbacks *, void *, int, int); uint64_t mem_size; int opt, error, need_reinit, memflags; progname = basename(argv[0]); loader = NULL; memflags = 0; mem_size = 256 * MB; consin_fd = STDIN_FILENO; consout_fd = STDOUT_FILENO; while ((opt = getopt(argc, argv, "CSc:d:e:h:l:m:")) != -1) { switch (opt) { case 'c': error = altcons_open(optarg); if (error != 0) errx(EX_USAGE, "Could not open '%s'", optarg); break; case 'd': error = disk_open(optarg); if (error != 0) errx(EX_USAGE, "Could not open '%s'", optarg); break; case 'e': addenv(optarg); break; case 'h': host_base = optarg; break; case 'l': if (loader != NULL) errx(EX_USAGE, "-l can only be given once"); loader = strdup(optarg); if (loader == NULL) err(EX_OSERR, "malloc"); break; case 'm': error = vm_parse_memsize(optarg, &mem_size); if (error != 0) errx(EX_USAGE, "Invalid memsize '%s'", optarg); break; case 'C': memflags |= VM_MEM_F_INCORE; break; case 'S': memflags |= VM_MEM_F_WIRED; break; case '?': usage(); } } argc -= optind; argv += optind; if (argc != 1) usage(); vmname = argv[0]; need_reinit = 0; error = vm_create(vmname); if (error) { if (errno != EEXIST) { perror("vm_create"); exit(1); } need_reinit = 1; } ctx = vm_open(vmname); if (ctx == NULL) { perror("vm_open"); exit(1); } if (need_reinit) { error = vm_reinit(ctx); if (error) { perror("vm_reinit"); exit(1); } } vm_set_memflags(ctx, memflags); error = vm_setup_memory(ctx, mem_size, VM_MMAP_ALL); if (error) { perror("vm_setup_memory"); exit(1); } if (loader == NULL) { loader = strdup("/boot/userboot.so"); if (loader == NULL) err(EX_OSERR, "malloc"); } h = dlopen(loader, RTLD_LOCAL); if (!h) { printf("%s\n", dlerror()); free(loader); return (1); } func = dlsym(h, "loader_main"); if (!func) { printf("%s\n", dlerror()); free(loader); return (1); } tcgetattr(consout_fd, &term); oldterm = term; cfmakeraw(&term); term.c_cflag |= CLOCAL; tcsetattr(consout_fd, TCSAFLUSH, &term); addenv("smbios.bios.vendor=BHYVE"); addenv("boot_serial=1"); func(&cb, NULL, USERBOOT_VERSION_4, ndisks); free(loader); return (0); }