There is an assert in fbsdrun_addcpu() that can cause bhyve to dump core when bhyve(8) is invoked on a VM that is already running.
Details
To reproduce, start a VM and, while it is still running, invoke bhyve with any option against the same VM name.
(gdb) run -Y fbsd1
Starting program: /usr/sbin/bhyve -Y fbsd1
[New LWP 100688]
Assertion failed: (error == 0), function fbsdrun_addcpu, file /usr/src/usr.sbin/bhyve/bhyverun.c, line 261.
[New Thread 801c06400 (LWP 100688/bhyve)]
Program received signal SIGABRT, Aborted.
[Switching to Thread 801c06400 (LWP 100688/bhyve)]
0x0000000800fd267a in thr_kill () from /lib/libc.so.7
(gdb) bt
#0 0x0000000800fd267a in thr_kill () from /lib/libc.so.7
#1 0x0000000800fd2628 in raise () from /lib/libc.so.7
#2 0x0000000800fd0e49 in abort () from /lib/libc.so.7
#3 0x0000000800fb1291 in __assert () from /lib/libc.so.7
#4 0x0000000000408559 in fbsdrun_addcpu (ctx=0x801c16080, fromcpu=0,
newcpu=0, rip=18446744071576357445) at /usr/src/usr.sbin/bhyve/bhyverun.c:261
#5 0x000000000040903f in main (argc=1, argv=0x7fffffffeb38)
at /usr/src/usr.sbin/bhyve/bhyverun.c:884
(gdb) up 4
#4 0x0000000000408559 in fbsdrun_addcpu (ctx=0x801c16080, fromcpu=0,
newcpu=0, rip=18446744071576357445) at /usr/src/usr.sbin/bhyve/bhyverun.c:261
261 assert(error == 0);
Current language: auto; currently minimal
(gdb) up 5
#5 0x000000000040903f in main (argc=1, argv=0x7fffffffeb38)
at /usr/src/usr.sbin/bhyve/bhyverun.c:884
884 fbsdrun_addcpu(ctx, BSP, BSP, rip);
(gdb) quit
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed
- Unit
No Test Coverage
Event Timeline
There are many reasons why 'vm_activate_cpu()' might fail.
As you point out, one reason is that more than one bhyve(8) process tries to run the same virtual machine. I would classify this as operator error, along the same lines as pointing multiple virtual machines at a single disk image.
There isn't much an isolated bhyve process can do to prevent this.
However, there are other errors that might cause vm_activate_cpu() to fail:
- ioctl mismatch between userspace and kernel.
- malicious or buggy guest behavior where the same AP is brought up more than once.
- bugs in vlapic emulation where 'newcpu' is computed incorrectly.
- race between the bhyve process and vmm.ko on 'active_cpus'.
In all these cases a core file would be invaluable to determine the source of the problem. I am not convinced that the benefit of fixing one specific issue outweighs the ability to debug hard-to-reproduce error conditions.
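The trade-off discussed above can be illustrated with a small, self-contained sketch. It does not use the real libvmmapi API: mock_activate_cpu(), addcpu_asserting(), addcpu_checked(), MAX_VCPUS and the specific errno values are all hypothetical stand-ins chosen for illustration. The asserting variant mirrors the pattern at bhyverun.c:261 and aborts on any failure, which is what leaves a core file; the checked variant reports the error and returns it, at the cost of losing that core file.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_VCPUS 16	/* hypothetical limit, for this sketch only */

/* Stand-in for the per-VM set of active vCPUs kept by the kernel. */
static bool active_cpus[MAX_VCPUS];

/*
 * Hypothetical stand-in for vm_activate_cpu(). Returns 0 on success,
 * EBUSY if the vCPU is already active, EINVAL if the id is out of range.
 * The error values are assumptions of this sketch.
 */
static int
mock_activate_cpu(int vcpu)
{
	if (vcpu < 0 || vcpu >= MAX_VCPUS)
		return (EINVAL);
	if (active_cpus[vcpu])
		return (EBUSY);
	active_cpus[vcpu] = true;
	return (0);
}

/* Same pattern as the assert in the backtrace: any failure aborts. */
static void
addcpu_asserting(int vcpu)
{
	int error;

	error = mock_activate_cpu(vcpu);
	assert(error == 0);
	(void)error;	/* keeps the variable used when NDEBUG is defined */
}

/* Non-aborting alternative: report the failure and hand it back. */
static int
addcpu_checked(int vcpu)
{
	int error;

	error = mock_activate_cpu(vcpu);
	if (error != 0)
		fprintf(stderr, "cannot activate vCPU %d: error %d\n",
		    vcpu, error);
	return (error);
}
```

With this sketch, a second call to addcpu_checked() for the same vCPU returns EBUSY and prints a message, whereas a second call to addcpu_asserting() would abort the process. Every failure cause listed above would take the same abort path, which is why the assert is useful for debugging.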
@neel has explained why the core dump is kept, and it makes sense.
So I'm closing this review request.
Thanks,