If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of. This is more for
prototyping than anything else.
Details
- Reviewers: grehan, bcran
- Group Reviewers: bhyve
- Commits: rS362600: bhyve(8): For prototyping, reattempt decode in userspace
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Not Applicable
- Unit Tests: Not Applicable
Event Timeline
Nice change. GCE has moved instruction emulation out of kernel KVM, and it would be interesting for bhyve to also have that ability: this is a step towards that.
Well, it may be somewhat broken. I can’t tell if it’s just a vmexit on every APIC poll, which is bog-slow, or if the emulation behavior of bextr is incorrect, or what, but my VM with the VEX instruction mostly hangs when it gets here. How do you usually debug this kind of code?
(I have logic not shown here to print out userspace-decoded vies, but after the initial implementation that print became way too slow to run the VM.)
One rough edge is that the !decoded vie from the kernel vmexit may contain partial decode state from the kernel. It isn’t a problem for (valid) VEX instructions, as those have a unique prefix byte we never handled before. But we may need something like vie_init() that doesn’t trash the thread-specific register state.
Debugging the APIC is a slog. Start with KTR compiled in and single-vCPU guests. See what accesses are being done with BEXTR and what the device emulation is returning. Use FreeBSD as a guest and ddb to put in guest breakpoints to allow the ktr log to be examined. You may have to spend a lot more time with SDM Vol3 Ch10 than you would care for.
Turns out it was just missing these kernel-emulated devices in our userspace memory map. I haven't tested D24525 yet, but barring bugs, I think that should resolve this problem.
Actually I guess there is no dependency relationship here. We can fall back to userspace decode just fine for memory regions userspace has knowledge of. Just my bad luck that the first missing instruction happened to be in a PIC access rather than PCI MMIO.
Reset decoding state before attempting decode in userspace.
This is significant because a partial decode could advance at least the vie
cursor (num_processed). (Maybe other state relies on the initial zero values,
but this one definitely does.) Be careful not to discard the instruction
contents during restart.
Minor update: In the seemingly impossible case where inst_length is zero, don't
leave trash in the vie->num_valid field.
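For readers following the two updates above, here is a rough sketch of the idea, not the code in the diff. It assumes a simplified, hypothetical struct (demo_vie) whose fetched bytes and length sit ahead of the decode state, so a restart can zero everything after them. The names demo_vie, demo_vie_init and demo_vie_restart are invented for illustration, and the real struct vie has many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_INST_SIZE 15	/* architectural maximum x86 instruction length */

/* Hypothetical, simplified stand-in for struct vie. */
struct demo_vie {
	uint8_t inst[DEMO_INST_SIZE];	/* fetched instruction bytes */
	uint8_t num_valid;		/* how many bytes of inst[] are valid */
	/* Everything from here down is decode state, cleared on restart. */
	uint8_t num_processed;		/* decode cursor */
	uint8_t opcode;
	uint8_t decoded;
};

/* Restart must never erase the instruction bytes or their length. */
_Static_assert(offsetof(struct demo_vie, inst) <
    offsetof(struct demo_vie, num_processed) &&
    offsetof(struct demo_vie, num_valid) <
    offsetof(struct demo_vie, num_processed),
    "restart should not erase instruction length or contents");

/* Reset decode state (including the cursor) but keep inst[] and num_valid. */
static void
demo_vie_restart(struct demo_vie *vie)
{
	size_t start = offsetof(struct demo_vie, num_processed);

	memset((char *)vie + start, 0, sizeof(*vie) - start);
}

/*
 * Initialize from fetched bytes. num_valid is assigned unconditionally,
 * so a zero inst_length leaves 0 there rather than stale contents.
 */
static void
demo_vie_init(struct demo_vie *vie, const uint8_t *bytes, uint8_t inst_length)
{
	assert(inst_length <= DEMO_INST_SIZE);

	demo_vie_restart(vie);
	if (inst_length != 0)
		memcpy(vie->inst, bytes, inst_length);
	vie->num_valid = inst_length;
}
```

With this layout, a userspace fallback would call demo_vie_restart() on the partially decoded state from the kernel and then run its own decoder over the same bytes from offset zero.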
Need to figure out a way to get the userspace build to observe the new kernel headers. The <machine/foo> symlink headers aren't pointing at the ones in SRCTOP. I guess I could put __FreeBSD_version conditional dummy prototypes in a userspace header and bump that?
Or change the build to point at kernel source headers directly, something like this: https://reviews.freebsd.org/P398 Thoughts?