This patch allows the use of hypercalls in the guest OS in order to communicate to the hypervisor. It can be used as a basis for more complex operations through bhyve.
Recompile the kernel in order to load the extended vmm module. Use the command sysctl hw.vmm.hypercalls_enabled=1 in order to enable hypercalls. Create a release ISO from the source and use a hypercall using .byte 0x0f,0x01,0xc1 on Intel processors and .byte 0x0f,0x01,0xd9 on AMD processors. Do NOT use the bhyve_hypercall.h, as it is intended as a way to do hypercalls from the kernel. Userland applications are to implement their own way of doing hypercalls. In order to test the behaviour with disabled hypercalls, use the command sysctl hw.vmm.hypercalls_enabled=0 and run the beforementioned instructions in the guest OS.
Added some explanations.
|63 ↗||(On Diff #20886)|
These hypercall numbers are reserved for use with DTrace in the near future.
|75 ↗||(On Diff #20886)|
This would be good to be refactored as it causes code duplication. As of now, I couldn't find a way to do it. Extended ASM doesn't allow runtime operations inside it.
|98 ↗||(On Diff #20886)|
The stack usage here is based on the possibility of register names changing in the future. This could as well be implemented using registers r8-r15, but potentially might require change in the future. This can be reimplemented should it be preferred with registers.
Thanks for this work - it will be very useful.
Some general high-level comments:
- There are a number of guest users of hypercalls: KVM, Xen, Hyper-V. Since bhyve may emulate some/all of these hypercalls, it's probably worthwhile looking at the call sequences they use to try and co-exist. For example, it may even be useful at some future point to have DTrace support in those hypervisors.
- In a similar vein, 'hypercall.h' should probably be 'bhyve_hypercall.h'
- the hypercall exit should be disabled by default unless a tunable is set. It's an attack point that should only be enabled through hypervisor configuration. While there isn't currently a generic way to set per-guest configuration, there will be at some point, at which time this feature could be shifted over to that instead of having a global enable/disable.
- in addition, a future work item, and not really part of this review, is to have a CPUID space carved out for bhyve that will provide a way for the guest to determine if the hypercall facility is available. See the KVM docs for an example of how this is done.
I'm curious regarding a few details about these implementations, as there are multiple ways of doing it.
- Should the hypercall header be restructured with registers, as it can be done using registers RAX, RBX and R8-R15?
- Should I keep the CPUIDs used in other hypervisors available for compatibility or does it not matter?
- The hypercall availability being passed in by the CPUID can be done either on a per-hypercall basis, where one could then check if a certain hypercall is available or as a whole, providing the information only after the hypercall is called.
This raises the issue where we could either opt for enabling hypercalls globally and on a per-hypercall basis or keep the switch only globally, as both of these are simple changes to make.
I haven't been able to find a way to disable the actual VMExit on Intel, so as of now, it is a simple if. If anyone has any more insight, I will change this to actually disable the VMExit.
This patch addresses a couple of issues:
- Allow global enabling and disabling of hypercalls. The hypercall exists are by default disabled
- Add the necessary dispatchers and handlers for a hypervisor emulation layer
- Clean up the bhyve_hypercall.h using macros and revert to the calling convention that uses the stack
- Add a tunable which allows setting of the hypervisor mode
- Unimplemented hypercalls now cause an #UD fault
- CPUID_4000_0001 gives information in EAX whether hypercalls are enabled or not
Address the points discussed at MeetBSD.
- The hypercalls are now added in assembly using the SystemV ABI.
- The instructions are patched at boot time dependent on the cpu_vendor_id.
- The calling convention now uses registers instead of the stack.
Point out an issue regarding mi_startup.
In general, this could be machine independent, however as bhyve is currently limited to amd64, specifically VMCALL and VMMCALL opcodes, perhaps this would make sense to perform in an amd64-specific function as opposed to a machine independent one?
This diff changes the following:
- Remove DTrace stuff as much as possible, create a prototype instead
- Remove the unnecessary static NULL assignments in the hc_dispatcher
- Reiterate the instructions on some parts of the code
These are maybe better kept explicit in terms of the protection ring they can be called from, so there is no ambiguity, even though they are initialized to 0 by default. I can change this if needed, though.
Update the diff to reflect changes in HEAD with bhyve identification. This diff no longer requires it.
@grehan -- is there a chance this could be upstreamed soon? It's becoming quite a hard dependency for now a working prototype, and changes in HEAD relating to the same code would be very difficult to handle on my end (as it propagates throughout a lot of other code).
Do you reckon some things here still need to be changed, or is it satisfactory?