Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
The commit log here really needs some clarity. I have an attempt below that I think describes what you are trying to do:
vmm: Permit some IPIs to be handled by userspace. Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland. INIT and Startup IPIs are now returned to userland.
One question is does this mean that an old bhyve will now break? If so, perhaps we should add a new
capability for userland to request userland exits vs ignoring the IPIs?
sys/amd64/vmm/io/vlapic.c | ||
---|---|---|
1002 | Probably this should use __assert_unreachable instead. | |
1005 | I think this warrants a comment perhaps, something like: /* ipimask is a set of vCPUs needing userland handling of the current IPI. */ |
Probably want to adjust the commit log a bit more now for the addition of the new capability. But in general I think this is fine.
sys/amd64/vmm/intel/vmx.c | ||
---|---|---|
3614 | Do we also want to support this new capability for vm_get_capability for completeness? |
@jhb seems all comments have been resolved, will commit later today unless you have objections.
So what was the race you had to fix from the previous version? (I have a huge branch I had already rebased on top of this and was hoping this would go back in without having to rebase on the reverted version, so would love to see this back in if the reported issue is fixed? On my branch on top of the original commit I was able to boot via bhyveload fine FWIW)
It was only happening with Xeon CPU iirc, Corvin is on holiday ATM, not sure when he's back.
On Xeon CPUs and when using bhyveload, vlapic_reset(vlapic2); crashes because the vcpu belonging to vlapic2 is running and you can't set (apic-)register values of running vcpus.
When a vcpu receives an INIT IPI, this patch suspends the vcpu by calling vm_suspend. vm_suspend is asynchronous. So, the vcpu can be still running when the SIPI is received. Additionally, a suspended vcpu still calls vm_run in a loop as long as it's not debugged. So, it enters the run state for a while and exits immediately. My updated patch ensures that the vcpu is in run state when it receives a SIPI. Nevertheless, it seems to work for Xeon CPUs and 14.0-CURRENT but it deadlocks on a Core CPU and 13.1. I'm still searching for a solution.
My patch was reverted upstream. So, it contains all changes. @jhb if you're only interested in the fix, take a look at
https://github.com/Beckhoff/freebsd-src/commit/17a9732e80cf191398e0f0d10d6249abff3974d8
https://github.com/Beckhoff/freebsd-src/commit/f49380ca50b9b4599bd9ded159463844ceeba109
sys/amd64/vmm/io/vlapic.c | ||
---|---|---|
64 | I would maybe do a standalone commit for this change first before the main commit to say this is to prevent Linux from using broadcast INIT IPIs. Probably the other way to fix the Linux problem btw would be to make the broadcast INIT actually be an "all but self" INIT, but avoiding broadcast IPIs is probably better anyway. | |
1034–1035 | The gross hack for Linux here perhaps would be to always remove the sending vCPU from dmask. It never makes sense for a CPU to send an INIT IPI to itself. Something like: CPU_CLR(vlapic->vcpuid, &dmask); If that fixes Linux without requiring the LAPIC version bump, I think I'd like to include the fix as it is more future proof (and probably more accurate to how INIT IPIs are treated). I'd probably still want to bump the APIC version as a separate commit just because we don't need to try to emulate a real P5 or PPro, but this fix above might also prevent malicious guests from causing weird behavior by sending an INIT IPI to the executing vCPU. |
sys/amd64/vmm/io/vlapic.c | ||
---|---|---|
1034–1035 | When we (illumos) overhauled our LAPIC emulation to support OVMF (which was using more of INIT/SIPI than previously supported), we chose to ignore the problematic shorthand emitted by the Linux code in question. I know the overall approach here is different than what we did, but perhaps at least this tidbit is worth consideration. We've been running with it for a couple years. |
sys/amd64/vmm/io/vlapic.c | ||
---|---|---|
1034–1035 | I'd prefer to validate all IPIs. The Intel SDM contains a table of valid combinations for the local interrupt command register. So, we could check for valid combinations and ignore invalid ones (or throw an exception). If we like to fix this without a version bump, I'd prefer to emulate it correctly. The SDM states that the correct behaviour on reception of the INIT deassert would be to write the apic id into the arb id register. Haven't checked the AMD manuals yet. |
sys/amd64/vmm/io/vlapic.c | ||
---|---|---|
1034–1035 | I've noticed that vm_smp_rendezvous crashes on Intel Xeon CPUs. It looks like vm_smp_rendezvous calls mtx_sleep which clears the CR0_TS bit. This causes a panic in save_guest_fpustate. Can we avoid this? |
sys/amd64/vmm/io/vlapic.c | ||
---|---|---|
64 | ||
1034–1035 | Sadly, neither Intel nor AMD documents what happens when someone tries to send an invalid IPI (like INIT broadcast to all including self). However, they document which ICR combinations are invalid. I've created a patch for it. Please note, that it behaves like an APIC version 0x14 or higher. |
- set boot state in icrlo_write handler too. It's required for userland applications which do not support the ipi exit.