This is an alternative to D13780.
The virtual interrupt method uses V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR fields
of VMCB to inject a virtual interrupt into a guest VM. This method has many
advantages over direct event injection, as it offloads to the hardware all
decisions about whether and when the interrupt can be delivered to the guest.
But with a purely software-emulated vAPIC this advantage is also a problem. The
problem is that
the hypervisor has no precise control over when the interrupt is actually
delivered to the guest, and it gets no notification of the delivery. Because of
that, the hypervisor cannot update the interrupt vector in IRR and ISR in the
same way as real hardware would. The hypervisor becomes aware that the
interrupt is being serviced only upon the first VMEXIT after the interrupt is
delivered. This creates a window between the actual interrupt delivery and the
update of IRR and ISR. As a result, IRR and ISR might remain incorrect right
up to the point of the end-of-interrupt signal.
The described deviation has been observed to cause an interrupt loss in the
following scenario. vCPU0 posts an inter-processor interrupt to vCPU1. The
interrupt is injected as a virtual interrupt by the hypervisor. The interrupt
is delivered to the guest and its interrupt handler is invoked. The handler
performs a requested action and acknowledges the request by modifying a global
variable. So far, there is no VMEXIT and the hypervisor is unaware of the
events. Then, vCPU0 notices the acknowledgment and sends another IPI with the
same vector. The IPI gets collapsed into the previous IPI in the IRR of vCPU1.
Only after that does a VMEXIT of vCPU1 occur. At that time the vector is cleared
in the IRR and set in the ISR. vCPU1 has vAPIC state as if the second IPI had
never been sent. I believe that we see this scenario in bug 215972. The
scenario is impossible on real hardware because IRR and ISR are updated
just before the interrupt handler gets started.
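The sequence above can be replayed with a toy, single-vector model. This is only a sketch for illustration: the `toy_old_*` names are hypothetical and are not the bhyve code, and the model assumes that the hypervisor moves the vector from IRR to ISR only at the first VMEXIT after delivery, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy one-vector model of a software-emulated vAPIC (hypothetical). */
struct toy_old_vapic {
	bool irr;	/* vector is pending */
	bool isr;	/* vector is in service */
};

/* Post the vector; returns true if it collapsed into a set IRR bit. */
static bool
toy_old_post(struct toy_old_vapic *v)
{
	bool collapsed = v->irr;

	v->irr = true;
	return (collapsed);
}

/* First VMEXIT after delivery: only now is IRR moved to ISR. */
static void
toy_old_vmexit(struct toy_old_vapic *v)
{
	if (v->irr) {
		v->irr = false;
		v->isr = true;
	}
}

/* The guest signals end-of-interrupt. */
static void
toy_old_eoi(struct toy_old_vapic *v)
{
	assert(v->isr);
	v->isr = false;
}

/* Replays the scenario and returns the number of lost IPIs. */
static int
toy_old_lost_ipis(void)
{
	struct toy_old_vapic v = { false, false };
	int sent = 0, handled = 0;

	toy_old_post(&v);	/* vCPU0 sends the first IPI */
	sent++;
	handled++;		/* guest handler runs, no VMEXIT yet */
	toy_old_post(&v);	/* second IPI collapses in IRR */
	sent++;
	toy_old_vmexit(&v);	/* IRR cleared, ISR set */
	toy_old_eoi(&v);
	if (v.irr)
		handled++;	/* would still be delivered later */
	return (sent - handled);
}
```

In this model `toy_old_lost_ipis()` reports one lost IPI, because the second post finds the IRR bit still set and the later VMEXIT clears it.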
To solve this problem I propose to move the injected interrupt from IRR
into a special internal LAPIC state. The interrupt is held in that state until
a VMEXIT, at which time we know whether the interrupt was delivered.
Based on that information we either put the vector back into IRR or move it
forward to ISR. Since all accesses to LAPIC registers are intercepted, we are
guaranteed that a read will always see the correct state of the interrupt.
The new approach ensures that a new incoming interrupt will never be collapsed
in IRR with an interrupt that is already being serviced.
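The proposed state handling can be sketched with the same kind of toy, single-vector model. The `toy_new_*` names and the `delivered` argument are hypothetical; how the real code learns at VMEXIT whether the interrupt was delivered is not modeled here. The asserts only loosely mirror the idea of the added KASSERTs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy one-vector model with the extra "injected" state (hypothetical). */
struct toy_new_vapic {
	bool irr;	/* vector is pending */
	bool isr;	/* vector is in service */
	bool injected;	/* handed to V_IRQ, outcome not yet known */
};

/* Post the vector; returns true if it collapsed into a set IRR bit. */
static bool
toy_new_post(struct toy_new_vapic *v)
{
	bool collapsed = v->irr;

	v->irr = true;
	return (collapsed);
}

/* Injection: take the vector out of IRR and hold it as injected. */
static void
toy_new_inject(struct toy_new_vapic *v)
{
	assert(!v->injected);
	if (v->irr) {
		v->irr = false;
		v->injected = true;
	}
}

/* VMEXIT: resolve the injected vector to ISR or back to IRR. */
static void
toy_new_vmexit(struct toy_new_vapic *v, bool delivered)
{
	if (!v->injected)
		return;
	v->injected = false;
	if (delivered) {
		assert(!v->isr);
		v->isr = true;
	} else
		v->irr = true;
}

/* The guest signals end-of-interrupt. */
static void
toy_new_eoi(struct toy_new_vapic *v)
{
	assert(v->isr);
	v->isr = false;
}

/* Replays the two-IPI scenario; true if the second IPI survives. */
static bool
toy_new_second_ipi_pending(void)
{
	struct toy_new_vapic v = { false, false, false };
	bool collapsed;

	toy_new_post(&v);		/* first IPI */
	toy_new_inject(&v);		/* moved out of IRR */
	collapsed = toy_new_post(&v);	/* second IPI, IRR was empty */
	assert(!collapsed);
	toy_new_vmexit(&v, true);	/* first IPI moves to ISR */
	toy_new_eoi(&v);
	return (v.irr);
}
```

Because the first vector leaves IRR at injection time, the second post in this model sets a clear IRR bit and stays pending after the EOI. A vector that was injected but not delivered simply returns to IRR at VMEXIT.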
I have added some KASSERTs to verify that the new injected (pending) state is
correctly handled. I have also added a new KTR trace to explicitly report
any collapsed interrupts.