vmm/svm: use special state for LAPIC interrupt injected as virtual interrupt
Abandoned, Public

Authored by avg on Jan 10 2018, 6:08 PM.

Details

Summary

This is an alternative to D13780.

The virtual interrupt method uses V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR fields
of VMCB to inject a virtual interrupt into a guest VM. This method has many
advantages over direct event injection, as it offloads to the hardware all
decisions about whether and when the interrupt can be delivered to the guest. But with a purely
software emulated vAPIC the advantage is also a problem. The problem is that
the hypervisor does not have any precise control over when the interrupt is
actually delivered to the guest (or a notification about that). Because of
that the hypervisor cannot update the interrupt vector in IRR and ISR in the
same way as real hardware would. The hypervisor becomes aware that the
interrupt is being serviced only upon the first VMEXIT after the interrupt is
delivered. This creates a window between the actual interrupt delivery and the
update of IRR and ISR. During that window, which can last until the
end-of-interrupt signal, IRR and ISR might not reflect the real state.
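
For readers unfamiliar with the mechanism, requesting a virtual interrupt could be sketched roughly as below. The struct is a hypothetical stand-in for the relevant VMCB control fields, not the real layout, and the names vintr_fields, vintr_request and vintr_deliverable are made up for illustration. The deliverability check is only a simplified model of what the hardware decides on its own; it ignores interrupt shadows, GIF and V_IGN_TPR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the virtual interrupt fields of the VMCB
 * control area.  This is NOT the real VMCB layout.
 */
struct vintr_fields {
	uint8_t v_tpr;		/* guest task priority (4-bit class) */
	uint8_t v_irq;		/* 1 if a virtual interrupt is pending */
	uint8_t v_intr_prio;	/* priority class of the pending interrupt */
	uint8_t v_intr_vector;	/* vector of the pending interrupt */
};

/* Ask the hardware to deliver 'vector' whenever the guest can take it. */
static void
vintr_request(struct vintr_fields *f, uint8_t vector)
{
	assert(vector >= 16);	/* vectors 0-15 are not valid LAPIC vectors */
	f->v_irq = 1;
	f->v_intr_prio = vector >> 4;	/* priority class is the upper nibble */
	f->v_intr_vector = vector;
}

/*
 * Simplified model of the decision that the hardware makes by itself,
 * without any involvement of (or notification to) the hypervisor.
 */
static bool
vintr_deliverable(const struct vintr_fields *f, bool guest_if)
{
	return (f->v_irq != 0 && guest_if &&
	    f->v_intr_prio > (f->v_tpr & 0xf));
}
```

The point of the sketch is that nothing in it tells the hypervisor when vintr_deliverable() became true in the guest; that is the missing notification described above.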

The described deviation has been observed to cause an interrupt loss in the
following scenario. vCPU0 posts an inter-processor interrupt to vCPU1. The
interrupt is injected as a virtual interrupt by the hypervisor. The interrupt
is delivered to a guest and an interrupt handler is invoked. The handler
performs a requested action and acknowledges the request by modifying a global
variable. So far, there is no VMEXIT and the hypervisor is unaware of the
events. Then, vCPU0 notices the acknowledgment and sends another IPI with the
same vector. The IPI gets collapsed into the previous IPI in the IRR of vCPU1.
Only after that a VMEXIT of vCPU1 occurs. At that time the vector is cleared in
the IRR and is set in the ISR. vCPU1 has vAPIC state as if the second IPI had
never been sent. I believe that we see this scenario in bug 215972. The
scenario is impossible on real hardware because IRR and ISR are updated
just before the interrupt handler gets started.
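
The sequence can be replayed with a toy model. This is only a sketch: toy_lapic and all toy_* helpers are hypothetical names, the vector value is arbitrary, and the model keeps nothing but the IRR and ISR bitmaps. It mirrors the existing behaviour, where IRR and ISR are fixed up only at the first VMEXIT after delivery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified software LAPIC: one bit per vector. */
struct toy_lapic {
	uint32_t irr[8];	/* interrupt request register, 256 bits */
	uint32_t isr[8];	/* in-service register, 256 bits */
};

static bool
toy_test(const uint32_t *reg, uint8_t vec)
{
	return (((reg[vec / 32] >> (vec % 32)) & 1u) != 0);
}

static void
toy_set(uint32_t *reg, uint8_t vec)
{
	reg[vec / 32] |= 1u << (vec % 32);
}

static void
toy_clear(uint32_t *reg, uint8_t vec)
{
	reg[vec / 32] &= ~(1u << (vec % 32));
}

/* Post an interrupt; returns true if it collapsed into an already set bit. */
static bool
toy_post(struct toy_lapic *l, uint8_t vec)
{
	bool collapsed = toy_test(l->irr, vec);

	toy_set(l->irr, vec);
	return (collapsed);
}

/* Existing behaviour: the delivery is accounted for only at VMEXIT. */
static void
toy_vmexit_delivered(struct toy_lapic *l, uint8_t vec)
{
	toy_clear(l->irr, vec);
	toy_set(l->isr, vec);
}

static void
toy_eoi(struct toy_lapic *l, uint8_t vec)
{
	toy_clear(l->isr, vec);
}

/* Replays the scenario; returns true if the second IPI was lost. */
static bool
toy_replay_lost_ipi(void)
{
	struct toy_lapic l = { {0}, {0} };
	const uint8_t vec = 0xf3;	/* arbitrary vector for illustration */
	bool collapsed;

	collapsed = toy_post(&l, vec);	/* first IPI from vCPU0 */
	assert(!collapsed);
	/* Injected via V_IRQ; the guest handler runs without any VMEXIT. */
	collapsed = toy_post(&l, vec);	/* second IPI, same vector */
	toy_vmexit_delivered(&l, vec);	/* late bookkeeping for the first IPI */
	toy_eoi(&l, vec);
	/* Lost: the second IPI collapsed and nothing is pending any more. */
	return (collapsed && !toy_test(l.irr, vec));
}
```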

To solve this problem I propose to move the injected interrupt from IRR
into a special internal LAPIC state. The interrupt is held in that state until
a #VMEXIT at which time we know whether the interrupt was delivered.
Based on that information we either put the vector back into IRR or move it
forward to ISR. Since all accesses to LAPIC registers are intercepted, we are
guaranteed that a read will always see a correct state of the interrupt.
The new approach ensures that the new incoming interrupt will never be collapsed
in IRR with the interrupt that's already being serviced.
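
A minimal sketch of the idea, not the code in the diff: sk_lapic, its injected field and the sk_* helpers are hypothetical names, and the delivered flag passed at #VMEXIT is assumed to be derived from the state the hardware reports (for example, whether V_IRQ is still set). The asserts play the role of the KASSERTs mentioned below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_NO_VECTOR	(-1)

/* Hypothetical simplified vLAPIC with the extra "injected" state. */
struct sk_lapic {
	uint32_t irr[8];
	uint32_t isr[8];
	int injected;	/* vector handed to V_IRQ, or SK_NO_VECTOR */
};

static bool
sk_test(const uint32_t *reg, int vec)
{
	return (((reg[vec / 32] >> (vec % 32)) & 1u) != 0);
}

static void
sk_set(uint32_t *reg, int vec)
{
	reg[vec / 32] |= 1u << (vec % 32);
}

static void
sk_clear(uint32_t *reg, int vec)
{
	reg[vec / 32] &= ~(1u << (vec % 32));
}

/* Post an interrupt; returns true if it collapsed into an already set bit. */
static bool
sk_post(struct sk_lapic *l, int vec)
{
	bool collapsed = sk_test(l->irr, vec);

	sk_set(l->irr, vec);
	return (collapsed);
}

/* Move the vector out of IRR when it is programmed as a virtual interrupt. */
static void
sk_inject(struct sk_lapic *l, int vec)
{
	assert(l->injected == SK_NO_VECTOR);
	assert(sk_test(l->irr, vec));
	sk_clear(l->irr, vec);
	l->injected = vec;
}

/* At #VMEXIT: resolve the injected state one way or the other. */
static void
sk_vmexit(struct sk_lapic *l, bool delivered)
{
	if (l->injected == SK_NO_VECTOR)
		return;
	if (delivered)
		sk_set(l->isr, l->injected);	/* move forward to ISR */
	else
		sk_set(l->irr, l->injected);	/* put back into IRR */
	l->injected = SK_NO_VECTOR;
}

static void
sk_eoi(struct sk_lapic *l, int vec)
{
	sk_clear(l->isr, vec);
}

/* Same scenario as before; returns true if the second IPI is still pending. */
static bool
sk_replay_keeps_second_ipi(void)
{
	struct sk_lapic l = { {0}, {0}, SK_NO_VECTOR };
	const int vec = 0xf3;	/* arbitrary vector for illustration */
	bool collapsed;

	sk_post(&l, vec);		/* first IPI */
	sk_inject(&l, vec);		/* IRR bit is now clear */
	collapsed = sk_post(&l, vec);	/* second IPI gets its own IRR bit */
	sk_vmexit(&l, true);		/* first IPI was delivered */
	sk_eoi(&l, vec);
	return (!collapsed && sk_test(l.irr, vec));
}
```

In the not-delivered case the vector may legitimately merge with a newer request in IRR, since the guest has not yet started servicing it.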

I have added some KASSERTs to verify that the new injected (pending) state is
correctly handled. I have also added a new KTR trace to explicitly report
any collapsed interrupts.

Test Plan

Tested on Phenom II X4 955.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 14337
Build 14495: arc lint + arc unit

Event Timeline

avg created this revision. Jan 10 2018, 6:08 PM
avg updated this revision to Diff 37736. Jan 10 2018, 6:27 PM

remove a stray change that broke !KTR build

The VIRQ injection doesn't cover all cases - it misses modification of the TPR via CR8.

Also, there may be a long delay until a VMEXIT occurs to inject an interrupt. This may not show up under load, but when a system is lightly loaded (and the clock may be slowed down) latency may be impacted.

My suggestion is to keep the change for your prior review. AMD systems since Carrizo have supported APIC virtualization, and IMHO that is the way to go to get better performance.

avg added a comment. Jan 11 2018, 11:56 AM

> The VIRQ injection doesn't cover all cases - it misses modification of the TPR via CR8.

I agree that our code has a problem, but it should be easy to overcome, as that is what V_IRQ was designed to handle.
When checking for a pending interrupt we should ignore TPR and only honor ISR. That way we inject a pending interrupt and let the hardware decide when it can be delivered without VMM having to watch for TPR changes.

> Also, there may be a long delay until a VMEXIT occurs to inject an interrupt. This may not show up under load, but when a system is lightly loaded (and the clock may be slowed down) latency may be impacted.

Do you mean the same scenario as above? Or another scenario?

> My suggestion is to keep the change for your prior review.

I am not sure myself which approach is better.
I think that we have the CR8 problem in either case.
With the event injection we have to intercept CR8 writes.
With V_IRQ we can do either that or what I suggested above (let the hardware handle V_TPR changes).

> AMD systems since Carrizo have supported APIC virtualization, and IMHO that is the way to go to get better performance.

Yes, AVIC would be the best solution.
But we still need to support processors without it.
All of mine are such at the moment.

avg updated this revision to Diff 37875. Jan 12 2018, 5:25 PM

Ignore the software-emulated LAPIC TPR when checking for a pending vector (SVM only).

Guest TPR is virtualized by the hardware as V_TPR and it can be changed
by the guest without the host being aware. So, we simply inject the highest
pending vector as a virtual interrupt and leave it up to the hardware
to decide whether and when the interrupt can be delivered.
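
Roughly, the check could look like the sketch below. This is an illustration and not the code in the diff: highest_vector and pending_vector are hypothetical helpers, and the honor_tpr flag exists only to contrast the two behaviours. With honor_tpr false, only the highest in-service vector can block the highest requested vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest set bit in a 256-bit register, or -1 if the register is empty. */
static int
highest_vector(const uint32_t reg[8])
{
	for (int i = 7; i >= 0; i--) {
		if (reg[i] == 0)
			continue;
		for (int b = 31; b >= 0; b--) {
			if ((reg[i] & (1u << b)) != 0)
				return (i * 32 + b);
		}
	}
	return (-1);
}

/*
 * Returns the vector to inject, or -1 if nothing can be injected.
 * With honor_tpr == false the software TPR is ignored and the decision
 * about the guest's actual priority is left to the hardware (V_TPR).
 */
static int
pending_vector(const uint32_t irr[8], const uint32_t isr[8], uint8_t tpr,
    bool honor_tpr)
{
	int vec = highest_vector(irr);
	int isrvec = highest_vector(isr);
	int prio_class = isrvec < 0 ? 0 : isrvec >> 4;

	assert(vec < 0 || vec >= 16);	/* vectors 0-15 are never valid here */
	if (honor_tpr && (tpr >> 4) > prio_class)
		prio_class = tpr >> 4;
	if (vec < 0 || (vec >> 4) <= prio_class)
		return (-1);
	return (vec);
}
```

For example, with vector 0x51 requested and a software TPR of 0x60, the TPR-honoring variant reports nothing to inject, while the TPR-ignoring variant returns 0x51 and lets the hardware hold it until the guest lowers its priority.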

emaste added a subscriber: emaste. Jan 15 2018, 2:29 PM
avg abandoned this revision. Jan 31 2018, 11:38 AM

Abandoned in favor of D13780.