
bhyve vioapic writes can deadlock instance
Abandoned · Public

Authored by pmooney_pfmooney.com on May 23 2019, 8:55 PM.

Details

Reviewers
jhb
markj
rgrimes
Group Reviewers
bhyve
Summary

I found that writes to the vioapic can deadlock against certain other operations on the VM. Suppose a multi-vCPU instance is running and a vCPU other than 0 writes to the vioapic redirection table. If certain fields in the register change, vm_smp_rendezvous() is called to update the trigger mode registers (TMRs) on all the vCPUs. The entire vioapic write, including the call to vm_smp_rendezvous(), runs in the context of that vCPU.

Now suppose another thread attempts one of the ioctls that requires locking all the vCPUs. It does so in order, from vCPU 0 through VM_MAXCPU-1. If it manages to lock any vCPU before the rendezvous begins, a deadlock occurs: the ioctl cannot lock the vCPU handling the vioapic write, which is waiting for the rendezvous to finish, and the rendezvous cannot finish because the vCPUs already locked by the ioctl will not be released until all of them are locked and the ioctl completes its work.
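To make the interleaving concrete, here is a rough sketch of the ioctl side, modeled loosely on the vcpu_lock_all() logic in vmm_dev.c (the body is abbreviated and illustrative, not the actual source):

    static int
    lock_all_vcpus(struct vm *vm)
    {
        int error, vcpu;

        error = 0;
        for (vcpu = 0; vcpu < VM_MAXCPU; vcpu++) {
            /* Freeze vCPUs in order; blocks until each goes idle. */
            error = vcpu_set_state(vm, vcpu, VCPU_FROZEN, true);
            if (error != 0)
                break;
        }
        /*
         * If some later vCPU is parked in vm_smp_rendezvous(), waiting
         * for the vCPUs frozen above to check in, this loop can never
         * reach it and neither side makes progress.
         */
        return (error);
    }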

SmartOS ticket with more detail (including test results): OS-7622

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Rebased to reflect upstream changes

sys/amd64/vmm/vmm.c:2273

Won't we generally be holding the vioapic spin lock here, now that it is no longer dropped when triggering an update of the TMRs? If so, this sleep is not permitted.

Thanks for pointing that out @markj. This wasn't a problem in SmartOS bhyve, since all of the mutexes there are adaptive rather than spin locks. (Due to interface differences, we couldn't guarantee the absence of voluntary context switching inside the critical_enter/critical_exit section, and spin locks came with their own host of deadlock problems.) Given the constraints in play on the FreeBSD side, I'm not sure this is possible without serious restructuring: the vioapic lock would need to become a sleepable lock, so EOI processing would require a full exit from the VMRUN critical section. Other consumers (hpet, etc.) would probably require similar treatment.

This could be changed to drop the vioapic lock for the duration of the TMR updates, perhaps with an extra synchronization primitive to prevent simultaneous writes from colliding. It wouldn't have the same correctness guarantee in the face of racing vioapic operations, but it may be no worse than what exists now. Considering how infrequent level-triggered interrupts are, that might be an acceptable trade-off.

After discussion with @markj and @jhb, I drafted a change that uses an sxlock to guard against racing vioapic writes while allowing the main vioapic mutex to be dropped during the sensitive portion of the TMR update. I don't have test hardware I can dedicate to running FreeBSD bhyve, so its performance is speculative.
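A minimal sketch of that approach, assuming a hypothetical tmr_sx field and callback name (the real diff may order the unlock/relock points differently):

    static void
    vioapic_update_tmrs(struct vioapic *vioapic, int vcpuid)
    {
        /*
         * The sx lock serializes redirection-table writers.  It must be
         * taken before the vioapic spin mutex, since sx_xlock() may
         * sleep and sleeping under a spin lock is forbidden.
         */
        sx_xlock(&vioapic->tmr_sx);
        VIOAPIC_LOCK(vioapic);

        /* ... recompute the redirection-table entry ... */

        /*
         * Drop the spin mutex across the rendezvous so that vCPUs
         * blocked on it (e.g. by an ioctl freezing the VM) cannot
         * deadlock against us.
         */
        VIOAPIC_UNLOCK(vioapic);
        (void) vm_smp_rendezvous(vioapic->vm, vcpuid,
            vm_active_cpus(vioapic->vm), vioapic_tmr_cb, vioapic);
        VIOAPIC_LOCK(vioapic);

        VIOAPIC_UNLOCK(vioapic);
        sx_xunlock(&vioapic->tmr_sx);
    }

The key property is that no thread ever sleeps in the rendezvous while holding the spin mutex that the writers and the vCPU-freeze path contend on.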

Could you rebase this on FreeBSD HEAD? The current diff doesn't apply due to trivial conflicts with r353747 and r351609.

sys/amd64/vmm/vmm.c:2263

I think this can be a KASSERT() instead of an explicit check.

sys/amd64/vmm/vmm.c:2284

I think this can be a KASSERT() instead of an explicit check.
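For illustration, the suggested change looks something like this; the condition and message are placeholders for whatever the explicit check actually tests:

    /* Before: a runtime check that quietly bails out. */
    if (vcpu->state != VCPU_RUNNING)
        return (EINVAL);

    /* After: the invariant is asserted (compiled out without INVARIANTS). */
    KASSERT(vcpu->state == VCPU_RUNNING,
        ("%s: vcpu %d not running", __func__, vcpuid));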

sys/amd64/vmm/io/vioapic.c:43

This should be sorted after param.h. (param.h is a special header that always gets included first.)
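Per style(9), that means something like the following (the header set is illustrative):

    #include <sys/param.h>     /* special case: always first */
    #include <sys/kernel.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/sx.h>        /* new header lands here, in sorted order */
    #include <sys/systm.h>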

sys/amd64/vmm/io/vioapic.c:353

This assertion is redundant.

sys/amd64/vmm/vmm.c:1201

Why does this come before the error check? I think it shouldn't matter in practice since we will panic if a transition to RUNNING returns an error, but it looks a bit odd.

sys/amd64/vmm/vmm.c:1203

On at least FreeBSD this function may be called with preemption disabled, in which case sleeping is prohibited.

I suspect that the check+sleep should be lifted into vm_run().
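A hedged sketch of that lift; the tmr_update_pending flag and wait channel are hypothetical, though vm->rendezvous_mtx is the existing rendezvous mutex:

    int
    vm_run(struct vm *vm, struct vm_run *vmrun)
    {
        /* ... */

        /*
         * Wait out any in-flight TMR update here, where sleeping is
         * still legal, rather than in the state-transition path that
         * may run with preemption disabled.
         */
        mtx_lock(&vm->rendezvous_mtx);
        while (vm->tmr_update_pending)
            msleep(&vm->tmr_update_pending, &vm->rendezvous_mtx, 0,
                "tmrupd", 0);
        mtx_unlock(&vm->rendezvous_mtx);

        critical_enter();
        /* ... enter the guest via VMRUN ... */
        critical_exit();

        /* ... */
    }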

Abandoned, as it is to be replaced by the approach documented in illumos #13007.

sys/amd64/vmm/vmm.c:2263

I was matching the style of vcpu_[gs]et_state() above.

sys/amd64/vmm/vmm.c:2284

I was matching the style of vcpu_[gs]et_state() above.