There's a race condition which, so far, is only triggered by Windows 11 when using PCI passthrough. It usually causes the Windows 11 guest to deadlock on boot.

When a vCPU triggers a rendezvous, it waits for all other vCPUs to complete the rendezvous. At that same moment, a different vCPU could try to lock all other vCPUs by calling vcpu_lock_all. In that case, both vCPUs are waiting for each other to complete their request: one vCPU hangs in vm_handle_rendezvous while the other hangs in vcpu_set_state_locked.

In order to solve this, the locking vCPU has to take part in the rendezvous. Unfortunately, we can't check in vcpu_set_state_locked which vCPU we're currently running on. For that reason, we iterate over all vCPUs and help them complete the rendezvous.
Diff Detail
- Repository
- rG FreeBSD src repository
Event Timeline
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1025 | You could do something like this: | |

```c
if (vcpu->vm->rendezvous_func != NULL) {
	vcpu_unlock(vcpu);
	if (vcpu->vm->rendezvous_func != NULL) {
		...
	}
}
```

In other words, check the condition before doing the unlock, not just after it. I am not even sure that the re-check after the unlock is needed, since your patch checks only after the unlock.
I have tested this patch on stable/14 (specifically, d0e8c94631267f07c90465faf5b47dcce0e232be) and I was finally able to pass through a PCIe device to my Windows 11 guest without it locking up during boot.
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1025 | Yeah, it's better to lock and unlock in the if condition. Rechecking after the unlock shouldn't be required because vm_handle_rendezvous will recheck it anyway. | |
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1026 | In other places we only load the vm->rendezvous_func variable under the mtx_lock(&vm->rendezvous_mtx). Are we ok to look at it here without holding that lock? Is the vcpu lock sufficient? | |
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1026 | We're just checking it here; we don't make use of it. If it's a false positive, vm_handle_rendezvous will recheck it with rendezvous_mtx locked and exit early if it's NULL. So it wouldn't lead to wrong behavior. | |
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1025 | Btw, I've noticed that it doesn't seem necessary to call vcpu_unlock at all. However, I'm not sure whether it's a good idea to keep the vcpu locked while calling vm_handle_rendezvous. | |
The issues I've been experiencing on Windows 10 Pro (22H2) may be related to this as well, although my VM has been running fine for the past 5 days (no reboots). I'm going to apply this patch on my server (running FreeBSD 14.3-RELEASE) and test its stability on Windows 10 Pro (my main gaming VM), which has been experiencing similar issues. I'll also test it on Windows 11 Pro (25H2) running in bhyve (all requirements bypassed).
I applied the patch on top of FreeBSD 14.3-RELEASE and tested the Windows 11 25H2 VM. It ran for about 1h34m before I got my first "WHEA_UNCORRECTABLE_ERROR" crash. On the next boot it lasted about 15m before crashing with the same error. It may have improved things, given that it previously crashed more frequently, but there might either be a different issue or another edge case. I'm going to test my Windows 11 24H2 VM, since I already know how it normally behaves (or how often it "should" be crashing). I'll report back when I have more data.
I also forgot to mention that both my Windows 10 and 11 VMs use the bhyve nvme emulation, not virtio. I also notice that the NVMe is running at 100% in the Windows task manager (since I'm installing games on it, which takes a while). I remember reading elsewhere that some people were experiencing crashes with the nvme driver, but that was many years ago and it has since been fixed.
I'm not a vmm dev but that sounds like an unrelated/different issue to me.
I have been running both Windows 10 and Windows 11 guests for years and never encountered an issue other than the PCIe passthru boot deadlock issue that this patch fixes.
You might be best off raising a separate PR.
I think there may still be a relationship, jbo, since I've been having a different experience than you. However for now I'll try out some other bisecting before coming back to this. I'm going to test out switching out the "nvme" emulation for my drive to "ahci-hd" and see if that has any stability effect. I know people have also experienced issues with the nvme emulation before.
At that same moment, a different vCPU could try to lock all other vCPUs.
Is this referring to vcpu_lock_all()? Does the deadlock only happen early during guest boot?
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 997 | Please keep function declarations grouped together at the beginning of the file, e.g., together with vcpu_notify_event_locked(). | |
| 1025 | We cannot keep the vcpu locked while calling vm_handle_rendezvous(): that function acquires a MTX_DEF mutex, and the vcpu mutex is a spin mutex (which holds interrupts disabled). I agree that we should avoid unlocking the vcpu unnecessarily here. vcpu_set_state_locked() is called many thousands of times per second and we should avoid unnecessary work. So, something like this: | |

```c
if (__predict_false(atomic_load_ptr(&vcpu->vm->rendezvous_func) != NULL)) {
	vcpu_unlock(vcpu);
	...
	vcpu_lock(vcpu);
}
```

More generally though, is it really safe to drop the vcpu lock here? I guess so: vcpu_set_state_locked() is currently always called from vcpu_set_state(), which locks the vcpu directly, or from vcpu_require_state_locked(), which is always called immediately before or after msleep_spin(&vcpu->mtx). This deserves a comment though.

| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1035 | Shouldn't we recheck vcpu->state after reacquiring the lock? It might have transitioned to VCPU_IDLE, in which case we should not sleep. | |
I don't think that it's related. This patch fixes a deadlock on guest boot. It's not related to BSODs.
So far, I've only seen it deadlocking on boot.
Thanks for that context Corvin. I was interpreting this code as a general vcpu synchronization fix that happens to target the deadlock at boot time, since I didn't see any conditional logic of it specifically written for boot and/or Windows 11. Sorry for the noise!
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1034 | We should assert that vm_handle_rendezvous() returns 0. | |
| 1035 | It would be better to introduce a new helper, vm_handle_rendezvous_all(), which effectively implements this loop, but without acquiring and dropping the rendezvous mutex for each vcpu. That is, | |

This way, we avoid a potential race. Suppose there are three vCPUs, 0, 1 and 2, and vCPU thread 2 requests a rendezvous. vCPU thread 1 completes it and blocks. vCPU thread 0 tries to lock all vCPUs, then executes the callback for vCPU 0. Then, before it continues to vCPU 1, vCPU 2 issues another rendezvous. vCPU 0 will execute the callback for vCPU 1, then skip 2, and continue. But vCPU 2 will still be sleeping, waiting for vCPU 0 to rendezvous. I think this race is avoided if vCPU 0 does not drop the rendezvous lock while executing the loop.

Another problem, though: are we sure it's safe to "steal" the rendezvous callback this way? Look at the rendezvous with vlapic_handle_init(), for example. vlapic_handle_init() does not acquire any locks, so there is nothing synchronizing this thread with the vCPU thread for that vCPU. What if that thread is also accessing the vlapic? The vlapic code assumes that all accesses are vCPU-local, I think.
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1035 | We only have to call the rendezvous callback for the vCPU thread we're currently running on. E.g., if vCPU 0 requests a rendezvous and vCPU 1 calls vcpu_lock_all (and then vcpu_set_state_locked), we only have to call vm_handle_rendezvous for vCPU 1 to resolve the deadlock. This would avoid "stealing" the rendezvous callback. So, ideally, we could somehow get the vcpu struct of the current thread. Unfortunately, I don't know if that's possible. | |
@corvink Let me know once you'd like me to test the patch again on my machine where I can reliably reproduce the issue.
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1026 | Do we need to worry about a false negative — that is, rendezvous_func is null at the time it is checked, and a rendezvous begins instantly after we are satisfied it is null? | |
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1026 | It will cause quite a big delay, but msleep_spin has a timeout. So, after the timeout expires, it should run the loop again and then see the rendezvous. | |
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1035 | I don't think it is. vcpu_lock_all() is called from the ioctl path, and some ioctls that use it are not invoked from a vCPU thread in general. So let's take a step back and make some observations: | |
So I propose the following to avoid the deadlock:
I believe this will fix the deadlock as well. Moreover, it is safe to invoke the rendezvous callback on an IDLE vCPU. I implemented my idea in https://reviews.freebsd.org/D52968 but I'm not set up to reproduce the deadlock. Could you test it? | |
| sys/amd64/vmm/vmm.c | ||
|---|---|---|
| 1035 | Will test it. Unfortunately, my test system is unavailable at the moment. It will be back in a few days. | |