Page MenuHomeFreeBSD

bhyve virtio needs barriers
ClosedPublic

Authored by pmooney_pfmooney.com on Mar 7 2019, 8:48 PM.

Details

Summary

Under certain tight race conditions, we found that the lack of a memory barrier in bhyve's virtio handling causes it to miss a NO_NOTIFY state transition on block devices, resulting in guest stall. The investigation is recorded in OS-7613. As part of the examination into bhyve's use of barriers, one other section was found to be problematic, but only on non-x86 ISAs with less strict memory ordering. That was addressed in this patch as well, although it was not at all a problem on x86.

Test Plan

We have only observed this issue when guests ran a specific customer workload. Without the patch, they would encounter the race/stall within hours of running. With the patch in place, their workload was able to run for days without issue. PR https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231117 is going to test as well.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

rgrimes accepted this revision as: rgrimes.Mar 7 2019, 8:51 PM
This revision is now accepted and ready to land.Mar 7 2019, 8:51 PM
jhb added a reviewer: kib.May 7 2019, 4:03 PM
jhb added a comment.May 7 2019, 4:11 PM

@kib, I assume we probably want to use 'atomic_thread_fence_*' variants instead of mb and wmb? I think 'atomic_fence_rel()' is sufficient for the first, but does the second need 'seq_cst' instead of 'acq_rel'?

kib added inline comments.May 7 2019, 4:22 PM
usr.sbin/bhyve/virtio.c
430 ↗(On Diff #54815)

This should be substituted by atomic_thread_fence_rel(), indeed.

473 ↗(On Diff #54815)

This should be atomic_thread_fence_seq_cst(). Atomic_thread_fence_acq_rel() does not order reads after writes, it is a mechanical combination of acq and rel.

I'm fine with @kib's suggested changes, with the caveat that we (Joyent) only positively tested the mfence version.

Implemented @kib's suggestions.

This revision now requires review to proceed.May 7 2019, 9:11 PM
pmooney_pfmooney.com marked 2 inline comments as done.May 7 2019, 9:12 PM
rgrimes accepted this revision as: rgrimes.May 7 2019, 10:22 PM

Thanks for the quick spin on this Patrick

This revision is now accepted and ready to land.May 7 2019, 10:22 PM
kib accepted this revision.May 8 2019, 8:02 AM
rgrimes edited the test plan for this revision. (Show Details)May 9 2019, 4:51 PM

Isn't include of machine/atomic.h missing here?

Isn't include of machine/atomic.h missing here?

After adding #include <machine/atomic.h> it compiles fine. Virtual Machine that used to crash every few hours works stable for 3 days after switching to patched bhyve. Can we have this merged? Thanks for your work!

FWIW, it happened on AMD Epyc with 16 cores (32 threads).

I would like to see this in next weeks 11.3-PRERELEASE build, this looks to be a low risk barrier addition, @jhb does that seem to be a reasonable target, or would you like to see a longer ^head test cycle? Does anyone see any risk in this change?

Added missing atomics header

This revision now requires review to proceed.May 17 2019, 3:28 PM
kib added inline comments.May 17 2019, 4:13 PM
usr.sbin/bhyve/virtio.c
41 ↗(On Diff #57484)

Usually machine/ comes after sys/, before userspace headers.

usr.sbin/bhyve/virtio.c
41 ↗(On Diff #57484)

There was some inconsistency in the other bhyve examples for that which lead me astray. Will fix.

Fixed include ordering per @kib's comment.

pmooney_pfmooney.com marked an inline comment as done.May 17 2019, 4:19 PM
kib accepted this revision.May 17 2019, 4:57 PM
This revision is now accepted and ready to land.May 17 2019, 4:57 PM

Ok to commit once blank lines are added per jhb in person

usr.sbin/bhyve/virtio.c
428 ↗(On Diff #57487)

jhb: blank line before block comments.

470 ↗(On Diff #57487)

Blank line before block comment

rgrimes accepted this revision.May 18 2019, 5:10 AM
This revision was automatically updated to reflect the committed changes.