Page MenuHomeFreeBSD

hyperv: Use mfence to make sure about the store-load order for msgs
ClosedPublic

Authored by sepherosa_gmail.com on Feb 25 2016, 8:07 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 6, 11:11 PM
Unknown Object (File)
Dec 7 2024, 9:42 AM
Unknown Object (File)
Dec 5 2024, 9:08 PM
Unknown Object (File)
Nov 4 2024, 7:40 PM
Unknown Object (File)
Oct 28 2024, 2:52 PM
Unknown Object (File)
Oct 6 2024, 10:55 AM
Unknown Object (File)
Oct 3 2024, 10:28 AM
Unknown Object (File)
Oct 3 2024, 9:22 AM
Subscribers
None

Details

Summary

sfence only makes sure about the store-store order, which is not sufficient here.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sepherosa_gmail.com retitled this revision from to hyperv: Use mfence to make sure about the store-load order for msgs.
sepherosa_gmail.com updated this object.
sepherosa_gmail.com edited the test plan for this revision. (Show Details)
sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
120 ↗(On Diff #13720)

FreeBSD's atomic(9) API recently gained more expressive fences and are preferred over mb/wmb. For here I believe what you want is 'atomic_fetch_seq_cst()'. It actually uses a locked addl to a thread-local address instead of 'mfence' (more on this in sys/amd64/include/atomic.h).

jhb edited edge metadata.

Adding kib@ since he added atomic_fence_*().

sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
120 ↗(On Diff #13720)

I think it is more a question of what is required by the specification. If it is stated that mfence should be issued, so be it.

Note that atomic_fetch_seq_cst() is an implementation of the C11 model operation, and indeed in the current set of the atomics on FreeBSD might be implemented as mfence, or more optimally, as locked op. But we might change the implementation. If the spec is formulated in C11 terms, we would in fact have the ABI compatibility requirement between our atomics and whatever the hypervisor uses.

Practically speaking, I suspect that locked op is enough for your purposes, and fetch_seq_cst() is faster than mfence.

sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
120 ↗(On Diff #13720)

Yeah, I remembered some Solaris folks said "locked" op is faster then mfence/lfence on x86. Ok, let me do some test and switch. Thanks for the information.

sepherosa_gmail.com edited edge metadata.

Use atomic_thread_fence_seq_cst as suggested by jhb and kib. It seems to work fine so far

there is more wmb and mb() in the tree. please cleanup them together to switch to atomic(9)

there is more wmb and mb() in the tree. please cleanup them together to switch to atomic(9)

Sure, will do, when my code reading goes across them.

jhb edited edge metadata.

there is more wmb and mb() in the tree. please cleanup them together to switch to atomic(9)

Given that this is a functional change (wmb -> mfence like op) I think this should be standalone. If you want to do a sweep to convert mb/wmb to corresponding atomic_fence_*(), I think that should be separate from this change.

This revision is now accepted and ready to land.Feb 26 2016, 7:47 PM
This revision was automatically updated to reflect the committed changes.

I'm afraid atomic_thread_fence_seq_cst() here has a potential issue in the case of UP kernel: it is downgraded to a compiler barrier only, which is incorrect here, because the shared memory here is between the host virtual CPU and the UP guest virtual CPU -- at least 2 CPUs are involved (we assume the guest/host virtual CPUs are running on different physical CPUs)

I'm afraid atomic_thread_fence_seq_cst() here has a potential issue in the case of UP kernel: it is downgraded to a compiler barrier only, which is incorrect here, because the shared memory here is between the host virtual CPU and the UP guest virtual CPU -- at least 2 CPUs are involved (we assume the guest/host virtual CPUs are running on different physical CPUs)

Good point. We probably have to add a wrapper for it (in case SMP is not defined). We could fix it later, since SMP is by default defined :)

I'm afraid atomic_thread_fence_seq_cst() here has a potential issue in the case of UP kernel: it is downgraded to a compiler barrier only, which is incorrect here, because the shared memory here is between the host virtual CPU and the UP guest virtual CPU -- at least 2 CPUs are involved (we assume the guest/host virtual CPUs are running on different physical CPUs)

Good point. We probably have to add a wrapper for it (in case SMP is not defined). We could fix it later, since SMP is by default defined :)

This is why I said that this is the question of what a specification requires. mb() with the proper comment about it being x86-specific and required even on UP config looks best for me. There is no reciprocal C11-stule sequential point in the hypervisor, from what I understand.