hyperv: Use mfence to make sure about the store-load order for msgs
Closed, Public

Authored by sepherosa_gmail.com on Feb 25 2016, 8:07 AM.

Details

Reviewers

kib
howard0su_gmail.com
honzhan_microsoft.com
decui_microsoft.com
jhb
delphij
royger
adrian

Commits

rS296180: hyperv: Use proper fence function to keep store-load order for msgs

Summary

sfence only guarantees store-store ordering, which is not sufficient here: the message handling needs store-load ordering.
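To make the ordering requirement concrete, here is a minimal userland sketch using C11 atomics. It assumes the pattern in question is "store that frees a message slot, then load of a pending flag"; the struct, its fields, and the function name are hypothetical and are not taken from the driver.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a message slot shared with the host. */
struct msg_slot {
	_Atomic uint32_t type;		/* 0 means "slot is free" */
	_Atomic uint32_t pending;	/* set by the host when more messages wait */
};

/*
 * Free the slot, then check whether the host has more messages queued.
 * The store must be globally visible before the load is performed, which
 * is store-load ordering. A store-store barrier (sfence) does not provide
 * that; a full fence does.
 */
static int
msg_slot_release(struct msg_slot *slot)
{
	atomic_store_explicit(&slot->type, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full fence */
	return (atomic_load_explicit(&slot->pending,
	    memory_order_relaxed) != 0);
}
```

Without the full fence, the CPU may satisfy the load of the pending flag before the store that frees the slot becomes visible, so both sides could miss each other's update.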

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

sepherosa_gmail.com updated this revision to Diff 13720. Feb 25 2016, 8:07 AM

sepherosa_gmail.com retitled this revision to hyperv: Use mfence to make sure about the store-load order for msgs.

sepherosa_gmail.com updated this object.

sepherosa_gmail.com edited the test plan for this revision.

sepherosa_gmail.com added reviewers: jhb, adrian, delphij, royger, decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com.

jhb added inline comments. Feb 25 2016, 5:26 PM

sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
120 ↗	(On Diff #13720)	FreeBSD's atomic(9) API recently gained more expressive fences, which are preferred over mb/wmb. Here I believe what you want is 'atomic_thread_fence_seq_cst()'. It actually uses a locked addl to a thread-local address instead of 'mfence' (more on this in sys/amd64/include/atomic.h).

Adding kib@ since he added atomic_thread_fence_*().

kib added inline comments. Feb 25 2016, 7:36 PM

sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
120 ↗	(On Diff #13720)	I think it is more a question of what is required by the specification. If it states that mfence should be issued, so be it. Note that atomic_thread_fence_seq_cst() is an implementation of the C11 model operation, and in the current set of atomics on FreeBSD it might be implemented as mfence or, more optimally, as a locked op. But we might change the implementation. If the spec is formulated in C11 terms, we would in fact have an ABI compatibility requirement between our atomics and whatever the hypervisor uses. Practically speaking, I suspect that a locked op is enough for your purposes, and atomic_thread_fence_seq_cst() is faster than mfence.
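For readers unfamiliar with the two instruction sequences being compared, the following userland sketch shows both. It assumes an x86-64 target and a compiler with GCC-style inline assembly; the function names are hypothetical, and the stack offset in the locked add is an illustrative choice, not a copy of what sys/amd64/include/atomic.h does.

```c
/* Full barrier via the dedicated instruction. */
static inline void
fence_mfence(void)
{
#if defined(__x86_64__)
	__asm__ __volatile__("mfence" : : : "memory");
#else
	/* Fallback so the sketch still builds on other architectures. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/*
 * Full barrier via a locked read-modify-write. Adding zero leaves the
 * memory unchanged; the lock prefix is what orders earlier stores
 * against later loads.
 */
static inline void
fence_locked_add(void)
{
#if defined(__x86_64__)
	__asm__ __volatile__("lock; addl $0,-8(%%rsp)" : : : "memory", "cc");
#else
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}
```

Both variants order a preceding store against a following load on ordinary write-back memory. Whether the locked form is sufficient for memory shared with the hypervisor is exactly the specification question raised in this comment.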

sepherosa_gmail.com added inline comments. Feb 26 2016, 1:37 AM

sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
120 ↗	(On Diff #13720)	Yeah, I remember some Solaris folks saying that a "locked" op is faster than mfence/lfence on x86. OK, let me run some tests and switch. Thanks for the information.

Use atomic_thread_fence_seq_cst() as suggested by jhb and kib. It seems to work fine so far.

There are more wmb() and mb() calls in the tree. Please clean them up together when switching to atomic(9).

In D5436#116132, @howard0su_gmail.com wrote:

There are more wmb() and mb() calls in the tree. Please clean them up together when switching to atomic(9).

Sure, will do when I come across them while reading the code.

In D5436#116132, @howard0su_gmail.com wrote:

There are more wmb() and mb() calls in the tree. Please clean them up together when switching to atomic(9).

Given that this is a functional change (wmb -> mfence-like op), I think this should be standalone. If you want to do a sweep to convert mb/wmb to the corresponding atomic_thread_fence_*(), I think that should be separate from this change.

This revision is now accepted and ready to land. Feb 26 2016, 7:47 PM

Closed by commit rS296180: hyperv: Use proper fence function to keep store-load order for msgs (authored by sephe). Feb 29 2016, 4:58 AM

This revision was automatically updated to reflect the committed changes.

I'm afraid atomic_thread_fence_seq_cst() has a potential issue here on a UP kernel: it is downgraded to a compiler barrier only. That is incorrect here, because the memory is shared between the host virtual CPU and the UP guest virtual CPU -- at least 2 CPUs are involved (we assume the guest/host virtual CPUs are running on different physical CPUs).

In D5436#118019, @decui_microsoft.com wrote:

I'm afraid atomic_thread_fence_seq_cst() has a potential issue here on a UP kernel: it is downgraded to a compiler barrier only. That is incorrect here, because the memory is shared between the host virtual CPU and the UP guest virtual CPU -- at least 2 CPUs are involved (we assume the guest/host virtual CPUs are running on different physical CPUs).

Good point. We probably have to add a wrapper for it (in case SMP is not defined). We could fix it later, since SMP is defined by default :)

In D5436#118303, @sepherosa_gmail.com wrote:

In D5436#118019, @decui_microsoft.com wrote:

I'm afraid atomic_thread_fence_seq_cst() has a potential issue here on a UP kernel: it is downgraded to a compiler barrier only. That is incorrect here, because the memory is shared between the host virtual CPU and the UP guest virtual CPU -- at least 2 CPUs are involved (we assume the guest/host virtual CPUs are running on different physical CPUs).

Good point. We probably have to add a wrapper for it (in case SMP is not defined). We could fix it later, since SMP is defined by default :)

This is why I said that this is a question of what the specification requires. mb() with a proper comment about it being x86-specific and required even on a UP config looks best to me. There is no reciprocal C11-style sequential point in the hypervisor, from what I understand.
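The UP concern discussed above can be illustrated with a small sketch. It is not kernel code: SKETCH_SMP stands in for the kernel's SMP option, both function names are hypothetical, and GCC-style inline assembly and builtins are assumed.

```c
/* Compiler-only barrier: constrains the compiler, emits no instruction. */
#define sketch_compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* A thread fence that is downgraded when built without SMP. */
static inline void
sketch_thread_fence(void)
{
#ifdef SKETCH_SMP
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#else
	/* Not enough when the peer runs on another physical CPU. */
	sketch_compiler_barrier();
#endif
}

/*
 * x86-specific full barrier with no SMP check on purpose: it is needed
 * even on a UP guest, because the host virtual CPU that shares the
 * memory may run on a different physical CPU.
 */
static inline void
sketch_hv_mb(void)
{
#if defined(__x86_64__)
	__asm__ __volatile__("mfence" : : : "memory");
#else
	/* Fallback so the sketch still builds on other architectures. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}
```

The design point is that the second function never consults the SMP setting, so a UP build still gets a hardware fence, and its comment records why.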

Revision Contents
Changeset List

Path: head/sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c
Size: 4 lines

Diff 13867
