sfence only makes sure about the store-store order, which is not sufficient here.
Details
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c | ||
---|---|---|
120 | FreeBSD's atomic(9) API recently gained more expressive fences and are preferred over mb/wmb. For here I believe what you want is 'atomic_fetch_seq_cst()'. It actually uses a locked addl to a thread-local address instead of 'mfence' (more on this in sys/amd64/include/atomic.h). |
sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c | ||
---|---|---|
120 | I think it is more a question of what is required by the specification. If it is stated that mfence should be issued, so be it. Note that atomic_fetch_seq_cst() is an implementation of the C11 model operation, and indeed in the current set of the atomics on FreeBSD might be implemented as mfence, or more optimally, as locked op. But we might change the implementation. If the spec is formulated in C11 terms, we would in fact have the ABI compatibility requirement between our atomics and whatever the hypervisor uses. Practically speaking, I suspect that locked op is enough for your purposes, and fetch_seq_cst() is faster than mfence. |
sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c | ||
---|---|---|
120 | Yeah, I remembered some Solaris folks said "locked" op is faster then mfence/lfence on x86. Ok, let me do some test and switch. Thanks for the information. |
Use atomic_thread_fence_seq_cst as suggested by jhb and kib. It seems to work fine so far
there is more wmb and mb() in the tree. please cleanup them together to switch to atomic(9)
Given that this is a functional change (wmb -> mfence like op) I think this should be standalone. If you want to do a sweep to convert mb/wmb to corresponding atomic_fence_*(), I think that should be separate from this change.
I'm afraid atomic_thread_fence_seq_cst() here has a potential issue in the case of UP kernel: it is downgraded to a compiler barrier only, which is incorrect here, because the shared memory here is between the host virtual CPU and the UP guest virtual CPU -- at least 2 CPUs are involved (we assume the guest/host virtual CPUs are running on different physical CPUs)
Good point. We probably have to add a wrapper for it (in case SMP is not defined). We could fix it later, since SMP is by default defined :)
This is why I said that this is the question of what a specification requires. mb() with the proper comment about it being x86-specific and required even on UP config looks best for me. There is no reciprocal C11-stule sequential point in the hypervisor, from what I understand.