Paths

Table of Contentst

Differential D35989

wg: Use plain loads and stores for so_so4 and so_so6.
AbandonedPublic
Actions

Authored by jhb on Jul 29 2022, 4:57 PM.

Details

Reviewers

markj

Summary

Atomics are not needed to read the variables after NET_EPOCH_ENTER due
to the embedded acquire barrier in epoch_enter.

Atomics are also not needed to write the variables prior to
NET_EPOCH_WAIT due to the embedded memory fence in epoch_wait.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 46681
Build 43570: arc lint + arc unit

Event Timeline

jhb created this revision.Jul 29 2022, 4:57 PM

Herald added a subscriber: imp. · View Herald TranscriptJul 29 2022, 4:57 PM

jhb requested review of this revision.Jul 29 2022, 4:57 PM

Harbormaster completed remote builds in B46681: Diff 108714.Jul 29 2022, 4:57 PM

jhb added a child revision: D35990: wg: Use plain loads/stores for keepalive management..Jul 29 2022, 4:57 PM

Hi @jhb ,
I'm a little confused as there's no sys/dev/wg source directory in current HEAD.

In D35989#817522, @zlei.huang_gmail.com wrote:

Hi @jhb ,
I'm a little confused as there's no sys/dev/wg source directory in current HEAD.

This is a diff I've primarily uploaded for markj@ to review. I have a tree that imports the wg(4) driver from wireguard-freebsd into sys/dev/wg and this change (along with others in the stack) are further changes to the current wireguard-freebsd driver.

emaste added a subscriber: emaste.Aug 2 2022, 4:28 PM

markj accepted this revision.Aug 8 2022, 5:13 PM

This revision is now accepted and ready to land.Aug 8 2022, 5:13 PM

jhb added a parent revision: D36909: Import the WireGuard driver from zx2c4.com..Oct 7 2022, 9:59 PM

I don't see how the fences have anything to do with it.

The point of atomic load/stores is so that the compiler does not get ideas of splitting them into multiple instructions and does not get ideas about re-reading them, both of which are no longer guaranteed with the patch at hand.

In D35989#838250, @mjg wrote:

I don't see how the fences have anything to do with it.

The point of atomic load/stores is so that the compiler does not get ideas of splitting them into multiple instructions and does not get ideas about re-reading them, both of which are no longer guaranteed with the patch at hand.

We always assume that a load of an aligned pointer is atomic.

In D35989#838826, @markj wrote:

In D35989#838250, @mjg wrote:

I don't see how the fences have anything to do with it.

The point of atomic load/stores is so that the compiler does not get ideas of splitting them into multiple instructions and does not get ideas about re-reading them, both of which are no longer guaranteed with the patch at hand.

We always assume that a load of an aligned pointer is atomic.

That assumption is not valid, although it almost always works out.

What is a serious problem is allowing the compiler to decide to re-read these values, meaning multiple uses of 'so4' in the same func can end up with different values. While the sample above is so short I don't believe any compiler would do it, I see 0 reason to make the proposed change. If anything, it is better to add these loads if in any doubt than to skip them.

In D35989#839387, @mjg wrote:

In D35989#838826, @markj wrote:

In D35989#838250, @mjg wrote:

I don't see how the fences have anything to do with it.

The point of atomic load/stores is so that the compiler does not get ideas of splitting them into multiple instructions and does not get ideas about re-reading them, both of which are no longer guaranteed with the patch at hand.

We always assume that a load of an aligned pointer is atomic.

That assumption is not valid, although it almost always works out.

It's valid for the platforms FreeBSD runs on, this assumption is pervasive in the kernel. (Try enabling KCSAN.)

Above you say that the purpose of atomic_load/store is to provide atomicity of operations, while below you say it's to ensure that values are read or written exactly once so as to satisfy assumptions of some lockfree code. A third reason is to provide sanitizers with a list of "known" unserialized memory accesses.

What is a serious problem is allowing the compiler to decide to re-read these values, meaning multiple uses of 'so4' in the same func can end up with different values. While the sample above is so short I don't believe any compiler would do it, I see 0 reason to make the proposed change. If anything, it is better to add these loads if in any doubt than to skip them.

Looking at the code again, I agree. The accesses in wg_send() should use atomic_load to ensure that the compiler does not reload a socket pointer after it's compared with NULL. The accesses in wg_socket_set() are serialized by the softc mutex, so there we are merely assuming that the stores are atomic and atomic_store does not accomplish anything except educating KCSAN. Clearly I didn't look carefully enough when I reviewed the first time, sorry about that.

We do not have a consistent policy on whether to use atomics when performing serialized stores to memory which is accessed in unserialized contexts, such as here. KCSAN (as it is implemented in FreeBSD, though it's easy to change) and Linux require both sides to use atomics (READ_ONCE/WRITE_ONCE). So, what should we do in general?

In D35989#839569, @markj wrote:

In D35989#839387, @mjg wrote:

In D35989#838826, @markj wrote:

In D35989#838250, @mjg wrote:

I don't see how the fences have anything to do with it.

The point of atomic load/stores is so that the compiler does not get ideas of splitting them into multiple instructions and does not get ideas about re-reading them, both of which are no longer guaranteed with the patch at hand.

We always assume that a load of an aligned pointer is atomic.

That assumption is not valid, although it almost always works out.

It's valid for the platforms FreeBSD runs on, this assumption is pervasive in the kernel. (Try enabling KCSAN.)

I see part of the point was lost, so I'm going to quote myself:

The point of atomic load/stores is so that the compiler does not get ideas of splitting them into multiple instructions

(point one of two)

So it is not about architectures getting torn loads/stores when the compiler emitted a load/store to a properly aligned var. It is about a hypothetical possibility of the compiler splitting the op -- for example generating 8 one-byte loads instead of 1 eight-byte load. While I never ran into such insanity myself, I do know it happened in the Linux land.

We do not have a consistent policy on whether to use atomics when performing serialized stores to memory which is accessed in unserialized contexts, such as here. KCSAN (as it is implemented in FreeBSD, though it's easy to change) and Linux require both sides to use atomics (READ_ONCE/WRITE_ONCE). So, what should we do in general?

What we should do is play it safe and make it a policy that unlocked loads and stores always use atomic_* primitives. To that end type checking needs to be added with atomic vars becoming opaque so that accidental direct access fails to compile.

This is easy for all types except for pointers, which will perhaps require some hackery to retain checking for the intended pointed-to-type apart from it being accessed with atomic.

In D35989#839724, @mjg wrote:

In D35989#839569, @markj wrote:

In D35989#839387, @mjg wrote:

In D35989#838826, @markj wrote:

In D35989#838250, @mjg wrote:

I don't see how the fences have anything to do with it.

The point of atomic load/stores is so that the compiler does not get ideas of splitting them into multiple instructions and does not get ideas about re-reading them, both of which are no longer guaranteed with the patch at hand.

We always assume that a load of an aligned pointer is atomic.

That assumption is not valid, although it almost always works out.

It's valid for the platforms FreeBSD runs on, this assumption is pervasive in the kernel. (Try enabling KCSAN.)

I see part of the point was lost, so I'm going to quote myself:

The point of atomic load/stores is so that the compiler does not get ideas of splitting them into multiple instructions

(point one of two)

So it is not about architectures getting torn loads/stores when the compiler emitted a load/store to a properly aligned var. It is about a hypothetical possibility of the compiler splitting the op -- for example generating 8 one-byte loads instead of 1 eight-byte load. While I never ran into such insanity myself, I do know it happened in the Linux land.

We do not have a consistent policy on whether to use atomics when performing serialized stores to memory which is accessed in unserialized contexts, such as here. KCSAN (as it is implemented in FreeBSD, though it's easy to change) and Linux require both sides to use atomics (READ_ONCE/WRITE_ONCE). So, what should we do in general?

What we should do is play it safe and make it a policy that unlocked loads and stores always use atomic_* primitives. To that end type checking needs to be added with atomic vars becoming opaque so that accidental direct access fails to compile.

This is easy for all types except for pointers, which will perhaps require some hackery to retain checking for the intended pointed-to-type apart from it being accessed with atomic.

Actually, this is not hard if we just use C11 atomics and use _Atomic directly, even for pointers.

jhb added inline comments.Oct 15 2022, 12:13 AM

sys/dev/wg/if_wg.c
790	The atomics here do not seem to be needed. The fence in NET_EPOCH_WAIT means the compiler cannot reorder the writes of the new values into the socket later than the calls to soclose(), and the fence in NET_EPOCH_ENTER below in the read should ensure that the reading code below can't read the old values? I can drop this diff, but these specific atomics do seem over-much.
894	READ_ONCE semantics make sense here, yes.

jhb abandoned this revision.Oct 15 2022, 12:16 AM

Revision Contents
Changeset List

Path

Size

sys/

dev/

wg/

if_wg.c

12 lines

Diff 108714

View Options

wg: Use plain loads and stores for so_so4 and so_so6.AbandonedPublicActions