
Add atomic_testandset_acq_* on arm64
Closed, Public

Authored by andrew on Dec 20 2021, 2:04 PM.

Details

Summary

We only need to include sys/_atomic_subword.h on arm64 to provide
atomic_testandset_acq_long. Add an implementation in the arm64 atomic.h
based on the existing atomic_testandset macro.
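
As a hedged sketch only (not the committed diff, which builds these functions from the existing macros and also has an LSE flavour; the function name here is invented), an LL/SC-style acquire test-and-set on arm64 could look roughly like this, the only difference from the relaxed variant being ldaxr in place of ldxr:

    /* Illustrative sketch, not sys/arm64/include/atomic.h code. */
    static inline int
    atomic_testandset_acq_long_sketch(volatile unsigned long *p, unsigned int val)
    {
        unsigned long mask, old, tmp;
        int res;

        mask = 1ul << (val & 0x3f);          /* bit index modulo 64 */
        __asm__ __volatile__(
            "1: ldaxr  %0, [%3]      \n"     /* load-acquire exclusive */
            "   orr    %1, %0, %4    \n"     /* set the requested bit */
            "   stxr   %w2, %1, [%3] \n"     /* plain store exclusive */
            "   cbnz   %w2, 1b       \n"     /* retry if the store failed */
            : "=&r" (old), "=&r" (tmp), "=&r" (res)
            : "r" (p), "r" (mask)
            : "memory");

        return ((old & mask) != 0);
    }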

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

It is strange: does anybody need atomic_testand{set,clear}_rel_<type>? It seems that the LSE instructions LDCLRA and LDEORA have both acquire and release semantics; am I right that we end up using them?

The only user of testand{set,clear} with a barrier in the kernel is sys/dev/random/fenestrasX/fx_pool.c.

Based on the description of the LDCLRA instruction, LD*A and LD*AL have acquire semantics, and LD*L and LD*AL have release semantics, so only LD*AL has both. With this change the kernel can use LD{SET,CLR}A, so it will have acquire semantics.
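
As a hedged illustration of that LSE flavour (ignoring how atomic.h actually selects between the LL/SC and LSE paths at run time; the function name is invented), a single LDSETA performs the whole read-modify-write with acquire semantics on its load:

    /* Illustrative sketch only; assumes the CPU implements LSE. */
    static inline int
    atomic_testandset_acq_long_lse_sketch(volatile unsigned long *p, unsigned int val)
    {
        unsigned long mask, old;

        mask = 1ul << (val & 0x3f);
        __asm__ __volatile__(
            ".arch_extension lse  \n"
            "ldseta %1, %0, [%2]  \n"   /* old = *p; *p |= mask (acquire on the load) */
            : "=&r" (old)
            : "r" (mask), "r" (p)
            : "memory");

        return ((old & mask) != 0);
    }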

This revision is now accepted and ready to land. Dec 21 2021, 12:02 PM
This revision was automatically updated to reflect the committed changes.

Just a general comment about atomic_testandset_acq_* and similar RMW operations - I'm fairly sure that operations like this do not behave as described in atomic(9):

For example, to add two long integers ensuring that all prior loads and
stores are completed before the addition is performed, use
atomic_add_rel_long().

In particular, for both the LL/SC and LSE variants of the arm64 implementation, the acquire semantics apply only to the load portion of the operation. In other words, there is nothing preventing a subsequent load from being reordered with the store. But that example seems to suggest that the store will be visible before subsequent loads.
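
In C11 terms, the concern could be sketched as follows (the variable and function names here are made up for illustration):

    #include <stdatomic.h>

    _Atomic unsigned long word;
    _Atomic int flag;

    int
    acquire_rmw_example(void)
    {
        /* Acquire applies to the load half of the RMW only. */
        unsigned long old = atomic_fetch_or_explicit(&word, 1UL << 3,
            memory_order_acquire);

        /*
         * This load cannot move before the load half above, but it may
         * be satisfied before the store to "word" becomes visible to
         * other CPUs, which is the reordering in question.
         */
        int f = atomic_load_explicit(&flag, memory_order_relaxed);

        return (f + (int)((old >> 3) & 1));
    }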

Just a general comment about atomic_testandset_acq_* and similar RMW operations - I'm fairly sure that operations like this do not behave as described in atomic(9):

For example, to add two long integers ensuring that all prior loads and
stores are completed before the addition is performed, use
atomic_add_rel_long().

Oops, I meant to quote this section instead:

For example, to subtract two integers ensuring that the subtraction is
completed before any subsequent loads and stores are performed, use
atomic_subtract_acq_int().

The C17 standard defines _acq as

For memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst, a load operation performs an acquire operation on the affected memory location.

To me this sounds exactly as implemented, i.e. for acquire, the load portion of the RMW is an acquire. Symmetrically, for release, the store portion of the RMW is a release.

There is no reordering terminology there. Our man page probably requires some tweaking.
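
A minimal sketch of that reading, spelled with C11 atomics (illustrative wrappers with invented names, not the kernel's atomic.h):

    #include <stdatomic.h>

    /* _acq orders the load half of the read-modify-write ... */
    static inline unsigned long
    fetch_or_acq(_Atomic unsigned long *p, unsigned long mask)
    {
        return (atomic_fetch_or_explicit(p, mask, memory_order_acquire));
    }

    /* ... and _rel orders the store half. */
    static inline unsigned long
    fetch_or_rel(_Atomic unsigned long *p, unsigned long mask)
    {
        return (atomic_fetch_or_explicit(p, mask, memory_order_release));
    }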

There was a discussion on the arch list about this a few years ago: https://lists.freebsd.org/pipermail/freebsd-arch/2014-October/016160.html.

With LL/SC atomics we may move a later load before the store-exclusive, but it will be a successful store, as any failed store-exclusive would rerun the load-exclusive, which has acquire semantics.

The one place we need to be careful is if we have both acquire and release semantics (accesses from before and after may be reordered into the read-modify-write sequence, and so be reordered with each other); however, as we don't have these in atomic(9) it shouldn't be an issue.
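
For illustration, a hypothetical acq_rel LL/SC test-and-set (no such variant exists in atomic(9); the name is invented) would place the barriers like this, which is what opens the window described above:

    /* Hypothetical sketch only. */
    static inline int
    atomic_testandset_acq_rel_long_sketch(volatile unsigned long *p, unsigned int val)
    {
        unsigned long mask, old, tmp;
        int res;

        mask = 1ul << (val & 0x3f);
        __asm__ __volatile__(
            "1: ldaxr  %0, [%3]      \n"   /* acquire: later accesses may not rise above this */
            "   orr    %1, %0, %4    \n"
            "   stlxr  %w2, %1, [%3] \n"   /* release: earlier accesses may not sink below this */
            "   cbnz   %w2, 1b       \n"
            : "=&r" (old), "=&r" (tmp), "=&r" (res)
            : "r" (p), "r" (mask)
            : "memory");

        /*
         * An earlier store may still sink below the ldaxr and a later
         * load may still rise above the stlxr, so the two can pass
         * each other inside the sequence.
         */
        return ((old & mask) != 0);
    }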

There was a discussion on the arch list about this a few years ago: https://lists.freebsd.org/pipermail/freebsd-arch/2014-October/016160.html.

Was there a discussion about the specific issue Mark noted? Because the points from the mail you reference have long since been addressed: atomic_load_acq and atomic_store_rel on x86 fully utilize the TSO model provided by the hardware, and we also have the atomic_thread_fence_XXX() primitives, which provide the desired memop-less fences without breaking the C11 memory model.

With LL/SC atomics we may move a later load before the store-exclusive, but it will be a successful store, as any failed store-exclusive would rerun the load-exclusive, which has acquire semantics.

The one place we need to be careful is if we have both acquire and release semantics (accesses from before and after may be reordered into the read-modify-write sequence, and so be reordered with each other); however, as we don't have these in atomic(9) it shouldn't be an issue.

This is the common (allowed) behaviour of an acq_rel operation; basically, this is one of the aspects in which it differs from a true sequentially consistent op (but there are more).

In D33587#761528, @kib wrote:

The C17 standard defines _acq as

For memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst, a load operation performs an acquire operation on the affected memory location.

To me this sounds exactly as implemented, i.e. for acquire, the load portion of the RMW is an acquire. Symmetrically, for release, the store portion of the RMW is a release.

There is no reordering terminology there. Our man page probably requires some tweaking.

Yes, I think atomic(9) needs to be updated to reflect that acquire semantics for atomic_<op>_acq_<type>() apply specifically to the load portion of the operation. At least one consumer, smr_enter(), gets this wrong. There, the store must complete before any subsequent loads. On amd64 this is true since a load will not be reordered before a locked instruction (i.e., atomic_add_acq_int() behaves like __storeload_barrier()), but on arm64 there is no guarantee.
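
As a sketch of the distinction (the variable and function names are invented; this is not the actual smr_enter() code), store-before-load ordering needs a full fence rather than an acquire RMW:

    #include <stdatomic.h>

    _Atomic int c_seq;      /* hypothetical per-CPU sequence */
    _Atomic int wr_seq;     /* hypothetical global write sequence */

    void
    enter_sketch(void)
    {
        /* Publish our observed sequence number. */
        atomic_store_explicit(&c_seq,
            atomic_load_explicit(&wr_seq, memory_order_relaxed),
            memory_order_relaxed);

        /*
         * The store above must be visible before the load below; an
         * acquire RMW does not guarantee that on arm64, so a full
         * fence (atomic_thread_fence_seq_cst() in the kernel) is used.
         */
        atomic_thread_fence(memory_order_seq_cst);

        (void)atomic_load_explicit(&wr_seq, memory_order_relaxed);
    }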