atomic.9: fix description of acquire and release The ordering point is not the atomic operation itself, but the load for acquire or store for release done as part of the atomic. This does not matter for atomic_load_acq and atomic_store_rel, but does matter for RWM operations. atomic.9: provide fine details about CAS memory model MD semantic
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
This is my reading of both lse and llcs implementation of CAS ops on aarch64.
I am not sure should it be documented even if I am right, but I tend to think that it is worth it.
On arm64 atomics with an acquire then later memory operations can move before the store, however the store needs to succeed as if it doesn't then we will execute the load-acquire again.
I don't think the CAS instructions allow this. The pseudo-code has the comment All observers in the shareability domain observe the following load and store atomically before listing the load and store operations. This seems to indicate if the load has been observed the store has too.
Note that the llsc instructions have some subtly when implementing fully sequentially consistent semantics. As the acquire and release are on different instructions this means memory operations may move into the sequence & be swapped. As we don't implement them in atomic(9) this shouldn't be an issue, however will be if we latter add them.
But if CAS failed, could we claim that the load was not observed?
Anyway, I do not question the behavior.
Note that the llsc instructions have some subtly when implementing fully sequentially consistent semantics. As the acquire and release are on different instructions this means memory operations may move into the sequence & be swapped. As we don't implement them in atomic(9) this shouldn't be an issue, however will be if we latter add them.
But then don't we have a similar (or even larger) problem on ppc and risc-v, where the cas is fully relaxed?
| share/man/man9/atomic.9 | ||
|---|---|---|
| 251 | The Tn macro is dead, you can check the linter warning with mandoc -T lint ./atomic.9. | |
| share/man/man9/atomic.9 | ||
|---|---|---|
| 251 | Why? | |
Why?
The linter command I posted says it is useless and does nothing, iirc.
Did groff drop it as well?
Yes. The bottom of the groff_mdoc(7) manual says: "As of groff 1.23, ‘Tn’ no longer changes the type size; this functionality may return in the next release."
I've known about the issue on Linux
This is the fix for Linux issue https://github.com/torvalds/linux/commit/8e86f0b409a44193f1587e87b69c5dcf8f65be67. It's not been a problem on FreeBSD as we don't have atomic operations with a full barrier in atomic(9).
Yes, this is what I was remembering. smr_enter has an #ifdef for handling a case where this would be an issue. While it is true that there is not a problem with the primitives in atomic.h themselves, I am afraid of their misuse in the sense of expecting x86-like behavior elsewhere.
@kib I am supportive of a change like this. However, in regards to the arm64 text, I don't think that it will lead the reader to the intended conclusion. At present, you are trying to describe what the operations do provide, but that leaves the reader to infer what they don't provide. I think that we need to be more direct, and describe what the problem is. That for the lack of a better word, the load and store making up the operation are not performed "indivisibly", and so reordering of other accesses with respect to one or the other can occur.
| share/man/man9/atomic.9 | ||
|---|---|---|
| 249 | ||
| 256 | The original is ok, but IMHO my suggestion is a bit clearer. | |
| 261 | ||
| 266 | ||
| 267 | My recollection is that this is not specific to the LL/SC variant, it is true of the LSE atomic instruction as well. Despite the fact that it is a single instruction, its memory accesses are effectively separate from the POV of memory access ordering. Is that right? | |
| share/man/man9/atomic.9 | ||
|---|---|---|
| 267 | I believe yes, this follows from the description of CASAL -> reference to J1.1.3.101 MemAtomic, where the pseudocode has separate load and store. I believe andrew@ explained the comment All observers in the shareability domain observe the following load and store atomically, as implying exclusive location ownership, and not as the indivisible event of load+store. | |
| share/man/man9/atomic.9 | ||
|---|---|---|
| 247 | Doesn't this caveat apply to RMW atomics generally, not just atomic_(f)cmpset? e.g., atomic_testandset. | |
| share/man/man9/atomic.9 | ||
|---|---|---|
| 247 | For instance, aarch64 fetchadd is implemented with plain LDXR/STXR instructions and does not provide any fence semantic. readandclear seems to be same. | |
| share/man/man9/atomic.9 | ||
|---|---|---|
| 247 | You mean, we do not provide any variants other than _relaxed? That's true, but maybe one will be added someday. | |
| share/man/man9/atomic.9 | ||
|---|---|---|
| 247 | I mean, only aarch64 [f]cmpset have acq+rel semantic. All other rwm ops seems to be either seq-cst (x86) or fully relaxed. But I did not fully reviewed the implementations to make this claim. | |
It has been a long time since I looked at this man page. In the interest of making sure that these changes were written in an style consistent with the rest of the man page, I decided to take a look, and having done so the following bits concern me:
For example, to subtract two integers ensuring that the subtraction is completed before any subsequent loads and stores are performed, use atomic_subtract_acq_int().
and
For example, to add two long integers ensuring that all prior loads and stores are completed before the addition is performed, use atomic_add_rel_long().
A reasonable interpretation of these statements seems in conflict with the change under discussion here. For example, if you interpret "the subtraction is completed" as the store of the computed difference having been completed, then this is most definitely not true on arm64.
Here's a modest proposal to consider: We update the man page's section "Acquire and Release Operations" to correctly describe what happens in general, e.g., on arm64. Then, the proposed caveats section simply describes how amd64 and i386 differ, and says machine independent code should not rely on this behavior.
| share/man/man9/atomic.9 | ||
|---|---|---|
| 246–251 | ||
| 253–255 | I suggest this because we don't use x86 elsewhere in this man page. We spell out amd64 and i386. | |
| 258–261 | ||
Yes indeed what you noted is a bug in the description of acq/rel for atomics that perform more than one memory access, basically all RWM atomics. I think just fixing that would be enough.