Page MenuHomeFreeBSD

Expand generic subword atomic primitives
ClosedPublic

Authored by cem on Dec 30 2019, 3:47 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 25, 12:48 AM
Unknown Object (File)
Mon, Jan 20, 5:05 PM
Unknown Object (File)
Mon, Jan 20, 4:58 PM
Unknown Object (File)
Mon, Jan 20, 4:46 PM
Unknown Object (File)
Mon, Jan 20, 1:57 PM
Unknown Object (File)
Sun, Jan 19, 7:39 PM
Unknown Object (File)
Sat, Jan 18, 8:07 PM
Unknown Object (File)
Sat, Jan 18, 5:23 PM

Details

Summary

Guard generic definitions on ifndef the macro of the same name. To avoid
picking up conflicting generic definitions, some macro defines are added to
various MD machine/atomic.h.

Include _atomic_subword.h in arm and arm64 machine/atomic.h.

For some odd reason, KCSAN only generated some versions of primitives.
Generate the _acq variants of atomic_load.*_8 and atomic_load.*_16. There
are other questionably disabled primitives, but I didn't run into them, so I
left them alone.

Add atomic_subword implementations of atomic_load_acq_{8,16} implemented
using masking and atomic_load_acq_32.

Add generic atomic_subword implementation of atomic_testandset_long(), using
atomic_fcmpset_acq_long().

Test Plan

tinderbox clean

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Which of these are needed on arm64? I created D22966 to add the 8 and 16 bit load_acq & store_rel functions.

Any chance you could be convinced to do the quick copy/paste to add testandclear_long, too, while you're there? If not, no worries- I'll hit it up afterwards. It's another one that Jeff wants for uma/bitstring stuff.

sys/sys/_atomic_subword.h
214 ↗(On Diff #66131)

"early"

Which of these are needed on arm64? I created D22966 to add the 8 and 16 bit load_acq & store_rel functions.

I believe arm64 is still missing atomic_testandset_long().

Any chance you could be convinced to do the quick copy/paste to add testandclear_long, too, while you're there? If not, no worries- I'll hit it up afterwards. It's another one that Jeff wants for uma/bitstring stuff.

Sure. I'm not 100% sure my implementation is correct, but the transformation to testandclear_long is pretty trivial.

sys/sys/_atomic_subword.h
214 ↗(On Diff #66131)

Thanks, will fix

cem marked an inline comment as done.
  • Add testandclear_long variant
  • Fix "earlier" typo/braino

OK for sys/_atomic_subword.h

This revision is now accepted and ready to land.Jan 2 2020, 7:56 PM

Any chance you could be convinced to do the quick copy/paste to add testandclear_long, too, while you're there? If not, no worries- I'll hit it up afterwards. It's another one that Jeff wants for uma/bitstring stuff.

I've added them in D23019.

This revision now requires review to proceed.Jan 5 2020, 1:10 AM

I only looked at _atomic_subword.h.

The logic looks good except the c&p error.

It might be good to add a .c file to #ifndef FOO #warn "Implement FOO".

sys/sys/_atomic_subword.h
192–204 ↗(On Diff #66354)

I don't like that this implemntation silently hides misalignment, and indeed produces a wrong result. Consider what happens when the pointer is aligned to byte 3 of the 4-byte word. I think it would be better to do a misaligned load and maybe trap.

Here's a thought:

uint16_t
atomic_load_acq_16(volatile uint16_t *p)
{
	union { uint32_t x32; uint16_t x16[2]; } u;
	int i;

	i = ((uintptr_t)p >> 1) & 1;
	p = (uint16_t *)(((uintptr_t)p) & ~(uintptr_t)2); /* Keep mis-alignment */
	u.x32 = atomic_load_acq_32(p);
	return (u.x16[i]);
}

I see the (f)cmpset_16 here has the same problem, and I realize you probably don't care much about misaligned pointers. I wouldn't insist on a change. I just find this behavior surprising.

238 ↗(On Diff #66354)

s/testandset/testandclear/

sys/sys/_atomic_subword.h
192–204 ↗(On Diff #66354)

Actually I think maybe the whole thing can be addressed with this

/* Align to word, but keep 2-byte misalignment if present. */
#define	_ATOMIC_ALIGN_HWORD_PTR(p)	\
    (uint32_t *)((uintptr_t)(p) & ~(uintptr_t)2)

#if _BYTE_ORDER == _BIG_ENDIAN
#define	_ATOMIC_HWORD_SHIFT(p)		\
    ((1 - (((uintptr_t)(p) >> 1) & 1)) * 2 * NBBY)
#else
#define	_ATOMIC_HWORD_SHIFT(p)		\
    ((((uintptr_t)(p) >> 1) & 1) * 2 * NBBY)
#endif
sys/sys/_atomic_subword.h
192–204 ↗(On Diff #66354)

Hmm, I thought I had tested all the flavors of misalignment on both BE/LE with the original implementation. The math was intended to work out such that you align to the containing word and shift in/out of the correct position within the containing word... I'll take another look again, though.

sys/arm64/include/atomic.h
638 ↗(On Diff #66354)

Is this still needed after rS356550?

sys/sys/_atomic_subword.h
192–204 ↗(On Diff #66354)

Right, the problem is that with misalignment at 3/4, there is no containing word for a 2-byte op, since it spans two 4-byte words. What happens now is we round down to the 4-byte boundary. The proposal above is to round down to 1/4 and pass along the misalignment to the 4-byte op.

I'm fine leaving this out of the current diff and addressing in follow up, if preferred.

I don't think expecting misaligned atomics to work is reasonable. We could KASSERT alignment? Asking for 2-byte alignment isn't asking much.

In D22963#506510, @cem wrote:

I don't think expecting misaligned atomics to work is reasonable. We could KASSERT alignment? Asking for 2-byte alignment isn't asking much.

I don't disagree, and like I said I'm fine not addressing this. My surprise is that we silently turn the misalignment into a wrong result. No 2-byte alignment is not asking much, but we do use attribute((packed)) in places, and squelch the alignment warnings with Wno-address-of-packed-member.

Ahh, sorry, I see what you're pointing out now.

In D22963#506510, @cem wrote:

I don't think expecting misaligned atomics to work is reasonable. We could KASSERT alignment? Asking for 2-byte alignment isn't asking much.

KASSERT here is a no-go without some systm.h surgery to make sure it's available due to the systm.h dependency on machine/atomic.h, unfortunately. I'll follow up on it after this goes in.

sys/arm64/include/atomic.h
638 ↗(On Diff #66354)

No, it isn’t. The patch hasn’t been rebased passed that revision yet.

sys/sys/_atomic_subword.h
192–204 ↗(On Diff #66354)

If (p & 0x1) panic(“misaligned”);

238 ↗(On Diff #66354)

Will fix, thanks

Rebase over r356550. Thanks!

cem marked 2 inline comments as done and an inline comment as not done.

Fix pasto in ifndef conditional

sys/sys/_atomic_subword.h
192–204 ↗(On Diff #66354)

Unfortunately, including panic()'s systm.h would be extremely header-pollutiony, and other hacks are unappealing. I think we just trust people not to misalign 16-bit atomics on platforms that do not natively support atomics smaller than a 32-bit word. I don't think such platforms would support an atomic 32-bit word operation misaligned by 2 bytes anyway.

_atomic_subword.h LGTM.

sys/sys/_atomic_subword.h
192–204 ↗(On Diff #66354)

I'm okay with not addressing this.

This revision is now accepted and ready to land.Jan 25 2020, 6:38 PM
sys/sys/_atomic_subword.h
229–232 ↗(On Diff #67278)

Just noticed, why do we have acquire semantics here? atomic(9) does not say these should have any memory barriers. I think plain atomics should be sufficient for the logic here, or am I misunderstanding what the barrier is providing?

sys/sys/_atomic_subword.h
229–232 ↗(On Diff #67278)

I don't recall exactly what I was thinking. I think plausibly I was imagining this primitive not making sense without acquire semantics? Of course, to provide those semantics, I think we only need _acq on the first atomic_load, not on the rest.

There are very few consumers in tree: qlnxe(4), and hyperv's vmbus(4) and netvsc(4). (Also my out of tree fxrng set uses it, which is my interest in adding it to _atomic_subword.)

In vmbus(4)'s case, it is used as an assertion, as if it has acquire semantics; certainly operations after it in program order must not be reordered before it.

In netvsc(4) it is used as an allocator from a bitmap. The nearby code could tolerate reordering, but you wouldn't want to allow the CPU to reorder consumers of the allocator before allocation. I guess there might be a data dependency here that prevents reordering anyway. I'm not super familiar with when it would be acceptable to elide otherwise correct semantics on the basis of a data dependency.

qlnxe(4), in proper qlnx* style, does not actually use the testand property of this primitive; it would be fine with atomic_set_foo.

(In my out of tree use, I also want to prevent reordering of loads/stores before the testandset.)

sys/sys/_atomic_subword.h
229–232 ↗(On Diff #67278)

I'm not arguing against providing acq variants, if needed, but ISTM no caller can actually rely on the acquire semantics in the present procedures, because they are not documented and the non-generic arch implementations may not provide them.

D22702 / D22703 will also use this, when available.

sys/sys/_atomic_subword.h
229–232 ↗(On Diff #67278)

If it is the case that this primitive only makes sense with acq semantics, I think it would make sense to fix the documentation. But I'm not sure.

From discussion with kib on IRC I think the right thing to do here is provide the relaxed variant, and then for my use, also provide a generic acquire semantics version. I'll update the patch. I'm not sure if the best generic way to provide acquire semantics is to use acquire primitives, or to slap a fence in somewhere. On platforms with cheap acquire primitives, the fence is costlier, but if a platform is just implementing acquire with fenced primitives, many fences is probably worse.

I guess we can slap a single fence in and let platforms provide better MD variants themselves if it is not optimal.

  • Rework atomic_testand*_long with non-acquire semantics
  • Add atomic_testandset_acq_long with acquire semantics
  • As a side effect, re-include _subword_atomic.h in arm64 atomic.h
  • Ad a side effect, some additional new support needed in KCSAN
  • On x86, atomic_testandset_long already had these semantics, so just add the new definitions.

Things may not be ideal or even a bit ugly, but tinderbox is building now and
this kind of patch rots quickly unfortunately. I'd like to get it out of my
tree promptly.

This revision now requires review to proceed.Mar 24 2020, 3:54 AM
This revision was not accepted when it landed; it landed in state Needs Review.Mar 25 2020, 11:13 PM
This revision was automatically updated to reflect the committed changes.