Page MenuHomeFreeBSD

Implement a translation from FreeBSD atomic API to gcc intrinsics
Needs ReviewPublic

Authored by mjg on Feb 13 2020, 2:23 PM.

Details

Reviewers
kib
markj
jeff
jhb
Summary

Which are of course supported by clang.

Primary motivation is to eliminate problems like this (mtx lock):

0xffffffff80cb873b <+139>:	xor    %eax,%eax
0xffffffff80cb873d <+141>:	lock cmpxchg %r14,0x18(%rbx)
0xffffffff80cb8743 <+147>:	sete   %cl
0xffffffff80cb8746 <+150>:	test   %cl,%cl
0xffffffff80cb8748 <+152>:	je     0xffffffff80cb8798 <vfs_ref+232>

patched:

0xffffffff80caaa3b <+139>:	xor    %eax,%eax
0xffffffff80caaa3d <+141>:	lock cmpxchg %r14,0x18(%rbx)
0xffffffff80caaa43 <+147>:	jne    0xffffffff80caaa8e <vfs_ref+222>

Note that clang apparently now supports indicating the result in flags, meaning the hand-rolled routine can be patched to end up generating equivalent code. Howver, I don't think there is any good reason now to handroll said routines.

Note there is no equivalent for testandclear/readandclear so they are left be.

Also I did not alter load/store routines. In my opinion that part needs to be reworked to be type-friendly to passed arguments and goes beyond the scope of this patch.

The patch does not add routines which were not already present on amd64, but this can be changed later. The intent is to eventually remove all hand-rolled routines.

I provided powerpc64 people with a patch for their platform and asked that they test. Works for me on amd64 of course.

Kernel .text changed only a little bit:

    text      data       bss        dec         hex   filename
22234960   1778275   4505600   28518835   0x1b329b3   GENERIC-NODEBUG.patched/kernel
22340096   1778275   4505600   28623971   0x1b4c463   GENERIC-NODEBUG.stock/kernel

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 29344

Event Timeline

mjg created this revision.Feb 13 2020, 2:23 PM
jhb added a comment.Feb 17 2020, 6:54 PM

So in CheriBSD we just replaced all of atomic(9) with the __atomic API supported in both modern GCC and clang for RISC-V:

https://github.com/CTSRD-CHERI/cheribsd/commit/e9e9ff10652179eaeec911ed804326bee482a0b3#diff-dc47b5c2fe7ae88823267345a6d1ae5c

However, that header could likely just be used on any architecture as none of it is MD. It also does the full atomic API and works with both clang and gcc.

I do think it would be nice to also add untyped variants as part of doing this as well.

There is a risk of trusting the compiler (we had to fix a bug in RISC-V clang I'm not sure is yet upstream for the 32-bit cmpset).

mjg added a comment.Feb 17 2020, 7:16 PM
In D23661#521321, @jhb wrote:

So in CheriBSD we just replaced all of atomic(9) with the __atomic API supported in both modern GCC and clang for RISC-V:

https://github.com/CTSRD-CHERI/cheribsd/commit/e9e9ff10652179eaeec911ed804326bee482a0b3#diff-dc47b5c2fe7ae88823267345a6d1ae5c

However, that header could likely just be used on any architecture as none of it is MD. It also does the full atomic API and works with both clang and gcc.

Well my header is very similar but superior in that interested architectures can pick and alias whatever they want however they see fit, most notably not running into any 32 vs 64-bit problems or forcing weak/strong fcmpset when other variant is desired. Otherwise it's the same stuff.

I do think it would be nice to also add untyped variants as part of doing this as well.

Untyped variants of what specifically? All types or just ptr? I think this is separate effort.

Note there are 2 nits I'll fix later:

  1. the clear routine can push negation one level below
  2. failing (f)cmpset does not have to post a fence
kib added a comment.Feb 17 2020, 7:52 PM

Biggest objection is that there is still no ABI for atomic stuff even on x86, so each compiler (version) is free to choose the implementation that it finds suitable and that non necessary compatible with the other' compiler implementation.

I.e., is the generated code compatible with existing atomic.h on x86 is the valid question as well.

mjg updated this revision to Diff 68471.Feb 17 2020, 7:52 PM
  • force strong cmpset since this seems to be the requirement right now
  • push ~ lower in atomic_clear
  • use relaxed barrier for failing (f)cmpset
mjg added a comment.Feb 17 2020, 8:00 PM

I think it's a fair worry that mixing hand-rolled atomics with compiler intrinsics may lead to trouble, but it's not hard to remedy.

Note the 2 mentioned missing testandclear and readandclear can be implemented without significant difficulty (the latter is a oneliner, the former worst case can be a cmpset loop. I grepped only 2 users.)

rlibby added a subscriber: rlibby.Feb 28 2020, 4:45 PM
arichardson added inline comments.Feb 29 2020, 4:12 PM
sys/sys/_gcc_atomic.h
52

Why is success_memorder always relaxed?

mjg added inline comments.Feb 29 2020, 4:16 PM
sys/sys/_gcc_atomic.h
52

ops, that's breakage introduced in subsequent update. of course it was meant to be for success, will fix later

In D23661#521387, @mjg wrote:

I think it's a fair worry that mixing hand-rolled atomics with compiler intrinsics may lead to trouble, but it's not hard to remedy.

Note the 2 mentioned missing testandclear and readandclear can be implemented without significant difficulty (the latter is a oneliner, the former worst case can be a cmpset loop. I grepped only 2 users.)

atomic_testandset/atomic_testandclear could be implemented with __atomic_fetch_or/__atomic_fetch_and, but we'd have to check that the intrinsics generate no worse code (currently we use bts/btr). We have more planned uses of these but were waiting on generic implementations of long variants in D22963 (which indeed uses a cmpset loop).

mjg added a comment.EditedFeb 29 2020, 7:49 PM
In D23661#521387, @mjg wrote:

I think it's a fair worry that mixing hand-rolled atomics with compiler intrinsics may lead to trouble, but it's not hard to remedy.

Note the 2 mentioned missing testandclear and readandclear can be implemented without significant difficulty (the latter is a oneliner, the former worst case can be a cmpset loop. I grepped only 2 users.)

atomic_testandset/atomic_testandclear could be implemented with __atomic_fetch_or/__atomic_fetch_and, but we'd have to check that the intrinsics generate no worse code (currently we use bts/btr). We have more planned uses of these but were waiting on generic implementations of long variants in D22963 (which indeed uses a cmpset loop).

No, they modify the entire memory operand and in fact are equivalent to atomic_set/atomic_clear (and are used in this very patch to implement them). testandset only modifies the bit, which can be preserved with a cmpset loop. Otherwise you can lose concurrent updates.

Now that I wrote the above this may actually be doable, I'll have to take a sober look later.

sys/sys/_gcc_atomic.h
52

sigh, I meant the actual barrier was meant for success and relaxed for failure.