fcmpset)
AbandonedPublic
Actions

Authored by kevans on Sep 17 2019, 2:44 AM.

Details

Reviewers

imp
jhb
adrian
jmallett
arichardson

Summary

I have no idea if this works in practice, but the eyeballed assembly looks OK at a glance and it seems to work in single-threaded application...

Each one does ll (containing word) -> drop to C for logic -> sc (containing word)

Test Plan

Tested single-threaded in userland on MALTA/MALTAEL, once-upon-a-time on MALTA64

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 26532

Event Timeline

kevans created this revision.Sep 17 2019, 2:44 AM

Harbormaster completed remote builds in B26527: Diff 62194.Sep 17 2019, 2:44 AM

On second thought, the *cmpset are clearly wrong- they must hop around the sc on comparison failure or (fcmpset at least) will do the wrong thing if the sc failed (assuming a write occurred)... Will fix tomorrow

Fix *cmpset logic; bail out if the comparison fails, rather than attempt a store.

Harbormaster completed remote builds in B26532: Diff 62210.Sep 17 2019, 1:05 PM

kevans added a reviewer: jmallett.Sep 17 2019, 9:47 PM

I would wonder if it makes sense to implement these in terms of cmpset operations on register-sized quantities in C with the appropriate arithmetic shifts, rather than doing the assembly sort of half by-hand. That's maybe less optimal, but do we have performance-critical 8- and 16-bit atomics in performance-critical areas? I've always been skeptical of smaller-than-register atomics and tend to resist them, so that may just be a personal bias. I can't speak to correctness beyond that.

I believe that splitting the operation into two asm blocks with C code between will break at -O0 since any variable access goes via the stack. This might cause the sc to fail.

I would prefer if atomic.h used C11 atomics or the __atomic_foo() builtins with a memory mode instead (in which case the compiler automatically does the masking for smaller sizes). However, it seems that requires GCC4.7+ so it would have to wait until GCC4.2 is gone.
Those builtins will also work for CHERI so using them would reduce the size of the CheriBSD diff, too.

In D21681#473240, @arichardson wrote:

I believe that splitting the operation into two asm blocks with C code between will break at -O0 since any variable access goes via the stack. This might cause the sc to fail.

I would prefer if atomic.h used C11 atomics or the __atomic_foo() builtins with a memory mode instead (in which case the compiler automatically does the masking for smaller sizes). However, it seems that requires GCC4.7+ so it would have to wait until GCC4.2 is gone.
Those builtins will also work for CHERI so using them would reduce the size of the CheriBSD diff, too.

I was unaware of __atomic_blah -- I'll take a look at this. I have an llvm-mips branch that's functional for mips32 (but mips64 kernel is half-fubar- I have a workaround, but it's unclear to me why it works around the issue), and it seems like that may also be sufficient.

In D21681#473203, @jmallett wrote:

I would wonder if it makes sense to implement these in terms of cmpset operations on register-sized quantities in C with the appropriate arithmetic shifts, rather than doing the assembly sort of half by-hand. That's maybe less optimal, but do we have performance-critical 8- and 16-bit atomics in performance-critical areas? I've always been skeptical of smaller-than-register atomics and tend to resist them, so that may just be a personal bias. I can't speak to correctness beyond that.

There are no 8 and 16 bit atomics in use right now, I plan to add one use for 16 bit in something which is used a lot but not contended for the most part. The main reason to have sub-register atomics is damage control in face of concurrent access -- if one CPU sets 1 bit, another sets different bit, whatever conflicts arise they can solve them in whatever manner they "see" fit (which I presume is not a mips thing though). Apart from that you get smaller code as you don't have to prep anything.

In D21681#473240, @arichardson wrote:

I would prefer if atomic.h used C11 atomics or the __atomic_foo() builtins with a memory mode instead (in which case the compiler automatically does the masking for smaller sizes). However, it seems that requires GCC4.7+ so it would have to wait until GCC4.2 is gone.

Funny you mention that, I plan to implement a full set of currently used primitives as wrappers around c11 and then switch amd64 to it. It is going to be a separate header anyone interested can include.

Abandoning in favor of just implementing the atomics needed now in C in terms of atomic_*cmpset_32 (D21822)

Revision Contents
Changeset List

Path

Size

sys/

mips/

include/

atomic.h

473 lines

Diff 62210

View Options

mips: incredibly naive attempt at 8/16-bit atomics (set/clear/add/sub/cmpset/fcmpset)AbandonedPublicActions