mips: incredibly naive attempt at 8/16-bit atomics (set/clear/add/sub/cmpset/fcmpset)
Abandoned, Public

Authored by kevans on Sep 17 2019, 2:44 AM.
Tags
None
Referenced Files
F106662778: D21681.diff
Fri, Jan 3, 2:06 PM

Details

Summary

I have no idea whether this works in practice, but the eyeballed assembly looks OK at a glance, and it seems to work in a single-threaded application.

Each one does ll (containing word) -> drop to C for logic -> sc (containing word)
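The "containing word" step has to find the naturally aligned 32-bit word around the 8- or 16-bit object, plus the object's bit position inside that word, and the position depends on byte order (MALTA is big-endian, MALTAEL little-endian). A minimal sketch of that address arithmetic in C follows; `subword_locate` and `struct subword_loc` are hypothetical names, not code from the diff, and the sketch assumes the object is naturally aligned so it never straddles two words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only. */
struct subword_loc {
	uintptr_t word_addr;	/* aligned word the ll/sc pair operates on */
	unsigned shift;		/* bit offset of the object within that word */
	uint32_t mask;		/* bits of the word that belong to the object */
};

static struct subword_loc
subword_locate(uintptr_t addr, unsigned size, int big_endian)
{
	struct subword_loc loc;
	unsigned byte_off;

	/* Only 8- and 16-bit objects, naturally aligned (no word straddling). */
	assert(size == 1 || size == 2);
	assert(addr % size == 0);

	byte_off = (unsigned)(addr & 3);
	loc.word_addr = addr & ~(uintptr_t)3;
	/*
	 * Little-endian: byte 0 holds the least significant bits.
	 * Big-endian: byte 0 holds the most significant bits.
	 */
	loc.shift = big_endian ? (4 - size - byte_off) * 8 : byte_off * 8;
	loc.mask = ((UINT32_C(1) << (size * 8)) - 1) << loc.shift;
	return loc;
}
```

For example, a byte at offset 1 in its word occupies bits 8..15 on a little-endian target but bits 16..23 on a big-endian one; the C logic between ll and sc would then extract it with `(word & mask) >> shift`.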

Test Plan

Tested single-threaded in userland on MALTA/MALTAEL and, once upon a time, on MALTA64.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 26527

Event Timeline

On second thought, the *cmpset variants are clearly wrong: they must branch around the sc when the comparison fails, or (fcmpset at least) will do the wrong thing if the sc fails (assuming a write occurred). Will fix tomorrow.

kevans edited the summary of this revision.

Fix *cmpset logic; bail out if the comparison fails, rather than attempt a store.
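The corrected control flow (compare first, and return without storing when the comparison fails) can be sketched in portable C for the 16-bit fcmpset case. This is an illustration under stated assumptions, not the diff's assembly: the GCC/Clang `__atomic_compare_exchange_n` builtin on the containing 32-bit word stands in for the ll/sc pair, `__BYTE_ORDER__` selects the shift, and `sketch_fcmpset_16` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of fcmpset_16 on top of a 32-bit CAS of the
 * containing word. Returns 1 and stores newval if *p == *cmpval;
 * otherwise copies the observed value into *cmpval and returns 0
 * without attempting any store.
 */
static int
sketch_fcmpset_16(uint16_t *p, uint16_t *cmpval, uint16_t newval)
{
	uintptr_t addr = (uintptr_t)p;
	uint32_t *wp = (uint32_t *)(addr & ~(uintptr_t)3);
	unsigned shift;
	uint32_t mask, old, desired;
	uint16_t cur;

	assert((addr & 1) == 0);	/* a halfword must not straddle words */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	shift = (unsigned)(2 - (addr & 3)) * 8;
#else
	shift = (unsigned)(addr & 3) * 8;
#endif
	mask = UINT32_C(0xffff) << shift;
	old = __atomic_load_n(wp, __ATOMIC_RELAXED);
	for (;;) {
		cur = (uint16_t)((old & mask) >> shift);
		if (cur != *cmpval) {
			/* Comparison failed: bail out before any store. */
			*cmpval = cur;
			return 0;
		}
		desired = (old & ~mask) | ((uint32_t)newval << shift);
		/* Strong CAS; on failure 'old' is reloaded with the current word. */
		if (__atomic_compare_exchange_n(wp, &old, desired, 0,
		    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
			return 1;
		/* Some part of the word changed: re-check the comparison. */
	}
}
```

A failed comparison leaves memory untouched and reports the observed value, so a caller's retry loop sees fresh data instead of acting on a store that never happened.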

I wonder whether it makes sense to implement these in C in terms of cmpset operations on register-sized quantities, with the appropriate shifts and masks, rather than writing the assembly half by hand. That may be less optimal, but do we have 8- and 16-bit atomics in performance-critical areas? I've always been skeptical of smaller-than-register atomics and tend to resist them, so that may just be personal bias. I can't speak to correctness beyond that.
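The suggested approach, doing the sub-word arithmetic in C and retrying a 32-bit cmpset on the containing word, might look like the following sketch for add. `cmpset_32` is a local stand-in with the shape of FreeBSD's `atomic_cmpset_32(dst, expect, src)` built on the GCC/Clang `__atomic_compare_exchange_n` builtin; `sketch_add_subword` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the shape of atomic_cmpset_32(dst, expect, src). */
static int
cmpset_32(uint32_t *dst, uint32_t expect, uint32_t src)
{
	return __atomic_compare_exchange_n(dst, &expect, src, 0,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/*
 * Hypothetical sketch: add v to the 8- or 16-bit object at p by
 * recomputing the containing word and retrying cmpset_32 until no
 * other CPU has modified the word in between.
 */
static void
sketch_add_subword(void *p, unsigned size, uint32_t v)
{
	uintptr_t addr = (uintptr_t)p;
	uint32_t *wp = (uint32_t *)(addr & ~(uintptr_t)3);
	unsigned shift;
	uint32_t mask, old, field, nw;

	assert(size == 1 || size == 2);
	assert(addr % size == 0);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	shift = (unsigned)(4 - size - (addr & 3)) * 8;
#else
	shift = (unsigned)(addr & 3) * 8;
#endif
	mask = ((UINT32_C(1) << (size * 8)) - 1) << shift;
	do {
		old = __atomic_load_n(wp, __ATOMIC_RELAXED);
		/*
		 * Add within the field, then mask so a carry out of its top
		 * bit is discarded instead of spilling into the neighbour.
		 */
		field = ((old & mask) + (v << shift)) & mask;
		nw = (old & ~mask) | field;
	} while (!cmpset_32(wp, old, nw));
}
```

Subtraction falls out of the same routine by passing the two's-complement negation, e.g. `(uint32_t)-1`; set and clear would replace the addition with `|` and `& ~`.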

arichardson added a subscriber: arichardson.

I believe that splitting the operation into two asm blocks with C code in between will break at -O0, since every variable access goes through the stack. Those extra loads and stores might cause the sc to fail.

I would prefer that atomic.h use C11 atomics or the __atomic_foo() builtins with an explicit memory order instead (in which case the compiler does the masking for smaller sizes automatically). However, that seems to require GCC 4.7+, so it would have to wait until GCC 4.2 is gone.
Those builtins also work for CHERI, so using them would reduce the size of the CheriBSD diff, too.
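With the builtins, the sub-word operations reduce to one call each, and, per the comment above, the compiler handles the masking on targets that only have word-sized ll/sc. A minimal sketch; the `sk_atomic_*` names are hypothetical, and giving the plain (non-_acq/_rel) variants relaxed ordering is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical FreeBSD-style wrappers over the GCC/Clang __atomic builtins. */
static inline void
sk_atomic_set_8(uint8_t *p, uint8_t v)
{
	__atomic_fetch_or(p, v, __ATOMIC_RELAXED);
}

static inline void
sk_atomic_clear_8(uint8_t *p, uint8_t v)
{
	__atomic_fetch_and(p, (uint8_t)~v, __ATOMIC_RELAXED);
}

static inline void
sk_atomic_add_16(uint16_t *p, uint16_t v)
{
	__atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}

static inline void
sk_atomic_subtract_16(uint16_t *p, uint16_t v)
{
	__atomic_fetch_sub(p, v, __ATOMIC_RELAXED);
}

/* On failure the builtin writes the observed value back to *cmpval. */
static inline int
sk_atomic_fcmpset_16(uint16_t *p, uint16_t *cmpval, uint16_t newval)
{
	assert(((uintptr_t)p & 1) == 0);	/* halfwords must be aligned */
	return __atomic_compare_exchange_n(p, cmpval, newval, 0,
	    __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```

The write-back of the observed value on failure is exactly the fcmpset contract, so that primitive needs no extra logic.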

I was unaware of __atomic_blah; I'll take a look. I have an llvm-mips branch that's functional for mips32 (the mips64 kernel is half-broken; I have a workaround, but it's unclear to me why it works), and that may also be sufficient.

I wonder whether it makes sense to implement these in C in terms of cmpset operations on register-sized quantities, with the appropriate shifts and masks, rather than writing the assembly half by hand. That may be less optimal, but do we have 8- and 16-bit atomics in performance-critical areas? I've always been skeptical of smaller-than-register atomics and tend to resist them, so that may just be personal bias. I can't speak to correctness beyond that.

There are no 8- or 16-bit atomics in use right now. I plan to add one 16-bit use in something that is used a lot but is mostly uncontended. The main reason to have sub-register atomics is damage control in the face of concurrent access: if one CPU sets one bit and another sets a different bit, whatever conflicts arise, the CPUs resolve them in whatever manner they "see" fit (which I presume is not a MIPS thing, though). Apart from that, you get smaller code, since you don't have to prepare anything.

I would prefer that atomic.h use C11 atomics or the __atomic_foo() builtins with an explicit memory order instead (in which case the compiler does the masking for smaller sizes automatically). However, that seems to require GCC 4.7+, so it would have to wait until GCC 4.2 is gone.

Funny you mention that: I plan to implement the full set of currently used primitives as wrappers around C11 atomics and then switch amd64 to it. It will be a separate header that anyone interested can include.
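What such C11 wrappers could look like for the 16-bit compare-and-set pair is sketched below. The `c11_atomic_*` names are hypothetical, relaxed ordering for the plain variants is an assumption, and the wrappers take `_Atomic` pointers, which is a real difference from the existing volatile-based interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* The wrappers assume _Atomic does not change the size of the type. */
static_assert(sizeof(_Atomic uint16_t) == sizeof(uint16_t),
    "_Atomic must not change the size of uint16_t");

/* cmpset: store src if *dst == expect; the caller's expect is not updated. */
static inline int
c11_atomic_cmpset_16(_Atomic uint16_t *dst, uint16_t expect, uint16_t src)
{
	return atomic_compare_exchange_strong_explicit(dst, &expect, src,
	    memory_order_relaxed, memory_order_relaxed);
}

/* fcmpset: like cmpset, but on failure *expect receives the current value. */
static inline int
c11_atomic_fcmpset_16(_Atomic uint16_t *dst, uint16_t *expect, uint16_t src)
{
	return atomic_compare_exchange_strong_explicit(dst, expect, src,
	    memory_order_relaxed, memory_order_relaxed);
}

static inline void
c11_atomic_set_16(_Atomic uint16_t *p, uint16_t v)
{
	atomic_fetch_or_explicit(p, v, memory_order_relaxed);
}
```

fcmpset maps directly onto C11 compare-exchange, which already writes the observed value back; cmpset only needs a local copy of `expect` to discard that write-back.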

Abandoning in favor of just implementing the atomics needed now in C, in terms of atomic_*cmpset_32 (D21822).