I have no idea if this works in practice, but the eyeballed assembly looks OK at a glance and it seems to work in single-threaded application...
Each one does ll (containing word) -> drop to C for logic -> sc (containing word)
Differential D21681
mips: incredibly naive attempt at 8/16-bit atomics (set/clear/add/sub/cmpset/fcmpset) kevans on Sep 17 2019, 2:44 AM. Authored by Tags None Referenced Files
Subscribers
Details
I have no idea if this works in practice, but the eyeballed assembly looks OK at a glance and it seems to work in single-threaded application... Each one does ll (containing word) -> drop to C for logic -> sc (containing word) Tested single-threaded in userland on MALTA/MALTAEL, once-upon-a-time on MALTA64
Diff Detail
Event TimelineComment Actions On second thought, the *cmpset are clearly wrong- they must hop around the sc on comparison failure or (fcmpset at least) will do the wrong thing if the sc failed (assuming a write occurred)... Will fix tomorrow Comment Actions I would wonder if it makes sense to implement these in terms of cmpset operations on register-sized quantities in C with the appropriate arithmetic shifts, rather than doing the assembly sort of half by-hand. That's maybe less optimal, but do we have performance-critical 8- and 16-bit atomics in performance-critical areas? I've always been skeptical of smaller-than-register atomics and tend to resist them, so that may just be a personal bias. I can't speak to correctness beyond that. Comment Actions I believe that splitting the operation into two asm blocks with C code between will break at -O0 since any variable access goes via the stack. This might cause the sc to fail. I would prefer if atomic.h used C11 atomics or the __atomic_foo() builtins with a memory mode instead (in which case the compiler automatically does the masking for smaller sizes). However, it seems that requires GCC4.7+ so it would have to wait until GCC4.2 is gone. Comment Actions I was unaware of __atomic_blah -- I'll take a look at this. I have an llvm-mips branch that's functional for mips32 (but mips64 kernel is half-fubar- I have a workaround, but it's unclear to me why it works around the issue), and it seems like that may also be sufficient. Comment Actions There are no 8 and 16 bit atomics in use right now, I plan to add one use for 16 bit in something which is used a lot but not contended for the most part. The main reason to have sub-register atomics is damage control in face of concurrent access -- if one CPU sets 1 bit, another sets different bit, whatever conflicts arise they can solve them in whatever manner they "see" fit (which I presume is not a mips thing though). Apart from that you get smaller code as you don't have to prep anything. Funny you mention that, I plan to implement a full set of currently used primitives as wrappers around c11 and then switch amd64 to it. It is going to be a separate header anyone interested can include. Comment Actions Abandoning in favor of just implementing the atomics needed now in C in terms of atomic_*cmpset_32 (D21822) |