Add more 8 and 16 bit variants of the the atomic(9) functions on arm64.
These are direct copies of the 32 bit functions, adjusted ad needed.
While here fix atomic_fcmpset_16 to use the valid load and store exclusive
instructions.
Sponsored by: DARPA, AFRL