amd64 atomic.h: minor codegen optimization in flag access
Previously the pattern to extract status flags from inline assembly
blocks was to use setcc in the block to write the flag to a register.
This was suboptimal in a few ways:
- It would lead to code like: sete %cl; test %cl; jne, i.e. a flag would just be loaded into a register and then reloaded to a flag.
- The setcc would force the block to use an additional register.
- If the client code didn't care for the flag value then the setcc would be entirely pointless but could not be eliminated by the optimizer.
A more modern inline asm construct (since gcc 6 and clang 9) allows for
"flag output operands", where a C variable can be written directly from
a flag. The optimizer can then use this to produce direct code where
the flag does not take a trip through a register.
In practice this makes each affected operation sequence shorter by five
bytes of instructions. It's unlikely this has a measurable performance
Reviewed by: kib, markj, mjg
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23869