Once all CPUs are online, determine if they all support LSE atomics and
set lse_supported to indicate this.
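
The check described above can be sketched roughly as follows. This is a
userspace illustration only, not the patch itself: struct cpu_features,
all_cpus_have_lse() and update_lse_supported() are hypothetical names,
and a real kernel would derive the per-CPU flag from the CPU ID
registers rather than from a caller-supplied array. Only the
lse_supported flag name comes from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU feature record, filled in as each CPU comes online. */
struct cpu_features {
	bool has_lse;
};

/* Stays false until every online CPU has been checked. */
static bool lse_supported = false;

/* Return true only if every CPU in the array advertises LSE atomics. */
static bool
all_cpus_have_lse(const struct cpu_features *cpus, size_t ncpus)
{
	assert(ncpus > 0);
	for (size_t i = 0; i < ncpus; i++) {
		if (!cpus[i].has_lse)
			return (false);
	}
	return (true);
}

/* Meant to be called once, after all CPUs are online. */
static void
update_lse_supported(const struct cpu_features *cpus, size_t ncpus)
{
	lse_supported = all_cpus_have_lse(cpus, ncpus);
}
```

With this shape, a single CPU without LSE is enough to keep
lse_supported false, which is the conservative outcome wanted on mixed
systems.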
An alternative implementation uses ifuncs to select the correct
implementation, but ifunc resolution happens before we bring up the APs,
and there exist big.LITTLE systems where some CPUs support LSE and the
rest do not(!). I think the ifunc approach is a little nicer and it
possibly gives slightly better performance, but in the absence of some
mechanism to defer ifunc resolution it is easier to go with dynamic
selection for now.
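
As a rough illustration of what dynamic selection means here, the sketch
below branches on lse_supported at each call. It is a portable userspace
stand-in under stated assumptions, not the kernel code: the function
names are hypothetical, the "LL/SC" variant is modelled with a C11
compare-and-swap retry loop, and the "LSE" variant with a single C11
fetch-add, since the actual implementations would use arm64 instruction
sequences.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Set once all CPUs are known to support LSE; false selects the fallback. */
static bool lse_supported = false;

/* Stand-in for the LL/SC variant: retry a compare-and-swap until it succeeds. */
static uint32_t
fetchadd_32_llsc(_Atomic uint32_t *p, uint32_t v)
{
	uint32_t old = atomic_load_explicit(p, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(p, &old, old + v,
	    memory_order_seq_cst, memory_order_relaxed))
		;	/* "old" is refreshed on failure; try again. */
	return (old);
}

/* Stand-in for the LSE variant: one atomic read-modify-write operation. */
static uint32_t
fetchadd_32_lse(_Atomic uint32_t *p, uint32_t v)
{
	return (atomic_fetch_add_explicit(p, v, memory_order_seq_cst));
}

/* Dynamic selection: test the flag on every call and dispatch. */
static inline uint32_t
fetchadd_32(_Atomic uint32_t *p, uint32_t v)
{
	assert(p != NULL);
	if (lse_supported)
		return (fetchadd_32_lse(p, v));
	return (fetchadd_32_llsc(p, v));
}
```

The per-call branch is the cost that an ifunc would avoid by binding the
target once, but the flag can be set at any point, including after the
APs have started.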
Some performance results are here:
https://people.freebsd.org/~markj/arm64_atomic/bench/
Each directory contains results from a trial of 20 -j64 buildkernels on
an EC2 Graviton instance. Four kernels are used: unmodified head
(dev-base), lse-dynsel (this patch), lse-ifunc (ifunc-based atomic(9)),
and lse-uncond (unconditional use of LSE implementations).
ministat.real.txt and ministat.sys.txt contain comparisons of real and
system CPU time, respectively.
All three modified kernels improve noticeably on head, and dynamic
selection and ifuncs add relatively little overhead compared with
unconditional use of LSE.