Add a synchronizing instruction to flush and wait until the local
CPU's writes are observable to other CPUs before sending IPIs.
This fixes an issue where recipient CPUs doing a rendezvous could
enter the rendezvous handling code before the initiator's writes
to the smp_rv_* variables were visible. This manifested as a
system hang, where a single CPU's increment of smp_rv_waiters
actually happened "before" the initiator's zeroing of that field,
so all CPUs were stuck with the field appearing to be at
ncpus - 1.
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.