rmlock: Add a required compiler membar to the rlock slow path
The tracker flags need to be loaded only after the tracker is removed
from its per-CPU queue. Otherwise, readers may fail to synchronize with
pending writers attempting to propagate priority to active readers, and
readers and writers deadlock on each other. This was observed in a
stable/12-based armv7 kernel where the compiler had reordered the load
of rmp_flags to before the stores updating the queue.
Approved by: re (gjb)
Reviewed by: rlibby, scottl
Discussed with: kib
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28821
(cherry picked from commit 1d44514fcd68809cfd493a7352ace29ddad443d6)
(cherry picked from commit c29d3ecdc8b3903d812c0467319ced6f85872d0e)