On an armv7 kernel we are seeing deadlocks where an _rm_wlock() caller
is blocked after propagating priority to a reader, and the reader is
blocked in _rm_rlock_hard() on the rmlock interlock.
A look at the disassembly of _rm_rlock_hard() shows that the compiler is
emitting loads for
- tracker->rmp_cpuQueue.rmq_next
- tracker->rmp_cpuQueue.rmq_prev, and
- tracker->rmp_flags
before performing any stores to update the per-CPU queue. This breaks
synchronization with the writer IPI, which removes the tracker from the
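per-CPU queue and updates the flags. If the reader does not observe
that RMPF_ONQUEUE is set before blocking on the interlock, we get a
deadlock: the writer waits on a reader it already treats as a lock
holder, while that reader is blocked on the interlock.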
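
To make the ordering requirement concrete, here is a minimal user-space
sketch, not the actual patch. It assumes the tracker is unlinked from a
circular per-CPU queue and that the flags must be re-read only after the
unlink stores have been emitted. The struct layouts are simplified, the
RMPF_ONQUEUE value is assumed, the sketch_*() names are hypothetical,
and C11 atomic_signal_fence() stands in for whatever compiler-only
barrier the kernel would use. A compiler-only barrier is used here on
the assumption that the IPI handler runs on the same CPU as the reader.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RMPF_ONQUEUE 1	/* flag value assumed for this sketch */

/* Simplified stand-ins for the kernel structures. */
struct rm_queue {
	struct rm_queue *rmq_next;
	struct rm_queue *rmq_prev;
};

struct rm_priotracker {
	struct rm_queue rmp_cpuQueue;
	int rmp_flags;
};

/* Hypothetical helper: unlink the tracker from its circular queue. */
static void
sketch_tracker_remove(struct rm_priotracker *tracker)
{
	struct rm_queue *next = tracker->rmp_cpuQueue.rmq_next;
	struct rm_queue *prev = tracker->rmp_cpuQueue.rmq_prev;

	next->rmq_prev = prev;
	prev->rmq_next = next;
}

/*
 * Hypothetical helper: unlink first, then check whether the flags were
 * set in the meantime. Returns 1 if RMPF_ONQUEUE is set, 0 otherwise.
 */
static int
sketch_lock_granted(struct rm_priotracker *tracker)
{
	sketch_tracker_remove(tracker);

	/*
	 * Compiler-only fence: keeps the compiler from hoisting the
	 * flags load above the unlink stores. It emits no hardware
	 * barrier instruction.
	 */
	atomic_signal_fence(memory_order_seq_cst);

	/* The volatile access forces a fresh load of the flags. */
	return ((*(volatile int *)&tracker->rmp_flags & RMPF_ONQUEUE) != 0);
}
```

Without the fence, the compiler is free to load rmp_flags before the
two stores in sketch_tracker_remove(), which is the reordering seen in
the disassembly described above.
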
I don't claim that this change is complete; it is just enough to fix the
deadlocks we are seeing. I will audit the code some more, and any help
would be much appreciated.