MFC r311172,r311194,r311226,r312389,r312390:
mtx: reduce lock accesses
Instead of spuriously re-reading the lock value, read it once.
This change also has the side effect of fixing a performance bug:
after a failed _mtx_obtain_lock, the re-read could find the lock
unowned, yet the primitive would still make a trip through the
turnstile code.
This is a diff reduction to a variant which uses atomic_fcmpset.
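
The read-once pattern described above can be sketched in userspace C11.
This is an illustration only, not the kernel code: struct toy_mtx,
toy_obtain, toy_lock_slow and the block callback are hypothetical names,
and the callback merely stands in for the turnstile path. Each loop
iteration decides from a single snapshot of the lock word, and a failed
acquire refreshes the snapshot and re-evaluates it instead of falling
through to the blocking path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_UNOWNED ((uintptr_t)0)

/* Hypothetical lock: one word holding the owner id, or TOY_UNOWNED. */
struct toy_mtx {
	_Atomic uintptr_t word;
};

/* Stand-in for blocking on a turnstile; returns after "waiting". */
typedef void (*toy_block_fn)(struct toy_mtx *m, void *arg);

/* Plain compare-and-set: reports success only, not the value it saw. */
static bool
toy_obtain(struct toy_mtx *m, uintptr_t tid)
{
	uintptr_t expected = TOY_UNOWNED;

	return (atomic_compare_exchange_strong_explicit(&m->word, &expected,
	    tid, memory_order_acquire, memory_order_relaxed));
}

static void
toy_release(struct toy_mtx *m)
{
	atomic_store_explicit(&m->word, TOY_UNOWNED, memory_order_release);
}

/*
 * Slow path. Every decision within one iteration is based on the single
 * snapshot v. Returns how many times the caller had to block.
 */
static unsigned
toy_lock_slow(struct toy_mtx *m, uintptr_t tid, toy_block_fn block, void *arg)
{
	unsigned blocked = 0;
	uintptr_t v;

	assert(tid != TOY_UNOWNED);
	v = atomic_load_explicit(&m->word, memory_order_relaxed);
	for (;;) {
		if (v == TOY_UNOWNED) {
			if (toy_obtain(m, tid))
				break;
			/* Lost a race: refresh the snapshot and re-evaluate. */
			v = atomic_load_explicit(&m->word,
			    memory_order_relaxed);
			continue;
		}
		/* The snapshot says the lock is owned: only now block. */
		block(m, arg);
		blocked++;
		v = atomic_load_explicit(&m->word, memory_order_relaxed);
	}
	return (blocked);
}

/* Demo hook: pretend the owner released the lock while we waited. */
static void
toy_block_release(struct toy_mtx *m, void *arg)
{
	(void)arg;
	toy_release(m);
}
```

In this sketch the explicit reload after a failed toy_obtain is the step
that a compare-and-set which writes back the observed value would make
unnecessary, which is the direction the fcmpset variant points at.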
Reduce lock accesses in thread lock similarly to r311172
mtx: plug open-coded mtx_lock access missed in r311172
rwlock: reduce lock accesses similarly to r311172
sx: reduce lock accesses similarly to r311172