Diffusion FreeBSD src repository - subversion rS315377

MFC r313269,r313270,r313271,r313272,r313274,r313278,r313279,r313996,r314474

Description

mtx: switch to fcmpset

The value found by fcmpset is passed down to the fallback locking routines so
they do not have to re-read the lock word, which reduces cacheline accesses.
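A minimal sketch of the idea, assuming C11 atomics as a stand-in for FreeBSD's atomic_fcmpset_*: like fcmpset, atomic_compare_exchange_* writes the value it actually found back into the "expected" argument on failure, so the fast path can hand that value to the fallback instead of making it load the lock word again. The names toy_mtx, LK_UNOWNED and toy_mtx_lock_slow are hypothetical and are not the kernel's API.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical lock word: 0 means unowned, otherwise the owner's id. */
#define LK_UNOWNED ((uintptr_t)0)

struct toy_mtx {
	_Atomic uintptr_t lock;
};

/* Counts entries into the fallback, for illustration only. */
static _Atomic unsigned slow_path_calls;

/*
 * Fallback (slow path). It starts from 'v', the value the failed
 * compare-and-swap already observed, instead of re-reading the lock word.
 */
static void
toy_mtx_lock_slow(struct toy_mtx *m, uintptr_t tid, uintptr_t v)
{
	atomic_fetch_add_explicit(&slow_path_calls, 1, memory_order_relaxed);
	for (;;) {
		if (v == LK_UNOWNED) {
			/* On failure, v is refreshed with the current value. */
			if (atomic_compare_exchange_weak_explicit(&m->lock, &v,
			    tid, memory_order_acquire, memory_order_relaxed))
				return;
			continue;
		}
		/* Owned by someone else: a real mutex would spin or sleep. */
		v = atomic_load_explicit(&m->lock, memory_order_relaxed);
	}
}

/* Inline fast path: one fcmpset-style attempt, handing the found value down. */
static inline void
toy_mtx_lock(struct toy_mtx *m, uintptr_t tid)
{
	uintptr_t v = LK_UNOWNED;

	if (!atomic_compare_exchange_strong_explicit(&m->lock, &v, tid,
	    memory_order_acquire, memory_order_relaxed))
		toy_mtx_lock_slow(m, tid, v);
}

static inline void
toy_mtx_unlock(struct toy_mtx *m, uintptr_t tid)
{
	assert(atomic_load_explicit(&m->lock, memory_order_relaxed) == tid);
	atomic_store_explicit(&m->lock, LK_UNOWNED, memory_order_release);
}
```

In the uncontended case the fallback is never entered, and when it is, its first decision is made on the already-fetched value rather than on a fresh load.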

mtx_unlock gains an explicit check for a regular unlock. On LL/SC
architectures fcmpset can fail spuriously, so the fallback routine can be
entered even when the lock could have been released by the inline primitive.
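One way such a check can look, as a hedged sketch: the inline unlock makes one weak compare-and-swap (which, like an LL/SC sequence, may fail spuriously), and the fallback first tests whether the found value is still just the owner's id. If so, nothing is contested and it simply retries the plain release. The LK_CONTESTED bit and all toy_* names are hypothetical assumptions, and owner ids are assumed to never have bit 0 set.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

#define LK_UNOWNED   ((uintptr_t)0)
#define LK_CONTESTED ((uintptr_t)1)	/* hypothetical "waiters present" bit */

struct toy_mtx {
	_Atomic uintptr_t lock;
};

static _Atomic unsigned contested_unlocks;

static void
toy_mtx_unlock_slow(struct toy_mtx *m, uintptr_t tid, uintptr_t v)
{
	/*
	 * Explicit check for a regular unlock: the word holds only our id,
	 * so the inline attempt failed spuriously, not because of waiters.
	 * Retry the plain release.
	 */
	while (v == tid) {
		if (atomic_compare_exchange_weak_explicit(&m->lock, &v,
		    LK_UNOWNED, memory_order_release, memory_order_relaxed))
			return;
	}
	/* Genuinely contested: a real mutex would wake up waiters here. */
	assert((v & ~LK_CONTESTED) == tid);
	atomic_fetch_add_explicit(&contested_unlocks, 1, memory_order_relaxed);
	atomic_store_explicit(&m->lock, LK_UNOWNED, memory_order_release);
}

static inline void
toy_mtx_unlock(struct toy_mtx *m, uintptr_t tid)
{
	uintptr_t v = tid;

	/* The weak form may fail spuriously, much like an LL/SC pair. */
	if (!atomic_compare_exchange_weak_explicit(&m->lock, &v, LK_UNOWNED,
	    memory_order_release, memory_order_relaxed))
		toy_mtx_unlock_slow(m, tid, v);
}
```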

rwlock: switch to fcmpset

sx: switch to fcmpset

sx: uninline slock/sunlock

The inline shared locking routines explicitly read the lock value and test it.
If the update attempt fails, they fall back to a regular function which
retries in a loop.

The problem is that with many concurrent readers the risk of failure is high,
and even the value returned by fcmpset is likely to be stale by the time the
loop in the fallback routine is reached.
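The uninlined form can be sketched as a single out-of-line loop that reads the lock word once and then reuses the value each failed compare-and-swap returns, with no separate inline attempt in front of it. This assumes C11 atomics in place of fcmpset and a hypothetical encoding (bit 0 means exclusively held, the remaining bits count readers); toy_sx and the SX_* constants are illustrative, not the sx(9) implementation.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical encoding: bit 0 = held exclusively; the rest counts readers. */
#define SX_XLOCKED    ((uintptr_t)1)
#define SX_ONE_SHARER ((uintptr_t)2)

struct toy_sx {
	_Atomic uintptr_t lock;
};

/* Not inline: one loop serves both the first attempt and the retries. */
void
toy_sx_slock(struct toy_sx *sx)
{
	uintptr_t v = atomic_load_explicit(&sx->lock, memory_order_relaxed);

	for (;;) {
		if (v & SX_XLOCKED) {
			/* Writer present: a real sx lock would block here. */
			v = atomic_load_explicit(&sx->lock, memory_order_relaxed);
			continue;
		}
		/* On failure v is refreshed in place, so no extra load. */
		if (atomic_compare_exchange_weak_explicit(&sx->lock, &v,
		    v + SX_ONE_SHARER, memory_order_acquire,
		    memory_order_relaxed))
			return;
	}
}

void
toy_sx_sunlock(struct toy_sx *sx)
{
	uintptr_t v = atomic_load_explicit(&sx->lock, memory_order_relaxed);

	for (;;) {
		/* Must be share-locked by at least one reader. */
		assert(v >= SX_ONE_SHARER && (v & SX_XLOCKED) == 0);
		if (atomic_compare_exchange_weak_explicit(&sx->lock, &v,
		    v - SX_ONE_SHARER, memory_order_release,
		    memory_order_relaxed))
			return;
	}
}
```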

Uninline these primitives. With 80 hardware threads doing concurrent
slocks/sunlocks, throughput increases from ~50 mln/s to ~56 mln/s.

Interestingly, rwlock primitives are already not inlined.

sx: add witness support missed in r313272

mtx: fix up _mtx_obtain_lock_fetch usage in thread lock

Since _mtx_obtain_lock_fetch no longer sets the argument to MTX_UNOWNED,
callers have to do it on their own.
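A sketch of the caller-side obligation, assuming a hypothetical helper toy_obtain_lock_fetch that mirrors the described behaviour: on failure it leaves the found value in *vp rather than resetting it to the unowned value. Because of that, a retrying caller has to reset the expected value itself, and the reset belongs inside the loop so every attempt starts from "unowned".

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

#define LK_UNOWNED ((uintptr_t)0)

struct toy_mtx {
	_Atomic uintptr_t lock;
};

/*
 * Hypothetical stand-in for an obtain-lock-fetch primitive: try to swap
 * *vp (expected) for tid. On failure *vp holds the value found in the
 * lock word; it is NOT reset to LK_UNOWNED.
 */
static inline bool
toy_obtain_lock_fetch(struct toy_mtx *m, uintptr_t *vp, uintptr_t tid)
{
	return atomic_compare_exchange_strong_explicit(&m->lock, vp, tid,
	    memory_order_acquire, memory_order_relaxed);
}

static void
toy_thread_lock(struct toy_mtx *m, uintptr_t tid)
{
	uintptr_t v;

	for (;;) {
		/* Inside the loop: a failed attempt overwrote v. */
		v = LK_UNOWNED;
		if (toy_obtain_lock_fetch(m, &v, tid))
			return;
		/* Spin until the lock looks free, then retry. */
		while (atomic_load_explicit(&m->lock, memory_order_relaxed) !=
		    LK_UNOWNED)
			;
	}
}
```

Hoisting the assignment above the loop would leave v holding a stale owner value after the first failure, and the next attempt would compare against the wrong expected value.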

mtx: fix up r313278, the assignment was supposed to go inside the loop

mtx: fix spin mutexes interaction with failed fcmpset

While doing so, move recursion support down to the fallback routine.
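A hedged sketch of a spin mutex laid out this way: the inline path makes a single weak compare-and-swap, and everything else, including recursion, lives in the fallback. Since a weak (LL/SC-like) attempt can fail spuriously, the fallback must also cope with being entered while the found value is still "unowned". All toy_* names and the recurse field are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

#define LK_UNOWNED ((uintptr_t)0)

struct toy_spin_mtx {
	_Atomic uintptr_t lock;
	unsigned recurse;	/* only touched by the owner */
};

static void
toy_spin_lock_slow(struct toy_spin_mtx *m, uintptr_t tid, uintptr_t v)
{
	/* Recursion is handled here, not in the inline fast path. */
	if (v == tid) {
		m->recurse++;
		return;
	}
	for (;;) {
		/* v may be LK_UNOWNED if the inline attempt failed spuriously. */
		if (v == LK_UNOWNED &&
		    atomic_compare_exchange_weak_explicit(&m->lock, &v, tid,
		    memory_order_acquire, memory_order_relaxed))
			return;
		v = atomic_load_explicit(&m->lock, memory_order_relaxed);
	}
}

static inline void
toy_spin_lock(struct toy_spin_mtx *m, uintptr_t tid)
{
	uintptr_t v = LK_UNOWNED;

	if (!atomic_compare_exchange_weak_explicit(&m->lock, &v, tid,
	    memory_order_acquire, memory_order_relaxed))
		toy_spin_lock_slow(m, tid, v);
}

static inline void
toy_spin_unlock(struct toy_spin_mtx *m, uintptr_t tid)
{
	assert(atomic_load_explicit(&m->lock, memory_order_relaxed) == tid);
	if (m->recurse > 0) {
		m->recurse--;
		return;
	}
	atomic_store_explicit(&m->lock, LK_UNOWNED, memory_order_release);
}
```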

locks: ensure proper barriers are used with atomic ops when necessary

It is unclear how, but the mutex locking routine was using the *release*
barrier instead of *acquire*. This must have been either a copy-pasto or a bad
completion.
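The pairing the fix restores can be sketched with C11 memory orders, which are assumed here as an analogue of FreeBSD's _acq/_rel atomic variants: taking the lock needs acquire semantics so critical-section accesses cannot be hoisted above it, and dropping the lock needs release semantics so those accesses are visible before the lock word is cleared. The toy_* names and protected_counter are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

#define LK_UNOWNED ((uintptr_t)0)

struct toy_mtx {
	_Atomic uintptr_t lock;
};

static int protected_counter;	/* data guarded by the toy mutex */

static bool
toy_mtx_trylock(struct toy_mtx *m, uintptr_t tid)
{
	uintptr_t v = LK_UNOWNED;

	/*
	 * Acquire on success: critical-section loads and stores cannot be
	 * reordered before the lock is taken. memory_order_release here
	 * would give no such guarantee.
	 */
	return atomic_compare_exchange_strong_explicit(&m->lock, &v, tid,
	    memory_order_acquire, memory_order_relaxed);
}

static void
toy_mtx_unlock(struct toy_mtx *m, uintptr_t tid)
{
	assert(atomic_load_explicit(&m->lock, memory_order_relaxed) == tid);
	/* Release: critical-section accesses are visible before the lock is freed. */
	atomic_store_explicit(&m->lock, LK_UNOWNED, memory_order_release);
}

static void
toy_increment(struct toy_mtx *m, uintptr_t tid)
{
	while (!toy_mtx_trylock(m, tid))
		;	/* spin */
	protected_counter++;
	toy_mtx_unlock(m, tid);
}
```

On strongly ordered hardware such as x86 the wrong order may go unnoticed in practice, which is one plausible reason such a mistake can survive testing.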

Going through other uses of atomics shows no barriers in: