The idea is that a locker can safely sleep without an interlock if it
registers as a holder by incrementing an atomic counter associated with
the lock. After taking the sleepq lock but before going to sleep, the
locker checks that the sleepgen is still current; if not, it skips the
sleep and returns ENOLCK.
This is a work in progress, but I wanted to share it in order to see
whether there are objections to the basic idea, and to get feedback
before I start polishing. The point of this is again to reduce
contention on the
bufobj interlock when there is a lot of concurrent activity against a
file or vnode.
I intend to put something like the following into a comment in
kern_lock.c, and also to add entries to the lock.9 man page.
The basic idea is to provide an alternative to an interlock for safely
sleeping on a lock, without a wakeup race, in cases where the identity
of the lock could change. This can avoid lock contention on an
interlock.
Applied to bufs, this means we can sleep on buf locks without the bufobj
interlock.
In short, a new locker avoids sleeping if the sleepgen changes after
it first loads it and checks the conditions. An invalidator bumps the
sleepgen and wakes up any sleepers.
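To make the walkthrough below concrete, the sketches that follow
assume roughly this per-lock state; the field names, and whether the
counters live in struct lock at all, are placeholders rather than the
final layout:

        struct lock {
                /* ... existing lockmgr fields ... */
                u_int   lk_sleepgen;    /* bumped by an invalidator */
                u_int   lk_holders;     /* registered new lockers */
        };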
In the new locker thread, the locker first registers itself as a
"holder" of the sleepgen lock by atomically incrementing the lock's
holder count. AFTER that, it loads the lock's current sleepgen. AFTER
that, it checks the identity conditions for sleeping on the lock (e.g.
for buf, it checks that the b_bufobj and b_lblkno are as expected).
At this point, the new locker still holds no lock, and doesn't even know
that the conditions that it just checked were satisfied under the
sleepgen that it loaded (the sleepgen could already have been bumped).
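A sketch of that caller-side sequence for the buf case, with memory
ordering details elided and the field names taken from the sketch
above:

        /* Register as a holder before reading anything else. */
        atomic_add_int(&bp->b_lock.lk_holders, 1);

        /* Load the sleepgen, then check the identity conditions. */
        gen = atomic_load_int(&bp->b_lock.lk_sleepgen);
        if (bp->b_bufobj != bo || bp->b_lblkno != lblkno) {
                /* Identity already changed; give up without locking. */
                atomic_subtract_int(&bp->b_lock.lk_holders, 1);
                return (ENOLCK);
        }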
The new locker then proceeds into lockmgr, passing along the sleepgen
that it loaded. If it gets to the point where it would need to sleep to
acquire the lock, then AFTER acquiring the sleepq lock it checks whether
the sleepgen is still current. If so, it can safely sleep. Otherwise,
it errors out with ENOLCK. In any case, after returning from lockmgr,
it atomically decrements the lock's holder count.
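Inside lockmgr, the new check at the would-sleep point might look
roughly like the following; the sleepq_* calls are the existing
sleepqueue KPI, while the sleepgen comparison and the way "gen" gets
passed in are the new, still-unsettled parts:

        sleepq_lock(&lk->lock_object);
        if (atomic_load_int(&lk->lk_sleepgen) != gen) {
                /* Sleepgen changed: refuse to sleep, fail with ENOLCK. */
                sleepq_release(&lk->lock_object);
                return (ENOLCK);
        }
        /* ... otherwise sleepq_add()/sleepq_wait() as today ... */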
The other actor is a thread which might invalidate the identity
conditions. (In the buf example, this would be brelvp().) AFTER the
conditions are invalidated, the invalidator checks if there are any
holders. If not, it does nothing further. Otherwise it increments the
sleepgen, acquires the sleepq lock, and wakes anyone sleeping on the
lock.
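A sketch of that invalidator side, again for the buf case; the
wakeall-style helper is assumed here under the lockmgr_wakeall name
mentioned further below:

        /* Identity conditions were just invalidated, e.g. by brelvp(). */
        if (atomic_load_int(&bp->b_lock.lk_holders) != 0) {
                atomic_add_int(&bp->b_lock.lk_sleepgen, 1);
                /* Takes the sleepq lock and wakes all sleepers. */
                lockmgr_wakeall(&bp->b_lock);
        }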
If the conditions are invalidated after a new locker registers as a
holder, the invalidator will see the nonzero holder count, bump the
sleepgen, and issue the wakeup. If the invalidation completes before
the new locker loads the sleepgen, the sleepgen it loads is current.
If it completes after the new locker loads its sleepgen, then the new
locker will not sleep. If the invalidator completes after the new
locker acquires the sleepq lock and goes to sleep, then the new locker
will be woken.
The invalidator and new locker need to cooperate on the conditions that
validate the sleep.
Some effort has been made to reduce the extra work needed for the
non-contending cases. However, the sleepgen lock will still be somewhat
more expensive, especially for the new locker, than the non-sleepgen
lock, due to the extra atomic operations. It probably makes most
sense for the new locker to optimistically attempt an LK_NOWAIT
acquisition first, and only fall back to the SLEEPGEN lock if that
fails.
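That pattern might look like the following; BUF_LOCK() is the existing
wrapper, while buf_lock_sleepgen() is only a placeholder for whatever
the fallback entry point ends up being called:

        if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0) {
                /* Contended: fall back to the sleepgen-checked sleep. */
                error = buf_lock_sleepgen(bp, bo, lblkno);
        }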
I have also thought of a significantly less complex approach that also
avoids the bufobj interlock. That would be to use an alternative global
rwlock as an interlock, which could be one or several sharded locks
(e.g. by bufdomain). Then the only lockmgr support needed would be
lockmgr_wakeall.
The advantages of the alternative would be simplicity and needing one
less int in struct buf, but it would be a less complete and less
general solution.
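For illustration only, the shared state for that alternative could be
as small as the following; the shard count and hash are hypothetical,
and the locking protocol around it would roughly mirror today's bufobj
interlock usage:

        /* Sketch only: a sharded stand-in for the bufobj interlock. */
        static struct rwlock buf_ilk[BUF_ILK_SHARDS];

        #define BUF_ILK(bp)     (&buf_ilk[BUF_ILK_HASH(bp)])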