Count write delegations so nfsrv_checkgetattr() can just return if the count is 0

Authored by rmacklem on Dec 1 2017, 10:25 PM.



This patch adds a count of current write delegations issued to clients
by the NFSV4 server. This count is maintained by atomic operations,
so that no lock is required.

This count is used by nfsrv_checkgetattr() to determine when it can
return without acquiring any state lock. Contention on the state lock
can cause significant performance degradation and this patch can
avoid that by disabling issue of delegations.

Test Plan

I have tested it to see that it seems to work.
Hopefully the person reporting the performance
issue can test it to determine if it resolves their
performance problem.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
rmacklem created this revision.Dec 1 2017, 10:25 PM
kib added inline comments.Dec 2 2017, 11:39 AM
5310 ↗(On Diff #36073)

Acquire load without paired release store is not useful.

That said, before allocating barriers over the atomics, could you please explain why atomics use is fine there. I.e., suppose that we loaded wdelegcnt as zero and skipped refreshing the attributes. Can new delegation instantiated meantime and make the state inconsistent ?

If not, why do you need atomic increments for nfsrv_writedelegcnt ?

manu added a subscriber: manu.Dec 2 2017, 1:19 PM

Added inline reply to kib@'s comment.

5310 ↗(On Diff #36073)

My concern (because I don't understand the properties of "atomic ops" is
that this load would return a "quite stale" value for the variable that was
cached in the CPU executing this code.

If that is not the case without any atomic op or similar here, then you
are correct, the code doesn;t need atomics, since all the increments/decrements
are done when the lock is held.

As for your question w.r.t. "couldn't another thread add a delegation", the answer
is that the variable is incremented before the delegation is actually added.

  • All this does is try to ensure that the attributes for writing "size, mtime and change" are not returned to another client when they are stale. (There will always be some level of "raceyness", since the client that acquires the write delegation may then write it and that might happen in a temporal sense before the other client receives the reply to its Getattr RPC. In NFS these cases always exist unless the clients use byte range locking to guarantee timing of I/O.)

--> This code just tries to ensure that the attributes are not "very stale".

kib added inline comments.Dec 2 2017, 2:36 PM
5310 ↗(On Diff #36073)

Can you specify which lock is held ? Is it NFSLOCKSTATE()/UNLOCKSTATE() ? In nfsrv_delgconflict() the nfsrv_freedeleg() is called without holding the lock, it seems.

I do not think that any amount of atomic tricks would allow the nfsrv_checkgetattr() to avoid a race which I pointed in previous reply: if we read zero and meantime a delegation appeared, we skep the rest of the code in checkgetattr().

If you think that the code is fine with this race kept around, then keeping the rest of the patch as is but using plain read of the volatile instead of load_acq() would do it.

Add an inline comment related to nfsd thread locking.

5310 ↗(On Diff #36073)

Agreed w.r.t. the race, but that is the way it always is...

  • either nfsrv_checkgetattr() happens first and replies without acquiring attributes from the other client that might have just acquired a write delegation


  • the acquisition of the delegations happens first and the Getattr then acquires the attributes from the other client that now has the delegation.

--> This happens with the unpatched code, except that nfsrv_checkgetattr() aquires

the lock, sees no delegation, the unlocks and returns. As soon as it unlocks,
another nfsd thread can issue a write delegation, which may arrive at the one
client before the Getattr RPC reply arrives at the other client.

As for the "lock", the nfsd thread must hold lock(s) one of two ways to modify state:

The common case is...
1 - A shared sleep lock (this is a locally implemented one, because it has to has some
     semantics that the generic ones don't have) + the NFSLOCKSTATE() mutex.


For rare cases like Delegations recall...
2 - An exclusive lock on the sleep lock. (This case ensures that all other nfsd threads
     cannot even look at the state.) The shared lock on the sleep lock in #1 ensures that
     there are no threads accessing state for this case.

All the cases where this variable is incremented/decremented are protected by #1 or #2.
(And the sleep lock is implemented using a mutex, so there is always a mutex unlock
before other nfsd threads access the variable.)

Since the atomic_load() isn't useful, I think the atomic_add_int()s can be dropped and
the code can just "++" and "--". I'll change the patch to that.

rmacklem updated this revision to Diff 36124.Dec 2 2017, 10:19 PM

Modify the patch to not use atomic ops, which do not appear to be needed.

kib accepted this revision.Dec 3 2017, 8:53 AM

I would be easier to reason about the change if code properly annotated with lock assertions.

This revision is now accepted and ready to land.Dec 3 2017, 8:53 AM
This revision was automatically updated to reflect the committed changes.

Adding a KASSERT() to check for correct lock acquisition when incrementing/decrementing
the counter is more awkward than it might seem.
The obvious place for such a check would be nfsrv_freedeleg(), which is where
this counter is decremented. Unfortunately this function is called during module
unload, when there are no nfsd threads and no locks held.
The locations where the counter is incremented are all in rather long chunks of
code that is updating state. Putting a KASSERT() at the beginning of these chunks
could be done, but would be almost silly, since the call would be right after the
code that acquires the locking.