
Add a sleep lock to the NFS node to protect n_size
Needs ReviewPublic

Authored by rmacklem on Sep 23 2019, 2:39 AM.

Details

Reviewers
kib
Summary

To deal with the problem of calling vnode_pager_setsize() with a mutex held, I first tried...

  • Locking/upgrading the vnode lock to an exclusive lock. This deadlocked between the attempt to LK_EXCLUSIVE lock the vnode in nfscl_loadattrcache() and the acquisition of a buffer cache block (sleeping on bo_wwait). There was also the issue that many calls to nfscl_loadattrcache() are done with the vnode LK_SHARED locked, but this would have required an LK_EXCLUSIVE lock to serialize use of n_size with vnode_pager_setsize().

Then I considered replacing the NFS node mutex with a sleep lock. However, there are many
places in the code where the NFS node mutex was used to protect fields of the NFS node other
than n_size and most of these are just a few lines of code. It seemed like overkill to make all of
these use a sleep lock.

So, I added a sleep lock to the NFS node and changed the code so that it is used to protect
the n_size field (the one that vnode_pager_setsize() cares about).
For cases where only n_size was being manipulated, I replaced mtx_lock()/mtx_unlock() with
sx_xlock()/sx_xunlock().
For cases where use of n_size was mixed with use of other fields, I wrapped the sx_xlock()/sx_xunlock()
around the mtx_lock()/mtx_unlock(), so that both locks were held for these code blocks.

For the cases where vnode_pager_setsize() was being called, I did the mtx_unlock() just before the
vnode_pager_setsize() call and sx_xunlock() just after the call, so only the sx lock is held for the call.

The patch no longer delays calls to vnode_pager_setsize().

Test Plan

I have been doing some testing and have not found any problems.
I intend to ask Peter Holm to test the patch once kib@ has given me
his opinion on whether this approach seems reasonable.

Diff Detail

Repository
rS FreeBSD src repository
Build Status
Buildable 26682

Event Timeline

rmacklem created this revision.Sep 23 2019, 2:39 AM
kib added a comment.Sep 23 2019, 1:44 PM

First, I am not sure that this change is enough. Ideally, consistency would also be maintained between the buffer cache state and the vnode object, so that, e.g., it is impossible to observe a buffer with b_lblkno outside the v_object size. This holds for other filesystems, but not for NFS.

Second, I do not see a need for n_slock. A (sleepable) mutex and an sx lock differ only in the container where the thread is put when the lock is contested; in other words, the amortized cost of acquiring a mutex or an sx lock is the same. One significant difference is that mtx_lock() does priority propagation, but I do not think that matters for our use case.

rmacklem added a comment.

So, do you think that just replacing n_mtx with n_solock makes more sense?
(I'll admit I was thinking it was a lot of editing, but so what.)

The other reason I didn't do the global change to an sx lock was I didn't
realize until last night that there is a sx_sleep() to replace msleep() with.
(There are three cases where n_mtx is used with an msleep().)

I can easily do the global replacement using sx_sleep().

kib added a comment.Sep 23 2019, 5:16 PM

So, do you think that just replacing n_mtx with n_solock makes more sense?

Yes.

(I'll admit I was thinking it was a lot of editing, but so what.)

Maybe add a macro to lock the node, as a first step. That would be purely editorial.


rmacklem updated this revision to Diff 62531.Sep 25 2019, 1:40 AM

This version of the patch changes the n_mtx lock to an sx lock (n_sx) and
the ncl_iod_mutex to ncl_iod_sx. The former is done so that it can safely be
locked while calling vnode_pager_setsize().
The latter is changed to an sx lock since it is held when n_sx is acquired.
All the msleep()s are changed to sx_sleep() calls.

The nfscl_loadattrcache() code is simplified so that it always calls
vnode_pager_setsize() before releasing the n_sx node lock.
(Thanks to the lock/unlock macros, the patch isn't too large.)

kib added inline comments.Sep 25 2019, 12:25 PM
fs/nfs/nfs_commonkrpc.c
1316

Maybe call it newnfs_sleep(), and do the rename as part of the preliminary series?

1322

Fix style while here: return (sx_sleep());

fs/nfsclient/nfs_clnfsiod.c
172

I suggest adding assert macros too, in the preliminary commits.

fs/nfsclient/nfs_clport.c
566

You do not need to check this, vnode_pager_setsize() already does the check.

IMO it could be more useful to check the sizes as I did at line 572 and avoid calling vnode_pager_setsize() at all, this avoids locking the object. But it is fine to call it unconditionally too.

rmacklem added a comment.

Unfortunately, testing found a deadlock problem.

  • One of the iod threads is sleeping on "vmopar", so it is in vnode_pager_setsize() when the size is being reduced.
  • The rest of the iod threads are waiting for the NFS node lock.

It looks like holding the node lock while the size is being reduced isn't going to work.
(I don't think my first version would be OK either, since nfscl_loadattrcache() gets called for
every I/O RPC to process the attributes replied by the server.)

Unless you have more insight into this (I'm guessing the page is busied because another iod thread
was doing an RPC for the page in the buffer cache?), I can't see an easy fix.