Add a sleep lock to the NFS node to protect n_size
Needs ReviewPublic
Actions

Authored by rmacklem on Sep 23 2019, 2:39 AM.

Details

Reviewers

kib

Summary

To deal with the problem of calling vnode_pager_setsize() with a mutex held, I first tried...

Locking/upgrading the vnode lock to an exclusive lock. This deadlocked between the attempt to LK_EXCLUSIVE lock the vnode in nfscl_loadattrcache() and the acquisition of a buffer cache block (sleeping on bo_wwait). There was also the issue that many calls to nfscl_loadattrcache() are done with the vnode LK_SHARED locked, but this would have required an LK_EXCLUSIVE lock to serialize use of n_size with vnode_pager_setsize().

Then I considered replacing the NFS node mutex with a sleep lock. However, there are many
places in the code where the NFS node mutex was used to protect fields of the NFS node other
than n_size and most of these are just a few lines of code. It seemed like overkill to make all of
these use a sleep lock.

So, I added a sleep lock to the NFS node and changed the code so that it is used to protect
the n_size field (the one that vnode_pager_setsize() cares about).
For cases where only n_size was being manipulated, I replaced mtx_lock()/mtx_unlock() with
sx_xlock()/sx_xunlock().
For cases where use of n_size was mixed with use of other fields, I wrapped the sx_xlock()/sx_xunlock()
around the mtx_lock()/mtx_unlock(), so that both locks were held for these code blocks.

For the cases where vnode_pager_setsize() was being called, I did the mtx_unlock() just before the
vnode_pager_setsize() call and sx_xunlock() just after the call, so only the sx lock is held for the call.

The patch no longer delays calls to vnode_pager_setsize().

Test Plan

I have been doing some testing and have not found any problems.
I intend to ask Peter Holmes to test the patch, once kib@ has given me
his opinion w.r.t. whether this approach seems reasonable.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 26637

Event Timeline

rmacklem created this revision.Sep 23 2019, 2:39 AM

Herald added a subscriber: imp. · View Herald TranscriptSep 23 2019, 2:39 AM

Harbormaster completed remote builds in B26637: Diff 62451.Sep 23 2019, 2:39 AM

First, I am not sure that this change is enough. Ideally, the consistency would be kept between the buffer cache state and vnode object as well, so e.g. it is impossible to observe a buffer with b_lblkno outside of the v_object size. This is true for other filesystems but not for NFS.

Second, I do not see a need in n_slock. The sleepable mutex and sx are only different in the container where the thread is put when the lock is contested. In other words, for the amortized cost of acquiring the mutex or sx is same. Significant difference is that mtx_lock() does priority propagation, but I do not think it matters for our use case.

So, do you think that just replacing n_mtx with n_solock makes more sense?
(I'll admit I was thinking it was a lot of editing, but so what.)

The other reason I didn't do the global change to an sx lock was I didn't
realize until last night that there is a sx_sleep() to replace msleep() with.
(There are three cases where n_mtx is used with an msleep().)

I can easily do the global replacement using sx_sleep().

In D21762#474968, @rmacklem wrote:

So, do you think that just replacing n_mtx with n_solock makes more sense?

Yes.

(I'll admit I was thinking it was a lot of editing, but so what.)

Might be, add a macro to lock the node, as the first step. It would be pure editorial.

The other reason I didn't do the global change to an sx lock was I didn't
realize until last night that there is a sx_sleep() to replace msleep() with.
(There are three cases where n_mtx is used with an msleep().)

I can easily do the global replacement using sx_sleep().

This version of the patch changes the n_mtx lock to an sx lock (n_sx) and
the ncl_iod_mutex to ncl_iod_sx. The former is done so that it can safely be
locked while calling vnode_pager_setsize().
The latter is changed to an sx lock since it is held when n_sx is acquired.
All the msleep()s are changed to sx_sleep() calls.

The nfscl_loadattrcache() code is simplified so that it always calls
vnode_pager_setsize() before releasing the n_sx node lock.
(Thanks to the lock/unlock macros, the patch isn't too large.)

Harbormaster completed remote builds in B26682: Diff 62531.Sep 25 2019, 1:40 AM

kib added inline comments.Sep 25 2019, 12:25 PM

fs/nfs/nfs_commonkrpc.c
1316 ↗	(On Diff #62531)	Might be call it newnfs_sleep(). And do the rename as part of the preliminary series ?
1322 ↗	(On Diff #62531)	FIx style while there, `return (sx_sleep());`
fs/nfsclient/nfs_clnfsiod.c
172 ↗	(On Diff #62531)	I suggest to add assert macros too, in preliminary commits.
fs/nfsclient/nfs_clport.c
580–581	You do not need to check this, vnode_pager_setsize() already does the check. IMO it could be more useful to check the sizes as I did at line 572 and avoid calling vnode_pager_setsize() at all, this avoids locking the object. But it is fine to call it unconditionally too.

Unfortunately testing found a deadlock problem.

One of the iod threads is sleeping on "vmopar", so it is in vnode_pager_setsize() when the size is being reduced.
The rest of the iod threads are waiting for the NFS node lock.

It looks like holding the node lock when the size is being reduced isn't going to work.
(I don't think doing my first version would be ok either, since nfscl_loadattrcache() gets called for
every I/O RPC to process the attributes replied by the server.)

Unless you have more insight into this (I'm guessing the page is busied because another iod thread
was doing an RPC for the page in the buffer cache?), I can't see an easy fix?

rmacklem mentioned this in D21814: factor out the code that checks to see if vnode_pager_setsize() can be called with the NFS node lock held.Sep 27 2019, 1:40 AM