Paths

Table of Contentst

fusefs: redo vnode attribute locking
Needs ReviewPublic
Actions

Authored by asomers on Tue, Feb 10, 8:27 PM.

Details

Reviewers

kib
markj
vishwin

Summary

Previously most fields in fuse_vnode_data were protected by the vnode
lock. But because DEBUG_VFS_LOCKS was never enabled by default until
stable/15 the assertions were never checked, and many were wrong.
Others were missing. This led to crashes in stable/15 and 16.0-CURRENT,
when a vnode was expected to be exclusively locked but wasn't, for fuse
file systems that mount with "-o async".

In some places it isn't possible to upgrade the vnode lock to exclusive
when accessing these fields. So protect them with a new mutex instead.
This fixes crashes and unprotected field accesses in VOP_READ,
VOP_COPY_FILE_RANGE, VOP_GETATTR, VOP_BMAP, and FUSE_NOTIFY_INVAL_ENTRY.
Add assertions everywhere the protected fields are accessed.

Lock the vnode exclusively when handling FUSE_NOTIFY_INVAL_INODE.

Upgrade the vnode lock, if necessary, during fuse_vnode_setsize, to fix
crashes during VOP_READ or VOP_GETATTR.

Also, ensure that fuse_vnop_rename locks the from vnode.

Finally, reorder elements in struct fuse_vnode_data to reduce the
structure size.

Fixes: 283391
Reported by: kargl, markj, vishwin, Abdelkader Boudih, groenveld@acm.org
MFC after: 2 weeks
Sponsored by: ConnectWise

Test Plan

Test cases added

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 70614
Build 67497: arc lint + arc unit

Event Timeline

asomers created this revision.Tue, Feb 10, 8:27 PM

Herald added a subscriber: imp. · View Herald TranscriptTue, Feb 10, 8:27 PM

asomers requested review of this revision.Tue, Feb 10, 8:27 PM

Harbormaster completed remote builds in B70591: Diff 171690.Tue, Feb 10, 8:27 PM

asomers mentioned this in D55145: fusefs: Fix vnode lock mismatch in fuse_internal_do_getattr().Tue, Feb 10, 9:46 PM

markj added inline comments.Tue, Feb 10, 10:09 PM

sys/fs/fuse/fuse_node.c
161	Indentation looks off here.
458	Isn't it really fusefs that requires the exclusive lock for those calls? The mentioned functions themselves do not require the vnode lock if I understand correctly. If so it might be better to explain exactly why the exclusive lock is needed.
536–546	This should be on the previous line.
sys/fs/fuse/fuse_node.h
103	I think these macros need the do-while treatment, otherwise you will get incorrect behaviour (or a compiler error?) from a construct like if (foo) CACHED_ATTR_LOCK(vp); else ...
189	Use VNASSERT?
sys/fs/fuse/fuse_vnops.c
2296	Should this say "avoiding the LOR"?

asomers marked 3 inline comments as done.Tue, Feb 10, 11:28 PM

asomers added inline comments.

sys/fs/fuse/fuse_node.c
458	vnode_pager_setsize does require it; see vm/vnode_pager.c line 511. And vtruncbuf calls vnode_pager_setsize.
sys/fs/fuse/fuse_node.h
103	Good catch.

Respond to markj's comments

Harbormaster completed remote builds in B70614: Diff 171715.Tue, Feb 10, 11:28 PM

kib added inline comments.Wed, Feb 11, 3:21 AM

sys/fs/fuse/fuse_file.c
220	This is perhaps generic question about fuse, triggered by the addition of the assert of the vnode lock at the start of the function. fdisp_wait_answ() does an unbound sleep while owning the vnode lock. This is not right. Essentially, this is a recipe for lock cascade, which might affect the daemon that must reply to kernel itself, deadlocking the whole machine.
sys/fs/fuse/fuse_node.c
453	fuse_vnode_setsize() sometimes called when other vnode is locked,e.g. in fuse_vnop_copy_file_range(). Dropping and re-acquiring the lock means that you create LoR.
sys/fs/fuse/fuse_node.h
102	I think you want an assert that the vnode is locked, before checking VOP_ISLOCKED().
189	Similarly, vp lock should be asserted before this.

asomers marked 2 inline comments as done.Wed, Feb 11, 1:40 PM

asomers added inline comments.

sys/fs/fuse/fuse_file.c
220	It's not actually unbounded. It's limited by `daemon_timeout`, which is 60s by default, but can be changed by a mount option.
sys/fs/fuse/fuse_node.c
453	For copy_file_range, the vnode that we will call fuse_vnode_setsize on will always be exclusively locked. So fuse_vnode_setsize won't need to upgrade the lock. So in that case, there's no risk of LOR. The only times that it will need to upgrade the lock are during `VOP_GETATTR`, `VOP_ACCESS`, `VOP_ADVLOCK`, `VOP_GETEXTATTR`, `VOP_SETEXTATTR`, `VOP_LISTEXTATTR`, `VOP_DELETEEXTATTR`, `VOP_CLOSE` (if updating atime), `VOP_LOOKUP` (when removing a file, if the parent's sticky bit is set), and `VOP_VPTOFH`. I should double check VOP_CLOSE. It's man page doesn't say that the vnode will be locked. And I'll double check `VOP_L0OKUP` when removing with a sticky parent.

kib added inline comments.Wed, Feb 11, 1:49 PM

sys/fs/fuse/fuse_file.c
220	System getting stuck for 1 minute does not sound good.
sys/fs/fuse/fuse_node.c
453	For VOP_LOOKUP() for DELETE, both parent and target must be exclusively locked. I am not sure why you listed VOP_CLOSE() as potential trouble point. Still, such relock is very fragile. Could you cache the new size, and then have a fusefs-specific VOP_LOCK() which sets the size on the next lock? I did that for nfsclient to avoid these issues in principle.

asomers added inline comments.Wed, Feb 11, 2:45 PM

sys/fs/fuse/fuse_file.c
220	In practice it's not that bad. It usually only happens if the daemon crashes or hangs. And in that case, you can always hit the mountpoint with a "umount -f". I'm imagining now what fusefs would look like it we never held a vnode lock while blocking. It sounds hard. Are you imagining, for example, that fuse_vnop_setattr would release the vnode lock before sending the request to the daemon, and reacquire it on return? If so, then multiple chmod calls could stack up within fusefs. And whether a sequence of chmod calls all succeed could depend on how fast the daemon processes them. The first call might change file permissions in a way that should block the second call. But if the second call passes its access checks before the daemon responds to the first call, then we could end up sending the second to the daemon anyway.
sys/fs/fuse/fuse_node.c
453	I mentioned `VOP_CLOSE` because the man page doesn't specify the locking type. Looking deeper, I see that it will always be LK_EXCLUSIVE since we don't set MNTK_EXTENDED_SHARED. I could try the VOP_LOCK trick. There might be some tricky issues there. Do you require me to do it before merging this PR, or can I do it afterwards?

kib added inline comments.Thu, Feb 12, 6:48 AM

sys/fs/fuse/fuse_file.c
220	Does unmount -f able to unblock threads owning the fusefs vnode locks? For NFS client, it does, but it requires a lot of specific features in the fs. Also, this is the reason why we have the rule 'vnode might be reclaimed any time the lock is not owned'.
sys/fs/fuse/fuse_node.c
453	Sure I do not require anything. It is just technically better path, IMO. The current patch is improvement, so it makes no sense to postpone if you plan to work on other scheme some time later.

asomers added inline comments.Thu, Feb 12, 1:31 PM

sys/fs/fuse/fuse_file.c
220	`unmount -f` will unmount the file system immediately, without waiting for the server to respond to anything. But by itself it doesn't unblock any threads. If a thread is blocked in `fdisp_wait_answ` but the operation hasn't yet been sent to userland, then it can be interrupted by a signal. If a thread is blocked in `fdisp_wait_answ` and its operation has been sent to userland, then it can be interrupted only with the cooperation of the server. Some servers support that feature and some don't. If they don't, then there's no way to interrupt the operation before the 60s timeout.

kib added inline comments.Thu, Feb 12, 3:10 PM

sys/fs/fuse/fuse_file.c
220	`unmount -f` will unmount the file system immediately, without waiting for the server to respond to anything. But by itself it doesn't unblock any threads. From my reading of the code, it does not. The fuse' VFS_UNMOUNT() correctly starts with the vflush() which reclaims each vnode on the mp list. To reclaim, vflush() locks the vnode exclusively, that would just block if there is the locked vnode by a thread waiting for the userspace reply.

asomers added inline comments.Thu, Feb 12, 5:55 PM

sys/fs/fuse/fuse_file.c
220	Right. fuse_vfsop_unmount may block when trying to flush vnodes, or when waiting on the FUSE_DESTROY operation. But the file system is already unmounted by that point; other processes can no longer access it.

kib added inline comments.Thu, Feb 12, 7:09 PM

sys/fs/fuse/fuse_file.c
220	Other processes might have already blocked on the fuse vnodes. Also, while the fs is unmounted, its covered vnode is exclusively locked (this is how the unmounted fs is made inaccessible while being unmounted). All of them are happy to participate in the lock cascade.

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

In D55230#1269768, @asomers wrote:

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

We talked about avoiding lock upgrade. I am fine with that done after this change is committed.

vishwin added inline comments.Wed, Feb 25, 8:20 PM

sys/fs/fuse/fuse_vnops.c
2312	After D55231 landed, this is now a rebase conflict, unless other changes are coming.

In D55230#1269852, @kib wrote:

In D55230#1269768, @asomers wrote:

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

We talked about avoiding lock upgrade. I am fine with that done after this change is committed.