Page MenuHomeFreeBSD

fusefs: redo vnode attribute locking
Needs ReviewPublic

Authored by asomers on Tue, Feb 10, 8:27 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Feb 12, 2:51 AM
Unknown Object (File)
Wed, Feb 11, 5:08 PM
Unknown Object (File)
Wed, Feb 11, 7:15 AM
Unknown Object (File)
Wed, Feb 11, 3:50 AM
Unknown Object (File)
Wed, Feb 11, 1:13 AM
Unknown Object (File)
Wed, Feb 11, 12:33 AM
Unknown Object (File)
Wed, Feb 11, 12:06 AM
Subscribers

Details

Reviewers
kib
markj
vishwin
Summary

Previously most fields in fuse_vnode_data were protected by the vnode
lock. But because DEBUG_VFS_LOCKS was never enabled by default until
stable/15 the assertions were never checked, and many were wrong.
Others were missing. This led to crashes in stable/15 and 16.0-CURRENT,
when a vnode was expected to be exclusively locked but wasn't, for fuse
file systems that mount with "-o async".

In some places it isn't possible to upgrade the vnode lock to exclusive
when accessing these fields. So protect them with a new mutex instead.
This fixes crashes and unprotected field accesses in VOP_READ,
VOP_COPY_FILE_RANGE, VOP_GETATTR, VOP_BMAP, and FUSE_NOTIFY_INVAL_ENTRY.
Add assertions everywhere the protected fields are accessed.

Lock the vnode exclusively when handling FUSE_NOTIFY_INVAL_INODE.

Upgrade the vnode lock, if necessary, during fuse_vnode_setsize, to fix
crashes during VOP_READ or VOP_GETATTR.

Also, ensure that fuse_vnop_rename locks the from vnode.

Finally, reorder elements in struct fuse_vnode_data to reduce the
structure size.

Fixes: 283391
Reported by: kargl, markj, vishwin, Abdelkader Boudih, groenveld@acm.org
MFC after: 2 weeks
Sponsored by: ConnectWise

Test Plan

Test cases added

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 70614
Build 67497: arc lint + arc unit

Event Timeline

sys/fs/fuse/fuse_node.c
161

Indentation looks off here.

458

Isn't it really fusefs that requires the exclusive lock for those calls? The mentioned functions themselves do not require the vnode lock if I understand correctly. If so it might be better to explain exactly why the exclusive lock is needed.

536–546

This should be on the previous line.

sys/fs/fuse/fuse_node.h
103

I think these macros need the do-while treatment, otherwise you will get incorrect behaviour (or a compiler error?) from a construct like

if (foo)
    CACHED_ATTR_LOCK(vp);
else
    ...
189

Use VNASSERT?

sys/fs/fuse/fuse_vnops.c
2296

Should this say "avoiding the LOR"?

asomers added inline comments.
sys/fs/fuse/fuse_node.c
458

vnode_pager_setsize does require it; see vm/vnode_pager.c line 511. And vtruncbuf calls vnode_pager_setsize.

sys/fs/fuse/fuse_node.h
103

Good catch.

  • Respond to markj's comments
sys/fs/fuse/fuse_file.c
220

This is perhaps generic question about fuse, triggered by the addition of the assert of the vnode lock at the start of the function.

fdisp_wait_answ() does an unbound sleep while owning the vnode lock. This is not right. Essentially, this is a recipe for lock cascade, which might affect the daemon that must reply to kernel itself, deadlocking the whole machine.

sys/fs/fuse/fuse_node.c
453

fuse_vnode_setsize() sometimes called when other vnode is locked,e.g. in fuse_vnop_copy_file_range(). Dropping and re-acquiring the lock means that you create LoR.

sys/fs/fuse/fuse_node.h
102

I think you want an assert that the vnode is locked, before checking VOP_ISLOCKED().

189

Similarly, vp lock should be asserted before this.

asomers added inline comments.
sys/fs/fuse/fuse_file.c
220

It's not actually unbounded. It's limited by daemon_timeout, which is 60s by default, but can be changed by a mount option.

sys/fs/fuse/fuse_node.c
453

For copy_file_range, the vnode that we will call fuse_vnode_setsize on will always be exclusively locked. So fuse_vnode_setsize won't need to upgrade the lock. So in that case, there's no risk of LOR. The only times that it will need to upgrade the lock are during VOP_GETATTR, VOP_ACCESS, VOP_ADVLOCK, VOP_GETEXTATTR, VOP_SETEXTATTR, VOP_LISTEXTATTR, VOP_DELETEEXTATTR, VOP_CLOSE (if updating atime), VOP_LOOKUP (when removing a file, if the parent's sticky bit is set), and VOP_VPTOFH.

I should double check VOP_CLOSE. It's man page doesn't say that the vnode will be locked. And I'll double check VOP_L0OKUP when removing with a sticky parent.

sys/fs/fuse/fuse_file.c
220

System getting stuck for 1 minute does not sound good.

sys/fs/fuse/fuse_node.c
453

For VOP_LOOKUP() for DELETE, both parent and target must be exclusively locked. I am not sure why you listed VOP_CLOSE() as potential trouble point.

Still, such relock is very fragile. Could you cache the new size, and then have a fusefs-specific VOP_LOCK() which sets the size on the next lock? I did that for nfsclient to avoid these issues in principle.

sys/fs/fuse/fuse_file.c
220

In practice it's not that bad. It usually only happens if the daemon crashes or hangs. And in that case, you can always hit the mountpoint with a "umount -f".

I'm imagining now what fusefs would look like it we never held a vnode lock while blocking. It sounds hard. Are you imagining, for example, that fuse_vnop_setattr would release the vnode lock before sending the request to the daemon, and reacquire it on return? If so, then multiple chmod calls could stack up within fusefs. And whether a sequence of chmod calls all succeed could depend on how fast the daemon processes them. The first call might change file permissions in a way that should block the second call. But if the second call passes its access checks before the daemon responds to the first call, then we could end up sending the second to the daemon anyway.

sys/fs/fuse/fuse_node.c
453

I mentioned VOP_CLOSE because the man page doesn't specify the locking type. Looking deeper, I see that it will always be LK_EXCLUSIVE since we don't set MNTK_EXTENDED_SHARED.

I could try the VOP_LOCK trick. There might be some tricky issues there. Do you require me to do it before merging this PR, or can I do it afterwards?

sys/fs/fuse/fuse_file.c
220

Does unmount -f able to unblock threads owning the fusefs vnode locks?

For NFS client, it does, but it requires a lot of specific features in the fs. Also, this is the reason why we have the rule 'vnode might be reclaimed any time the lock is not owned'.

sys/fs/fuse/fuse_node.c
453

Sure I do not require anything. It is just technically better path, IMO. The current patch is improvement, so it makes no sense to postpone if you plan to work on other scheme some time later.

sys/fs/fuse/fuse_file.c
220

unmount -f will unmount the file system immediately, without waiting for the server to respond to anything. But by itself it doesn't unblock any threads.

If a thread is blocked in fdisp_wait_answ but the operation hasn't yet been sent to userland, then it can be interrupted by a signal.

If a thread is blocked in fdisp_wait_answ and its operation has been sent to userland, then it can be interrupted only with the cooperation of the server. Some servers support that feature and some don't. If they don't, then there's no way to interrupt the operation before the 60s timeout.

sys/fs/fuse/fuse_file.c
220

unmount -f will unmount the file system immediately, without waiting for the server to respond to anything. But by itself it doesn't unblock any threads.

From my reading of the code, it does not. The fuse' VFS_UNMOUNT() correctly starts with the vflush() which reclaims each vnode on the mp list. To reclaim, vflush() locks the vnode exclusively, that would just block if there is the locked vnode by a thread waiting for the userspace reply.

sys/fs/fuse/fuse_file.c
220

Right. fuse_vfsop_unmount may block when trying to flush vnodes, or when waiting on the FUSE_DESTROY operation. But the file system is already unmounted by that point; other processes can no longer access it.

sys/fs/fuse/fuse_file.c
220

Other processes might have already blocked on the fuse vnodes. Also, while the fs is unmounted, its covered vnode is exclusively locked (this is how the unmounted fs is made inaccessible while being unmounted). All of them are happy to participate in the lock cascade.