Page MenuHomeFreeBSD

fusefs: redo vnode attribute locking
ClosedPublic

Authored by asomers on Feb 10 2026, 8:27 PM.
Tags
None
Referenced Files
F147976866: D55230.id171715.diff
Sat, Mar 14, 11:35 PM
Unknown Object (File)
Sat, Mar 14, 2:27 AM
Unknown Object (File)
Wed, Mar 11, 2:40 PM
Unknown Object (File)
Wed, Mar 11, 9:56 AM
Unknown Object (File)
Mon, Mar 9, 10:44 AM
Unknown Object (File)
Tue, Feb 24, 4:46 AM
Unknown Object (File)
Sun, Feb 22, 1:28 PM
Unknown Object (File)
Sun, Feb 22, 1:22 PM
Subscribers

Details

Summary

Previously most fields in fuse_vnode_data were protected by the vnode
lock. But because DEBUG_VFS_LOCKS was never enabled by default until
stable/15 the assertions were never checked, and many were wrong.
Others were missing. This led to crashes in stable/15 and 16.0-CURRENT,
when a vnode was expected to be exclusively locked but wasn't, for fuse
file systems that mount with "-o async".

In some places it isn't possible to upgrade the vnode lock to exclusive
when accessing these fields. So protect them with a new mutex instead.
This fixes crashes and unprotected field accesses in VOP_READ,
VOP_COPY_FILE_RANGE, VOP_GETATTR, VOP_BMAP, and FUSE_NOTIFY_INVAL_ENTRY.
Add assertions everywhere the protected fields are accessed.

Lock the vnode exclusively when handling FUSE_NOTIFY_INVAL_INODE.

Upgrade the vnode lock, if necessary, during fuse_vnode_setsize, to fix
crashes during VOP_READ or VOP_GETATTR.

Also, ensure that fuse_vnop_rename locks the from vnode.

Finally, reorder elements in struct fuse_vnode_data to reduce the
structure size.

Fixes: 283391
Reported by: kargl, markj, vishwin, Abdelkader Boudih, groenveld@acm.org
MFC after: 2 weeks
Sponsored by: ConnectWise

Test Plan

Test cases added

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 71346
Build 68229: arc lint + arc unit

Event Timeline

sys/fs/fuse/fuse_node.c
161

Indentation looks off here.

456

Isn't it really fusefs that requires the exclusive lock for those calls? The mentioned functions themselves do not require the vnode lock if I understand correctly. If so it might be better to explain exactly why the exclusive lock is needed.

544–554

This should be on the previous line.

sys/fs/fuse/fuse_node.h
103

I think these macros need the do-while treatment, otherwise you will get incorrect behaviour (or a compiler error?) from a construct like

if (foo)
    CACHED_ATTR_LOCK(vp);
else
    ...
194

Use VNASSERT?

sys/fs/fuse/fuse_vnops.c
2302

Should this say "avoiding the LOR"?

asomers added inline comments.
sys/fs/fuse/fuse_node.c
456

vnode_pager_setsize does require it; see vm/vnode_pager.c line 511. And vtruncbuf calls vnode_pager_setsize.

sys/fs/fuse/fuse_node.h
103

Good catch.

  • Respond to markj's comments
sys/fs/fuse/fuse_file.c
220

This is perhaps generic question about fuse, triggered by the addition of the assert of the vnode lock at the start of the function.

fdisp_wait_answ() does an unbound sleep while owning the vnode lock. This is not right. Essentially, this is a recipe for lock cascade, which might affect the daemon that must reply to kernel itself, deadlocking the whole machine.

sys/fs/fuse/fuse_node.c
451

fuse_vnode_setsize() sometimes called when other vnode is locked,e.g. in fuse_vnop_copy_file_range(). Dropping and re-acquiring the lock means that you create LoR.

sys/fs/fuse/fuse_node.h
102

I think you want an assert that the vnode is locked, before checking VOP_ISLOCKED().

194

Similarly, vp lock should be asserted before this.

asomers added inline comments.
sys/fs/fuse/fuse_file.c
220

It's not actually unbounded. It's limited by daemon_timeout, which is 60s by default, but can be changed by a mount option.

sys/fs/fuse/fuse_node.c
451

For copy_file_range, the vnode that we will call fuse_vnode_setsize on will always be exclusively locked. So fuse_vnode_setsize won't need to upgrade the lock. So in that case, there's no risk of LOR. The only times that it will need to upgrade the lock are during VOP_GETATTR, VOP_ACCESS, VOP_ADVLOCK, VOP_GETEXTATTR, VOP_SETEXTATTR, VOP_LISTEXTATTR, VOP_DELETEEXTATTR, VOP_CLOSE (if updating atime), VOP_LOOKUP (when removing a file, if the parent's sticky bit is set), and VOP_VPTOFH.

I should double check VOP_CLOSE. It's man page doesn't say that the vnode will be locked. And I'll double check VOP_L0OKUP when removing with a sticky parent.

sys/fs/fuse/fuse_file.c
220

System getting stuck for 1 minute does not sound good.

sys/fs/fuse/fuse_node.c
451

For VOP_LOOKUP() for DELETE, both parent and target must be exclusively locked. I am not sure why you listed VOP_CLOSE() as potential trouble point.

Still, such relock is very fragile. Could you cache the new size, and then have a fusefs-specific VOP_LOCK() which sets the size on the next lock? I did that for nfsclient to avoid these issues in principle.

sys/fs/fuse/fuse_file.c
220

In practice it's not that bad. It usually only happens if the daemon crashes or hangs. And in that case, you can always hit the mountpoint with a "umount -f".

I'm imagining now what fusefs would look like it we never held a vnode lock while blocking. It sounds hard. Are you imagining, for example, that fuse_vnop_setattr would release the vnode lock before sending the request to the daemon, and reacquire it on return? If so, then multiple chmod calls could stack up within fusefs. And whether a sequence of chmod calls all succeed could depend on how fast the daemon processes them. The first call might change file permissions in a way that should block the second call. But if the second call passes its access checks before the daemon responds to the first call, then we could end up sending the second to the daemon anyway.

sys/fs/fuse/fuse_node.c
451

I mentioned VOP_CLOSE because the man page doesn't specify the locking type. Looking deeper, I see that it will always be LK_EXCLUSIVE since we don't set MNTK_EXTENDED_SHARED.

I could try the VOP_LOCK trick. There might be some tricky issues there. Do you require me to do it before merging this PR, or can I do it afterwards?

sys/fs/fuse/fuse_file.c
220

Does unmount -f able to unblock threads owning the fusefs vnode locks?

For NFS client, it does, but it requires a lot of specific features in the fs. Also, this is the reason why we have the rule 'vnode might be reclaimed any time the lock is not owned'.

sys/fs/fuse/fuse_node.c
451

Sure I do not require anything. It is just technically better path, IMO. The current patch is improvement, so it makes no sense to postpone if you plan to work on other scheme some time later.

sys/fs/fuse/fuse_file.c
220

unmount -f will unmount the file system immediately, without waiting for the server to respond to anything. But by itself it doesn't unblock any threads.

If a thread is blocked in fdisp_wait_answ but the operation hasn't yet been sent to userland, then it can be interrupted by a signal.

If a thread is blocked in fdisp_wait_answ and its operation has been sent to userland, then it can be interrupted only with the cooperation of the server. Some servers support that feature and some don't. If they don't, then there's no way to interrupt the operation before the 60s timeout.

sys/fs/fuse/fuse_file.c
220

unmount -f will unmount the file system immediately, without waiting for the server to respond to anything. But by itself it doesn't unblock any threads.

From my reading of the code, it does not. The fuse' VFS_UNMOUNT() correctly starts with the vflush() which reclaims each vnode on the mp list. To reclaim, vflush() locks the vnode exclusively, that would just block if there is the locked vnode by a thread waiting for the userspace reply.

sys/fs/fuse/fuse_file.c
220

Right. fuse_vfsop_unmount may block when trying to flush vnodes, or when waiting on the FUSE_DESTROY operation. But the file system is already unmounted by that point; other processes can no longer access it.

sys/fs/fuse/fuse_file.c
220

Other processes might have already blocked on the fuse vnodes. Also, while the fs is unmounted, its covered vnode is exclusively locked (this is how the unmounted fs is made inaccessible while being unmounted). All of them are happy to participate in the lock cascade.

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

We talked about avoiding lock upgrade. I am fine with that done after this change is committed.

sys/fs/fuse/fuse_vnops.c
2316–2317

After D55231 landed, this is now a rebase conflict, unless other changes are coming.

In D55230#1269852, @kib wrote:

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

We talked about avoiding lock upgrade. I am fine with that done after this change is committed.

D55595 should help with the vnode_pager_setsize(), in particular, it should allow to avoid the lock upgrade.

In D55230#1271915, @kib wrote:

D55595 should help with the vnode_pager_setsize(), in particular, it should allow to avoid the lock upgrade.

I think I can use vn_delay_setsize, except in one case: sometimes fusefs's attribute cache is expired before VOP_READ begins. So it needs to fetch attributes from the server with FUSE_GETATTR _before_ it can send FUSE_READ. If the new attributes indicate that the file's size has changed, then it needs to update the file's size (which includes calling vnode_pager_setsize), _before_ sending FUSE_READ. It does all of that without ever unlocking the vnode. If I use vn_delay_setsize, then the pager won't be updated before the FUSE_READ returns, and bread will be operating on an out-of-date pager. To handle this situation, should I keep the lock upgrade code? Or is it ok for bread to operate on an out-of-date pager?

In D55230#1271915, @kib wrote:

D55595 should help with the vnode_pager_setsize(), in particular, it should allow to avoid the lock upgrade.

I think I can use vn_delay_setsize, except in one case: sometimes fusefs's attribute cache is expired before VOP_READ begins. So it needs to fetch attributes from the server with FUSE_GETATTR _before_ it can send FUSE_READ. If the new attributes indicate that the file's size has changed, then it needs to update the file's size (which includes calling vnode_pager_setsize), _before_ sending FUSE_READ. It does all of that without ever unlocking the vnode. If I use vn_delay_setsize, then the pager won't be updated before the FUSE_READ returns, and bread will be operating on an out-of-date pager. To handle this situation, should I keep the lock upgrade code? Or is it ok for bread to operate on an out-of-date pager?

Am I understand you right: after entering VOP_READ() for fusefs, and having the vnode shared-locked, VOP_READ() might discover that the attributes changed and e.g. the node size increased? So it has to set delayed setsize, but needs to read some block after the current EOF? If yes, this is fine. For instance, nfs operates similarly.

The vnode pager size is effectively used only for one purpose: it prevents the page fault from mapping pages beyond the EOF. For instance, UFS uses pages with negative (but effectively, very high positive) page indexes as the storage for indirect blocks content. And the pager check for EOF exactly prevents mapping of the metadata from the indirect blocks into userspace.

So I believe it should work fine for fusefs.

  • Respond to markj's comments
  • Assert vnode lock before checking lock type
  • Use vn_delay_setsize to avoid lock upgrade in fuse_vnode_setsize
In D55230#1276682, @kib wrote:

Am I understand you right: after entering VOP_READ() for fusefs, and having the vnode shared-locked, VOP_READ() might discover that the attributes changed and e.g. the node size increased? So it has to set delayed setsize, but needs to read some block after the current EOF? If yes, this is fine. For instance, nfs operates similarly.

The vnode pager size is effectively used only for one purpose: it prevents the page fault from mapping pages beyond the EOF. For instance, UFS uses pages with negative (but effectively, very high positive) page indexes as the storage for indirect blocks content. And the pager check for EOF exactly prevents mapping of the metadata from the indirect blocks into userspace.

So I believe it should work fine for fusefs.

Ok, that's a relief. I've made the requested change, and all existing tests (and one new one) pass. Are there any other tricky requirements for vn_delayed_setsize that I should be aware of? An earlier version of this change used vn_delayed_setsize unconditionally, even if there was already an exclusive vnode lock. But that led to vm_fault_lookup: fault on nofault entry panics that I didn't fully understand.

In D55230#1276682, @kib wrote:

Am I understand you right: after entering VOP_READ() for fusefs, and having the vnode shared-locked, VOP_READ() might discover that the attributes changed and e.g. the node size increased? So it has to set delayed setsize, but needs to read some block after the current EOF? If yes, this is fine. For instance, nfs operates similarly.

The vnode pager size is effectively used only for one purpose: it prevents the page fault from mapping pages beyond the EOF. For instance, UFS uses pages with negative (but effectively, very high positive) page indexes as the storage for indirect blocks content. And the pager check for EOF exactly prevents mapping of the metadata from the indirect blocks into userspace.

So I believe it should work fine for fusefs.

Ok, that's a relief. I've made the requested change, and all existing tests (and one new one) pass. Are there any other tricky requirements for vn_delayed_setsize that I should be aware of? An earlier version of this change used vn_delayed_setsize unconditionally, even if there was already an exclusive vnode lock. But that led to vm_fault_lookup: fault on nofault entry panics that I didn't fully understand.

It is impossible to guess. At least, provide the full kernel log including the panic and backtrace, from the issue.

sys/fs/fuse/fuse_node.h
108
115
sys/fs/fuse/fuse_vnops.c
3283

This is excessive, the assert is provided by VFS

  • Remove an excessive vnode lock assertion
This revision is now accepted and ready to land.Thu, Mar 12, 2:27 PM
This revision was automatically updated to reflect the committed changes.