Details

Reviewers

kib
markj
vishwin

Commits

rG6a1ebd14aa96: fusefs: redo vnode attribute locking
rG7e68af7ce2c1: fusefs: redo vnode attribute locking

Summary

Previously most fields in fuse_vnode_data were protected by the vnode
lock. But because DEBUG_VFS_LOCKS was never enabled by default until
stable/15 the assertions were never checked, and many were wrong.
Others were missing. This led to crashes in stable/15 and 16.0-CURRENT,
when a vnode was expected to be exclusively locked but wasn't, for fuse
file systems that mount with "-o async".

In some places it isn't possible to upgrade the vnode lock to exclusive
when accessing these fields. So protect them with a new mutex instead.
This fixes crashes and unprotected field accesses in VOP_READ,
VOP_COPY_FILE_RANGE, VOP_GETATTR, VOP_BMAP, and FUSE_NOTIFY_INVAL_ENTRY.
Add assertions everywhere the protected fields are accessed.

Lock the vnode exclusively when handling FUSE_NOTIFY_INVAL_INODE.

Upgrade the vnode lock, if necessary, during fuse_vnode_setsize, to fix
crashes during VOP_READ or VOP_GETATTR.

Also, ensure that fuse_vnop_rename locks the from vnode.

Finally, reorder elements in struct fuse_vnode_data to reduce the
structure size.

Fixes: 283391
Reported by: kargl, markj, vishwin, Abdelkader Boudih, groenveld@acm.org
MFC after: 2 weeks
Sponsored by: ConnectWise

Test Plan

Test cases added

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 71346
Build 68229: arc lint + arc unit

Event Timeline

asomers created this revision.Feb 10 2026, 8:27 PM

Herald added a subscriber: imp. · View Herald TranscriptFeb 10 2026, 8:27 PM

asomers requested review of this revision.Feb 10 2026, 8:27 PM

Harbormaster completed remote builds in B70591: Diff 171690.Feb 10 2026, 8:27 PM

asomers mentioned this in D55145: fusefs: Fix vnode lock mismatch in fuse_internal_do_getattr().Feb 10 2026, 9:46 PM

markj added inline comments.Feb 10 2026, 10:09 PM

sys/fs/fuse/fuse_node.c
161	Indentation looks off here.
456	Isn't it really fusefs that requires the exclusive lock for those calls? The mentioned functions themselves do not require the vnode lock if I understand correctly. If so it might be better to explain exactly why the exclusive lock is needed.
544–554	This should be on the previous line.
sys/fs/fuse/fuse_node.h
103	I think these macros need the do-while treatment, otherwise you will get incorrect behaviour (or a compiler error?) from a construct like if (foo) CACHED_ATTR_LOCK(vp); else ...
194	Use VNASSERT?
sys/fs/fuse/fuse_vnops.c
2302	Should this say "avoiding the LOR"?

asomers marked 3 inline comments as done.Feb 10 2026, 11:28 PM

asomers added inline comments.

sys/fs/fuse/fuse_node.c
456	vnode_pager_setsize does require it; see vm/vnode_pager.c line 511. And vtruncbuf calls vnode_pager_setsize.
sys/fs/fuse/fuse_node.h
103	Good catch.

Respond to markj's comments

Harbormaster completed remote builds in B70614: Diff 171715.Feb 10 2026, 11:28 PM

kib added inline comments.Feb 11 2026, 3:21 AM

sys/fs/fuse/fuse_file.c
220	This is perhaps generic question about fuse, triggered by the addition of the assert of the vnode lock at the start of the function. fdisp_wait_answ() does an unbound sleep while owning the vnode lock. This is not right. Essentially, this is a recipe for lock cascade, which might affect the daemon that must reply to kernel itself, deadlocking the whole machine.
sys/fs/fuse/fuse_node.c
451	fuse_vnode_setsize() sometimes called when other vnode is locked,e.g. in fuse_vnop_copy_file_range(). Dropping and re-acquiring the lock means that you create LoR.
sys/fs/fuse/fuse_node.h
102	I think you want an assert that the vnode is locked, before checking VOP_ISLOCKED().
194	Similarly, vp lock should be asserted before this.

asomers marked 2 inline comments as done.Feb 11 2026, 1:40 PM

asomers added inline comments.

sys/fs/fuse/fuse_file.c
220	It's not actually unbounded. It's limited by `daemon_timeout`, which is 60s by default, but can be changed by a mount option.
sys/fs/fuse/fuse_node.c
451	For copy_file_range, the vnode that we will call fuse_vnode_setsize on will always be exclusively locked. So fuse_vnode_setsize won't need to upgrade the lock. So in that case, there's no risk of LOR. The only times that it will need to upgrade the lock are during `VOP_GETATTR`, `VOP_ACCESS`, `VOP_ADVLOCK`, `VOP_GETEXTATTR`, `VOP_SETEXTATTR`, `VOP_LISTEXTATTR`, `VOP_DELETEEXTATTR`, `VOP_CLOSE` (if updating atime), `VOP_LOOKUP` (when removing a file, if the parent's sticky bit is set), and `VOP_VPTOFH`. I should double check VOP_CLOSE. It's man page doesn't say that the vnode will be locked. And I'll double check `VOP_L0OKUP` when removing with a sticky parent.

kib added inline comments.Feb 11 2026, 1:49 PM

sys/fs/fuse/fuse_file.c
220	System getting stuck for 1 minute does not sound good.
sys/fs/fuse/fuse_node.c
451	For VOP_LOOKUP() for DELETE, both parent and target must be exclusively locked. I am not sure why you listed VOP_CLOSE() as potential trouble point. Still, such relock is very fragile. Could you cache the new size, and then have a fusefs-specific VOP_LOCK() which sets the size on the next lock? I did that for nfsclient to avoid these issues in principle.

asomers added inline comments.Feb 11 2026, 2:45 PM

sys/fs/fuse/fuse_file.c
220	In practice it's not that bad. It usually only happens if the daemon crashes or hangs. And in that case, you can always hit the mountpoint with a "umount -f". I'm imagining now what fusefs would look like it we never held a vnode lock while blocking. It sounds hard. Are you imagining, for example, that fuse_vnop_setattr would release the vnode lock before sending the request to the daemon, and reacquire it on return? If so, then multiple chmod calls could stack up within fusefs. And whether a sequence of chmod calls all succeed could depend on how fast the daemon processes them. The first call might change file permissions in a way that should block the second call. But if the second call passes its access checks before the daemon responds to the first call, then we could end up sending the second to the daemon anyway.
sys/fs/fuse/fuse_node.c
451	I mentioned `VOP_CLOSE` because the man page doesn't specify the locking type. Looking deeper, I see that it will always be LK_EXCLUSIVE since we don't set MNTK_EXTENDED_SHARED. I could try the VOP_LOCK trick. There might be some tricky issues there. Do you require me to do it before merging this PR, or can I do it afterwards?

kib added inline comments.Feb 12 2026, 6:48 AM

sys/fs/fuse/fuse_file.c
220	Does unmount -f able to unblock threads owning the fusefs vnode locks? For NFS client, it does, but it requires a lot of specific features in the fs. Also, this is the reason why we have the rule 'vnode might be reclaimed any time the lock is not owned'.
sys/fs/fuse/fuse_node.c
451	Sure I do not require anything. It is just technically better path, IMO. The current patch is improvement, so it makes no sense to postpone if you plan to work on other scheme some time later.

asomers added inline comments.Feb 12 2026, 1:31 PM

sys/fs/fuse/fuse_file.c
220	`unmount -f` will unmount the file system immediately, without waiting for the server to respond to anything. But by itself it doesn't unblock any threads. If a thread is blocked in `fdisp_wait_answ` but the operation hasn't yet been sent to userland, then it can be interrupted by a signal. If a thread is blocked in `fdisp_wait_answ` and its operation has been sent to userland, then it can be interrupted only with the cooperation of the server. Some servers support that feature and some don't. If they don't, then there's no way to interrupt the operation before the 60s timeout.

kib added inline comments.Feb 12 2026, 3:10 PM

sys/fs/fuse/fuse_file.c
220	`unmount -f` will unmount the file system immediately, without waiting for the server to respond to anything. But by itself it doesn't unblock any threads. From my reading of the code, it does not. The fuse' VFS_UNMOUNT() correctly starts with the vflush() which reclaims each vnode on the mp list. To reclaim, vflush() locks the vnode exclusively, that would just block if there is the locked vnode by a thread waiting for the userspace reply.

asomers added inline comments.Feb 12 2026, 5:55 PM

sys/fs/fuse/fuse_file.c
220	Right. fuse_vfsop_unmount may block when trying to flush vnodes, or when waiting on the FUSE_DESTROY operation. But the file system is already unmounted by that point; other processes can no longer access it.

kib added inline comments.Feb 12 2026, 7:09 PM

sys/fs/fuse/fuse_file.c
220	Other processes might have already blocked on the fuse vnodes. Also, while the fs is unmounted, its covered vnode is exclusively locked (this is how the unmounted fs is made inaccessible while being unmounted). All of them are happy to participate in the lock cascade.

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

In D55230#1269768, @asomers wrote:

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

We talked about avoiding lock upgrade. I am fine with that done after this change is committed.

vishwin added inline comments.Feb 25 2026, 8:20 PM

sys/fs/fuse/fuse_vnops.c
2316–2317	After D55231 landed, this is now a rebase conflict, unless other changes are coming.

In D55230#1269852, @kib wrote:

In D55230#1269768, @asomers wrote:

@kib is there anything else you'd like me to do for this PR, short of rewriting everything so the vnode lock isn't held while waiting for the daemon?

We talked about avoiding lock upgrade. I am fine with that done after this change is committed.

D55595 should help with the vnode_pager_setsize(), in particular, it should allow to avoid the lock upgrade.

In D55230#1271915, @kib wrote:

D55595 should help with the vnode_pager_setsize(), in particular, it should allow to avoid the lock upgrade.

I think I can use vn_delay_setsize, except in one case: sometimes fusefs's attribute cache is expired before VOP_READ begins. So it needs to fetch attributes from the server with FUSE_GETATTR _before_ it can send FUSE_READ. If the new attributes indicate that the file's size has changed, then it needs to update the file's size (which includes calling vnode_pager_setsize), _before_ sending FUSE_READ. It does all of that without ever unlocking the vnode. If I use vn_delay_setsize, then the pager won't be updated before the FUSE_READ returns, and bread will be operating on an out-of-date pager. To handle this situation, should I keep the lock upgrade code? Or is it ok for bread to operate on an out-of-date pager?

In D55230#1276542, @asomers wrote:

In D55230#1271915, @kib wrote:

D55595 should help with the vnode_pager_setsize(), in particular, it should allow to avoid the lock upgrade.

I think I can use vn_delay_setsize, except in one case: sometimes fusefs's attribute cache is expired before VOP_READ begins. So it needs to fetch attributes from the server with FUSE_GETATTR _before_ it can send FUSE_READ. If the new attributes indicate that the file's size has changed, then it needs to update the file's size (which includes calling vnode_pager_setsize), _before_ sending FUSE_READ. It does all of that without ever unlocking the vnode. If I use vn_delay_setsize, then the pager won't be updated before the FUSE_READ returns, and bread will be operating on an out-of-date pager. To handle this situation, should I keep the lock upgrade code? Or is it ok for bread to operate on an out-of-date pager?

Am I understand you right: after entering VOP_READ() for fusefs, and having the vnode shared-locked, VOP_READ() might discover that the attributes changed and e.g. the node size increased? So it has to set delayed setsize, but needs to read some block after the current EOF? If yes, this is fine. For instance, nfs operates similarly.

The vnode pager size is effectively used only for one purpose: it prevents the page fault from mapping pages beyond the EOF. For instance, UFS uses pages with negative (but effectively, very high positive) page indexes as the storage for indirect blocks content. And the pager check for EOF exactly prevents mapping of the metadata from the indirect blocks into userspace.

So I believe it should work fine for fusefs.

Respond to markj's comments
Assert vnode lock before checking lock type
Use vn_delay_setsize to avoid lock upgrade in fuse_vnode_setsize

Harbormaster completed remote builds in B71346: Diff 173581.Wed, Mar 11, 4:54 PM

In D55230#1276682, @kib wrote:

Am I understand you right: after entering VOP_READ() for fusefs, and having the vnode shared-locked, VOP_READ() might discover that the attributes changed and e.g. the node size increased? So it has to set delayed setsize, but needs to read some block after the current EOF? If yes, this is fine. For instance, nfs operates similarly.

The vnode pager size is effectively used only for one purpose: it prevents the page fault from mapping pages beyond the EOF. For instance, UFS uses pages with negative (but effectively, very high positive) page indexes as the storage for indirect blocks content. And the pager check for EOF exactly prevents mapping of the metadata from the indirect blocks into userspace.

So I believe it should work fine for fusefs.

Ok, that's a relief. I've made the requested change, and all existing tests (and one new one) pass. Are there any other tricky requirements for vn_delayed_setsize that I should be aware of? An earlier version of this change used vn_delayed_setsize unconditionally, even if there was already an exclusive vnode lock. But that led to vm_fault_lookup: fault on nofault entry panics that I didn't fully understand.

In D55230#1276905, @asomers wrote:

In D55230#1276682, @kib wrote:

Am I understand you right: after entering VOP_READ() for fusefs, and having the vnode shared-locked, VOP_READ() might discover that the attributes changed and e.g. the node size increased? So it has to set delayed setsize, but needs to read some block after the current EOF? If yes, this is fine. For instance, nfs operates similarly.

The vnode pager size is effectively used only for one purpose: it prevents the page fault from mapping pages beyond the EOF. For instance, UFS uses pages with negative (but effectively, very high positive) page indexes as the storage for indirect blocks content. And the pager check for EOF exactly prevents mapping of the metadata from the indirect blocks into userspace.

So I believe it should work fine for fusefs.

Ok, that's a relief. I've made the requested change, and all existing tests (and one new one) pass. Are there any other tricky requirements for vn_delayed_setsize that I should be aware of? An earlier version of this change used vn_delayed_setsize unconditionally, even if there was already an exclusive vnode lock. But that led to vm_fault_lookup: fault on nofault entry panics that I didn't fully understand.

It is impossible to guess. At least, provide the full kernel log including the panic and backtrace, from the issue.

sys/fs/fuse/fuse_node.h
108
115
sys/fs/fuse/fuse_vnops.c
3283	This is excessive, the assert is provided by VFS

Remove an excessive vnode lock assertion

Harbormaster completed remote builds in B71362: Diff 173604.Thu, Mar 12, 12:30 PM

kib accepted this revision.Thu, Mar 12, 2:27 PM

This revision is now accepted and ready to land.Thu, Mar 12, 2:27 PM

Closed by commit rG7e68af7ce2c1: fusefs: redo vnode attribute locking (authored by asomers). · Explain WhyThu, Mar 12, 4:12 PM

This revision was automatically updated to reflect the committed changes.

asomers added a commit: rG7e68af7ce2c1: fusefs: redo vnode attribute locking.

asomers added a commit: rG6a1ebd14aa96: fusefs: redo vnode attribute locking.Thu, Mar 26, 3:08 PM

fusefs: redo vnode attribute locking
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 173581

sys/fs/fuse/fuse_file.c

sys/fs/fuse/fuse_internal.c

sys/fs/fuse/fuse_io.c

sys/fs/fuse/fuse_node.h

sys/fs/fuse/fuse_node.c

sys/fs/fuse/fuse_vfsops.c

sys/fs/fuse/fuse_vnops.c

tests/sys/fs/fusefs/bmap.cc

tests/sys/fs/fusefs/notify.cc

tests/sys/fs/fusefs/read.cc

tests/sys/fs/fusefs/rename.cc

fusefs: redo vnode attribute lockingClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 173581

sys/fs/fuse/fuse_file.c

sys/fs/fuse/fuse_internal.c

sys/fs/fuse/fuse_io.c

sys/fs/fuse/fuse_node.h

sys/fs/fuse/fuse_node.c

sys/fs/fuse/fuse_vfsops.c

sys/fs/fuse/fuse_vnops.c

tests/sys/fs/fusefs/bmap.cc

tests/sys/fs/fusefs/notify.cc

tests/sys/fs/fusefs/read.cc

tests/sys/fs/fusefs/rename.cc

fusefs: redo vnode attribute locking
ClosedPublic
Actions

Revision Contents
Changeset List