Page MenuHomeFreeBSD

fuse: Do not unconditionally overwrite local vnode size with remote
Needs ReviewPublic

Authored by emil_etsalapatis.com on Mon, Feb 2, 12:24 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Feb 25, 3:25 AM
Unknown Object (File)
Sat, Feb 21, 1:19 AM
Unknown Object (File)
Wed, Feb 18, 2:29 AM
Unknown Object (File)
Mon, Feb 16, 6:25 PM
Unknown Object (File)
Sat, Feb 14, 1:25 PM
Unknown Object (File)
Tue, Feb 3, 12:41 AM
Unknown Object (File)
Mon, Feb 2, 9:41 PM
Unknown Object (File)
Mon, Feb 2, 3:06 PM
Subscribers

Details

Reviewers
asomers
Summary

The code in fuse_internal_cache_attrs() assumes that the size reported
by a FUSE_GETATTR request is always canonical. That is not accurate -
it is possible the file has had its FN_SIZECHANGE set during the
getattr call due to a concurrent write. However, the code currently
still considers the server's response canonical and overwrite the size
of the file with the one reported by the server, most likely erasing
data due to file truncation. Adjust the code to consider whether the
file has its FN_SIZECHANGE flag set, and if so use the local cached
size instead.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 70334
Build 67217: arc lint + arc unit

Event Timeline

Interesting find. I'll review this next week, after I'm home from vacation.

Thank you! For context, this diff along with the elock issue we discussed on D55046 are the main blockers for getting virtiofs working (D46295 + D46296). This issue from this diff specifically causes constant truncations when using metadata caching from the virtiofs (and assume FUSE) server. This looks like corruption from a user's point of view. I actually think that it'd be better if we removed the warnings altogether because they trigger spuriously and so are not helpful as diagnostics.

This looks like a real problem. But how did you find it? Was it just virtio-fs, or did you find it with a regular fuse file system too? If either case, do you know if FUSE_ASYNC_READ was set? And do you know what values are typically used for the entry and attribute cache?

The odd thing here is that VOP_WRITE should always be called with the vnode exclusively locked. So it shouldn't be possible for it to race with a separate operation, like VOP_READ, that might call fuse_internal_cache_attrs . Do you have a stack trace for either thread involved with this race?

This looks like a real problem. But how did you find it? Was it just virtio-fs, or did you find it with a regular fuse file system too? If either case, do you know if FUSE_ASYNC_READ was set? And do you know what values are typically used for the entry and attribute cache?

This was with virtiofs, yes. The feature that seems to trigger this is metadata caching: I can reliably get it to happen when I turn it on. The default values are 86400 for those, but I don't think the number matters, the race seems pretty fast (see below).

I can reliably trigger this by cloning a repo into the mount (Redis , but any repo will probably do it). What seems to happen is that a getattr() call returning from the host is stale and causes fuse_vnode_setsize to truncate writes that happened in blocks past the reported value. I don't turn on FSESS_ASYNC_READ for virtiofs, so it's not used.

The odd thing here is that VOP_WRITE should always be called with the vnode exclusively locked. So it shouldn't be possible for it to race with a separate operation, like VOP_READ, that might call fuse_internal_cache_attrs . Do you have a stack trace for either thread involved with this race?

I can try getting one, I'll post it here. I can also get a trace of the FUSE tickets from the server. Anecdotally, when the bug triggered for me I would always see a getattr with the stale size from the server side.