vfs_cluster.c: Do not propagate VOP_BMAP errors to the caller
Closed · Public

Authored by arrowd on Jul 11 2025, 7:54 AM.

Details

Reviewers: asomers, kib, markj
Group Reviewers: Contributor Reviews (src)
Commits: rG62aef3f73f38: vfs_cluster.c: Do not propagate VOP_BMAP errors to the caller

Summary

The code that makes this VOP_BMAP call tries to perform a read-ahead I/O
operation. Failing to do that for any reason isn't fatal for cluster_read(),
because we can still return some data to the caller. This is consistent with
other places within cluster_read() where an error returned by VOP_BMAP is not
returned to the caller: see the if (nblks > 1) block above the changed lines
and the if (reqbp) block at the end of the function.

PR: 264196
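
The behaviour described in the summary can be sketched in userspace C. This is an illustration only, not the kernel code: `fake_bmap()`, `read_blocks()` and the `propagate` flag are hypothetical names, and the block at which mapping starts to fail is arbitrary. The requested block is always read; a mapping failure during read-ahead either fails the whole call (the behaviour before the change) or just ends read-ahead (the behaviour after it).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VOP_BMAP(): mapping fails from block 3 onward. */
int
fake_bmap(long lblkno, long *pblkno)
{
	if (lblkno >= 3)
		return (EINVAL);
	*pblkno = lblkno + 1000;	/* arbitrary logical-to-physical offset */
	return (0);
}

/*
 * Read block `lblkno` and try to read ahead up to `maxra` further blocks.
 * `*nread` receives the number of blocks read, including the requested one.
 * With `propagate` set, a read-ahead mapping error fails the whole call;
 * otherwise it only stops read-ahead and the call still succeeds.
 */
int
read_blocks(long lblkno, int maxra, bool propagate, int *nread)
{
	long pblkno;
	int error, i;

	assert(nread != NULL);
	*nread = 1;		/* the requested block is read unconditionally */

	for (i = 1; i <= maxra; i++) {
		error = fake_bmap(lblkno + i, &pblkno);
		if (error != 0) {
			if (propagate)
				return (error);
			break;	/* read-ahead is optional: stop quietly */
		}
		(*nread)++;
	}
	return (0);
}
```

Calling `read_blocks(0, 4, true, &n)` fails with `EINVAL` even though block 0 was read, while `read_blocks(0, 4, false, &n)` succeeds and reports the blocks that could be mapped.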

Diff Detail

Repository: rG FreeBSD src repository
Lint: Lint Passed
Unit: No Test Coverage
Build Status: Buildable 65348, Build 62231: arc lint + arc unit

Event Timeline

arrowd created this revision. Jul 11 2025, 7:54 AM

Herald added subscribers: olce, imp. Jul 11 2025, 7:54 AM

arrowd requested review of this revision. Jul 11 2025, 7:54 AM

Harbormaster completed remote builds in B65348: Diff 158316. Jul 11 2025, 7:54 AM

Interesting, although it's still unclear what's going on. Perhaps the error should be reset upon some condition, like in commit 0c38e3dbbf6e where a similar pattern could be seen?

markj added a subscriber: markj. Jul 14 2025, 3:19 PM

markj added inline comments.

sys/kern/vfs_cluster.c
264	Why is VOP_BMAP failing to begin with?

arrowd added inline comments. Jul 14 2025, 3:24 PM

sys/kern/vfs_cluster.c
264	This is in the context of a FUSE implementation of NTFS (`filesystems/ntfs` port). I haven't debugged that problem yet, but it seems to be orthogonal to this change?

markj added inline comments. Jul 14 2025, 3:52 PM

sys/kern/vfs_cluster.c
264	There is a VOP returning an error, we don't know why, but the solution must be to silence all errors? I don't think this is the right approach. Your patch may be right in the end, but to reach that conclusion I think it is necessary to understand the root cause of the error.

arrowd added inline comments. Jul 14 2025, 4:07 PM

sys/kern/vfs_cluster.c
264	As far as I understand, this VOP_BMAP call happens in the read-ahead code path. `cluster_read()` does not treat read-ahead failures as fatal, because the request to read something is fulfilled anyway and data can be returned to the caller. The fact that hiding this error fixes reading via `dd` supports this argument.

markj added inline comments. Jul 14 2025, 4:13 PM

sys/kern/vfs_cluster.c
264	I understand, but we are not hiding just "this error" but all errors, and the code has worked this way for over 20 years. So I think it is reasonable to understand the root cause better before changing anything.

arrowd added inline comments. Jul 14 2025, 4:41 PM

sys/kern/vfs_cluster.c
264	OK, I'll dive into ntfs-3g, but as a last argument: the `error` coming from VOP_BMAP is already ignored if we hit the `if (reqbp)` case.

markj added inline comments. Jul 14 2025, 5:01 PM

sys/kern/vfs_cluster.c
264	I am not saying the patch is wrong or that there are no bugs in the current version of cluster_read(). I am saying that the justification for the patch is not complete and doesn't meet the de facto standard for changing such code.

I have tested this patch and confirm that it works. I've also written a minimal reproduction case, which I'll post as a separate review. And while I understand Mark's concern, I tend to agree with arrowd's approach. BMAP should be advisory, and readahead should be optional, so errors in either should not cause the main read to fail.
But I would like to see a better commit message. It should contain enough information to describe the problem without resorting to Bugzilla, since Bugzilla might not be around forever.

This revision now requires changes to proceed. Jul 14 2025, 11:28 PM

asomers mentioned this in D51316: fusefs: add a regression test for a cluster_read bug. Jul 14 2025, 11:34 PM

The regression test is at D51316.

I did some research on the NTFS side and found some seemingly useful info:

The bmap failure happens here: https://github.com/tuxera/ntfs-3g/blob/75dcdc2cf37478fad6c0e3427403d198b554951d/src/ntfs-3g.c#L2957
blocksize = 65536 and ctx->vol->cluster_size = 4096
This value for blocksize comes from https://github.com/freebsd/freebsd-src/blob/f261b63307fca34f27e4d12384d19cb543b4867a/sys/fs/fuse/fuse_vnops.c#L762C20-L762C27
... and is basically equal to sysctl vfs.maxbcachebuf, filled in here: https://github.com/freebsd/freebsd-src/blob/f261b63307fca34f27e4d12384d19cb543b4867a/sys/fs/fuse/fuse_vfsops.c#L444

I don't know if it is correct for our FUSE implementation to set blocksize based on this sysctl. Does this info ring a bell for someone?
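
To make the mismatch concrete, here is a minimal sketch of the kind of check that would reject such a request. It assumes, as the numbers above suggest, that the FUSE server refuses a BMAP whose block size is larger than the volume's cluster size. `sketch_bmap_check()` is a hypothetical name and the body is not copied from ntfs-3g.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the server-side check: a BMAP request is refused
 * when the caller's block size exceeds the volume's cluster size.
 * (Assumption based on the values quoted above, not the ntfs-3g source.)
 */
int
sketch_bmap_check(size_t blocksize, size_t cluster_size)
{
	assert(cluster_size != 0);
	if (blocksize > cluster_size)
		return (-EINVAL);	/* FUSE handlers return a negative errno */
	return (0);
}
```

Under that assumption, `sketch_bmap_check(65536, 4096)` fails with `-EINVAL`, whereas a request with a block size of 4096 would pass.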

I have no other ideas on how to move this forward from my side. Could someone please help me push this?

@arrowd I agree with your change. The only thing I wanted different is a more detailed commit message. Of course, Mark might disagree.

What about the fbi->blocksize setting in https://github.com/freebsd/freebsd-src/blob/f261b63307fca34f27e4d12384d19cb543b4867a/sys/fs/fuse/fuse_vnops.c#L762C20-L762C27 ? Is it correct to use such a large block size?

@asomers wrote:

The only thing I wanted different is a more detailed commit message. Of course, Mark might disagree.

Frankly, I'm also generally uncomfortable about changing the code which worked for 20 years without fully understanding the kitchen behind.

@markj wrote:

I don't think this is the right approach. Your patch may be right in the end, but to reach that conclusion I think it is necessary to understand the root cause of the error.

So I'm with Mark here. Even if the unconditional error silencing inside the loop here is indeed correct, we still need to know exactly what's going on to be able to write that aforementioned detailed commit message. :-)

code which worked for 20 years

How can you tell it is working? It might be that this code path was never triggered by anything but fusefs-ntfs with UBLIO disabled, which I think is quite possible. FUSE can serve as a mocking framework for VOP_* operations: for testing purposes one can make any operation fail in an arbitrary manner, and the calling code should handle the failure correctly.

without fully understanding the kitchen behind.

I already provided my analysis in the inline comment https://reviews.freebsd.org/D51254#1171703. Unless I'm wrong about it, we now do understand what's going on, and the proposed change is correct on its own, even without taking fusefs-ntfs into account.

A bit more research: this is what Linux does:

https://github.com/torvalds/linux/blob/76eeb9b8de9880ca38696b2fb56ac45ac0a25c6c/fs/fuse/file.c#L2516
https://github.com/torvalds/linux/blob/f777d1112ee597d7f7dd3ca232220873a34ad0c8/fs/fuse/inode.c#L1825

It does not use a fixed value for blocksize for BMAP calls, but instead consults the s_blocksize field of struct super_block. This field is seemingly set to PAGE_SIZE, which equals 4096 and matches what ntfs-3g expects.

I'm now certain that we should change https://github.com/freebsd/freebsd-src/blob/f261b63307fca34f27e4d12384d19cb543b4867a/sys/fs/fuse/fuse_vfsops.c#L444 to something else, but I still have no idea to what.

We already ignore the VOP_BMAP() error when handling the requested buffer itself, where we check how many contiguous blocks can be read with it; see line 207. Ignoring the error and simply not accumulating more read-ahead buffers is then even more acceptable. From this point of view I do not think the patch deserves such a heated discussion: it can be argued that it adds consistency, and the more important case already does exactly this.

For UFS or any other in-tree local fs (msdosfs, cd9660, udf) it is practically impossible to get a failure from VOP_BMAP(), unless the fs is corrupted or the driver is failing. So I do not think it would really affect UFS.

Looking at the ntfs code, BMAP is failing because fusefs is calling the fuse BMAP operation with blocksize == maxiosize, and that'll generally be much larger than the NTFS cluster size. I'm not sure why fusefs passes that instead of the block size (or why the BMAP operation would take the blocksize as a parameter to begin with, is the block size not fixed by the client filesystem?), but it makes sense that the filesystem driver would return an error in that case.

And yes, it's ok in general for VOP_BMAP to fail. Before, I was concerned that VOP_BMAP errors would cause vnode_pager_generic_getpages() to misbehave, but I think fusefs doesn't use it.

So I have no objection to the change, but yes the commit log message should be more detailed.

sys/kern/vfs_cluster.c
262	Better would be to just write `(void)VOP_BMAP(...);`.

In D51254#1215055, @markj wrote:

why the BMAP operation would take the blocksize as a parameter to begin with, is the block size not fixed by the client filesystem?

For instance, UFS has fragments, which already means that block size is varying.

Another case where a varying block size would matter (much more significantly than with fragments) is extent-based filesystems. We do not have any right now, but if we did, the existing bmap interface would fit.

In D51254#1215182, @kib wrote:

In D51254#1215055, @markj wrote:

why the BMAP operation would take the blocksize as a parameter to begin with, is the block size not fixed by the client filesystem?

For instance, UFS has fragments, which already means that block size is varying.

Another case where a varying block size would matter (much more significantly than with fragments) is extent-based filesystems. We do not have any right now, but if we did, the existing bmap interface would fit.

And in fact NTFS is extent-based, so this probably explains why the maximum block size is passed. It would be more correct to pass the current extent size, which the cluster code cannot do.

But anyway, everything that vfs_cluster does is optional and we only must ensure that the very first buffer of the cluster is validated.

Updated the commit message; waiting for approval.

sys/kern/vfs_cluster.c
262	We still need to break out of the loop here in case of error.
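
The two inline suggestions on line 262 are compatible: the error need not be returned, but the loop presumably still has to stop once a mapping fails, since the output block number cannot be trusted after an error. A minimal sketch of the difference, using a hypothetical `demo_bmap()` in place of VOP_BMAP and simply counting how many read-ahead blocks would be queued (all names here are illustrative, not from vfs_cluster.c):

```c
#include <errno.h>

#define DEMO_UNMAPPED	(-1L)

/* Hypothetical mapper: only logical blocks 0..2 can be mapped. */
int
demo_bmap(long lblkno, long *pblkno)
{
	if (lblkno > 2)
		return (EINVAL);	/* *pblkno is left untouched on error */
	*pblkno = 500 + lblkno;
	return (0);
}

/* Variant A: the error is discarded and the loop keeps going. */
int
queue_ignoring(long start, int count)
{
	long pblkno = DEMO_UNMAPPED;
	int i, queued = 0;

	for (i = 0; i < count; i++) {
		(void)demo_bmap(start + i, &pblkno);
		queued++;	/* may queue I/O against a stale pblkno */
	}
	return (queued);
}

/* Variant B: the error is not returned, but it still ends the loop. */
int
queue_breaking(long start, int count)
{
	long pblkno;
	int i, queued = 0;

	for (i = 0; i < count; i++) {
		if (demo_bmap(start + i, &pblkno) != 0)
			break;
		queued++;
	}
	return (queued);
}
```

Variant A queues every block it was asked for, including those that could not be mapped; variant B stops at the first unmapped block while still leaving the caller unaware of the failure.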

kib accepted this revision. Tue, Oct 21, 1:33 PM

markj accepted this revision as: markj. Wed, Oct 22, 12:40 PM

This revision was not accepted when it landed; it landed in state Needs Revision. Wed, Oct 22, 5:00 PM

Closed by commit rG62aef3f73f38: vfs_cluster.c: Do not propagate VOP_BMAP errors to the caller (authored by arrowd).

This revision was automatically updated to reflect the committed changes.

arrowd added a commit: rG62aef3f73f38: vfs_cluster.c: Do not propagate VOP_BMAP errors to the caller.

arrowd added a comment. Thu, Oct 23, 7:11 AM

This comment was removed by arrowd.

Revision Contents

Path: sys/kern/vfs_cluster.c
Size: 4 lines

Diff 158316
