This fixes the issue described in the comment 16 of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264196#c16
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 65348 Build 62231: arc lint + arc unit
Event Timeline
Interesting, although it's still unclear what's going on. Perhaps the error should be reset upon some condition, like in commit 0c38e3dbbf6e where a similar pattern could be seen?
sys/kern/vfs_cluster.c | ||
---|---|---|
264 | Why is VOP_BMAP failing to begin with? |
sys/kern/vfs_cluster.c | ||
---|---|---|
264 | This is in the context of a FUSE implementation of NTFS (filesystems/ntfs port). I haven't debugged this problem yet, but this seems to be an orthogonal to this one? |
sys/kern/vfs_cluster.c | ||
---|---|---|
264 | There is a VOP returning an error, we don't know why, but the solution must be to silence all errors? I don't think this is the right approach. Your patch may be right in the end, but to reach that conclusion I think it is necessary to understand the root cause of the error. |
sys/kern/vfs_cluster.c | ||
---|---|---|
264 | As far as I understand, this VOP_BMAP call happens in the read-ahead code path. The cluster_read() does not treat read-ahead failures as fatal, because the request to read something is fulfilled anyways and data can be returned to the caller. The fact that hiding this error fixes reading via dd adds to this argument. |
sys/kern/vfs_cluster.c | ||
---|---|---|
264 | I understand, but we are not hiding just "this error" but all errors, and the code has worked this way for over 20 years. So I think it is reasonable to understand the root cause better before changing anything. |
sys/kern/vfs_cluster.c | ||
---|---|---|
264 | Ok, I'll dive into ntfs-3g, but as a last argument - the error coming from VOP_BMAP is already ignored if we hit the if (reqbp) case. |
sys/kern/vfs_cluster.c | ||
---|---|---|
264 | I am not saying the patch is wrong or that there are no bugs in the current version of cluster_read(). I am saying that the justification for the patch is not complete and doesn't meet the de facto standard for changing such code. |
I have tested this patch and confirm that it works. I've also written a minimal reproduction case, which I'll post as a separate review. And while I understand Mark's concern, I tend to agree with arrowd's approach. BMAP should be advisory, and readahead should be optional, so errors in neither should cause the main read to fail.
But I would like to see a better commit message. It should contain enough information to describe the problem without resorting to Bugzilla, since Bugzilla might not be around forever.
I did some research on the NTFS side and found some seemingly useful info:
- The bmap failure happens here: https://github.com/tuxera/ntfs-3g/blob/75dcdc2cf37478fad6c0e3427403d198b554951d/src/ntfs-3g.c#L2957
- blocksize = 65536 and ctx->vol->cluster_size = 4096
- This value for blocksize comes from https://github.com/freebsd/freebsd-src/blob/f261b63307fca34f27e4d12384d19cb543b4867a/sys/fs/fuse/fuse_vnops.c#L762C20-L762C27
- ... and is basically equal to sysctl vfs.maxbcachebuf, filled in here: https://github.com/freebsd/freebsd-src/blob/f261b63307fca34f27e4d12384d19cb543b4867a/sys/fs/fuse/fuse_vfsops.c#L444
I don't know if it is correct for our FUSE implementation to set blocksize based on this sysctl. Does this info ring a bell for someone?
I have no other idea how to move this forward from my side. Could someone please help me pushing this?
@arrowd I agree with your change. The only thing I wanted different is a more detailed commit message. Of course, Mark might disagree.
What about fbi->blocksize setting in https://github.com/freebsd/freebsd-src/blob/f261b63307fca34f27e4d12384d19cb543b4867a/sys/fs/fuse/fuse_vnops.c#L762C20-L762C27 ? Is correct to use such large block size?
Frankly, I'm also generally uncomfortable about changing the code which worked for 20 years without fully understanding the kitchen behind.
So I'm with Mark here. Even if the unconditional error silencing inside the loop here is indeed correct, we still need to know exactly what's going on to be able to write that aforementioned detailed commit message. :-)
code which worked for 20 years
How do you tell it is working? It might be that this code path was never triggered by anything but fusefs-ntfs with UBLIO disabled, which I think is quite possible. FUSE may work as a mocking framework for VOP_* operations - for testing purposes one can make any operation fail in arbitrary matter and the caller code should correctly handle the failure.
without fully understanding the kitchen behind.
I already provided my analysis in the inline comment https://reviews.freebsd.org/D51254#1171703 Unless I'm wrong with it, we now do understand what's going on and the proposed change in correct on its own, even without taking fusefs-ntfs into the context.
A bit of more research: this is what Linux does
https://github.com/torvalds/linux/blob/76eeb9b8de9880ca38696b2fb56ac45ac0a25c6c/fs/fuse/file.c#L2516
https://github.com/torvalds/linux/blob/f777d1112ee597d7f7dd3ca232220873a34ad0c8/fs/fuse/inode.c#L1825
It does not use a fixed value for blocksize for BMAP calls, but instead consults the s_blocksize field of struct superblock. This field is seemingly filled with PAGE_SIZE that is equal to 4096 and that matches what ntfs-3g expects.
I'm now certain that we should change https://github.com/freebsd/freebsd-src/blob/f261b63307fca34f27e4d12384d19cb543b4867a/sys/fs/fuse/fuse_vfsops.c#L444 to something else, but I still have no idea to what.
We already ignore VOP_BMAP() error when we handle the requested buffer itself, looking how many contiguous blocks we can read with it, see line 207. Then, ignoring the error and not accumulating more read-ahead buffers is even more ok, From this PoV I do not think that patch deserves such hot discussion, it can be argued that it adds the consistency, and more important case already does exactly this.
For UFS or any other in-tree local fs (msdosfs, cd9660, udf) it is practically impossible to get a failure from VOP_BMAP(), unless the fs is corrupted or the driver is failing. So I do not think it would really affect UFS.