Page MenuHomeFreeBSD

Add v_inval_buf_range, like vtruncbuf but for a range of a file
ClosedPublic

Authored by asomers on Jul 22 2019, 9:00 PM.
Tags
None
Referenced Files
F107144224: D21032.diff
Fri, Jan 10, 7:30 PM
Unknown Object (File)
Mon, Dec 23, 1:41 PM
Unknown Object (File)
Tue, Dec 17, 3:26 PM
Unknown Object (File)
Nov 26 2024, 3:57 AM
Unknown Object (File)
Nov 13 2024, 1:35 PM
Unknown Object (File)
Nov 13 2024, 8:32 AM
Unknown Object (File)
Nov 2 2024, 1:09 AM
Unknown Object (File)
Nov 2 2024, 1:08 AM
Subscribers

Details

Summary

Add v_inval_buf_range, like vtruncbuf but for a range of a file

v_inval_buf_range invalidates all buffers within a certain LBA range of a
file. It will be used by fusefs(5). This commit is a partial merge of
r346162, r346606, and r346756 from projects/fuse2.

Test Plan

Tested by the fusefs test suite in the projects/fuse2 branch.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 25462
Build 24082: arc lint + arc unit

Event Timeline

sys/kern/vfs_subr.c
1970

I think that the vnode must be always excl locked there.

2015

What is the point of passing blksize ? Shouldn't it always equal to the vp' bo block size ?

2027

I think you need exclusive vnode lock always.

2030

There is no point in re-evaluating bo on restart.

2048

I do not understand this at all. You are asserting that there is no dirty buffers outside the range specified ? Why ?

2064

Use bool.

2066

You should assert the locks state there.

2067

for (anyfreed = true; anyfreed;)
or
while (anyfreed)

2070

This actually makes it impossible to invalidate a buffer with INT64_MAX lbn.

2075

return (EAGAIN);

2078

Excessive ().

2087

A lot of excessive ().
Compare bit using != 0 at the line 2087.

2116

Isn't the bo locked at this point ?

I do not think that we ever want a function (not locking primitive) that returns with different lock state depending on some hard-to-decode conditions.

IMO it is easier to consistently return with bo locked.

asomers added inline comments.
sys/kern/vfs_subr.c
1970

It isn't in at least two code paths:

VOP_WRITE // vnode is locked shared
msdosfs_write
deextend
detrunc
VOP_WRITE // vnode is locked shared
ffs_write
ffs_truncate
vtruncbuf
2015

Perhaps. But this is the way that vtruncbuf, getblk, bread, etc already do it. Would you like me to try removing the argument? If so, I'd like to do it in a separate commit.

2030

Ok, I'll change it here and in vtruncbuf.

2048

Good catch! The intent was to ensure that the caller didn't accidentally invalidate any dirty data beyond the byte offsets he specified. That would most likely indicate a programming error in the caller. But I see that expression is incomplete. It should say something like:

(blkstart >= start && blkend <= end) // buffer is wholly invalidated
|| blkstart >= end                                  // buffer lies past the invalidated range
|| blkend < start                                    // buffer lies before the invalidated range
2070

Good catch. I'll make 0 be a flag value for endlbn that means "to EOF"

2087

I don't follow.

2116

lol, you mean that we shouldn't do exactly what BUF_LOCK does? Ok, I'll refactor it.

sys/kern/vfs_subr.c
1970

I highly doubt that either UFS or msdosfs make writes or truncations with shared-locked vnode.

2015

Perhaps add assert that the bo block size is equal to the argument.

2087
if (nbp != NULL &&
     (((nbp->b_xflags & BX_VNCLEAN) == 0 ||
     nbp->b_vp != vp ||
     (nbp->b_flags & B_DELWRI) != 0)) {

`

2116

BUF_LOCK always drops the bo lock if interlocked, otherwise it does not touch it.

asomers marked 2 inline comments as done.

Respond to kib's comments:

  • Move initializtion of struct bufobj *bo out of the restart loop
  • Always return bo locked from v_inval_buf_range1
  • Fix comment block for v_inval_buf_range
  • Fix logic in v_inval_buf_range's excess invalidation check
  • Use bools instead of ints, where applicable
  • ASSERT_VOP_LOCKED in v_inval_buf_range
  • remove unneeded parens
asomers added inline comments.
sys/kern/vfs_subr.c
1970

Ok, it looks like we could upgrade this to an exclusive lock assertion because ufs and msdosfs don't set MNTK_SHARED_WRITES. Only ZFS sets that, and it doesn't call vtruncbuf. I can do that if you like, but I'll do it in a separate commit.

2070

Actually, there's no point. It's impossible for an lbn to have the value INT64_MAX since the minimum blksize is greater than 1 and the maximum offset is INT64_MAX.

sys/kern/vfs_subr.c
1959–1960

Do not use initialization in declaration.

1970

Ok, you may ask Peter Holm to test the upgrade of the assert.

2070

Then perhaps assert that the end blkno is strictly less than INT64_MAX.

2075

LK_INTERLOCK means that bo lock is always dropped.

Did you tested the patch with WITNESS + DEBUG_VFS_LOCKS ?

2108

I still do not understand the assert. You are disallowing any delayed write buffers for the vnode outside the specified range. Why ?

2116

return (0);

2125

Assert bo lock ownership.

2128

You are initializing anyfreed to true, then immediately set it to false. As I understand, it is done to ignite the first iteration. Would it be more natural to use do/while loop instead ?

asomers marked 9 inline comments as done.

Respond to kib's latest comments

sys/kern/vfs_subr.c
2075

you can add e.g. restart_unlocked label before BO_LOCK() at line 1973 and avoid BO_LOCK() call there.

Eliminate one BO_LOCK call, as suggested by kib

asomers added inline comments.
sys/kern/vfs_subr.c
1959–1960

style(9) says that we can initialize in the declaration as long as we don't make any function calls. But I can change it if you like.

1970

ok.

2027

I'll try doing that when I change vtruncbuf's lock assertion, too.

2070

Can't, because vtruncbuf passes INT64_MAX to mean "invalidate everything after startlbn.

2075

I did test with WITNESS and DEBUG_VFS_LOCKS. But I don't have any special trick for hitting this race. I'll add a BO_LOCK here, too.

2108

Not "outside of the specified range", but "partially overlapping with the specified range". The reason is that the caller only wanted to invalidate data within byte offsets [start, end). But since data must be invalidated in units of whole blocks, v_inval_buf_range may invalidate a little bit more than the caller requested. That's ok if the data is cached but clean. However, if the data is dirty then it's a bug! I've personally done that a couple of times already.

2128

Sure. I just did it this way because this is how the existing code worked.

sys/kern/vfs_subr.c
2070

You talked about using zero as 'till the end' indicator ?

2108

No, I do not see any link between what you explain above and what do you assert. You are enforcing some state of the delayed writes which you, as a caller of v_inval_buf_range(), do not have any control. Or let me put it other way, the v_inval_buf_range() cannot be used unless the caller flushed dirty buffers not satisfying the assert you write, but this is typically not under the writer control.

asomers added inline comments.
sys/kern/vfs_subr.c
2070

I talked about it, then realized that there would be no point since endlbn must always be less than INT64_MAX.

2108

Yes, that's exactly how fusefs uses this method. Just before calling v_inval_buf_range, it flushes any conflicting buffers with bwrite.

sys/kern/vfs_subr.c
2070

So I do not see why not:

  1. use zero as the end indicator
  2. assert that no buffer found has the blno INT64_MAX.
2108

Then this is something fusefs specific and I do not see it as appropriate for the vfs_subr.c where generally useful functions should be exported.

asomers added inline comments.
sys/kern/vfs_subr.c
2070
  1. More work to check the zero flag at runtime
  2. That check can't be done in v_inval_buf_range1 because vtruncbuf passes INT64_MAX . And I feel that it would be kind of tautological to do the check in v_inval_buf_range, because the calculation for endlbn is so simple.
2108

Ad yet, I see it as a layering violation to embed such intimate knowledge of the buffer cache into fusefs. What would you say if I changed the interface to accept whole buffer indexes only, instead of byte ranges? That would be less consistent with vtruncbuf's interface, but it would eliminate any possibility of the caller unintentionally invalidating more than he specified.

sys/kern/vfs_subr.c
2070

I believe this INT64_MAX thing is a mine for later.

2108

If you pass block numbers (I suppose this is what you mean by buffer indexes) to the function, how would it be inconsistent with vtruncbuf ? The later invalidates everything.

Whatever route you take, I believe that the #invariants loop is incompatible with vfs_subr.c. If you eliminate it, we can refine the interface.

sys/kern/vfs_subr.c
2108

vtruncbuf invalidates everything on the high side. But on the low side it takes an off_t argument.

v_inval_buf_range: take arguments by block number instead of byte offset

I am fine with this version. Please note several minor style suggestions.

sys/kern/vfs_subr.c
2071

';' should be on the next line, indented by a tab.

2117

There is no much point in startp and endp single-use variables.

2124

';' on the next line.

This revision is now accepted and ready to land.Jul 27 2019, 1:59 PM
This revision was automatically updated to reflect the committed changes.
asomers marked 3 inline comments as done.