Page MenuHomeFreeBSD

Add v_inval_buf_range, like vtruncbuf but for a range of a file
ClosedPublic

Authored by asomers on Jul 22 2019, 9:00 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Jan 21, 6:01 PM
Unknown Object (File)
Mon, Jan 20, 9:15 PM
Unknown Object (File)
Fri, Jan 10, 7:30 PM
Unknown Object (File)
Dec 23 2024, 1:41 PM
Unknown Object (File)
Dec 17 2024, 3:26 PM
Unknown Object (File)
Nov 26 2024, 3:57 AM
Unknown Object (File)
Nov 13 2024, 1:35 PM
Unknown Object (File)
Nov 13 2024, 8:32 AM
Subscribers

Details

Summary

Add v_inval_buf_range, like vtruncbuf but for a range of a file

v_inval_buf_range invalidates all buffers within a certain LBA range of a
file. It will be used by fusefs(5). This commit is a partial merge of
r346162, r346606, and r346756 from projects/fuse2.

Test Plan

Tested by the fusefs test suite in the projects/fuse2 branch.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 25524
Build 24142: arc lint + arc unit

Event Timeline

sys/kern/vfs_subr.c
1970

I think that the vnode must be always excl locked there.

1974–1975

for (anyfreed = true; anyfreed;)
or
while (anyfreed)

1974–1975

You should assert the locks state there.

1975–1981

This actually makes it impossible to invalidate a buffer with INT64_MAX lbn.

1990

return (EAGAIN);

1991–1992

Excessive ().

1997–1998

A lot of excessive ().
Compare bit using != 0 at the line 2087.

2015

What is the point of passing blksize ? Shouldn't it always equal to the vp' bo block size ?

2027

I think you need exclusive vnode lock always.

2030

There is no point in re-evaluating bo on restart.

2048

I do not understand this at all. You are asserting that there is no dirty buffers outside the range specified ? Why ?

2064

Use bool.

2097

Isn't the bo locked at this point ?

I do not think that we ever want a function (not locking primitive) that returns with different lock state depending on some hard-to-decode conditions.

IMO it is easier to consistently return with bo locked.

asomers added inline comments.
sys/kern/vfs_subr.c
1970

It isn't in at least two code paths:

VOP_WRITE // vnode is locked shared
msdosfs_write
deextend
detrunc
VOP_WRITE // vnode is locked shared
ffs_write
ffs_truncate
vtruncbuf
1975–1981

Good catch. I'll make 0 be a flag value for endlbn that means "to EOF"

1997–1998

I don't follow.

2015

Perhaps. But this is the way that vtruncbuf, getblk, bread, etc already do it. Would you like me to try removing the argument? If so, I'd like to do it in a separate commit.

2030

Ok, I'll change it here and in vtruncbuf.

2048

Good catch! The intent was to ensure that the caller didn't accidentally invalidate any dirty data beyond the byte offsets he specified. That would most likely indicate a programming error in the caller. But I see that expression is incomplete. It should say something like:

(blkstart >= start && blkend <= end) // buffer is wholly invalidated
|| blkstart >= end                                  // buffer lies past the invalidated range
|| blkend < start                                    // buffer lies before the invalidated range
2097

lol, you mean that we shouldn't do exactly what BUF_LOCK does? Ok, I'll refactor it.

sys/kern/vfs_subr.c
1970

I highly doubt that either UFS or msdosfs make writes or truncations with shared-locked vnode.

1997–1998
if (nbp != NULL &&
     (((nbp->b_xflags & BX_VNCLEAN) == 0 ||
     nbp->b_vp != vp ||
     (nbp->b_flags & B_DELWRI) != 0)) {

`

2015

Perhaps add assert that the bo block size is equal to the argument.

2097

BUF_LOCK always drops the bo lock if interlocked, otherwise it does not touch it.

asomers marked 2 inline comments as done.

Respond to kib's comments:

  • Move initializtion of struct bufobj *bo out of the restart loop
  • Always return bo locked from v_inval_buf_range1
  • Fix comment block for v_inval_buf_range
  • Fix logic in v_inval_buf_range's excess invalidation check
  • Use bools instead of ints, where applicable
  • ASSERT_VOP_LOCKED in v_inval_buf_range
  • remove unneeded parens
asomers added inline comments.
sys/kern/vfs_subr.c
1970

Ok, it looks like we could upgrade this to an exclusive lock assertion because ufs and msdosfs don't set MNTK_SHARED_WRITES. Only ZFS sets that, and it doesn't call vtruncbuf. I can do that if you like, but I'll do it in a separate commit.

1975–1981

Actually, there's no point. It's impossible for an lbn to have the value INT64_MAX since the minimum blksize is greater than 1 and the maximum offset is INT64_MAX.

sys/kern/vfs_subr.c
1959–1960

Do not use initialization in declaration.

1970

Ok, you may ask Peter Holm to test the upgrade of the assert.

1975–1981

Then perhaps assert that the end blkno is strictly less than INT64_MAX.

1990

LK_INTERLOCK means that bo lock is always dropped.

Did you tested the patch with WITNESS + DEBUG_VFS_LOCKS ?

2016

I still do not understand the assert. You are disallowing any delayed write buffers for the vnode outside the specified range. Why ?

2033

Assert bo lock ownership.

2036

You are initializing anyfreed to true, then immediately set it to false. As I understand, it is done to ignite the first iteration. Would it be more natural to use do/while loop instead ?

2097

return (0);

asomers marked 9 inline comments as done.

Respond to kib's latest comments

sys/kern/vfs_subr.c
1990

you can add e.g. restart_unlocked label before BO_LOCK() at line 1973 and avoid BO_LOCK() call there.

Eliminate one BO_LOCK call, as suggested by kib

asomers added inline comments.
sys/kern/vfs_subr.c
1959–1960

style(9) says that we can initialize in the declaration as long as we don't make any function calls. But I can change it if you like.

1970

ok.

1975–1981

Can't, because vtruncbuf passes INT64_MAX to mean "invalidate everything after startlbn.

1990

I did test with WITNESS and DEBUG_VFS_LOCKS. But I don't have any special trick for hitting this race. I'll add a BO_LOCK here, too.

2016

Not "outside of the specified range", but "partially overlapping with the specified range". The reason is that the caller only wanted to invalidate data within byte offsets [start, end). But since data must be invalidated in units of whole blocks, v_inval_buf_range may invalidate a little bit more than the caller requested. That's ok if the data is cached but clean. However, if the data is dirty then it's a bug! I've personally done that a couple of times already.

2027

I'll try doing that when I change vtruncbuf's lock assertion, too.

2036

Sure. I just did it this way because this is how the existing code worked.

sys/kern/vfs_subr.c
1975–1981

You talked about using zero as 'till the end' indicator ?

2016

No, I do not see any link between what you explain above and what do you assert. You are enforcing some state of the delayed writes which you, as a caller of v_inval_buf_range(), do not have any control. Or let me put it other way, the v_inval_buf_range() cannot be used unless the caller flushed dirty buffers not satisfying the assert you write, but this is typically not under the writer control.

asomers added inline comments.
sys/kern/vfs_subr.c
1975–1981

I talked about it, then realized that there would be no point since endlbn must always be less than INT64_MAX.

2016

Yes, that's exactly how fusefs uses this method. Just before calling v_inval_buf_range, it flushes any conflicting buffers with bwrite.

sys/kern/vfs_subr.c
1975–1981

So I do not see why not:

  1. use zero as the end indicator
  2. assert that no buffer found has the blno INT64_MAX.
2016

Then this is something fusefs specific and I do not see it as appropriate for the vfs_subr.c where generally useful functions should be exported.

asomers added inline comments.
sys/kern/vfs_subr.c
1975–1981
  1. More work to check the zero flag at runtime
  2. That check can't be done in v_inval_buf_range1 because vtruncbuf passes INT64_MAX . And I feel that it would be kind of tautological to do the check in v_inval_buf_range, because the calculation for endlbn is so simple.
2016

Ad yet, I see it as a layering violation to embed such intimate knowledge of the buffer cache into fusefs. What would you say if I changed the interface to accept whole buffer indexes only, instead of byte ranges? That would be less consistent with vtruncbuf's interface, but it would eliminate any possibility of the caller unintentionally invalidating more than he specified.

sys/kern/vfs_subr.c
1975–1981

I believe this INT64_MAX thing is a mine for later.

2016

If you pass block numbers (I suppose this is what you mean by buffer indexes) to the function, how would it be inconsistent with vtruncbuf ? The later invalidates everything.

Whatever route you take, I believe that the #invariants loop is incompatible with vfs_subr.c. If you eliminate it, we can refine the interface.

sys/kern/vfs_subr.c
2016

vtruncbuf invalidates everything on the high side. But on the low side it takes an off_t argument.

v_inval_buf_range: take arguments by block number instead of byte offset

I am fine with this version. Please note several minor style suggestions.

sys/kern/vfs_subr.c
1976

';' should be on the next line, indented by a tab.

2025

There is no much point in startp and endp single-use variables.

2032

';' on the next line.

This revision is now accepted and ready to land.Jul 27 2019, 1:59 PM
This revision was automatically updated to reflect the committed changes.
asomers marked 3 inline comments as done.