Page MenuHomeFreeBSD

Add v_inval_buf_range, like vtruncbuf but for a range of a file
ClosedPublic

Authored by asomers on Jul 22 2019, 9:00 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Mar 23, 5:40 AM
Unknown Object (File)
Mar 7 2024, 9:27 PM
Unknown Object (File)
Mar 7 2024, 2:48 PM
Unknown Object (File)
Mar 5 2024, 5:25 PM
Unknown Object (File)
Mar 5 2024, 4:30 AM
Unknown Object (File)
Jan 27 2024, 5:12 AM
Unknown Object (File)
Jan 16 2024, 7:33 AM
Unknown Object (File)
Jan 14 2024, 8:52 PM
Subscribers

Details

Summary

Add v_inval_buf_range, like vtruncbuf but for a range of a file

v_inval_buf_range invalidates all buffers within a certain LBA range of a
file. It will be used by fusefs(5). This commit is a partial merge of
r346162, r346606, and r346756 from projects/fuse2.

Test Plan

Tested by the fusefs test suite in the projects/fuse2 branch.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/kern/vfs_subr.c
1970 ↗(On Diff #60033)

I think that the vnode must be always excl locked there.

2015 ↗(On Diff #60033)

What is the point of passing blksize ? Shouldn't it always equal to the vp' bo block size ?

2027 ↗(On Diff #60033)

I think you need exclusive vnode lock always.

2030 ↗(On Diff #60033)

There is no point in re-evaluating bo on restart.

2048 ↗(On Diff #60033)

I do not understand this at all. You are asserting that there is no dirty buffers outside the range specified ? Why ?

2064 ↗(On Diff #60033)

Use bool.

2066 ↗(On Diff #60033)

You should assert the locks state there.

2067 ↗(On Diff #60033)

for (anyfreed = true; anyfreed;)
or
while (anyfreed)

2070 ↗(On Diff #60033)

This actually makes it impossible to invalidate a buffer with INT64_MAX lbn.

2075 ↗(On Diff #60033)

return (EAGAIN);

2078 ↗(On Diff #60033)

Excessive ().

2087 ↗(On Diff #60033)

A lot of excessive ().
Compare bit using != 0 at the line 2087.

2116 ↗(On Diff #60033)

Isn't the bo locked at this point ?

I do not think that we ever want a function (not locking primitive) that returns with different lock state depending on some hard-to-decode conditions.

IMO it is easier to consistently return with bo locked.

asomers added inline comments.
sys/kern/vfs_subr.c
1970 ↗(On Diff #60033)

It isn't in at least two code paths:

VOP_WRITE // vnode is locked shared
msdosfs_write
deextend
detrunc
VOP_WRITE // vnode is locked shared
ffs_write
ffs_truncate
vtruncbuf
2015 ↗(On Diff #60033)

Perhaps. But this is the way that vtruncbuf, getblk, bread, etc already do it. Would you like me to try removing the argument? If so, I'd like to do it in a separate commit.

2030 ↗(On Diff #60033)

Ok, I'll change it here and in vtruncbuf.

2048 ↗(On Diff #60033)

Good catch! The intent was to ensure that the caller didn't accidentally invalidate any dirty data beyond the byte offsets he specified. That would most likely indicate a programming error in the caller. But I see that expression is incomplete. It should say something like:

(blkstart >= start && blkend <= end) // buffer is wholly invalidated
|| blkstart >= end                                  // buffer lies past the invalidated range
|| blkend < start                                    // buffer lies before the invalidated range
2070 ↗(On Diff #60033)

Good catch. I'll make 0 be a flag value for endlbn that means "to EOF"

2087 ↗(On Diff #60033)

I don't follow.

2116 ↗(On Diff #60033)

lol, you mean that we shouldn't do exactly what BUF_LOCK does? Ok, I'll refactor it.

sys/kern/vfs_subr.c
1970 ↗(On Diff #60033)

I highly doubt that either UFS or msdosfs make writes or truncations with shared-locked vnode.

2015 ↗(On Diff #60033)

Perhaps add assert that the bo block size is equal to the argument.

2087 ↗(On Diff #60033)
if (nbp != NULL &&
     (((nbp->b_xflags & BX_VNCLEAN) == 0 ||
     nbp->b_vp != vp ||
     (nbp->b_flags & B_DELWRI) != 0)) {

`

2116 ↗(On Diff #60033)

BUF_LOCK always drops the bo lock if interlocked, otherwise it does not touch it.

asomers marked 2 inline comments as done.

Respond to kib's comments:

  • Move initializtion of struct bufobj *bo out of the restart loop
  • Always return bo locked from v_inval_buf_range1
  • Fix comment block for v_inval_buf_range
  • Fix logic in v_inval_buf_range's excess invalidation check
  • Use bools instead of ints, where applicable
  • ASSERT_VOP_LOCKED in v_inval_buf_range
  • remove unneeded parens
asomers added inline comments.
sys/kern/vfs_subr.c
1970 ↗(On Diff #60033)

Ok, it looks like we could upgrade this to an exclusive lock assertion because ufs and msdosfs don't set MNTK_SHARED_WRITES. Only ZFS sets that, and it doesn't call vtruncbuf. I can do that if you like, but I'll do it in a separate commit.

2070 ↗(On Diff #60033)

Actually, there's no point. It's impossible for an lbn to have the value INT64_MAX since the minimum blksize is greater than 1 and the maximum offset is INT64_MAX.

sys/kern/vfs_subr.c
1959 ↗(On Diff #60062)

Do not use initialization in declaration.

1989 ↗(On Diff #60062)

LK_INTERLOCK means that bo lock is always dropped.

Did you tested the patch with WITNESS + DEBUG_VFS_LOCKS ?

2048 ↗(On Diff #60062)

I still do not understand the assert. You are disallowing any delayed write buffers for the vnode outside the specified range. Why ?

2065 ↗(On Diff #60062)

Assert bo lock ownership.

2068 ↗(On Diff #60062)

You are initializing anyfreed to true, then immediately set it to false. As I understand, it is done to ignite the first iteration. Would it be more natural to use do/while loop instead ?

2116 ↗(On Diff #60062)

return (0);

1970 ↗(On Diff #60033)

Ok, you may ask Peter Holm to test the upgrade of the assert.

2070 ↗(On Diff #60033)

Then perhaps assert that the end blkno is strictly less than INT64_MAX.

asomers marked 9 inline comments as done.

Respond to kib's latest comments

sys/kern/vfs_subr.c
1991 ↗(On Diff #60094)

you can add e.g. restart_unlocked label before BO_LOCK() at line 1973 and avoid BO_LOCK() call there.

Eliminate one BO_LOCK call, as suggested by kib

asomers added inline comments.
sys/kern/vfs_subr.c
1959 ↗(On Diff #60062)

style(9) says that we can initialize in the declaration as long as we don't make any function calls. But I can change it if you like.

1989 ↗(On Diff #60062)

I did test with WITNESS and DEBUG_VFS_LOCKS. But I don't have any special trick for hitting this race. I'll add a BO_LOCK here, too.

2048 ↗(On Diff #60062)

Not "outside of the specified range", but "partially overlapping with the specified range". The reason is that the caller only wanted to invalidate data within byte offsets [start, end). But since data must be invalidated in units of whole blocks, v_inval_buf_range may invalidate a little bit more than the caller requested. That's ok if the data is cached but clean. However, if the data is dirty then it's a bug! I've personally done that a couple of times already.

2068 ↗(On Diff #60062)

Sure. I just did it this way because this is how the existing code worked.

1970 ↗(On Diff #60033)

ok.

2027 ↗(On Diff #60033)

I'll try doing that when I change vtruncbuf's lock assertion, too.

2070 ↗(On Diff #60033)

Can't, because vtruncbuf passes INT64_MAX to mean "invalidate everything after startlbn.

sys/kern/vfs_subr.c
2048 ↗(On Diff #60062)

No, I do not see any link between what you explain above and what do you assert. You are enforcing some state of the delayed writes which you, as a caller of v_inval_buf_range(), do not have any control. Or let me put it other way, the v_inval_buf_range() cannot be used unless the caller flushed dirty buffers not satisfying the assert you write, but this is typically not under the writer control.

2070 ↗(On Diff #60033)

You talked about using zero as 'till the end' indicator ?

asomers added inline comments.
sys/kern/vfs_subr.c
2048 ↗(On Diff #60062)

Yes, that's exactly how fusefs uses this method. Just before calling v_inval_buf_range, it flushes any conflicting buffers with bwrite.

2070 ↗(On Diff #60033)

I talked about it, then realized that there would be no point since endlbn must always be less than INT64_MAX.

sys/kern/vfs_subr.c
2048 ↗(On Diff #60062)

Then this is something fusefs specific and I do not see it as appropriate for the vfs_subr.c where generally useful functions should be exported.

2070 ↗(On Diff #60033)

So I do not see why not:

  1. use zero as the end indicator
  2. assert that no buffer found has the blno INT64_MAX.
asomers added inline comments.
sys/kern/vfs_subr.c
2048 ↗(On Diff #60062)

Ad yet, I see it as a layering violation to embed such intimate knowledge of the buffer cache into fusefs. What would you say if I changed the interface to accept whole buffer indexes only, instead of byte ranges? That would be less consistent with vtruncbuf's interface, but it would eliminate any possibility of the caller unintentionally invalidating more than he specified.

2070 ↗(On Diff #60033)
  1. More work to check the zero flag at runtime
  2. That check can't be done in v_inval_buf_range1 because vtruncbuf passes INT64_MAX . And I feel that it would be kind of tautological to do the check in v_inval_buf_range, because the calculation for endlbn is so simple.
sys/kern/vfs_subr.c
2048 ↗(On Diff #60062)

If you pass block numbers (I suppose this is what you mean by buffer indexes) to the function, how would it be inconsistent with vtruncbuf ? The later invalidates everything.

Whatever route you take, I believe that the #invariants loop is incompatible with vfs_subr.c. If you eliminate it, we can refine the interface.

2070 ↗(On Diff #60033)

I believe this INT64_MAX thing is a mine for later.

sys/kern/vfs_subr.c
2048 ↗(On Diff #60062)

vtruncbuf invalidates everything on the high side. But on the low side it takes an off_t argument.

v_inval_buf_range: take arguments by block number instead of byte offset

I am fine with this version. Please note several minor style suggestions.

sys/kern/vfs_subr.c
1976 ↗(On Diff #60176)

';' should be on the next line, indented by a tab.

2025 ↗(On Diff #60176)

There is no much point in startp and endp single-use variables.

2032 ↗(On Diff #60176)

';' on the next line.

This revision is now accepted and ready to land.Jul 27 2019, 1:59 PM
This revision was automatically updated to reflect the committed changes.
asomers marked 3 inline comments as done.