Page MenuHomeFreeBSD

add a nfs_deallocate VOP
ClosedPublic

Authored by rmacklem on Aug 22 2021, 10:58 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Sep 10, 1:27 AM
Unknown Object (File)
Sun, Sep 8, 4:41 AM
Unknown Object (File)
Fri, Sep 6, 7:15 PM
Unknown Object (File)
Fri, Aug 30, 9:25 AM
Unknown Object (File)
Fri, Aug 30, 8:02 AM
Unknown Object (File)
Fri, Aug 30, 7:57 AM
Unknown Object (File)
Sat, Aug 24, 8:33 AM
Unknown Object (File)
Wed, Aug 21, 8:52 PM
Subscribers
None

Details

Summary

This patch adds a VOP_DEALLOCATE() to the NFS client.
For NFSv4.2 servers that support the Deallocate operation,
it is used. Otherwise, it falls back on calling
vop_stddeallocate(), which is defined as a non-static global
function.

It calls vinvalbuf() via ncl_vinvalbuf() and vnode_pager_purge_range()
before doing the deallocate, to ensure that stale cached file
data purged.

It returns EFBIG when the rqst->r_offset + rqst->r_len > maxfilesize.
This error needs to be added to VOP_DEALLOCATE.9 in a future commit.

Test Plan

It has been tested against both a patched FreeBSD and Linux
NFSv4.2 servers. It was also tested for NFSv3 and NFSv4.1
mounts to exercise the fallback to vop_stddeallocate().

The main chunk of code for review is nfs_deallocate(),
round in sys/fs/nfsclient/nfs_clvnops.c.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

rmacklem created this revision.

It returns EFBIG when the rqst->r_offset + rqst->r_len > maxfilesize.

In my opinion this eliminates a use case that specifying offset to be 0 and length to be OFF_MAX turns the whole file sparse.

Modify nfs_deallocate() so that it returns EFBIG
if rqst->r_offset >= maxfilesize.
If rqst->r_offset < maxfilesize and rqst->r_offset + rqst->r_len > maxfilesize

rqst->r_len = maxfilesize - rqst->r_offset

This should avoid the EFBIG error return from the NFSv4.2 servers,
such as the Linux one, for the case where the request is

rqst->r_offset == 0
rqst->r_len == OFF_MAX

as requested by khng@.

Hmm, I think that both Ka Ho' and mine opinions now is that deallocate should stop at EOF, and report r_offset == EOF, len == 0.

So for instance, if somebody wants to deallocate everything except the first block, he would do deallocate(offset = 512, len = OFF_MAX - 512) and get the result 0, with r_offset = EOF, len = 0.

sys/kern/vfs_default.c
1128

Please commit this in advance.

The current patch should exhibit that behaviour. In nfs_deallocate()
line#3713 - will set len = mafilesize - offset (which avoids the EFBIG error reply)

  • the RPC will zero from offset (512bytes for the example) to EOF and reply NFS_OK

line#3729 - will set r_offset = file_size (EOF)
line#3733 - will set r_len = 0

The only case where behaviour will be slightly different than vop_stddeallocate()
is when rqst->r_offset >= mafilesize
--> This results in an EFBIG error reply, because NFSv4.2 servers such as Linux will

generate the same error for that case.

Is it possible to simply turn offset >= maxfilesize into a no-op and just return the EOF for client? The interface's error case when offset + length >= some_random_value seems more consistent in this manner.

Not directly related to this, maxfilesize should be something that can be reflected through statfs or other means, so that callers need not to guess what the file size limit is in the dark.

In D31640#713956, @khng wrote:

Is it possible to simply turn offset >= maxfilesize into a no-op and just return the EOF for client? The interface's error case when offset + length >= some_random_value seems more consistent in this manner.

I agree with this request.

Not directly related to this, maxfilesize should be something that can be reflected through statfs or other means, so that callers need not to guess what the file size limit is in the dark.

I agree with this on principle, but I think it should become a pathconf value, statfs() is more about mount itself than about the volume properties.

I could make r_offset >= maxfilesize a no-op.
The problem is that the client will only have a (possibly stale)
va.va_size for the file to use to set the reply r_offset.
--> To be honest, it is always a bit bogus, since another

client can change the file size at any time.

If just guessing at what va_size is to set r_offset to is fine with
you guys, I can do that.

I think the Linux fallocate(2) returns EFBIG for the cases where
r_offset or r_offset + r_len > maxfilesize, so this will make the interface
less Linux compatible.

Btw, yes, it would be nice if code above the VOP/VFS could
get maxfilesize.. The NFS server currently just makes a conservative
guess (what I once thought was the minimum for FFS2 with a small blocksize).

I could make r_offset >= maxfilesize a no-op.
The problem is that the client will only have a (possibly stale)
va.va_size for the file to use to set the reply r_offset.
--> To be honest, it is always a bit bogus, since another

client can change the file size at any time.

If just guessing at what va_size is to set r_offset to is fine with
you guys, I can do that.

I am fine with this under such condition.

This version has nfs_deallocate() modified so that it
will return r_len = 0, r_offset = <best guess of file size>
and no error when rqst->r_offset >= maxfilesize.

This change was requested by khng@ and kib@.

sys/fs/nfsclient/nfs_clvnops.c
3712 ↗(On Diff #94155)

starting from 1400032 the new r_offset semantic should make this case easier.

3731–3735 ↗(On Diff #94155)

starting from 1400032 the new r_offset semantic should make this case easier. Under the new semantic rmsr.r_offset equals rqsr.r_offset + number of bytes zeroed before EOF. So this does not require returning file size at all.

sys/fs/nfsclient/nfs_clvnops.c
3712 ↗(On Diff #94155)

Ok, just to make this clear, you and kib@ have now
decided that the correct setting for case where
rqsr->r_offset > file size (past EOF) is to just
leave rmsr->r_offset there and not set it to file size (EOF)?

sys/fs/nfsclient/nfs_clvnops.c
3712 ↗(On Diff #94155)

Yes.

Modify the calculation of rmsr->r_offset so
that it is incremented by the number of bytes zeroed out,
as requested by khng@.

This revision is now accepted and ready to land.Aug 27 2021, 10:45 PM