Page MenuHomeFreeBSD

fix copy_file_range(2) so that it doesn't allocate unneeded data blocks of all 0s in the output file
ClosedPublic

Authored by rmacklem on Aug 4 2019, 1:07 AM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 30 2023, 6:20 AM
Unknown Object (File)
Dec 22 2023, 11:08 PM
Unknown Object (File)
Nov 29 2023, 4:16 AM
Unknown Object (File)
Nov 7 2023, 3:37 AM
Unknown Object (File)
Aug 13 2023, 6:25 AM
Unknown Object (File)
Aug 2 2023, 6:17 PM
Unknown Object (File)
Jul 23 2023, 8:37 PM
Unknown Object (File)
May 10 2023, 3:03 PM
Subscribers

Details

Summary

When the byte range for copy_file_range(2) doesn't go to EOF on the
output file and there is a hole in the input file, a hole must be
"punched" in the output file. This is done by writing a block of bytes
all set to 0.
Without this patch, the write is done unconditionally which meant that,
if the output file already had a hole in that byte range, a unneeded data block
of all 0 bytes would be allocated.
This patch adds code to check for a hole in the output file, so it can
skip doing the write if there is already a hole in that byte range of
the output file. This avoids unnecessary allocation of blocks to the
output file.

Test Plan

Running the same tests I've already developed for copy_file_range(2) and
checking block allocations for the input and output files via "ls -ls".

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 25682

Event Timeline

kern/vfs_vnops.c
2815

Am I right that the rangelock guarantees that the hole is not instantiated into real content by other thread ? What about calls from NFS server, where range lock is not taken ?

2825

Should this be KASSERT instead ?

Replied to kib@'s comments inline.

kern/vfs_vnops.c
2815

Yes. And since the NFSv4.2 server takes the rangelocks as well, the NFSv4.2
Copy will be ok as well.

Slightly off topic, but I have been thinking that the NFS server should take
rangelocks for read/write as well,, but haven't coded it yet.
Do you think it should?

2825

Good point. I was thinking that it didn't need to panic(), since copy_file_range(2)
still works for this failure case.
However, since I put it in INVARIANTS, it won't be used for production kernels
and a panic() will get people's attention.

Do you think a KASSERT()/panic() is better?

rmacklem marked 2 inline comments as done.

This version of the patch has the INVARIANTS printf() replaced by a KASSERT().
It also has KASSERT()s added for the results of FIOSEEKDATA/FIOSEEKHOLE.

The main change is that it now does a sanity check for dataoff == holeoff,
which I believe could happen if another thread were to do a truncate()
followed by a write() between the two VOP_IOCTL() calls.
(As noted in the comment for it.)

kern/vfs_vnops.c
2815

Rangelocks only make sense when the locked ranges correspond to the user-initiated reads and writes. I have no idea about NFSv4.x, but for plain old v3 server I am sure that NFS client' initiated RPCs for reads and writes typically do not match user operations. Instead, they are driven by the buffer caching and clustering. In other words, for NFS server serving the same file to several clients, rangelocking is probably useless.

Rangelocking on NFS server might be somewhat useful when the remotely accessed file is also accessed by the local userspace. In this case, I think, local operations would see more consistent results.

Replied to inline comment.

kern/vfs_vnops.c
2815

Yes, agreed with all of that.

The reason I was thinking of adding it is that the NFSv4.2 Copy operation will loop on
read/write locking/unlocking the vnodes, so a rangelock ensures that the entire
range gets copied without corruption from another Copy or local write.
However, a concurrent NFS write would corrupt the result unless the NFS write
does a rangelock as well. (Doing them for read/write in the NFS server is a bit of a
bother, since it requires the vnode be unlocked/relocked, but I don't think that's a
big deal.)

I added the "dataoff == holeoff" check to try and at least make sure no crashes
can occur and the worst case is a file that is a mix of Copy data and Write data.
(I'll check the vn_copy_file_range() code more carefully, to see if I can find any other
cases where this might cause a crash/loop.)

This version of the patch adds an additional sanity check for
the results from VOP_IOCTL(). This change can only have an effect
for a non-INVARIANTS kernel where one of the VOP_IOCTL()s
return a bogus offset.
It ensures a reasonable positive value for xfer2 is returned for this case.

Only change is a modified comment explaining how FIOSEEKHOLE might
find the same offset as FIOSEEKDATA already has returned for the file.

This revision is now accepted and ready to land.Aug 15 2019, 3:47 PM