Details

Reviewers

kib
asomers

Commits

rS351106: Fix copy_file_range(2) so that unneeded blocks are not allocated to the output…

Summary

When the byte range for copy_file_range(2) doesn't go to EOF on the
output file and there is a hole in the input file, a hole must be
"punched" in the output file. This is done by writing a block of bytes
all set to 0.
Without this patch, the write is done unconditionally which meant that,
if the output file already had a hole in that byte range, a unneeded data block
of all 0 bytes would be allocated.
This patch adds code to check for a hole in the output file, so it can
skip doing the write if there is already a hole in that byte range of
the output file. This avoids unnecessary allocation of blocks to the
output file.

Test Plan

Running the same tests I've already developed for copy_file_range(2) and
checking block allocations for the input and output files via "ls -ls".

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rmacklem created this revision.Aug 4 2019, 1:07 AM

Herald added a subscriber: imp. · View Herald TranscriptAug 4 2019, 1:07 AM

Harbormaster completed remote builds in B25668: Diff 60435.Aug 4 2019, 1:07 AM

kib added inline comments.Aug 4 2019, 12:28 PM

kern/vfs_vnops.c
2799 ↗	(On Diff #60435)	Am I right that the rangelock guarantees that the hole is not instantiated into real content by other thread ? What about calls from NFS server, where range lock is not taken ?
2809 ↗	(On Diff #60435)	Should this be KASSERT instead ?

Replied to kib@'s comments inline.

kern/vfs_vnops.c
2799 ↗	(On Diff #60435)	Yes. And since the NFSv4.2 server takes the rangelocks as well, the NFSv4.2 Copy will be ok as well. Slightly off topic, but I have been thinking that the NFS server should take rangelocks for read/write as well,, but haven't coded it yet. Do you think it should?
2809 ↗	(On Diff #60435)	Good point. I was thinking that it didn't need to panic(), since copy_file_range(2) still works for this failure case. However, since I put it in INVARIANTS, it won't be used for production kernels and a panic() will get people's attention. Do you think a KASSERT()/panic() is better?

This version of the patch has the INVARIANTS printf() replaced by a KASSERT().
It also has KASSERT()s added for the results of FIOSEEKDATA/FIOSEEKHOLE.

The main change is that it now does a sanity check for dataoff == holeoff,
which I believe could happen if another thread were to do a truncate()
followed by a write() between the two VOP_IOCTL() calls.
(As noted in the comment for it.)

Harbormaster completed remote builds in B25678: Diff 60448.Aug 4 2019, 4:32 PM

kib added inline comments.Aug 4 2019, 8:19 PM

kern/vfs_vnops.c
2799 ↗	(On Diff #60435)	Rangelocks only make sense when the locked ranges correspond to the user-initiated reads and writes. I have no idea about NFSv4.x, but for plain old v3 server I am sure that NFS client' initiated RPCs for reads and writes typically do not match user operations. Instead, they are driven by the buffer caching and clustering. In other words, for NFS server serving the same file to several clients, rangelocking is probably useless. Rangelocking on NFS server might be somewhat useful when the remotely accessed file is also accessed by the local userspace. In this case, I think, local operations would see more consistent results.

Replied to inline comment.

kern/vfs_vnops.c
2799 ↗	(On Diff #60435)	Yes, agreed with all of that. The reason I was thinking of adding it is that the NFSv4.2 Copy operation will loop on read/write locking/unlocking the vnodes, so a rangelock ensures that the entire range gets copied without corruption from another Copy or local write. However, a concurrent NFS write would corrupt the result unless the NFS write does a rangelock as well. (Doing them for read/write in the NFS server is a bit of a bother, since it requires the vnode be unlocked/relocked, but I don't think that's a big deal.) I added the "dataoff == holeoff" check to try and at least make sure no crashes can occur and the worst case is a file that is a mix of Copy data and Write data. (I'll check the vn_copy_file_range() code more carefully, to see if I can find any other cases where this might cause a crash/loop.)

This version of the patch adds an additional sanity check for
the results from VOP_IOCTL(). This change can only have an effect
for a non-INVARIANTS kernel where one of the VOP_IOCTL()s
return a bogus offset.
It ensures a reasonable positive value for xfer2 is returned for this case.

Harbormaster completed remote builds in B25682: Diff 60455.Aug 4 2019, 10:10 PM

Only change is a modified comment explaining how FIOSEEKHOLE might
find the same offset as FIOSEEKDATA already has returned for the file.

Harbormaster completed remote builds in B25758: Diff 60603.Aug 8 2019, 10:56 PM

kib accepted this revision.Aug 15 2019, 3:47 PM

This revision is now accepted and ready to land.Aug 15 2019, 3:47 PM

Closed by commit rS351106: Fix copy_file_range(2) so that unneeded blocks are not allocated to the output… (authored by rmacklem). · Explain WhyAug 15 2019, 11:21 PM

This revision was automatically updated to reflect the committed changes.

rmacklem added a commit: rS351106: Fix copy_file_range(2) so that unneeded blocks are not allocated to the output….

fix copy_file_range(2) so that it doesn't allocate unneeded data blocks of all 0s in the output file
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 60858

head/sys/kern/vfs_vnops.c

fix copy_file_range(2) so that it doesn't allocate unneeded data blocks of all 0s in the output fileClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 60858

head/sys/kern/vfs_vnops.c

fix copy_file_range(2) so that it doesn't allocate unneeded data blocks of all 0s in the output file
ClosedPublic
Actions

Revision Contents
Changeset List