Page MenuHomeFreeBSD

Cross-mountpoint copy_file_range() for ZFS
ClosedPublic

Authored by mm on Sep 4 2023, 8:15 PM.
Tags
Referenced Files
Unknown Object (File)
Sat, Jun 15, 6:25 PM
Unknown Object (File)
Apr 30 2024, 11:44 PM
Unknown Object (File)
Apr 30 2024, 11:27 PM
Unknown Object (File)
Apr 30 2024, 11:27 PM
Unknown Object (File)
Apr 30 2024, 11:26 PM
Unknown Object (File)
Apr 30 2024, 11:26 PM
Unknown Object (File)
Apr 30 2024, 8:46 PM
Unknown Object (File)
Jan 7 2024, 1:24 PM
Subscribers

Details

Summary

This patch enables the use of copy_file_range beyond mountpoints in ZFS case as ZFS is capable to do block cloning on all mountpoints (including snapshots) that share the same pool.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

mm requested review of this revision.Sep 4 2023, 8:15 PM
mm created this revision.
sys/kern/vfs_vnops.c
3087

This is a layering violation, IMO it's not a good way to implement this.

Instead, either some flag on the mountpoint should indicate that a cross-MP copy_file_range is allowed, or the entire check should be pushed into the VOP_COPY_FILE_RANGE implementation of each FS with a custom implementation (there are only three) and simply omitted for ZFS.

mm edited the summary of this revision. (Show Details)

I have updated the diff to use a kernel mount flag.

As noted, I think a check for "same file system type" is needed
before a call to VOP_COPY_FILE_RANGE().

sys/kern/vfs_vnops.c
3087

Actually, I think you have to use the original checks
for file system type "zfs" if you do it this way.
What happens if the two vnodes are for two different
file system types that both implement MNTK_COPYFR_MULTIFS.

Basically, doing the VOP_COPY_FILE_RANGE must at least
make sure both vnodes are the same file system type,
because the call is into that one file system type's code.

Then something needs to decide if that file system type can
handle cross mount copies.
I think this decision is best left to the specific VOP_COPY_FILE_RANGE().
Right now, only NFS and ZFS implement VOP_COPY_FILE_RANGE(), so
adding the checks to both shouldn't be a lot of work.
For example, NFSv4.2 could do a cross mount copy if both mounts are
NFSv4.2. (The case where the two mounts are to different servers is messy
and I might not implement it. It is not implemented at this time. However,
the case of two mounts to the same NFSv4.2 server would work now, if
nfs_copy_file_range() is done.

Maybe I didn't make the NFS case clear.
For two different NFS mounts it can do the Copy
without reads/writes for some situations.
(Both mounts NFSv4.2 and either same server
or maybe different servers.)

As such, NFS does not know whether or not to set
a flag like MNTK_COPYFR_MULTIFS.

There are two smple approaches I can think of.
1 - The file system's VOP_COPY_FILE_RANGE() returns
an error if it cannot do the cross-mount copy easily
and then the code falls back on calling vn_generic_copy_file_range().
(Basically the "else" becomes a separate check for some error
being returned by VOP_COPY_FILE_RANGE().
Or
2 - The VOP_COPY_FILE_RANGE() calls vn_generic_copy_file_range()
when it cannot easily do it in a fs specific way.

I'd lean towards 2 and just changing the
if (invp->v_mount == outvp->v_mount)
to something like
if (strcmp(invp->v_mount->mnt_vfc->vfc_name, outvp->v_mount->mnt_vfc->vfc_name) == 0)

I have changed the code to the way rmacklem@ recommended. In fusefs we don't have to change anything as there is already a mountpoint check at the very beginning. In nfsclient I have added the mountpoint check. ZFS can handle this change without modifications.

I'll added some inline comments, but they are only suggestions.
I think this minor semantics change for VOP_COPY_FILE_RANGE()
is ok, but we'll see if others disagree.

Since it is technically a VOP KABI change, it would be nice if it
makes it in 14.0.

Thanks for doing this, rick. I thought about doing it so that the
NFS cases could be handled (which I can do later).

share/man/man9/VOP_COPY_FILE_RANGE.9
31–32

Maybe "or within one file" should be moved to the end,
since "multiple file systems" can only apply to "one file to another",
or something like that.

If you leave the man page update as a separate review, the
manpages group can review it and help come up with the right
words. (I'm terrible at man pages.;-)

sys/fs/nfsclient/nfs_clvnops.c
3912

Why not just add "|| invp->v_mount != outvp->v_mount"
to "if (invp == outvp)" instead of a separate "if" and goto?

I can fix up NFS later, since there are cases where the copy can
be done between multiple mounts.

sys/kern/vfs_vnops.c
3087

Yea, this seems ok to me. It is kinda weird to check for
same mount point and then same type, since same mount
point is obviously same type, but I suppose it is a minor
optimization to avoid the strcmp().

I'm fine with it the way it is or without the invp->v_mount == outvp->v_mount.
(If I was being really nitpicky, I don't think the extra "()" are needed and might
violate style(9) in a minor way, but I don't really care.

This revision is now accepted and ready to land.Sep 6 2023, 12:10 AM