Page MenuHomeFreeBSD

Call VOP_COPY_FILE_RANGE() for the same file system type.
Needs RevisionPublic

Authored by pjd on Feb 27 2023, 10:12 PM.
Tags
None
Referenced Files
Unknown Object (File)
Nov 16 2024, 8:06 PM
Unknown Object (File)
Nov 8 2024, 1:48 PM
Unknown Object (File)
Sep 29 2024, 8:21 PM
Unknown Object (File)
Sep 24 2024, 1:49 AM
Unknown Object (File)
Sep 21 2024, 7:25 AM
Unknown Object (File)
Sep 15 2024, 10:40 PM
Unknown Object (File)
Sep 8 2024, 10:52 AM
Unknown Object (File)
Sep 5 2024, 6:04 AM

Details

Reviewers
mjg
asomers
Summary

Before this change file system specific copy_file_range method was only
called when both the source and the destination files reside on the same
file system.

Relax that requirement and call file system specific copy_file_range when
both files are on the same file system _type_, even if they are under
separate mount points.

Simplify some code by not calling vn_generic_copy_file_range() internally
and just return EOPNOTSUPP that would tell vn_copy_file_range() to call
vn_generic_copy_file_range() in response to that error.

Call the vn_generic_copy_file_range() also on the EXDEV error.

Signed-off-by: Pawel Jakub Dawidek <pjd@FreeBSD.org>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 50050
Build 46942: arc lint + arc unit

Event Timeline

pjd requested review of this revision.Feb 27 2023, 10:12 PM

LGTM. But is there any file system that you expect to support cross-mountpoint copy_file_range ?

share/man/man9/VOP_COPY_FILE_RANGE.9
143
146
This revision is now accepted and ready to land.Feb 27 2023, 10:24 PM

LGTM. But is there any file system that you expect to support cross-mountpoint copy_file_range ?

Yes, it is coming to ZFS: https://github.com/openzfs/zfs/pull/13392

As mjg@ mentioned in the other pull request, nullfs might also be a good candidate:
https://reviews.freebsd.org/D38803

I think this is a step backwards and it is not what I suggested.

For example now any fs which does not provide it's own primitive will take a trip through returning EOPNOTSUPP when it can be avoided.

I suggested everyone handling as much as they can and otherwise punting to vn_generic_copy_file_range or a differently named fallback with the same result.

To further illustrate the win in nullfs case, consider the following with 2 nullfs mount points:

  1. we land in null_copy_file_range or whatever
  2. both target vnodes get "unwrapped"
  3. the vop gets invoked in vnode1 with both as arguments and we are done here

With the proposed patch it will be instead:

  1. we land in null_copy_file_range or whatever
  2. both target vnodes get found + refed
  3. the vop returns EOPNOTSUPP
  4. the routine unrefs vnodes and returns
  5. ... now vn_generic_copy_file_range will keep calling VOP_READ and VOP_WRITE while looking up + refing (and unrefing) vnodes for each call

iow what this patch should do instead is add some form of

if (src->v_mount != dst->v_mount) { goto fallback; }

... to cases which can't handle it

where fallback calls vn_generic_copy_file_range, which the above routines are already doing

mjg requested changes to this revision.Mar 1 2023, 1:07 PM
This revision now requires changes to proceed.Mar 1 2023, 1:07 PM
In D38814#884116, @mjg wrote:

I think this is a step backwards and it is not what I suggested.

For example now any fs which does not provide it's own primitive will take a trip through returning EOPNOTSUPP when it can be avoided.

I suggested everyone handling as much as they can and otherwise punting to vn_generic_copy_file_range or a differently named fallback with the same result.

To further illustrate the win in nullfs case, consider the following with 2 nullfs mount points:

  1. we land in null_copy_file_range or whatever
  2. both target vnodes get "unwrapped"
  3. the vop gets invoked in vnode1 with both as arguments and we are done here

With the proposed patch it will be instead:

  1. we land in null_copy_file_range or whatever
  2. both target vnodes get found + refed
  3. the vop returns EOPNOTSUPP
  4. the routine unrefs vnodes and returns
  5. ... now vn_generic_copy_file_range will keep calling VOP_READ and VOP_WRITE while looking up + refing (and unrefing) vnodes for each call

Where exactly are the vnodes referenced and unreferenced? From what I can tell VOP_COPY_FILE_RANGE() doesn't do that.
This proposal simplifies the code by not requiring to call vn_generic_copy_file_range(). I cannot handle the operation? I simply return an error so that upper layers can deal with it.
I think it is cleaner and I don't think there is performance penalty. From what I can tell there is no extra locking or extra ref/unref, but maybe I'm wrong.

In the fallback (vn_generic_copy_file_range) there is:

error = vn_rdwr(UIO_READ, invp, dat, xfer,
    startoff, UIO_SYSSPACE, IO_NODELOCKED,
    curthread->td_ucred, incred, &aresid,
    curthread);

and a call to vn_write_outvp doing:

error = vn_rdwr(UIO_WRITE, outvp, dat, xfer2,
    outoff, UIO_SYSSPACE, IO_NODELOCKED,
    curthread->td_ucred, cred, NULL, curthread);

This in turns results in VOP_READ or VOP_WRITE respectively. For nullfs this means a call to null_bypass, which is most pessimal (also see vhold/vdrop in said routine). This is 2 calls for *each* r/w cycle. With my proposal there will be 2 calls in total -- nullfs will find both lower vnodes and invoke copy_file_range on the first one. If it can do cross copy (e.g., both vnodes are on the same zfs pool) we are golden as is. If they are not and the fallback is needed, the fallback can keep utilizing already-found vnodes without repeated null_bypass calls.

If the API contract demands you bail with EOPNOTSUPP if you can't do optimized copy, then nullfs can't utilize the above. In comparison, if the API contract is you swallow both vnodes and if you don't know what to do with them, you call vn_generic_copy_file_range or similar, it is all taken care of.

Arguably nullfs should be fixed to whack the overhad, but what really needs to be done here is not have nullfs in the current form to begin with.