Paths

Table of Contentst

Modify vn_generic_copy_file_range() so that it will return after approximately 1second
AbandonedPublic
Actions

Authored by rmacklem on Sep 28 2020, 1:30 AM.

Details

Reviewers

asomers
kib

Summary

Without this patch, vn_generic_copy_file_range() can loop for an
extended period of time when a large copy range is specified.
The problem with this is mainly response time to signal
delivery for the application that did the copy_file_range(2)
call.

This patch makes vn_generic_copy_file_range() return after
approximately 1second.
A discussion on freebsd-hackers@ seemed to indicate that
1second was a reasonable limit for response time to a
termination signal.

Applying the 1second limit to file system specific
VOP_COPY_FILE_RANGE() calls such as NFSv4.2 will
be up to the file system specific VOP call.

Test Plan

Tested by running a large copy and posting a
SIGTERM to the program and timing how long
it took to terminate.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 33894

Event Timeline

rmacklem created this revision.Sep 28 2020, 1:30 AM

Herald added a subscriber: imp. · View Herald TranscriptSep 28 2020, 1:30 AM

rmacklem requested review of this revision.Sep 28 2020, 1:30 AM

Harbormaster completed remote builds in B33846: Diff 77583.Sep 28 2020, 1:30 AM

It should not return after any specific period, instead it should check for yield and pending signals every n iterations.

Ok, so I have no idea how to do that.
Any suggestions?

if (SIGPENDING(td))
     return (EINTR);
maybe_yield();

or so

In D26571#591841, @mjg wrote:
if (SIGPENDING(td))
     return (EINTR);
maybe_yield();
or so

The generic principle of checking signals and suspensions is right, but the implementation is not. And of course, if there was any data moved, syscall must not return EINTR but success and indicate the amount of data processed.

I extracted the code to check for conditions, this probably would go into self-contained function somewhere in kern_sig.c and reused in subr_sleepqueue.c:

bool
sig_intr(struct thread *td)
{
	struct proc *p;
	struct sigacts *ps;
	int ret, sig;

	if ((td->td_flags & (TDF_NEEDSIGCHK | TDF_NEEDSUSPCHK)) == 0)
		return (0);

	p = td->td_proc;
	PROC_LOCK(p);

	if ((td->td_flags & TDF_NEEDSUSPCHK) != 0) {
		ret = thread_suspend_check(1);
		MPASS(ret == 0 || ret == EINTR || ret == ERESTART);
		if (ret != 0)
			goto out;
	}

	ret = 0;
	if ((td->td_flags & TDF_NEEDSIGCHK) != 0) {
		ps = p->p_sigacts;
		mtx_lock(&ps->ps_mtx);
		sig = cursig(td);
		if (sig == -1) {
			mtx_unlock(&ps->ps_mtx);
			KASSERT((td->td_flags & TDF_SBDRY) != 0,
			    ("lost TDF_SBDRY"));
			KASSERT(TD_SBDRY_INTR(td),
			    ("lost TDF_SERESTART of TDF_SEINTR"));
			KASSERT((td->td_flags & (TDF_SEINTR | TDF_SERESTART)) !=
			    (TDF_SEINTR | TDF_SERESTART),
			    ("both TDF_SEINTR and TDF_SERESTART"));
			ret = TD_SBDRY_ERRNO(td);
		} else if (sig != 0) {
			ret = SIGISMEMBER(ps->ps_sigintr, sig) ? EINTR :
			    ERESTART;
			mtx_unlock(&ps->ps_mtx);
		} else {
			mtx_unlock(&ps->ps_mtx);
		}

		if ((td->td_dbgflags & TDB_FSTP) != 0) {
			if (ret == 0)
				ret = EINTR;
			td->td_dbgflags &= ~TDB_FSTP;
		}
	}
out:
	PROC_UNLOCK(p);
	return (ret);
}

I did not even compiled this for now, but will work on committing this helper after you are fine with the whole patch.

One problem I see is that on a slow and heavily laden system the timeout might be reached before any data has been copied. The caller would wrongly interpret that as EOF. So no matter how long it's been, you shouldn't return before at least one byte of data has been copied.

sys/kern/vfs_vnops.c.sav2
3020	Why don't you define `curtime` and `stoptime` in terms of nanoseconds instead of microseconds? That would save some division instructions.
3103	Since this 1-second timeout is only a goal, not a hard limit, I suggest removing the check here and just relying on the inner check.

This revision now requires changes to proceed.Sep 28 2020, 4:38 AM

The problem with using the "check for signals" approach
is that it won't work for the NFS server.
--> The NFS server will still need the 1sec timeout.
The VOP call can probably get away with using a
kernel only "flags" bit to indicate whether it should
use "1sec" vs "signal pending", but I'm not sure
if such semantics is overkill (the overhead of doing
1 syscall/sec isn't particularly high for a copy
like this.
Do others think this extra complexity is worth it?

can you elaborate on the nfs server problem? for example, if the concern is that it will tie down a thread for an extended amount of time, the 1 second timeout is already orders of magnitude too big. I know nfs likes to postpone signals in some cases, but all the copying here should be above any of that code.

Although there is no explicit statement in the NFS
RFCs indicating an upper limit on an RPC response
time, the assumption is that the server will reply
in a "reasonable time".
--> I think 1second is a reasonable time limit,

although some might argue it should be less
than that.

--> In NFSv4.2, a client can optionally specify the

  Copy operation be done "asynchronously",
  where the server replies to the Copy RPC when
  it is started and then does a server->client
  callback to notify completion.
  I have not implemented that, since I believe
  that the Linux server only does this for Copies
  between servers, which FreeBSD does not
  know how to do.
  Without the "asynchronous" option
, the Copy time needs to be limited
  to ensure a "timely" RPC reply.

Made the changes suggested by asomers@

Harbormaster completed remote builds in B33894: Diff 77668.Sep 30 2020, 2:30 AM

Updated the inline comments.

I did move the setting of curtime to the two places within
the loops that follows code that has done I/O.

sys/kern/vfs_vnops.c.sav2
3020	I, personally, do not think that a micro-optimization like saving a "divide instruction" is relevant to code that is not executed many times each second. However, I agree that it makes the code simpler to read and since overflow won't occur for something like 544years, I think the change is appropriate.
3103	It would simply loop here without the check. However, I do agree that updating curtime here was unnecessary and it is now just initialized to 0. I'll assume that is what you meant.

In D26571#591978, @rmacklem wrote:
Although there is no explicit statement in the NFS
RFCs indicating an upper limit on an RPC response
time, the assumption is that the server will reply
in a "reasonable time".
--> I think 1second is a reasonable time limit,
although some might argue it should be less
than that.
--> In NFSv4.2, a client can optionally specify the
  Copy operation be done "asynchronously",
  where the server replies to the Copy RPC when
  it is started and then does a server->client
  callback to notify completion.
  I have not implemented that, since I believe
  that the Linux server only does this for Copies
  between servers, which FreeBSD does not
  know how to do.
  Without the "asynchronous" option
, the Copy time needs to be limited
  to ensure a "timely" RPC reply.

But I still do not understand why 1sec timeout is in any way more useful for NFS server. I would agree for NFS server to limit accepted length instead of randomly aborting RPC. Esp. if this makes it impossible to report partially done operation (I am not sure about this part).

For local operation, timeout should not be used at all. As mjg rightfully noted, signal check is the right way to go. If inclined, userspace might arm SIGALRM at the desired timeout.

I think I agree with kib about timeout vs signal check.

sys/kern/vfs_vnops.c.sav2
3103	Yes
3243	I think the latest version fixes the risk of returning EOF due to timer expiry. But I would still be tempted to add an assertion to guarantee it.

Well, when I looked at a wireshark trace of copying a 1Gbyte file,
it turns out that it is the Commit RPC (the one that does VOP_FSYNC())
that takes the time and not the Copy RPC, so using time on the
Copy RPC is useless.

For the rather slow laptop NFS server over UFS, the time taken to
do a Commit appears to be roughly linear w.r.t. the size of the
Copy that preceeded it. Roughly the times are:
0.1sec for 16Mbytes
0.2sec for 32Mbytes
1sec for 128Mbytes
so the NFS server needs to clip the Copy size and not worry about
the time the Copy takes (typically 0.01sec or less).

I will come up with an initial patch for vn_generic_copy_file_range()
to check for/return when a signal has been posted, after doing
some copying, so that it always returns a non-zero copied length
unless at EOF.

Thanks everyone for looking at this, rick
ps: I'll put the pending signal patch under a new review.

Revision Contents
Changeset List

Path

Size

sys/

kern/

	vfs_vnops.c.sav2
	vfs_vnops.c.sav

18 lines

Diff 77668

View Options

Modify vn_generic_copy_file_range() so that it will return after approximately 1secondAbandonedPublicActions