Page MenuHomeFreeBSD

Modify vn_generic_copy_file_range() so that it will return after approximately 1second
AbandonedPublic

Authored by rmacklem on Mon, Sep 28, 1:30 AM.

Details

Reviewers
asomers
kib
Summary

Without this patch, vn_generic_copy_file_range() can loop for an
extended period of time when a large copy range is specified.
The problem with this is mainly response time to signal
delivery for the application that did the copy_file_range(2)
call.

This patch makes vn_generic_copy_file_range() return after
approximately 1second.
A discussion on freebsd-hackers@ seemed to indicate that
1second was a reasonable limit for response time to a
termination signal.

Applying the 1second limit to file system specific
VOP_COPY_FILE_RANGE() calls such as NFSv4.2 will
be up to the file system specific VOP call.

Test Plan

Tested by running a large copy and posting a
SIGTERM to the program and timing how long
it took to terminate.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 33894

Event Timeline

It should not return after any specific period, instead it should check for yield and pending signals every n iterations.

Ok, so I have no idea how to do that.
Any suggestions?

if (SIGPENDING(td))
     return (EINTR);
maybe_yield();

or so

In D26571#591841, @mjg wrote:
if (SIGPENDING(td))
     return (EINTR);
maybe_yield();

or so

The generic principle of checking signals and suspensions is right, but the implementation is not. And of course, if there was any data moved, syscall must not return EINTR but success and indicate the amount of data processed.

I extracted the code to check for conditions, this probably would go into self-contained function somewhere in kern_sig.c and reused in subr_sleepqueue.c:

bool
sig_intr(struct thread *td)
{
	struct proc *p;
	struct sigacts *ps;
	int ret, sig;

	if ((td->td_flags & (TDF_NEEDSIGCHK | TDF_NEEDSUSPCHK)) == 0)
		return (0);

	p = td->td_proc;
	PROC_LOCK(p);

	if ((td->td_flags & TDF_NEEDSUSPCHK) != 0) {
		ret = thread_suspend_check(1);
		MPASS(ret == 0 || ret == EINTR || ret == ERESTART);
		if (ret != 0)
			goto out;
	}

	ret = 0;
	if ((td->td_flags & TDF_NEEDSIGCHK) != 0) {
		ps = p->p_sigacts;
		mtx_lock(&ps->ps_mtx);
		sig = cursig(td);
		if (sig == -1) {
			mtx_unlock(&ps->ps_mtx);
			KASSERT((td->td_flags & TDF_SBDRY) != 0,
			    ("lost TDF_SBDRY"));
			KASSERT(TD_SBDRY_INTR(td),
			    ("lost TDF_SERESTART of TDF_SEINTR"));
			KASSERT((td->td_flags & (TDF_SEINTR | TDF_SERESTART)) !=
			    (TDF_SEINTR | TDF_SERESTART),
			    ("both TDF_SEINTR and TDF_SERESTART"));
			ret = TD_SBDRY_ERRNO(td);
		} else if (sig != 0) {
			ret = SIGISMEMBER(ps->ps_sigintr, sig) ? EINTR :
			    ERESTART;
			mtx_unlock(&ps->ps_mtx);
		} else {
			mtx_unlock(&ps->ps_mtx);
		}

		if ((td->td_dbgflags & TDB_FSTP) != 0) {
			if (ret == 0)
				ret = EINTR;
			td->td_dbgflags &= ~TDB_FSTP;
		}
	}
out:
	PROC_UNLOCK(p);
	return (ret);
}

I did not even compiled this for now, but will work on committing this helper after you are fine with the whole patch.

asomers requested changes to this revision.Mon, Sep 28, 4:38 AM

One problem I see is that on a slow and heavily laden system the timeout might be reached before any data has been copied. The caller would wrongly interpret that as EOF. So no matter how long it's been, you shouldn't return before at least one byte of data has been copied.

sys/kern/vfs_vnops.c.sav2
3019

Why don't you define curtime and stoptime in terms of nanoseconds instead of microseconds? That would save some division instructions.

3102

Since this 1-second timeout is only a goal, not a hard limit, I suggest removing the check here and just relying on the inner check.

This revision now requires changes to proceed.Mon, Sep 28, 4:38 AM

The problem with using the "check for signals" approach
is that it won't work for the NFS server.
--> The NFS server will still need the 1sec timeout.
The VOP call can probably get away with using a
kernel only "flags" bit to indicate whether it should
use "1sec" vs "signal pending", but I'm not sure
if such semantics is overkill (the overhead of doing
1 syscall/sec isn't particularly high for a copy
like this.
Do others think this extra complexity is worth it?

can you elaborate on the nfs server problem? for example, if the concern is that it will tie down a thread for an extended amount of time, the 1 second timeout is already orders of magnitude too big. I know nfs likes to postpone signals in some cases, but all the copying here should be above any of that code.

Although there is no explicit statement in the NFS
RFCs indicating an upper limit on an RPC response
time, the assumption is that the server will reply
in a "reasonable time".
--> I think 1second is a reasonable time limit,

although some might argue it should be less
than that.

--> In NFSv4.2, a client can optionally specify the

  Copy operation be done "asynchronously",
  where the server replies to the Copy RPC when
  it is started and then does a server->client
  callback to notify completion.
  I have not implemented that, since I believe
  that the Linux server only does this for Copies
  between servers, which FreeBSD does not
  know how to do.
  Without the "asynchronous" option
, the Copy time needs to be limited
  to ensure a "timely" RPC reply.

Made the changes suggested by asomers@

Updated the inline comments.

I did move the setting of curtime to the two places within
the loops that follows code that has done I/O.

sys/kern/vfs_vnops.c.sav2
3020

I, personally, do not think that a micro-optimization
like saving a "divide instruction" is relevant to code
that is not executed many times each second.

However, I agree that it makes the code simpler to
read and since overflow won't occur for something
like 544years, I think the change is appropriate.

3103

It would simply loop here without the check.
However, I do agree that updating curtime here
was unnecessary and it is now just initialized to 0.

I'll assume that is what you meant.

Although there is no explicit statement in the NFS
RFCs indicating an upper limit on an RPC response
time, the assumption is that the server will reply
in a "reasonable time".
--> I think 1second is a reasonable time limit,

although some might argue it should be less
than that.

--> In NFSv4.2, a client can optionally specify the

  Copy operation be done "asynchronously",
  where the server replies to the Copy RPC when
  it is started and then does a server->client
  callback to notify completion.
  I have not implemented that, since I believe
  that the Linux server only does this for Copies
  between servers, which FreeBSD does not
  know how to do.
  Without the "asynchronous" option
, the Copy time needs to be limited
  to ensure a "timely" RPC reply.

But I still do not understand why 1sec timeout is in any way more useful for NFS server. I would agree for NFS server to limit accepted length instead of randomly aborting RPC. Esp. if this makes it impossible to report partially done operation (I am not sure about this part).

For local operation, timeout should not be used at all. As mjg rightfully noted, signal check is the right way to go. If inclined, userspace might arm SIGALRM at the desired timeout.

I think I agree with kib about timeout vs signal check.

sys/kern/vfs_vnops.c.sav2
3103

Yes

3243

I think the latest version fixes the risk of returning EOF due to timer expiry. But I would still be tempted to add an assertion to guarantee it.

Well, when I looked at a wireshark trace of copying a 1Gbyte file,
it turns out that it is the Commit RPC (the one that does VOP_FSYNC())
that takes the time and not the Copy RPC, so using time on the
Copy RPC is useless.

For the rather slow laptop NFS server over UFS, the time taken to
do a Commit appears to be roughly linear w.r.t. the size of the
Copy that preceeded it. Roughly the times are:
0.1sec for 16Mbytes
0.2sec for 32Mbytes
1sec for 128Mbytes
so the NFS server needs to clip the Copy size and not worry about
the time the Copy takes (typically 0.01sec or less).

I will come up with an initial patch for vn_generic_copy_file_range()
to check for/return when a signal has been posted, after doing
some copying, so that it always returns a non-zero copied length
unless at EOF.

Thanks everyone for looking at this, rick
ps: I'll put the pending signal patch under a new review.