Details

Reviewers

dfr
rmacklem
mav

Group Reviewers

network

Commits

rS290203: Long-overdue MFC of r280930:
rS280930: Fix overflow bugs in and remove obsolete limit from kernel RPC

Summary

The kernel RPC code, which we inherited in part from Sun and NetBSD, contains a hard-coded limit that prevents more than 45 MB of RPC requests from being outstanding at any one time. Modern machines can handle far more than this. Because WRITE RPCs are large and all other RPCs are small, this limit particularly impacts write performance, but once the throttle is engaged -- a single fast client can cause this -- all RPCs from all clients are left to rot in the incoming socket buffers, causing intolerable performance even for small requests like LOOKUP or GETATTR. (This means that when a server is throttled, even mounting the filesystem often fails.) The original code had several integer overflow bugs which were particularly apparent once the old 45 MB limit was removed.

Test Plan

Will shortly be running this code on a production fileserver.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

wollman updated this revision to Diff 4481.Mar 28 2015, 8:02 PM

wollman retitled this revision from to Remove antique hard-coded limit in kernel RPC request throttling code.

wollman updated this object.

wollman edited the test plan for this revision. (Show Details)

wollman added reviewers: network, rmacklem, dfr, mav.

wollman set the repository for this revision to rS FreeBSD src repository - subversion.

wollman updated this object.

Sounds good to me.

This revision is now accepted and ready to land.Mar 28 2015, 8:29 PM

This simple change actually failed even worse than the original code. I suspect signedness problems surfaced by the switch from gcc 4.2 to clang (we just upgraded from 9.3 to 10.1 on our NFS srevers), and I'm preparing a new diff that tries to deal with that.

In the previous diff, arithmetic overflow resulted in incorrect values being computed. This version changes all of the data types to be longs rather than ints, as is more appropriate for 64-bit machines. (I haven't compile-tested on an ILP32 machine to see if it even still builds -- I suspect not, and I'll need to figure out a way to work around the lack of atomic 64-bit ops on 32-bit architectures.)

One important change, which I think is actually the important part, makes the type of "sz" in svc_run_internal() be signed.

This revision now requires review to proceed.Mar 28 2015, 8:44 PM

wollman added inline comments.Mar 28 2015, 8:49 PM

sys/rpc/svc.c
120–123	clang gives a tautological-compare warning here on LP64 platforms (nmbclusters is an int). What's the test I really should be doing?
1258–1259	Previously, sz was a size_t, which is unsigned long, and -sz (still unsigned long) was passed to a signed int formal parameter, resulting in overflow and undefined behavior. I suspect gcc 4.2 did the "obvious" thing and clang generates different code here that doesn't.

wollman added inline comments.Mar 28 2015, 9:11 PM

sys/rpc/svc.c
1258–1259	Here's how GCC implemented it in my 9.3 kernel: 0xffffffff807bd50e <+1454>: mov %r13d,%esi 0xffffffff807bd511 <+1457>: mov %r14,%rdi 0xffffffff807bd514 <+1460>: neg %esi 0xffffffff807bd516 <+1462>: callq 0xffffffff807bc820 <svc_change_space_used> It actually performs the negation in the parameter type (int) rather than the value type (size_t). Of course any situation in which the result would be different involves undefined behavior so GCC was not in error here (the code was still wrong).

wollman edited the test plan for this revision. (Show Details)Mar 28 2015, 9:13 PM

wollman edited edge metadata.

wollman added inline comments.Mar 28 2015, 9:23 PM

sys/rpc/svc.c
1258–1259	Thinking that from the behavior I'm seeing, this should perhaps be hoisted up into the loop, so that the space-in-use is immediately credited when the request competes. We don't batch this adjustment on insert, so I don't think it should be batched on dequeue either. Thoughts?

This version eliminates the variable "sz" in svc_run_internal() entirely by not batching updates to the available request space.

wollman retitled this revision from Remove antique hard-coded limit in kernel RPC request throttling code to Remove antique hard-coded limit and integer overflow bugs in kernel RPC request throttling code.Mar 28 2015, 9:51 PM

wollman updated this object.

It looks ok to me. I might have used int64_t instead of long, so that they're
64bits on all arches, but I don't think it will matter, since nmbclusters won't
be that large for 32bit arches, I think?

This revision is now accepted and ready to land.Mar 29 2015, 3:12 AM

In D2165#20, @rmacklem wrote:

It looks ok to me. I might have used int64_t instead of long, so that they're
64bits on all arches, but I don't think it will matter, since nmbclusters won't
be that large for 32bit arches, I think?

I don't think we can expect to have a 64-bit atomic_fetchadd on ILP32 arches, whereas long (by definition) will be available there, and (perhaps with the exception of i386+PAE) I don't think nmbclusters can ever be large enough for it to matter on those arches and still have a functioning system. Next revision is going to remove the belt-and-suspenders overflow check.

Remove the redundant overflow check and expand the comment to make it clear what's going on when we compute the request buffer limits. This should be the last iteration of this patch. This version isn't running on a production server yet; hopefully that will happen on Monday.

This revision now requires review to proceed.Mar 29 2015, 6:00 AM

Yep, this looks ok to me.

This revision is now accepted and ready to land.Mar 30 2015, 12:12 AM

NB: This diff is relative to 10.1. Context lines have changed in 11-current so the diff won't apply directly.

Previous patch had a use-after-free bug (that should have been obvious). This version reintroduces the "sz" variable to prevent this.

This revision now requires review to proceed.Mar 30 2015, 2:38 AM

rmacklem accepted this revision.Mar 30 2015, 3:02 AM

rmacklem edited edge metadata.

This revision is now accepted and ready to land.Mar 30 2015, 3:02 AM

This code is running on a production fileserver now. Waiting for the all-arches compile test before I'm ready to commit.

It looks good to me.

svc_change_space_used() call out of the loop was kind of optimization when modifying variable shared by many threads, but that may be not so important.

In D2165#39, @mav wrote:

svc_change_space_used() call out of the loop was kind of optimization when modifying variable shared by many threads, but that may be not so important.

This actually hurts on a busy server, because the request-processing loop can livelock until the throttling limit has been reached. On one server with heavy write activity, I saw the request_space_used increasing steadily until it hit the limit; then the throttle would engage, the request-processing loops would all run to completion, and request_space_used would go back to zero. Lather, rinse, repeat, ad infinitum. This doesn't happen when traffic is light because the request-processing loop hits the end of the queue before more work can be appended to it.

Closed by commit rS280930 (authored by @wollman).

Remove antique hard-coded limit and integer overflow bugs in kernel RPC request throttling code
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 4489

sys/rpc/svc.h

sys/rpc/svc.c

Remove antique hard-coded limit and integer overflow bugs in kernel RPC request throttling codeClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 4489

sys/rpc/svc.h

sys/rpc/svc.c

Remove antique hard-coded limit and integer overflow bugs in kernel RPC request throttling code
ClosedPublic
Actions

Revision Contents
Changeset List