Details

Reviewers

scottl
rrs
gnn
• karels

Group Reviewers

transport

Commits

rS303457: Call tcp_notify() directly to shoot down routes, rather than

Summary

In r297225 / D4306, route caching was re-enabled and added a mechanism
to cache flushed routes via tcp_ctlinput() using in_pcbnotifyall().
Unfortunately, as mentioned a few lines down in tcp_ctlinput(),
in_pcbnotifyall() is so expensive that it is almost a DOS. It holds
the tcpbinfo write lock while walking all connections. When this
happens on a heavily loaded internet facing box at Netflix with ~100K
connections, we see massive load spikes, packet drops, and throughput
loss as all CPUs spin waiting for the tcpbinfo lock.

Rather than calling in_pcbnotifyall(), let's just call tcp_notify()
directly to shoot down the routes. With this fix in place, our
IPv4 based load spikes have gone away. There are similar issues
in IPv6 which I'll address in a separate commit.

Test Plan

Run for 48 hours on heavily loaded CDN server.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

gallatin updated this revision to Diff 18556.Jul 19 2016, 3:58 PM

gallatin retitled this revision from to Avoid calling in_pcbnotifyall() to flush cached routes.

gallatin updated this object.

gallatin edited the test plan for this revision. (Show Details)

gallatin added reviewers: scottl, rrs, • karels, gnn.

Herald added a reviewer: transport. · View Herald TranscriptJul 19 2016, 3:58 PM

Herald added a subscriber: imp. · View Herald Transcript

jch added a subscriber: jch.Jul 19 2016, 4:30 PM

gallatin mentioned this in D7272: Rework IPv6 TCP path MTU discovery to match IPv4.Jul 21 2016, 3:08 PM

gallatin added a child revision: D7272: Rework IPv6 TCP path MTU discovery to match IPv4.Jul 21 2016, 3:09 PM

address feedback from ae

Herald edited edge metadata. · View Herald TranscriptJul 22 2016, 6:04 PM

OK, so my attempt at properly using arc has failed miserably.

Revert mistplaced arc diff

Herald edited edge metadata. · View Herald TranscriptJul 22 2016, 6:16 PM

Revert other files touched in mis-placed arc diff

Herald edited edge metadata. · View Herald TranscriptJul 22 2016, 6:20 PM

OK, the diff is back to the way it should be. Now to try to make arc apply my change to the correct, dependant review.

Sorry for the mess. Long time reviewer of arc, first time uploader of diffs :(

Looks good to me. This will only update one connection, but there is nothing wrong with that. We could receive a large number of redirects if there are a lot of connections using the same route. Another option would be to increment the route table revision if a redirect adds a new route, doing nothing if it changes a route. But this is quick and expedient. It should obviously be MFC'ed to 11.0.

I'm not sure my acceptance "counts" using mike-karels.net; if you add karels@, I can re-approve.

In D7251#151654, @mike-karels.net wrote:

I'm not sure my acceptance "counts" using mike-karels.net; if you add karels@, I can re-approve.

Hmm.. I tried to edit the review to add karels@ as a reviewer, but that username does not seem to exist. Did you sign up for phabricator with your FreeBSD username? I think maybe you only have the original @mile-karels.net login for phabricator. As you can tell from the misplaced diff snafu yesterday, I'm far from a phabricator expert, but I think that should be OK.

I'm looking into the account issue.

Looks good Drew :)

This revision is now accepted and ready to land.Jul 26 2016, 3:17 PM

Closed by commit rS303457: Call tcp_notify() directly to shoot down routes, rather than (authored by gallatin). · Explain WhyJul 28 2016, 7:32 PM

This revision was automatically updated to reflect the committed changes.

Avoid calling in_pcbnotifyall() to flush cached routes
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 18858

head/sys/netinet/tcp_subr.c

Avoid calling in_pcbnotifyall() to flush cached routesClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 18858

head/sys/netinet/tcp_subr.c

Avoid calling in_pcbnotifyall() to flush cached routes
ClosedPublic
Actions

Revision Contents
Changeset List