Page MenuHomeFreeBSD

Avoid calling in_pcbnotifyall() to flush cached routes
ClosedPublic

Authored by gallatin on Jul 19 2016, 3:58 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Apr 3, 11:59 PM
Unknown Object (File)
Mar 20 2024, 2:57 PM
Unknown Object (File)
Feb 9 2024, 11:35 AM
Unknown Object (File)
Jan 27 2024, 12:11 AM
Unknown Object (File)
Jan 14 2024, 5:40 AM
Unknown Object (File)
Dec 26 2023, 1:54 AM
Unknown Object (File)
Dec 21 2023, 7:31 AM
Unknown Object (File)
Dec 20 2023, 12:19 AM
Subscribers

Details

Summary

In r297225 / D4306, route caching was re-enabled and added a mechanism
to cache flushed routes via tcp_ctlinput() using in_pcbnotifyall().
Unfortunately, as mentioned a few lines down in tcp_ctlinput(),
in_pcbnotifyall() is so expensive that it is almost a DOS. It holds
the tcpbinfo write lock while walking all connections. When this
happens on a heavily loaded internet facing box at Netflix with ~100K
connections, we see massive load spikes, packet drops, and throughput
loss as all CPUs spin waiting for the tcpbinfo lock.

Rather than calling in_pcbnotifyall(), let's just call tcp_notify()
directly to shoot down the routes. With this fix in place, our
IPv4 based load spikes have gone away. There are similar issues
in IPv6 which I'll address in a separate commit.

Test Plan

Run for 48 hours on heavily loaded CDN server.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

gallatin retitled this revision from to Avoid calling in_pcbnotifyall() to flush cached routes.
gallatin updated this object.
gallatin edited the test plan for this revision. (Show Details)
gallatin added reviewers: scottl, rrs, karels, gnn.
gallatin edited edge metadata.

address feedback from ae

OK, so my attempt at properly using arc has failed miserably.

gallatin edited edge metadata.
  • Revert mistplaced arc diff
gallatin edited edge metadata.
  • Revert other files touched in mis-placed arc diff

OK, the diff is back to the way it should be. Now to try to make arc apply my change to the correct, dependant review.

Sorry for the mess. Long time reviewer of arc, first time uploader of diffs :(

karels edited edge metadata.

Looks good to me. This will only update one connection, but there is nothing wrong with that. We could receive a large number of redirects if there are a lot of connections using the same route. Another option would be to increment the route table revision if a redirect adds a new route, doing nothing if it changes a route. But this is quick and expedient. It should obviously be MFC'ed to 11.0.

I'm not sure my acceptance "counts" using mike-karels.net; if you add karels@, I can re-approve.

In D7251#151654, @mike-karels.net wrote:

I'm not sure my acceptance "counts" using mike-karels.net; if you add karels@, I can re-approve.

Hmm.. I tried to edit the review to add karels@ as a reviewer, but that username does not seem to exist. Did you sign up for phabricator with your FreeBSD username? I think maybe you only have the original @mile-karels.net login for phabricator. As you can tell from the misplaced diff snafu yesterday, I'm far from a phabricator expert, but I think that should be OK.

I'm looking into the account issue.

rrs edited edge metadata.

Looks good Drew :)

This revision is now accepted and ready to land.Jul 26 2016, 3:17 PM
This revision was automatically updated to reflect the committed changes.