Page MenuHomeFreeBSD

Rework IPv6 TCP path MTU discovery to match IPv4
ClosedPublic

Authored by gallatin on Jul 21 2016, 3:08 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jan 13, 7:13 AM
Unknown Object (File)
Sun, Dec 29, 12:55 PM
Unknown Object (File)
Nov 18 2024, 4:36 AM
Unknown Object (File)
Nov 17 2024, 1:40 AM
Unknown Object (File)
Nov 10 2024, 7:37 PM
Unknown Object (File)
Oct 15 2024, 1:32 PM
Unknown Object (File)
Sep 20 2024, 8:37 AM
Unknown Object (File)
Sep 17 2024, 4:31 PM
Subscribers

Details

Summary
  • Re-write tcp_ctlinput6() to closely mimic the IPv4 tcp_ctlinput() as it stands in D7251. Hopefully the code will be close enough to prevent future regressions and bitrot in the v6 code.
  • Now that tcp_ctlinput6() updates t_maxseg, we can allow ip6_output() to send TCP packets without looking at the tcp host cache for every single transmit.
  • Prevent the icmp6 code from returning PRC_HOSTDEAD so as to avoid the massive lock contention caused by in6_pcbnotify() walking tcpbinfo with the write lock held.

Without these changes in place, even 5-10% of traffic being IPv6
causes huge scalability issues with large numbers (order 100K) of TCP
connections. With these changes in place, these scalability issues
are gone.

Test Plan

I have run this for 72+ hours on a Netflix CDN server with no
ill effects observed.

I have verified via Dtrace that tp->t_maxseg is getting updated via
the tcp_ctlinput6() routine, and have verified via our logs that
several IPv6 clients who needed a pmtu change went on to have good
sessions.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

gallatin retitled this revision from to Rework IPv6 TCP path MTU discovery to match IPv4.
gallatin updated this object.
gallatin edited the test plan for this revision. (Show Details)
gallatin added reviewers: bz, rrs, gnn, scottl.
ae added inline comments.
sys/netinet/tcp_subr.c
2073 ↗(On Diff #18632)

What is the reason of this check? Maybe it will be better to move it into icmp6_notify_error()?

2086 ↗(On Diff #18632)

Please, you use ip6 != NULL here

2114 ↗(On Diff #18632)

th = mtodo(m, off);

2123 ↗(On Diff #18632)

Extra newline here

2139 ↗(On Diff #18632)

leading spaces

sys/netinet6/ip6_output.c
722 ↗(On Diff #18632)

I think you expect to see IPPROTO_TCP in the ip6_nxt, but in some cases at this place it can have another value due to the extension headers. Probably you need to use *nexthdrp here.

1338 ↗(On Diff #18632)

Is this check correct?

Thanks for the valuable input. I will uprev with my fixes later today.

sys/netinet/tcp_subr.c
2073 ↗(On Diff #18632)

Purely to mimic the faddr.s_addr == INADDR_ANY () in the tcp_ctlinput(). I have no objection to removing it, if you feel that it is not needed.

2086 ↗(On Diff #18632)

Hmm.. It was intended to exactly match tcp_cllinput() so it was obvious they are the same, but yes, it is new code, so I suppose the extra verbose style is called for

2114 ↗(On Diff #18632)

Thanks. That's new enough that i forget it exists.

sys/netinet6/ip6_output.c
722 ↗(On Diff #18632)

I considered this. However, I just wanted to keep it simple. Note that this is just an optimization to allow TCP to avoid the tcp_hc_getmtu() call that currently happens on every single packet. For the pessimal edge case of extension headers, many other pessimal paths are taken (eg, no csum and TSO), that missing this optimization does not matter.

1338 ↗(On Diff #18632)

Ironically, yes it is. TCP is the only proto I'm sure does not need to have the MTU looked up in the tcp host cache. This is because the tcp_ctlinput() changes now allow the MTU to be updated in tcpcb's t_maxseg. With that change, the tcp host cache lookup is not required and can be skipped. I have no idea what other, not TCP, protos do.

gallatin edited edge metadata.
  • Address ae's feedback
sys/netinet/tcp_subr.c
2144 ↗(On Diff #18682)

One day I should add a comment about the -8 there ... SA-08:09. No action needed from you.

bz edited edge metadata.

Given I am only asking for an extra comment, this seems to look OK to me. I have not tested or applied the patch.

sys/netinet6/ip6_output.c
1338 ↗(On Diff #18632)

This time I got the edit box. I'd say add a comment before that check explaining why TCP does not need it (anymore).

This revision is now accepted and ready to land.Jul 26 2016, 12:55 PM
rrs edited edge metadata.
gallatin edited edge metadata.

Update with some cleanups in reaction to comments from lstewart@

  • Add a comment as to why TCP is exempted from tcp_hg_getmtu() in ip6_calcmtu()
  • Use nexthdrp for the proto arg to ip6_getpmtu().
  • Collapse the ICMP6_DST_UNREACH_ADDR case with ICMP6_DST_UNREACH_NOROUTE in icmp6_input() rather than keeping it separate.
This revision now requires review to proceed.Jul 27 2016, 6:46 PM
gallatin edited edge metadata.

Fix LINT-NOINET and a few line width problems

This revision was automatically updated to reflect the committed changes.