Page MenuHomeFreeBSD

Allow TCP to reuse local port with different destinations
ClosedPublic

Authored by karels on May 9 2020, 3:37 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 23, 2:14 AM
Unknown Object (File)
Fri, Nov 15, 5:08 AM
Unknown Object (File)
Tue, Nov 12, 3:42 AM
Unknown Object (File)
Oct 23 2024, 6:00 AM
Unknown Object (File)
Oct 9 2024, 5:27 AM
Unknown Object (File)
Oct 4 2024, 3:45 PM
Unknown Object (File)
Sep 30 2024, 9:27 PM
Unknown Object (File)
Sep 24 2024, 4:38 PM

Details

Summary

Previously, tcp_connect() would bind a local port before connecting,
forcing the local port to be unique across all outgoing TCP connections
for the address family. Instead, choose a local port after selecting
the destination and the local address, requiring only that the tuple
is unique.

Test Plan

Tested manually using both IPv4 and IPv6 with a small pool of ephemeral
ports, verifying that they could be reused for different destinations.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Warnings
SeverityLocationCodeMessage
Warningsys/netinet6/in6_pcb.c:430SPELL1Possible Spelling Mistake
Unit
No Test Coverage
Build Status
Buildable 31115
Build 28801: arc lint + arc unit

Event Timeline

Can you point to existing implementations of this idea?
Several middle-ware boxes are prone to assumptions like one-port-one-connection.
I doubt, that this will work with i.e. restricted cone NAT (https://en.wikipedia.org/wiki/Network_address_translation)

The enterprise-grade Sidewinder firewall has had this feature for years. I don't know what other implementations work this way, but it seems like the obvious way to do this "right". A NAT box cannot serve multiple clients without handling port overlap, nor connections to multiple services (ports) on a server.

What was the special handling of a inp_port == 0 doing, or why did you remove this in this patch?

Also, while this patch should make randomly selected ports collide much less frequently (which should be a win for busy machines) I wonder if there is a more effective alternative than a linear walk on a collision.

At least with most TCP functionality changes, they usually go like this:

a) introduce new functionality, tunable by sysctl, default off
:
b) default sysctl to on (for larger exposure, and if things do break, people have an "easy" (if they know about it) way back
:
c) remove old functionality and sysctl (this rarely happens)

While I applaud this approach to scalability, just wondering if you aren't jumping from a to c too quickly here - even though I agree, that there shouldn't be too many issues with a change like this.

Thanks, Richard.

What was the special handling of a inp_port == 0 doing, or why did you remove this in this patch?

Are you talking about the blocks that do in_pcbbind and in6_pcbbind? Those did the "pre-bind" that caused the local port to be unique.

Also, while this patch should make randomly selected ports collide much less frequently (which should be a win for busy machines) I wonder if there is a more effective alternative than a linear walk on a collision.

The port-binding code is the same as was used earlier (was in_pcb_lport), which normally does a random port choice followed by a linear search. We don't have data structures that would optimize this, but it is no worse than before.

At least with most TCP functionality changes, they usually go like this:

a) introduce new functionality, tunable by sysctl, default off
:
b) default sysctl to on (for larger exposure, and if things do break, people have an "easy" (if they know about it) way back
:
c) remove old functionality and sysctl (this rarely happens)

While I applaud this approach to scalability, just wondering if you aren't jumping from a to c too quickly here - even though I agree, that there shouldn't be too many issues with a change like this.

It would be straightforward to add a sysctl controlling the pre-bind. In this case, I don't see a point to (a), as no one is likely to enable the sysctl, but I could do (b).

Add sysctl to control new feature

Add "net.inet.tcp.require_unique_port" sysctl, default to off, to disable
new feature if necessary.

Hi Mike,
let me ask you two questions:

  • If there is a wildcard socket bound against *:port, will this local port be used for a particular remote address?
  • If these is a local port and address bound to a specific remote address port. Can you still bind a wildcard address to this port?

Thanks for the answers.
Best regards
Michael

Hi Mike,
let me ask you two questions:

Good questions!

  • If there is a wildcard socket bound against *:port, will this local port be used for a particular remote address?

I believe that the answer is no, but I will confirm. Of course, this could only happen in the ephemeral port range.

  • If these is a local port and address bound to a specific remote address port. Can you still bind a wildcard address to this port?

I'm reasonably sure that the answer depends on the SO_REUSEADDR option for the socket attempting the wildcard address. I will confirm this as well.

rrs added a subscriber: rrs.

Yeah the spelling error check is rather dumb :)

This revision is now accepted and ready to land.May 18 2020, 6:29 AM

Do not allow a port with a wildcard bind to be used for connect.

This revision now requires review to proceed.May 18 2020, 12:39 PM

In fact, tcp_connect was able to use a port that had a wildcard bind. I can't think of a real-world problem this would cause, but it was easy to fix.

The other case, doing a wildcard bind after a port is used for a connection, is unchanged by this patch. It still requires SO_REUSEADDR.

This revision is now accepted and ready to land.May 18 2020, 2:23 PM
This revision was landed with ongoing or failed builds.May 18 2020, 10:53 PM
This revision was automatically updated to reflect the committed changes.

Thanks, Conrad. I'm checking it out.

You'll want to match https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=174087 against this, and close that Bug with this fix...