Allow TCP to reuse local port with different destinations
Closed · Public

Authored by • karels on May 9 2020, 3:37 PM.

Details

Reviewers

tuexen
bz
cem
rscheff
rrs

Group Reviewers

transport

Commits

rS361228: Allow TCP to reuse local port with different destinations

Summary

Previously, tcp_connect() would bind a local port before connecting,
forcing the local port to be unique across all outgoing TCP connections
for the address family. Instead, choose a local port after selecting
the destination and the local address, requiring only that the tuple
is unique.
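
The difference between the two rules can be sketched in user-space C. This is only an illustration of the idea in the summary, not the code in the patch: `struct conn_tuple`, `port_in_use()` and `tuple_in_use()` are hypothetical names, addresses are reduced to plain integers, and a flat array stands in for the kernel's PCB lookup structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a TCP connection 4-tuple. */
struct conn_tuple {
	uint32_t laddr;	/* local address */
	uint16_t lport;	/* local port */
	uint32_t faddr;	/* foreign (destination) address */
	uint16_t fport;	/* foreign (destination) port */
};

/*
 * Old rule (pre-bind): a local port is unavailable as soon as any
 * existing connection uses it, whatever the destination.
 */
static bool
port_in_use(const struct conn_tuple *tab, size_t n, uint16_t lport)
{
	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tab[i].lport == lport)
			return (true);
	return (false);
}

/*
 * New rule: a local port is unavailable only if the complete tuple
 * (local address, local port, foreign address, foreign port) exists.
 */
static bool
tuple_in_use(const struct conn_tuple *tab, size_t n,
    const struct conn_tuple *c)
{
	assert(c != NULL && (tab != NULL || n == 0));
	for (size_t i = 0; i < n; i++)
		if (tab[i].laddr == c->laddr && tab[i].lport == c->lport &&
		    tab[i].faddr == c->faddr && tab[i].fport == c->fport)
			return (true);
	return (false);
}
```

With one existing connection from local port 50000, `port_in_use()` rejects port 50000 for every new connection, while `tuple_in_use()` rejects it only for a connection to the same destination address and port.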

Test Plan

Tested manually using both IPv4 and IPv6 with a small pool of ephemeral
ports, verifying that they could be reused for different destinations.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Warnings

Severity	Location	Code	Message
Warning	sys/netinet6/in6_pcb.c:430	SPELL1	Possible Spelling Mistake

Unit

No Test Coverage

Build Status

Buildable 31115
Build 28801: arc lint + arc unit

Event Timeline

• karels created this revision. May 9 2020, 3:37 PM

Herald added a reviewer: cem. May 9 2020, 3:37 PM

Herald added a reviewer: transport.

Herald added subscribers: melifaro, ae, imp.

• karels requested review of this revision. May 9 2020, 3:37 PM

Harbormaster completed remote builds in B31002: Diff 71588. May 9 2020, 3:37 PM

Can you point to existing implementations of this idea?
Several middleboxes are prone to assumptions like one-port-one-connection.
I doubt that this will work with, e.g., restricted cone NAT (https://en.wikipedia.org/wiki/Network_address_translation).

The enterprise-grade Sidewinder firewall has had this feature for years. I don't know what other implementations work this way, but it seems like the obvious way to do this "right". A NAT box cannot serve multiple clients without handling port overlap, nor connections to multiple services (ports) on a server.

What was the special handling of an inp_port == 0 doing, or why did you remove this in this patch?

Also, while this patch should make randomly selected ports collide much less frequently (which should be a win for busy machines), I wonder if there is a more effective alternative than a linear walk on a collision.

At least with most TCP functionality changes, they usually go like this:

a) introduce new functionality, tunable by sysctl, default off
:
b) default sysctl to on (for larger exposure, and if things do break, people have an "easy" (if they know about it) way back)
:
c) remove old functionality and sysctl (this rarely happens)

While I applaud this approach to scalability, I wonder whether you are jumping from (a) to (c) too quickly here, even though I agree that there shouldn't be too many issues with a change like this.

Thanks, Richard.

In D24781#546975, @rscheff wrote:

What was the special handling of an inp_port == 0 doing, or why did you remove this in this patch?

Are you talking about the blocks that do in_pcbbind and in6_pcbbind? Those did the "pre-bind" that caused the local port to be unique.

Also, while this patch should make randomly selected ports collide much less frequently (which should be a win for busy machines), I wonder if there is a more effective alternative than a linear walk on a collision.

The port-binding code is the same as was used earlier (was in_pcb_lport), which normally does a random port choice followed by a linear search. We don't have data structures that would optimize this, but it is no worse than before.
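
As a rough user-space illustration of "random port choice followed by a linear search" (not the kernel's actual port-selection code), the sketch below picks a random starting point in the ephemeral range and walks forward with wraparound until the caller's predicate reports a free port. All names here (`pick_lport()`, `struct port_set`, `set_contains()`) are hypothetical, and `rand()` is used only to keep the example portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical predicate: is this local port unavailable? */
typedef bool (*in_use_fn)(uint16_t lport, void *arg);

/* Hypothetical example predicate backed by a flat list of used ports. */
struct port_set {
	const uint16_t *ports;
	size_t n;
};

static bool
set_contains(uint16_t lport, void *arg)
{
	const struct port_set *s = arg;

	for (size_t i = 0; i < s->n; i++)
		if (s->ports[i] == lport)
			return (true);
	return (false);
}

/*
 * Pick a port in [first, last]: start at a random offset, then walk
 * linearly (wrapping around) until a free port is found.
 * Returns 0 if every port in the range is in use.
 */
static uint16_t
pick_lport(uint16_t first, uint16_t last, in_use_fn in_use, void *arg)
{
	assert(first != 0 && first <= last && in_use != NULL);

	uint32_t count = (uint32_t)last - first + 1;
	uint32_t start = (uint32_t)rand() % count;

	for (uint32_t i = 0; i < count; i++) {
		uint16_t port = (uint16_t)(first + (start + i) % count);

		if (!in_use(port, arg))
			return (port);
	}
	return (0);
}
```

The worst case is still a full walk of the range. The change under review affects what the predicate considers "in use" (the whole tuple rather than the port alone), which is why collisions should become rarer even though the search itself is unchanged.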

At least with most TCP functionality changes, they usually go like this:

a) introduce new functionality, tunable by sysctl, default off
:
b) default sysctl to on (for larger exposure, and if things do break, people have an "easy" (if they know about it) way back)
:
c) remove old functionality and sysctl (this rarely happens)

While I applaud this approach to scalability, I wonder whether you are jumping from (a) to (c) too quickly here, even though I agree that there shouldn't be too many issues with a change like this.

It would be straightforward to add a sysctl controlling the pre-bind. In this case, I don't see a point to (a), as no one is likely to enable the sysctl, but I could do (b).

Add sysctl to control new feature

Add "net.inet.tcp.require_unique_port" sysctl, default to off, to disable
new feature if necessary.

Harbormaster completed remote builds in B31115: Diff 71863. May 16 2020, 6:18 PM

LGTM.

Hi Mike,
let me ask you two questions:

If there is a wildcard socket bound against *:port, will this local port be used for a particular remote address?
If there is a local port and address bound to a specific remote address and port, can you still bind a wildcard address to this port?

Thanks for the answers.
Best regards
Michael