Previously, tcp_connect() would bind a local port before connecting,
forcing the local port to be unique across all outgoing TCP connections
for the address family. Instead, choose a local port after selecting
the destination and the local address, requiring only that the four-tuple
(local address, local port, foreign address, foreign port) is unique.
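To illustrate the practical effect, here is a small self-contained toy model (not the kernel code; all names are invented for the example) comparing how many concurrent connections a tiny port pool supports under each policy:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NPORTS 4    /* toy ephemeral port range */
#define NDESTS 3    /* distinct (foreign address, foreign port) pairs */

/* used[p][d] marks the (local port p, destination d) tuple as in use. */
static bool used[NPORTS][NDESTS];

/* Old policy: a local port may back at most one connection, total. */
static int
alloc_globally_unique(int dest)
{
    for (int p = 0; p < NPORTS; p++) {
        bool busy = false;

        for (int d = 0; d < NDESTS; d++)
            if (used[p][d])
                busy = true;
        if (!busy) {
            used[p][dest] = true;
            return (p);
        }
    }
    return (-1);        /* ports exhausted */
}

/* New policy: only the (local port, destination) tuple must be unused. */
static int
alloc_tuple_unique(int dest)
{
    for (int p = 0; p < NPORTS; p++)
        if (!used[p][dest]) {
            used[p][dest] = true;
            return (p);
        }
    return (-1);
}

/* Open as many "connections" as possible, spread across all destinations. */
static int
count_connections(int (*alloc)(int))
{
    int n = 0;

    for (int d = 0; d < NDESTS; d++)
        while (alloc(d) != -1)
            n++;
    return (n);
}

int
main(void)
{
    printf("globally-unique ports: %d connections\n",
        count_connections(alloc_globally_unique));
    memset(used, 0, sizeof(used));
    printf("tuple-unique ports:    %d connections\n",
        count_connections(alloc_tuple_unique));
    return (0);
}

With 4 ports and 3 destinations this prints 4 connections under the old policy and 12 under the new one, since each port can be reused for every distinct destination.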
Details
Tested manually using both IPv4 and IPv6 with a small pool of ephemeral
ports, verifying that they could be reused for different destinations.
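For anyone wanting to reproduce the test, a rough user-space sketch along these lines could be used: narrow the ephemeral range separately (e.g. via the net.inet.ip.portrange.first and net.inet.ip.portrange.last sysctls), connect to several destinations, and print the local port of each connection with getsockname(). The destination addresses below are placeholders.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Placeholder destinations; substitute hosts that accept TCP on port 80. */
static const char *dests[] = { "192.0.2.1", "192.0.2.2", "192.0.2.3" };

int
main(void)
{
    for (size_t i = 0; i < sizeof(dests) / sizeof(dests[0]); i++) {
        struct sockaddr_in dst, local;
        socklen_t len = sizeof(local);
        int s = socket(AF_INET, SOCK_STREAM, 0);

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_len = sizeof(dst);
        dst.sin_port = htons(80);
        inet_pton(AF_INET, dests[i], &dst.sin_addr);

        if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) == -1) {
            perror(dests[i]);
            close(s);
            continue;
        }
        if (getsockname(s, (struct sockaddr *)&local, &len) == 0)
            printf("%s -> local port %u\n", dests[i],
                (unsigned)ntohs(local.sin_port));
        /* Keep the socket open so the port stays in use;
         * sockstat -4 shows the resulting connections. */
    }
    pause();
    return (0);
}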
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Warning, sys/netinet6/in6_pcb.c:431, SPELL1, Possible Spelling Mistake
- Unit: No Test Coverage
- Build Status: Buildable 31002, Build 28708: arc lint + arc unit
Event Timeline
Can you point to existing implementations of this idea?
Several middleware boxes are prone to assumptions like one-port-one-connection.
I doubt that this will work with, for example, a restricted-cone NAT (https://en.wikipedia.org/wiki/Network_address_translation).
The enterprise-grade Sidewinder firewall has had this feature for years. I don't know what other implementations work this way, but it seems like the obvious way to do this "right". A NAT box cannot serve multiple clients without handling port overlap, nor can it handle connections to multiple services (ports) on a server without it.
What was the special handling of inp_port == 0 doing, and why did you remove it in this patch?
Also, while this patch should make randomly selected ports collide much less frequently (which should be a win for busy machines), I wonder if there is a more effective alternative than a linear walk on a collision.
At least with most TCP functionality changes, the process usually goes like this:
a) introduce new functionality, tunable by sysctl, default off
b) default the sysctl to on (for larger exposure, and if things do break, people have an "easy" (if they know about it) way back)
c) remove old functionality and sysctl (this rarely happens)
While I applaud this approach to scalability, I am just wondering if you aren't jumping from a to c too quickly here - even though I agree that there shouldn't be too many issues with a change like this.
Thanks, Richard.
Are you talking about the blocks that do in_pcbbind and in6_pcbbind? Those did the "pre-bind" that caused the local port to be unique.
> Also, while this patch should make randomly selected ports collide much less frequently (which should be a win for busy machines), I wonder if there is a more effective alternative than a linear walk on a collision.
The port-binding code is the same as was used earlier (it was in_pcb_lport), which normally does a random port choice followed by a linear search. We don't have data structures that would optimize this, but it is no worse than before.
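For readers who haven't looked at that code, the strategy amounts to something like the following self-contained sketch (the real version operates on the pcb hash tables; here the availability test is just a callback):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Pick an ephemeral port in [first, last]: start at a random point,
 * then walk linearly (wrapping around) until an available port is
 * found or the whole range has been tried.
 */
static int
choose_ephemeral_port(uint16_t first, uint16_t last,
    bool (*available)(uint16_t port, void *arg), void *arg)
{
    unsigned int count = (unsigned int)last - first + 1;
    uint16_t port = first + arc4random_uniform(count);

    for (unsigned int tries = 0; tries < count; tries++) {
        if (available(port, arg))
            return (port);
        port = (port == last) ? first : (uint16_t)(port + 1);
    }
    return (-1);        /* the whole range is in use */
}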
> At least with most TCP functionality changes, the process usually goes like this:
> a) introduce new functionality, tunable by sysctl, default off
> b) default the sysctl to on (for larger exposure, and if things do break, people have an "easy" (if they know about it) way back)
> c) remove old functionality and sysctl (this rarely happens)
> While I applaud this approach to scalability, I am just wondering if you aren't jumping from a to c too quickly here - even though I agree that there shouldn't be too many issues with a change like this.
It would be straightforward to add a sysctl controlling the pre-bind. In this case, I don't see a point to (a), as no one is likely to enable the sysctl, but I could do (b).
Add sysctl to control new feature
Add "net.inet.tcp.require_unique_port" sysctl, default to off, to disable
new feature if necessary.
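For context, a knob like this would presumably be declared with the usual FreeBSD per-VNET sysctl boilerplate; the following is only an illustrative sketch (the variable name and description string are guesses, not the actual diff):

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>
#include <net/vnet.h>

/* Hypothetical backing variable, off by default. */
VNET_DEFINE(int, tcp_require_unique_port) = 0;
#define V_tcp_require_unique_port VNET(tcp_require_unique_port)

SYSCTL_INT(_net_inet_tcp, OID_AUTO, require_unique_port,
    CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(tcp_require_unique_port), 0,
    "Require a globally-unique ephemeral port for outgoing TCP connections");

/*
 * tcp_connect()/tcp6_connect() would then take the old pre-bind path
 * only when V_tcp_require_unique_port is non-zero.
 */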
Hi Mike,
let me ask you two questions:
- If there is a wildcard socket bound against *:port, will this local port be used for a particular remote address?
- If there is a local port and address bound to a specific remote address and port, can you still bind a wildcard address to this port?
Thanks for the answers.
Best regards
Michael
Good questions!
> - If there is a wildcard socket bound against *:port, will this local port be used for a particular remote address?
I believe that the answer is no, but I will confirm. Of course, this could only happen in the ephemeral port range.
> - If there is a local port and address bound to a specific remote address and port, can you still bind a wildcard address to this port?
I'm reasonably sure that the answer depends on the SO_REUSEADDR option for the socket attempting the wildcard address. I will confirm this as well.
In fact, tcp_connect was able to use a port that had a wildcard bind. I can't think of a real-world problem this would cause, but it was easy to fix.
The other case, doing a wildcard bind after a port is used for a connection, is unchanged by this patch. It still requires SO_REUSEADDR.
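To make the SO_REUSEADDR point concrete, a small user-space check along the following lines (the destination address is a placeholder) should see the plain wildcard bind fail with EADDRINUSE and the SO_REUSEADDR one succeed:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    struct sockaddr_in dst, local, wild;
    socklen_t len = sizeof(local);
    int c, l, on = 1;

    /* 1. Ordinary outgoing connection; the kernel picks the ephemeral
     *    local port.  192.0.2.1:80 is a placeholder destination. */
    c = socket(AF_INET, SOCK_STREAM, 0);
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_len = sizeof(dst);
    dst.sin_port = htons(80);
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);
    if (connect(c, (struct sockaddr *)&dst, sizeof(dst)) == -1) {
        perror("connect");
        return (1);
    }
    getsockname(c, (struct sockaddr *)&local, &len);

    /* 2. Wildcard bind to the same port: expected to fail with
     *    EADDRINUSE unless SO_REUSEADDR is set first. */
    memset(&wild, 0, sizeof(wild));
    wild.sin_family = AF_INET;
    wild.sin_len = sizeof(wild);
    wild.sin_addr.s_addr = htonl(INADDR_ANY);
    wild.sin_port = local.sin_port;

    l = socket(AF_INET, SOCK_STREAM, 0);
    if (bind(l, (struct sockaddr *)&wild, sizeof(wild)) == -1)
        printf("wildcard bind without SO_REUSEADDR: %s\n",
            strerror(errno));
    setsockopt(l, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    if (bind(l, (struct sockaddr *)&wild, sizeof(wild)) == 0)
        printf("wildcard bind with SO_REUSEADDR: succeeded\n");
    else
        printf("wildcard bind with SO_REUSEADDR: %s\n",
            strerror(errno));
    return (0);
}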
FYI, possibly fallout from this change:
https://syzkaller.appspot.com/bug?extid=005eee1ce96ea4716af8
You'll want to match https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=174087 against this, and close that Bug with this fix...