Paths

Table of Contentst

pf: if a new RDR state connect be created, modulate src port
Needs ReviewPublic
Actions

Authored by allanjude on Mar 24 2024, 3:57 AM.

Details

Reviewers

Summary

If multiple inbound connections through a RDR rule cannot be accomodated
due to conflicting src/dst ip/port combination, modulate the src port in
the state entry to allow the connection rather than dropping it.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.

Test Plan

all tcp 203.0.113.50:8000 (198.51.100.1:7777) <- 198.51.100.50:5454       FIN_WAIT_2:ESTABLISHED
all tcp 203.0.113.50:8000 (198.51.100.2:7777) <- 198.51.100.50:5710 (198.51.100.50:5454)       FIN_WAIT_2:ESTABLISHED
all tcp 203.0.113.50:8000 (198.51.100.3:7777) <- 198.51.100.50:5966 (198.51.100.50:5454)       FIN_WAIT_2:ESTABLISHED

The first connection is RDR'd normally. The second and 3rd have the src port on the 'far' side of the RDR rewritten (with state entries to route it back correctly)

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 56762
Build 53650: arc lint + arc unit

Event Timeline

allanjude created this revision.Mar 24 2024, 3:57 AM

Herald added subscribers: vegeta_tuxpowered.net, glebius, melifaro and 2 others. · View Herald TranscriptMar 24 2024, 3:57 AM

allanjude requested review of this revision.Mar 24 2024, 3:57 AM

Harbormaster completed remote builds in B56759: Diff 136142.Mar 24 2024, 3:58 AM

This also really needs a test case.

sys/netpfil/pf/pf.c
5036	Declarations go at the top of the block.
5055	There's a pf SDT namespace, so that really wants to be in there.
5059	That could roll over into port 0. It probably also needs a bailout condition to avoid spinning forever if there are no more available ports. See PR 233867 for a similar bug.

allanjude edited the test plan for this revision. (Show Details)Mar 24 2024, 6:55 AM

Add test case

Harbormaster completed remote builds in B56761: Diff 136145.Mar 24 2024, 6:57 AM

kp added inline comments.Mar 24 2024, 7:01 AM

tests/sys/netpfil/pf/rdr.sh
194	Why not just nc?
199	That really needs to be compiled by the makefile, not on every test run. See tests/sys/net's randsleep for an example.

Address feedback

sys/netpfil/pf/pf.c
5059	Is there a specific threshold that makes sense to give up at? Or just when we would overflow and loop back to port 0?

Harbormaster completed remote builds in B56762: Diff 136146.Mar 24 2024, 7:34 AM

kp added inline comments.Mar 24 2024, 7:46 AM

sys/netpfil/pf/pf.c
5059	I was inclined to say once we've looped though all 65K ports, because then we know we can't possibly handle this state, but that's 65K hashes and lock / unlocks and that's going to be very, very expensive. So for we may want to limit it much more than that. Given that we only try once now just trying the next say 32 ports may be enough to fix most of the occurrences. This also suggests two test cases to add: using port 65535, so we have to roll over to 0 (and test that we do not use port 0), as well as a scenario where there are no ports available in whatever range of tries we settle on.

What real-life situation is this fix for? If this is for connections coming from behind a 3rd party SNAT, where SNAT reuses source ports faster than pf expires states, then maybe tuning pf timeouts would be enough. Or we could allow pf states to transition from TCPS_FIN_WAIT_2 back to TCPS_SYN_SENT, basically implementing SO_REUSEPORT for pf.

In D44488#1014674, @vegeta_tuxpowered.net wrote:

What real-life situation is this fix for? If this is for connections coming from behind a 3rd party SNAT, where SNAT reuses source ports faster than pf expires states, then maybe tuning pf timeouts would be enough. Or we could allow pf states to transition from TCPS_FIN_WAIT_2 back to TCPS_SYN_SENT, basically implementing SO_REUSEPORT for pf.

It is for behind a load balancer, where the initial connection comes in one public IP #1, but then all future connections come in on public IP #2, but the external source port from the incoming IP stays the same.
So when PF NAT's it to the web server behind the PF RDR rules, the incoming source ports conflict.

marius.h_lden.org added a subscriber: marius.h_lden.org.Mar 29 2024, 10:49 PM

markj added a subscriber: markj.Apr 12 2024, 6:26 PM

markj added inline comments.

sys/netpfil/pf/pf.c
5050	Can we get away with doing this on demand, i.e., try to insert a new state without checking and try with a different port if pf_state_insert() returns EEXIST? (I'll try to answer my own question, but posing it here in case someone has a quick reason to say "no".)

kp added inline comments.Apr 13 2024, 10:41 AM

sys/netpfil/pf/pf.c
5050	I think that ought to work. There's a log entry that will potentially fire in that case, but we can either fix that or live with it, given that collisions ought to be rare anyway. It's probably even the best approach, because trying to find a free port here and then running pf_state_insert() is potentially racy.

I've spent some time looking into the underlying problem, and I'm not sure that this approach (quietly rewriting the src port) is correct. The pf.conf manual page says:

In addition to modifying the address, some translation rules may modify
source or destination ports for tcp(4) or udp(4) connections; implicitly
in the case of nat rules and explicitly in the case of rdr rules.  Port
numbers are never translated with a binat rule.

but with this change we are implicitly rewriting the source port. What explicit mechanism is the man page alluding to here?

As an aside, I also found that if a nat rule matches an outbound packet, but we fail to allocate a src port for it (e.g., because static-port is configured and a collision occurs, or pf_get_sport() fails to find a free port), then pf will pass the packet out without performing any translation. That is, we end up passing out unmodified packets from the internal network. This happens in the "NAT proxy port allocation failed" path in pf_get_translation(); I'd really expect pf to drop the packet instead. Is this a bug?

In D44488#1031012, @markj wrote:

but with this change we are implicitly rewriting the source port. What explicit mechanism is the man page alluding to here?

That'd be things like rdr on $ext_if proto tcp from any to any port 80 -> 127.0.0.1 port 8080, where we redirect incoming connections on port 80 to localhost port 8080.
When we're doing explicit rewrites it's always going to be the destination port, of course.

As an aside, I also found that if a nat rule matches an outbound packet, but we fail to allocate a src port for it (e.g., because static-port is configured and a collision occurs, or pf_get_sport() fails to find a free port), then pf will pass the packet out without performing any translation. That is, we end up passing out unmodified packets from the internal network. This happens in the "NAT proxy port allocation failed" path in pf_get_translation(); I'd really expect pf to drop the packet instead. Is this a bug?

Yeah, I'd say that's a bug. I suspect it mostly stems from confusion between 'there is no translation required' and 'something went wrong', because both return NULL.

In D44488#1031012, @markj wrote:
I've spent some time looking into the underlying problem, and I'm not sure that this approach (quietly rewriting the src port) is correct. The pf.conf manual page says:
In addition to modifying the address, some translation rules may modify
source or destination ports for tcp(4) or udp(4) connections; implicitly
in the case of nat rules and explicitly in the case of rdr rules.  Port
numbers are never translated with a binat rule.
but with this change we are implicitly rewriting the source port. What explicit mechanism is the man page alluding to here?

There may be a subtlety here, the static-port flag is mostly talking about 'NAT' rules, and the purpose of the static-port flag is to keep the source-port used by internal machines (on the inside) when translated to the public IP (outside).
In the case this patch is addressing, we're modulating the source-port of an inbound connection, and we are only changing it 'inside' our network, because the only other choice is to drop the packet.

In D44488#1031217, @kp wrote:

In D44488#1031012, @markj wrote:

but with this change we are implicitly rewriting the source port. What explicit mechanism is the man page alluding to here?

That'd be things like rdr on $ext_if proto tcp from any to any port 80 -> 127.0.0.1 port 8080, where we redirect incoming connections on port 80 to localhost port 8080.
When we're doing explicit rewrites it's always going to be the destination port, of course.

Ah, I had read that as meaning that it was possible for rdr rules to rewrite source ports, but upon a second look, it seems not.

There may be a subtlety here, the static-port flag is mostly talking about 'NAT' rules, and the purpose of the static-port flag is to keep the source-port used by internal machines (on the inside) when translated to the public IP (outside).
In the case this patch is addressing, we're modulating the source-port of an inbound connection, and we are only changing it 'inside' our network, because the only other choice is to drop the packet.

Right, I guess what I'm getting at is that the rdr rule syntax probably ought to provide some control over how source port rewriting is done. For instance, one might want to provide a range of source ports to use, or to disable rewriting entirely (ala static-port for nat rules).