Page MenuHomeFreeBSD

pf: if a new RDR state connect be created, modulate src port
Needs ReviewPublic

Authored by allanjude on Mar 24 2024, 3:57 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, May 18, 1:26 AM
Unknown Object (File)
Mon, May 13, 2:26 AM
Unknown Object (File)
Tue, May 7, 12:47 PM
Unknown Object (File)
Tue, May 7, 3:32 AM
Unknown Object (File)
Sat, May 4, 1:23 PM
Unknown Object (File)
Thu, May 2, 4:23 AM
Unknown Object (File)
Fri, Apr 26, 7:26 PM
Unknown Object (File)
Fri, Apr 26, 6:07 PM

Details

Reviewers
kp
Summary

If multiple inbound connections through a RDR rule cannot be accomodated
due to conflicting src/dst ip/port combination, modulate the src port in
the state entry to allow the connection rather than dropping it.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.

Test Plan
all tcp 203.0.113.50:8000 (198.51.100.1:7777) <- 198.51.100.50:5454       FIN_WAIT_2:ESTABLISHED
all tcp 203.0.113.50:8000 (198.51.100.2:7777) <- 198.51.100.50:5710 (198.51.100.50:5454)       FIN_WAIT_2:ESTABLISHED
all tcp 203.0.113.50:8000 (198.51.100.3:7777) <- 198.51.100.50:5966 (198.51.100.50:5454)       FIN_WAIT_2:ESTABLISHED

The first connection is RDR'd normally. The second and 3rd have the src port on the 'far' side of the RDR rewritten (with state entries to route it back correctly)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 56762
Build 53650: arc lint + arc unit

Event Timeline

This also really needs a test case.

sys/netpfil/pf/pf.c
5036

Declarations go at the top of the block.

5055

There's a pf SDT namespace, so that really wants to be in there.

5059

That could roll over into port 0.

It probably also needs a bailout condition to avoid spinning forever if there are no more available ports.
See PR 233867 for a similar bug.

allanjude edited the test plan for this revision. (Show Details)

Add test case

tests/sys/netpfil/pf/rdr.sh
194

Why not just nc?

199

That really needs to be compiled by the makefile, not on every test run.

See tests/sys/net's randsleep for an example.

allanjude marked an inline comment as done.

Address feedback

sys/netpfil/pf/pf.c
5059

Is there a specific threshold that makes sense to give up at? Or just when we would overflow and loop back to port 0?

sys/netpfil/pf/pf.c
5059

I was inclined to say once we've looped though all 65K ports, because then we know we can't possibly handle this state, but that's 65K hashes and lock / unlocks and that's going to be very, very expensive.

So for we may want to limit it much more than that. Given that we only try once now just trying the next say 32 ports may be enough to fix most of the occurrences.

This also suggests two test cases to add: using port 65535, so we have to roll over to 0 (and test that we do not use port 0), as well as a scenario where there are no ports available in whatever range of tries we settle on.

What real-life situation is this fix for? If this is for connections coming from behind a 3rd party SNAT, where SNAT reuses source ports faster than pf expires states, then maybe tuning pf timeouts would be enough. Or we could allow pf states to transition from TCPS_FIN_WAIT_2 back to TCPS_SYN_SENT, basically implementing SO_REUSEPORT for pf.

What real-life situation is this fix for? If this is for connections coming from behind a 3rd party SNAT, where SNAT reuses source ports faster than pf expires states, then maybe tuning pf timeouts would be enough. Or we could allow pf states to transition from TCPS_FIN_WAIT_2 back to TCPS_SYN_SENT, basically implementing SO_REUSEPORT for pf.

It is for behind a load balancer, where the initial connection comes in one public IP #1, but then all future connections come in on public IP #2, but the external source port from the incoming IP stays the same.
So when PF NAT's it to the web server behind the PF RDR rules, the incoming source ports conflict.

markj added inline comments.
sys/netpfil/pf/pf.c
5050

Can we get away with doing this on demand, i.e., try to insert a new state without checking and try with a different port if pf_state_insert() returns EEXIST?

(I'll try to answer my own question, but posing it here in case someone has a quick reason to say "no".)

sys/netpfil/pf/pf.c
5050

I think that ought to work.

There's a log entry that will potentially fire in that case, but we can either fix that or live with it, given that collisions ought to be rare anyway.

It's probably even the best approach, because trying to find a free port here and then running pf_state_insert() is potentially racy.

I've spent some time looking into the underlying problem, and I'm not sure that this approach (quietly rewriting the src port) is correct. The pf.conf manual page says:

In addition to modifying the address, some translation rules may modify
source or destination ports for tcp(4) or udp(4) connections; implicitly
in the case of nat rules and explicitly in the case of rdr rules.  Port
numbers are never translated with a binat rule.

but with this change we are implicitly rewriting the source port. What explicit mechanism is the man page alluding to here?

As an aside, I also found that if a nat rule matches an outbound packet, but we fail to allocate a src port for it (e.g., because static-port is configured and a collision occurs, or pf_get_sport() fails to find a free port), then pf will pass the packet out without performing any translation. That is, we end up passing out unmodified packets from the internal network. This happens in the "NAT proxy port allocation failed" path in pf_get_translation(); I'd really expect pf to drop the packet instead. Is this a bug?

but with this change we are implicitly rewriting the source port. What explicit mechanism is the man page alluding to here?

That'd be things like rdr on $ext_if proto tcp from any to any port 80 -> 127.0.0.1 port 8080, where we redirect incoming connections on port 80 to localhost port 8080.
When we're doing explicit rewrites it's always going to be the destination port, of course.

As an aside, I also found that if a nat rule matches an outbound packet, but we fail to allocate a src port for it (e.g., because static-port is configured and a collision occurs, or pf_get_sport() fails to find a free port), then pf will pass the packet out without performing any translation. That is, we end up passing out unmodified packets from the internal network. This happens in the "NAT proxy port allocation failed" path in pf_get_translation(); I'd really expect pf to drop the packet instead. Is this a bug?

Yeah, I'd say that's a bug. I suspect it mostly stems from confusion between 'there is no translation required' and 'something went wrong', because both return NULL.

I've spent some time looking into the underlying problem, and I'm not sure that this approach (quietly rewriting the src port) is correct. The pf.conf manual page says:

In addition to modifying the address, some translation rules may modify
source or destination ports for tcp(4) or udp(4) connections; implicitly
in the case of nat rules and explicitly in the case of rdr rules.  Port
numbers are never translated with a binat rule.

but with this change we are implicitly rewriting the source port. What explicit mechanism is the man page alluding to here?

There may be a subtlety here, the static-port flag is mostly talking about 'NAT' rules, and the purpose of the static-port flag is to keep the source-port used by internal machines (on the inside) when translated to the public IP (outside).
In the case this patch is addressing, we're modulating the source-port of an inbound connection, and we are only changing it 'inside' our network, because the only other choice is to drop the packet.

In D44488#1031217, @kp wrote:

but with this change we are implicitly rewriting the source port. What explicit mechanism is the man page alluding to here?

That'd be things like rdr on $ext_if proto tcp from any to any port 80 -> 127.0.0.1 port 8080, where we redirect incoming connections on port 80 to localhost port 8080.
When we're doing explicit rewrites it's always going to be the destination port, of course.

Ah, I had read that as meaning that it was possible for rdr rules to rewrite source ports, but upon a second look, it seems not.

There may be a subtlety here, the static-port flag is mostly talking about 'NAT' rules, and the purpose of the static-port flag is to keep the source-port used by internal machines (on the inside) when translated to the public IP (outside).
In the case this patch is addressing, we're modulating the source-port of an inbound connection, and we are only changing it 'inside' our network, because the only other choice is to drop the packet.

Right, I guess what I'm getting at is that the rdr rule syntax probably ought to provide some control over how source port rewriting is done. For instance, one might want to provide a range of source ports to use, or to disable rewriting entirely (ala static-port for nat rules).