Page MenuHomeFreeBSD

Fix pf divert-to loop
ClosedPublic

Authored by igor.ostapenko_pm.me on Oct 9 2023, 10:28 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Jun 24, 6:16 PM
Unknown Object (File)
May 1 2024, 7:16 AM
Unknown Object (File)
May 1 2024, 7:16 AM
Unknown Object (File)
May 1 2024, 7:16 AM
Unknown Object (File)
Apr 25 2024, 5:00 AM
Unknown Object (File)
Apr 25 2024, 4:55 AM
Unknown Object (File)
Apr 25 2024, 4:45 AM
Unknown Object (File)
Feb 8 2024, 10:29 PM

Details

Summary

I've been working on https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272770 ("divert-to" rule creates packet loops on all FreeBSD 11.0 to 14.0 CURRENT versions).

I've done some research and my current context and conclusions are as follows.

The history I've collected:

  • Originally "divert-to" in OpenBSD was not about divert(4)
  • divert(4) support was added to OpenBSD as "divert-packet <divport>" syntax in 2009
  • there is no "divert-packet" syntax in FreeBSD
  • divert(4) support in FreeBSD was originally built around ipfw with rule numbering concept in mind
  • "divert-to <addr> port <port>" syntax in FreeBSD can be used for divert(4) since 9.x

/* 'divapp' stands for an app who reads from a divert socket and probably writes back to */
/* 'divsrc' and 'divdst' are source or destination sockaddr_in within divert(4) socket connection */

Considering a rule like "pass in proto udp from any to any port 53 divert-to 127.0.0.1 port 3355" the sequence of events is as follows:

  • divapp gets a packet via recvfrom() with divsrc port set to the initial packet direction (PF_IN or PF_OUT, i.e. literally 1 or 2)
  • divapp writes back via sendto() with the same divsrc port as divdst port (it's expected to be opaque value for a divapp)
  • upon write divert(4)'s logic adds ipfw_rule_ref mtag with rulenum = <divdst port> + 1 (what computes to PF_IN+1=2 or PF_OUT+1=3)
  • pf_test() logic marks a diverted packet as PF_MTAG_FLAG_PACKET_LOOPED only if rulenum is 0, but actually it could be 2 or 3 only
  • as a result, the packet written back by divapp is not marked as looped and if its content is not modified and it matches the same pf rule then it is diverted again -- and this creates the infinite loop of the same packet while divapp is running and the pf rule is active

The ipfw_rule_ref.rulenum has no meaning for pf, and it looks like this "&& rulenum == 0" condition was always there. So far, I have not found a point in the src history where this condition has some meaning in correspondence to divert(4) logic. I saw oldschool divert(4) with mtag.cookie and the current one with mtag.rulenum = port+1, and both do not yield rulenum=0.

A question comes why it was not spotted for many years if it's an obvious defect. Probably it's due to divert(4) historically was used via ipfw and if pf started to be used for divert(4) then probably packets are usually altered and do not match the same rule to be re-diverted. But now it seems we have pf+divert(4) users which expect ipfw like behaviour of re-injection of unalterted packet which does not repeat diversion.

OpenBSD's "divert-packet" logic is very simple, their divert(4)'s sendto() marks a packet with PF_TAG_DIVERTED_PACKET flag and pf simply does PF_PASS if such flag is set, without any rule consideration. Exactly as their man 4 divert says. I guess we could leave this logic for future as-is implementation of the same "divert-packet" syntax in FreeBSD pf, if it's wanted. And for now FreeBSD pf could go ahead with the current idea that a diverted packet from divapp goes through the ruleset (if no state is used, otherwise it gets PF_PASS'ed) but is not diverted again if it matches any divert-to rule.

This diff matches the change proposed by the author of the related PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260867. Let's credit.

I would like to work on respective "tests/sys/netpfil/pf" tests after it's agreed on the approach.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I think the rulenum started out in ipfw, where it means 'continue evaluating rules from this rule number'. Hence the increment, which caused it to skip the divert-to rule.
That's not really a thing in pf. Instead we'll want the 'divert-to' to translate to a 'pass quick', which we also accomplish with this patch.

We set write the direction 'dir' to the rulenum when we decide to divert, so I wonder if the intention is instead to ensure that we divert once in PF_IN, and potentially again in 'PF_OUT'. That might make sense for a setup where we're forwarding (and this will do PF_IN first, and then re-evaluate the rules for PF_OUT).
In that case we'd not want to remove the rr->rulenum == 0 check, but change it to rr->rulenum != dir (assuming that the rulenum gets incremented by divert(4) or userspace, I'm not very familiar with divert(4)).

It'd be really useful to have a test case or two for this (I'm thinking one where we divert locally terminated traffic, and one where we divert forwarded traffic) so we can confirm that we understand what's happening (and prevent future regressions).

In D42142#961485, @kp wrote:

I think the rulenum started out in ipfw, where it means 'continue evaluating rules from this rule number'. Hence the increment, which caused it to skip the divert-to rule.

Yes, ipfw is more flexible and allows to re-enter at the next rule number, e.g. this is how ipfw nat was done years ago, before the kernel nat. Prehistoric times...

We set write the direction 'dir' to the rulenum when we decide to divert, so I wonder if the intention is instead to ensure that we divert once in PF_IN, and potentially again in 'PF_OUT'. That might make sense for a setup where we're forwarding (and this will do PF_IN first, and then re-evaluate the rules for PF_OUT).
In that case we'd not want to remove the rr->rulenum == 0 check, but change it to rr->rulenum != dir (assuming that the rulenum gets incremented by divert(4) or userspace, I'm not very familiar with divert(4)).

Actually, I was thinking about the same. But it looked like additional complexity and I decided to check some existing production-ready solution, i.e. OpenBSD pf, where I found very straightforward solution, but without support of such forwarding setups. And I was thinking that probably such additional complexity is not needed, anyway FreeBSD can use ipfw for almost anything.

It'd be really useful to have a test case or two for this (I'm thinking one where we divert locally terminated traffic, and one where we divert forwarded traffic) so we can confirm that we understand what's happening (and prevent future regressions).

Yes, it looks like we could make FreeBSD pf's "divert-to" support such forwarding cases, as long as "divert-to" syntax meaning is already different between FreeBSD and OpenBSD (and the alignment could be done with addition of "divert-packet" syntax). Okay, I will work on respective tests for the (rr->rulenum-1) != dir variant.

Added the first draft of the respective tests over the agreed fix.

Two of the new tests seem to fail on my test VM:

divert-to:in_div -> passed [16.923s]
divert-to:in_div_in -> passed [7.643s]
divert-to:in_div_in_fwd_out_div_out -> failed: Test case body returned a non-ok exit code, but this is not allowed [10.114s]
divert-to:out_div -> passed [15.990s]
divert-to:out_div_out -> failed: atf-check failed; see the output of the test for details [17.350s]

sys/netpfil/pf/pf.c
8043

I'm not clear on why we need this.

(Also, we could probably just skip the PACKET_LOOPED check, because the effect would be the same.)

In D42142#962199, @kp wrote:

Two of the new tests seem to fail on my test VM:

divert-to:in_div -> passed [16.923s]
divert-to:in_div_in -> passed [7.643s]
divert-to:in_div_in_fwd_out_div_out -> failed: Test case body returned a non-ok exit code, but this is not allowed [10.114s]
divert-to:out_div -> passed [15.990s]
divert-to:out_div_out -> failed: atf-check failed; see the output of the test for details [17.350s]

Could you please provide the details, e.g. kyua report --verbose for those two failed ones.

Could you please provide the details, e.g. kyua report --verbose for those two failed ones.

===> Execution context
Current directory: /usr/tests/sys/netpfil/pf
Environment variables:
    HOME=/root
    LANG=C
    LC_CTYPE=en_US.UTF-8
    LC_PAPER=en_GB.UTF-8
    LOGNAME=root
    MAIL=/var/mail/root
    PATH=/home/kp/bin:/usr/local/sbin:/sbin:/usr/local/bin:/bin:/usr/bin:/usr/sbin:/home/kp/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/games:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin:/Library/TeX/texbin
    SHELL=/bin/sh
    SSH_AUTH_SOCK=/home/kp/.ssh/ssh-auth-sock.
    SUDO_COMMAND=/usr/bin/kyua test divert-to
    SUDO_GID=1001
    SUDO_UID=1001
    SUDO_USER=kp
    TERM=xterm-256color
    USER=root
===> divert-to:in_div_in_fwd_out_div_out
Result:     failed: atf-check failed; see the output of the test for details
Start time: 2023-10-12T14:15:28.053188Z
End time:   2023-10-12T14:15:45.784632Z
Duration:   17.731s

Metadata:
    allowed_architectures is empty
    allowed_platforms is empty
    description = Test inbound > diverted > inbound > forwarded > outbound > diverted > outbound | network terminated
    has_cleanup = true
    is_exclusive = false
    required_configs is empty
    required_disk_space = 0
    required_files is empty
    required_memory = 0
    required_programs is empty
    required_user = root
    timeout = 300

Standard output:
net.inet.ip.forwarding: 0 -> 1
add net default: gateway 198.51.100.1
add net 198.51.100.0: gateway 192.0.2.2 fib 0
Executing command [ ping -c3 192.0.2.2 ]
Executing command [ ping -c3 198.51.100.2 ]
Executing command [ ping -c1 198.51.100.2 ]
router: removed
site: removed

Standard error:
pf enabled
Ethernet rules cleared
rules cleared
nat cleared
0 tables deleted.
0 states cleared
source tracking entries cleared
pf: statistics cleared
pf: interface flags reset
divapp: 2002: npkt=10.
Fail: incorrect exit status: 2, expected: 0
stdout:
PING 198.51.100.2 (198.51.100.2): 56 data bytes

--- 198.51.100.2 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss

stderr:

Files left in work directory after failure: created_interfaces.lst, created_jails.lst
ifconfig: interface epair1b does not exist
ifconfig: interface epair0a does not exist
===> divert-to:out_div_out
Result:     failed: atf-check failed; see the output of the test for details
Start time: 2023-10-12T14:16:09.877662Z
End time:   2023-10-12T14:16:26.278785Z
Duration:   16.401s

Metadata:
    allowed_architectures is empty
    allowed_platforms is empty
    description = Test outbound > diverted > outbound | network terminated
    has_cleanup = true
    is_exclusive = false
    required_configs is empty
    required_disk_space = 0
    required_files is empty
    required_memory = 0
    required_programs is empty
    required_user = root
    timeout = 300

Standard output:
Executing command [ ping -c3 192.0.2.2 ]
Executing command [ ping -c1 192.0.2.2 ]
div: removed

Standard error:
pf enabled
Ethernet rules cleared
rules cleared
nat cleared
0 tables deleted.
0 states cleared
source tracking entries cleared
pf: statistics cleared
pf: interface flags reset
divapp: 2000: npkt=10.
Fail: incorrect exit status: 2, expected: 0
stdout:
PING 192.0.2.2 (192.0.2.2): 56 data bytes

--- 192.0.2.2 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss

stderr:

Files left in work directory after failure: created_interfaces.lst, created_jails.lst
ifconfig: interface epair0a does not exist
===> Failed tests
divert-to:in_div_in_fwd_out_div_out  ->  failed: atf-check failed; see the output of the test for details  [17.731s]
divert-to:out_div_out  ->  failed: atf-check failed; see the output of the test for details  [16.401s]
===> Summary
Results read from /root/.kyua/store/results.usr_tests_sys_netpfil_pf.20231012-141459-276665.db
Test cases: 5 total, 0 skipped, 0 expected failures, 0 broken, 2 failed
Start time: 2023-10-12T14:14:59.747769Z
End time:   2023-10-12T14:16:26.278785Z
Total time: 74.947s
In D42142#962202, @kp wrote:

===> Execution context
...

Thank you. Yes, it loops in those two outbound cases -- npkt=10 is reported by divapp.
Hm... I think I need to reproduce exactly your case. What is the commit the patch was applied to? And for which arch? // To be honest I have not tested amd64 yet, I guess I will do now.

In D42142#962202, @kp wrote:

===> Execution context
...

Thank you. Yes, it loops in those two outbound cases -- npkt=10 is reported by divapp.
Hm... I think I need to reproduce exactly your case. What is the commit the patch was applied to? And for which arch? // To be honest I have not tested amd64 yet, I guess I will do now.

That was on amd64, yes. I don't usually see platform-specific issues in pf though.
The version was this patch on top of:

commit 257405d707d77bc55b38e7c2bb83b8a9247a86ae (HEAD -> commit)
Author: Emmanuel Vadot <manu@FreeBSD.org>
Date:   Thu Oct 12 09:32:32 2023 +0200

    xilinx: reset: Remove debug printfs

    Sponsored by:   Beckhoff Automation GmbH & Co. KG
In D42142#962263, @kp wrote:

That was on amd64, yes. I don't usually see platform-specific issues in pf though.
The version was this patch on top of:

commit 257405d707d77bc55b38e7c2bb83b8a9247a86ae (HEAD -> commit)
Author: Emmanuel Vadot <manu@FreeBSD.org>
Date:   Thu Oct 12 09:32:32 2023 +0200 
    xilinx: reset: Remove debug printfs

I've tried the same -- all passed. Full build world+kernel. Interesting... Probably we could have a chat when you have time, e.g. in Slack, to align my test env with yours (kldstat, etc).

Short summary of the changes:

  • resolved conflict between ipfw and pf if both are used and pf wants to do divert(4)
  • pf tests are split onto two sets: with ipfw enabled and disabled

The reasoning:

  • divert(4) is designed around ipfw via MTAG_IPFW_RULE mtag usage, and it's like meant to be used by other firewalls too
  • but ipfw does not share MTAG_IPFW_RULE mtag with others in the chain -- it's deleted if present
  • it makes some divert(4) cases non-working for pf if ipfw is enabled, even if ipfw simply allows everything it consumes the mtag before pf has a chance to see it
  • three options were considered to resolve it:
    1. Make ipfw do not delete MTAG_IPFW_RULE mtag, so it can be spotted by pf
    2. Make divert(4) add some bits to mbuf pkthdr structure
    3. Make divert(4) add an extra pf specific mtag in addition to ipfw specific one
  • the option3 looks promising and is presented by this patch
This revision was not accepted when it landed; it landed in state Needs Review.Oct 19 2023, 12:37 PM
Closed by commit rGfabf705f4b5a: pf: fix pf divert-to loop (authored by igor.ostapenko_pm.me, committed by kp). · Explain Why
This revision was automatically updated to reflect the committed changes.

Thanks for your work on this.