Page MenuHomeFreeBSD

pfsync: Transport over IPv6 Unicast support
ClosedPublic

Authored by email_luiz.eng.br on May 15 2023, 6:22 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Apr 17, 6:14 PM
Unknown Object (File)
Wed, Apr 17, 1:00 PM
Unknown Object (File)
Wed, Apr 17, 12:56 PM
Unknown Object (File)
Wed, Apr 17, 11:20 AM
Unknown Object (File)
Wed, Apr 17, 1:27 AM
Unknown Object (File)
Wed, Apr 10, 1:58 PM
Unknown Object (File)
Mar 12 2024, 4:57 AM
Unknown Object (File)
Mar 9 2024, 7:40 PM

Details

Reviewers
kp
Group Reviewers
network
Commits
rG6fc7fc2dbb2b: pfsync: transport over IPv6
Summary

This is the code that implements the transport of IPv6 packets over unicast IPv6.

I left a few XXX comments on places where I was unsure on how to handle things. Two things are still missing: updating the documentation and extracting the common part of pfsync(6)_input into a separate function to remove code duplication.

Sponsored by: InnoGames GmbH

Test Plan

Setup two hosts with IPv6 configured and configure pfsync between them. Make sure to set a peer address by means of a "ifconfig pfsync0 syncpeer <ipv6>".

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Thanks for submitting it! That's quite a nice addition. Any thoughts on supporting IPv6 link-locals? (Maybe they work out-of-the box, but often it's not the case).

sys/netpfil/pf/if_pfsync.c
2781

I'd use interface fib here instead of RT_DEFAULT_FIB.

Thanks for submitting it! That's quite a nice addition. Any thoughts on supporting IPv6 link-locals? (Maybe they work out-of-the box, but often it's not the case).

It didn't cross my mind, as on our use-case we have GUA addresses on the sync interfaces. I did a quick test and it didn't work out of the box. I think I can make it work.

Added support for IPv6 Link-local addresses
Use interface FIB when looking up IPv6 source address

  • add pfsync ipv6 unicast support (luiz changes)
  • add pfsync ipv6 multicast support
  • enable usage of all multicast addresses as peersync
sys/netinet6/in6.h
209 ↗(On Diff #122650)

I'd rather put it to the pfsync-specific header.

sys/netpfil/pf/if_pfsync.c
2723

Nit: I'd suggest declaring &(struct sockaddr_in6 *)&sc->sc_sync_peer as a separate value to improve readability

I'll have more substantive feedback soon, but there are a number of (mostly minor) style(9) issues, like incorrect indentation and line length.

It also needs to be rebased, because 4bf98559d9d6fa7c3571d26ed6f2b18823e3a30b causes a few conflicts.

sbin/ifconfig/ifpfsync.c
237

That's supposed to be indented to the same tab level as 'struct' plus four spaces.

387

There's unexpected extra indentation here.

sys/netpfil/pf/if_pfsync.c
2851

We've got some line length issues here (and in other places).

It may be a result of me mismerging something, but both of the new test cases fail.

There also seems to be an issue with the locking:

uma_zalloc_debug: zone "malloc-64" with the following non-sleepable locks held:
exclusive sleep mutex pfsync (pfsync) r = 0 (0xfffff80493687f10) locked @ /usr/src/sys/netpfil/pf/if_pfsync.c:2853
stack backtrace:
#0 0xffffffff80bc4945 at witness_debugger+0x65
#1 0xffffffff80bc5a99 at witness_warn+0x3f9
#2 0xffffffff80edc0d4 at uma_zalloc_debug+0x34
#3 0xffffffff80edbbe8 at uma_zalloc_arg+0x28
#4 0xffffffff80b25cff at malloc+0x7f
#5 0xffffffff80d1ca3f at ip_mfilter_alloc+0x1f
#6 0xffffffff82fa8dcb at pfsync_kstatus_to_softc+0x5bb
#7 0xffffffff82fa8109 at pfsyncioctl+0x7a9
#8 0xffffffff80c8260e at ifioctl+0x96e
#9 0xffffffff80bca1ce at kern_ioctl+0x1fe
#10 0xffffffff80bc9f64 at sys_ioctl+0x154
#11 0xffffffff8104d8e0 at amd64_syscall+0x140
#12 0xffffffff8102076b at fast_syscall_common+0xf8

Brought the branch up to date and cleaned up some of the styling. All the tests should pass again now, and the malloc-while-non-sleepable-lock should be gone. There is a "lock order reversal" warning that shows up because we have to call in_joingroup or in6_joingroup while holding a non-sleepable lock, and those functions themselves hold a sleepable lock to work. This regression does not seem to be introduced by this patch though, since I get the same warnings from carp, which I did not change.

  • address reviewer feedback
  • Merge branch 'main' into arcpatch-D40102
  • unlocking sc in the middle of multicast setup/cleanup is not a good idea

The new IPv6 tests still fail for me:

(kp@freebsd_current_zfs)  /usr/tests/sys/netpfil % sudo kyua debug pf/pfsync:basic_ipv6                                                                                             [8:52]
one: removed
two: removed
pf enabled
Ethernet rules cleared
rules cleared
nat cleared
0 tables deleted.
0 states cleared
source tracking entries cleared
pf: statistics cleared
pf: interface flags reset
pf enabled
Ethernet rules cleared
rules cleared
nat cleared
0 tables deleted.
0 states cleared
source tracking entries cleared
pf: statistics cleared
pf: interface flags reset
ping6: bind: Can't assign requested address
Files left in work directory after failure: created_interfaces.lst, created_jails.lst
ifconfig: interface epair0b does not exist
ifconfig: interface epair1b does not exist
ifconfig: interface epair2b does not exist
pf/pfsync:basic_ipv6  ->  failed: state not found on synced host

ping6: bind: Can't assign requested address is probably the important clue there, but I've not done any debugging yet.

Honestly, I'm not quite sure what the problem is on your end -- I am unable to replicate this failure. The only working theory I have for this is that fc2b::f0 is somehow not being assigned to ${epair_one}b as it should be. Unfortunately it seems that if ifconfig is not able to bring an interface up with the specified IP address, it will still bring it up without that IP address, and give no indication of failure on stderr or the exit code. So it's not possible for me to check _why_ your machine is unable to assign that IP address to the epair, other than perhaps it's already in use?

Regardless, I realized while researching this that I'm using the wrong prefix for IPv6 ULAs -- fc00/8 is unassigned, fd00/8 is what I should have used. I'm uploading a patch for that -- hopefully that might resolve any IP clashes on your end for now, though I think it's worth investigating why ifconfig is not able to give you that IP.

  • use the correct IPv6 ULA prefix

Sorry about all the pings! I diffed from an incorrect main branch and pulled in a whole bunch of "changes" that weren't actually there. This patch should fix that.

I've found at least part of the reason why the tests are broken, but they're still failing here.
Logging the states on both seems to suggest that there is no sync taking place. ifconfig in the jails also suggests that:

pfsync0: flags=1000041<UP,RUNNING,LOWER_UP> metric 0 mtu 1500
	options=0
	syncdev: epair0b syncpeer: ff12::f0%epair0b maxupd: 1 defer: off version: 1400
	syncok: 0
	groups: pfsync
tests/sys/netpfil/pf/pfsync.sh
724

no_dad

749

That needs 'no_dad', otherwise the address is 'tentative' until the duplicate address detection completes, which it won't before we try (and fail) to run the ping here.

811

no_dad

netstat -s -p pfsync in the jails shows 7 send errors, so presumably that's related to the problem.

dtrace -n 'fbt::ip6_output:return { printf("%#x %#x", arg0, arg1); }' shows (among others):
0 57184 ip6_output:return 0x40f 0xd. Error 0xd is EACCESS. Hopefully that clue is enough for you to debug this.

Sorry for the delay - disabled DAD and updated branch to newest main again. I'm still not able to replicate the failure, so it's not clear to me whether there is more to solve after disabling DAD. The tools you showed are helpful but I can't use them unless I can replicate the failure.

  • Merge branch 'main' into arcpatch-D40102
  • disable ipv6 dad

The tests still fails:

pfsync:basic_ipv6  ->  failed: state not found on synced host
pfsync:basic_ipv6_unicast  ->  failed: state not found on synced host

Or at least, it does when ipfw is loaded (as will be the case in the CI system), even though net.inet.ip.fw.default_to_accept is set to 1. That means there's either a bug in ipfw, or there's something wrong with the IPv6 packets pfsync generates.

Okay, so it's not quite a bug in ipfw, but it is annoying unexpected behaviour.
We're hitting the "Unknown Extension header" case in ipfw_chk(), and apparently V_fw_deny_unknown_exthdrs (net.inet6.ip6.fw.deny_unknown_exthdrs) defaults to 1.

So either we tweak that sysctl in CI, or we teach ipfw that pfsync is a thing that exists:

@@ -1718,13 +1727,19 @@ do {                                                            \
                                PULLUP_TO(hlen, ulp, struct ip);
                                break;

+                       case IPPROTO_PFSYNC:
+                               PULLUP_TO(hlen, ulp, struct pfsync_header);
+                               break;
+
                        default:

Full patch in D40973

Ah, I didn't know that CI ran ipfw - thanks for pointing that out. Was able to replicate the failure finally, and test that D40973 fixes it. Since that's merged already, I just updated the base revision here.

  • Merge branch 'main' into arcpatch-D40102
This revision was not accepted when it landed; it landed in state Needs Review.Jul 13 2023, 9:03 AM
Closed by commit rG6fc7fc2dbb2b: pfsync: transport over IPv6 (authored by email_luiz.eng.br, committed by kp). · Explain Why
This revision was automatically updated to reflect the committed changes.