Details

Reviewers

Commits

rGb9c0321d54e9: pf: Fix source node locking

Summary

Source nodes are created quite early in pf_create_state(), even before
the state is allocated, locked and inserted into its hash row. They are
prone to being freed by source node killing or clearing ioctl while
pf_create_state() is still running.

The function pf_map_addr_sn() can be called in two very different paths.

One is for filter rules where it is called from
pf_create_state() after pf_insert_src_node(). In this case it is called
with a given source node and does not perform its own search and must
return the source node.

The other one is for NAT rules where it is called from
pf_get_translation() or its descendants. In this case it is called with
no known source node and performs its own search for source nodes. This
source node is then passed back to pf_create_state() without locking.

The states property of source node is increased in pf_find_src_node()
which allows for the counter to increase when a packet matches the NAT
rule but not a pass keep state rule.

The function pf_map_addr() operates on unlocked source node.

Modify pf_find_src_node() to return locked on source node found, so
that any subsequent operations can access the source node safely.

Move sn->states++ counter increase to pf_insert_src_node() to ensure
that it's called only from pf_create_state() and not from NAT ruleset
path, and have it increased only if the source node has really been
inserted or found, simplifying the cleanup.

Add locking in pf_src_connlimit() and pf_map_addr(). Sprinkle mutex
assertions in pf_map_addr().

Add a function pf_find_src_node_ptr() to check a known source node is
still valid. Use it in pf_create_state() where it's impossible to hold
locks from pf_insert_src_node() because that would cause LoR (nodes
first, then state) against pf_src_connlimit() (state first, then node).

Don't propagate the source node found while parsing the NAT ruleset to
pf_create_state() because it must be found again and locked or created.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

vegeta_tuxpowered.net created this revision.Nov 26 2024, 6:34 PM

Herald added subscribers: glebius, melifaro, farrokhi and 2 others. · View Herald TranscriptNov 26 2024, 6:34 PM

vegeta_tuxpowered.net requested review of this revision.Nov 26 2024, 6:34 PM

This patch only aims to fix issues with locking, not the other issues found in D39880 (which will require some modifications, as while writing this one, I found some issues with the 2 different paths reaching pf_map_addr_sn(), which D39880 does not take into account). I suppose it could be MFC'd into the currently maintained releases of FreeBSD after more testing.

sys/netpfil/pf/pf_lb.c
259	This part I'm very much unsure of. There no tests for a combination of `sticky-address` and `endpoint-independent` NAT rules. I don't understand what this code is supposed to do. Just finding a SN does nothing really, especially that there's a `return(0)` just a moment later.

kp added a subscriber: thj.Nov 26 2024, 7:10 PM

kp added inline comments.

sys/netpfil/pf/pf_lb.c
259	That's probably a question for @thj.

kp added inline comments.Nov 27 2024, 10:55 AM

sys/netpfil/pf/pf.c
819	It might be worth converting this to a bool. And perhaps renaming it to 'overload' or 'limited' or something?
867	return (bad);
989	'find_src_node_ptr' is an odd name for a function that expects to be given the source node when it's called. Maybe I'm not following the logic fully, are we confirming that the source node still exists, just counting the lookup, or something else?
5490	Should we not check the return value here, or do we not care either way? Also, why specifically here, given that we call pf_find_src_node_ptr() again later on line 5557?
sys/netpfil/pf/pf_lb.c
655	It's probably worth defining an PF_SN_LOCKED() or similar for this. See https://cgit.freebsd.org/src/tree/sys/net/pfvar.h#n359 That'd let us get rid of the `#ifdef INVARIANTS`

vegeta_tuxpowered.net added inline comments.Nov 27 2024, 12:23 PM

sys/netpfil/pf/pf.c
989	are we confirming that the source node still exists, just counting the lookup, or something else? Yes, the function only checks if the source node given by pointers to the node and to its hash row still exists and if yes, then it locks it, so that the caller can operate on it. I could rename it to `pf_src_node_exists()` if that'd make it more clear.
5490	Should we not check the return value here, or do we not care either way? No, as `pf_find_src_node_ptr()` sets `(*sn)` to `NULL` if the SN is gone. Also, why specifically here, That's for `pf_map_addr_sn()` in path for filter rules. But I see your point, this find+lock should probably be moved to the inside of `pf_map_addr_sn()`. given that we call pf_find_src_node_ptr() again later on line 5557? The function `pf_map_add_sn()` returns unlocked. We can't hold the locks throughout `pf_create_state()` because lock order must be identical as in when forwarding packets for already existing states when `pf_src_connlimit()` is called (first state, then source node). Line 5557 is after `pf_state_insert()` is called locks the state (`pf_state_insert()` -> `pf_state_key_attach() { … PF_STATE_LOCK(s) … }`). All source node locks are thus released, state is locked, source nodes are tested if they exist and re-locked. Maybe I could move this block below, after state locking and source node re-locking. I have some ideas how to address this and reduce the amount of re-locking but I wanted to keep this patch simple, only adding locking where needed without changing the underlying order of operations, not too much at least. As far as I understand Phabricator there is no way to upload multiple commits to a single review and comment on them, so I will upload the further proposed changes as another review. This change here, though, is enough to fix the bugs, even if not in the most optimal manner.
sys/netpfil/pf/pf_lb.c
259	As far as I understand the UDP mapping contains both the port number and the source IP address. So it kind of overrides the normal source tracking. It seems to me that `endpoint-independent` and `sticky-address` are incompatible with each other. We could add a check preventing rules with both flags from being parsed.
655	There is `PF_SRC_NODE_LOCK_ASSERT(sn)` but it performs its own hashing to obtain the hash row using `pf_hashsrc()` which is `static` in `pf.c`. I could make it non-`static` but I'm not 100% sure it would be properly inlined. My linker knowledge is not deep enough, I could check the results, though.

Okay, mostly cosmetic issues. I'd like to give Tom a chance to comment on the udp endpoint thing too, but we can fix the open remarks while we wait for that.

sys/netpfil/pf/pf.c
989	Yeah, I think that'd help. `pf_src_node_validate()` or `pf_src_node_exists()` or something like that. Maybe even `pf_src_node_check_lock()`, given that one of the points here is to end up with a locked source node.
5490	Okay, I think I follow. As far as I understand Phabricator there is no way to upload multiple commits to a single review and comment on them Not in a single review, no. It's possible to link reviews together when dealing with a series of patches, but it's clunky and annoying on both ends.
sys/netpfil/pf/pf_lb.c
259	Yeah, that's probably the way to go. That can be a separate patch, of course.
259	If we're making pf_hashsrc() callable from other files we can use PF_SRC_NODE_UNLOCK() instead of `mtx_unlock((*sn)->lock);` too. I prefer those because they make the lock/unlock calls easier to see and also make it easier to add more invariants to locking calls, and make it easier to experiment with different lock types too.
655	It'll end up not getting in-lined, but that's fine. It'll only be called in the INVARIANTS case, and then we're taking far larger performance hits than just a function call. We could also change the PF_SRC_NODE_LOCK_ASSERT() to not do the hash row lookup, because we already know that sn->lock == sh->lock, but I'd like to keep that assertion too.

Make pf_hashsrc() available from everywhere. Use PF_SRC_NODE… macros now that pf_hashsrc() is accessible. The OB1 error in pf_insert_src_node() is gone, adjust the tests accordingly. Simplify pf_src_connlimit() logic,

vegeta_tuxpowered.net marked 4 inline comments as done.Nov 28 2024, 11:45 AM

Change limited to bool. Fix one missing PF_SRC_NODE… macro.

Approved.

sys/netpfil/pf/pf.c
841–842	nit: `if (! limited)` ? style(9) has an explicit carveout for this for booleans: `Do not test without a comparison, or with unary ! (except for booleans).`
982	nit: `if(n == NULL && ! returnlocked)`?

This revision is now accepted and ready to land.Nov 28 2024, 2:15 PM

Use style(9)-compliant boolean tests

This revision now requires review to proceed.Nov 28 2024, 3:44 PM

Still approved :)

I meant to convey that I was happy for you to make these minor changes and commit immediately without further review.

This revision is now accepted and ready to land.Nov 28 2024, 3:48 PM

kp mentioned this in D47827: Draft: pf: Move route-to information to pf_rule_actions.Nov 28 2024, 4:24 PM

Closed by commit rGb9c0321d54e9: pf: Fix source node locking (authored by vegeta_tuxpowered.net). · Explain WhyNov 28 2024, 5:34 PM

This revision was automatically updated to reflect the committed changes.

vegeta_tuxpowered.net added a commit: rGb9c0321d54e9: pf: Fix source node locking.

pf: Fix source node locking
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 147129

sys/net/pfvar.h

sys/netpfil/pf/pf.c

sys/netpfil/pf/pf_lb.c

tests/sys/netpfil/pf/src_track.sh

pf: Fix source node lockingClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 147129

sys/net/pfvar.h

sys/netpfil/pf/pf.c

sys/netpfil/pf/pf_lb.c

tests/sys/netpfil/pf/src_track.sh

pf: Fix source node locking
ClosedPublic
Actions

Revision Contents
Changeset List