Page MenuHomeFreeBSD

pf: Fix source node locking
ClosedPublic

Authored by vegeta_tuxpowered.net on Nov 26 2024, 6:34 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jan 5, 2:00 PM
Unknown Object (File)
Sun, Jan 5, 9:17 AM
Unknown Object (File)
Sat, Jan 4, 8:22 AM
Unknown Object (File)
Sun, Dec 29, 1:52 PM
Unknown Object (File)
Sat, Dec 28, 2:48 AM
Unknown Object (File)
Fri, Dec 27, 10:40 AM
Unknown Object (File)
Fri, Dec 27, 6:25 AM
Unknown Object (File)
Thu, Dec 19, 2:51 AM

Details

Summary

Source nodes are created quite early in pf_create_state(), even before
the state is allocated, locked and inserted into its hash row. They are
prone to being freed by source node killing or clearing ioctl while
pf_create_state() is still running.

The function pf_map_addr_sn() can be called in two very different paths.

One is for filter rules where it is called from
pf_create_state() after pf_insert_src_node(). In this case it is called
with a given source node and does not perform its own search and must
return the source node.

The other one is for NAT rules where it is called from
pf_get_translation() or its descendants. In this case it is called with
no known source node and performs its own search for source nodes. This
source node is then passed back to pf_create_state() without locking.

The states property of source node is increased in pf_find_src_node()
which allows for the counter to increase when a packet matches the NAT
rule but not a pass keep state rule.

The function pf_map_addr() operates on unlocked source node.

Modify pf_find_src_node() to return locked on source node found, so
that any subsequent operations can access the source node safely.

Move sn->states++ counter increase to pf_insert_src_node() to ensure
that it's called only from pf_create_state() and not from NAT ruleset
path, and have it increased only if the source node has really been
inserted or found, simplifying the cleanup.

Add locking in pf_src_connlimit() and pf_map_addr(). Sprinkle mutex
assertions in pf_map_addr().

Add a function pf_find_src_node_ptr() to check a known source node is
still valid. Use it in pf_create_state() where it's impossible to hold
locks from pf_insert_src_node() because that would cause LoR (nodes
first, then state) against pf_src_connlimit() (state first, then node).

Don't propagate the source node found while parsing the NAT ruleset to
pf_create_state() because it must be found again and locked or created.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

This patch only aims to fix issues with locking, not the other issues found in D39880 (which will require some modifications, as while writing this one, I found some issues with the 2 different paths reaching pf_map_addr_sn(), which D39880 does not take into account). I suppose it could be MFC'd into the currently maintained releases of FreeBSD after more testing.

sys/netpfil/pf/pf_lb.c
259

This part I'm very much unsure of. There no tests for a combination of sticky-address and endpoint-independent NAT rules. I don't understand what this code is supposed to do. Just finding a SN does nothing really, especially that there's a return(0) just a moment later.

kp added inline comments.
sys/netpfil/pf/pf_lb.c
259

That's probably a question for @thj.

sys/netpfil/pf/pf.c
819

It might be worth converting this to a bool.
And perhaps renaming it to 'overload' or 'limited' or something?

867

return (bad);

989

'find_src_node_ptr' is an odd name for a function that expects to be given the source node when it's called.

Maybe I'm not following the logic fully, are we confirming that the source node still exists, just counting the lookup, or something else?

5490

Should we not check the return value here, or do we not care either way?
Also, why specifically here, given that we call pf_find_src_node_ptr() again later on line 5557?

sys/netpfil/pf/pf_lb.c
655

It's probably worth defining an PF_SN_LOCKED() or similar for this. See https://cgit.freebsd.org/src/tree/sys/net/pfvar.h#n359

That'd let us get rid of the #ifdef INVARIANTS

sys/netpfil/pf/pf.c
989

are we confirming that the source node still exists, just counting the lookup, or something else?

Yes, the function only checks if the source node given by pointers to the node and to its hash row still exists and if yes, then it locks it, so that the caller can operate on it. I could rename it to pf_src_node_exists() if that'd make it more clear.

5490

Should we not check the return value here, or do we not care either way?

No, as pf_find_src_node_ptr() sets (*sn) to NULL if the SN is gone.

Also, why specifically here,

That's for pf_map_addr_sn() in path for filter rules. But I see your point, this find+lock should probably be moved to the inside of pf_map_addr_sn().

given that we call pf_find_src_node_ptr() again later on line 5557?

The function pf_map_add_sn() returns unlocked. We can't hold the locks throughout pf_create_state() because lock order must be identical as in when forwarding packets for already existing states when pf_src_connlimit() is called (first state, then source node). Line 5557 is after pf_state_insert() is called locks the state (pf_state_insert() -> pf_state_key_attach() { … PF_STATE_LOCK(s) … }). All source node locks are thus released, state is locked, source nodes are tested if they exist and re-locked. Maybe I could move this block below, after state locking and source node re-locking.

I have some ideas how to address this and reduce the amount of re-locking but I wanted to keep this patch simple, only adding locking where needed without changing the underlying order of operations, not too much at least. As far as I understand Phabricator there is no way to upload multiple commits to a single review and comment on them, so I will upload the further proposed changes as another review. This change here, though, is enough to fix the bugs, even if not in the most optimal manner.

sys/netpfil/pf/pf_lb.c
259

As far as I understand the UDP mapping contains both the port number and the source IP address. So it kind of overrides the normal source tracking. It seems to me that endpoint-independent and sticky-address are incompatible with each other. We could add a check preventing rules with both flags from being parsed.

655

There is PF_SRC_NODE_LOCK_ASSERT(sn) but it performs its own hashing to obtain the hash row using pf_hashsrc() which is static in pf.c. I could make it non-static but I'm not 100% sure it would be properly inlined. My linker knowledge is not deep enough, I could check the results, though.

Okay, mostly cosmetic issues. I'd like to give Tom a chance to comment on the udp endpoint thing too, but we can fix the open remarks while we wait for that.

sys/netpfil/pf/pf.c
989

Yeah, I think that'd help. pf_src_node_validate() or pf_src_node_exists() or something like that. Maybe even pf_src_node_check_lock(), given that one of the points here is to end up with a locked source node.

5490

Okay, I think I follow.

As far as I understand Phabricator there is no way to upload multiple commits to a single review and comment on them

Not in a single review, no. It's possible to link reviews together when dealing with a series of patches, but it's clunky and annoying on both ends.

sys/netpfil/pf/pf_lb.c
259

Yeah, that's probably the way to go.
That can be a separate patch, of course.

259

If we're making pf_hashsrc() callable from other files we can use PF_SRC_NODE_UNLOCK() instead of mtx_unlock((*sn)->lock); too.

I prefer those because they make the lock/unlock calls easier to see and also make it easier to add more invariants to locking calls, and make it easier to experiment with different lock types too.

655

It'll end up not getting in-lined, but that's fine. It'll only be called in the INVARIANTS case, and then we're taking far larger performance hits than just a function call.

We could also change the PF_SRC_NODE_LOCK_ASSERT() to not do the hash row lookup, because we already know that sn->lock == sh->lock, but I'd like to keep that assertion too.

Make pf_hashsrc() available from everywhere. Use PF_SRC_NODE… macros now that pf_hashsrc() is accessible. The OB1 error in pf_insert_src_node() is gone, adjust the tests accordingly. Simplify pf_src_connlimit() logic,

Change limited to bool. Fix one missing PF_SRC_NODE… macro.

Approved.

sys/netpfil/pf/pf.c
841–842

nit: if (! limited) ?

style(9) has an explicit carveout for this for booleans: Do not test without a comparison, or with unary ! (except for booleans).

982

nit: if(n == NULL && ! returnlocked)?

This revision is now accepted and ready to land.Nov 28 2024, 2:15 PM
vegeta_tuxpowered.net marked an inline comment as done.

Use style(9)-compliant boolean tests

This revision now requires review to proceed.Nov 28 2024, 3:44 PM

Still approved :)

I meant to convey that I was happy for you to make these minor changes and commit immediately without further review.

This revision is now accepted and ready to land.Nov 28 2024, 3:48 PM
This revision was automatically updated to reflect the committed changes.