For every state pf creates up to two source nodes: a limiting one struct pf_kstate -> src_node and a NAT one struct pf_kstate -> nat_src_node. The limiting source node is tracking information needed for limits using max-src-states and max-src-nodes and the NAT source node is tracking redirection decisions.
On closer inspection two issues emerge:
- For route-to rules the redirection decision is stored in the limiting source node. Thus sticky-address and source limiting can't be used separately.
- Global source tracking, as promised in the man page, is totally absent from the code. Pfctl is capable of setting flags PFRULE_SRCTRACK (enable source tracking) and PFRULE_RULESRCTRACK (make source tracking per rule). The kernel code checks PFRULE_SRCTRACK but ignores PFRULE_RULESRCTRACK. That would make all source tracking always global but the code is written in a way that makes source tracking work per-rule only.
This patch is based on OpenBSD approach where source nodes have a type and each state has a list of source nodes instead of just two pointers, because in OpenBSD it is possible to combine more source nodes per rule than just two e.g. limit, nat and route-to. As I'm working on porting full OpenBSD syntax, this change will be needed anyway.
The approach to make global tracking work is to attach global source nodes to the default rule. This functionality seems broken in OpenBSD too. I've checked their code history, there was a global flag to pf_insert_src_node() at some point but it was removed.
It looks like by pure chance this patch fixes stateless route-to rules. I was unable to make … route-to … no state rule to forward traffic on FreeBSD 13.1 while with this patches it is possible. However this triggers another error but I'm unsure how to fix it yet, so I'm uploading this patch as it is marked as "draft" because I could use some comments. The issue is that source nodes are always created with sn->states=1 and each time a source node is searched for, the counter is increased, as it is expected that only stateful tracking would use source nodes. This leads to some problems. The first, obvious one, is that for stateless tracking each forwarded packet will cause the counter to be increased. The second one is that if with source tracking state insertion fails (either due to memory issues or state limits) the counters has to now be decreased again, which seems more of a dirty hack than a proper solution.
Unfortunately this issue can't easily be addressed. Since source nodes are created with no expiry timeout, the only thing which prevents them from being removed immediately after creation by pf_purge_expired_src_nodes() is the linked state counter. Thus the counter must start at 1.
An alternative would be to create source nodes with expiry in future. We could handle the counter in a sane way then. But the default source node lifetime (PFTM_SRC_NODE_VAL) is 0. We would need to change that too, and that would change the default behaviour quite much: a failed state insertion might result in the freshly created source nodes being immediately expirable (with lifetime 0 and if the node was created by this state). Ideas and suggestions are welcome.
The patch also does a few improvements regarding locking:
- Each source node carries a pointer to its hash row mutex, so that we don't need to calculate the hash so many times.
- Mutex assertion test in a few places have been added.
- pf_create_state() always operates on locked source nodes.
- Instead of creating empty source nodes and then searching for them again in pf_map_addr(), source nodes are always created only once and for binding them with the state a simpler search is performed, since the hash row and the pointer to the source node are known. pf_map_addr() now holds the lock over the source node. Maybe if we employed a different way of source node purging this 2nd search would not be needed at all? Something similar to what is done for rules. Expired rules are not freed immediately but unlinked from the active ruleset and only freed once all references are gone (sure, source nodes can be unlinked too, but they are then not stored anywhere and must be deallocated immediately). If a similar thing was done for source nodes then when a source node was removed (expired or killed via DIOCCLEARSRCNODES) during state creation we could maybe still happily link to it as it would not have been yet deallocated.