pf circumvents the pfil chain on routing decisions while ipfw uses PACKET_TAG_IPFOWARD to let netinet/netinet6 code know about the change. In order for pf to do the same, we ipfw usage of the tag needs to be tightened by flipping M_FASTFWD_OURS and always looking for the tag in the first place.
I've only adhered to the Phabricator wiki guideline "Please make sure that your changeset does one thing (and one thing only) so that the review process goes smoothly. Small and self-contained changes are much easier to review!".
There is another atomic change that is required before attempting to fix the pf(4) code. This latter one and the actual pf(4) code will need more review so I needed to get this subtle change out of the way.
If this is acceptable, I will add the review for making the contents of PACKET_TAG_IPFORWARD a struct for easier wrapping.
The actual work can be found here and is by no means complete... https://github.com/opnsense/src/commits/pf_route
I looked at your patches and I don't like.
Historically in the IP processing already present special way to change routing decision from the firewall. Your patches introduce another way, that is PF-specific. PF already used its own IP routing lookup and reassembling implementations that mostly duplicate provided by system. Having own implementation in each packet filter leads to many bugs that very hard to detect. Also you still use old route API that is very slow and incapable for concurrent processing.
Instead I prefer to teach PF to use M_FASTFWD_OURS, M_IP_NEXTHOP and M_SKIP_FIREWALL mbuf flags and just reuse existing PACKET_TAG_IPFORWARD format and leave route lookup for the system's responsibility.
I'm not sure I understand.
M_IP_NEXTHOP is being implemented like ipfw(4) uses it, with the exception of pf(4)'s route-to, dup-to and reply-to requiring to forcefully reset the network stack's routing decision due to the nature of the feature, that's why there's a need for a wrapper struct of PACKET_TAG_IPFORWARD's content to transport the outgoing interface as well.
That's likely why pf(4) uses the shortcut of copying from ip_output() to begin with. And that's not good, because it completely circumvents the pfil hook processing when ipfw(4) + pf(4) are used together.
The code is currently based on 10.3, that's likely why the API is older. It's going to to be rebased on 12-CURRENT when the ipfw(4) parts are done, which I want to focus on right now to make them spotless. :)
Maybe to elaborate just a bit more:
pf(4)'s route-to, dup-to and reply-to should eventually use PACKET_TAG_IPFORWARD, which currently requires ipfw(4) to be more cooperative.
One decision is essentially overwritten by the other as we don't want multiple PACKET_TAG_IPFORWARD tags. That's the first chunk of this diff. The caveat of the previous code was the assumption that ipfw(4) is the only user of PACKET_TAG_IPFORWARD, but that will no longer be the case. The former code would start duplicating PACKET_TAG_IPFORWARD.
The second change is that if a routing decision can be overwritten, the M_FASTFWD_OURS tag must be cleared to correctly disable it when overwriting a previous routing decision through PACKET_TAG_IPFORWARD.
That's the scope of this review. Without this out of the way, it's hard to talk about much else. Maybe the wrapper struct is a very bad idea. Maybe we need a new tag PACKET_TAG_IFACEFORWARD and leave the current PACKET_TAG_IPFORWARD code intact. I'm ok to change it to whatever is the best way to do it.
Almost, because ip_output also skips the pending pfil hooks in the direction :/
In the OPNsense case we use in: pf(4) -> ipfw(4), out: ipfw(4) -> pf(4). Both filters have distinctive features: pf(4) has route-to for load balancing and NAT, ipfw(4) has dummynet shaping and forwarding.
The current code in FreeBSD blackholes route-to in the first pfil hook call and therefor completely skips ipfw(4). Feature combinations between the two therefore fail as well.
Using ip_output the pfil in hook for ipfw(4) will still be skipped, but the ipfw(4) out hook becomes reachable at least.
M_SKIP_FIREWALL is another beast in that equation, because it's a global flag and each pfil should really not set it because it implies to skip the other firewall as well. The flags ought to be be split up, one for each filter like...
#define M_SKIP_FIREWALL (M_SKIP_FIREWALL_PF | M_SKIP_FIREWALL_IPFW)
... and this requires review / adapting of a lot of spots in the stack that test using:
if (var & M_SKIP_FIREWALL)
depending on where they are called from (ipfw, pf or global context)
This is not the way this should proceed.
Dummynet is a service and it should be consumed by each pfil consumer if it likes to.
This patch does not solve anything at all.
Because even pfil reordering has been rejected at the time i do not agree with such path.
Normally i should have implemented in pf(4) long time ago the route-to/reply-to.... with the same usage as ipfw since its more scalable etc but that came due to no time.
I have evaluated that implementation though and it considerably better and faster and solves problems which pf cannot handle today.
While dummynet for pf is already there and lets merge it in FreeBSD sources it is quite overdue.
@eri This is *not* about dummynet at all. This patch doesn't even cover anything other than making PACKET_TAG_IPFORWARD usable from a non-ipfw perspective, one could also call it "universally reusable".
The plain issue is: pf(4) calls if_output and it's wrong, because it takes a hardcoded shortcut out of the pfil framework and thus breaks numerous use cases in an otherwise functioning pf(4)+ipfw(4) combination.
I'm politely asking you to stick to this point. :)
There's two things here:
- pf calls if_output, which is to be considered wrong by all standards so I've created a bug report here:
- ipfw's PACKET_TAG_IPFORWARD does the correct thing by storing the routing decision (forward) to a later time in the respective functions outside the pfil chain.
If 2. is not the way to go for pf, please do provide an alternative solution/review, which I shall be happy to review.
Until then, I have to stay to the point: if 2. is to be implemented, this first patch is needed in order to go forward.
Where are we on this? Within the scope of this specific code up for review: what else should be done?
I'm not sentimental regarding this particular implementation. With the right pointers the best possible solution can be found and worked on. :)