Defer the packet size check until after the firewall has had a look at
it. This means that the firewall now has the opportunity to
(re-)fragment an oversized packet.
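Roughly, the reordering inside ip6_forward() looks like this (a sketch only, not the literal diff; the exact pfil_run_hooks() signature and the surrounding error handling differ between FreeBSD versions):

	/*
	 * Old order: the link-MTU check ran before the output pfil hooks,
	 * so a too-big (e.g. reassembled) packet never reached the firewall.
	 *
	 * New order: run the output hooks first ...
	 */
	error = pfil_run_hooks(&V_inet6_pfil_hook, &m, rt->rt_ifp,
	    PFIL_OUT, NULL);
	if (error != 0 || m == NULL)
		goto freecopy;		/* consumed by a filter */

	/*
	 * ... and only then compare what is left (possibly re-fragmented
	 * by the firewall) against the MTU of the outgoing interface.
	 */
	if (m->m_pkthdr.len > IN6_LINKMTU(rt->rt_ifp)) {
		/* send ICMP6_PACKET_TOO_BIG towards the source and drop */
		goto bad;
	}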
Details
- Reviewers: ae, philip, gnn
- Commits: rS281234: Evaluate packet size after the firewall had its chance
Event Timeline
Since you are in ip6_forward(), this means ip6_input() has already checked this packet and PFIL has already had a chance to handle it.
An IPv6 router should not reassemble fragmented packets and then fragment them anew, but if you want to do that, I think your packet filter should track these fragments on input. How did you test this patch?
The defragmentation is done on the input side.
When fragmented packets arrive we queue them up inside pf (telling the network stack we dropped them) on the input side. Once we've got a complete packet we can perform the actual filtering (which has to be done on the full packet, or the firewall could be bypassed by fragmenting packets). At that point we have an oversized packet which somehow has to be sent out again. As netpfil doesn't have a way to tell the network stack 'here are a bunch of packets', the only way I can see is to call ip6_forward().
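As a sketch, the output side then ends up doing something like this (refragment6() is a made-up placeholder, not the real pf function, and the ip6_forward() arguments are simplified):

	/*
	 * Once the reassembled packet has passed the rules, split it back
	 * into fragments and push each one into the forwarding path
	 * ourselves, because a pfil hook can only hand a single mbuf back
	 * to the stack.
	 */
	struct mbuf *frags, *m, *next;

	frags = refragment6(reassembled, max_frag_size);  /* m_nextpkt chain */
	for (m = frags; m != NULL; m = next) {
		next = m->m_nextpkt;
		m->m_nextpkt = NULL;
		ip6_forward(m, 0);
	}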
How did you test this patch?
The actual defragmentation was tested by generating packets with scapy. The forwarding path was tested mostly by having a VM forward packets. The patch set is also running on my (dual stack, VIMAGE enabled) gateway.
Ok. ip6_input() received the fragments and PF consumed them. Now you have the reassembled packet inside PF. You call ip6_forward() with this oversized packet. ip6_forward() calls PFIL on output and PF will find its mtag, so it can refragment this packet. Am I right?
ip6_input(last fragment)
    pfil_run_hooks(IN)
ip6_forward(reassembled packet)
    pfil_run_hooks(OUT)
ip6_forward(new first fragment)
    pfil_run_hooks(OUT)
ip6_forward(new second fragment)
    pfil_run_hooks(OUT)
...
ip6_forward(new last fragment)
    pfil_run_hooks(OUT)
I think this configuration will have problems with PMTUD. Some router may drop your "new fragment" and the ICMP message will be addressed to the source host, but your host will keep using the same size for its fragments. Does PF track ICMP Packet Too Big messages that are addressed to destinations other than its own?
It does not, but when we refragment we ensure that our fragments are no larger than the largest fragment generated by the source. That way PMTU discovery keeps working.
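A minimal sketch of that idea (the structure and variable names are illustrative, not pf's actual data structures):

	/*
	 * While collecting fragments on input, remember the largest
	 * fragment the original sender produced.
	 */
	if (frag_len > frag_entry->max_frag_len)
		frag_entry->max_frag_len = frag_len;

	/*
	 * When re-fragmenting on output, never exceed that size (kept a
	 * multiple of 8, as fragment payloads must be), so Path MTU
	 * discovery still converges at the real source.
	 */
	frag_size = MIN(frag_entry->max_frag_len, IN6_LINKMTU(ifp)) & ~7;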
I think I am lost and haven't read the diff yet, but what you do is keep the original fragments as an mbuf next-packet list and then release them once you are confident that the reassembled packet would be fine to pass?
I guess that last comment should go onto another of the multiple diffs to review really but the discussion seemed to be here
Indeed. Fragments arrive in pf_test6() -> pf_normalize_ip6() -> pf_reassemble6(). If we can't defragment (yet) the fragment is dropped.
If it's complete the flow continues, but now on the defragmented packet.
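As a simplified control-flow sketch of that input path (the helper names are made up for illustration):

	/*
	 * Inside pf's input processing, before rule evaluation
	 * (pf_test6() -> pf_normalize_ip6() -> pf_reassemble6()):
	 */
	if (is_ip6_fragment(m)) {
		m = reassemble_sketch(m);	/* fragment moves to the reassembly queue */
		if (m == NULL)
			return (PF_DROP);	/* incomplete: held, the stack sees a drop */
		/* complete: m now points at the full reassembled packet */
	}
	/* normal rule evaluation continues on the full-size packet */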
I think bz meant keeping a chain of the original mbufs, then reassembling them into a new mbuf, checking this big packet against the firewall rules, then m_freem()ing it and pushing all the original fragments into the network. In this case they will have the same size and IDs.
The difficulty, and the reason for this specific part, is the last bit. I don't know how to get pfil(9) to give multiple packets back to the IP stack.
I can see the point of doing what bz suggested. It also shouldn't be exceptionally hard to do. Still, I'd like to get to the point where the whole thing actually works first, and for that we still need this patch.
I think the idea of making pfil(9) expect to get multiple packets in reply to one is not so bad. Right now we are calling ip_output() from pfil hooks in certain cases, which is ugly.
Yeah, I think that'd also be a way of fixing the issue Allan Jude found (essentially we get stuck in a defragment/refragment loop when scrubbing on both input and output). We want to re-inject the refragmented packets without passing them through the firewall again.
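Purely as an illustration, something along these lines (hypothetical; pfil(9) has no such chain-returning contract today):

	/*
	 * Hypothetical: if a hook were allowed to hand back an m_nextpkt
	 * chain, the stack could finish the output path for every packet
	 * itself, instead of the filter calling ip6_output()/ip6_forward()
	 * behind its back.
	 */
	struct mbuf *next;

	error = pfil_run_hooks(head, &m, ifp, PFIL_OUT, NULL);
	for (; m != NULL; m = next) {
		next = m->m_nextpkt;
		m->m_nextpkt = NULL;
		/* MTU check and interface output for each packet */
	}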
I'm not sure it'd fix this specific problem though. My view is that we want part of the current behavior: fragmented packets make it to pfil(PFIL_IN), but after that we give a defragmented packet to the IP stack and only refragment it when it hits pfil(PFIL_OUT). The alternative would be to queue up the fragments in pfil(PFIL_IN), defragment, filter and then immediately refragment (i.e. inject the refragmented packets on the other side of pfil(PFIL_IN)). That's fine, but we'd end up doing all that work again on the pfil(PFIL_OUT) side of things. That feels a little wasteful.
I just thought about fixing IPv6 reassembly in ipfw(4), and it looks like this patch will be needed for that.
Please add a comment that in some cases we assume the length can change after pfil processing.
You mean if we fall into 'if (error != 0 || m == NULL)'?
Arguably that's better, isn't it? What's the point of sending an ICMP6_PACKET_TOO_BIG error for a packet we were going to drop anyway?
pfil_run_hooks() just returns 0 if there are no filters, so we'd still do the MTU check.
Ah, yes, that's a problem.
I wonder if we can get away with just moving the MTU check to just after 'pass'.
There are a couple of points between the pfil hook and 'pass' where the packet is inserted into the local queue for delivery, but do we really care about the packet size if it's going to the input path? In that case the MTU check isn't really correct anyway, because the packet isn't going out to rt->rt_ifp.
- I agree with moving the MTU check after 'pass'.
- Also, it looks to me that it is safe to remove this piece of duplicated code:
	if (m->m_pkthdr.rcvif == NULL)
		m->m_pkthdr.rcvif = V_loif;
	if (m->m_pkthdr.csum_flags & CSUM_DELAY_DATA_IPV6) {
		m->m_pkthdr.csum_flags |=
		    CSUM_DATA_VALID_IPV6 | CSUM_PSEUDO_HDR;
		m->m_pkthdr.csum_data = 0xffff;
	}
#ifdef SCTP
	if (m->m_pkthdr.csum_flags & CSUM_SCTP_IPV6)
		m->m_pkthdr.csum_flags |= CSUM_SCTP_VALID;
#endif
	error = netisr_queue(NETISR_IPV6, m);
	goto out;
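Concretely, the move would look roughly like this (a sketch only; 'pass' is the existing label in ip6_forward(), everything else is abbreviated):

	/*
	 * ... pfil(PFIL_OUT) processing; packets redirected to a local
	 * address leave via netisr_queue() before ever reaching 'pass',
	 * so they would no longer be subject to the rt->rt_ifp MTU check.
	 */
pass:
	if (m->m_pkthdr.len > IN6_LINKMTU(rt->rt_ifp)) {
		/* same ICMP6_PACKET_TOO_BIG handling as before, just later */
		goto bad;
	}
	/* hand the packet to neighbor discovery / interface output */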
Okay, I'll submit a new patch soon (when I find a few minutes for it, over the weekend probably).
The previous version of the patch only worked on top of the originally proposed patch. This one applies to current.