Page MenuHomeFreeBSD

dummynet: purge queued packets on interface destruction
AbandonedPublic

Authored by kp on Nov 20 2021, 12:34 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 8, 3:20 AM
Unknown Object (File)
Thu, Dec 26, 6:53 AM
Unknown Object (File)
Dec 9 2024, 3:35 AM
Unknown Object (File)
Dec 2 2024, 10:05 AM
Unknown Object (File)
Nov 25 2024, 5:12 PM
Unknown Object (File)
Nov 23 2024, 11:50 PM
Unknown Object (File)
Nov 21 2024, 4:39 PM
Unknown Object (File)
Nov 20 2024, 2:07 AM

Details

Reviewers
glebius
Group Reviewers
network
pfsense
Summary

When a struct ifnet goes away it's possible that dummynet will still
have packets queued for it. We must purge those to ensure we do not
attempt to send packets out a removed interface.

This scenario is triggered by the dummynet:ipfw_queue_v6 test and looks
like:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff80e33df6
stack pointer = 0x28:0xfffffe0148d1c170
frame pointer = 0x28:0xfffffe0148d1c240
code segment = base 0x0, limit 0xfffff, type 0x1b

				= DPL 0, pres 1, long 1, def32 0, gran 1

processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (dummynet)
trap number = 9
panic: general protection fault
cpuid = 0
time = 1625664777
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0148d1be70
vpanic() at vpanic+0x187/frame 0xfffffe0148d1bed0
panic() at panic+0x43/frame 0xfffffe0148d1bf30
trap_fatal() at trap_fatal+0x387/frame 0xfffffe0148d1bf90
trap() at trap+0xa4/frame 0xfffffe0148d1c0a0
calltrap() at calltrap+0x8/frame 0xfffffe0148d1c0a0

  • trap 0x9, rip = 0xffffffff80e33df6, rsp = 0xfffffe0148d1c170, rbp = 0xfffffe0148d1c240 ---

ip6_input() at ip6_input+0x46/frame 0xfffffe0148d1c240
netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame 0xfffffe0148d1c2a0
dummynet_send() at dummynet_send+0x1dd/frame 0xfffffe0148d1c2e0
dummynet_task() at dummynet_task+0x49c/frame 0xfffffe0148d1c380
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe0148d1c400
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe0148d1c430
fork_exit() at fork_exit+0x80/frame 0xfffffe0148d1c470
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0148d1c470

  • trap 0, rip = 0, rsp = 0, rbp = 0 ---

KDB: enter: panic

On interface removal scan through the dummynet packet queues and remove
any packet destined for the removed interface.

Sponsored by: Rubicon Communications, LLC ("Netgate")

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 42897
Build 39785: arc lint + arc unit

Event Timeline

kp requested review of this revision.Nov 20 2021, 12:34 AM
glebius requested changes to this revision.Nov 20 2021, 2:42 AM
glebius added a subscriber: glebius.

I'm not going to block this change like a bully, but I again will iterate that this is wrong way of solving this problme. Same happened with IP reassembly queue https://reviews.freebsd.org/D19622 I rejected it, but it was still committed. Later more Very likely there are more mbuf queues subject to this problem. And we can play this whack a mole game endlessly, until we scan all of the network stack on an interface departure. This change is a 100% deja vu.

My points are the same:

  1. it is a mistake to store an epoch protected pointer somewhere, where it would be stored beyond the epoch.
  2. You actually are dropping absolutely legitimate packets, if that matters to you.

I would suggest that whenever we queue an mbuf, we shall convert pointer to interface ID. Maybe improve that - combine it with generation count, to protect against ID reuse. Maybe improve interface by ID lookup KPI.

This revision now requires changes to proceed.Nov 20 2021, 2:42 AM

Phab put a big red cross in front on my name, but that's not what I really mean. I'll repeat myself. I won't fight against this change like crazy. If other developers have consensus that this is right way to solve the problem, go for it. Of course re-reading https://reviews.freebsd.org/D19622 makes sense to avoid repeating ourselves.

I'm not going to block this change like a bully, but I again will iterate that this is wrong way of solving this problme. Same happened with IP reassembly queue https://reviews.freebsd.org/D19622 I rejected it, but it was still committed. Later more Very likely there are more mbuf queues subject to this problem. And we can play this whack a mole game endlessly, until we scan all of the network stack on an interface departure. This change is a 100% deja vu.

My points are the same:

  1. it is a mistake to store an epoch protected pointer somewhere, where it would be stored beyond the epoch.

You raise a valid point. I'm not overly fond of this fix myself, although mostly for the complexity in walking all the dummynet queues.
I'm open to alternative solutions.

  1. You actually are dropping absolutely legitimate packets, if that matters to you.

It's not really something I'm worried about. In some cases we're not going to have much of a choice (e.g. we might be forwarding, and the outbound interface just went away, so that's never going to work out). In others we might still be able to send the packet, but if we'd issued the command to delete the interface (or unplugged the hardware...) a millisecond earlier the packet would never have arrived or been generated. We're a network component, and dropping packets is how we cope with anxiety.

I would suggest that whenever we queue an mbuf, we shall convert pointer to interface ID. Maybe improve that - combine it with generation count, to protect against ID reuse. Maybe improve interface by ID lookup KPI.

That's an interesting idea, but we'd have to find a lockless O(1) method of translating the struct ifnet pointer to an ID and vice versa. Otherwise we'd ruin performance on every scenario that has to enqueue mbufs (and remember the struct ifnet). I'm not seeing an obvious way of doing so, but it is a Friday evening and I've not thought about it very hard yet.

So unlike the case of reassembly queues, these are transmitted packets, not received packets that have arrived at their destination. I don't know if such packets are eligible to be re-routed at this point (or if dummynet is "post" all routing decisions). If they can't be re-routed, then I think you would always want to drop them (there's no place to send them as the interface to send them on is gone). If they can be re-routed, then perhaps the ifnet departure event needs to re-route them instead? However, this isn't quite the same as fragments where you can just leave them around even with a NULL ifp.

While I do think there might be merit in replacing ifp references with some layer of indirection to permit revocation / detection of stale references, someone would need to spearhead that project. I think even if you solve that problem you still need to do something for dummynet depending on my question above (though it may be that dummynet ends up dropping the packet when the timer goes off and it sees that the interface is no longer valid in that case?)

In D33064#747374, @jhb wrote:

So unlike the case of reassembly queues, these are transmitted packets, not received packets that have arrived at their destination.

They can be, but they can be delayed in the input path as well.

While I do think there might be merit in replacing ifp references with some layer of indirection to permit revocation / detection of stale references, someone would need to spearhead that project. I think even if you solve that problem you still need to do something for dummynet depending on my question above (though it may be that dummynet ends up dropping the packet when the timer goes off and it sees that the interface is no longer valid in that case?)

That's what I have in mind, yes. While there may be some situations where we can still deliver the packet I don't think it's worth any extra complexity. The main thing I want to accomplish is to ensure we no longer panic in this situation. Dropping a few packets, especially when interfaces get removed, is fine.

In D33064#747374, @jhb wrote:

So unlike the case of reassembly queues, these are transmitted packets, not received packets that have arrived at their destination. I don't know if such packets are eligible to be re-routed at this point (or if dummynet is "post" all routing decisions). If they can't be re-routed, then I think you would always want to drop them (there's no place to send them as the interface to send them on is gone). If they can be re-routed, then perhaps the ifnet departure event needs to re-route them instead? However, this isn't quite the same as fragments where you can just leave them around even with a NULL ifp.

But interface that disappeared is not their transmit interface, it is their receive interface. Nothing forbids us to route them forward.

While I do think there might be merit in replacing ifp references with some layer of indirection to permit revocation / detection of stale references, someone would need to spearhead that project. I think even if you solve that problem you still need to do something for dummynet depending on my question above (though it may be that dummynet ends up dropping the packet when the timer goes off and it sees that the interface is no longer valid in that case?)

I'm in the process of providing KPI for that and make dummynet to use it. I will post a review today.

I believe this can be abandoned now since D33267 fixes.