Over the last few years, people have reported "hangs" of epairs that do not recover once the hardware queue has overflowed. It turns out there are multiple problems, both with the epair code and with its interactions with the netisr framework.

Until these are fixed, do not compile in the "drain" framework anymore. This comes at the cost of possibly dropping more packets, and sooner, as we only have the per-CPU netisr queue for all interfaces and no per-interface "fallback" queue anymore.

While touching the code, also update the epair(4) man page and add tuning notes.

PR: 227100
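To illustrate the trade-off described above, here is a minimal userspace sketch of a single bounded queue with no fallback: once the limit is reached, every further enqueue is a drop. The names `bounded_queue`, `bq_enqueue`, and `bq_dequeue` are hypothetical and are not the netisr or epair API; the sketch only models the counting, not real packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bounded per-CPU input queue (not the netisr API). */
struct bounded_queue {
	size_t len;	/* packets currently queued */
	size_t maxlen;	/* queue limit, comparable to a maxqlen-style tunable */
	size_t drops;	/* packets dropped because the queue was full */
};

/* Try to enqueue one packet; with no fallback queue, overflow means a drop. */
static bool
bq_enqueue(struct bounded_queue *q)
{
	assert(q != NULL);
	if (q->len >= q->maxlen) {
		q->drops++;
		return (false);
	}
	q->len++;
	return (true);
}

/* Dequeue one packet if any is queued, making room for new ones. */
static bool
bq_dequeue(struct bounded_queue *q)
{
	assert(q != NULL);
	if (q->len == 0)
		return (false);
	q->len--;
	return (true);
}
```

With a limit of 2, the third enqueue in a row is dropped, and the queue accepts packets again as soon as the consumer dequeues one. A per-interface fallback queue would instead hold the overflow, at the price of the extra drain logic.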
Details
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: No Lint Coverage
- Unit: No Test Coverage
- Build Status: Buildable 29887; Build 27707: arc lint + arc unit
Event Timeline
Comment Actions
I ran into this panic running the pf (forward:v4) test:
    panic: epair_clone_destroy: ifp=0xfffff80051b6a800 scb->refcount!=1: 3
    cpuid = 7
    time = 1584005959
    KDB: stack backtrace:
    db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aff5a630
    vpanic() at vpanic+0x182/frame 0xfffffe00aff5a680
    panic() at panic+0x43/frame 0xfffffe00aff5a6e0
    epair_clone_destroy() at epair_clone_destroy+0x1c1/frame 0xfffffe00aff5a730
    if_clone_destroyif() at if_clone_destroyif+0x175/frame 0xfffffe00aff5a780
    if_clone_destroy() at if_clone_destroy+0x1f5/frame 0xfffffe00aff5a7d0
    ifioctl() at ifioctl+0x371/frame 0xfffffe00aff5a8a0
    kern_ioctl() at kern_ioctl+0x27b/frame 0xfffffe00aff5a900
    sys_ioctl() at sys_ioctl+0x12f/frame 0xfffffe00aff5a9d0
    amd64_syscall() at amd64_syscall+0x803/frame 0xfffffe00aff5aaf0
    fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aff5aaf0
    --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80048573a, rsp = 0x7fffffffe228, rbp = 0x7fffffffe240 ---
    KDB: enter: panic
    [ thread pid 1420 tid 100582 ]
    Stopped at kdb_enter+0x37: movq $0,0x10928e6(%rip)
That's with net.link.epair.netisr_maxqlen=2 because that used to provoke the error quickly.
Inline comments on sys/net/if_epair.c:

| Line | Comment |
|---|---|
| 474 | Should we be releasing references? |
| 611 | Should we not be releasing references here as well? |
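The concern behind these inline comments can be sketched in userspace C. This assumes, purely for illustration, that a reference is taken for each packet in flight and that destroy expects only the owner's reference to remain, as the `refcount!=1` panic message suggests. `fake_softc` and the `sc_*` helpers are hypothetical names, not the code at the commented lines of if_epair.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical softc with a plain reference count (not the real epair softc). */
struct fake_softc {
	unsigned refcount;	/* starts at 1 for the owner's reference */
};

static void
sc_acquire(struct fake_softc *sc)
{
	sc->refcount++;
}

static void
sc_release(struct fake_softc *sc)
{
	assert(sc->refcount > 0);
	sc->refcount--;
}

/*
 * Hypothetical transmit path: take a reference for the packet in flight.
 * If the enqueue fails the packet is dropped, so the reference taken for
 * it has to be given back on that error path as well.
 */
static bool
sc_transmit(struct fake_softc *sc, bool enqueue_ok)
{
	sc_acquire(sc);
	if (!enqueue_ok) {
		sc_release(sc);	/* drop path: release the reference */
		return (false);
	}
	/* Success: the consumer releases the reference on delivery. */
	return (true);
}

/* Hypothetical consumer side: the packet was delivered, drop its reference. */
static void
sc_deliver(struct fake_softc *sc)
{
	sc_release(sc);
}

/* Destroy-time check modeled on the panic condition (refcount != 1). */
static bool
sc_can_destroy(const struct fake_softc *sc)
{
	return (sc->refcount == 1);
}
```

If the `sc_release()` call on the drop path were missing, every dropped packet would leak one reference and the destroy-time check would fail, which matches the shape of the panic when the queue limit is set very low.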
Comment Actions
I started on the inevitable attempt to rewrite epair(4) two weeks ago (I should say it's UBR work) and have rid it of the netisr.
I do have a prototype which seems to work; I still need to figure out how to scale things up to hundreds of epairs or hundreds of CPU threads.