
epair(4): disable per-IF fallback queuing and draining
Abandoned, Public

Authored by bz on Mar 11 2020, 9:02 PM.
Tags
None

Details

Reviewers
kp
Group Reviewers
manpages
Summary
Over the last few years people have reported epairs "hanging" and
not recovering once the hardware queue overflowed.

It turns out there are multiple problems, both in the epair
code and in its interactions with the netisr framework.
Until these are fixed, do not compile in the "drain"
framework anymore.
This comes at the cost of possibly dropping packets sooner
again, as we only have the per-CPU netisr queue for
all interfaces and no per-interface "fallback" queue anymore.
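
To make the trade-off concrete, here is a minimal user-space sketch of a bounded primary queue with an optional fallback queue. It is not the epair or netisr code; `struct pktq`, `pktq_enqueue()` and `offer_burst()` are hypothetical names invented for this illustration, and the queues only count packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical bounded queue that only counts packets. It stands in
 * for either the shared per-CPU queue or a per-interface fallback
 * queue; it is NOT the real netisr or epair data structure.
 */
struct pktq {
	size_t len;	/* packets currently queued */
	size_t maxlen;	/* capacity of the queue */
};

/* Try to queue one packet; return false if the queue is full. */
bool
pktq_enqueue(struct pktq *q)
{
	if (q->len >= q->maxlen)
		return (false);
	q->len++;
	return (true);
}

/*
 * Offer `burst` packets without any draining in between and return
 * how many were dropped. `fallback` may be NULL, which models having
 * only the primary queue.
 */
size_t
offer_burst(struct pktq *primary, struct pktq *fallback, size_t burst)
{
	size_t dropped = 0;

	assert(primary != NULL);
	for (size_t i = 0; i < burst; i++) {
		if (pktq_enqueue(primary))
			continue;
		if (fallback != NULL && pktq_enqueue(fallback))
			continue;
		dropped++;	/* no room anywhere: the packet is lost */
	}
	return (dropped);
}
```

With both queues limited to two packets, a burst of five loses one packet when a fallback queue exists and three when it does not. That is the "dropping sooner" cost in miniature, under the assumption that nothing drains the queues during the burst.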

While touching the code, also update the epair(4) man page and
add tuning notes.

PR: 227100

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage
Build Status
Buildable 29887
Build 27707: arc lint + arc unit

Event Timeline

I ran into this panic running the pf (forward:v4) test:

panic: epair_clone_destroy: ifp=0xfffff80051b6a800 scb->refcount!=1: 3
cpuid = 7
time = 1584005959
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aff5a630
vpanic() at vpanic+0x182/frame 0xfffffe00aff5a680
panic() at panic+0x43/frame 0xfffffe00aff5a6e0
epair_clone_destroy() at epair_clone_destroy+0x1c1/frame 0xfffffe00aff5a730
if_clone_destroyif() at if_clone_destroyif+0x175/frame 0xfffffe00aff5a780
if_clone_destroy() at if_clone_destroy+0x1f5/frame 0xfffffe00aff5a7d0
ifioctl() at ifioctl+0x371/frame 0xfffffe00aff5a8a0
kern_ioctl() at kern_ioctl+0x27b/frame 0xfffffe00aff5a900
sys_ioctl() at sys_ioctl+0x12f/frame 0xfffffe00aff5a9d0
amd64_syscall() at amd64_syscall+0x803/frame 0xfffffe00aff5aaf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aff5aaf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80048573a, rsp = 0x7fffffffe228, rbp = 0x7fffffffe240 ---
KDB: enter: panic
[ thread pid 1420 tid 100582 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10928e6(%rip)

That's with net.link.epair.netisr_maxqlen=2 because that used to provoke the error quickly.

sys/net/if_epair.c
474

Should we be releasing references?

611

Should we not be releasing references here as well?
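
To illustrate what the two questions above are getting at, here is a small user-space sketch of how an unreleased reference on an error path could leave a count above 1 at destroy time. It is an assumption-laden illustration, not a diagnosis of the panic: `struct fake_softc`, `fake_transmit()` and the other names are hypothetical and do not come from if_epair.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-interface state; not the epair softc. */
struct fake_softc {
	unsigned refcount;	/* starts at 1, held by the owner */
};

void
sc_ref(struct fake_softc *sc)
{
	sc->refcount++;
}

void
sc_unref(struct fake_softc *sc)
{
	assert(sc->refcount > 0);
	sc->refcount--;
}

/*
 * Model of a transmit path that takes a reference for the in-flight
 * packet. On success the consumer is expected to call sc_unref()
 * after delivery. If the enqueue fails, no consumer will ever see
 * the packet, so the producer has to drop the reference itself;
 * `release_on_error` selects whether it does.
 */
bool
fake_transmit(struct fake_softc *sc, bool enqueue_ok, bool release_on_error)
{
	sc_ref(sc);
	if (!enqueue_ok) {
		if (release_on_error)
			sc_unref(sc);
		return (false);
	}
	return (true);
}

/* Destroy is only safe when the owner holds the last reference. */
bool
fake_can_destroy(const struct fake_softc *sc)
{
	return (sc->refcount == 1);
}
```

In this model, two failed enqueues without a release leave the count at 3, so a destroy-time check for `refcount == 1` fails; with the release on the error path the count returns to 1. Whether the real code paths at these lines behave this way is exactly the open question.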

Two weeks ago I started on the inevitable attempt to rewrite epair(4) (I should say it's UBR work) and to rid it of netisr.
I do have a prototype which seems to work; I still need to figure out how to scale things up to hundreds of epairs or hundreds of CPU threads.