Run through the VANILLA/pktgen-fullspread and IPFW/pktgen-ipfw-dropone
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 2551 Build 2568: arc lint + arc unit
Event Timeline
I tried this on my gateway running 10.2-STABLE but it keeps crashing, meaning I am losing all connectivity including IPMI. On the IPMI console I do not see any panic. In the logs I do not see anything unusual.
I backported this to 10.2-RELEASE and it runs ok. Mistakenly I was running the 10.2-STABLE kernel on a 10.2-RELEASE userland.
What I found in my setup was that this reduced performance by 5% but that is a pull back from a 25% increase. Can you also compare this against plain forwarding on 10.2? The comparisons that matter are:
pre-tryforward vs. old fastforward vs. tryforward now
Note that fastforward shows this bug so the speed loss, though mildly annoying, is required to prevent the bug that was reported.
With the patch backported to 10.2-RELEASE I got a kernel panic today.
I will try to get a dump.
FreeBSD/amd64 (xxxxx) (ttyu0)
login: panic: sbflush_internal: cc 0 || mb 0xfffff801786a0100 || mbcnt 256
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff809442a0 at kdb_backtrace+0x60
#1 0xffffffff80907a06 at vpanic+0x126
#2 0xffffffff809078d3 at panic+0x43
#3 0xffffffff8097caab at sbflush_internal+0x7b
#4 0xffffffff8097cb72 at sbdestroy+0x12
#5 0xffffffff8097eb23 at sofree+0x183
#6 0xffffffff8097f086 at soclose+0x376
#7 0xffffffff808bc979 at _fdrop+0x29
#8 0xffffffff808bf21e at closef+0x21e
#9 0xffffffff808bee39 at fdescfree+0x4f9
#10 0xffffffff808cb739 at exit1+0x569
#11 0xffffffff808cb1ce at sys_sys_exit+0xe
#12 0xffffffff80cd2287 at amd64_syscall+0x357
#13 0xffffffff80cb796b at Xfast_syscall+0xfb
Uptime: 3m58s
I'm not sure how that trace could be related to this but if I can get a core dump it might help.
Ok, still on this Atom-4core (bench on i5-8-core Chelsio is still on-going), here are the difference:
x 10.2 forwarding (pps) + 10.2 fastforwarding (pps) * 10-stable-with D5330 forwarding (pps) +--------------------------------------------------------------------------+ | * | | * | | * * | |xx x ** * +++ + | |xx x ** * +++ ++ +| ||MA_| | | |_MA_| | | |_A| | +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 381295 405347.5 384301.5 389779.85 10183.979 + 10 701671 736395 713076.5 714775.6 11127.311 Difference at 95.0% confidence 324996 +/- 10021.8 83.3793% +/- 2.57114% (Student's t, pooled s = 10666.1) * 10 608019 624612 617817 617255.35 6681.9961 Difference at 95.0% confidence 227476 +/- 8092.6 58.36% +/- 2.0762% (Student's t, pooled s = 8612.85)
We should ask melifaro about the impact on projects/routing.
On a different hardware setup (Intel Xeon E5-2650, Chelsio 10-Gigabit T540-CR, NIC configured for using 4 queues in place of 8) I didn't have the same result at all !
First there is no visible impact with or without this D5330 patch:
x 10-stable r295802 tryforward (pps) + 10-stable r295802 with D5330 tryforward (pps) +--------------------------------------------------------------------------+ | x x + | |x x + + + ++xxx x x* + + +| | |__________________A____M______________| | | |_____________M__A_________________| | +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 2108683.5 2553645 2460830.8 2420369.1 160855.67 + 10 2343852.5 2718812 2497405.5 2529979.1 149422.27 No difference proven at 95.0% confidence
A 10-stable with D5330 tryforward is still faster than a 10.2 fastforward on this hardware setup:
x 10.2 fastforwarding (pps) + 10-stable r295802 with D5330 tryforward (pps) +--------------------------------------------------------------------------+ | + | |x x x x x + + + ++ x x x x+x + + +| | |___________________AM___________________| | | |______________M___A__________________| | +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 2147676 2563598 2378921.8 2375603.1 160971.63 + 10 2343852.5 2718812 2497405.5 2529979.1 149422.27 Difference at 95.0% confidence 154376 +/- 145923 6.49839% +/- 6.14258% (Student's t, pooled s = 155304)
Then comparing 10.2 forwarding, 10.2 fastforwarding, and 10-stable-with D5330 tryforward is useless:
x 10.2 forwarding (pps) + 10.2 fastforwarding (pps) * 10-stable r295802 with D5330 tryforward (pps) +--------------------------------------------------------------------------+ | x * + * | |xx x x xxx xx + +++ +*** *+ ++* * **| | |___A___| | | |_____AM_____| | | |___M_A____| | +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 863031 1163347 1016572.8 1014444.7 103384.17 + 10 2147676 2563598 2378921.8 2375603.1 160971.63 Difference at 95.0% confidence 1.36116e+06 +/- 127106 134.178% +/- 12.5297% (Student's t, pooled s = 135278) * 10 2343852.5 2718812 2497405.5 2529979.1 149422.27 Difference at 95.0% confidence 1.51553e+06 +/- 120721 149.395% +/- 11.9002% (Student's t, pooled s = 128482)
I wonder if you can get some pmc results on this code to see what's causing the difference? I suspect caching issues, but that's just a guess. I'm going to move forward with a change to how this all works in HEAD, following the suggestion in the comment. that will take a while to work out because icmp_error() likes to free mbufs and if we have one on the stack that will not go well.
Here are the pmc stats files (gprof output) on the atom 4 cores:
I beleive what you are looking is on this extract:
without the patch:
2739.00 20199.50 11095/11095 ip_input [8] [9] 68.1 2739.00 20199.50 11095 ip_tryforward [9] 1045.00 11134.29 5903/5903 ether_output [10] 252.00 5141.23 2955/2955 ip_findroute [12] 249.00 1401.00 1401/1401 in_localip [21] 447.41 171.56 478/1500 __mtx_lock_sleep [19] 345.00 0.00 345/699 bzero [34] 13.00 0.00 13/34 in_rtalloc_ign [70]
With the patch:
3044.00 20772.95 10986/10986 ip_input [8] [9] 70.4 3044.00 20772.95 10986 ip_tryforward [9] 1325.00 10408.61 5168/5168 ether_output [10] 227.00 3890.40 2327/2327 ip_findroute [13] 259.00 1189.00 1189/1189 in_localip [24] 533.53 557.72 622/1610 uma_zalloc_arg <cycle 2> [20] 83.87 550.67 384/902 m_freem [22] 160.00 430.00 430/430 m_copydata [40] 321.88 135.70 345/970 __mtx_lock_sleep [26] 240.67 119.34 181/182 m_dup_pkthdr [44] 335.00 0.00 335/652 bzero [37] 5.56 0.00 5/9 in_rtalloc_ign [80]
Here are the pmc stats files (gprof output) on the Intel E5 8 cores:
We can copy IP header + 8 bytes into on-stack buffer and then create mbuf only when this needed. But probably the difference will be very low.
On a vanilla 10.2-STABLE with the latest available kernel and applied patch I get this (ipfw+nat):
GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: sbsndptr: sockbuf 0xfffff80011cbd6f8 and mbuf 0xfffff80064806d00 clashing cpuid = 1 KDB: stack backtrace: #0 0xffffffff8094b5d0 at kdb_backtrace+0x60 #1 0xffffffff8090de26 at vpanic+0x126 #2 0xffffffff8090dcf3 at panic+0x43 #3 0xffffffff80987710 at sbsndmbuf+0 #4 0xffffffff80a56ffc at tcp_output+0xe8c #5 0xffffffff80a65048 at tcp_usr_send+0x3a8 #6 0xffffffff80989816 at sosend_generic+0x476 #7 0xffffffff80989a9c at sosend+0x3c #8 0xffffffff8096d9d9 at soo_write+0x49 #9 0xffffffff80965787 at dofilewrite+0x87 #10 0xffffffff809654b8 at kern_writev+0x68 #11 0xffffffff80965443 at sys_write+0x63 #12 0xffffffff80cd994f at amd64_syscall+0x40f #13 0xffffffff80cbebbb at Xfast_syscall+0xfb Uptime: 1m57s Dumping 427 out of 8134 MB:..4%..12%..23%..34%..42%..53%..64%..72%..83%..94% Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols Reading symbols from /boot/kernel/geom_eli.ko.symbols...done. Loaded symbols for /boot/kernel/geom_eli.ko.symbols Reading symbols from /boot/kernel/if_tap.ko.symbols...done. Loaded symbols for /boot/kernel/if_tap.ko.symbols Reading symbols from /boot/kernel/if_bridge.ko.symbols...done. Loaded symbols for /boot/kernel/if_bridge.ko.symbols Reading symbols from /boot/kernel/bridgestp.ko.symbols...done. Loaded symbols for /boot/kernel/bridgestp.ko.symbols Reading symbols from /boot/kernel/umass.ko.symbols...done. Loaded symbols for /boot/kernel/umass.ko.symbols Reading symbols from /boot/kernel/usb.ko.symbols...done. Loaded symbols for /boot/kernel/usb.ko.symbols Reading symbols from /boot/kernel/aio.ko.symbols...done. Loaded symbols for /boot/kernel/aio.ko.symbols Reading symbols from /boot/kernel/coretemp.ko.symbols...done. Loaded symbols for /boot/kernel/coretemp.ko.symbols Reading symbols from /boot/kernel/vmm.ko.symbols...done. Loaded symbols for /boot/kernel/vmm.ko.symbols Reading symbols from /boot/kernel/ehci.ko.symbols...done. Loaded symbols for /boot/kernel/ehci.ko.symbols Reading symbols from /boot/kernel/xhci.ko.symbols...done. Loaded symbols for /boot/kernel/xhci.ko.symbols Reading symbols from /boot/kernel/ipmi.ko.symbols...done. Loaded symbols for /boot/kernel/ipmi.ko.symbols Reading symbols from /boot/kernel/smbus.ko.symbols...done. Loaded symbols for /boot/kernel/smbus.ko.symbols Reading symbols from /boot/kernel/aesni.ko.symbols...done. Loaded symbols for /boot/kernel/aesni.ko.symbols Reading symbols from /boot/kernel/nmdm.ko.symbols...done. Loaded symbols for /boot/kernel/nmdm.ko.symbols Reading symbols from /boot/kernel/ums.ko.symbols...done. Loaded symbols for /boot/kernel/ums.ko.symbols Reading symbols from /boot/kernel/ukbd.ko.symbols...done. Loaded symbols for /boot/kernel/ukbd.ko.symbols Reading symbols from /boot/kernel/ulpt.ko.symbols...done. Loaded symbols for /boot/kernel/ulpt.ko.symbols Reading symbols from /boot/kernel/ipfw.ko.symbols...done. Loaded symbols for /boot/kernel/ipfw.ko.symbols Reading symbols from /boot/kernel/dummynet.ko.symbols...done. Loaded symbols for /boot/kernel/dummynet.ko.symbols Reading symbols from /boot/kernel/ipfw_nat.ko.symbols...done. Loaded symbols for /boot/kernel/ipfw_nat.ko.symbols Reading symbols from /boot/kernel/libalias.ko.symbols...done. Loaded symbols for /boot/kernel/libalias.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) backtrace #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff8090da82 in kern_reboot (howto=260) at /rex/src/freebsd/10/sys/kern/kern_shutdown.c:486 #2 0xffffffff8090de65 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /rex/src/freebsd/10/sys/kern/kern_shutdown.c:889 #3 0xffffffff8090dcf3 in panic (fmt=0x0) at /rex/src/freebsd/10/sys/kern/kern_shutdown.c:818 #4 0xffffffff80987710 in sbsndptr (sb=<value optimized out>, off=<value optimized out>, len=<value optimized out>, moff=<value optimized out>) at /rex/src/freebsd/10/sys/kern/uipc_sockbuf.c:1011 #5 0xffffffff80a56ffc in tcp_output (tp=0xfffff800643d0408) at /rex/src/freebsd/10/sys/netinet/tcp_output.c:1027 #6 0xffffffff80a65048 in tcp_usr_send (so=<value optimized out>, flags=<value optimized out>, m=<value optimized out>, nam=<value optimized out>, control=<value optimized out>, td=<value optimized out>) at /rex/src/freebsd/10/sys/netinet/tcp_usrreq.c:918 #7 0xffffffff80989816 in sosend_generic (so=0xfffff80011cbd570, addr=0x0, uio=0xfffffe023885a960, top=<value optimized out>, control=<value optimized out>, flags=<value optimized out>, td=0xffffffffffffffff) at /rex/src/freebsd/10/sys/kern/uipc_socket.c:1283 ---Type <return> to continue, or q <return> to quit--- #8 0xffffffff80989a9c in sosend (so=0x0, addr=0x0, uio=0x0, top=0x0, control=0x0, flags=0, td=0xfffff800645d7960) at /rex/src/freebsd/10/sys/kern/uipc_socket.c:1327 #9 0xffffffff8096d9d9 in soo_write (fp=<value optimized out>, uio=0xfffffe023885a960, active_cred=<value optimized out>, flags=<value optimized out>, td=<value optimized out>) at /rex/src/freebsd/10/sys/kern/sys_socket.c:103 #10 0xffffffff80965787 in dofilewrite (td=0xfffff800645d7960, fd=3, fp=0xfffff80011c034b0, auio=0xfffffe023885a960, offset=<value optimized out>, flags=0) at file.h:305 #11 0xffffffff809654b8 in kern_writev (td=0xfffff800645d7960, fd=3, auio=0xfffffe023885a960) at /rex/src/freebsd/10/sys/kern/sys_generic.c:481 #12 0xffffffff80965443 in sys_write (td=<value optimized out>, uap=<value optimized out>) at /rex/src/freebsd/10/sys/kern/sys_generic.c:396 #13 0xffffffff80cd994f in amd64_syscall (td=0xfffff800645d7960, traced=0) at subr_syscall.c:141 #14 0xffffffff80cbebbb in Xfast_syscall () at /rex/src/freebsd/10/sys/amd64/amd64/exception.S:396 #15 0x0000000803492e7a in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal
I think it is unrelated problem, similar to this one:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=148807
@ae
I think you are right. I am currently trying with "hw.em.enable_msix=0" and it seems to work ok!
Regarding my previous comment, I was running 10.2-RELEASE and had not turned fastforwarding on.
The bug still persists. I think there might be a problem with the fastforwarding routine as this does not happen when fastforwarding is off.
Steps to reproduce:
- WAN-interface: route change default -mtu 1196
- Browse some pages with a client machine
- Try to access the gateway via ssh on the LAN-interface.
I am trying to find where the problem lies but without success so far.
However if I add the following right after line 311 ("passin :") the bug dissappears:
passin: dchg = (odest.s_addr != ip->ip_dst.s_addr); if (m->m_flags & M_IP_NEXTHOP) { dchg = (m_tag_find(m, PACKET_TAG_IPFORWARD, NULL) != NULL); if (dchg != 0) { /* * Directly ship the packet on. This allows * forwarding packets originally destined to us * to some other directly connected host. */ ip_forward(m, 1); return NULL; } } ip_forward(m, dchg); return NULL;
This way ip_forward() and ip_output() take over the outgoing path.
There are also some other minor discrepancies with the fastforwarding code:
e.g. before the PFIL hooks for outgoing packets the following has to be done:
odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;
After a lot of testing I found the culprit:
removal of the lines 515-516 resolves my bug.
515 if (mcopy) 516 m_freem(mcopy);
the question is why?
sys/netinet/ip_fastfwd.c | ||
---|---|---|
515–516 | Everytime consumed is called, mcopy has already been freed. This check seems harmless but causes a bug with unspecific errors resulting in kernel panics in case the route mtu is set to 576-1196. |
Possible patch:
else { counter_u64_add(ro.ro_rt->rt_pksent, 1); IPSTAT_INC(ips_forward); IPSTAT_INC(ips_fastforward); } m_freem(mcopy); RTFREE(ro.ro_rt); return NULL; consumed: RTFREE(ro.ro_rt); return NULL; drop: if (m) m_freem(m); if (mcopy) m_freem(mcopy); if (ro.ro_rt) RTFREE(ro.ro_rt); return NULL; }
sys/netinet/ip_fastfwd.c | ||
---|---|---|
361 | odest.s_addr = dest.s_addr = ip->ip_dst.s_addr; |
Another issue not addressed by fastforwarding is the change of the routing table through pfil (i.e. setfib in ipfw). After the outgoing pfil hook there should be a check to see if the routing table has changed. If it has ip_findroute() should be called to determine the new route.