rxq setup for netmap was broken because netmap_rxq_init was being called before IFDI_INIT, so the ring tail pointer ended up being reset to zero.
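The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual iflib or em code: the `toy_*` names, the `TOY_NSLOTS` value, and the idea that netmap setup publishes `nslots - 1` as the tail are all hypothetical stand-ins chosen for illustration.

```c
#include <stdint.h>

/* Toy model only: none of these names are real iflib/em/netmap symbols. */
#define TOY_NSLOTS 1024u

struct toy_rxq {
	uint32_t tail;	/* stands in for the hardware RX ring tail pointer */
};

/* Stand-in for the driver's IFDI_INIT: reprogramming the ring zeroes tail. */
void
toy_ifdi_init(struct toy_rxq *q)
{
	q->tail = 0;
}

/* Stand-in for netmap_rxq_init: assumed to publish the netmap buffers
 * to the NIC by advancing tail (here, to nslots - 1). */
void
toy_netmap_rxq_init(struct toy_rxq *q)
{
	q->tail = TOY_NSLOTS - 1;
}

/* Broken order: netmap setup first, then driver init wipes the tail. */
uint32_t
toy_setup_broken(void)
{
	struct toy_rxq q = { 0 };

	toy_netmap_rxq_init(&q);
	toy_ifdi_init(&q);
	return (q.tail);
}

/* Fixed order: driver init first, then netmap setup sets the tail. */
uint32_t
toy_setup_fixed(void)
{
	struct toy_rxq q = { 0 };

	toy_ifdi_init(&q);
	toy_netmap_rxq_init(&q);
	return (q.tail);
}
```

In this model `toy_setup_broken()` leaves the tail at zero, so the NIC would see no receive buffers, while `toy_setup_fixed()` leaves it at `TOY_NSLOTS - 1`.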
Details
Diff Detail
- Repository: rS FreeBSD src repository (subversion)
- Lint: Not Applicable
- Unit Tests: Not Applicable
Event Timeline
em0@pci0:0:31:6: class=0x020000 card=0x06db1028 chip=0x156f8086 rev=0x21 hdr=0x00
    device = 'Ethernet Connection I219-LM'
Tested with netmap's pkt-gen.
On receive I get a crash:
#7  0xffffffff80a72006 in kassert_panic (fmt=0xffffffff810482e7 "Assertion %s failed at %s:%d")
    at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/kern/kern_shutdown.c:669
#8  0xffffffff8053a20f in em_isc_rxd_pkt_get (arg=<unavailable>, ri=<optimized out>)
    at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/dev/e1000/em_txrx.c:697
#9  0xffffffff80b81288 in iflib_netmap_rxsync (kring=<optimized out>, flags=<optimized out>)
    at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/net/iflib.c:1008
#10 0xffffffff806c12fc in netmap_poll (priv=<optimized out>, events=<optimized out>, sr=0xfffff80220420560)
    at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/dev/netmap/netmap.c:2724
#11 0xffffffff806c3a02 in freebsd_netmap_poll (cdevi=<optimized out>, events=1, td=0xfffff80220420560)
    at /usr/home/johannes/dev/freebsd/freebsd-base-graphics/sys/dev/netmap/netmap_freebsd.c:1393
#12 0xffffffff80941eef in devfs_poll_f (fp=0xfffff8006cdd9c30, events=1, cred=0xfffff8000c4bba00, td=0xfffff80220420560)
On transmit I get:
621.256106 sender_body [1181] start, fd 3 main_fd 3
622.257327 main_thread [2056] 1.022 Kpps (1.023 Kpkts 491.040 Kbps in 1001225 usec) 511.50 avg_batch 0 min_space
623.258329 main_thread [2056] 0.000 pps (0.000 pkts 0.000 bps in 1001002 usec) 0.00 avg_batch 99999 min_space
623.260011 sender_body [1250] poll error/timeout on queue 0: No error: 0
624.259329 main_thread [2056] 1.023 Kpps (1.024 Kpkts 491.520 Kbps in 1000999 usec) 341.33 avg_batch 99999 min_space
625.263889 main_thread [2056] 0.000 pps (0.000 pkts 0.000 bps in 1004548 usec) 0.00 avg_batch 99999 min_space
625.263876 sender_body [1250] poll error/timeout on queue 0: No error: 0
626.265335 main_thread [2056] 1.023 Kpps (1.024 Kpkts 491.520 Kbps in 1001458 usec) 341.33 avg_batch 99999 min_space
627.265335 sender_body [1250] poll error/timeout on queue 0: No error: 0
627.276066 main_thread [2056] 1.013 Kpps (1.024 Kpkts 491.520 Kbps in 1010731 usec) 341.33 avg_batch 99999 min_space
628.276338 main_thread [2056] 0.000 pps (0.000 pkts 0.000 bps in 1000273 usec) 0.00 avg_batch 99999 min_space
The packets that do get transmitted are received by the receiving machine.
@sbruno @johalun0_gmail.com I've created a dedicated branch for this fix: iflib/netmap_rx
I did a clean world+kernel build on the iflib/netmap_rx branch.
Running pkt-gen -i em1 -f rx on the receiver and pkt-gen -i em1 -f tx on the sender results in a crash on the receiver.
(The sender and receiver are two different machines connected back to back on em1.)
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe03668b2510
vpanic() at vpanic+0x19c/frame 0xfffffe03668b2590
kassert_panic() at kassert_panic+0x126/frame 0xfffffe03668b2600
em_isc_rxd_pkt_get() at em_isc_rxd_pkt_get+0xf1/frame 0xfffffe03668b2670
iflib_netmap_rxsync() at iflib_netmap_rxsync+0x235/frame 0xfffffe03668b2770
netmap_poll() at netmap_poll+0x79c/frame 0xfffffe03668b2870
freebsd_netmap_poll() at freebsd_netmap_poll+0x32/frame 0xfffffe03668b28a0
devfs_poll_f() at devfs_poll_f+0x7f/frame 0xfffffe03668b2900
kern_poll() at kern_poll+0x4fc/frame 0xfffffe03668b2aa0
sys_poll() at sys_poll+0x50/frame 0xfffffe03668b2ac0
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe03668b2bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe03668b2bf0
--- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x800daf2aa, rsp = 0x7fffdfff9e78, rbp = 0x7fffdfff9eb0 ---
KDB: enter: panic
[ thread pid 828 tid 100177 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
However, if I limit the packet rate like so:
pkt-gen -i em1 -f rx on the receiver and pkt-gen -i em1 -f tx -R 100 on the sender
the receiver stops receiving packets after ~600 packets or so (6 batches received). After that, the rate on the receiver goes to zero, but it does not crash.
Interrupting the process might result in a crash:
cpuid = 2
time = 1504082223
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02398c0730
vpanic() at vpanic+0x19c/frame 0xfffffe02398c07b0
kassert_panic() at kassert_panic+0x126/frame 0xfffffe02398c0820
iflib_fl_bufs_free() at iflib_fl_bufs_free+0x1c2/frame 0xfffffe02398c0870
iflib_stop() at iflib_stop+0x478/frame 0xfffffe02398c08c0
iflib_netmap_register() at iflib_netmap_register+0x1a4/frame 0xfffffe02398c0900
netmap_hw_reg() at netmap_hw_reg+0x2c/frame 0xfffffe02398c0930
netmap_do_unregif() at netmap_do_unregif+0x16a/frame 0xfffffe02398c0960
netmap_priv_delete() at netmap_priv_delete+0x31/frame 0xfffffe02398c0980
netmap_dtor() at netmap_dtor+0x2b/frame 0xfffffe02398c09a0
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame 0xfffffe02398c09c0
devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe02398c09f0
closef() at closef+0x1f5/frame 0xfffffe02398c0a80
closefp() at closefp+0x9f/frame 0xfffffe02398c0ac0
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe02398c0bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe02398c0bf0
The igb0 interface works flawlessly at 1.4 Mpps, so the problem appears to be limited to em.
@johan_duh.se I've fixed these two panics in my dev branch. Both em and igb work fine for me now. I'll update the patch as soon as you can confirm that you find no further issues.
Confirmed that pkt-gen works fine at max packet rate on both igb and em now. ^C hasn't caused any issues either.