BBR requires precise timing information to correctly handle delays in LRO. This
is obtained with a new option to LRO where in a list of mbufs is passed up
to LRO. This hopefully is the final preparation patch for my next commit which
will bring BBRv1 into the FreeBSD tree yeah!
Details
We have tested this extensively at NF running in production now for close to
1 Year :)
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
The LRO patch was missing data and length limits for input processing via
MBUF_QUEUE methods. This update adds that and an elaborate set of
comments to the rack_bbr_common.c module describing in detail what
a transport designer using MBUF_QUEUEING needs to contemplate.
One more update, mainly comments but we also fix a couple of things
- The timestamp provided via the ctf common functions should be the real time not something from the mbuf, we let the transports look at those flags (which both BBR and the latest Rack do).
- We also for LRO add a gating of the number of acks (currently set to infinity) that can also cause a wakeup.
netinet/tcp_lro.c | ||
---|---|---|
873 | Would it make sense to have a sysctl to disable the in_pcblookup? Eg, to just assume the lookup will tell you that can_queue is going to be 0. In fact, I'd further suggest that when you load a TCP stack, it is marked with whether or not it supports queued lro, and when a TCP stack is loaded or unloaded, a global "tcp_stack_can_queue" is recalculated, base on the OR of all loaded TCP stacks's values for supporting queued lro. So you'd have something like: if (!queued_lro_forbidden && tcp_stack_can_queue_lro) { /* do in_pcblookups .. *? } else { /* can_queue = 0 */ } |
netinet/tcp_lro.c | ||
---|---|---|
873 | Yeah thats a great idea, I think we could add a I will spin that up and update the patch. |
This adds a simple registration mechanism that stacks can use as they successfully register/deregister to
make the LRO code aware if any stack wants to use mbuf-queueing per Drew's suggestion.
This crashes for me on boot.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff807a0c9e
stack pointer = 0x28:0xfffffe00d39fe870
frame pointer = 0x28:0xfffffe00d39fe890
code segment = base 0x0, limit 0xfffff, type 0x1b= DPL 0, pres 1, long 1, def32 0, gran 1processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
trap number = 12
panic: page fault
cpuid = 0
time = 1567793282
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d39fe530
vpanic() at vpanic+0x19d/frame 0xfffffe00d39fe580
panic() at panic+0x43/frame 0xfffffe00d39fe5e0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00d39fe640
trap_pfault() at trap_pfault+0x62/frame 0xfffffe00d39fe690
trap() at trap+0x2b2/frame 0xfffffe00d39fe7a0
calltrap() at calltrap+0x8/frame 0xfffffe00d39fe7a0
- trap 0xc, rip = 0xffffffff807a0c9e, rsp = 0xfffffe00d39fe870, rbp = 0xfffffe00d39fe890 ---
tcp_lro_flush() at tcp_lro_flush+0x2e/frame 0xfffffe00d39fe890
tcp_lro_rx_done() at tcp_lro_rx_done+0x82/frame 0xfffffe00d39fe8b0
tcp_lro_flush_all() at tcp_lro_flush_all+0x9c/frame 0xfffffe00d39fe8f0
_task_fn_rx() at _task_fn_rx+0xd22/frame 0xfffffe00d39fe9f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xf9/frame 0xfffffe00d39fea40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe00d39fea70
fork_exit() at fork_exit+0x84/frame 0xfffffe00d39feab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00d39feab0
- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100050 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6> ether 00:25:90:ca:6b:b4 inet 192.168.5.154 netmask 0xffffff00 broadcast 192.168.5.255 media: Ethernet autoselect (1000baseT <full-duplex>) status: active nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
the box in question is lynx4 in zoo, I can give you access if needed.
Looks like it's counters which don't get allocated.
addr2line -e kernel.debug 0xffffffff807a0c9e
/tank/users/mjg/src/freebsd/sys/amd64/include/counter.h:85
ffffffff807a0c70 <tcp_lro_flush>: ffffffff807a0c70: 55 push %rbp ffffffff807a0c71: 48 89 e5 mov %rsp,%rbp ffffffff807a0c74: 41 57 push %r15 ffffffff807a0c76: 41 56 push %r14 ffffffff807a0c78: 53 push %rbx ffffffff807a0c79: 50 push %rax ffffffff807a0c7a: 49 89 f7 mov %rsi,%r15 ffffffff807a0c7d: 49 89 fe mov %rdi,%r14 ffffffff807a0c80: 48 8b 46 20 mov 0x20(%rsi),%rax ffffffff807a0c84: 48 3b 46 30 cmp 0x30(%rsi),%rax ffffffff807a0c88: 0f 84 b5 02 00 00 je ffffffff807a0f43 <tcp_lro_flush+0x2d3> ffffffff807a0c8e: 48 8b 04 25 38 28 00 mov 0xffffffff81002838,%rax ffffffff807a0c95: 81 ffffffff807a0c96: 48 2b 04 25 98 2e 02 sub 0xffffffff81022e98,%rax ffffffff807a0c9d: 81 ffffffff807a0c9e: 65 48 83 00 01 addq $0x1,%gs:(%rax)
The counter is tcp_inp_lro_compressed