Page MenuHomeFreeBSD

Final prep patch for BBR
ClosedPublic

Authored by rrs on Aug 1 2019, 2:33 PM.

Details

Summary

BBR requires precise timing information to correctly handle delays in LRO. This
is obtained with a new option to LRO where in a list of mbufs is passed up
to LRO. This hopefully is the final preparation patch for my next commit which
will bring BBRv1 into the FreeBSD tree yeah!

Test Plan

We have tested this extensively at NF running in production now for close to
1 Year :)

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

The LRO patch was missing data and length limits for input processing via
MBUF_QUEUE methods. This update adds that and an elaborate set of
comments to the rack_bbr_common.c module describing in detail what
a transport designer using MBUF_QUEUEING needs to contemplate.

One more update, mainly comments but we also fix a couple of things

  1. The timestamp provided via the ctf common functions should be the real time not something from the mbuf, we let the transports look at those flags (which both BBR and the latest Rack do).
  2. We also for LRO add a gating of the number of acks (currently set to infinity) that can also cause a wakeup.
gallatin added inline comments.
netinet/tcp_lro.c
862 ↗(On Diff #60829)

Would it make sense to have a sysctl to disable the in_pcblookup? Eg, to just assume the lookup will tell you that can_queue is going to be 0.

In fact, I'd further suggest that when you load a TCP stack, it is marked with whether or not it supports queued lro, and when a TCP stack is loaded or unloaded, a global "tcp_stack_can_queue" is recalculated, base on the OR of all loaded TCP stacks's values for supporting queued lro.

So you'd have something like:

if (!queued_lro_forbidden && tcp_stack_can_queue_lro) {

/* do in_pcblookups .. *?

} else {

/* can_queue = 0 */

}

This revision is now accepted and ready to land.Aug 21 2019, 12:42 PM
netinet/tcp_lro.c
862 ↗(On Diff #60829)

Yeah thats a great idea, I think we could add a
"tcp_lro_register()/tcp_lro_unregister()" that a
stack could do to increment/decrement an atomic that
says we are going to use the MBUF queue mechanism. Then
based on the atomic skip the lookup and go right to
the compress methods (assuming no one as registered for it).

I will spin that up and update the patch.

This adds a simple registration mechanism that stacks can use as they successfully register/deregister to
make the LRO code aware if any stack wants to use mbuf-queueing per Drew's suggestion.

This revision now requires review to proceed.Sep 3 2019, 11:54 AM

First version was missing a (void) in the registration function.. opps

This revision was not accepted when it landed; it landed in state Needs Review.Sep 6 2019, 2:25 PM
This revision was automatically updated to reflect the committed changes.
mjg reopened this revision.EditedSep 6 2019, 6:03 PM
mjg added a subscriber: mjg.

This crashes for me on boot.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff807a0c9e
stack pointer = 0x28:0xfffffe00d39fe870
frame pointer = 0x28:0xfffffe00d39fe890
code segment = base 0x0, limit 0xfffff, type 0x1b

= DPL 0, pres 1, long 1, def32 0, gran 1

processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
trap number = 12
panic: page fault
cpuid = 0
time = 1567793282
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d39fe530
vpanic() at vpanic+0x19d/frame 0xfffffe00d39fe580
panic() at panic+0x43/frame 0xfffffe00d39fe5e0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00d39fe640
trap_pfault() at trap_pfault+0x62/frame 0xfffffe00d39fe690
trap() at trap+0x2b2/frame 0xfffffe00d39fe7a0
calltrap() at calltrap+0x8/frame 0xfffffe00d39fe7a0

  • trap 0xc, rip = 0xffffffff807a0c9e, rsp = 0xfffffe00d39fe870, rbp = 0xfffffe00d39fe890 ---

tcp_lro_flush() at tcp_lro_flush+0x2e/frame 0xfffffe00d39fe890
tcp_lro_rx_done() at tcp_lro_rx_done+0x82/frame 0xfffffe00d39fe8b0
tcp_lro_flush_all() at tcp_lro_flush_all+0x9c/frame 0xfffffe00d39fe8f0
_task_fn_rx() at _task_fn_rx+0xd22/frame 0xfffffe00d39fe9f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xf9/frame 0xfffffe00d39fea40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe00d39fea70
fork_exit() at fork_exit+0x84/frame 0xfffffe00d39feab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00d39feab0

  • trap 0, rip = 0, rsp = 0, rbp = 0 ---

KDB: enter: panic
[ thread pid 0 tid 100050 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:25:90:ca:6b:b4
	inet 192.168.5.154 netmask 0xffffff00 broadcast 192.168.5.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

the box in question is lynx4 in zoo, I can give you access if needed.

Looks like it's counters which don't get allocated.

addr2line -e kernel.debug 0xffffffff807a0c9e
/tank/users/mjg/src/freebsd/sys/amd64/include/counter.h:85

ffffffff807a0c70 <tcp_lro_flush>:
ffffffff807a0c70:       55                      push   %rbp
ffffffff807a0c71:       48 89 e5                mov    %rsp,%rbp
ffffffff807a0c74:       41 57                   push   %r15
ffffffff807a0c76:       41 56                   push   %r14
ffffffff807a0c78:       53                      push   %rbx
ffffffff807a0c79:       50                      push   %rax
ffffffff807a0c7a:       49 89 f7                mov    %rsi,%r15
ffffffff807a0c7d:       49 89 fe                mov    %rdi,%r14
ffffffff807a0c80:       48 8b 46 20             mov    0x20(%rsi),%rax
ffffffff807a0c84:       48 3b 46 30             cmp    0x30(%rsi),%rax
ffffffff807a0c88:       0f 84 b5 02 00 00       je     ffffffff807a0f43 <tcp_lro_flush+0x2d3>
ffffffff807a0c8e:       48 8b 04 25 38 28 00    mov    0xffffffff81002838,%rax
ffffffff807a0c95:       81 
ffffffff807a0c96:       48 2b 04 25 98 2e 02    sub    0xffffffff81022e98,%rax
ffffffff807a0c9d:       81 
ffffffff807a0c9e:       65 48 83 00 01          addq   $0x1,%gs:(%rax)

The counter is tcp_inp_lro_compressed

This revision was not accepted when it landed; it landed in state Needs Review.Sep 6 2019, 6:29 PM
This revision was automatically updated to reflect the committed changes.