tcp: keep SACK scoreboard sorted when doing rescue retransmission
ClosedPublic

Authored by rscheff on Apr 18 2021, 6:48 PM.

Details

Summary

Once a rescue retransmission has been prepared and sent, subsequent
SACK ranges may need to be inserted just before the hole
covering this rescue retransmission, rather than at the tail.
Failing to keep the scoreboard strictly ordered results
in spurious retransmissions, or in a KASSERT panic when invariants
are enabled.

MFC after: 3 days
Sponsored by: NetApp, Inc.
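
The ordering requirement described in the summary can be illustrated with a small, self-contained sketch. This is not the code from this revision: `struct hole`, `hole_insert_sorted()`, `hole_insert_tail()` and `scoreboard_is_sorted()` are hypothetical names on a simplified singly linked list, assumed here only to show why a hole reported after a rescue retransmission has to be linked in ahead of the rescue hole instead of being appended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" test for 32-bit sequence numbers. */
#define SEQ_LT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

/* Hypothetical, simplified scoreboard entry: [start, end) is not yet SACKed. */
struct hole {
	uint32_t start;
	uint32_t end;
	struct hole *next;
};

/*
 * Insert h so the list stays ordered by start: skip every hole that
 * begins before h, then link h in front of the first one that does not.
 */
static void
hole_insert_sorted(struct hole **head, struct hole *h)
{
	struct hole **pp = head;

	assert(SEQ_LT(h->start, h->end));
	while (*pp != NULL && SEQ_LT((*pp)->start, h->start))
		pp = &(*pp)->next;
	h->next = *pp;
	*pp = h;
}

/* Naive variant for comparison: always append, regardless of position. */
static void
hole_insert_tail(struct hole **head, struct hole *h)
{
	struct hole **pp = head;

	while (*pp != NULL)
		pp = &(*pp)->next;
	h->next = NULL;
	*pp = h;
}

/* True if every hole starts strictly before its successor. */
static bool
scoreboard_is_sorted(const struct hole *head)
{
	for (; head != NULL && head->next != NULL; head = head->next)
		if (!SEQ_LT(head->start, head->next->start))
			return (false);
	return (true);
}
```

With a hole near the start of the window, a rescue hole near the end, and a later hole that falls between them, `hole_insert_sorted()` keeps the list ordered, while `hole_insert_tail()` leaves the late hole behind the rescue hole, which is the kind of disorder the summary warns about.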

Test Plan

Run /tools/test/stress2/misc/jumbo.sh. The high load
can cause CPU affinity shifts, which look like more
or less severe reordering and partial ACKs while
SACK loss recovery is still ongoing. Without this fix,
"panic: tcp_output: sack block to the left of una"
may be observed.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

This revision is now accepted and ready to land. Apr 18 2021, 7:02 PM
20210418 22:25:18 all (1/1): jumbo.sh
stress2: pts leak: 1/2
20210418 22:26:21 all (1/1): jumbo.sh
20210418 22:27:20 all (1/1): jumbo.sh
20210418 22:28:20 all (1/1): jumbo.sh
20210418 22:29:20 all (1/1): jumbo.sh
20210418 22:30:20 all.sh done, elapsed 0 day(s), 00:05.03

20210418 22:33:17 all (1/1): tcp4.sh
witness_lock_list_get: witness exhausted
Expensive timeout(9) function: 0xffffffff80e04fe0(0xfffffe01441b34d8) 0.011586722 s
Expensive timeout(9) function: 0xffffffff80e04fe0(0xfffffe016594a0c0) 0.041485479 s
Expensive timeout(9) function: 0xffffffff80f5b210(0) 0.058046346 s
Expensive timeout(9) function: 0xffffffff80e04fe0(0xfffffe016709a478) 0.136817083 s
Apr 18 22:36:54 mercat1 su[80814]: pho to root on /dev/pts/1
panic: tcp_output: sack block to the left of una : -3778580
cpuid = 2
time = 1618778260
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e49b0530
vpanic() at vpanic+0x181/frame 0xfffffe00e49b0580
panic() at panic+0x43/frame 0xfffffe00e49b05e0
tcp_output() at tcp_output+0x2b8b/frame 0xfffffe00e49b07c0
tcp_do_segment() at tcp_do_segment+0x3246/frame 0xfffffe00e49b08b0
tcp_input_with_port() at tcp_input_with_port+0xc13/frame 0xfffffe00e49b0a00
tcp_input() at tcp_input+0xb/frame 0xfffffe00e49b0a10
ip_input() at ip_input+0x194/frame 0xfffffe00e49b0aa0
swi_net() at swi_net+0x1a1/frame 0xfffffe00e49b0b20
ithread_loop() at ithread_loop+0x279/frame 0xfffffe00e49b0bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe00e49b0bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e49b0bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100100 ]
Stopped at      kdb_enter+0x37: movq    $0,0x128410e(%rip)
db> x/s version
version:        FreeBSD 14.0-CURRENT #1 main-n246155-b6a572d03f6-dirty: Sun Apr 18 22:20:42 CEST 2021\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>

Is this with or without the patch in this review?

With.

With main-n246162-b87cf2bc841 I now get:

20210418 23:28:21 all (1/1): tcp4.sh
witness_lock_list_get: witness exhausted
Expensive timeout(9) function: 0xffffffff80e05ea0(0xfffffe015b24ec48) 0.013259969 s
Expensive timeout(9) function: 0xffffffff80e06f00(0xfffffe015ea62060) 0.033885131 s
panic: tcp_output: sack block to the left of una : -5546740
cpuid = 10
time = 1618781362
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e49b0530
vpanic() at vpanic+0x181/frame 0xfffffe00e49b0580
panic() at panic+0x43/frame 0xfffffe00e49b05e0
tcp_output() at tcp_output+0x2b8b/frame 0xfffffe00e49b07c0
tcp_do_segment() at tcp_do_segment+0x3246/frame 0xfffffe00e49b08b0
tcp_input_with_port() at tcp_input_with_port+0xc13/frame 0xfffffe00e49b0a00
tcp_input() at tcp_input+0xb/frame 0xfffffe00e49b0a10
ip_input() at ip_input+0x194/frame 0xfffffe00e49b0aa0
swi_net() at swi_net+0x1a1/frame 0xfffffe00e49b0b20
ithread_loop() at ithread_loop+0x279/frame 0xfffffe00e49b0bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe00e49b0bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e49b0bf0

https://people.freebsd.org/~pho/stress/log/log0095.txt