Details

Reviewers

glebius
tuexen
cc
thj
zlei
rgrimes
kbowling

Group Reviewers

transport

Commits

rG6b2977c5978b: tcp: fix duplicate retransmissions when RTO happens during SACK loss recovery
rG440f4ba18e3a: tcp: fix duplicate retransmissions when RTO happens during SACK loss recovery

Summary

A subtle bug was present in the base TCP stack, virtually since the
inception of SACK loss recovery. However, for a number of reasons
it doesn't show up easily.

Under normal circumstances, when SACK loss recovery is initiated,
snd_nxt and snd_max track each other. Only during non-SACK loss
recovery, snd_nxt deviates from snd_max.

On RTO, snd_nxt is reset to snd_una. Prior to RFC6675 SACK loss
recovery initialization, SACK loss recovery wouldn't have been
initiated right away with the first incoming SACK ACK.

In addition, cwnd used to be 1 MSS right after RTO, increasing
to 2 MSS more recently.

Furthermore, TSO/LRO typically deliver one ACK covering two or
more segments, thus masking the issue of tcp_output alternating
between retransmitting from snd_nxt (without SACK, thus not
dragging hole->rxmit right), followed by retransmitting
from the SACK hole - resulting in the same data sent twice
until a full ACK without SACK (and an empty scoreboard) is reached.

Address this by setting up snd_recover just in cc_cong_signal.

MFC after: 1 week

Test Plan

Disable TSO & LRO on sender and receiver:

ifconfig <if> -lro -tso

Disable behavior like LRD (Lost Retransmission Detection),
PRR (Proportional Rate Reduction) to demonstrate the issue similar to
what exists in older FreeBSD versions:

sysctl net.inet.tcp.do_prr=0
sysctl net.inet.tcp.do_lrd=0 (or net.inet.tcp.sack.lrd=0)

Start an iperf3 server in the background on the receiver:

iperf3 -s

Start a script to use ipfw to induce sudden, and longer loss periods:

while [ 1 ]; do

induce a little loss for ~10-20 ms, to induce SACK loss recovery ipfw add 10 deny tcp from any to any 5201; ipfw delete 10;
induce a longer loss period ~100ms, to trigger RTO ipfw add 10 deny tcp from any to any 5201; sleep 0.1; ipfw delete 10;
allow all packets for 4000 ms to build up the congestion window sleep 4; done

Finally start a tcpdump on the sender:

tcpdump -i <if> -w <file> -s 128 tcp and port 5201 &

and start traffic:

iperf3 -c <client> -t 45

Finally, stop the trace and inspect it for occurances of SACK blocks
and that every ACK during the loss episode is covering at most one
MSS (typically ~1500 bytes). After the RTO with SACK, it is normal
that cwnd remains at 2 MSS, but the presence of DSACK blocks, or
segments covering the same data twice demonstrates this issue.

packetdrill demonstration of packets getting sent twice:

#cat RTO-while-SACK-LR.pkt
//
// A simple test to verify behavior when
// during SACK LR, an RTO happens - and packets
// are sent twice - once from the sackhole, once
// from the pulled back snd_nxt - D43355
//
// Runs clean against FreeBSD 15.0-CURRENT #17 main-n272693-dfcb8de5ef80
//
--tolerance_usecs=25000

0.0 `
ifconfig tun0 tso
sysctl net.inet.tcp.sack.lrd=0
sysctl net.inet.tcp.do_prr=0
sysctl net.inet.tcp.sack.revised=1
sysctl net.inet.tcp.sack.enable=1
sysctl net.inet.tcp.cc.algorithm=cubic
sysctl net.inet.tcp.hostcache.purgenow=1
`
// Create a listening TCP socket.

0.1 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.001 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.001 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0
+0.001 setsockopt(3, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
+0.005 bind(3, ..., ...) = 0
+0.005 listen(3, 1) = 0

// Establish a TCP connection.
+0.035 < S  0:0(0) win 65535 <mss 1000, sackOK, wscale 9, nop, nop, nop >
+0.000 > S. 0:0(0) ack 1 win 65535 <...>
+0.000 <  . 1:1(0) ack 1 win 65535
+0.000 accept(3, ..., ...) = 4

+0.01 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
0.2  write(4, ..., 100000) = 100000

+0.0     >  .     1:1001(1000) ack 1
+0.0     >  .  1001:2001(1000) ack 1
+0.0     >  .  2001:3001(1000) ack 1
+0.0     >  .  3001:4001(1000) ack 1
+0.0     >  .  4001:5001(1000) ack 1
+0.0     >  .  5001:6001(1000) ack 1
+0.0     >  .  6001:7001(1000) ack 1
+0.0     >  .  7001:8001(1000) ack 1
+0.0     >  .  8001:9001(1000) ack 1
+0.0     >  .  9001:10001(1000) ack 1
+0.001     <  .     1:1(0) ack 1001 win 65535  // ACKing every packet
+0.001     <  .     1:1(0) ack 2001 win 65535  // to built up cwnd fast
+0.001     <  .     1:1(0) ack 3001 win 65535
+0.001     <  .     1:1(0) ack 4001 win 65535
+0.001     <  .     1:1(0) ack 5001 win 65535
+0.001     <  .     1:1(0) ack 6001 win 65535
+0.001     <  .     1:1(0) ack 7001 win 65535
+0.001     <  .     1:1(0) ack 8001 win 65535
+0.001     <  .     1:1(0) ack 9001 win 65535
+0.001     <  .     1:1(0) ack 10001 win 65535
+0      >  . 10001:11001(1000) ack 1
+0      >  . 11001:12001(1000) ack 1
+0      >  . 12001:13001(1000) ack 1
+0      >  . 13001:14001(1000) ack 1
+0      >  . 14001:15001(1000) ack 1
+0      >  . 15001:16001(1000) ack 1
+0      >  . 16001:17001(1000) ack 1
+0      >  . 17001:18001(1000) ack 1
+0      >  . 18001:19001(1000) ack 1
+0      >  . 19001:20001(1000) ack 1
+0.001       <  .  1:1(0) ack 11001 win 65535
+0.001       <  .  1:1(0) ack 12001 win 65535
+0.001       <  .  1:1(0) ack 13001 win 65535
+0.001       <  .  1:1(0) ack 14001 win 65535
+0.001       <  .  1:1(0) ack 15001 win 65535
+0.001       <  .  1:1(0) ack 16001 win 65535
+0.001       <  .  1:1(0) ack 17001 win 65535
+0.001       <  .  1:1(0) ack 18001 win 65535
+0.001       <  .  1:1(0) ack 19001 win 65535
+0.001       <  .  1:1(0) ack 20001 win 65535
+0     >  . 20001:21001(1000) ack 1
+0     >  . 21001:22001(1000) ack 1
+0     >  . 22001:23001(1000) ack 1
+0     >  . 23001:24001(1000) ack 1
+0     >  . 24001:25001(1000) ack 1
+0     >  . 25001:26001(1000) ack 1
+0     >  . 26001:27001(1000) ack 1
+0     >  . 27001:28001(1000) ack 1
+0     >  . 28001:29001(1000) ack 1
+0     >  . 29001:30001(1000) ack 1
+0.001       <  .  1:1(0) ack 21001 win 65535
+0.001       <  .  1:1(0) ack 22001 win 65535
+0.001       <  .  1:1(0) ack 23001 win 65535
+0.001       <  .  1:1(0) ack 24001 win 65535
+0.001       <  .  1:1(0) ack 25001 win 65535
+0.001       <  .  1:1(0) ack 26001 win 65535
+0.001       <  .  1:1(0) ack 27001 win 65535
+0.001       <  .  1:1(0) ack 28001 win 65535
+0.001       <  .  1:1(0) ack 29001 win 65535
+0.001       <  .  1:1(0) ack 30001 win 65535
+0     >  . 30001:31001(1000) ack 1  // we "drop" the next 30 packets
+0     >  . 31001:32001(1000) ack 1
+0     >  . 32001:33001(1000) ack 1
+0     >  . 33001:34001(1000) ack 1
+0     >  . 34001:35001(1000) ack 1
+0     >  . 35001:36001(1000) ack 1
+0     >  . 36001:37001(1000) ack 1
+0     >  . 37001:38001(1000) ack 1
+0     >  . 38001:39001(1000) ack 1
+0     >  . 39001:40001(1000) ack 1

+0     >  . 40001:41001(1000) ack 1
+0     >  . 41001:42001(1000) ack 1
+0     >  . 42001:43001(1000) ack 1
+0     >  . 43001:44001(1000) ack 1
+0     >  . 44001:45001(1000) ack 1
+0     >  . 45001:46001(1000) ack 1
+0     >  . 46001:47001(1000) ack 1
+0     >  . 47001:48001(1000) ack 1
+0     >  . 48001:49001(1000) ack 1
+0     >  . 49001:50001(1000) ack 1

+0     >  . 50001:51001(1000) ack 1
+0     >  . 51001:52001(1000) ack 1
+0     >  . 52001:53001(1000) ack 1
+0     >  . 53001:54001(1000) ack 1
+0     >  . 54001:55001(1000) ack 1
+0     >  . 55001:56001(1000) ack 1
+0     >  . 56001:57001(1000) ack 1
+0     >  . 57001:58001(1000) ack 1
+0     >  . 58001:59001(1000) ack 1
+0     >  . 59001:60001(1000) ack 1

+0     >  . 60001:61001(1000) ack 1
+0     >  . 61001:62001(1000) ack 1
+0     >  . 62001:63001(1000) ack 1
+0     >  . 63001:64001(1000) ack 1
+0     >  . 64001:65001(1000) ack 1
+0     >  . 65001:66001(1000) ack 1
+0     >  . 66001:67001(1000) ack 1
+0     >  . 67001:68001(1000) ack 1
+0     >  . 68001:69001(1000) ack 1
+0     >  . 69001:70001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:61001, nop, nop>
+0     >  . 70001:71001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:62001, nop, nop>
+0     >  . 71001:72001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:63001, nop, nop>
+0     >  . 30001:31001(1000) ack 1   // we expect the retransmission around here

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:64001, nop, nop>
+0     >  . 31001:32001(1000) ack 1

+0.20~+0.30  >  . 30001:31001(1000) ack 1
+0.20~+0.60  >  . 30001:31001(1000) ack 1

+0.001   <   . 1:1(0) ack 31001 win 65535 <sack 60001:65001, nop, nop>

+0     >  . 31001:32001(1000) ack 1
+0     >  . 32001:33001(1000) ack 1

+0.001   <   . 1:1(0) ack 32001 win 65535 <sack 60001:66001, nop, nop>
+0     >  . 32001:33001(1000) ack 1
+0     >  . 33001:34001(1000) ack 1

+0.001   <   . 1:1(0) ack 33001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 34001:35001(1000) ack 1

+0.001   <   . 1:1(0) ack 34001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 34001:35001(1000) ack 1
+0     >  . 35001:36001(1000) ack 1

+0.001   <   . 1:1(0) ack 35001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 36001:37001(1000) ack 1

+0.001   <   . 1:1(0) ack 36001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 36001:37001(1000) ack 1
+0     >  . 37001:38001(1000) ack 1

+0.001   <   . 1:1(0) ack 37001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 38001:39001(1000) ack 1

+0.001   <   . 1:1(0) ack 38001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 38001:39001(1000) ack 1
+0     > .  39001:40001(1000) ack 1

+0.001   <   . 1:1(0) ack 39001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  40001:41001(1000) ack 1

+0.001   <   . 1:1(0) ack 40001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  40001:41001(1000) ack 1
+0     > .  41001:42001(1000) ack 1

+0.001   <   . 1:1(0) ack 41001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  42001:43001(1000) ack 1

+0.001   <   . 1:1(0) ack 42001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  42001:43001(1000) ack 1
+0     > .  43001:44001(1000) ack 1

+0.001   <   . 1:1(0) ack 43001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 44001:45001(1000) ack 1

+0.001   <   . 1:1(0) ack 44001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 44001:45001(1000) ack 1
+0     >  . 45001:46001(1000) ack 1

+0.001   <   . 1:1(0) ack 45001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 46001:47001(1000) ack 1

+0.001   <   . 1:1(0) ack 46001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 46001:47001(1000) ack 1
+0     >  . 47001:48001(1000) ack 1

+0.001   <   . 1:1(0) ack 47001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 48001:49001(1000) ack 1

+0.001   <   . 1:1(0) ack 48001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 48001:49001(1000) ack 1
+0     >  . 49001:50001(1000) ack 1

+0.001   <   . 1:1(0) ack 49001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 50001:51001(1000) ack 1

+0.001   <   . 1:1(0) ack 50001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 50001:51001(1000) ack 1
+0     >  . 51001:52001(1000) ack 1

+0.001   <   . 1:1(0) ack 51001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 52001:53001(1000) ack 1

+0.001   <   . 1:1(0) ack 52001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 52001:53001(1000) ack 1
+0     >  . 53001:54001(1000) ack 1

+0.001   <   . 1:1(0) ack 53001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 54001:55001(1000) ack 1

+0.001   <   . 1:1(0) ack 54001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 54001:55001(1000) ack 1
+0     >  . 55001:56001(1000) ack 1

+0.001   <   . 1:1(0) ack 55001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 56001:57001(1000) ack 1

+0.001   <   . 1:1(0) ack 56001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 56001:57001(1000) ack 1
+0     >  . 57001:58001(1000) ack 1

+0.001   <   . 1:1(0) ack 57001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 58001:59001(1000) ack 1

+0.001   <   . 1:1(0) ack 58001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 58001:59001(1000) ack 1
+0     >  . 59001:60001(1000) ack 1

+0.001   <   . 1:1(0) ack 59001 win 65535 <sack 60001:67001, nop, nop>

+0.001   <   . 1:1(0) ack 67001 win 65535
+0     >  . 67001:68001(1000) ack 1
+0     >  . 68001:69001(1000) ack 1
+0     >  . 69001:70001(1000) ack 1

+0.001   <   . 1:1(0) ack 69001 win 65535
+0     >  . 70001:71001(1000) ack 1
+0     >  . 71001:72001(1000) ack 1
+0     >  . 72001:73001(1000) ack 1

+0.001   <   . 1:1(0) ack 71001 win 65535
+0     >  . 73001:74001(1000) ack 1
+0     >  . 74001:75001(1000) ack 1
+0     >  . 75001:76001(1000) ack 1
+0     >  . 76001:77001(1000) ack 1

+0.001   <   . 1:1(0) ack 73001 win 65535
+0     >  . 77001:78001(1000) ack 1
+0     >  . 78001:79001(1000) ack 1
+0     >  . 79001:80001(1000) ack 1
+0     >  . 80001:81001(1000) ack 1

+0.001   <   . 1:1(0) ack 75001 win 65535
+0     >  . 81001:82001(1000) ack 1
+0     >  . 82001:83001(1000) ack 1
+0     >  . 83001:84001(1000) ack 1
+0     >  . 84001:85001(1000) ack 1

+0.001   <   . 1:1(0) ack 77001 win 65535
+0     >  . 85001:86001(1000) ack 1
+0     >  . 86001:87001(1000) ack 1
+0     >  . 87001:88001(1000) ack 1
+0     >  . 88001:89001(1000) ack 1

+0.001   <   . 1:1(0) ack 79001 win 65535
+0     >  . 89001:90001(1000) ack 1
+0     >  . 90001:91001(1000) ack 1
+0     >  . 91001:92001(1000) ack 1
+0     >  . 92001:93001(1000) ack 1

+0.001   <   . 1:1(0) ack 81001 win 65535
+0     >  . 93001:94001(1000) ack 1
+0     >  . 94001:95001(1000) ack 1
+0     >  . 95001:96001(1000) ack 1
+0     >  . 96001:97001(1000) ack 1

+0.001   <   . 1:1(0) ack 83001 win 65535
+0     >  . 97001:98001(1000) ack 1
+0     >  . 98001:99001(1000) ack 1

+0     > P. 99001:100001(1000) ack 1

+0.001   <   . 1:1(0) ack 85001 win 65535

+0.001   <   . 1:1(0) ack 87001 win 65535

+0.001   <   . 1:1(0) ack 89001 win 65535

+0.001   <   . 1:1(0) ack 91001 win 65535

+0.001   <   . 1:1(0) ack 93001 win 65535

+0.001   <   . 1:1(0) ack 94001 win 65535

+0.001   <   . 1:1(0) ack 96001 win 65535

+0.001   <   . 1:1(0) ack 98001 win 65535

+0.001   <   . 1:1(0) ack 100001 win 65535
+0.010 close(4) = 0
+0 > F. 100001:100001(0) ack 1
+0.1 < F. 1:1(0) ack 100002 win 65535

`
`

With the fix, the exchange then looks like this - not that there are no duplicate packets any more, and a 2nd round of SACKs sind the trailing packets are simulated to be lost:

//
// A simple test to verify behavior when
// during SACK LR, an RTO happens - and packets
// are sent twice - once from the sackhole, once
// from the pulled back snd_nxt - D43355
//
// Runs clean against FreeBSD 15.0-CURRENT #16 D43355_sack_after_rto-n272715-073145f14da4
//
--tolerance_usecs=25000

0.0 `
ifconfig tun0 tso
sysctl net.inet.tcp.sack.lrd=0
sysctl net.inet.tcp.do_prr=0
sysctl net.inet.tcp.sack.revised=1
sysctl net.inet.tcp.sack.enable=1
sysctl net.inet.tcp.cc.algorithm=cubic
sysctl net.inet.tcp.hostcache.purgenow=1
`
// Create a listening TCP socket.

0.1 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.001 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.001 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0
+0.001 setsockopt(3, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
+0.005 bind(3, ..., ...) = 0
+0.005 listen(3, 1) = 0

// Establish a TCP connection.
+0.035 < S  0:0(0) win 65535 <mss 1000, sackOK, wscale 9, nop, nop, nop >
+0.000 > S. 0:0(0) ack 1 win 65535 <...>
+0.000 <  . 1:1(0) ack 1 win 65535
+0.000 accept(3, ..., ...) = 4

+0.01 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
0.2  write(4, ..., 100000) = 100000

+0.0     >  .     1:1001(1000) ack 1
+0.0     >  .  1001:2001(1000) ack 1
+0.0     >  .  2001:3001(1000) ack 1
+0.0     >  .  3001:4001(1000) ack 1
+0.0     >  .  4001:5001(1000) ack 1
+0.0     >  .  5001:6001(1000) ack 1
+0.0     >  .  6001:7001(1000) ack 1
+0.0     >  .  7001:8001(1000) ack 1
+0.0     >  .  8001:9001(1000) ack 1
+0.0     >  .  9001:10001(1000) ack 1
+0.001     <  .     1:1(0) ack 1001 win 65535  // ACKing every packet
+0.001     <  .     1:1(0) ack 2001 win 65535  // to built up cwnd fast
+0.001     <  .     1:1(0) ack 3001 win 65535
+0.001     <  .     1:1(0) ack 4001 win 65535
+0.001     <  .     1:1(0) ack 5001 win 65535
+0.001     <  .     1:1(0) ack 6001 win 65535
+0.001     <  .     1:1(0) ack 7001 win 65535
+0.001     <  .     1:1(0) ack 8001 win 65535
+0.001     <  .     1:1(0) ack 9001 win 65535
+0.001     <  .     1:1(0) ack 10001 win 65535
+0      >  . 10001:11001(1000) ack 1
+0      >  . 11001:12001(1000) ack 1
+0      >  . 12001:13001(1000) ack 1
+0      >  . 13001:14001(1000) ack 1
+0      >  . 14001:15001(1000) ack 1
+0      >  . 15001:16001(1000) ack 1
+0      >  . 16001:17001(1000) ack 1
+0      >  . 17001:18001(1000) ack 1
+0      >  . 18001:19001(1000) ack 1
+0      >  . 19001:20001(1000) ack 1
+0.001       <  .  1:1(0) ack 11001 win 65535
+0.001       <  .  1:1(0) ack 12001 win 65535
+0.001       <  .  1:1(0) ack 13001 win 65535
+0.001       <  .  1:1(0) ack 14001 win 65535
+0.001       <  .  1:1(0) ack 15001 win 65535
+0.001       <  .  1:1(0) ack 16001 win 65535
+0.001       <  .  1:1(0) ack 17001 win 65535
+0.001       <  .  1:1(0) ack 18001 win 65535
+0.001       <  .  1:1(0) ack 19001 win 65535
+0.001       <  .  1:1(0) ack 20001 win 65535
+0     >  . 20001:21001(1000) ack 1
+0     >  . 21001:22001(1000) ack 1
+0     >  . 22001:23001(1000) ack 1
+0     >  . 23001:24001(1000) ack 1
+0     >  . 24001:25001(1000) ack 1
+0     >  . 25001:26001(1000) ack 1
+0     >  . 26001:27001(1000) ack 1
+0     >  . 27001:28001(1000) ack 1
+0     >  . 28001:29001(1000) ack 1
+0     >  . 29001:30001(1000) ack 1
+0.001       <  .  1:1(0) ack 21001 win 65535
+0.001       <  .  1:1(0) ack 22001 win 65535
+0.001       <  .  1:1(0) ack 23001 win 65535
+0.001       <  .  1:1(0) ack 24001 win 65535
+0.001       <  .  1:1(0) ack 25001 win 65535
+0.001       <  .  1:1(0) ack 26001 win 65535
+0.001       <  .  1:1(0) ack 27001 win 65535
+0.001       <  .  1:1(0) ack 28001 win 65535
+0.001       <  .  1:1(0) ack 29001 win 65535
+0.001       <  .  1:1(0) ack 30001 win 65535
+0     >  . 30001:31001(1000) ack 1  // we "drop" the next 30 packets
+0     >  . 31001:32001(1000) ack 1
+0     >  . 32001:33001(1000) ack 1
+0     >  . 33001:34001(1000) ack 1
+0     >  . 34001:35001(1000) ack 1
+0     >  . 35001:36001(1000) ack 1
+0     >  . 36001:37001(1000) ack 1
+0     >  . 37001:38001(1000) ack 1
+0     >  . 38001:39001(1000) ack 1
+0     >  . 39001:40001(1000) ack 1

+0     >  . 40001:41001(1000) ack 1
+0     >  . 41001:42001(1000) ack 1
+0     >  . 42001:43001(1000) ack 1
+0     >  . 43001:44001(1000) ack 1
+0     >  . 44001:45001(1000) ack 1
+0     >  . 45001:46001(1000) ack 1
+0     >  . 46001:47001(1000) ack 1
+0     >  . 47001:48001(1000) ack 1
+0     >  . 48001:49001(1000) ack 1
+0     >  . 49001:50001(1000) ack 1

+0     >  . 50001:51001(1000) ack 1
+0     >  . 51001:52001(1000) ack 1
+0     >  . 52001:53001(1000) ack 1
+0     >  . 53001:54001(1000) ack 1
+0     >  . 54001:55001(1000) ack 1
+0     >  . 55001:56001(1000) ack 1
+0     >  . 56001:57001(1000) ack 1
+0     >  . 57001:58001(1000) ack 1
+0     >  . 58001:59001(1000) ack 1
+0     >  . 59001:60001(1000) ack 1

+0     >  . 60001:61001(1000) ack 1
+0     >  . 61001:62001(1000) ack 1
+0     >  . 62001:63001(1000) ack 1
+0     >  . 63001:64001(1000) ack 1
+0     >  . 64001:65001(1000) ack 1
+0     >  . 65001:66001(1000) ack 1
+0     >  . 66001:67001(1000) ack 1
+0     >  . 67001:68001(1000) ack 1
+0     >  . 68001:69001(1000) ack 1
+0     >  . 69001:70001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:61001, nop, nop>
+0     >  . 70001:71001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:62001, nop, nop>
+0     >  . 71001:72001(1000) ack 1

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:63001, nop, nop>
+0     >  . 30001:31001(1000) ack 1   // we expect the retransmission around here

+0.001   <   . 1:1(0) ack 30001 win 65535 <sack 60001:64001, nop, nop>
+0     >  . 31001:32001(1000) ack 1

+0.20~+0.30  >  . 30001:31001(1000) ack 1
+0.20~+0.60  >  . 30001:31001(1000) ack 1

+0.001   <   . 1:1(0) ack 31001 win 65535 <sack 60001:65001, nop, nop>

+0     >  . 31001:32001(1000) ack 1
+0     >  . 32001:33001(1000) ack 1

+0.001   <   . 1:1(0) ack 32001 win 65535 <sack 60001:66001, nop, nop>
+0     >  . 33001:34001(1000) ack 1
+0     >  . 34001:35001(1000) ack 1

+0.001   <   . 1:1(0) ack 33001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 35001:36001(1000) ack 1
+0     >  . 36001:37001(1000) ack 1

+0.001   <   . 1:1(0) ack 34001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 37001:38001(1000) ack 1
+0     >  . 38001:39001(1000) ack 1

+0.001   <   . 1:1(0) ack 35001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  39001:40001(1000) ack 1
+0     > .  40001:41001(1000) ack 1

+0.001   <   . 1:1(0) ack 36001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  41001:42001(1000) ack 1
+0     > .  42001:43001(1000) ack 1

+0.001   <   . 1:1(0) ack 37001 win 65535 <sack 60001:67001, nop, nop>
+0     > .  43001:44001(1000) ack 1
+0     >  . 44001:45001(1000) ack 1

+0.001   <   . 1:1(0) ack 38001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 45001:46001(1000) ack 1
+0     >  . 46001:47001(1000) ack 1

+0.001   <   . 1:1(0) ack 39001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 47001:48001(1000) ack 1
+0     >  . 48001:49001(1000) ack 1

+0.001   <   . 1:1(0) ack 40001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 49001:50001(1000) ack 1
+0     >  . 50001:51001(1000) ack 1

+0.001   <   . 1:1(0) ack 41001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 51001:52001(1000) ack 1
+0     >  . 52001:53001(1000) ack 1

+0.001   <   . 1:1(0) ack 42001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 53001:54001(1000) ack 1
+0     >  . 54001:55001(1000) ack 1

+0.001   <   . 1:1(0) ack 43001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 55001:56001(1000) ack 1
+0     >  . 56001:57001(1000) ack 1

+0.001   <   . 1:1(0) ack 44001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 57001:58001(1000) ack 1
+0     >  . 58001:59001(1000) ack 1

+0.001   <   . 1:1(0) ack 45001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 59001:60001(1000) ack 1
+0     >  . 67001:68001(1000) ack 1

+0.001   <   . 1:1(0) ack 46001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 68001:69001(1000) ack 1
+0     >  . 69001:70001(1000) ack 1

+0.001   <   . 1:1(0) ack 47001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 70001:71001(1000) ack 1
+0     >  . 71001:72001(1000) ack 1

+0.001   <   . 1:1(0) ack 48001 win 65535 <sack 60001:67001, nop, nop>

+0.001   <   . 1:1(0) ack 49001 win 65535 <sack 60001:67001, nop, nop>

+0.001   <   . 1:1(0) ack 50001 win 65535 <sack 60001:67001, nop, nop>

+0.001   <   . 1:1(0) ack 51001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 72001:73001(1000) ack 1

+0.001   <   . 1:1(0) ack 52001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 73001:74001(1000) ack 1
+0     >  . 74001:75001(1000) ack 1

+0.001   <   . 1:1(0) ack 53001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 75001:76001(1000) ack 1
+0     >  . 76001:77001(1000) ack 1

+0.001   <   . 1:1(0) ack 54001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 77001:78001(1000) ack 1
+0     >  . 78001:79001(1000) ack 1

+0.001   <   . 1:1(0) ack 55001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 79001:80001(1000) ack 1
+0     >  . 80001:81001(1000) ack 1

+0.001   <   . 1:1(0) ack 56001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 81001:82001(1000) ack 1
+0     >  . 82001:83001(1000) ack 1

+0.001   <   . 1:1(0) ack 57001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 83001:84001(1000) ack 1
+0     >  . 84001:85001(1000) ack 1

+0.001   <   . 1:1(0) ack 58001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 85001:86001(1000) ack 1
+0     >  . 86001:87001(1000) ack 1

+0.001   <   . 1:1(0) ack 59001 win 65535 <sack 60001:67001, nop, nop>
+0     >  . 87001:88001(1000) ack 1
+0     >  . 88001:89001(1000) ack 1

+0.001   <   . 1:1(0) ack 67001 win 65535
+0     >  . 89001:90001(1000) ack 1
+0     >  . 90001:91001(1000) ack 1
+0     >  . 91001:92001(1000) ack 1
+0     >  . 92001:93001(1000) ack 1
+0     >  . 93001:94001(1000) ack 1
+0     >  . 94001:95001(1000) ack 1
+0     >  . 95001:96001(1000) ack 1
+0     >  . 96001:97001(1000) ack 1
+0     >  . 97001:98001(1000) ack 1
+0     >  . 98001:99001(1000) ack 1

+0.001   <   . 1:1(0) ack 69001 win 65535
+0     > P. 99001:100001(1000) ack 1

+0.001   <   . 1:1(0) ack 71001 win 65535

+0.001   <   . 1:1(0) ack 73001 win 65535

+0.001   <   . 1:1(0) ack 75001 win 65535

+0.001   <   . 1:1(0) ack 77001 win 65535

+0.001   <   . 1:1(0) ack 79001 win 65535

+0.001   <   . 1:1(0) ack 81001 win 65535

+0.001   <   . 1:1(0) ack 83001 win 65535

+0.001   <   . 1:1(0) ack 85001 win 65535

+0.001   <   . 1:1(0) ack 87001 win 65535

+0.001   <   . 1:1(0) ack 89001 win 65535

+0.001   <   . 1:1(0) ack 91001 win 65535

+0.001   <   . 1:1(0) ack 93001 win 65535



+0.001   <   . 1:1(0) ack 94001 win 65535

+0.001   <   . 1:1(0) ack 96001 win 65535

+0.001   <   . 1:1(0) ack 98001 win 65535

+0.001   <   . 1:1(0) ack 100001 win 65535
+0.010 close(4) = 0
+0 > F. 100001:100001(0) ack 1
+0.1 < F. 1:1(0) ack 100002 win 65535

`
`

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rscheff created this revision.Jan 7 2024, 4:24 PM

Herald added a reviewer: transport. · View Herald TranscriptJan 7 2024, 4:24 PM

Herald added subscribers: melifaro, imp. · View Herald Transcript

rscheff requested review of this revision.Jan 7 2024, 4:24 PM

Harbormaster completed remote builds in B55284: Diff 132404.Jan 7 2024, 4:24 PM

set snd_recover to snd_fack on RTO
pull snd_nxt right during SACK after RTO

Harbormaster completed remote builds in B55285: Diff 132410.Jan 7 2024, 8:20 PM

init recover to fack on RTO
pull snd_nxt right during SACK after RTO
only enter FastRecovery per RFC6675 when on the right edge

Harbormaster completed remote builds in B55346: Diff 132540.Jan 10 2024, 11:54 AM

prevent retransmitting of covered SACK holes when FR activates as snd_nxt closes with snd_max

Harbormaster completed remote builds in B55361: Diff 132577.Jan 11 2024, 12:31 AM

prevent sack-resending at nxt==max. Will still enter FR (and CC reaction)

Harbormaster completed remote builds in B55373: Diff 132636.Jan 11 2024, 6:54 PM

Revert previous commit, as it can lead to SACK accounting panics

Harbormaster completed remote builds in B55378: Diff 132647.Jan 11 2024, 11:42 PM

pull snd_nxt forward when transmitting from SACK hole
only enter FR by 6675 when at the right edge, and when after RTO, only when new SACK holes show up
update sack_adjust comments

Harbormaster completed remote builds in B55417: Diff 132722.Jan 12 2024, 10:55 PM

With this patch, the classic (pre-SACK) loss-recovery "indication" of snd_nxt < snd_max as well as the IN_FASTRECOVERY() flags will follow the SACK transmission selection path, if appropriate. The difference being that during IN_FASTRECOVERY, the per-ACK CC reaction (cwnd adjustment) is disabled - except for PRR - while in the case of (snd_nxt < snd_max), the CC controls the congestion window (e.g. uses slow start after the RTO until reaching ssthresh, then using congestion avoidance).

tcp_sack_adjust will also inflate cwnd (what normally - IN_FASTRECOVERY - happens when calculating the pipe).

In addition, the RFC6675 trigger to enter Fast Recovery (an ACK with a SACK option covering at least 2*MSS+1 bytes) is only active while snd_nxt is at the right edge, and requires additional indications of new losses right after a Retransmission Timeout.

In order for normal SACK processing to work correctly, after successfully transmitting a packet, snd_nxt has to be dragged along - eventually becoming equal to snd_max again, thus the additional checks in the RFC6675 FR trigger.

when retransmitting after RTO, don't inflate cnwd to nxt-una+mss, but only cwnd+mss

Harbormaster completed remote builds in B55435: Diff 132750.Jan 14 2024, 1:07 PM

when retransmitting from (nxt<max), use cwnd only
simplify 6675 FR trigger condition

Harbormaster completed remote builds in B55444: Diff 132777.Jan 15 2024, 8:47 AM

rscheff mentioned this in D43470: tcp: refactor cwnd during SACK transmissions and enable TSO.Jan 21 2024, 7:25 PM

rebase to main

Harbormaster completed remote builds in B55606: Diff 133349.Jan 25 2024, 2:38 PM

rebase to main

Harbormaster completed remote builds in B55633: Diff 133386.Jan 26 2024, 2:47 AM

2- rebase to main

Harbormaster completed remote builds in B55674: Diff 133481.Jan 29 2024, 8:35 AM

rebase to main

Harbormaster completed remote builds in B56079: Diff 134549.Feb 18 2024, 9:11 AM

tuexen accepted this revision.Apr 4 2024, 12:45 PM

This revision is now accepted and ready to land.Apr 4 2024, 12:45 PM

rscheff mentioned this in D45068: tcp: add sysctl to allow/disallow TSO during SACK loss recovery.May 2 2024, 7:50 PM

rscheff mentioned this in D45070: tcp: add counter to track when SACK loss recovery uses TSO.May 2 2024, 7:58 PM

Your summary section claims "In addition, cwnd used to be 1 MSS right after RTO, increasing to 2 MSS more recently." But I could not find the code change where cwnd is changed to 2MSS after RTO. Please elaborate if the summary needs to be revised or I am missing the point.

rebase to main

This revision now requires review to proceed.Sep 18 2024, 8:08 AM

Harbormaster completed remote builds in B59520: Diff 143416.Sep 18 2024, 8:08 AM

rscheff edited the test plan for this revision. (Show Details)Sep 18 2024, 8:31 AM

rscheff edited the test plan for this revision. (Show Details)Oct 7 2024, 12:58 PM

rscheff edited the test plan for this revision. (Show Details)Oct 7 2024, 1:57 PM

rscheff edited the test plan for this revision. (Show Details)Oct 9 2024, 12:24 PM

cc added inline comments.Oct 9 2024, 7:40 PM

sys/netinet/tcp_input.c
2789–2795	For readability, it's better to remove () between relational operators: if (tp->snd_nxt == tp->snd_max && tp->t_rxtshift == 0)
2842–2846	For readability, it's better to remove `()` between relational operators and re-align: if (tcp_is_sack_recovery(tp, &to) && tp->snd_nxt == tp->snd_max && ((tp->t_rxtshift == 0 && sack_changed != SACK_NOCHANGE) \|\| (tp->t_rxtshift > 0 && sack_changed == SACK_NEWLOSS))) { ... }
sys/netinet/tcp_output.c
392	Same here for readability, it's better to remove () between relational operators: if (sack_bytes_rxmt == 0 \|\| SEQ_LT(tp->snd_nxt, tp->snd_max)) {
1640–1651	Same here for readability, it's better to remove () between relational operators and re-align: if (error == 0 && tp->rcv_numsacks > 0 && TCPS_HAVEESTABLISHED(tp->t_state) && tp->t_flags & TF_SACK_PERMIT) {
1647–1650	Same here for readability, it's better to remove () between relational operators and re-align: if (error == 0 && sack_rxmit && SEQ_LT(tp->snd_nxt, SEQ_MIN(p->rxmit, p->end))) {
sys/netinet/tcp_sack.c
1092	no need `()`
1096	no need `()`
1105	no need `()`
1112	no need `()`
1116	no need `()`
1119	no need `()`

tuexen added inline comments.Oct 9 2024, 8:41 PM

sys/netinet/tcp_sack.c
1092	Isn't `style.9` saying: Values in return statements should be enclosed in parentheses.

cc added inline comments.Oct 9 2024, 10:33 PM

sys/netinet/tcp_sack.c
1092	You are right, other tcp files also has the `()` in return values. I must have kept some old memory about the style.

With the provided packetdrill scripts before/after the fix, my test result is in my wiki: testD43355.

Besides the removing of dup retrans, also looks the cwnd and ssthresh both have improvement.

Other than the readability that I suggested, I have no problem with this patch. Thanks for the fix.

This revision is now accepted and ready to land.Oct 10 2024, 3:18 PM

By the way based on my test, I didn't find this statement In addition, cwnd used to be 1 MSS right after RTO, increasing to 2 MSS more recently. to be true in your SUMMARY section. Also Address this by setting up snd_recover just in cc_cong_signal. needs to be revised.

Closed by commit rG440f4ba18e3a: tcp: fix duplicate retransmissions when RTO happens during SACK loss recovery (authored by rscheff). · Explain WhyOct 11 2024, 11:57 AM

This revision was automatically updated to reflect the committed changes.

rscheff added a commit: rG440f4ba18e3a: tcp: fix duplicate retransmissions when RTO happens during SACK loss recovery.

rscheff added a commit: rG6b2977c5978b: tcp: fix duplicate retransmissions when RTO happens during SACK loss recovery.Oct 18 2024, 7:54 AM

tcp: fix erroneous transmission selection after RTO w/ SACK incoming
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 144626

sys/netinet/tcp_input.c

sys/netinet/tcp_output.c

sys/netinet/tcp_sack.c

sys/netinet/tcp_var.h

tcp: fix erroneous transmission selection after RTO w/ SACK incomingClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 144626

sys/netinet/tcp_input.c

sys/netinet/tcp_output.c

sys/netinet/tcp_sack.c

sys/netinet/tcp_var.h

tcp: fix erroneous transmission selection after RTO w/ SACK incoming
ClosedPublic
Actions

Revision Contents
Changeset List