
tcp: fix cwnd restricted SACK retransmission loop
ClosedPublic

Authored by rscheff on Sep 20 2022, 2:56 PM.

Details

Summary

While sending the initial SACK retransmission segment under a heavily
constrained cwnd, tcp_output can erroneously send out the entire send buffer
again.

This may happen after a retransmission timeout, which resets snd_nxt
to snd_una while the SACK scoreboard is still populated. In this case,
cwnd is incorrectly inflated, leading to the inappropriate transmission of
all segments above snd_una.

Test Plan
--mtu=1500
--tolerance_usecs=250000

 00.000 `sysctl -w kern.ipc.maxsockbuf=83886080`
+00.000 `sysctl -w net.inet.tcp.delayed_ack=0`
+00.000 `sysctl -w net.inet.tcp.rfc3390=1`
+00.000 `sysctl -w net.inet.tcp.sendspace=65536`
+00.000 `sysctl -w net.inet.tcp.sendbuf_inc=32768`
+00.000 `sysctl -w net.inet.tcp.sendbuf_max=16777216`
+00.000 `sysctl -w net.inet.tcp.sendbuf_auto=1`
+00.000 `sysctl -w net.inet.tcp.recvspace=32768`
+00.000 `sysctl -w net.inet.tcp.recvbuf_max=16777216`
+00.000 `sysctl -w net.inet.tcp.recvbuf_auto=1`
+00.000 `sysctl -w net.inet.tcp.keepinit=5000`
+00.000 `sysctl -w net.inet.tcp.ecn.enable=1`
+00.000 `sysctl -w net.inet.tcp.mssdflt=8948`
+00.000 `sysctl -w net.inet.tcp.minmss=536`
+00.000 `sysctl -w net.inet.ip.maxfragpackets=0`
+00.000 `sysctl -w net.inet.ip.maxfragsperpacket=0`
+00.000 `sysctl -w net.inet.tcp.abc_l_var=44`
+00.000 `sysctl -w net.inet.tcp.initcwnd_segments=44`
+00.000 `sysctl -w net.inet.tcp.delacktime=20`
//+00.000 `sysctl -w net.inet.tcp.rfc6675_pipe=1`

+00.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+00.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+00.000 setsockopt(3, IPPROTO_TCP, TCP_NODELAY, [1], 4) = 0
+00.000 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+00.000 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+00.000 bind(3, ..., ...) = 0
+00.000 listen(3, 100) = 0
// explicit seq numbers in 3whs lead to accept() failing with EAGAIN.
+00.000 < S  0:0(0)          win 65535 <mss 1460,nop,wscale  8,sackOK,eol,eol>
+00.000 > S. 0:0(0)    ack 1 win 32768 <mss 1460,nop,wscale 11,sackOK,eol,eol>
+00.040 <  . 1:1(0)    ack 1 win 515
+00.000 accept(3, ..., ...) = 4
+00.000 setsockopt(4, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+00.000 setsockopt(3, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
+00.000 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0

// A Client Hello comes in.
+00.000 < P.           1:518(517)        ack          1 win 515
+00.000 recv(4, ..., 4096, 0) = 517
// First outgoing packet in core dump.
+00.000 >  .           1:1(0)            ack        518 win  17
// The following might happen in a different way, but the result should be the same.
// This is a Server Hello
+00.000 send(4, ..., 5137, 0) = 5137
+00.000 >  .          1:1461(1460)       ack        518 win  17
+00.000 >  .       1461:2921(1460)       ack        518 win  17
+00.000 >  .       2921:4381(1460)       ack        518 win  17
+00.000 > P.       4381:5138(757)        ack        518 win  17
+00.040 <  .        518:518(0)           ack          1 win 515 <nop,nop,sack 2909:5138>
+00.960 >  .          1:1461(1460)       ack        518 win  17
+02.200 >  .          1:1461(1460)       ack        518 win  17
+00.000 < F.        518:518(0)           ack          1 win 515 <nop,nop,sack 2909:5138>
+00.000 >  .       5138:5138(0)          ack        519 win  17

// Application gets notified and closes the socket.
+00.000 recv(4, ..., 4096, 0) = 0
+00.000 close(4) = 0

// Keep alive game (around 4 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
// Why does the DSACK contain data?
//+00.000 >  .       1461:2909(1448)       ack        519 win  17 <nop,nop,sack 518:519>
+00.000 >  .       5138:5138(0)          ack        519 win  17 <nop,nop,sack 518:519>
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>
// Why is the following segment being sent?
//+00.000 > F.       5138:5138(0)          ack        519 win  17

// Keep alive game (around 5 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
//+00.000 >  F.      5138:5138(0)          ack        519 win  17 <nop, nop,sack 518:519>
+00.000 >  .       5138:5138(0)          ack        519 win  17 <nop,nop,sack 518:519>

// Bug: Timeout
+02.200 >  .          1:1461(1460)       ack        519 win  17
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>
+00.000 >  .       1461:2921(1460)       ack        519 win  17

// Keep alive game (around 8 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
+00.000 > F.       5138:5138(0)          ack        519 win  17 <nop,nop,sack 518:519>
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>


// Keep alive game (around 9 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
//+00.000 >  .          1:1449(1448)       ack        519 win  17 <nop,nop,sack 518:519>
+00.000 >  F.      5138:5138(0)          ack        519 win  17 <nop,nop,sack 518:519>
// ## Erroneous Retransmission fixed here ##
//+00.000 >  .          1:1461(1460)       ack        519 win  17
//+00.000 >  .       1461:2921(1460)       ack        519 win  17
//+00.000 > F.       5138:5139(1)          ack        519 win  17
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>
// Keep alive game (around 10 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
//+00.000 >  .       1449:2897(1448)       ack        519 win 17 <nop,nop,sack 518:519>
//+00.000 >  .       2897:2909(12)         ack        519 win 17
+00.000 >  F.      5138:5138(0)          ack        519 win  17 <nop,nop,sack 518:519>
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>

// Keep alive game (around 11 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
//+00.000 > F.       5139:5139(0)          ack        519 win  17 <nop,nop,sack 518:519>
+00.000 > F.       5138:5138(0)          ack        519 win  17 <nop,nop,sack 518:519>
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>

// Keep alive game (around 12 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
//+00.000 > F.       5139:5139(0)          ack        519 win  17 <nop,nop,sack 518:519>
+00.000 > F.       5138:5138(0)          ack        519 win  17 <nop,nop,sack 518:519>
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>

*       >  .          1:1461(1460)       ack        519 win  17

// Keep alive game (around 12 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
+00.000 >  .       1461:2909(1448)       ack        519 win  17 <nop,nop,sack 518:519>
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>

// Keep alive game (around 12 seconds)
+01.000 <  .        518:519(1)           ack          1 win 515
//+00.000 >  .          1:1449(1448)       ack        519 win  17 <nop,nop,sack 518:519>
+00.000 >  F.      5138:5138(0)          ack        519 win  17 <nop,nop,sack 518:519>
// ## Erroneous Retransmission fixed here ##
//+00.000 >  .          1:1461(1460)       ack        519 win  17
//+00.000 >  .       1461:2921(1460)       ack        519 win  17
//+00.000 > F.       5138:5140(2)          ack        519 win  17
+00.040 <  .        519:519(0)           ack          1 win 515 <nop,nop,sack 2909:5138>


`
sysctl kern.ipc.maxsockbuf=2097152;
sysctl net.inet.tcp.delayed_ack=1;
sysctl net.inet.tcp.rfc3390=1;
sysctl net.inet.tcp.sendspace=32768;
sysctl net.inet.tcp.sendbuf_inc=8192;
sysctl net.inet.tcp.sendbuf_max=2097152;
sysctl net.inet.tcp.sendbuf_auto=1;
sysctl net.inet.tcp.recvspace=65536;
sysctl net.inet.tcp.recvbuf_max=2097152;
sysctl net.inet.tcp.recvbuf_auto=1;
sysctl net.inet.tcp.keepinit=75000;
sysctl net.inet.tcp.ecn.enable=1;
sysctl net.inet.tcp.mssdflt=536;
sysctl net.inet.tcp.minmss=216;
sysctl net.inet.ip.maxfragpackets=39792;
sysctl net.inet.ip.maxfragsperpacket=16;
sysctl net.inet.tcp.abc_l_var=2;
sysctl net.inet.tcp.initcwnd_segments=10;
sysctl net.inet.tcp.delacktime=40;
#sysctl net.inet.tcp.rfc6675_pipe=1;
sysctl -w net.inet.tcp.sack.revised=1;
`

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Normally, snd_nxt is at snd_recover or higher. Only during an RTO is snd_nxt pulled back to snd_una, so the middle bracket becomes negative, which allows more data than expected to be sent if there is still a hole in the scoreboard. Clamping this term to zero honors the actual cwnd and stops snd_nxt from changing...
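
The effect of the clamp can be illustrated with a small standalone sketch. This is not the committed diff: the helper names `cwin_unclamped` and `cwin_clamped` are hypothetical, and the window expression is only assumed to have the shape "cwnd minus (snd_nxt - snd_recover) minus SACK bytes already retransmitted", with the middle bracket being the term discussed above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): usable window when the
 * middle term (snd_nxt - snd_recover) is allowed to go negative.
 */
static int32_t
cwin_unclamped(uint32_t snd_cwnd, uint32_t snd_nxt, uint32_t snd_recover,
    int32_t sack_bytes_rxmt)
{
	int32_t cwin;

	assert(sack_bytes_rxmt >= 0);
	/* Unsigned subtraction, then a signed cast, copes with sequence wrap. */
	cwin = (int32_t)snd_cwnd - (int32_t)(snd_nxt - snd_recover) -
	    sack_bytes_rxmt;
	return (cwin < 0 ? 0 : cwin);
}

/*
 * Hypothetical helper: the same expression with the middle term clamped
 * at zero, so a snd_nxt below snd_recover cannot inflate the window.
 */
static int32_t
cwin_clamped(uint32_t snd_cwnd, uint32_t snd_nxt, uint32_t snd_recover,
    int32_t sack_bytes_rxmt)
{
	int32_t beyond, cwin;

	assert(sack_bytes_rxmt >= 0);
	beyond = (int32_t)(snd_nxt - snd_recover);
	if (beyond < 0)
		beyond = 0;	/* snd_nxt was pulled back by an RTO */
	cwin = (int32_t)snd_cwnd - beyond - sack_bytes_rxmt;
	return (cwin < 0 ? 0 : cwin);
}
```

With values loosely modeled on the test plan (a cwnd of one 1460-byte segment, snd_nxt pulled back to 1, snd_recover at 5138, nothing retransmitted from the scoreboard yet), `cwin_unclamped(1460, 1, 5138, 0)` yields 6597, which is enough to cover the whole 5137-byte send buffer, while `cwin_clamped(1460, 1, 5138, 0)` stays at 1460.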

rscheff edited the test plan for this revision.
rscheff added reviewers: glebius, jtl, cc.
This revision is now accepted and ready to land. Sep 22 2022, 10:47 AM