
tcp: Rack rwnd collapse.
Closed, Public

Authored by rrs on May 10 2022, 3:08 PM.

Details

Summary

Currently, when the peer collapses its rwnd, we mark the affected packets for retransmission and,
as we do when the PMTU shrinks, use the must_retran flags to resend the collapsed packets.
However, this causes a problem with some middleboxes that manipulate the rwnd to control flow.
As soon as the rwnd increases we start resending, which may be less than an RTT later, and the
peer may in fact already have received the packets. As a result, we gratuitously retransmit
packets that we should not.

The fix is to make sure that a RACK time has passed before retransmitting the packets.
This ensures that the rwnd collapse was real and that the packets really do need retransmission.

Test Plan

The following packetdrill scripts validate that it all works:


// Copyright (c) 2018 Randall Stewart
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
// OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
// OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
// SUCH DAMAGE.

// For this to work, you need to hack dsack_persists to be 2 rather
// than 16. That way the second TLP and the first timeout should reduce
// the count back to zero.
//

--ip_version=ipv4

0.00 `kldload -n tcp_bbr tcp_rack`
+0.00 `sysctl -w net.inet.tcp.hostcache.purgenow=1`
+0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S 0:0(0) win 65535 <mss 1460,nop,wscale 8,sackOK,TS val 100 ecr 0>
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1428,sackOK, nop, nop>
+0.00 > . 1:1(0) ack 1 win 65535
+0.00 send(3, ..., 1428, 0) = 1428
+0.00 > P. 1:1429(1428) ack 1 win 65535
+.10 < . 1:1(0) ack 1429 win 32000
+0.10 send(3, ..., 11424, 0) = 11424

* > . 1429:2857(1428) ack 1 win 65535
* > . 2857:4285(1428) ack 1 win 65535
* > . 4285:5713(1428) ack 1 win 65535
* > . 5713:7141(1428) ack 1 win 65535
* > . 7141:8569(1428) ack 1 win 65535
* > . 8569:9997(1428) ack 1 win 65535
* > . 9997:11425(1428) ack 1 win 65535
* > . 11425:12853(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 4285 win 0

* > . 4284:4284(0) ack 1 win 65535 // Window probe

+0.0 < . 1:1(0) ack 12853 win 65535
// Tear it down.
+0.00 close(3) = 0
+0.00 > F. 12853:12853(0) ack 1 win 65535
+0.10 < F. 1:1(0) ack 12854 win 32767
+0.00 > . 12854:12854(0) ack 2 win 65535


// For this to work, you need to hack dsack_persists to be 2 rather
// than 16. That way the second TLP and the first timeout should reduce
// the count back to zero.

--ip_version=ipv4

0.00 `kldload -n tcp_bbr tcp_rack`
+0.00 `sysctl -w net.inet.tcp.hostcache.purgenow=1`
+0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S 0:0(0) win 65535 <mss 1460,nop,wscale 8,sackOK,TS val 100 ecr 0>
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1428,sackOK, nop, nop>
+0.00 > . 1:1(0) ack 1 win 65535
+0.00 send(3, ..., 1428, 0) = 1428
+0.00 > P. 1:1429(1428) ack 1 win 65535
+.10 < . 1:1(0) ack 1429 win 32000
+0.10 send(3, ..., 11424, 0) = 11424

* > . 1429:2857(1428) ack 1 win 65535
* > . 2857:4285(1428) ack 1 win 65535
* > . 4285:5713(1428) ack 1 win 65535
* > . 5713:7141(1428) ack 1 win 65535
* > . 7141:8569(1428) ack 1 win 65535
* > . 8569:9997(1428) ack 1 win 65535
* > . 9997:11425(1428) ack 1 win 65535
* > . 11425:12853(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 4285 win 0

* > . 4284:4284(0) ack 1 win 65535 // Window probe

+0.0 < . 1:1(0) ack 4285 win 65535

* > . 4285:5713(1428) ack 1 win 65535
* > . 5713:7141(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 7141 win 65535

* > . 7141:8569(1428) ack 1 win 65535
* > . 8569:9997(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 9997 win 65535

* > . 9997:11425(1428) ack 1 win 65535
* > P. 11425:12853(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 12853 win 65535
// Tear it down.
+0.00 close(3) = 0
+0.00 > F. 12853:12853(0) ack 1 win 65535
+0.10 < F. 1:1(0) ack 12854 win 32767
+0.00 > . 12854:12854(0) ack 2 win 65535


// For this to work, you need to hack dsack_persists to be 2 rather
// than 16. That way the second TLP and the first timeout should reduce
// the count back to zero.

--tolerance_usecs=50000
--ip_version=ipv4

0.00 `kldload -n tcp_bbr tcp_rack`
+0.00 `sysctl -w net.inet.tcp.hostcache.purgenow=1`
+0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S 0:0(0) win 65535 <mss 1460,nop,wscale 8,sackOK,TS val 100 ecr 0>
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1428,sackOK, nop, nop>
+0.00 > . 1:1(0) ack 1 win 65535
+0.00 send(3, ..., 1428, 0) = 1428
+0.00 > P. 1:1429(1428) ack 1 win 65535
+.10 < . 1:1(0) ack 1429 win 32000
+0.10 send(3, ..., 11424, 0) = 11424

* > . 1429:2857(1428) ack 1 win 65535
* > . 2857:4285(1428) ack 1 win 65535
* > . 4285:5713(1428) ack 1 win 65535
* > . 5713:7141(1428) ack 1 win 65535
* > . 7141:8569(1428) ack 1 win 65535
* > . 8569:9997(1428) ack 1 win 65535
* > . 9997:11425(1428) ack 1 win 65535
* > . 11425:12853(1428) ack 1 win 65535

+0.10 < . 1:1(0) ack 4285 win 0
+0.001 < . 1:1(0) ack 4285 win 8568
+0.010 > . 4285:5713(1428) ack 1 win 65535

* > . 5713:7141(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 7141 win 65535

* > . 7141:8569(1428) ack 1 win 65535
* > . 8569:9997(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 9997 win 65535

* > . 9997:11425(1428) ack 1 win 65535
* > P. 11425:12853(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 12853 win 65535
// Tear it down.
+0.00 close(3) = 0
+0.00 > F. 12853:12853(0) ack 1 win 65535
+0.10 < F. 1:1(0) ack 12854 win 32767
+0.00 > . 12854:12854(0) ack 2 win 65535


// For this to work, you need to hack dsack_persists to be 2 rather
// than 16. That way the second TLP and the first timeout should reduce
// the count back to zero.

--tolerance_usecs=50000
--ip_version=ipv4

0.00 `kldload -n tcp_bbr tcp_rack`
+0.00 `sysctl -w net.inet.tcp.hostcache.purgenow=1`
+0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S 0:0(0) win 65535 <mss 1460,nop,wscale 8,sackOK,TS val 100 ecr 0>
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1428,sackOK, nop, nop>
+0.00 > . 1:1(0) ack 1 win 65535
+0.00 send(3, ..., 1428, 0) = 1428
+0.00 > P. 1:1429(1428) ack 1 win 65535
+.10 < . 1:1(0) ack 1429 win 32000
+0.10 send(3, ..., 11424, 0) = 11424

* > . 1429:2857(1428) ack 1 win 65535
* > . 2857:4285(1428) ack 1 win 65535
* > . 4285:5713(1428) ack 1 win 65535
* > . 5713:7141(1428) ack 1 win 65535
* > . 7141:8569(1428) ack 1 win 65535
* > . 8569:9997(1428) ack 1 win 65535
* > . 9997:11425(1428) ack 1 win 65535
* > . 11425:12853(1428) ack 1 win 65535

+0.10 < . 1:1(0) ack 4285 win 0
+0.001 send(3, ..., 1428, 0) = 1428
+0.001 < . 1:1(0) ack 4285 win 9996
+0.000 > P. 12853:14281(1428) ack 1 win 65535
+.010 > . 4285:5713(1428) ack 1 win 65535

* > . 5713:7141(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 7141 win 65535

* > . 7141:8569(1428) ack 1 win 65535
* > . 8569:9997(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 9997 win 65535

* > . 9997:11425(1428) ack 1 win 65535
* > P. 11425:12853(1428) ack 1 win 65535

+0.0 < . 1:1(0) ack 14281 win 65535
// Tear it down.
+0.00 close(3) = 0
+0.00 > F. 14281:14281(0) ack 1 win 65535
+0.10 < F. 1:1(0) ack 14282 win 32767
+0.00 > . 14282:14282(0) ack 2 win 65535

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

rrs requested review of this revision. May 10 2022, 3:08 PM

OK, I have finished testing this, and the packetdrill scripts show that it works (with one change:
on line 59 I had to add the push bit, which is correct).

Anyway, I plan on committing this in the next week or so.

This revision is now accepted and ready to land. Jun 27 2022, 1:38 PM
This revision was automatically updated to reflect the committed changes.