
tcp: rack fails to send out a TLP after a MTU change
ClosedPublic

Authored by rrs on Dec 2 2021, 11:34 AM.

Details

Summary

When rack sends out a TLP it sets up various state to make sure
it bypasses the cwnd (it has been more than 1 RTT since our last send), and
it may at times send new data. If an MTU change has occurred
and our cwnd has collapsed, we can end up in a situation where the must_retran
flag is set and we obey the cwnd, thus never sending the TLP and then
sitting stuck.

This one-line fix addresses that problem.

Test Plan

The simple packetdrill script that follows can test this condition.

Copyright (c) 2018 Randall Stewart
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
//

--ip_version=ipv4

0.00 kldload -n tcp_bbr tcp_rack
+0.00 sysctl -w net.inet.tcp.hostcache.purgenow=1
+0.00 sysctl -w net.inet.tcp.syncookies_only=0
+0.00 sysctl -w net.inet.tcp.syncookies=1
+0.00 sysctl -w net.inet.tcp.rfc1323=1
+0.00 sysctl -w net.inet.tcp.sack.enable=1
+0.00 sysctl -w net.inet.tcp.ecn.enable=2
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S 0:0(0) win 65535 <mss 1460,nop,wscale 8,sackOK,TS val 100 ecr 0>
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1440,sackOK,TS val 400 ecr 100>
+0.00 > . 1:1(0) ack 1 win 65535 <nop,nop,TS val 200 ecr 400>
// Change to rack_latest
+0.00 setsockopt(3, IPPROTO_TCP, TCP_NODELAY, [1], 4) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_FUNCTION_BLK, {function_set_name="rack", pcbcnt=0}, 36) = 0
+0.00 send(3, ..., 1000, 0) = 1000
+0.00 > P. 1:1001(1000) ack 1 win 65535 <nop,nop,TS val 250 ecr 400>
+.10 < . 1:1(0) ack 1001 win 32000 <nop, nop, TS val 500 ecr 250>
+0.00 send(3, ..., 1000, 0) = 1000
+0.00 > P. 1001:2001(1000) ack 1 win 65535 <nop,nop,TS val 300 ecr 500>
+.10 < . 1:1(0) ack 2001 win 32000 <nop, nop, TS val 600 ecr 300>
// cwnd should be 18,280
+0.10 send(3, ..., 25440, 0) = 25440
* > . 2001:3429(1428) ack 1 win 65535 <nop,nop,TS val 400 ecr 600>
* > . 3429:4857(1428) ack 1 win 65535 <nop,nop,TS val 500 ecr 600>
* > . 4857:6285(1428) ack 1 win 65535 <nop,nop,TS val 600 ecr 600>
* > . 6285:7713(1428) ack 1 win 65535 <nop,nop,TS val 700 ecr 600>
* > . 7713:9141(1428) ack 1 win 65535 <nop,nop,TS val 800 ecr 600>
* > . 9141:10569(1428) ack 1 win 65535 <nop,nop,TS val 900 ecr 600>
* > . 10569:11997(1428) ack 1 win 65535 <nop,nop,TS val 900 ecr 600>
* > . 11997:13425(1428) ack 1 win 65535 <nop,nop,TS val 1100 ecr 600>
* > . 13425:14853(1428) ack 1 win 65535 <nop,nop,TS val 1200 ecr 600>
* > . 14853:16281(1428) ack 1 win 65535 <nop,nop,TS val 1300 ecr 600>
* > . 16281:17709(1428) ack 1 win 65535 <nop,nop,TS val 1400 ecr 600>
// TLP
* > . 17709:19137(1428) ack 1 win 65535 <nop,nop,TS val 1500 ecr 600>
// TLP
* > . 19137:20565(1428) ack 1 win 65535 <nop,nop,TS val 1600 ecr 600>
// T-O (cwnd to 1500)
* > . 2001:3429(1428) ack 1 win 65535 <nop,nop,TS val 1900 ecr 600>

// Send ICMP
+0.00 < [2001:3429(1428)] icmp unreachable frag_needed mtu 1200
// Retransmit
* > . 2001:3149(1148) ack 1 win 65535 <nop,nop,TS val 2000 ecr 600>
* > . 3149:3429(280) ack 1 win 65535 <nop,nop,TS val 2100 ecr 600>
+0.10 < . 1:1(0) ack 3429 win 32000 <nop, nop, TS val 700 ecr 2100>
// Retransmits of window
* > . 3429:4577(1148) ack 1 win 65535 <nop,nop,TS val 2200 ecr 700>
* > . 4577:4857(280) ack 1 win 65535 <nop,nop,TS val 2300 ecr 700>
// TLP
* > . 20565:21713(1148) ack 1 win 65535 <nop,nop,TS val 2400 ecr 700>
+.10 < . 1:1(0) ack 21713 win 32000 <nop, nop, TS val 800 ecr 2400>
// Tear it down.
+0.00 close(3) = 0
* > . 21713:22861(1148) ack 1 win 65535 <nop,nop,TS val 2500 ecr 800>
+.00 < . 1:1(0) ack 22861 win 32000 <nop, nop, TS val 900 ecr 2400>
* > . 22861:24009(1148) ack 1 win 65535 <nop,nop,TS val 2600 ecr 800>
* > . 24009:25157(1148) ack 1 win 65535 <nop,nop,TS val 2700 ecr 800>
+.00 < . 1:1(0) ack 25157 win 32000 <nop, nop, TS val 1000 ecr 2700>
* > . 25157:26305(1148) ack 1 win 65535 <nop,nop,TS val 2800 ecr 800>
* > P. 26305:27441(1136) ack 1 win 65535 <nop,nop,TS val 2900 ecr 1000>
+.00 < . 1:1(0) ack 27441 win 32000 <nop, nop, TS val 1100 ecr 2900>
+0.00 > F. 27441:27441(0) ack 1 win 65535 <nop,nop,TS val 3000 ecr 1100>
+0.10 < F. 1:1(0) ack 27442 win 32767 <nop,nop,TS val 1200 ecr 3000>
+0.00 > . 27442:27442(0) ack 2 win 65535 <nop,nop,TS val 3100 ecr 1200>

Without the fix, you will have to Control-C the packetdrill run. If you have
a TCP log dumper daemon running, the logging output will show TLPs going by with no output
until you stop the test (or packetdrill times out).

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.