tcp: Add support for DSACK based reordering window to rack.
Closed, Public

Authored by rrs on Aug 11 2021, 10:46 AM.

Details

Summary

The RACK logic in the rack stack was originally built from an early I-D of
RACK; at that time the TLP bits were in a separate I-D. The dynamic reordering
window based on DSACK events was not yet present in RACK. It is now part of the
RFC, so we need to update our stack to include it. However, we want a way to
control the feature so that an admin can keep the old behavior, either
system-wide via a sysctl or per connection via a socket option. The new sysctl
and socket option take a two-bit value with the following meanings:

00 (0) - Keep the old way, i.e. the reordering window is 1 and DSACK bytes are not used to add to the reordering window
01 (1) - Change the reordering window to 1/4 of an RTT, but do not use DSACK bytes to add to the reordering window
10 (2) - Keep the reordering window as 1, but do use DSACK bytes to add an additional 1/4 RTT delay to the reordering window
11 (3) - The reordering window is 1/4 of an RTT, and DSACK bytes are used to increase the reordering window further (RFC behavior)

The sysctl currently defaults to 3, so we get standards-based behavior.
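
As a rough illustration of how the two bits combine, here is a minimal sketch in Python. It is not the kernel code: the constant names, `decode_dsack_mode`, and `toy_reorder_window` are hypothetical, the "1" of the old behavior is treated as an opaque legacy value, and the sketch assumes each counted DSACK event adds one quarter of an RTT.

```python
from dataclasses import dataclass

# Bit positions follow the table above; the names are hypothetical.
REO_WND_QUARTER_RTT = 0x1  # bit 0: base reordering window is 1/4 RTT
REO_WND_USE_DSACK = 0x2    # bit 1: DSACKs add extra delay to the window


@dataclass(frozen=True)
class DsackMode:
    quarter_rtt_window: bool
    use_dsack: bool


def decode_dsack_mode(value: int) -> DsackMode:
    """Split the 0..3 setting into its two independent flags."""
    if not 0 <= value <= 3:
        raise ValueError("setting must be in the range 0..3")
    return DsackMode(
        quarter_rtt_window=bool(value & REO_WND_QUARTER_RTT),
        use_dsack=bool(value & REO_WND_USE_DSACK),
    )


def toy_reorder_window(value: int, rtt_us: int, dsack_events: int,
                       legacy_window: int = 1) -> int:
    """Simplified model of the window size; units are an assumption (usec)."""
    mode = decode_dsack_mode(value)
    # Bit 0 selects the base window: 1/4 RTT or the legacy value.
    window = rtt_us // 4 if mode.quarter_rtt_window else legacy_window
    # Bit 1 lets observed DSACKs widen the window (assumed 1/4 RTT each).
    if mode.use_dsack:
        window += dsack_events * (rtt_us // 4)
    return window
```

With a 100 ms RTT and two DSACK events, this model yields the legacy value for setting 0, a quarter RTT for setting 1, and a quarter RTT plus two further quarter-RTT steps for setting 3. The real stack may scale or cap these terms differently.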

Test Plan

Test this with the packetdrill scripts below for basic validation, then run it
on a set of NF servers with special hooks that turn on BB logging (soon to
become a new feature I will work on, TCP_TRACEPOINT) so that I can validate
that DSACKs are causing changes to the reordering window.

The packetdrill scripts are:

--ip_version=ipv6

 0.00 `kldload -n tcp_bbr tcp_rack`
+0.00 `sysctl -w net.inet.tcp.hostcache.purgenow=1`
+0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S  0:0(0) win 65535 <mss 1440,nop,wscale 8,sackOK,TS val 100 ecr 0>
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1440,sackOK,TS val 500 ecr 100>
+0.00 >  . 1:1(0) ack 1 win 65535 <nop,nop,TS val 200 ecr 500>
// Change to rack_latest 
+0.00 setsockopt(3, IPPROTO_TCP, TCP_FUNCTION_BLK, {function_set_name="rack", pcbcnt=0}, 36) = 0
+0.00 send(3, ..., 1428, 0) = 1428
+0.00 > P. 1:1429(1428) ack 1 win 65535 <nop,nop,TS val 300 ecr 500>
+.10 <  . 1:1(0) ack 1429 win 32000 <nop, nop, TS val 600 ecr 300>
+0.10 send(3, ..., 11424, 0) = 11424
*  > . 1429:2857(1428) ack 1 win 65535 <nop,nop,TS val 400 ecr 600>
*  > . 2857:4285(1428) ack 1 win 65535 <nop,nop,TS val 500 ecr 600>
*  > . 4285:5713(1428) ack 1 win 65535 <nop,nop,TS val 600 ecr 600>
*  > . 5713:7141(1428) ack 1 win 65535 <nop,nop,TS val 700 ecr 600>
*  > . 7141:8569(1428) ack 1 win 65535 <nop,nop,TS val 800 ecr 600>
*  > . 8569:9997(1428) ack 1 win 65535 <nop,nop,TS val 900 ecr 600>
*  > . 9997:11425(1428) ack 1 win 65535 <nop,nop,TS val 1000 ecr 600>
*  > P. 11425:12853(1428) ack 1 win 65535 <nop,nop,TS val 1100 ecr 600>
*  > P. 11425:12853(1428) ack 1 win 65535 <nop,nop,TS val 1100 ecr 600>
+.10 <  . 1:1(0) ack 12853 win 32000 <nop, nop, TS val 700 ecr 1100, nop, nop, sack 11424:12853>
+0.00 send(3, ..., 4284, 0) = 4284
*  > . 12853:14281(1428) ack 1 win 65535 <nop,nop,TS val 1200 ecr 700>
*  > . 14281:15709(1428) ack 1 win 65535 <nop,nop,TS val 1300 ecr 700>
*  > P. 15709:17137(1428) ack 1 win 65535 <nop,nop,TS val 1400 ecr 700>
*  > P. 15709:17137(1428) ack 1 win 65535 <nop,nop,TS val 1500 ecr 700>
+.10 <  . 1:1(0) ack 17137 win 32000 <nop, nop, TS val 800 ecr 1500, nop, nop, sack 15708:17137 >
// Tear it down.
+0.00 close(3) = 0
+0.00 > F. 17137:17137 (0) ack 1 win 65535 <nop,nop,TS val 1600 ecr 800>
+0.10 < F. 1:1(0) ack 17138 win 32767 <nop,nop,TS val 900 ecr 1600>
+0.00 > . 17138:17138 (0) ack 2 win 65535 <nop,nop,TS val 1700 ecr 900>

and

--ip_version=ipv6

 0.00 `kldload -n tcp_bbr tcp_rack`
+0.00 `sysctl -w net.inet.tcp.hostcache.purgenow=1`
+0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S  0:0(0) win 65535 <mss 1440,nop,wscale 8,sackOK,TS val 100 ecr 0>
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1428,sackOK, nop, nop>
+0.00 >  . 1:1(0) ack 1 win 65535
// Change to rack_latest 
+0.00 setsockopt(3, IPPROTO_TCP, TCP_FUNCTION_BLK, {function_set_name="rack", pcbcnt=0}, 36) = 0
+0.00 send(3, ..., 1428, 0) = 1428
+0.00 > P. 1:1429(1428) ack 1 win 65535
+.10 <  . 1:1(0) ack 1429 win 32000 
+0.10 send(3, ..., 11424, 0) = 11424
*  > . 1429:2857(1428) ack 1 win 65535
*  > . 2857:4285(1428) ack 1 win 65535 
+0.0 <  . 1:1(0) ack 4285 win 32000 <nop, nop, sack 2857:4285>
*  > . 4285:5713(1428) ack 1 win 65535 
*  > . 5713:7141(1428) ack 1 win 65535 
*  > . 7141:8569(1428) ack 1 win 65535 
*  > . 8569:9997(1428) ack 1 win 65535 
*  > . 9997:11425(1428) ack 1 win 65535
*  > P. 11425:12853(1428) ack 1 win 65535
*  > P. 11425:12853(1428) ack 1 win 65535
+.10 <  . 1:1(0) ack 12853 win 32000 < nop, nop, sack 11424:12853>
+0.00 send(3, ..., 4284, 0) = 4284
*  > . 12853:14281(1428) ack 1 win 65535
*  > . 14281:15709(1428) ack 1 win 65535
*  > P. 15709:17137(1428) ack 1 win 65535
*  > P. 15709:17137(1428) ack 1 win 65535
+.10 <  . 1:1(0) ack 17137 win 32000 <nop, nop, sack 15708:17137 >
// Tear it down.
+0.00 close(3) = 0
+0.00 > F. 17137:17137 (0) ack 1 win 65535
+0.10 < F. 1:1(0) ack 17138 win 32767
+0.00 > . 17138:17138 (0) ack 2 win 65535

and

--ip_version=ipv6

 0.00 `kldload -n tcp_bbr tcp_rack`
+0.00 `sysctl -w net.inet.tcp.hostcache.purgenow=1`
+0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 fcntl(3, F_SETFL, O_RDWR | O_NONBLOCK) = 0
+0.00 setsockopt(3, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0.00 > S  0:0(0) win 65535 <mss 1440,nop,wscale 8,sackOK,TS val 100 ecr 0>
+0.10 < S. 0:0(0) ack 1 win 32767 <mss 1428,sackOK, nop, nop>
+0.00 >  . 1:1(0) ack 1 win 65535
// Change to rack_latest 
+0.00 setsockopt(3, IPPROTO_TCP, TCP_FUNCTION_BLK, {function_set_name="rack", pcbcnt=0}, 36) = 0
+0.00 send(3, ..., 1428, 0) = 1428
+0.00 > P. 1:1429(1428) ack 1 win 65535
+.10 <  . 1:1(0) ack 1429 win 32000 
+0.10 send(3, ..., 11424, 0) = 11424
*  > . 1429:2857(1428) ack 1 win 65535
*  > . 2857:4285(1428) ack 1 win 65535 
+0.0 <  . 1:1(0) ack 4285 win 32000 <nop, nop, sack 2857:4285>
*  > . 4285:5713(1428) ack 1 win 65535 
*  > . 5713:7141(1428) ack 1 win 65535 
*  > . 7141:8569(1428) ack 1 win 65535 
*  > . 8569:9997(1428) ack 1 win 65535 
*  > . 9997:11425(1428) ack 1 win 65535
*  > P. 11425:12853(1428) ack 1 win 65535
*  > P. 11425:12853(1428) ack 1 win 65535
+.10 <  . 1:1(0) ack 12853 win 32000 < nop, nop, sack 11424:12853>
+0.00 send(3, ..., 4284, 0) = 4284
*  > . 12853:14281(1428) ack 1 win 65535
*  > . 14281:15709(1428) ack 1 win 65535
*  > P. 15709:17137(1428) ack 1 win 65535
*  > P. 15709:17137(1428) ack 1 win 65535
+.10 <  . 1:1(0) ack 17137 win 32000 
+0.00 send(3, ..., 4284, 0) = 4284
*  > . 17137:18565(1428) ack 1 win 65535
*  > . 18565:19993(1428) ack 1 win 65535
*  > P. 19993:21421(1428) ack 1 win 65535
*  > P. 19993:21421(1428) ack 1 win 65535 // TLP 1
*  > P. 19993:21421(1428) ack 1 win 65535 // TLP 2 <persist goes to 1>
*  > . 17137:18565(1428) ack 1 win 65535  // RTX <persist goes to 0, cnt is 0>
+0.0 <  . 1:1(0) ack 21421 win 32000 
// Tear it down.
+0.00 close(3) = 0
+0.00 > F. 21421:21421 (0) ack 1 win 65535
+0.10 < F. 1:1(0) ack 21422 win 32767
+0.00 > . 21422:21422 (0) ack 2 win 65535

Diff Detail

Repository
rG FreeBSD src repository