Page MenuHomeFreeBSD

Fix the CE reflector receiver state machine in DCTCP
ClosedPublic

Authored by rscheff on Dec 3 2019, 4:42 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 19, 11:34 PM
Unknown Object (File)
Sat, Nov 16, 1:40 AM
Unknown Object (File)
Tue, Oct 29, 1:30 PM
Unknown Object (File)
Sat, Oct 26, 11:38 PM
Unknown Object (File)
Thu, Oct 24, 1:58 PM
Unknown Object (File)
Oct 5 2024, 7:04 PM
Unknown Object (File)
Oct 2 2024, 9:20 PM
Unknown Object (File)
Sep 21 2024, 3:46 AM
Subscribers

Details

Summary

cc_ecnpkt_handler is called rather early in tcp_input, before
TF_DELACK is explicitly set. However, it is safe to assume that
typically delayed ACKs are in use - removing the requirement
to copy around the TF flags into CCF flags.

The main question remaining is then, if the DCTCP receiver
state machine asks for an explicit immediate ACK to be set,
in order to convey any change in the "CE" state of the incoming
packets in a timely fashion.

Further optimization can be had by moving the write access to
various CC state variables into the conditional branches - as
these also only change, when the CE state changes.

Also clean up some left-over CCF state which would effectively
disable delayed ACKs.

Test Plan

Template of a packetdrill script to validate the proper
operation of the CE-reflector state-machine of DCTCP:

// Testing DCTCP

--tolerance_usecs=2000

// Load and enable DCTCP module and flush hostcache
0.0 `if kldstat | grep -q cc_dctcp; then echo dctcp already loaded; 
else kldload cc_dctcp; fi`
+0.1 `sysctl net.inet.tcp.cc.algorithm=dctcp`
+0.1 `sysctl net.inet.tcp.cc.dctcp.alpha=0`
+0.1 `sysctl net.inet.tcp.initcwnd_segments=10`
+0.1 `sysctl net.inet.tcp.ecn.enable=1`
+0.1 `sysctl net.inet.tcp.hostcache.purgenow=1`

// Create a listening TCP socket.
+0.50 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.01 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.01 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0
+0.01 setsockopt(3, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
+0.01 bind(3, ..., ...) = 0
+0.01 listen(3, 1) = 0


// Establish a TCP connection.
+0.04 <[noecn] SEW  0:0(0) win 65535 <mss 1012, sackOK, wscale 10, TS 
+val 100 ecr 0, eol >
+0.00 >[noecn] SE. 0:0(0) ack 1 win 65535 <...>
+0.00 <[noecn]  . 1:1(0) ack 1 win 65535
+0.00 accept(3, ..., ...) = 4

// Send IW plus 1 segment, check ECN bits
+1.0  <[ect0]  .      1:1001(1000) ack 1 win 65535
+0    <[ect0]  .   1001:2001(1000) ack 1 win 65535
+0    >[noecn] .      1:1(0) ack 2001 <...>
+0    <[ect0]  .   2001:3001(1000) ack 1 win 65535
+0    <[ect0]  .   3001:4001(1000) ack 1 win 65535
+0    >[noecn] .      1:1(0) ack 4001 <...>
+0    <[ect0]  .   4001:5001(1000) ack 1 win 65535
+0    <[ect0]  .   5001:6001(1000) ack 1 win 65535
+0    >[noecn] .      1:1(0) ack 6001 <...>
+0    <[ect0]  .   6001:7001(1000) ack 1 win 65535
+0    <[ect0]  .   7001:8001(1000) ack 1 win 65535
+0    >[noecn] .      1:1(0) ack 8001 <...>
+0    <[ce]    .   8001:9001(1000) ack 1 win 65535
// should trigger a ACK,ECE here
+0    >[noecn] E.      1:1(0) ack 9001 <...>
+0    <[ce]    .   9001:10001(1000) ack 1  win 65535
//+0    >[noecn] .      1:1(0) ack 10001 <...>
+0.0  read(4, ..., 10000) = 10000
+0    <[ce]    .  10001:11001(1000) ack 1  win 65535
+0    >[noecn] E.      1:1(0) ack 11001 <...>
+0    <[ect0]  .  11001:12001(1000) ack 1  win 65535
+0    >[noecn] .      1:1(0) ack 12001 <...>
+0    <[ect0]  .  12001:13001(1000) ack 1  win 65535
+0    >[noecn] .      1:1(0) ack 13001 <...>
+0    <[ect0]  .  13001:14001(1000) ack 1  win 65535
//+0    >[noecn] .      1:1(0) ack 14001 <...>
+0    <[ect0]  .  14001:15001(1000) ack 1  win 65535
+0    >[noecn] .      1:1(0) ack 15001 <...>
+0    <[ect0]  .  15001:16001(1000) ack 1  win 65535
//+0    >[noecn] .      1:1(0) ack 16001 <...>
+0    <[ect0]  .  16001:17001(1000) ack 1  win 65535
+0    >[noecn] .      1:1(0) ack 17001 <...>
+0    <[ect0]  .  17001:18001(1000) ack 1  win 65535
//+0    >[noecn] .      1:1(0) ack 18001 <...>
+0    <[ect0]  .  18001:19001(1000) ack 1  win 65535
+0    >[noecn] .      1:1(0) ack 19001 <...>
+0    <[ect0]  .  19001:20001(1000) ack 1  win 65535
+0    >[noecn] .      1:1(0) ack 20001 <...>


+1.00 close(4) = 0
+0.00 >[noecn] F. 1:1(0) ack 20001 <...>
+0.10 <[noecn] F. 20001:20001(0) ack 2 win 65535
+0.00 >[noecn] . 2:2(0) ack 20002 <...>

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

rgrimes requested changes to this revision.Dec 5 2019, 6:49 AM

From the summary:

The main question remaining is then, if the DCTCP receiver
state machine asks for an explicit immediate ACK to be set,
in order to convey any change in the "CE" state of the incoming
packets in a timely fashion.

Does not pose a question.

I do like the new logic of acknow vs delay_ack as the control variable, just the comments need fixed.

sys/netinet/cc/cc_dctcp.c
325 ↗(On Diff #65163)

Since you have removed delay_ack these comments need to be reworked.

sys/netinet/tcp_input.c
535 ↗(On Diff #65163)

Are you sure this does not impact any other CC? Bothersome is that the case covers all 4 possible ECN bit values so default is case 0, or not ECN transport? I need to think on this more, and I need to look at what I had to do here for SCE.

550 ↗(On Diff #65163)

This function may be too specific to NewReno style CE latching until CWR and really does not belong in tcp_input, but in a cc_algo funtion since DCTCP, SCE and L4S do not latch as this code implements. Can we have a conversation about that?

This revision now requires changes to proceed.Dec 5 2019, 6:49 AM
  • dctcp specific immediate ack on cwr
  • adjust comment around dctcp_ecnpkt_handler. Keep acknow for CWR in preparation of refactory of ECN handling
sys/netinet/cc/cc_dctcp.c
325 ↗(On Diff #65163)

I will remove these two comment lines; the variable name "acknow" as a boolean seems to be sufficiently self-explainatory... Thanks.

sys/netinet/tcp_input.c
535 ↗(On Diff #65163)

The CCF_IPHDR_CE flag is currently only checked in cc_dctcp. And the problem happens when the data direction changes, while the connection is not using ECN++ (with non-data packets also marked as ECT0/ECT1/CE).

Without this (which is actually lifted from D19143 ), the pure ACKs would leave the CCF_IPHDR_CE flag latched in "CE", causing dctcp to continously reduce cwnd if the last data packet in the reverse direction was CE marked.

And yes, the newreno RFC3168 latching style is still in tcp_input.c:1551 - only overwritten by the cc_ecnpkt_handler for dctcp.

Perhaps moving the "default" processing of RFC3168 ECN into a newreno_ecnpkt_handler would be a good refactory in that area? (the lines 1550..1573)

550 ↗(On Diff #65163)

Sure; cc_ecnpkt_handler can currently override the behavior at least partially. Moving the entire section into a newreno_ecnpkt_handler in its entirety, and writing specific code for SCE, L4S (and extending DCTCP) sounds like it can be beneficial for understanding what is actually going on with the ECN signaling in the various mechanisms.

rgrimes requested changes to this revision.Dec 5 2019, 7:05 PM

I am ok to move forward with this and leave rethinking of factoring out the newreno latch to a cc function until later, how ever I would like to not use default: for the only possible state of the ECT bits being 00, OR explicity comment that the only case that can trigger the default is ECN=00.

sys/netinet/tcp_input.c
535 ↗(On Diff #65163)

I would still like to rename default: It is exactly 1 case, the case of 00 in the ECN field, which probably does not have a #define value, but is NOT ECN Capable Transport. Or am I totally missing something here?

This revision now requires changes to proceed.Dec 5 2019, 7:05 PM
  • removing the CCF_IPHDR_CE latch, since that is in D19143
rscheff added inline comments.
sys/netinet/tcp_input.c
535 ↗(On Diff #65163)

Move this change over to D19143

cc added inline comments.
sys/netinet/cc/cc_dctcp.c
353–354 ↗(On Diff #65293)

I can't find the evidence of the old comment which claims "DCTCP sets delayed ack when this segment sets the CWR flag."

According to rfc8257 (DCTCP) section-3.2, CWR is processed first and then CE is processed. This makes me agree that we need to send an immediate ACK upon the CWR bit. How about update the rfc8257 with adding this new decision?

https://tools.ietf.org/html/rfc8257#section-3.2

Also, you can remove the CCF_DELACK definition from cc.h.

This revision is now accepted and ready to land.Dec 31 2019, 4:12 PM
This revision was automatically updated to reflect the committed changes.