
Send immediate ACK on receipt of CWR
Closed, Public

Authored by rscheff on Dec 4 2019, 2:39 PM.

Details

Summary

When a TCP sender reduces cwnd in response to CE marks, it can end
up with a very small cwnd (< 2 MSS). When the next packet is sent with
the CWR flag set, the receiver will often wait for another packet and
only send the ACK once the delack timer expires.

This can effectively drive up the high-percentile latency on request-
response type interactions.

This was found specifically for flows using dctcp, most likely because
cwnd collapses to very small values more often in dctcp environments.
The patch is generic, however, and also addresses RFC 3168 ECN
sessions.

See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd2123a3d7527d4c7092633d55e877c0cc1d84a3
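
The receiver-side behaviour described in the summary can be sketched as a small decision function. This is a minimal illustration and not the committed FreeBSD change: the `rx_state` struct, the `ack_now()` function and the simplified "ACK every second segment" rule are assumptions made for the example. Only the flag values are the standard TCP header bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define TH_ACK 0x10   /* standard TCP header flag bits */
#define TH_ECE 0x40
#define TH_CWR 0x80

/* Hypothetical receiver state, for illustration only. */
struct rx_state {
    bool delack_pending;   /* one data segment is already waiting for an ACK */
};

/*
 * Return true if the ACK for this segment should go out immediately,
 * false if it may be deferred to the delayed-ACK timer.
 */
bool
ack_now(struct rx_state *rx, uint8_t thflags, uint32_t tlen)
{
    if (tlen == 0)
        return false;               /* segments without data are not ACKed */
    if (thflags & TH_CWR) {         /* sender has just reduced its cwnd */
        rx->delack_pending = false;
        return true;                /* do not wait for a second segment */
    }
    if (rx->delack_pending) {       /* second segment in a row */
        rx->delack_pending = false;
        return true;
    }
    rx->delack_pending = true;      /* first segment: leave it to the timer */
    return false;
}
```

With a fresh state, `ack_now(&rx, TH_ACK | TH_CWR, 1000)` returns true even though no other segment is pending, which corresponds to the `W.` segment in the test plan being ACKed at `+0`.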

Test Plan

packetdrill script (here for dctcp)

// Testing DCTCP

--tolerance_usecs=20000

// Load and enable DCTCP module and flush hostcache
0.0 `if kldstat | grep -q cc_dctcp; then echo dctcp already loaded; else kldload cc_dctcp; fi`
+0.1 `sysctl net.inet.tcp.cc.algorithm=dctcp`
+0.1 `sysctl net.inet.tcp.cc.dctcp.alpha=0`
+0.1 `sysctl net.inet.tcp.initcwnd_segments=10`
+0.1 `sysctl net.inet.tcp.ecn.enable=1`
+0.1 `sysctl net.inet.tcp.hostcache.purgenow=1`

// Create a listening TCP socket.
+0.50 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.01 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.01 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0
+0.01 setsockopt(3, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
+0.01 bind(3, ..., ...) = 0
+0.01 listen(3, 1) = 0


// Establish a TCP connection.
+0.04 <[ect0] SEW  0:0(0) win 65535 <mss 1012, sackOK, wscale 10, TS val 100 ecr 0, eol >
+0.00 >[noecn] SE. 0:0(0) ack 1 win 65535 <...>
+0.00 <[ect0]  . 1:1(0) ack 1 win 65535
+0.00 accept(3, ..., ...) = 4

// Send IW plus 1 segment, check ECN bits
+1.0  <[ect0]  .      1:1001(1000) ack 1 win 65535
+0    <[ect0]  .   1001:2001(1000) ack 1 win 65535
+0    >[noecn] .      1:1(0) ack 2001 <...>

+0 write(4, ..., 1) = 1
+0    >[ect0] P.      1:2(1) ack 2001 <...>

+0    <[ect0]  .   2001:3001(1000) ack 2 win 65535
+0 write(4, ..., 1) = 1
+0    >[ect0]  P. 2:3(1) ack 3001 <...>

+0    <[ect0]  .   3001:4001(1000) ack 3 win 65535
+0    <[ect0]  .   4001:5001(1000) ack 3 win 65535
+0    >[noecn] .      3:3(0) ack 5001 <...>

+0.01  <[ce]  P.   5001:5501(500) ack 3 win 65535
+0    >[noecn] E.     3:3(0) ack 5501 <...>

+0.001 read(4, ..., 5500) = 5500
+0 write(4, ..., 1) = 1
+0 > [ect0] PE. 3:4(1) ack 5501 <...>

+0.01  <[ect0]  W.   5501:6501(1000) ack 4 win 65535
// no delayed ACK on CWR flag
+0    >[noecn] .      4:4(0) ack 6501 <...>


+0.31  <[ect0]    .   6501:7501(1000) ack 4 win 65535
+0.1    >[noecn] .      4:4(0) ack 7501 <...>

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

rscheff edited the test plan for this revision.
  • make identical change to RACK

Also note that the sender whose cwnd is reduced is not necessarily a FreeBSD stack. The DCTCP code, for example, does not let cwnd collapse below 2 MSS. However, a FreeBSD receiver may interact with a different TCP stack on the sender, e.g. one that allows shrinking cwnd down to 1 MSS, or one that uses pacing to support a fractional-MSS cwnd (e.g. 1/2 MSS cwnd -> send one segment every 2 RTTs). In that case, timely feedback on potentially critical segments (an application may stall on their delivery) can be vital for good responsiveness.
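
To make the arithmetic in this comment concrete, the sketch below computes a DCTCP-style window reduction with a configurable floor, the send interval of a paced sender whose cwnd is a fraction of one MSS, and the wait for feedback on a lone segment. It is an illustration under assumptions: the function names are invented for the example, the reduction `cwnd * (1 - alpha/2)` is the one given in RFC 8257 rather than the fixed-point code of any particular stack, and all times are placeholder inputs.

```c
/* cwnd (bytes) after a DCTCP-style reduction, clamped to min_segs * mss. */
double
dctcp_reduce(double cwnd, double alpha, double mss, double min_segs)
{
    double next = cwnd * (1.0 - alpha / 2.0);
    double floor_bytes = min_segs * mss;

    return next < floor_bytes ? floor_bytes : next;
}

/* RTTs between two segments of a paced sender; 1 if cwnd >= 1 MSS. */
double
rtts_per_segment(double cwnd, double mss)
{
    return cwnd >= mss ? 1.0 : mss / cwnd;
}

/* Time (seconds) until the sender gets an ACK for a lone segment. */
double
feedback_delay(double rtt, double delack, int immediate_ack)
{
    return immediate_ack ? rtt : rtt + delack;
}
```

For example, `dctcp_reduce(3000, 1.0, 1000, 2)` stays at 2000 bytes because of the 2 MSS floor, while a stack with a 1 MSS floor would go to 1500, and `rtts_per_segment(500, 1000)` gives the "one segment every 2 RTTs" case. With a 10 ms RTT and the roughly 100 ms delayed-ACK interval visible in the scripts, `feedback_delay(0.010, 0.100, 0)` is about 0.11 s, compared with 0.01 s when the ACK is sent immediately.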

This revision is now accepted and ready to land. Jan 16 2020, 9:40 PM

I am testing this patch now. Will update from my side in one or two days.

The packetdrill script to validate this fix with newreno

// Testing ECN - immediate ACK on CWR (D22670)

--tolerance_usecs=20000

// Select newreno, enable ECN and flush hostcache
0.0 `sysctl net.inet.tcp.cc.algorithm=newreno`
+0.02 `sysctl net.inet.tcp.initcwnd_segments=10`
+0.02 `sysctl net.inet.tcp.ecn.enable=1`
+0.02 `sysctl net.inet.tcp.hostcache.purgenow=1`

// Create a listening TCP socket.
0.06 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.005 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.005 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0
+0.005 bind(3, ..., ...) = 0
+0.005 listen(3, 1) = 0

// Establish a TCP connection.
0.1 <[ect0] SEW 0:0(0) win 65535 <mss 1000, sackOK, wscale 10, eol, nop, nop >
+0.00 >[noecn] SE. 0:0(0) ack 1 win 65535 <...>
+0.00 <[ect0]  . 1:1(0) ack 1 win 65535
+0.00 accept(3, ..., ...) = 4

// Send IW plus 1 segment, check ECN bits
0.12  <[ect0]  .      1:1001(1000) ack 1 win 65535
+0    <[ect0]  .   1001:2001(1000) ack 1 win 65535
+0    >[noecn] .      1:1(0) ack 2001 <...>

+0 write(4, ..., 1) = 1
+0    >[ect0] P.      1:2(1) ack 2001 <...>

+0    <[ect0]  .   2001:3001(1000) ack 2 win 65535
+0 write(4, ..., 1) = 1
+0    >[ect0]  P. 2:3(1) ack 3001 <...>

+0    <[ect0]  .   3001:4001(1000) ack 3 win 65535
+0    <[ect0]  .   4001:5001(1000) ack 3 win 65535
+0    >[noecn] .      3:3(0) ack 5001 <...>

// delayed ACK
+0.01  <[ce]  P.   5001:5501(500) ack 3 win 65535
+0.1    >[noecn] E.     3:3(0) ack 5501 <...>

+0.001 read(4, ..., 5500) = 5500
+0 write(4, ..., 1) = 1
+0 > [ect0] PE. 3:4(1) ack 5501 <...>

+0.01  <[ect0]  W.   5501:6501(1000) ack 4 win 65535
// no delayed ACK on CWR flag
+0    >[noecn] .      4:4(0) ack 6501 <...>


+0.31  <[ect0]    .   6501:7501(1000) ack 4 win 65535
+0.1    >[noecn] .      4:4(0) ack 7501 <...>

// Tear down the connection (3 bytes sent, 7500 bytes received so far).
+0.05 close(4) = 0
+0    > [noecn] F. 4:4(0) ack 7501
+0    < [noecn] F. 7501:7501(0) ack 5 win 65535
+0    > [noecn] . 5:5(0) ack 7502

// Restore defaults
`sysctl net.inet.tcp.ecn.enable=2`