Initial retransmit timeout improperly set
Closed, Public

Authored by rrs on May 8 2021, 8:42 AM.

Details

Reviewers

tuexen

Group Reviewers

transport

Commits

rG87cf5dcc3335: tcp:Host cache and rack ending up with incorrect values.
rG9867224bab3f: tcp:Host cache and rack ending up with incorrect values.

Summary

In some cases rack ends up with an incorrect RTT set initially. In particular,
the test case is one where we have a long RTT and the server sends the
initial message after the 3-way handshake. Srtt and rttvar
end up with the correct values, but tp->t_rxtcur does not. It usually
ends up with a much smaller value. This causes all kinds of trouble: 2 TLPs and
finally an RXT that knocks the cwnd down to 1 MSS. The consequence
is that the connection crawls.

What should happen is that we call the proper t_rxtcur set macro
after srtt and rttvar have been set up properly.
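
The ordering described above can be sketched roughly as follows. This is only an illustration, not the rack code: the struct, the function names, and the clamp bounds are hypothetical, all values are assumed to be in microseconds, and the formula is the generic RTO = SRTT + 4 * RTTVAR from RFC 6298 rather than the actual t_rxtcur set macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a connection's RTT state, all in microseconds. */
struct rtt_state {
	uint32_t srtt;    /* smoothed RTT */
	uint32_t rttvar;  /* RTT variance */
	uint32_t rxtcur;  /* current retransmit timeout */
};

/* RTO = SRTT + 4 * RTTVAR (RFC 6298), clamped to [rto_min, rto_max]. */
static uint32_t
rto_from_rtt(uint32_t srtt, uint32_t rttvar, uint32_t rto_min, uint32_t rto_max)
{
	uint64_t rto = (uint64_t)srtt + ((uint64_t)rttvar << 2);

	assert(rto_min <= rto_max);
	if (rto < rto_min)
		return (rto_min);
	if (rto > rto_max)
		return (rto_max);
	return ((uint32_t)rto);
}

/* Seed the estimators from a first sample, then derive the timeout last. */
static void
rtt_state_seed(struct rtt_state *st, uint32_t sample_us, uint32_t rto_min,
    uint32_t rto_max)
{
	st->srtt = sample_us;
	st->rttvar = sample_us / 2;
	/* The timeout is computed only after srtt and rttvar are in place. */
	st->rxtcur = rto_from_rtt(st->srtt, st->rttvar, rto_min, rto_max);
}
```

With an illustrative first sample of 190 ms and assumed bounds of 30 ms and 60 s, seeding gives srtt = 190000, rttvar = 95000, and a timeout of 570000 microseconds. A timeout computed before the estimators were set would not reflect the long path.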

Test Plan

Run the particular sendfile tests where the server
sends first over a long RTT path.

This packetdrill script can be used to see the bogus values that end up in
the hostcache after it runs and the hc is updated from rack.

With the fix, the hc values should look normal.

--ip_version=ipv4

+0.00 `sysctl -w net.inet.tcp.syncookies_only=0`
+0.00 `sysctl -w net.inet.tcp.syncookies=1`
+0.00 `sysctl -w net.inet.tcp.rfc1323=1`
+0.00 `sysctl -w net.inet.tcp.sack.enable=1`
+0.00 `sysctl -w net.inet.tcp.ecn.enable=2`
// Create a TCP endpoint in the ESTABLISHED state.
+0.00 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.00 fcntl(3, F_GETFL) = 0x02 (flags O_RDWR)
+0.00 bind(3, ..., ...) = 0
+0.00 listen(3, 1) = 0
+0.000 < S 0:0(0) win 1500 <mss 1460, sackOK, eol, eol>
+0.000 > S. 0:0(0) ack 1 win 65535 <...>
+0.190 < . 1:1(0) ack 1 win 40000
+0.000 accept(3, ..., ...) = 4
+0.00 setsockopt(4, IPPROTO_TCP, TCP_LOG, [4], 4) = 0
+0.00 setsockopt(4, IPPROTO_TCP, TCP_FUNCTION_BLK, {function_set_name="rack_latest", pcbcnt=0}, 36) = 0
+0.000 write(4, ..., 5) = 5
+0.00 > P. 1:6(5) ack 1 win 65535
+0.200 < . 1:1(0) ack 6 win 40001
+0.100 write(4, ..., 1448) = 1448
+0.00 > P. 6:1454(1448) ack 1 win 65535
+0.200 < . 1:1(0) ack 1454 win 40001
+0.100 write(4, ..., 1448) = 1448
+0.00 > P. 1454:2902(1448) ack 1 win 65535
+0.200 < . 1:1(0) ack 2902 win 40001
+0.100 write(4, ..., 1448) = 1448
+0.00 > P. 2902:4350(1448) ack 1 win 65535
+0.200 < . 1:1(0) ack 4350 win 40001
// Tear it down.
+2.100 close(4) = 0
+0.00 > F. 4350:4350 (0) ack 1 win 65535
+0.200 < F. 1:1(0) ack 4351 win 40002
+0.00 > . 4351:4351 (0) ack 2 win 65535

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

rrs created this revision. May 8 2021, 8:42 AM

Herald added 1 blocking reviewer(s): transport. May 8 2021, 8:42 AM

Herald added a subscriber: melifaro.

rrs requested review of this revision. May 8 2021, 8:42 AM

It turns out the problem is far deeper. There are at least
a couple of interactions here.

Rack keeps its srtt/rttvar in microseconds (no longer the 5-bit fractional format). When we destroy a tcb, the fini() function needs to be called *before* we update the host cache.

The way the hostcache update was being called, it could run multiple times for the same TCB, which is not good.

When rack inits, it needs to do its own query of the hostcache and then properly translate the information into its own representation.
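
The unit mismatch mentioned above can be illustrated with a small conversion sketch. It assumes the older representation is a tick count carrying 5 fractional bits and that the tick rate is passed in as hz. The function names and the shift constant are hypothetical and are not taken from rack.c or the hostcache code.

```c
#include <stdint.h>

/* Assumed number of fractional bits in the older scaled-tick srtt format. */
#define SKETCH_RTT_SHIFT 5

/* Convert a scaled-tick RTT (ticks << 5) to plain microseconds. */
static uint32_t
scaled_ticks_to_usec(uint32_t scaled, uint32_t hz)
{
	return ((uint32_t)(((uint64_t)scaled * 1000000u) /
	    ((uint64_t)hz << SKETCH_RTT_SHIFT)));
}

/* Convert plain microseconds to a scaled-tick RTT (ticks << 5). */
static uint32_t
usec_to_scaled_ticks(uint32_t usec, uint32_t hz)
{
	return ((uint32_t)((((uint64_t)usec * hz) << SKETCH_RTT_SHIFT) /
	    1000000u));
}
```

With hz = 1000, a 190 ms RTT is 190000 in microseconds but 6080 in the scaled-tick form, so a value written in one representation and read back as the other is off by a large factor. That is the kind of bogus hostcache entry the test plan looks for.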

tuexen added inline comments. May 10 2021, 2:58 PM

sys/netinet/tcp_stacks/rack.c
6577	You are definitely missing a `tcp_hc_get()` call here. General question: Can't you call `cc_conn_init()` here first and then do the conversion to the RACK internal format? This would reduce the code duplication...