
Make After-Idle congestion control work correctly for transactional sessions.
Closed, Public

Authored by rscheff on May 26 2020, 11:41 AM.

Details

Summary

When a certain period has passed since a TCP sender has
received network feedback on its half-connection, the
congestion window is supposed to be reset to the initial
window.

Until now, t_rcvtime, which is updated for every
incoming segment (pure ACKs as well as data), was
used as a proxy for when the last (data) transmission
was performed. This works fine for sessions doing
mostly bulk transfers in a single direction. However,
this approach fails for transactional IO, where the
server repeatedly transmits large chunks of data
after the client requests them, with a variable pause
in between requests.

In that case, the incoming request would effectively
reset t_rcvtime, and the sender would retain the last
value of its congestion window, however large that
may have been. Ultimately, this results in a large
burst of data being transmitted blindly into the
network at wire speed, without considering any potentially
changed network conditions. This can significantly
exacerbate any packet loss the burst induces.
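
The difference between the two idle checks can be sketched as follows. This is a minimal model, not the kernel code: struct conn_model, the last_send field and the helper names are hypothetical, and the "idle for at least one RTO" threshold is an assumption taken from the restart-window rule in RFC 5681.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant tcpcb state. */
struct conn_model {
	uint32_t t_rcvtime;    /* tick when the last segment was received */
	uint32_t last_send;    /* tick of the last data transmission (assumed name) */
	uint32_t rto;          /* retransmission timeout, in ticks */
	uint32_t cwnd;         /* current congestion window, in bytes */
	uint32_t initial_cwnd; /* initial window, in bytes */
};

/* True once at least one RTO has elapsed since `stamp`; wrap-safe. */
static bool
idle_for_rto(uint32_t now, uint32_t stamp, uint32_t rto)
{
	return ((uint32_t)(now - stamp) >= rto);
}

/* Old behaviour: idle time is measured from the last received segment. */
static uint32_t
cwnd_by_rcvtime(const struct conn_model *c, uint32_t now)
{
	return (idle_for_rto(now, c->t_rcvtime, c->rto) ?
	    c->initial_cwnd : c->cwnd);
}

/* Revised behaviour: idle time is measured from the last data sent. */
static uint32_t
cwnd_by_sendtime(const struct conn_model *c, uint32_t now)
{
	return (idle_for_rto(now, c->last_send, c->rto) ?
	    c->initial_cwnd : c->cwnd);
}
```

With a request arriving one tick before the response is sent, cwnd_by_rcvtime() keeps the stale window, while cwnd_by_sendtime() falls back to the initial window because the sending direction has been idle far longer than one RTO.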

In this Diff, the existing RTT sampling mechanism is
used to gather more appropriate timestamps of when
the last data segment was sent, and the check for
whether an RTT sample is currently running is moved
from looking at t_rtttime to t_rtseq.

Further, we also slightly adjust these variables in
case they happen to be zero when a new sample is
started.

There is a minuscule chance that a dramatically delayed
RTT sample is collected: when a data segment happens
to end with an absolute sequence number of zero (which
would not stop the RTT sample immediately) and, at
that very moment, no further data is exchanged until a
much later time. However, this would always be a transient
effect, as sRTT and RTTvar will converge quickly to
appropriate values again, and the excessive timeout
value may never be used at all.

Reported-by: rrs

Test Plan

See attached packetdrill script

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

The packetdrill script newreno-after-idle-server.pkt will demonstrate the issue of the inactive (receive only) half session not resetting cwnd after-idle.

While investigating this problem further, I found a more severe implication:

Skipping the after_idle function in transactional sessions running Cubic also retains a (very old) cubic epoch time. This means that an arbitrarily large jump in cwnd will happen, depending only on how long the client paused in performing IO requests (and at what intensity, as cwnd is only adjusted when the sending half-session is cwnd-limited rather than app-limited).

In my testing, skipping after-idle in cubic (after a 77 sec pause in IO requests; the pause started about 1 sec after a new cubic epoch), followed by a phase of low-intensity IO (the server being application-limited rather than cwnd-limited, thus cwnd remaining untouched), meant that the cubic cwnd update happened only after 568 sec, with a sudden jump from 36kB to 573kB (and a burst of line-rate traffic of comparable size, followed by self-inflicted loss, loss of fast retransmissions, and RTOs).

The effect is unlikely to be as devastating with NewReno, whose congestion avoidance grows cwnd only very slowly, but cubic's dependency on absolute time amplifies the problem.
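
To illustrate why a stale epoch matters, here is a minimal sketch of the cubic window function in the floating-point form given in RFC 8312, W(t) = C * (t - K)^3 + W_max. It is not the FreeBSD implementation; cubic_window() and its parameters are hypothetical names, and the window is expressed in segments rather than bytes.

```c
/* Constant C from RFC 8312, in segments per second cubed. */
#define CUBIC_C 0.4

/*
 * W_cubic(t) = C * (t - K)^3 + W_max, in segments.
 * t:     seconds elapsed since the current epoch started
 * k:     seconds the function needs to grow back to w_max
 * w_max: window size (in segments) just before the last reduction
 */
static double
cubic_window(double t, double k, double w_max)
{
	double d = t - k;

	return (CUBIC_C * d * d * d + w_max);
}
```

One second past K the target window exceeds W_max by only 0.4 segments, but ten seconds past K it already exceeds it by 400 segments. Because t keeps advancing while the sender is idle or application-limited, an epoch that is never restarted produces a target far above anything the path has actually been probed for.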

rrs requested changes to this revision. Jun 4 2020, 5:31 PM

I don't mind you tracking t_rtseq here, but please do not change the idle reduction in rack. I will go dig
up the right variable and change this to use that. You are welcome to maintain t_rtseq but please don't change
the rack behavior here.

sys/netinet/tcp_stacks/rack.c
10998 ↗(On Diff #72279)

Please back this change out

12151 ↗(On Diff #72279)

Please back this change out

This revision now requires changes to proceed. Jun 4 2020, 5:31 PM
  • Merge branch 'master' into D25016_fix_afteridle_timer
  • make use of a new tcpcb variable to track last send time

Addressed all comments from rrs@ with this update.

The provided packetdrill script confirms that the server-side after-idle reset of cwnd still happens as expected.

This revision is now accepted and ready to land. Jun 18 2020, 5:27 PM