Differential D19047

TCP Dynamic Burst Limit
Needs Review · Public

Authored by rscheff on Jan 31 2019, 6:41 PM.

Details

Reviewers

cc
thj

Group Reviewers

transport

Summary

A very long time ago, the simplistic burst mitigation of BSD4.3 was commented out
in rS87145.

Details are sketchy, but a properly working client sending out a delayed ACK for every
other received segment should be fine with the old burst limit of 4.

However, with IW10, the minimum burst size needs to track the initial window. Furthermore,
ACK compression (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e)
can result in only a single ACK being sent after a large number of segments.

For TCP to remain reasonably responsive, a minimum of 2 ACKs per window
is expected, hence a maximum burst limit of cwnd/2.

This patch makes the minimum maxburst value at least as large as the initial window,
and caps the largest maxburst value, reached when pipe is equal to half a window, at cwnd/2.
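
For reference, the IW10 initial window that the minimum burst would have to track can be sketched as below. This is only the formula from RFC 6928, not code from this patch or from the FreeBSD stack (which also takes sysctl settings into account); the function name iw10_bytes is made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: initial window in bytes per RFC 6928,
 * IW = min(10 * MSS, max(2 * MSS, 14600)).
 */
static uint32_t
iw10_bytes(uint32_t mss)
{
	uint32_t lower = 2 * mss;
	uint32_t upper = 10 * mss;

	/* max(2 * MSS, 14600) */
	if (lower < 14600)
		lower = 14600;
	/* min(10 * MSS, ...) */
	return (upper < lower ? upper : lower);
}
```

With a 1460-byte MSS this gives 14600 bytes (ten segments), which is well above the old fixed burst limit of 4 segments.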

ToDo: Discuss whether initcwnd should be stored in tcpcb directly, since it is otherwise
recalculated frequently in the transmit path.

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 22301
Build 21490: arc lint + arc unit

Event Timeline

rscheff created this revision.Jan 31 2019, 6:41 PM

Herald added a reviewer: transport. · View Herald TranscriptJan 31 2019, 6:41 PM

Harbormaster completed remote builds in B22301: Diff 53486.Jan 31 2019, 6:42 PM


According to these slides (https://www.ietf.org/proceedings/88/slides/slides-88-tcpm-9.pdf), Linux also used, at least at some point, a dynamic limit with pipe (inflight) as an input parameter: maxburst = pipe + 3.

thj added a reviewer: thj.Jul 4 2019, 2:23 PM

I have been digging and I think I have all the background for why this was
removed in r87145.

As I understand it, max burst limits the amount of data we will put into the
network per RTT to maxburst*acks_per_rtt. With delayed ACKs, acks_per_rtt can
drop to 1, so our effective window becomes maxburst (4 in the original implementation).
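
The arithmetic behind that statement can be sketched as follows. This is a rough model only, assuming each ACK releases at most one burst of maxburst full-sized segments; the function name and parameters are hypothetical and not taken from the FreeBSD source.

```c
#include <stdint.h>

/*
 * Hypothetical model: upper bound on bytes sent per RTT when every ACK
 * releases at most maxburst_segs segments of mss bytes each.
 */
static uint64_t
burst_limited_bytes_per_rtt(uint32_t maxburst_segs, uint32_t acks_per_rtt,
    uint32_t mss)
{
	return ((uint64_t)maxburst_segs * acks_per_rtt * mss);
}
```

Under this model, a burst limit of 4 with a single ACK per RTT and a 1460-byte MSS allows 5840 bytes per RTT, regardless of how large cwnd is.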

There is a bug which documents this and several threads on freebsd-hackers@ in
November and December 2001. The bug references a 'mailing list' discussion, the
relevant parts of which I have included below.

Original Bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=32141
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=614764+0+archive/2001/freebsd-bugs/20011125.freebsd-bugs
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=663633+0+archive/2001/freebsd-bugs/20011202.freebsd-bugs

The 'mailing list' reference in the bug might refer to:
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=881527+0+archive/2001/freebsd-hackers/20011202.freebsd-hackers

This post links the bug to the 'FreeBSD performing worse than Linux?' thread:
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=776672+0+archive/2001/freebsd-hackers/20011202.freebsd-hackers
    
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=867302+0+archive/2001/freebsd-hackers/20011202.freebsd-hackers

Further, in 2014 Illumos fixed a similar issue related to poor performance with
max burst.

Burst limitation was removed from Delphix OS, as discussed in an Illumos issue:

https://github.com/delphix/delphix-os/commit/86e502910bb74d1b60b08940d03230aa514504ac
https://www.illumos.org/issues/5295

From this reading, I think max burst size has some problems. I think
draft-hughes-restart-00 (https://tools.ietf.org/html/draft-hughes-restart-00)
covers the problems with burst mitigation and some potential solutions.

Using a maxburst alone is going to have this issue with the ACK clock unless
you introduce an additional mechanism. One of the mechanisms in the restart
draft is a send timer; that might be somewhere to look.

swills added a subscriber: swills.Jul 10 2019, 3:00 PM

imp added a subscriber: imp.Jul 10 2019, 3:34 PM

imp added inline comments.

sys/netinet/tcp_output.c
1610	One would think a symbolic constant might be in order here. Also, the comment below seems stale...

I believe rS87145 was committed because of some unexpected delayed ACK
behavior; it might simply be that delayed ACK in FreeBSD 4.2/4.3 in 2001 was
expecting two full MSS (instead of two segments, which was changed later).

ref:
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=881527+0+archive/2001/freebsd-hackers/20011202.freebsd-hackers

Without the patch (meaning rS87145), either of two things will solve or partially solve the problem:

Turn off delayed acks on the receiver (performance 80K->6.8MB/sec)

OR

Turn off newreno on the transmitter. (performance 80K->7.9MB/sec)

cc added inline comments.Aug 7 2019, 6:52 PM

sys/netinet/tcp_output.c
261–263	Can you elaborate how this translates to max cwnd/2? Is that divide and then multiply (tp->snd_cwnd>>1) again necessary?
1617–1618	If TSO is on, this is the maximum number of TSO chunks to be sent out, not packets.

rscheff marked an inline comment as not done.Aug 8 2019, 11:40 PM

rscheff added inline comments.

sys/netinet/tcp_output.c
261–263	Yes, you are correct; I couldn't translate my Excel formula properly here. See below for the intended burst size relative to flightsize vs. cwnd.

The idea is to set maxburst relative to flightsize: the maximum burst is limited to half of cwnd when flightsize (pipe) is also half of cwnd (allowing for clients that ACK only twice per RTT), with smaller maximum bursts when the pipe is nearly empty or nearly full. This is to prevent line-rate bursts of a huge cwnd from occurring (as we have seen during cubic testing).

The formula (relative to cwnd) should be:

    maxburst [bytes] = (0.5 cwnd [bytes]) - abs(flightsize [bytes] - (0.5 cwnd [bytes]))

thus:

    maxburst = (tp->snd_cwnd>>1) - abs( (tp->snd_max - tp->snd_una) - (tp->snd_cwnd>>1) )

The drawback compared to pacing is that a full flight will not necessarily be re-established within one RTT, but a much more aggressive ramp up to cwnd will happen over a few RTTs (potentially with bursts too large to get absorbed by network queue buffers).

https://www.wolframalpha.com/input/?i=plot+y%3Dmax(0.1,0.5-abs(x-0.5))+from+x%3D0+to+1
1617–1618	Yes, I have not looked into dealing with TSO chunks; ideally, that should be taken into account as well...
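
The intended formula from the reply on lines 261–263 can be sketched in unsigned arithmetic as below. This is an illustration under assumptions, not the code in Diff 53486: the function and parameter names are hypothetical, flight stands for tp->snd_max - tp->snd_una, and the initial-window floor is taken from the summary's statement that the minimum maxburst should be at least the initial window.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the intended burst limit in bytes:
 *   maxburst = max(initcwnd, cwnd/2 - abs(flight - cwnd/2))
 * Written without signed subtraction to avoid unsigned wrap-around.
 */
static uint32_t
dyn_maxburst_bytes(uint32_t cwnd, uint32_t flight, uint32_t initcwnd)
{
	uint32_t half = cwnd >> 1;
	/* abs(flight - cwnd/2) */
	uint32_t dist = (flight > half) ? flight - half : half - flight;
	/* cwnd/2 - dist, clamped at zero when flight exceeds cwnd */
	uint32_t burst = (dist < half) ? half - dist : 0;

	/* Assumed floor: never go below the initial window. */
	return (burst > initcwnd ? burst : initcwnd);
}
```

With cwnd = 100000 and initcwnd = 14600, this yields 50000 when flight is 50000 (the peak at half a window), 25000 when flight is 25000, and falls back to 14600 when the pipe is empty or full. TSO chunking is not considered here.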

Revision Contents

Path: sys/netinet/tcp_output.c
Size: 15 lines

Diff 53486