
TCP Dynamic Burst Limit
Needs Review, Public

Authored by rscheff on Jan 31 2019, 6:41 PM.

Details

Reviewers
cc
thj
Group Reviewers
transport
Summary

A very long time ago, the simplistic burst mitigation of BSD4.3 was commented out
in rS87145.

Details are sketchy, but a properly working client sending out a delayed ACK for
every other received segment should be fine with the old burst limit of 4.

However, with IW10, the minimum burst size needs to track the initial window.
Furthermore, ACK compressing (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e)
can result in only a single ACK arriving after a large number of segments.

For TCP to remain reasonably responsive, a minimum of 2 ACKs per window
is expected - thus a maximum burst limit of cwnd/2.

This patch makes the minimum maxburst value at least as large as the initial window,
and caps the largest maxburst value - reached when pipe is equal to half a window - at cwnd/2.

ToDo: Discuss whether initcwnd should be stored in tcpcb directly, given that it
would otherwise be recalculated frequently in the transmit path.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 22301
Build 21490: arc lint + arc unit

Event Timeline

According to these slides (https://www.ietf.org/proceedings/88/slides/slides-88-tcpm-9.pdf), Linux - at least at some point - also used a dynamic limit with pipe (inflight) as an input parameter: maxburst = pipe + 3.

I have been digging and I think I have all the background for why this was
removed in r87145.

As I understand it, max burst limits the amount of data we will put into the
network per RTT to maxburst*acks_per_rtt. With delayed ACKs, acks_per_rtt can
become 1, and so our effective window becomes maxburst (4 in the original
implementation).
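
The arithmetic above can be sketched as follows. This is only an illustration, assuming that each received ACK releases at most maxburst segments; the function name segs_per_rtt is hypothetical and does not appear in the FreeBSD source.

```c
/*
 * Hypothetical helper (not from the FreeBSD tree): upper bound on the
 * number of segments a sender can emit per RTT when every incoming ACK
 * may release at most `maxburst` segments.
 */
static unsigned int
segs_per_rtt(unsigned int maxburst, unsigned int acks_per_rtt)
{
	return (maxburst * acks_per_rtt);
}
```

With the old limit of 4 and a receiver that sends a single ACK per RTT, segs_per_rtt(4, 1) is 4, so the sender is held to 4 segments per RTT regardless of cwnd.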

There is a bug which documents this, and there are several threads on
freebsd-hackers@ from November and December 2001. The bug references a 'mailing
list' discussion, the relevant parts of which I have included.

Original Bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=32141
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=614764+0+archive/2001/freebsd-bugs/20011125.freebsd-bugs
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=663633+0+archive/2001/freebsd-bugs/20011202.freebsd-bugs

The 'mailing list' reference might refer to:
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=881527+0+archive/2001/freebsd-hackers/20011202.freebsd-hackers

This post links the bug to the 'FreeBSD performing worse than Linux?' thread:
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=776672+0+archive/2001/freebsd-hackers/20011202.freebsd-hackers
    
    https://docs.freebsd.org/cgi/getmsg.cgi?fetch=867302+0+archive/2001/freebsd-hackers/20011202.freebsd-hackers

Further, in 2014 Illumos fixed a similar issue related to poor performance with
max burst.

Burst limitation was removed from delphix, discussed in an Illumos issue:

https://github.com/delphix/delphix-os/commit/86e502910bb74d1b60b08940d03230aa514504ac
https://www.illumos.org/issues/5295

From this reading, I think max burst size has some problems. I think
draft-hughes-restart-00 (https://tools.ietf.org/html/draft-hughes-restart-00)
covers the problems with burst mitigation and some potential solutions.

Using a maxburst alone is going to have this issue with the ACK clock unless
you introduce an additional mechanism. One of the mechanisms in the restart
draft is a send timer; that might be somewhere to look.

imp added inline comments.
sys/netinet/tcp_output.c
1610

One would think a symbolic constant might be in order here.

Also, the comment below seems stale...

I believe rS87145 was submitted because of some unexpected DelayedACK
implementation, or it might just be that DelayedACK was expecting two full MSS
(instead of two segments, which was updated later) in FreeBSD 4.2/4.3 in 2001.

ref:
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=881527+0+archive/2001/freebsd-hackers/20011202.freebsd-hackers

Without the patch (meaning rS87145), either of two things will solve or partially solve the problem:

  • Turn off delayed acks on the receiver (performance 80K->6.8MB/sec)

OR

  • Turn off newreno on the transmitter. (performance 80K->7.9MB/sec)

sys/netinet/tcp_output.c
261–263

Can you elaborate how this translates to max cwnd/2? Is that divide and then multiply (tp->snd_cwnd>>1) again necessary?

1617–1618

If TSO is on, this is the maximum number of TSO chunks to be sent out, not packets.

rscheff marked an inline comment as not done. Aug 8 2019, 11:40 PM
rscheff added inline comments.
sys/netinet/tcp_output.c
261–263

Yes, you are correct - I couldn't translate my Excel formula properly here. See below for the intended burst size relative to flightsize vs cwnd.

The idea here is to set maxburst relative to flight_size: the maximum burst is limited to half of cwnd, which is reached when flightsize (pipe) is also half of cwnd (allowing for clients that ACK only twice per RTT), with smaller maximum bursts when the pipe is nearly empty or nearly full. This prevents line-rate bursts of a huge cwnd from occurring (as we have seen during cubic testing).

The formula (relative to cwnd) here should be
maxburst [bytes] = (0.5*cwnd [bytes])-abs(flightsize [bytes]-(0.5*cwnd [bytes])).

thus

maxburst = (tp->snd_cwnd>>1) - abs( (tp->snd_max - tp->snd_una) - (tp->snd_cwnd>>1) ).

The drawback compared to pacing is that a full flight will not necessarily be re-established within one RTT, but a much more aggressive ramp-up to cwnd will happen over a few RTTs (potentially with bursts too large to get absorbed by network queue buffers).

https://www.wolframalpha.com/input/?i=plot+y%3Dmax(0.1,0.5-abs(x-0.5))+from+x%3D0+to+1
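
As a minimal sketch of the formula above, the following computes the burst limit in bytes with a floor at the initial window, as described in the summary. The function and parameter names (dyn_maxburst, cwnd, flight, iw) are illustrative assumptions, not identifiers from the patch. The sketch uses signed 64-bit intermediates on the assumption that the byte counters are unsigned, in which case a plain subtraction could wrap before abs() is applied.

```c
#include <stdint.h>

/*
 * Illustrative only: triangular burst limit in bytes.
 *   burst = cwnd/2 - |flight - cwnd/2|, but never below `iw`.
 * `iw` stands for the initial window in bytes (an assumed parameter).
 */
static uint32_t
dyn_maxburst(uint32_t cwnd, uint32_t flight, uint32_t iw)
{
	int64_t half = cwnd / 2;
	int64_t dist = (int64_t)flight - half;
	int64_t burst;

	if (dist < 0)
		dist = -dist;		/* |flight - cwnd/2| */
	burst = half - dist;		/* negative if flight > cwnd */
	if (burst < (int64_t)iw)
		burst = iw;		/* floor at the initial window */
	return ((uint32_t)burst);
}
```

For a cwnd of 100000 bytes and an initial window of 14480 bytes, this yields 50000 when flight is 50000, 25000 when flight is 25000, and the 14480 floor when the pipe is empty or full. Note that when the initial window is larger than cwnd/2, the floor takes precedence over the cwnd/2 cap in this sketch.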

1617–1618

Yes, I have not looked into dealing with TSO chunks; ideally, that should be taken into account as well...