
New SACK (RFC6675)
ClosedPublic

Authored by rscheff on Jan 26 2019, 9:29 PM.

Details

Summary

RFC6675, the update to SACK loss recovery (RFC3517), has the following new features:

  • an improved pipe calculation, which was already partially implemented*
  • improved logic for deciding when to enter Loss Recovery
  • as icing on the cake, RescueRetransmission was added

With RFC6675, each independent SACK block, and each SMSS of SACKed data, counts toward crossing the DupThresh threshold.
Previously, 3 independent duplicate ACKs were required (4 ACKs with the same acknowledgement number). Now even the first DupACK seen can start
loss recovery, provided the SACK information it carries indicates that more than 2 MSS of data were received (2 MSS + 1 byte, spanning at least 3 different
segments).
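
The entry condition described above can be sketched roughly as follows. This is a simplified model of the IsLost() check from RFC 6675, not the code in this patch: the function name, the tuple representation of SACK blocks, and the absence of sequence number wraparound handling are all assumptions made for illustration.

```python
DUP_THRESH = 3  # DupThresh as used in RFC 6675


def is_lost(seq, sack_blocks, smss, dup_thresh=DUP_THRESH):
    """Return True if the segment starting at `seq` should be treated as lost.

    sack_blocks: hypothetical scoreboard, a list of disjoint, non-adjacent
    (start, end) sequence ranges with `end` exclusive.
    Sequence number wraparound is ignored in this sketch.
    """
    # Only SACKed data above the candidate segment counts.
    above = [(start, end) for start, end in sack_blocks if start > seq]
    sacked_bytes = sum(end - start for start, end in above)
    # Either DupThresh discontiguous SACKed sequences, or more than
    # (DupThresh - 1) * SMSS bytes SACKed above `seq`.
    return len(above) >= dup_thresh or sacked_bytes > (dup_thresh - 1) * smss
```

With an SMSS of 1000, a single SACK block covering 2001 bytes above the hole satisfies the byte-based condition on the very first DupACK, while a block of exactly 2000 bytes does not.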

Especially where the return path exhibits high (ACK) loss rates ("ACK thinning"), this can help start loss recovery sooner.

RescueRetransmission is a feature intended in particular for request-response traffic, where additional data may only be made available to TCP
after all the previous data has been delivered in full to the client. This pattern is prevalent in IO (storage, e.g. NFS) environments. At the end
of loss recovery (all known missing data has been retransmitted, but no SACK block covering the recovery point was received, and no additional data is available to
send), the sender will transmit the very last segment once more, if possible. If a burst loss happened to drop one or more packets just to the left of snd_max,
TCP without this feature has to fall back to RTO to recover from such a situation.
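
A minimal sketch of the rescue retransmission decision, loosely following rule (4) of NextSeg() in RFC 6675. The function and parameter names are hypothetical and do not correspond to tcpcb fields; the sketch assumes the last segment below snd_max is still unSACKed, that all other retransmission rules have already failed, and it ignores sequence number wraparound.

```python
def rescue_retransmission(high_ack, rescue_rxt, recovery_point, snd_max, smss):
    """Return (segment, new_rescue_rxt).

    segment is a (start, end) range to retransmit, or None if no rescue
    retransmission is allowed. rescue_rxt is None while undefined.
    """
    if snd_max <= high_ack:
        # Nothing outstanding, so there is nothing to rescue.
        return None, rescue_rxt
    if rescue_rxt is not None and high_ack <= rescue_rxt:
        # The single rescue retransmission for this recovery episode
        # has already been used.
        return None, rescue_rxt
    # Resend up to one SMSS ending at snd_max, i.e. the very last segment.
    start = max(high_ack, snd_max - smss)
    return (start, snd_max), recovery_point
```

Setting the returned marker to the recovery point is what limits the sender to one rescue retransmission per entry into loss recovery.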

  • The prior rfc6675_pipe calculation only considered data received in the last SACK option. For more than (typically) 3 separate losses in a window, the TCP option itself does not carry all of the relevant data. D18624 addresses this, and provides additional state data relevant for PRR too.
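
To illustrate why the last SACK option alone can undercount, here is a small sketch that accumulates SACK blocks across ACKs into a scoreboard. The helper names and the list-of-tuples scoreboard are assumptions for illustration only and are unrelated to the actual data structures in D18624; wraparound is ignored.

```python
def merge_sack_blocks(scoreboard, new_blocks):
    """Merge newly reported (start, end) blocks into the scoreboard.

    Ranges use an exclusive end. Overlapping or touching ranges are
    coalesced, and the result is sorted by start.
    """
    merged = []
    for start, end in sorted(scoreboard + new_blocks):
        if merged and start <= merged[-1][1]:
            # Extend the previous range instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def sacked_bytes(blocks):
    """Total number of bytes covered by the given blocks."""
    return sum(end - start for start, end in blocks)
```

For example, if a first ACK reports blocks (1000, 2000), (3000, 4000), (5000, 6000) and a later ACK reports (7000, 8000), (5000, 6000), (3000, 4000), the merged scoreboard covers 4000 bytes, whereas the last option on its own accounts for only 3000.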

This patch also contains D18624, since RFC6675 makes use of many of these SACK state variables, which should be calculated properly.

All changes are enabled via the sysctl "rfc6675_pipe" for now.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

  • revert tcpcb variable name used during testing, which was missed earlier

Packetdrill script to validate both new features: Rescue Retransmission, and entering Loss Recovery on a PartialAck with a 3 MSS SACK block.

sys/netinet/tcp_input.c
2504

New variables should not be declared in the middle of a function.

  • updating diff to head and removing unrelated whitespace changes
rscheff added inline comments.
sys/netinet/tcp_input.c
2491

remove style changes.

2646–2660

remove superfluous (), and move comment within the bracket, and rewrite comment

Add documentation that do_rfc6675 will enable rescue retransmission with this patch.

Discussed in the transport call again: rebase to main, and validate that the code is still functional and doesn't break anything obvious. After some soak time in main, toggle the sysctl for rfc6675_pipe, as all the aspects of 6675 would then be in place: the better trigger to enter loss recovery (enabled for a long time), the improved pipe calculation, and, with this patch, rescue retransmission.

Without toggling the sysctl in main, some rare codepath or compatibility issues might lurk undetected forever, whereas while active development work is ongoing, fixing bugs and addressing issues would be quick.

  • Merge branch 'main' into D18985_sack_rescue_retx
  • expand tcp(4) man page to discuss new mechanisms enabled by rfc6675_pipe
  • on partial ack w/o SACK blk, use sack_partial_ack for rescuerexmit

Validated 6675 style loss recovery entry, and rescue retransmissions using the test script. Addressed an interop issue with PRR (partial ACK with no SACK blocks should be dealt with in sack_partialack, not prr_partialack).

Had to adjust the testing script to accommodate the new timing/expedited retransmissions with PRR.

Further steps discussed in this week's transport call:

Will commit this diff, since new functionality is fenced behind the sysctl net.inet.tcp.rfc6675_pipe.

As exposure of these loss recovery improvements is typically low due to the non-default setting, in a few weeks, toggle the sysctl in HEAD to enable them and collect broader feedback on network loss recovery efficiency (the rate of TCP retransmission timeouts is expected to decrease slightly with PRR and 6675 newsack).

If results are as expected, rename "rfc6675_pipe" to "newsack" (it has been discussed that mentioning RFCs directly in sysctls is bad practice) and MFC to stable/13, enabled by default.

FWIW, I did perform a statistically questionable quick "test", comparing this patch (with rfc6675_pipe enabled) against HEAD running just PRR, by effectively browsing over some Alexa Top-343 websites.
The number of RTOs went down and the number of exchanged SACK blocks went up (as would be expected).
Average and stdev of flow completion time went down (~10%), while the median went up slightly (1%). However, this was not a truly valid test, as caching and temporary effects could have affected these results, especially as only the GET request itself (with all the headers and redirections) could have improved in goodput.

This revision is now accepted and ready to land. Feb 16 2021, 10:58 AM