Page MenuHomeFreeBSD

tcp: Align cubic cc with rfc9438
Needs ReviewPublic

Authored by rscheff on Sep 4 2023, 7:55 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Apr 26, 1:39 AM
Unknown Object (File)
Wed, Apr 24, 6:15 PM
Unknown Object (File)
Tue, Apr 23, 4:58 PM
Unknown Object (File)
Mon, Apr 8, 2:05 PM
Unknown Object (File)
Fri, Mar 29, 8:02 PM
Unknown Object (File)
Mar 25 2024, 10:33 PM
Unknown Object (File)
Jan 25 2024, 9:31 PM
Unknown Object (File)
Jan 19 2024, 10:05 PM
Subscribers

Details

Reviewers
cc
tuexen
rrs
jtl
lstewart
glebius
Group Reviewers
transport
Summary

Patch of the cubic congestion control module, provided by
Bhaskar Pardeshi of VMware, Inc.

The primary motivation was to make the performance of cubic_cc
align more closely with the newreno_cc when the RTT is very low.

  1. Renamed a few cubic state variables to the variable names found in the RFC 8312bis specification. I thought it makes the code more understandable for the person reading the RFC and the code. It also makes the variable naming convention more uniform.

Note: This aspect was already committed by D40436

  1. Updated "cubic_k" with the new formula as mentioned in the RFC. The new formula additionally uses "cwnd_epoch" for calculating K.
  1. The major change this patch does is to update the Reno-friendly estimate calculation in the Congestion Avoidance stage. The new formula mentioned in the RFC is used to do this. Updated the "tf_cwnd" function to reflect this formula and introduced a new constant "alpha". Most of the code change is because of the change in the Reno-friendly estimate. Most of the new variables are added because of this only.
  1. After a 3-dupack / ECN-Echo congestion event, the ssthresh was earlier being updated using the current max congestion window value. However, as per RFC 8312bis spec, I have updated the code to use the in-flight data size to update ssthresh.
  1. This patch also resolves a bug in the usage of CUBICFLAG_CONG_EVENT flag. In RTO_ERR case, this flag was being cleared without even checking if there was any other congestion event way before the previous RTO event. I partially reverted the upstream FreeBSD CUBIC change which introduced this flag. Instead of a flag, we are now using a counter to remember the number of congestion events. I guess this flag was introduced in this change - https://reviews.freebsd.org/D23655. I have not removed the flag definition though.
  1. Also, I have updated the code to collect the "epoch" variables (t_epoch, cwnd_epoch, W_est) at a single place (in cubic_ack_received()) rather than collecting them in cubic_cong_signal(). This makes it easier to reset the epoch from multiple places, without using special flags.
  1. Currently, in the CUBIC concave/convex region the cwnd was being set to w_cubic_next directly. However, the RFC mentioned that we MUST increase the window by (target - cwnd) / cwnd only. As a matter of fact, this was mentioned in the old RFCs and drafts too. I have updated the code to increase the window only by (target - cwnd) / cwnd only.

ToDo: Rename references to draft 8312bis to the final RFC 9438.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 56087
Build 52976: arc lint + arc unit

Event Timeline

Thanks for having a plan to sync-up with rfc9438, which is released recently in Aug 2023 and obsoletes rfc8312.

This revision is now accepted and ready to land.Sep 7 2023, 3:35 PM
In D41715#950701, @cc wrote:

Thanks for having a plan to sync-up with rfc9438, which is released recently in Aug 2023 and obsoletes rfc8312.

The jump is from pre-8312 to rfc9438. I will roll at least one update with all the comments changing from 8312bis-draft-15 to rfc9438, but really some regression / performance testing of this patch would be needed. Maybe some servers can run this as patches cc_cubic and statistical data collected (Lawrence may be delighted too ... )

In D41715#950701, @cc wrote:

Thanks for having a plan to sync-up with rfc9438, which is released recently in Aug 2023 and obsoletes rfc8312.

The jump is from pre-8312 to rfc9438. I will roll at least one update with all the comments changing from 8312bis-draft-15 to rfc9438, but really some regression / performance testing of this patch would be needed. Maybe some servers can run this as patches cc_cubic and statistical data collected (Lawrence may be delighted too ... )

Understood. The 8312bis-draft-15 is shaped/finalized into rfc9438 with no changes in formulas. I will help with some emulation tests.

  • update comments from draft-8312bis-15 to RFC9438
This revision now requires review to proceed.Sep 11 2023, 11:58 AM
sys/netinet/cc/cc_cubic.h
228

CUBIC_SHIFT == 8: means (2 * CUBIC_SHIFT) ==16

Why double CUBIC_SHIFT left shift here?

sys/netinet/cc/cc_cubic.c
349

The "target" window unit is bytes, then there shouldn't be an extra multiply of "CCV(ccv, t_maxseg)".

sys/netinet/cc/cc_cubic.h
328

Rename the variable "west" to "W_est".

Also the Figure 4 in RFC 9438 equals below if we use the "bytes_acked" and "cwnd" unit in bytes. There is no need of "smss" in the calculation.

(Figure 4): W_est = W_est + Alpha_cubic * bytes_acked / cwnd

sys/netinet/cc/cc_cubic.c
339–340

Better use UINT_MAX, instead of INT_MAX.

351

Better use UINT_MAX, instead of INT_MAX.

457–458

I think for some clarity of the two CC_NDUPACK and CC_ECN cases, we can rework cubic_ssthresh_update(), and rename it to cubic_get_ssthresh() like below and explicitly update CCV(ccv, snd_ssthresh).

like this:

if (!IN_CONGRECOVERY(CCV(ccv, t_flags))) {
        ssthresh = cubic_get_ssthresh(ccv, mss);
        CCV(ccv, snd_ssthresh) = max(ssthresh, 2 * maxseg);
...
}
475–478

There is a redundant pipe calculation where cubic_ssthresh_update() already did it. If use the new cubic_get_ssthresh() mentioned above, then, these three lines can be reworked to:

if (!IN_CONGRECOVERY(CCV(ccv, t_flags))) {
        ssthresh = cubic_get_ssthresh(ccv, mss);
        CCV(ccv, snd_cwnd) = ulmax(ssthresh, CCV(ccv, t_maxseg));
        CCV(ccv, snd_ssthresh) = max(ssthresh, 2 * maxseg);
...
}
665

May rework this cubic_ssthresh_update() function like below, for better clarity.

static uint32_t
cubic_get_ssthresh(struct cc_var *ccv, uint32_t maxseg) {
...
        return (pipe * CUBIC_BETA) >> CUBIC_SHIFT;
}
sys/netinet/cc/cc_cubic.c
503–504

According to RFC9438 Section 4.8, CUBIC follows Reno to reduce cwnd [RFC5681] but sets ssthresh using βcubic (same as in Section 4.6), so here snd_ssthresh should be updated as:

	ssthresh = (cubic_compute_pipe(ccv) * CUBIC_BETA) >> CUBIC_SHIFT;
        CCV(ccv, snd_ssthresh) = max(ssthresh, 2 * mss);
}
sys/netinet/cc/cc_cubic.c
304

According to rfc9438 section 4.10. Slow Start, K is set to 0 when exiting the first slow start without incurring any packet loss (W_max is undefined).

313

According to RFC9438 Section 4.8 Timeout, W_est SHOULD be set to the congestion window size at the beginning of the current congestion avoidance stage. So the code branch shall be like this:

if (cubic_data->flags & CUBICFLAG_RTO_EVENT) {
        cubic_data->flags &= ~CUBICFLAG_RTO_EVENT;
        W_est = CCV(ccv, snd_cwnd);
        cubic_data->W_max = cubic_data->W_est = W_est;
        cubic_data->K = 0;
} else {
        usecs_since_epoch = (ticks - cubic_data->t_epoch) * tick;
...
        W_est = cubic_data->W_est = tf_cwnd(...);
}
498–500

Missing both "undo_ssthresh" and "undo_cwnd" variables, and their value assignments.

But I wonder this is the place that differs from the RFC9438 Section 4.9 Spurious Congestion Events, where these "undo_*" variables are used for Spurious Fast Retransmits (section 4.9.2.) and only if "cwnd < cwnd_prior".

sys/netinet/cc/cc_cubic.h
228

CUBIC_SHIFT == 8: means (2 * CUBIC_SHIFT) ==16

Why double CUBIC_SHIFT left shift here?

On a second understanding, the old ONE_SUB_CUBIC_BETA (which contains 8bits left shift) is removed in the new formula. Therefore, adding the 8bits left shift makes sense now.

sys/netinet/cc/cc_cubic.h
202

Rename the variable "west" to "W_est".

Assuming the "cwnd" has a unit of byte, then the "smss" variable is no longer needed as in Figure 4.
(Figure 4): W_est = W_est + Alpha_cubic * bytes_acked / cwnd

cc requested changes to this revision.Sep 25 2023, 8:51 PM

Please consider resolving the problems in my comments.

This revision now requires changes to proceed.Sep 25 2023, 8:51 PM
rscheff marked 8 inline comments as done.
  • update comments from draft-8312bis-15 to RFC9438
  • address review comments

With the exception of the RTO rollback, I believe I addressed all your comments; Thanks for double-checking the provided patches against RFC9438. Since there are a few logic differences and obviously the changes in the calculations, proper operation should be validated in a test bed before the commit.

sys/netinet/cc/cc_cubic.c
304

That section in 4.10 talks about the interaction with hystart++, when existing without loss; the base stack doesn't do hystart++, and exists slow start only after loss or ECN mark - with a defined cwnd at that moment of a congestion signal.

cc requested changes to this revision.Oct 28 2023, 12:54 PM
cc added inline comments.
sys/netinet/cc/cc_cubic.c
304

I meant "K is set to 0 when CUBIC uses HyStart++ and when it exits the first slow start without incurring any packet loss". But I may tell the difference after some emulation tests.

459

Sorry for the confusion. I meant to use "mss". So this line should be:

	CCV(ccv, snd_ssthresh) = ulmax(ssthresh, 2 * mss);
476–477

Sorry for the confusion. I meant to use "mss". So these two lines should be:

	CCV(ccv, snd_ssthresh) = ulmax(ssthresh, 2 * mss);
	CCV(ccv, snd_cwnd) = ulmax(ssthresh, mss);
503–504

I prefer to stick with the two lines code for better readability. Will this single line benefit performance?

sys/netinet/cc/cc_cubic.h
202

I meant to remove the "smss" variable. Because:

(Figure 4): W_est = W_est + Alpha_cubic * segments_acked / cwnd_in_segs == W_est + Alpha_cubic * bytes_acked / cwnd_in_bytes

328

I meant to remove the "smss" variable. Because:

(Figure 4): W_est = W_est + Alpha_cubic * segments_acked / cwnd_in_segs == W_est + Alpha_cubic * bytes_acked / cwnd_in_bytes

This revision now requires changes to proceed.Oct 28 2023, 12:54 PM
rscheff marked 8 inline comments as done.
  • fix seg/byte calculation, use precomputed mss in more places
sys/netinet/cc/cc_cubic.c
498–500

the tp->ssthresh_prev are unrolled in the generic CC_RTO_ERR handling, before the CC specific callout is done...
Also, while FBSD emits DSACK when duplicate data segments are received, that information is currently not processed on receipt - so there is no spurious fast retransmit detection yet. (section 4.9.2 doesn't apply).