Details

Reviewers

lstewart
gnn
sbruno
hiren

Group Reviewers

transport

Commits

rS331567: CC Cubic: fix underflow for cubic_cwnd()

Summary

Singed calculations in cubic_cwnd() can result in negative cwnd
value which is then cast to an unsigned value. Values less than
1 mss are generally bad for other parts of the code, also fixed.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

jason_eggnet.com created this revision.Jan 31 2018, 4:43 AM

Harbormaster completed remote builds in B14689: Diff 38703.Jan 31 2018, 4:43 AM

jason_eggnet.com added a reviewer: transport.Jan 31 2018, 4:44 AM

jason_eggnet.com added subscribers: transport, jtl.

jason_eggnet.com removed a subscriber: transport.

jtl added a reviewer: lstewart.Jan 31 2018, 5:03 AM

Is this truly an underflow? Or, is it a low number caused by an overflow in one of the earlier calculations? (It might help if you could give a precise example from your testing.)

The distinction is important because this might not be the correct solution to an overflow in one of the earlier calculations.

sys/netinet/cc/cc_cubic.h
164 ↗	(On Diff #38703)	It sounds like you may be doing the action item @lstewart left for himself. :-)

In D14141#296724, @jtl wrote:

Is this truly an underflow? Or, is it a low number caused by an overflow in one of the earlier calculations? (It might help if you could give a precise example from your testing.)

The distinction is important because this might not be the correct solution to an overflow in one of the earlier calculations.

There definitely is a problem with overflow when ticks_since_cong is too high. The fix for that isn't as clear as the fix for this issue, hence @lstewart's punt to the future. But it doesn't take very many ticks to overflow, so we should definitely tackle that fix soon.

The problem in this review is with a negative cwnd (which becomes an underflow once cast on return) and has I believe a simple solution. Here's a test program and results with inputs seen in production.

It's possible the fix to the overflow issue has an impact to the solution for underflow, if we want to wait for that, but I think this fix here will remain part of the solution for both.

test.c:

#include <stdio.h>

#define CUBIC_C_FACTOR          102
#define CUBIC_SHIFT             8
#define CUBIC_SHIFT_4           32

int hz = 1000;

unsigned long
cubic_cwnd(int ticks_since_cong, unsigned long wmax, unsigned int smss, long K)
{
        long cwnd;

        /* K is in fixed point form with CUBIC_SHIFT worth of precision. */

        /* t - K, with CUBIC_SHIFT worth of precision. */
        cwnd = ((long)(ticks_since_cong << CUBIC_SHIFT) - (K * hz)) / hz;

        printf("tmp cwnd %ld\n", cwnd);

        /* (t - K)^3, with CUBIC_SHIFT^3 worth of precision. */
        cwnd *= (cwnd * cwnd);

        printf("tmp cwnd2 %ld\n", cwnd);

        printf("final cwnd %ld\n", (cwnd * CUBIC_C_FACTOR * smss) >> CUBIC_SHIFT_4);
        /*
         * C(t - K)^3 + wmax
         * The down shift by CUBIC_SHIFT_4 is because cwnd has 4 lots of
         * CUBIC_SHIFT included in the value. 3 from the cubing of cwnd above,
         * and an extra from multiplying through by CUBIC_C_FACTOR.
         */
        cwnd = ((cwnd * CUBIC_C_FACTOR * smss) >> CUBIC_SHIFT_4) + wmax;

        return cwnd;
}

int main(int argc, char **argv) {
        unsigned long cwnd;

        cwnd = cubic_cwnd(170 + 56, 1460UL, 1200, 462L);

        printf("cwnd %lu\n", cwnd);
}

$ cc -O2 test.c
$ ./a.out
tmp cwnd -404
tmp cwnd2 -65939264
final cwnd -1880
cwnd 18446744073709551196

sbruno added a subscriber: sbruno.Feb 3 2018, 4:25 PM

gnn accepted this revision.Feb 4 2018, 7:26 AM

This revision is now accepted and ready to land.Feb 4 2018, 7:26 AM

I remember this one. :-)
jegg, can you please elaborate on 'cwnd < 1smss bad for other parts of the code' part?
Thanks.

In D14141#297944, @hiren wrote:

I remember this one. :-)
jegg, can you please elaborate on 'cwnd < 1smss bad for other parts of the code' part?
Thanks.

A zero cwnd can cause infinite looping (i.e., continual goto again; looping) within tcp_output(). Technically even one byte would be ok but there's no real point to having the cwnd be some fractional mss value.

In general the concept of having a zero cwnd doesn't make any sense.

In D14141#297953, @jason_eggnet.com wrote:

In D14141#297944, @hiren wrote:

I remember this one. :-)
jegg, can you please elaborate on 'cwnd < 1smss bad for other parts of the code' part?
Thanks.

A zero cwnd can cause infinite looping (i.e., continual goto again; looping) within tcp_output(). Technically even one byte would be ok but there's no real point to having the cwnd be some fractional mss value.

In general the concept of having a zero cwnd doesn't make any sense.

oh, I know 0 won't do any good. :-)

I meant why not 0 < cwnd < mss. I know we generally stick to multiple of mss for cwnd but just wondering what goes wrong if we don't. (Specially when we are bytes based as opposed to packets based wrt cwnd).

Thanks.

BTW, I am okay with this bandaid fix for now.

In D14141#297959, @hiren wrote:

In D14141#297953, @jason_eggnet.com wrote:

In D14141#297944, @hiren wrote:

I remember this one. :-)
jegg, can you please elaborate on 'cwnd < 1smss bad for other parts of the code' part?
Thanks.

A zero cwnd can cause infinite looping (i.e., continual goto again; looping) within tcp_output(). Technically even one byte would be ok but there's no real point to having the cwnd be some fractional mss value.

In general the concept of having a zero cwnd doesn't make any sense.

oh, I know 0 won't do any good. :-)

I meant why not 0 < cwnd < mss. I know we generally stick to multiple of mss for cwnd but just wondering what goes wrong if we don't. (Specially when we are bytes based as opposed to packets based wrt cwnd).

Thanks.

Ah. 0 < cwnd < mss is *probably* ok. The trickiest part to validating that would be following the cwnd == 1 (byte) codepath. But I didn't want to have to prove that for this patch.

This fix doesn't make sense to me. If it has been a long time since the last congestion epoch resulting in an overflow in the calculated congestion window, collapsing the window from <something_large> to 1 segment is a terrible idea. There's also no sound reason that cwnd shouldn't be allowed to shrink below 1 segment, and if that causes problems elsewhere, those places should be fixed.

A more appropriate fix would be to check ticks_since_cong at the top of this function and if it's greater than the maximal value at which overflow would occur, do something simple like return cwnd += bytes_this_ack i.e. switch to slowstart-like window doubling behaviour. I haven't done the math, but presume that would be less aggressive growth than the cubic function would be producing this far along the cubic growth curve, but at least keeps the window growing at a not too shabby rate.

lstewart added inline comments.Feb 7 2018, 2:13 AM

sys/netinet/cc/cc_cubic.h
164 ↗	(On Diff #38703)	Indeed and mea culpa! :/

Oh and @jason_eggnet.com , regarding your test code, I structured things the way they are so that you can simply #include <netinet/cc/cc_cubic.h> into your userspace test.c to get access to the various window calculation inlines rather than copy/pasting them.

@jason_eggnet.com I thought about this some more and while there is no doubt that overflow/underflow due to the function's inputs is possible and needs to be remedied, the cause of overflow in your test case is not in fact the time since congestion being too large, but the bogus value of K, which for a wmax of 1460 bytes (i.e. slightly more than 1 MSS) should be 205 per my quick check:

lstewart@lsl:~/devel % cat cubic_cwnd_overflow_test.c
#include <sys/types.h>
#include <netinet/cc/cc_cubic.h>
#include <stdio.h>

int hz = 1000;

int
main(int argc, char *argv[])
{
	unsigned long cwnd, cwnd_max;
	uint32_t mss;
	int ticks_since_cong;

	cwnd_max = 1460;
	mss = 1200;
	ticks_since_cong = 226;

	printf("K -- theoretical vs fixed-point : %0.3f vs %jd\n"
	    "cwnd -- theoretical vs fixed-point : %lu vs %lu\n",
	    theoretical_cubic_k((double)cwnd_max / (double)mss),
	    cubic_k(cwnd_max / mss),
	    theoretical_cubic_cwnd(ticks_since_cong, cwnd_max, mss),
	    cubic_cwnd(ticks_since_cong, cwnd_max, mss, cubic_k(cwnd_max / mss)));

	return 0;
}
lstewart@lsl:~/devel % cc -lm -o cubic_cwnd_overflow_test cubic_cwnd_overflow_test.c
lstewart@lsl:~/devel % ./cubic_cwnd_overflow_test
K -- theoretical vs fixed-point : 216.914 vs 205
cwnd -- theoretical vs fixed-point : 1344 vs 1369

j-nitrology.com added a subscriber: j-nitrology.com.Feb 8 2018, 7:09 AM

In D14141#298565, @lstewart wrote:

@jason_eggnet.com I thought about this some more and while there is no doubt that overflow/underflow due to the function's inputs is possible and needs to be remedied, the cause of overflow in your test case is not in fact the time since congestion being too large, but the bogus value of K, which for a wmax of 1460 bytes (i.e. slightly more than 1 MSS) should be 205 per my quick check:

I think you are correct. Looks like K is just stale.

I've been running with this patch and the underflow condition appears to have been resolved. Going to attempt to patch the overflow conditions (nothing proposed yet) and see if that resolves all outlying values.

diff --git a/sys/netinet/cc/cc_cubic.c b/sys/netinet/cc/cc_cubic.c
index 7571ca764ca..98fce0cf3ca 100644
--- a/sys/netinet/cc/cc_cubic.c
+++ b/sys/netinet/cc/cc_cubic.c
@@ -86,6 +86,10 @@ struct cubic {
        unsigned long   max_cwnd;
        /* cwnd at the previous congestion event. */
        unsigned long   prev_max_cwnd;
+       /* Cached value for max_cwnd when K was computed */
+       unsigned long   k_max_cwnd;
+       /* Cached value for t_maxseg when K was computed */
+       uint32_t        k_maxseg;
        /* Number of congestion events. */
        uint32_t        num_cong_events;
        /* Minimum observed rtt in ticks. */
@@ -170,6 +174,12 @@ cubic_ack_received(struct cc_var *ccv, uint16_t type)
                            cubic_data->mean_rtt_ticks, cubic_data->max_cwnd,
                            CCV(ccv, t_maxseg));

+                       if (cubic_data->max_cwnd != cubic_data->k_max_cwnd || cubic_data->k_maxseg != CCV(ccv, t_maxseg)) {
+                               cubic_data->K = cubic_k(cubic_data->max_cwnd / CCV(ccv, t_maxseg));
+                               cubic_data->k_maxseg = CCV(ccv, t_maxseg);
+                               cubic_data->k_max_cwnd = cubic_data->max_cwnd;
+                       }
+
                        w_cubic_next = cubic_cwnd(ticks_since_cong +
                            cubic_data->mean_rtt_ticks, cubic_data->max_cwnd,
                            CCV(ccv, t_maxseg), cubic_data->K);
@@ -237,6 +247,7 @@ cubic_cb_init(struct cc_var *ccv)

        /* Init some key variables with sensible defaults. */
        cubic_data->t_last_cong = ticks;
+       cubic_data->k_maxseg = CCV(ccv, t_maxseg);
        cubic_data->min_rtt_ticks = TCPTV_SRTTBASE;
        cubic_data->mean_rtt_ticks = 1;

@@ -309,6 +320,7 @@ cubic_conn_init(struct cc_var *ccv)
         * this here bad things happen when entries from the TCP hostcache
         * get used.
         */
+       cubic_data->prev_max_cwnd = CCV(ccv, snd_cwnd);
        cubic_data->max_cwnd = CCV(ccv, snd_cwnd);
 }

@@ -372,7 +384,6 @@ cubic_post_recovery(struct cc_var *ccv)

        cubic_data->epoch_ack_count = 0;
        cubic_data->sum_rtt_ticks = 0;
-       cubic_data->K = cubic_k(cubic_data->max_cwnd / CCV(ccv, t_maxseg));
 }

 /*
diff --git a/sys/netinet/cc/cc_cubic.h b/sys/netinet/cc/cc_cubic.h
index 4b452f72726..e8c2e28822c 100644
--- a/sys/netinet/cc/cc_cubic.h
+++ b/sys/netinet/cc/cc_cubic.h
@@ -184,8 +184,10 @@ cubic_cwnd(int ticks_since_cong, unsigned long wmax, uint32_t smss, int64_t K)
         */
        cwnd = ((cwnd * CUBIC_C_FACTOR * smss) >> CUBIC_SHIFT_4) + wmax;

+/*
        if (cwnd < (int64_t)smss)
                return (uint64_t)smss;
+*/

        return ((unsigned long)cwnd);
 }

@jason_eggnet.com why did you move K calculation from post recovery to ack received? I don't think that's correct.

In D14141#299283, @hiren wrote:

@jason_eggnet.com why did you move K calculation from post recovery to ack received? I don't think that's correct.

Because the only time the value of K actually matters is as an argument to the cubic_cwnd() function. That function is only called one time, in cubic_ack_received().

K itself is just the cached result of calling cubic_k(cubic_data->max_cwnd / CCV(ccv, t_maxseg))

cubic_k() is a pure function (at run time anyway) whose result only depends on the value of its inputs, the max_cwnd and t_maxseg.

So basically, why not delay the calculation of K until right before it is needed. Use the previous value of K if neither of the inputs have changed.

In D14141#299284, @jason_eggnet.com wrote:

In D14141#299283, @hiren wrote:

@jason_eggnet.com why did you move K calculation from post recovery to ack received? I don't think that's correct.

So basically, why not delay the calculation of K until right before it is needed. Use the previous value of K if neither of the inputs have changed.

I am hoping this is just the 'debugging' code to find out why we are getting a stale value of K and not something to commit.
Or I am thoroughly confused at this point. :-)

This revision now requires changes to proceed.Feb 9 2018, 6:33 PM

Fix underflow and overflow conditions in the cubic cc

delay the calculation of K until just before it is used
ensure that the value of K is correctly and minimally invalidated
check for underflow and overflow conditions in cubic_cwnd
limit maximum cwnd in cubic_ack_received

handle ticks_since_cong wrapping in cubic cc

When the maximum congestion window is computed, set CCF_MAX_CWND
to skip recomputing the cwnd. When inputs that could materially
affect the cwnd are altered, clear CCF_MAX_CWND.

This prevents ticks_since_cong from wrapping and resetting the cwnd
for a long running connection that lacks congestion.

Harbormaster completed remote builds in B14915: Diff 39124.Feb 10 2018, 1:20 AM

In D14141#299456, @hiren wrote:

In D14141#299284, @jason_eggnet.com wrote:

In D14141#299283, @hiren wrote:

@jason_eggnet.com why did you move K calculation from post recovery to ack received? I don't think that's correct.

So basically, why not delay the calculation of K until right before it is needed. Use the previous value of K if neither of the inputs have changed.

I am hoping this is just the 'debugging' code to find out why we are getting a stale value of K and not something to commit.
Or I am thoroughly confused at this point. :-)

I think you're confused :) I've updated the commit in the review formally, let me know if you have any questions, I think I've detailed it reasonably well.

off by one with overflow threshold in cubic_cwnd

Quick demonstration of overflow/underflow thresholds for cubing a 64 bit integer:

#include <stdio.h>

int main(int argc, char **argv) {
        long i = 0;
        long v = 0;
        long old_v = 0;
        for (;;) {
                v = i*i*i;
                if (v < old_v)
                        break;
                old_v = v;
                i++;
        }
        printf("positive overflow: %ld\n", i);

        i = 0;
        v = 0;
        old_v = 0;
        for (;;) {
                v = i*i*i;
                if (v > old_v)
                        break;
                old_v = v;
                i--;
        }
        printf("negative overflow: %ld\n", i);
}

$ ./a.out
positive overflow: 2097152
negative overflow: -2097153

In D14141#299658, @jason_eggnet.com wrote:

In D14141#299456, @hiren wrote:

In D14141#299284, @jason_eggnet.com wrote:

In D14141#299283, @hiren wrote:

@jason_eggnet.com why did you move K calculation from post recovery to ack received? I don't think that's correct.

So basically, why not delay the calculation of K until right before it is needed. Use the previous value of K if neither of the inputs have changed.

I am hoping this is just the 'debugging' code to find out why we are getting a stale value of K and not something to commit.
Or I am thoroughly confused at this point. :-)

I think you're confused :) I've updated the commit in the review formally, let me know if you have any questions, I think I've detailed it reasonably well.

Except for the K caching piece.

Since there is no signal in the cc module system for notifying the cc for a change in mss, I decided to cache the value at the time of calculating K for purposes of later invalidation.

I took the same approach for max_mss because it was set in many places and I didn't want to trigger more recalculations of K than necessary.

It could be done differently but I'm positive that this method minimizes K calculations. I'm open to better solutions.

use flag instead of cached value for old max_cwnd tracking in cubic

I don't have time to look at this the way I intended. :-)
Removing myself as a reviewer/blocker so others can decide.

sbruno accepted this revision.Mar 22 2018, 11:37 PM

This revision was not accepted when it landed; it landed in state Needs Review.Mar 26 2018, 7:54 PM

Closed by commit rS331567: CC Cubic: fix underflow for cubic_cwnd() (authored by sbruno). · Explain Why

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: imp. · View Herald TranscriptMar 26 2018, 7:54 PM

rscheff added a subscriber: rscheff.Feb 3 2019, 11:26 AM

rscheff mentioned this in D18954: Implement Cubic-specific After-Idle reaction.Feb 3 2019, 6:06 PM