Page MenuHomeFreeBSD

Add Boundary and Overflow checks in Cubic formulas
ClosedPublic

Authored by rscheff on Feb 8 2019, 2:26 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Mar 25, 5:07 AM
Unknown Object (File)
Mon, Mar 25, 1:03 AM
Unknown Object (File)
Fri, Mar 22, 11:07 PM
Unknown Object (File)
Mon, Mar 18, 9:11 PM
Unknown Object (File)
Feb 21 2024, 11:22 PM
Unknown Object (File)
Feb 16 2024, 2:11 AM
Unknown Object (File)
Feb 8 2024, 8:34 AM
Unknown Object (File)
Jan 17 2024, 3:49 PM

Details

Summary

This Diff is split off from D18954, and deals only with a number
of integer overflow scenarios and missing boundary checks.

Test Plan

To invoke the problem case, the session must have expirienced loss (to
pull down ssthresh to a low value). Thereafter, provided there is no
additional data loss but the session runs application limited for longer
periods (sender only sends a small amount of data, waits for less than
RTO interval, and sends only a few kB every 10s of milliseconds),
the cubic-calculated cwnd may end up negative or beyond INT_MAX.

Some overflows (w_tf) are apparent during low-latency sessions immediately,
other (ticks_since_cong) may only happen after very long periods of time
(~3 weeks) and cubic_cwnd only for large K/max_cwnd values.

Once the sender application starts to provide large amounts of data,
a line-rate traffic burst is unavoidable as cwnd may have grown to many
MB or GB in size.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

While splitting these overflow checks out from the other Diff, I found that there is another typecast overflow, and potential negative overflow in cubic_cwnd.

First, the typecast for leftshifting ticks_since_cong was one bracket too far out to be effective (max ticks_since_cong to overflow after 2^23 milliseconds, or ~2:20hrs)

cwnd can easily become negative, for “large” values of K (correlates to large max_cwnd), and shortly after a cong / after-idle event, without any check for that.

Then the well known overflow on taking the cube of too large values.

Next, the multiplication by smss before the right-shift has potentially higher precision (of an rough estimation of K), but with a probable overflow after 11050 ticks (11 seconds) @ MTU9000.

Finally, a negative return value would be cast to a unsigned int later, thus again leading to an (incorrect) very large effective value for cwnd, so restricting any negative values to zero is the remedy.

sys/netinet/cc/cc_cubic.h
175 ↗(On Diff #53687)

typecast here applies after the leftshift. (2^23 will become -1)

204 ↗(On Diff #53687)

cwnd could still be negative here. The function returns unsigned long, which could typecast this wrong.

  • replace one division with a multiplication, with proper brackets. Taken from D8021

I prefer D18982 to be submitted. If D18982 is submitted, we don't need this patch as D18982 covers everything here.

  • put magic number as #define

I learned that the SCE project group is looking into qualification of TCP Cubic; As R. Grimes is part of that group, adding him as a reviewer, to validate the functional changes to Cubic are sound.

I usually do the comparison separate from formula when they already exist, but either is ok with me.

This revision is now accepted and ready to land.Nov 16 2019, 4:14 AM