Page MenuHomeFreeBSD

Add Boundary and Overflow checks in Cubic formulas
ClosedPublic

Authored by rscheff on Feb 8 2019, 2:26 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 25, 9:01 AM
Unknown Object (File)
Sun, Jan 19, 6:44 PM
Unknown Object (File)
Thu, Jan 9, 1:39 AM
Unknown Object (File)
Wed, Jan 8, 4:26 AM
Unknown Object (File)
Dec 22 2024, 6:49 AM
Unknown Object (File)
Dec 22 2024, 6:45 AM
Unknown Object (File)
Dec 21 2024, 4:22 PM
Unknown Object (File)
Dec 21 2024, 4:22 PM

Details

Summary

This Diff is split off from D18954, and deals only with a number
of integer overflow scenarios and missing boundary checks.

Test Plan

To invoke the problem case, the session must have expirienced loss (to
pull down ssthresh to a low value). Thereafter, provided there is no
additional data loss but the session runs application limited for longer
periods (sender only sends a small amount of data, waits for less than
RTO interval, and sends only a few kB every 10s of milliseconds),
the cubic-calculated cwnd may end up negative or beyond INT_MAX.

Some overflows (w_tf) are apparent during low-latency sessions immediately,
other (ticks_since_cong) may only happen after very long periods of time
(~3 weeks) and cubic_cwnd only for large K/max_cwnd values.

Once the sender application starts to provide large amounts of data,
a line-rate traffic burst is unavoidable as cwnd may have grown to many
MB or GB in size.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 22432
Build 21589: arc lint + arc unit

Event Timeline

While splitting these overflow checks out from the other Diff, I found that there is another typecast overflow, and potential negative overflow in cubic_cwnd.

First, the typecast for leftshifting ticks_since_cong was one bracket too far out to be effective (max ticks_since_cong to overflow after 2^23 milliseconds, or ~2:20hrs)

cwnd can easily become negative, for “large” values of K (correlates to large max_cwnd), and shortly after a cong / after-idle event, without any check for that.

Then the well known overflow on taking the cube of too large values.

Next, the multiplication by smss before the right-shift has potentially higher precision (of an rough estimation of K), but with a probable overflow after 11050 ticks (11 seconds) @ MTU9000.

Finally, a negative return value would be cast to a unsigned int later, thus again leading to an (incorrect) very large effective value for cwnd, so restricting any negative values to zero is the remedy.

sys/netinet/cc/cc_cubic.h
175

typecast here applies after the leftshift. (2^23 will become -1)

204

cwnd could still be negative here. The function returns unsigned long, which could typecast this wrong.

  • replace one division with a multiplication, with proper brackets. Taken from D8021

I prefer D18982 to be submitted. If D18982 is submitted, we don't need this patch as D18982 covers everything here.

  • put magic number as #define

I learned that the SCE project group is looking into qualification of TCP Cubic; As R. Grimes is part of that group, adding him as a reviewer, to validate the functional changes to Cubic are sound.

I usually do the comparison separate from formula when they already exist, but either is ok with me.

This revision is now accepted and ready to land.Nov 16 2019, 4:14 AM