It was observed that in very congested networks (multiple CE marks
per RTT, over more than 10 windows (RTTs)), cwnd would collapse down to
1 byte. Furthermore, the stack would then transmit only when the persist
timer fires, which has the further implication that the newly sent data
is erroneously sent out without the ECT0 codepoint.
Apparently, some (Linux?) clients do not process the TCP ECN header flags
unless the IP ECN codepoint is ECT0, ECT1 or CE. Thus the slowly sent new
data, carrying the IP ECN codepoint "Not-ECT" and the TCP ECN header flag
CWR, does not unlatch the client's ECE flag. The result is a deadlock in
which cwnd remains at 1 byte for lengthy periods (~5 min) before the
receiver eventually clears ECE and normal cwnd processing can resume.
In short, when cubic cc enters a new round of CC_ECN, the function
cubic_update_ssthresh effectively sets cwnd, but that function does NOT
enforce a lower bound. Thus cwnd can actually become as small as 1 byte.
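
A minimal sketch of the effect, not the actual cc_cubic code: the
fixed-point constants, the sk_ helper names and the floor_bytes
parameter are all assumptions made for illustration. It contrasts a
multiplicative decrease without a floor against one that is clamped.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative fixed-point decrease factor (about 0.8); an assumption,
 * not necessarily the value the stack uses. */
#define SK_SHIFT 8
#define SK_BETA  204

/* Decrease with no lower bound: repeated CE rounds shrink the window
 * all the way down to a byte or less. */
static uint64_t
sk_reduce_unbounded(uint64_t cwnd)
{
	return ((cwnd * SK_BETA) >> SK_SHIFT);
}

/* Same decrease, but never below floor_bytes (hypothetical parameter,
 * e.g. a small multiple of the segment size). */
static uint64_t
sk_reduce_clamped(uint64_t cwnd, uint64_t floor_bytes)
{
	uint64_t next;

	assert(floor_bytes > 0);
	next = sk_reduce_unbounded(cwnd);
	return (next < floor_bytes ? floor_bytes : next);
}
```

With a floor in place, each further CC_ECN round leaves cwnd at the
floor instead of shrinking it again.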
This conditional in tcp_output:
    !((tp->t_flags & TF_FORCEDATA) && len == 1))
is presumably meant to prevent RTO- or persist-timer-triggered
transmissions (window probes (?) etc.) from having the ECT0 codepoint set.
I suspect it wouldn't be necessary to fix that conditional though, if
cwnd is prevented from collapsing to less than 2 bytes...
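
A hedged sketch of how that conditional interacts with a 1-byte cwnd.
Only the TF_FORCEDATA/len term is taken from the text above; the other
terms, the flag value and the sk_ names are assumptions standing in for
the real tcp_output logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TF_FORCEDATA; the real flag value differs. */
#define SK_TF_FORCEDATA 0x1u

/* Returns true if the segment would be marked ECT0 under this sketch.
 * new_data and sack_rxmit are assumed extra conditions (only new,
 * non-retransmitted data is marked). */
static bool
sk_marks_ect0(uint32_t t_flags, uint32_t len, bool new_data, bool sack_rxmit)
{
	return (len > 0 && new_data && !sack_rxmit &&
	    !((t_flags & SK_TF_FORCEDATA) && len == 1));
}
```

A 1-byte segment forced out by the persist timer fails the last term, so
it leaves without ECT0; any forced segment longer than 1 byte would still
be marked.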
Finally, there was an oversight where max() (which typecasts its
arguments in the kernel) was used instead of ulmax() in an assignment,
so cwnd was not properly checked for a lower bound there either.
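
To illustrate the narrowing only (not the specific line that was
changed): the kernel's max() takes u_int arguments while ulmax() takes
u_long. The sketch below emulates the two with fixed-width types, as on
an LP64 platform; the sk_ names are hypothetical.

```c
#include <stdint.h>

/* Emulates max(u_int, u_int). */
static uint32_t
sk_max_u32(uint32_t a, uint32_t b)
{
	return (a > b ? a : b);
}

/* Emulates ulmax(u_long, u_long) with a 64-bit u_long. */
static uint64_t
sk_ulmax_u64(uint64_t a, uint64_t b)
{
	return (a > b ? a : b);
}

/* What a call through the narrow variant amounts to: both operands are
 * converted to 32 bits before the comparison. The casts are written out
 * here; in the kernel call the conversion is implicit. */
static uint64_t
sk_floor_narrow(uint64_t cwnd, uint64_t floor_bytes)
{
	return (sk_max_u32((uint32_t)cwnd, (uint32_t)floor_bytes));
}

/* The wide variant compares the full values. */
static uint64_t
sk_floor_wide(uint64_t cwnd, uint64_t floor_bytes)
{
	return (sk_ulmax_u64(cwnd, floor_bytes));
}
```

Any operand with bits set above the low 32 is compared by its truncated
value in the narrow variant, so the result is not a reliable bound.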