This will pave the way of setting ssthresh differently in TCP CUBIC, according
to RFC8312 section 4.7.
This is the first patch of two consecutive patches on CUBIC to improve
reaction to (and rollback from) RTO.
Differential D26807
move cwnd and ssthresh updates into individual congestion control module cc on Oct 15 2020, 8:42 PM. Authored by Tags None Referenced Files
Details This will pave the way of setting ssthresh differently in TCP CUBIC, according This is the first patch of two consecutive patches on CUBIC to improve No functional change, only code movement.
Diff Detail
Event Timeline
Comment Actions Hi Lawrence, incrementing by t_maxseg (w/o TSopt) vs. tcp_maxseg() (w/ TSopt/MPTCP) does not lead to a unexpected transmission of 2 segments, when only a single segment would be expected (tcp_output prefers to send full segments but considers option overhead). Incrementing by tcp_maxseg() is sightly less aggressive during CA, not? Comment Actions Allo! Apologies for the delayed reaction... First up, it seems buggy to me that the sender side accounts for received SACK blocks in the "effective" MSS computed by tcp_maxseg() for outbound segments. (Not your bug, but your change amplifies the effect of the bug). I'll try ground the remainder of my thoughts with my interpretation of the relevant "first principles" and practical considerations... The congestion window maintained at the sender attempts to track an estimate of pipe capacity, but does so without knowledge of whether the capacity constraint applies at a particular TCP/IP layer or includes/excludes certain sources of non-application-payload overhead. TCPs typically take the shortcut of approximating pipe capacity purely in terms of TCP segment payload, but most IP paths I'm aware of typically track bits in terms of IP-payload i.e. it is likely more correct for most if not all paths to consider the congestion window as representing pipe capacity inclusive of all TCP segment overhead (fixed header and options). Doing so ensures that for a given congestion window, connections with different TCP segment sizes account for their sent TCP-layer bits the same. Also worth noting that paths often account based on number of packets in addition to or instead of bits i.e. tracking congestion window in terms of both bits and number of segments is potentially of value. So, in a high level sense, accounting for TCP option bits in a given send window worth of data is both justified and arguably more correct. However, using tcp_maxseg() in congestion control accounting effectively moves us further away from this and towards accounting purely based on payload bits. I also have a concern that tcp_maxseg() pessimises calculations for connections which use TCP options that do not appear on every segment (SACK being the most common case). It's a rounding error and unlikely to be of concern for sufficiently large send windows, but its a different story at the small window end of things, and even more so for connections using a smaller than typical MSS. Comment Actions Not quite sure I get your point here; are you saying the tcp_maxseg() is erraneously including received SACK blocks fom the previous ACK? Maybe you can send an annotated code snippet per direct mail?
I agree with you on the first principles approach. We have had this kind of discussion @IETF during reECN (ConEx), and that many don't apprecitate this subtle difference was a major hurdle in the process to get tracktion to AccECN (why there is a higher fidelity bytes-based signal to begin with, and additionally, shall it include vaious header bytes or not). The concrete issue addressed here is, that during loss recovery, incrementing cwnd by t_maxseg, when TSopt is in use, leaves 12 payload bytes "unused" in a transmission opportunity, which may be sent out right away. The error of transmitting two segments (with much more header- and potentially ethernet padding bytes, than 12 bytes of payload certainly makes it more attractive to NOT incorrectly included these header option bytes. Revamping cwnd to include various headers, some of which are not known to the sender at transmission time, is certainly beyond the scope of a simply bugfix. (E.g. TCP over IP over GRE over IPsec over VxLAN over MPLS may be a valid chain of headers in the middle of a path, where congestion occurs. Most of these technologies try to be as transparent to an endhost as possible, though...)
IIRC, tcp_maxseg() should take into account those options, which would be present "currently". As you note, SACK options would be the only one such TCP option, that FreeBSD would be using. However, once we tackle this issue or other options which are only infrequently present (e.g. MP-TCP options, AccECN option, AckCC (?) ) we can address this aspect then. Comment Actions It keys off tp->rcv_numsacks per the implementation in tcp_subr.c. I'm pretty sure this is an unintentional bug in the implementation of tcp_maxseg().
Agreed, the broader discussion is not specific to this changeset (probably should move it to freebsd-transport@) but I wanted to touch on some of my thinking to provide some background for my concerns. What is directly relevant to this changeset is the subtle but potentially impactful change to newreno's cwnd accounting by switching to using tcp_maxseg() in newreno_cong_signal(), which I flagged previously because it is a functional change which also introduces inconsistency within the module given that t_maxseg is still used elsewhere. FYI, I opted to exclude this changeset from our internal Netflix tree until we can either quantify the effects or figure out a different way of addressing the underlying problem being addressed, given my concern that this may pessimise connections with small send windows. That's what prompted me to circle back to this review to better understand the underlying issue being addressed by the change and to publicly flag my concern that there is in fact a subtle functional change being introduced here.
Ah, I see. My initial reaction to such an issue is that we should consider addressing by tuning the runt transmission avoidance checks in tcp_output() rather than marginally gaming cwnd, which may still not prevent a runt transmission if, for example, the effective MSS varied over the preceding window of data for some reason.
Sure, IMO the only reasonable thing TCP can do is fully account for its bits to most accurately derive its pipe capacity estimate.
It attempts to do so, but crudely. It doesn't track the actual option-related bits per send, doesn't keep a rolling tally per send window or over any other timescale, nor are any retrospective adjustments made if the cumulative effect of slightly incorrect tcp_maxseg()-based adjustments begins to add up. Comment Actions The above discussion about tcp_maxseg() has been closed in the subjected email thread: https://lists.freebsd.org/pipermail/freebsd-transport/2020-November/000355.html |