Randall and I have been poking at different ways to improve FreeBSD
TCP's reaction to loss. One of the major issues we found is that we do
not use the information provided by SACK efficiently, even though we
keep the SACK scoreboard in good shape. Knowing the amount of data in
flight is crucial and can help us use the available capacity of the
path more efficiently. We currently do not have an accurate way of
knowing this information.
For example, inside tcp_do_segment(), while processing duplicate ACKs,
we try to compute the amount of data in flight with:
    awnd = (tp->snd_nxt - tp->snd_fack) + tp->sackhint.sack_bytes_rexmit;
This is incorrect, as it doesn't take into account what has already
been SACKed by the receiver.
There are definitely other places in the stack where we do this
incorrectly.
RFC 6675 provides guidance on how to calculate the number of bytes in
flight at any point in time. Randall and I came to the conclusion that
the following can give us in-flight information almost(!) accurately
with the least amount of code changes:
    pipe = snd_max - snd_una - sackhint.sacked_bytes + sackhint.sack_bytes_rexmit;
where:
    snd_max: highest sequence number sent
    snd_una: lowest sequence number sent but not yet cumulatively ACKed
    sacked_bytes: total bytes SACKed by the receiver, as reported via
        SACK blocks
    sack_bytes_rexmit: total bytes retransmitted from holes in this
        recovery period
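To make the formula concrete, here is a minimal userland sketch of it.
The struct and function names (tcpcb_sketch, sackhint_sketch,
tcp_compute_pipe_sketch) are hypothetical stand-ins that carry only the
four fields named above; they are not the kernel's struct tcpcb, and
this is not a proposed patch.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Hypothetical, cut-down stand-ins for the fields used by the formula. */
struct sackhint_sketch {
	uint32_t sacked_bytes;      /* bytes currently reported as SACKed */
	uint32_t sack_bytes_rexmit; /* bytes retransmitted in this recovery */
};

struct tcpcb_sketch {
	tcp_seq snd_una;            /* lowest unacknowledged sequence number */
	tcp_seq snd_max;            /* highest sequence number sent */
	struct sackhint_sketch sackhint;
};

/*
 * pipe = snd_max - snd_una - sacked_bytes + sack_bytes_rexmit
 *
 * All arithmetic is unsigned 32-bit, so snd_max - snd_una stays
 * correct when the sequence space wraps around.
 */
static uint32_t
tcp_compute_pipe_sketch(const struct tcpcb_sketch *tp)
{
	return ((tp->snd_max - tp->snd_una) -
	    tp->sackhint.sacked_bytes +
	    tp->sackhint.sack_bytes_rexmit);
}
```

With snd_una = 1000, snd_max = 11000, 4000 bytes SACKed and 1000 bytes
retransmitted, this gives 10000 - 4000 + 1000 = 7000 bytes in flight.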
The only missing piece in FreeBSD is sacked_bytes. This is basically
the total bytes SACKed by the receiver, and it can be extracted from
the SACK blocks reported by the receiver. The approach we've decided to
take is pretty simple: we already process each ACK with SACK blocks in
tcp_sack_doack() and extract the blocks from it. We would now also
maintain this new variable there, tracking the total SACKed bytes
reported.
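A rough sketch of that per-ACK recalculation, outside the kernel, could
look like the following. Everything here is an assumption made for
illustration: sackblk_sketch and sack_sum_sacked_bytes_sketch are
hypothetical names, the validity checks are a simplified subset of what
tcp_sack_doack() does, and overlapping or duplicate blocks are not
handled.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wrap-safe sequence comparisons, in the style of the kernel's SEQ_* macros. */
#define SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)
#define SEQ_LEQ(a, b)	((int32_t)((a) - (b)) <= 0)

/* Hypothetical stand-in for one SACK block: covers [start, end). */
struct sackblk_sketch {
	tcp_seq start;
	tcp_seq end;
};

/*
 * Sum the bytes covered by the SACK blocks carried in a single ACK.
 * Blocks that are empty, that start at or below snd_una, or that end
 * beyond snd_max are skipped as invalid.
 */
static uint32_t
sack_sum_sacked_bytes_sketch(const struct sackblk_sketch *blk, size_t nblks,
    tcp_seq snd_una, tcp_seq snd_max)
{
	uint32_t sacked = 0;
	size_t i;

	for (i = 0; i < nblks; i++) {
		if (SEQ_GT(blk[i].end, blk[i].start) &&
		    SEQ_GT(blk[i].start, snd_una) &&
		    SEQ_LEQ(blk[i].end, snd_max))
			sacked += blk[i].end - blk[i].start;
	}
	return (sacked);
}
```

The result would be stored in sackhint.sacked_bytes, overwriting the
previous value each time an ACK carrying SACK blocks is processed.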
The downside of this approach is that there is no persistent
information about SACKed bytes. We recalculate it every time we get an
ACK with SACK blocks in it. So if, for any reason, the receiver decides
to drop SACK info, then we get an incorrect value for in-flight data.
This may also be true when there are more holes than the receiver can
report, since it can only report 3 at a time.