Page MenuHomeFreeBSD

nuke sack_newdata
ClosedPublic

Authored by rscheff on Jan 10 2019, 3:17 PM.
Tags
None
Referenced Files
F106632141: D18811.diff
Fri, Jan 3, 1:31 AM
Unknown Object (File)
Wed, Dec 18, 6:39 PM
Unknown Object (File)
Fri, Dec 6, 5:08 PM
Unknown Object (File)
Fri, Dec 6, 5:08 PM
Unknown Object (File)
Fri, Dec 6, 5:08 PM
Unknown Object (File)
Fri, Dec 6, 5:08 PM
Unknown Object (File)
Fri, Dec 6, 4:55 PM
Unknown Object (File)
Oct 4 2024, 9:31 PM

Details

Summary

Combine SACK RecoveryPoint (sack_newdata) with Reno Recover (snd_recover) and remove sack_newdata.

sack_newdata was added during the original commit when SACK support was included in the BSD stack.

Both variables (snd_recover and sack_newdata) track the sequence number of snd_nxt when entering fast recovery, and are called similar in the respective RFCs. For some reason a new variable was added rather than using the existing one. While looking into improving the SACK scoreboard in preparation to implement RFC6675 and RFC6937 i found that more modern SACK code also uses sack_newdata inconsistently (e.g. only for pipe calculation).

A recent change disentangled the kernel TCPCB from access/reference by userspace programs, thus removing this variable (ABI) should be safe to do now.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Received a PM from tuexen. The Q was, why it is safe to nuke sack_newdata and completely replace it everywhere with snd_recover.

I've annotated tcp_input.c and found a few instances already, where code added after the original SACK patch (introducing the fork of the RFC variable "RecoveryPoint") already was using snd_recover for SACK-enabled flows, without bad stuff happening (obviously),but nevertheless inconsistently accessing the "wrong" variable which just happens to track the same thing.

Retaining sack_newdata seems to be pointless. The alternative to getting rid of sack_newdata, as it seems to me, would be to properly seggragate snd_recover to be used only for non SACK-enabled flows, and duplicating the relevant code with sack_newdata for SACK-enabled sessions, and use these two variables consistently depending on which flavor a particular session has.

sys/netinet/tcp_input.c
414 ↗(On Diff #52743)

Recovery Point (RFC) is set in this wrapper irrespective if the session supports SACK or not.

1813 ↗(On Diff #52743)

Move recovery point forward on full ACKs. Just in case?

2599 ↗(On Diff #52743)

cc_cong_signal wrapper already sets snd_recover to snd_nxt. The special handling for SACK sessions below could omit setting snd_recover (which used to be sack_newdata) anew.

2607 ↗(On Diff #52743)

redundant with cc_cong_signal

2608 ↗(On Diff #52743)

could move up before the if clause (same on both branches)

2613 ↗(On Diff #52743)

Could move this assignment up just before the SACK check, and save a small bit of code.

2701 ↗(On Diff #52743)

Here is the point, where the split tracking of sack_newdata and snd_recover got mingled already. Fortunately, snd_recover gets set by the cc_cong_signal wrapper anyway (thus tracks sack_newdata for SACK-enabled sessions). If Reno and SACK had truly independent tracking of the RecoveryPoint variable, this would have failed a long time ago.

2834 ↗(On Diff #52743)

another instance, where sack_newdata was already missed for SACK-enabled sessions

2838–2840 ↗(On Diff #52743)

A prior example, where the sack_newdata was missed as the "proper" variable for RecoveryPoint, and snd_recover used instead.

sys/netinet/tcp_var.h
195 ↗(On Diff #52743)

isn't cc_algo referenced rather often even during in-sequence processing, to warrant it to move to a front cacheline?

212 ↗(On Diff #52743)

Fast Open is a compile-time option; #ifdef for the connected tcpcb variables?

Here is one Patch, where the "wrong" variable name holding recovery has been used for SACK in the past: rS132417

Removal of the variable in a cacheline > 4 is not that critical (to retain the alignment of tcpcb at that place still.)

jtl added inline comments.
sys/netinet/tcp_log_buf.h
35 ↗(On Diff #52743)

You need to bump this if you change the binary format (which this does).

sys/netinet/tcp_var.h
195 ↗(On Diff #52743)

@rrs should comment on this, since he did the cacheline usage analysis.

  • bump tcp_log_buf version
rscheff added a reviewer: rrs. rscheff removed 1 blocking reviewer(s): transport.Jul 29 2019, 6:08 PM

Randall, since you seem to be the most knowledgeable person when it comes to tcpcb optimization, I think this one should be reviewed by you for any potential cache line implications.

Functionally, the removal of this redundant variable will not impact the stack at all when using SACK.

sys/netinet/tcp_var.h
195 ↗(On Diff #52743)

This is probably ok since we are down in cache line 5. However I would like to
run another re-analysis of tcp_var.h as well as tcp_rack.h.. it has been a while
and things tend to drift around sometimes :)

Be worth seeing if we can gain a few cycles :D

On the question of validation - in the three places where sack_newdata is currently used (1x tcp_do_segment, 1x tcp_output, 1x tcp_sack_partialack, I placed KASSERTS(sack_newdata == snd_recover) and run the dsack and sack test cases from the packetdrill suite.

As previously found (and manually validated for various SACK loss recovery episodes), the two variables always hold the very same value and sack_newdata is fully redundant.

[root@freebsd ~/tcp-testsuite-master/tcp-testsuite/rcv-data-segments]# ./run-all-rcv-data-segments-tests

Name Verdict

overlapping-050-075 PASSED
overlapping-050-100 PASSED
:

reordering-54321 PASSED

Summary: Number of tests run: 216

Number of tests passed:    216
Number of tests failed:      0
Number of tests timed out:   0
Number of tests skipped:     0
diff --git a/sys/netinet/tcp_input.c b/sys/netinet/tcp_input.c
index e8a281878cf..f10fba8031a 100644
--- a/sys/netinet/tcp_input.c
+++ b/sys/netinet/tcp_input.c
@@ -2561,6 +2561,9 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, struct socket *so,
                                                TCPSTAT_INC(
                                                    tcps_sack_recovery_episode);
                                                tp->sack_newdata = tp->snd_nxt;
+                                               KASSERT(tp->sack_newdata == tp->snd_recover,
+                                                   ("%s: sack_newdata (%d) != snd_recover (%d)",
+                                                   __func__, tp->sack_newdata, tp->snd_recover));
                                                tp->snd_cwnd = maxseg;
                                                (void) tp->t_fb->tfb_tcp_output(tp);
                                                goto drop;
diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 1fc8e9ace04..d4a2b7c9d8e 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -418,6 +418,9 @@ tcp_output(struct tcpcb *tp)
                         * of len is bungled by the optimizer.
                         */
                        if (len > 0) {
+                               KASSERT(tp->sack_newdata == tp->snd_recover,
+                                   ("%s: sack_newdata (%d) != snd_recover (%d)",
+                                   __func__, tp->sack_newdata, tp->snd_recover));
                                cwin = tp->snd_cwnd -
                                        (tp->snd_nxt - tp->sack_newdata) -
                                        sack_bytes_rxmt;
diff --git a/sys/netinet/tcp_sack.c b/sys/netinet/tcp_sack.c
index 89345bb70cf..856619993ee 100644
--- a/sys/netinet/tcp_sack.c
+++ b/sys/netinet/tcp_sack.c
@@ -779,6 +779,9 @@ tcp_sack_partialack(struct tcpcb *tp, struct tcphdr *th)
        /* Send one or 2 segments based on how much new data was acked. */
        if ((BYTES_THIS_ACK(tp, th) / tp->t_maxseg) >= 2)
                num_segs = 2;
+       KASSERT(tp->sack_newdata == tp->snd_recover,
+           ("%s: sack_newdata (%d) != snd_recover (%d)",
+           __func__, tp->sack_newdata, tp->snd_recover));
        tp->snd_cwnd = (tp->sackhint.sack_bytes_rexmit +
            (tp->snd_nxt - tp->sack_newdata) + num_segs * tp->t_maxseg);
        if (tp->snd_cwnd > tp->snd_ssthresh)
sys/netinet/tcp_var.h
195 ↗(On Diff #52743)

So, to move this patch forward, should I simply put a placeholder in that slot, so that subsequent data structures don't get moved around?

cddl/lib/libdtrace/tcp.d
247 ↗(On Diff #56695)

this structure is in dtrace - but it appears to be an unstable interface as it tracks tcpcb directly. Shall this be removed altogether instead?

This revision is now accepted and ready to land.Feb 13 2020, 3:14 PM