Page MenuHomeFreeBSD

tcp: refactor cwnd during SACK transmissions and enable TSO
ClosedPublic

Authored by rscheff on Jan 16 2024, 10:10 PM.
Tags
None
Referenced Files
F103126479: D43470.id.diff
Thu, Nov 21, 8:43 AM
Unknown Object (File)
Wed, Nov 13, 5:01 PM
Unknown Object (File)
Thu, Nov 7, 6:46 PM
Unknown Object (File)
Thu, Nov 7, 6:46 PM
Unknown Object (File)
Thu, Nov 7, 2:07 AM
Unknown Object (File)
Thu, Oct 24, 7:35 PM
Unknown Object (File)
Thu, Oct 24, 2:52 AM
Unknown Object (File)
Oct 18 2024, 6:15 AM
Subscribers

Details

Summary

Since the dawn of the SACK feature, TSO was never enabled
during SACK loss recovery episodes. With the use of
LRO / GRO leading to ACK compression (partial ACKs and
SACK blocks stepping by more than a singular segment),
and also features like PRR which allow increasing the
cwnd in the final phase of loss recovery (SSRB), this
frequently results in multiple passes of the
tcp_default_output() codepath.

It turns out that previously, the SACK transmission path would
only really work as the restriction to 1 MSS per call to tcp_output()
work, but allowing more than 1 MSS could result in invalid transmission
selection.

Refactoring of cwnd and moving the adjustment for SACKed data into
tcp_output() - cwnd tracking the maximum extend from snd_una - allows
both SACK loss recovery as well as SACK transmissions after RTO during
slow start.

Test Plan

On a TSO enabled NIC, with the receiver also using TSO and
LRO, verify that TSO chunks of more than 1 SMSS show up in a
packet capture. Also, when taking an RTO during a SACK loss
recovery episode, the flight size increases as per SlowStart and
Congestion Avoidance while excluding the re-transmission of
previously SACKed data.
Note that additional losses in that phase result in very small
congestion windows (since cwnd is reduced multiple times),
and entering a loss recovery then can result in lengthy
recovery until snd_max is finally reached.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 55532
Build 52421: arc lint + arc unit

Event Timeline

  • change cwnd during SACK LR to flightsize
  • put cwnd inflation during classic retransmissions into tcp_output()
  • properly calculate available data during SACK transmissions
  • allow cwnd adjustments by more than MSS (LRO/TSO side effect)
  • reset scoreboard when entering LR after RTO
  • various minor style changes
rscheff retitled this revision from tcp: allow tso for sack rexmits, limited at 4 mss to tcp: refactor cwnd during SACK transmissions and enable TSO.Jan 21 2024, 6:24 PM

With this change, cwnd will reflect the expected flightsize also during SACK transmissions.
This allows the proper operation of skipping previously SACKed data after an RTO, and also the use of TSO while retransmitting.
The SACK related cwin calculation previously seems to only have worked as only a single MSS per tcp_output() call could be sent.
Keep sack_rxmit set when an attempt was made to send from the scoreboard - this prevents a longer codepath before returning. If the snd_cwnd would allow for more than the current hole to be sent, we can cycle back by specifically setting sendalot only in those cases.

While there, also do some various style changes and readability enhancements.

Is it possible to split this up? At least separate the style changes?

Is it possible to split this up? At least separate the style changes?

This includes changes in
D43520
D43447
and D43355.

I'll extract the pure style changes with no functional impact as a separate one, but want to keep this homogeneous for testing.

rscheff edited the test plan for this revision. (Show Details)
  • (int32_t)min(a,b) is still not the same as imin(a,b)
  • delta to D43355_sack_after_rto
This revision is now accepted and ready to land.Apr 4 2024, 12:47 PM
This revision now requires review to proceed.Oct 11 2024, 12:41 PM

Because of commit 440f4ba18e3a, please re-base.

sys/netinet/tcp_input.c
2598–2605

For readability, please remove the parentheses () for Relational operators, which have higher precedence than Logical operator. Also re-align.

2626–2628

Re-align.

2805–2810

For readability, please remove the parentheses () for Relational operators, which have higher precedence than Logical operator.

sys/netinet/tcp_output.c
273–278

For readability, please remove the parentheses () for Relational operators, which have higher precedence than Logical operator.

558–566

For readability, please remove the parentheses () for Relational operators, which have higher precedence than Logical operator. Also re-align.

cc requested changes to this revision.Oct 11 2024, 12:46 PM

Need code update.

This revision now requires changes to proceed.Oct 11 2024, 12:46 PM
In D43470#1072656, @cc wrote:

Because of commit 440f4ba18e3a, please re-base.

This is exactly the reason for the new diff ;)

git rebase main

Current branch D43470_sacktso is up to date.

commit 02fbfd835eb70d1081a8918acb5cb554afde1fb7 (HEAD -> D43470_sacktso)

sys/netinet/tcp_input.c
2598–2605

See style(9). Not everyone knows the exact precedence for all operators. There are many examples of extra (not strictly needed by the compiler) parenthesis, for ease of readability.
Also, a half-tab is the 2nd level indentation, with all subsequent clauses aligning to this.
For readability, I tend to have logical operators at the end of a line, so that the individual clauses are easier to see and not get drowned.

2626–2628

There are plenty of examples where the used indentation (full tab plus half-tab) is used for "} else if (foo)\n(bar)" constructs.

between

<------><------><------><------><------>} else if (tcp_is_sack_recovery(tp, &to) &&
<------><------><------><------><------><------>    IN_FASTRECOVERY(tp->t_flags) &&
<------><------><------><------><------><------>    SEQ_LT(tp->snd_nxt, tp->snd_max)) {

and

<------><------><------><------><------>} else if (tcp_is_sack_recovery(tp, &to) &&
<------><------><------><------><------>    IN_FASTRECOVERY(tp->t_flags) &&
<------><------><------><------><------>    SEQ_LT(tp->snd_nxt, tp->snd_max)) {

I think the former one is more readable.

sys/netinet/tcp_input.c
2717

This line should not be changed. A missing { caused compile error.

rscheff marked 2 inline comments as done.
  • fix typo after rebase
sys/netinet/tcp_output.c
558–560

Let's keep a minimum change. If the priority of (len > tp->t_maxseg) is higher than others, just need to change this line, and leave the rest of the lines as they were. The rest of the changes are insignificant and can be left for future non-functional refresh.

566–570

Same here. Don't change anything on this line.

614–616

Unnecessary change, please leave the pair of {} as it was.

1637–1647

Duplicated code.

  • remove unnecessary changes
rscheff added inline comments.
sys/netinet/tcp_output.c
558–566

Realign? remember, the linter in upstream is not the one you are used to; and I like "bool" checks (TF_xx and TF2_xx) without fluff (!(foo) instead of (foo) != 0 - which I believe is covered by style(9). And let's agree to disagree what readability with extra parenthesis means - both ways seem to be ok by style; I like to make the precedence explict, and not rely on the implicit precedence ordering of operators which may be different between different languages.

1637–1647

Thanks. Seems when i split the original patch in two, I missed the comment - restoring that here.

sys/netinet/tcp_output.c
302

Type of cwin should be int32_t, as it will be compared with other signed variables.

327–331

For readability, there is no need to combine the two lines into one.

334–338

Same here. For readability, there is no need to combine the two lines into one.

340–355

Can be simpler.

417

This line is not a necessary change.

418

Is imin() a better choice for consistence?

558–566

I think the result of a bitwise (e.g.&, |, ^, ~) operation is not boolean, although you may think C treats non-zero values as true and zero as false in conditional expressions like if, while, etc.

So style(9) still applies. :)

Do not use ! for tests unless it is a boolean // from style(9)

int result = 5 & 5;  // 5 is 0101
// Result is 0101 (5 in decimal)

Talking about this block of code, we can just leave as it was (don't change), as it is insignificant/non-functional change, and not related with the patch subject.

sys/netinet/tcp_sack.c
951–958

Looks to be no-op. Can we just add the comment without adding the no-op code? And please proof-reading the comment.

rscheff marked an inline comment as done.
  • implement suggested improvements

Better now. But it can be cleaner.

sys/netinet/tcp_output.c
417

This line is still not reverted.

418

Either revert this line or use imin for consistence of the imin in the else clause as below.

len = imin(sbavail(&so->so_snd), sendwin) - off;

558–562

This block change is still not reverted.

  • remove unneccesary style and line wrap changes

OK. I am approving it now as my test in https://wiki.freebsd.org/chengcui/testD43470 shows some improvement. Any bug related observations can be fixed later.

This revision is now accepted and ready to land.Mon, Oct 28, 3:42 PM