Page MenuHomeFreeBSD

Only rewind erraneous RTO when previous values are truly valid
ClosedPublic

Authored by rscheff on Jan 21 2022, 6:30 AM.
Tags
None
Referenced Files
F103387293: D33979.id101921.diff
Sun, Nov 24, 8:56 AM
F103333534: D33979.id101921.diff
Sat, Nov 23, 4:44 PM
Unknown Object (File)
Sat, Nov 23, 10:58 AM
Unknown Object (File)
Fri, Nov 22, 11:46 AM
Unknown Object (File)
Thu, Nov 21, 8:16 AM
Unknown Object (File)
Sat, Nov 9, 1:34 AM
Unknown Object (File)
Fri, Nov 8, 10:14 PM
Unknown Object (File)
Fri, Nov 8, 9:46 PM

Details

Summary

Under rare circumstances, a spurious retranmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.

This was tracked down to happen in this section. This happens
when there has been a retransmission with the previous state
recorded, and on subsequent regular transmissions, TF_PREVVALID
is never cleared.

In the transport call we discussed this, and while this should be
considered as a workaround, the longer term plan is to find in which
codepath the PREVVALID flag is not cleared, and ultimately rely solely
on this flag. Note that this will be a minuscle change in semantics of this
flag, and therefore, it should also be invalideated whenever the TCP
stack changes on a session.
The alternate approach would be to reset TF_PREVVALID whenever
tp->t_rxtshift is updated, such that TF_PREVVALID is only ever
set, when rxtshift is also 1. This would reduce the currently
necessary check for both states.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 44025
Build 40913: arc lint + arc unit

Event Timeline

Looks good to me. Thanks.

sys/netinet/tcp_input.c
1653

As long as you are fixing things, to_tsecr and t_badrxtwin are timestamps not sequence numbers
so though its the same we really should be using

TSTMP_LT

not

SEQ_LT

  • tidying up all the conditionals, where the base stack unwinds RTO use TSTMP_LT, after checking that t_badrxtwin holds a valid value throughout.
  • check for bad retransmits only when TSopt is present in ACK
sys/netinet/tcp_input.c
2889

Is this CC_RTO_ERR handling a DUP of that in line# 1655, given a fact that both looks to be under the same conditions in this change?

sys/netinet/tcp_input.c
2889

I believe the section around #1655 is only relevant for pure ACKs, while here we also process piggy-backed ACKs with data. As calling CC_RTO_ERR resets the PREVVALID flag, is it safe to run over this twice.

If the question was around a more streamlined approach, rather than checking to unwind erraneous RTOs in different places, that would be a change of much larger scope.

sys/netinet/tcp_input.c
2889

On the contrary, the comment just on top of line# 1655 shows "Parse options on any incoming segment.", and my read on line# 1801 shows "This is a pure ack for outstanding data." on top of the non-timestamp case.

If the "Parse options on any incoming segment." place still stands true, and D8556 adds this CC_RTO_ERR handling on any segment with timestamp just below it for years, we may merge all the three places of CC_RTO_ERR handling into a single place as a more streamlined approach.

Thoughts?

  • exclude tsecr == 0, since that likely flags an invalid ts echo return
  • adding Note around the (current) use of the PREVVALID flag
This revision is now accepted and ready to land.Jan 27 2022, 5:11 PM