Page MenuHomeFreeBSD

Only rewind erraneous RTO when previous values are truly valid
ClosedPublic

Authored by rscheff on Jan 21 2022, 6:30 AM.

Details

Summary

Under rare circumstances, a spurious retranmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.

This was tracked down to happen in this section. This happens
when there has been a retransmission with the previous state
recorded, and on subsequent regular transmissions, TF_PREVVALID
is never cleared.

In the transport call we discussed this, and while this should be
considered as a workaround, the longer term plan is to find in which
codepath the PREVVALID flag is not cleared, and ultimately rely solely
on this flag. Note that this will be a minuscle change in semantics of this
flag, and therefore, it should also be invalideated whenever the TCP
stack changes on a session.
The alternate approach would be to reset TF_PREVVALID whenever
tp->t_rxtshift is updated, such that TF_PREVVALID is only ever
set, when rxtshift is also 1. This would reduce the currently
necessary check for both states.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

sys/netinet/tcp_input.c
1653

As long as you are fixing things, to_tsecr and t_badrxtwin are timestamps not sequence numbers
so though its the same we really should be using

TSTMP_LT

not

SEQ_LT

  • tidying up all the conditionals, where the base stack unwinds RTO use TSTMP_LT, after checking that t_badrxtwin holds a valid value throughout.
  • check for bad retransmits only when TSopt is present in ACK
sys/netinet/tcp_input.c
2889

Is this CC_RTO_ERR handling a DUP of that in line# 1655, given a fact that both looks to be under the same conditions in this change?

sys/netinet/tcp_input.c
2889

I believe the section around #1655 is only relevant for pure ACKs, while here we also process piggy-backed ACKs with data. As calling CC_RTO_ERR resets the PREVVALID flag, is it safe to run over this twice.

If the question was around a more streamlined approach, rather than checking to unwind erraneous RTOs in different places, that would be a change of much larger scope.

sys/netinet/tcp_input.c
2889

On the contrary, the comment just on top of line# 1655 shows "Parse options on any incoming segment.", and my read on line# 1801 shows "This is a pure ack for outstanding data." on top of the non-timestamp case.

If the "Parse options on any incoming segment." place still stands true, and D8556 adds this CC_RTO_ERR handling on any segment with timestamp just below it for years, we may merge all the three places of CC_RTO_ERR handling into a single place as a more streamlined approach.

Thoughts?

  • exclude tsecr == 0, since that likely flags an invalid ts echo return
  • adding Note around the (current) use of the PREVVALID flag
This revision is now accepted and ready to land.Jan 27 2022, 5:11 PM