Page MenuHomeFreeBSD

Prevent premature zeroing of a scaled receiver window
ClosedPublic

Authored by rscheff on Apr 20 2020, 7:12 PM.

Details

Summary

When window scaling is in effect, rounding down the
scaled down receive window (which a right shift effectively
does) can lead to deadlock situations with
certain (older) linux clients, which will start sending
window probes while reneging the sequence number
below our acceptable sequence number.

In transactional scenarios the sender, while still transmitting
new data, may have shrunk or even zeroed its own window.
Due to the scaled window granularity, and the reneging
of the sequence number, an eventual window update
would be ignored by the FreeBSD stack since that sequence
number is below the acceptable range.

An earlier approach tried internally was to unconditionally
roundup2 only the receive window to be sent out. However, as
the rounded up value was not tracked also in tp->rcv_adv and
received data segment may fall beyond the window allowed by
tp->rcv_adv, this was lead to a loss in performance.

Further testing showed, that retaining the rounded-up value and
updating tp->rcv_adv using the value the sender would receive
addresses these performance concerns completely.

Reported-by: Cui Cheng (Aug 2016)

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

  • and need to convert to network order obviously.

That same fix should be applied to BBR and RACK. Then I'm fine with it.

sys/netinet/tcp_output.c
1242 ↗(On Diff #70813)

Conceptually it would be th->th_win == htons(0), since the filed is in network byte order.

  • add fix to bbr and rack too
  • address a more general issue with window retractions due to scaling granularity. also, simply sending out the (scaled) window indicating there is space left in the receive socket buffer was not properly tracked in tp->rcv_adv. rounding up recwin prior to the scale down and sending out will address all these issues.
rscheff added inline comments.
sys/netinet/tcp_output.c
1242 ↗(On Diff #70813)

reworked the patch, to address a wider range of issues associated with window scaling granularity effects.

So you are proposing the solution described in the second paragraph of the summary?

I would suggest to add a comment why overestimating is preferred to underestimate the receive window
when using a granularity larger than one, especially if are accepting a loss in performance...

So you are proposing the solution described in the second paragraph of the summary?

I would suggest to add a comment why overestimating is preferred to underestimate the receive window
when using a granularity larger than one, especially if are accepting a loss in performance...

Sorry, the original description of this Diff was based off the initial internal bug report and fix, which I mixed up. The final fix - roundup2 rcvwnd and retaining this to adjust tp->rcv_adv eventually - has no performance implications other than addressing the deadlock situation with older / buggy TCP stacks which could end up reneging their Seq# when the signalled scaled window decreases. Rounding up in this manner will result that the right edge of the receive window will not decrease in the critical case, allevating the reneging case.

this change tracks this linux rcvwnd roundup patch, and the issue was initially mentioned to transport@ in 2016 (but without a fix at the time), while the probelatic renegign of the seq# was addressed much later in this Linux Patch.

So you are proposing the solution described in the second paragraph of the summary?

I would suggest to add a comment why overestimating is preferred to underestimate the receive window
when using a granularity larger than one, especially if are accepting a loss in performance...

Sorry, the original description of this Diff was based off the initial internal bug report and fix, which I mixed up. The final fix - roundup2 rcvwnd and retaining this to adjust tp->rcv_adv eventually - has no performance implications other than addressing the deadlock situation with older / buggy TCP stacks which could end up reneging their Seq# when the signalled scaled window decreases. Rounding up in this manner will result that the right edge of the receive window will not decrease in the critical case, allevating the reneging case.

OK, now I understand it. I agree. Could you add a comment stating that rounding up avoids shrinking of the window, which is not allowed. Then I'm happy to approve...

  • adding comment to explain why recwin is rounded up
This revision is now accepted and ready to land.Apr 27 2020, 7:59 PM