Page MenuHomeFreeBSD

Retire t_maxopd, leave t_maxseg.
ClosedPublic

Authored by glebius on Sep 7 2015, 8:24 PM.

Details

Summary

Historically, t_maxseg was the value that store a value of:

min(PeerOfferedMSS, MTU - minimum protocol header)

With T/TCP there appeared t_maxopd. Later, T/TCP was removed,
but t_maxopd stayed in place. After some permutations, the
results was:

t_maxopd stores min(PeerOfferedMSS, MTU - minimum protocol header)
t_maxseg stores (t_maxopd - optlen)

Note, that it is impossible to store (t_maxopd - optlen), since optlen
is calculated per-packet. Really, the following invariant is held:

t_maxseg == t_maxopd - TCPOLEN_TSTAMP_APPA

It was violated only in very rare cases, like manually set MSS via
TCP_MAXSEG setsockopt(). Btw, the setsockopt is broken now.

So, in crucial code t_maxopd took the place of t_maxseg. The t_maxseg
is still in use, but in code that can recover from deviations of the
value: cwnd or ssthreash calculations, statistics.

The patch removes t_maxopd, and restores old notion to t_maxseg - bare
offered MSS (or MTU product), that doesn't take option length into
account.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

glebius retitled this revision from to Retire t_maxopd, leave t_maxseg..
glebius updated this object.
glebius edited the test plan for this revision. (Show Details)
glebius added a subscriber: network.
sys/netinet/tcp_input.c
496 ↗(On Diff #8564)

As Patrick already noticed, this check is suspicious: it should take options length into account.

I'd say that situation is even worse. The t_maxseg is the peer's MSS, not ours. And this check should be done against our MSS, which we actually do not store anywhere.

So, let's leave this problem as is (the patch doesn't change it!) to the LRO maintainers.

sys/netinet/tcp_output.c
877 ↗(On Diff #8564)

As Patrick noticed, ipoptlen should probably be also subtracted here. For now, my suggestion is to leave problem untouched, to avoid mixing of different changes in one patch.

Restore sanity check, and add XXX comment.

Any comments on how this was tested?

For the crucial code, that calculates actual maximum segment size to be sent, the change isn't functional. The crucial code already used t_maxopd, which holds the correct value. Now it is substituted by t_maxseg, which the after patch stores the same value as t_maxopd used before.

The functional change happens in the code, that you won't see in the diff: the code that uses t_maxseg. So, with patch it will use another value. As said, the code that kept using t_maxseg isn't crucial, it recovers from deviations. E.g. I can add (- random(10)) to cwnd or sshthresh and TCP would still be working. The performance could change however. It even can improve for certain loss/RTT scenario and decrease for another scenario at the same time.

So, how a proper testing can be done is a very difficult question here.

My point is that original BSD stack and modern NetBSD uses unreduced pure offered MSS in these calculations.

Modern RFCs say that cwnd calculation should utilize the "fixed" MSS, see RFC5681, Definitions. So, the current code is closer to a correct behaviour. Looks like I need to dig deeper in the problem.

Btw, OpenBSD also tracks maxopd and maxseg separately, and also they recalculate cwnd when maxseg changes.

Gleb, if we real world testing is needed we'd be happy to test it in our environment.

Modern RFCs say that cwnd calculation should utilize the "fixed" MSS, see RFC5681, Definitions. So, the current code is closer to a correct behaviour. Looks like I need to dig deeper in the problem.

Can you please point to the exact place where RFC is talking about this? I am missing the "fixed" part you are talking about.

Btw, OpenBSD also tracks maxopd and maxseg separately, and also they recalculate cwnd when maxseg changes.

Interesting. At every stage of cwnd growth (upon receiving acks), it get's increased by maxseg (and a bunch of other things). What is the need for this explicit cwnd adjustment? It'd be great if you can point to the code doing this in openbsd.

jch added a subscriber: jch.
np requested changes to this revision.Sep 16 2015, 3:16 PM
np added a reviewer: np.
np added a subscriber: np.

t_maxopd is used in sys/dev/cxgbe/tom/t4_cpl_io.c too (in assign_rxopt). Does GENERIC build with the proposed patch?

Hiren, here in RFC5681, last sentence:

SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the
   largest segment that the sender can transmit.  This value can be
   based on the maximum transmission unit of the network, the path
   MTU discovery [RFC1191, RFC4821] algorithm, RMSS (see next item),
   or other factors.  The size does not include the TCP/IP headers
   and options.
glebius edited edge metadata.

Description for updated version.

  • t_maxopd retired.
  • t_maxseg stores what maxopd used to store: (MTU - minimum protocol header).

The sending part hasn't changed. It still calculates effective MSS
via tcp_dooptions().

However, MSS used in cwnd and ssthresh calculation is now more precise.
Before, it was (MSS - (timestamps running ? timestamp len : 0)).
Now, a function tcp_maxseg() is provided. It is a lightweight version
of tcp_dooptions(). Its result is used in places that used t_maxseg
before.

@hiren, your opinion on the patch much appreciated.
@stas, you said you were interested in testing.
@np, review for your drivers would be appreciated.

The cxgb and cxgbe TOM changes look good to me.

jtl added a reviewer: jtl.
jtl added a subscriber: jtl.

Thanks Gleb! This looks like a positive change. See two minor in-line comments.

sys/netinet/tcp_timer.c
685 ↗(On Diff #11154)

This looks broken (not by your commit; it looks like a pre-existing bug). This code saves t_pmtud_saved_maxseg each time we try a lower value. This means that t_pmtud_saved_maxseg will not contain the original MSS value if we ever try to restore it.

Yet another bug we should fix in a different commit.

sys/netinet/tcp_var.h
137 ↗(On Diff #11154)

To keep the structure size unchanged, you'll need to add another u_int padding variable to replace this removed variable.

sys/netinet/tcp_var.h
137 ↗(On Diff #11154)

We need to keep the size only for stable branches. We usually keep a bunch of spares in many structures in head as well, simply because there is no mechanism to insert spares everywhere when we branch a new stable.

This revision was automatically updated to reflect the committed changes.

It seems these changes also need to be applied to the modular 'fastpath' TCP stack, as it still tries to access the t_maxopd member