Page MenuHomeFreeBSD

Retire t_maxopd, leave t_maxseg.
ClosedPublic

Authored by glebius on Sep 7 2015, 8:24 PM.

Details

Summary

Historically, t_maxseg was the value that store a value of:

min(PeerOfferedMSS, MTU - minimum protocol header)

With T/TCP there appeared t_maxopd. Later, T/TCP was removed,
but t_maxopd stayed in place. After some permutations, the
results was:

t_maxopd stores min(PeerOfferedMSS, MTU - minimum protocol header)
t_maxseg stores (t_maxopd - optlen)

Note, that it is impossible to store (t_maxopd - optlen), since optlen
is calculated per-packet. Really, the following invariant is held:

t_maxseg == t_maxopd - TCPOLEN_TSTAMP_APPA

It was violated only in very rare cases, like manually set MSS via
TCP_MAXSEG setsockopt(). Btw, the setsockopt is broken now.

So, in crucial code t_maxopd took the place of t_maxseg. The t_maxseg
is still in use, but in code that can recover from deviations of the
value: cwnd or ssthreash calculations, statistics.

The patch removes t_maxopd, and restores old notion to t_maxseg - bare
offered MSS (or MTU product), that doesn't take option length into
account.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

glebius updated this revision to Diff 8564.Sep 7 2015, 8:24 PM
glebius retitled this revision from to Retire t_maxopd, leave t_maxseg..
glebius updated this object.
glebius edited the test plan for this revision. (Show Details)
glebius added a subscriber: network.
glebius added inline comments.Sep 7 2015, 8:30 PM
sys/netinet/tcp_input.c
496 ↗(On Diff #8564)

As Patrick already noticed, this check is suspicious: it should take options length into account.

I'd say that situation is even worse. The t_maxseg is the peer's MSS, not ours. And this check should be done against our MSS, which we actually do not store anywhere.

So, let's leave this problem as is (the patch doesn't change it!) to the LRO maintainers.

sys/netinet/tcp_output.c
877 ↗(On Diff #8564)

As Patrick noticed, ipoptlen should probably be also subtracted here. For now, my suggestion is to leave problem untouched, to avoid mixing of different changes in one patch.

glebius updated this revision to Diff 8568.Sep 7 2015, 9:14 PM

Restore sanity check, and add XXX comment.

gnn added a subscriber: gnn.Sep 10 2015, 2:50 PM

Any comments on how this was tested?

For the crucial code, that calculates actual maximum segment size to be sent, the change isn't functional. The crucial code already used t_maxopd, which holds the correct value. Now it is substituted by t_maxseg, which the after patch stores the same value as t_maxopd used before.

The functional change happens in the code, that you won't see in the diff: the code that uses t_maxseg. So, with patch it will use another value. As said, the code that kept using t_maxseg isn't crucial, it recovers from deviations. E.g. I can add (- random(10)) to cwnd or sshthresh and TCP would still be working. The performance could change however. It even can improve for certain loss/RTT scenario and decrease for another scenario at the same time.

So, how a proper testing can be done is a very difficult question here.

My point is that original BSD stack and modern NetBSD uses unreduced pure offered MSS in these calculations.

Modern RFCs say that cwnd calculation should utilize the "fixed" MSS, see RFC5681, Definitions. So, the current code is closer to a correct behaviour. Looks like I need to dig deeper in the problem.

Btw, OpenBSD also tracks maxopd and maxseg separately, and also they recalculate cwnd when maxseg changes.

glebius planned changes to this revision.Sep 10 2015, 3:10 PM
stas edited edge metadata.Sep 10 2015, 6:33 PM

Gleb, if we real world testing is needed we'd be happy to test it in our environment.

hiren edited edge metadata.EditedSep 11 2015, 1:20 AM

Modern RFCs say that cwnd calculation should utilize the "fixed" MSS, see RFC5681, Definitions. So, the current code is closer to a correct behaviour. Looks like I need to dig deeper in the problem.

Can you please point to the exact place where RFC is talking about this? I am missing the "fixed" part you are talking about.

Btw, OpenBSD also tracks maxopd and maxseg separately, and also they recalculate cwnd when maxseg changes.

Interesting. At every stage of cwnd growth (upon receiving acks), it get's increased by maxseg (and a bunch of other things). What is the need for this explicit cwnd adjustment? It'd be great if you can point to the code doing this in openbsd.

jch edited edge metadata.Sep 16 2015, 2:54 PM
jch added a subscriber: jch.
np requested changes to this revision.Sep 16 2015, 3:16 PM
np added a reviewer: np.
np added a subscriber: np.

t_maxopd is used in sys/dev/cxgbe/tom/t4_cpl_io.c too (in assign_rxopt). Does GENERIC build with the proposed patch?

@np, thanks. Will take care.

Hiren, here in RFC5681, last sentence:

SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the
   largest segment that the sender can transmit.  This value can be
   based on the maximum transmission unit of the network, the path
   MTU discovery [RFC1191, RFC4821] algorithm, RMSS (see next item),
   or other factors.  The size does not include the TCP/IP headers
   and options.
glebius updated this revision to Diff 11154.Dec 11 2015, 11:35 PM
glebius edited edge metadata.

Description for updated version.

  • t_maxopd retired.
  • t_maxseg stores what maxopd used to store: (MTU - minimum protocol header).

The sending part hasn't changed. It still calculates effective MSS
via tcp_dooptions().

However, MSS used in cwnd and ssthresh calculation is now more precise.
Before, it was (MSS - (timestamps running ? timestamp len : 0)).
Now, a function tcp_maxseg() is provided. It is a lightweight version
of tcp_dooptions(). Its result is used in places that used t_maxseg
before.

@hiren, your opinion on the patch much appreciated.
@stas, you said you were interested in testing.
@np, review for your drivers would be appreciated.

np added a comment.Dec 12 2015, 3:16 PM

The cxgb and cxgbe TOM changes look good to me.

kbowling edited edge metadata.Dec 17 2015, 3:59 AM
kbowling added a subscriber: transport.
jtl accepted this revision.Dec 17 2015, 3:46 PM
jtl added a reviewer: jtl.
jtl added a subscriber: jtl.

Thanks Gleb! This looks like a positive change. See two minor in-line comments.

sys/netinet/tcp_timer.c
685 ↗(On Diff #11154)

This looks broken (not by your commit; it looks like a pre-existing bug). This code saves t_pmtud_saved_maxseg each time we try a lower value. This means that t_pmtud_saved_maxseg will not contain the original MSS value if we ever try to restore it.

Yet another bug we should fix in a different commit.

sys/netinet/tcp_var.h
137 ↗(On Diff #11154)

To keep the structure size unchanged, you'll need to add another u_int padding variable to replace this removed variable.

glebius added inline comments.Jan 6 2016, 10:32 PM
sys/netinet/tcp_var.h
137 ↗(On Diff #11154)

We need to keep the size only for stable branches. We usually keep a bunch of spares in many structures in head as well, simply because there is no mechanism to insert spares everywhere when we branch a new stable.

This revision was automatically updated to reflect the committed changes.

It seems these changes also need to be applied to the modular 'fastpath' TCP stack, as it still tries to access the t_maxopd member