tcp: Don't "negotiate" MSS.
ClosedPublic
Actions

Authored by sepherosa_gmail.com on Sep 21 2017, 8:11 AM.

Details

Reviewers

hiren
glebius
gnn
lstewart
rrs
gallatin
rwatson
adrian
kbowling
• karels
tuexen
bz

Group Reviewers

transport

Commits

rS324050: tcp: Don't "negotiate" MSS.

Summary

_NO_ OSes actually "negotiate" MSS.

RFC 879:
"... This Maximum Segment Size (MSS) announcement (often mistakenly
called a negotiation) ..."

This negotiation behaviour was introduced 11 years ago by r159955
without any explaination about why FreeBSD had to "negotiate" MSS:

    In syncache_respond() do not reply with a MSS that is larger than what
    the peer announced to us but make it at least tcp_minmss in size.

    Sponsored by:   TCP/IP Optimization Fundraise 2005

The tcp_minmss behaviour is still kept.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

sepherosa_gmail.com created this revision.Sep 21 2017, 8:11 AM

Herald added 1 blocking reviewer(s): transport. · View Herald TranscriptSep 21 2017, 8:11 AM

jch added a subscriber: jch.Sep 21 2017, 8:57 AM

tuexen added a subscriber: tuexen.Sep 21 2017, 4:24 PM

tuexen added a reviewer: tuexen.Sep 24 2017, 8:25 AM

tuexen removed a subscriber: tuexen.

I hadn't noticed this snippet of code before. It was definitely wrong.

bz accepted this revision as: bz.Sep 25 2017, 11:14 AM

We can't revert this patch, for one extremely simple reason, and another more subtle reason.

The simple reason is PPPoE.

PPPoE adds an 8 byte overhead to packets. Generally, devices behind the PPPoE router do not know about this 8 byte lower MTU / MSS, they advertise an MSS of 1460. The PPPoE device intercepts packets with the SYN flag set and reduces the MSS option by 8 bytes.

If the receiver of this altered MSS option doesn't incorporate it into its MSS option response, then the wrong MSS will be "negotiated." Specifically, the device behind the PPPoE router will be unaware of the lower MSS, and send larger packets.

For inbound connections, if the device doesn't lower the MSS by 8 bytes in the MSS option response, as directed by the PPPoE middlebox, then the external client will use the wrong MSS.

Yes, it is absolutely a negotiation.

The more subtle reason is, for organizations that have to deal with virtually every TCP stack ever made, past and future, there's really no upside to trusting the remote TCP stack more than we must.

Hi Jason,

as far as I know, middleboxes performing MSS clamping do it for incoming and outgoing traffic.

So considering your example:

If you have a client behind such a middlebox, it send the SYN with the MSS=1460, but the server receives the SYN with an MSS of 1452. Now the server sends back a SYN-ACK with an MSS of 1460, and the middlebox will change it also such that the client sees an MSS of 1452. So this will work.

If you have a server behind the PPPoE line, the client will send a SYN with an MSS of 1460, but the server will see the MSS of 1452. The server will send an SYN-ACK with an MSS of 1460, but the client will see 1452, since the middlebox will change it.

So it is not a negotiation. Each sides announces what maximum segments it can receive.

So I think the idea is OK.

Ah, yes you are correct with regard to the PPPoE case.

Hmm. Maybe it is ok.

But what is the advantage of doing this, what are we trying to fix here? It seems like it is not broken now, and could only expose us to bugs.

I think it is good to be standards compliant. It helps in case of suboptimal middleboxes likes the ones described by sepherosa which relatively decrement the MSS instead of enforcing a maximum. The benefit is the use larger segments which should improve the performance (less overhead).

Need to check if suggested patch is enough to handle all cases. Right now writing packetdrill scripts for testing...

I'm not seeing where that was discussed but, sounds good to me.

Shouldn't we also apply something like

 	/* Map our computed MSS into the 3-bit index. */
-	mss = min(tcp_mssopt(&sc->sc_inc), max(sc->sc_peer_mss, V_tcp_minmss));
+	mss = sc->sc_peer_mss;
 	for (i = nitems(tcp_sc_msstab) - 1; tcp_sc_msstab[i] > mss && i > 0;
 	     i--)
 		;

since we will use the computed index to restore the sc->sc_peer_mss later.

sys/netinet/tcp_syncache.c
1637	I would prefer mssopt = max(tcp_mssopt(&sc->sc_inc), V_tcp_minmss); instead of the two lines.

For what it is worth:

While testing the MSS handling I figured out that FreeBSD was NOT negotiating the MSS in case of receiving a SYN-segment in the SYN-SENT state.
So the patch discussed here will also result in consistent handling of the MSS.

In D12430#259071, @tuexen wrote:
Shouldn't we also apply something like
 	/* Map our computed MSS into the 3-bit index. */
-	mss = min(tcp_mssopt(&sc->sc_inc), max(sc->sc_peer_mss, V_tcp_minmss));
+	mss = sc->sc_peer_mss;
 	for (i = nitems(tcp_sc_msstab) - 1; tcp_sc_msstab[i] > mss && i > 0;
 	     i--)
 		;
since we will use the computed index to restore the sc->sc_peer_mss later.

Ah, missed that one. Yeah, we need this change.