Details

Reviewers

rrs
jtl
jhb
jason_eggnet.com
markj

Group Reviewers

transport

Commits

rS342378: MFC r342280:
rS342280: Fix a regression in the TCP handling of received segments.

Summary

If an out of order TCP segment with SEG.SEQ==RCV.NXT is received, handle it without allocating a queue entry or apply any other limit.

@rrs Please check the usage of logging, which is still missing.

@tuexen Check and test the case where end-point specific limits apply. The case with tcp_new_limits seem suspicious.

Test Plan

Use the attached test script for checking the case where the global limit applies. Make sure it runs on a system with hw.ncpu=1 and net.inet.tcp.reass.maxsegments=83, the minimal value. This script reproduces the issue reported by jhb@.

This is the problem based on my analysis: When TCP receives out of order segments it limits the resources being used. There is a global limit on the number of queue entries. If the limit is reached, no new queue entries are allocated. This applies currently also to segments with SEG.SEQ==RCV.NXT, which results in stalled TCP connections.

jhb.pkt18 KBDownload

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

tuexen created this revision.Dec 16 2018, 11:23 PM

Herald added a reviewer: transport. · View Herald TranscriptDec 16 2018, 11:23 PM

Herald added a subscriber: imp. · View Herald Transcript

Harbormaster completed remote builds in B21600: Diff 52086.Dec 16 2018, 11:23 PM

tuexen added a reviewer: markj.Dec 16 2018, 11:24 PM

Can you please describe the actual reported problem?
Thanks!

In D18580#396167, @hiren wrote:

Can you please describe the actual reported problem?
Thanks!

Problem description added.

Ensure that the condition SEG.SEQ==RCV.NXT is also considered when applying the local end-point based limit.

Harbormaster completed remote builds in B21604: Diff 52097.Dec 17 2018, 11:24 AM

jtl added inline comments.Dec 17 2018, 4:11 PM

sys/netinet/tcp_reass.c
981 ↗	(On Diff #52097)	Wouldn't it produce less code churn to move this to the top of the new_entry label?

rrs requested changes to this revision.Dec 17 2018, 4:55 PM

rrs added inline comments.

sys/netinet/tcp_reass.c
981 ↗	(On Diff #52097)	I agree and also I am not sure that you want to return here. You need to go through and spin through to see if you can make more progress. Consider we have this situation rcv_nxt = 1 reassembly queue entries for 100->2000, 3000-5000 Now in comes seg 1-100. If I read this right we would recognize and drop into this code, which is fine, we would advance forward to tp->rcv_nxt = 100. But we would not dequeue the next packet(s) 100->2000 that could be added. I would think we would want this code up to the top and then have it setup to jump into the loop below...

This revision now requires changes to proceed.Dec 17 2018, 4:55 PM

rrs added inline comments.Dec 17 2018, 4:59 PM

sys/netinet/tcp_reass.c
981 ↗	(On Diff #52097)	Yeah looking at the code, instead of the return do goto present; That should do it.

jhb added inline comments.Dec 17 2018, 6:12 PM

sys/netinet/tcp_reass.c
981 ↗	(On Diff #52097)	In the case that the segment is 1-100, won't it be merged with the existing 100-2000 queue entry and not have to allocate a new queue entry at all? I am mostly basing this on the comments than the code, but the comments imply that we only get to the 'new_entry' label if the packet can't be joined to an existing segment, so that means that there will always still be a hole after the current packet is appended to the socket buffer. (That is, it would have to be 1-98 or some other smaller range). If that is true, then I think Michael's patch is correct to return. FWIW, I feel like at some point the older TCP reassembly code used to allocate a queue entry on the stack to use for the th_seq == rcv_nxt case to avoid this problem. Also, whatever fix we do here probably needs to be pushed out as an EN for 12.0. If the affected code was merged to 11.2, then we will need an EN there as well. (cpericva@ reported the same issue I had which is that he can't complete a 'git clone' of freebsd.git from scratch without getting a hang)

jhb added a subscriber: cperciva.Dec 17 2018, 6:12 PM

tuexen added inline comments.Dec 17 2018, 7:37 PM

sys/netinet/tcp_reass.c
981 ↗	(On Diff #52097)	If the reassembly queue is 100-2000, 3000-5000 and we get 1-100, wouldn't we extend the first queue entry to the left and goto present? The case I'm handling is the we get segment with `SEG.SEQ==RCV.NXT`, but this segment does not extend the first queue entry to the left. Therefore we can't deliver more and the `return` is fine. Am I'm missing something?
981 ↗	(On Diff #52097)	Moving the code makes sense. Didn't realise this since I did the change in two steps. Will upload a new revision.

Address jtl's comment making the diff much smaller.

Harbormaster completed remote builds in B21620: Diff 52126.Dec 17 2018, 7:41 PM

tuexen marked an inline comment as done.Dec 17 2018, 7:42 PM

LGTM; would appreciate also hearing @rrs's opinion.

sys/netinet/tcp_reass.c
981 ↗	(On Diff #52097)	FWIW, I feel like at some point the older TCP reassembly code used to allocate a queue entry on the stack to use for the th_seq == rcv_nxt case to avoid this problem. Yes, and I think that was dangerous. In fact, we had a remote DoS SA due to that. I'm personally glad we have an alternate way to do the same thing (at least, now that @tuexen is fixing it :-) ).

I just tested this patch on a machine using the scenario jhb@ mentioned:
Running git clone https://github.com/freebsd/freebsd.git. It still fails.

So it seems that there are more issues than the one I mentioned above.
Will need to look into this in more detail...

rrs accepted this revision.Dec 17 2018, 9:43 PM

rrs added inline comments.

sys/netinet/tcp_reass.c
981 ↗	(On Diff #52097)	Yeah your right.. in theory we should have joined it if it was aligned to another block so the return is ok. Just moving it up I think is all that is needed...

This revision is now accepted and ready to land.Dec 17 2018, 9:43 PM

Also add an exception to the mbuf count limit rule. This is the one being triggered by the git clone https://github.com/freebsd/freebsd.git command. With this fix, it runs to completion without a problem.

This revision now requires review to proceed.Dec 17 2018, 11:04 PM

Harbormaster completed remote builds in B21621: Diff 52133.Dec 17 2018, 11:04 PM

jtl accepted this revision.Dec 17 2018, 11:38 PM

This revision is now accepted and ready to land.Dec 17 2018, 11:38 PM

kevans added a subscriber: kevans.Dec 18 2018, 4:27 PM

emaste added a subscriber: emaste.Dec 18 2018, 5:13 PM

So it seems that there are more issues than the one I mentioned above.

So is it the case that this patch is needed to fix one issue, but another remains?

In D18580#396949, @emaste wrote:

So it seems that there are more issues than the one I mentioned above.

So is it the case that this patch is needed to fix one issue, but another remains?

No, Michael uploaded a newer diff after that comment that fixed the git clone fully. I think it fixes two places to allow retransmits to always make forward progress. Not sure why it hasn't been committed yet?

I have a machine where I reliably fail to git clone postgres repository, it always gets stuck at about 20%. This patch fixes it.

In D18580#396953, @jhb wrote:

In D18580#396949, @emaste wrote:

So it seems that there are more issues than the one I mentioned above.

So is it the case that this patch is needed to fix one issue, but another remains?

No, Michael uploaded a newer diff after that comment that fixed the git clone fully. I think it fixes two places to allow retransmits to always make forward progress. Not sure why it hasn't been committed yet?

I'm just trying to figure out if I need to do anything special due to this being a candidate for an EN (haven't done that before).

In D18580#396949, @emaste wrote:

So it seems that there are more issues than the one I mentioned above.

So is it the case that this patch is needed to fix one issue, but another remains?

This patch now fixes three instances of the same problem. Nothing is outstanding anymore and testing show that the problem reported is fixed by this patch. From my view it is ready to be committed. Just trying to figure out if there is any special procedure due to the fact that this is a candidate for an EN...

Closed by commit rS342280: Fix a regression in the TCP handling of received segments. (authored by tuexen). · Explain WhyDec 20 2018, 4:05 PM

This revision was automatically updated to reflect the committed changes.

In terms of an EN, the only thing you probably need to do is reply to the commit with the reply sent to re@ explaining this should be an EN and which existing releases it applies to.

In D18580#397052, @jhb wrote:

In terms of an EN, the only thing you probably need to do is reply to the commit with the reply sent to re@ explaining this should be an EN and which existing releases it applies to.

Thanks for the information. They are aware of it now.

For the record, I observe this problem on unpatched 12.0 with nginx-tests,
which manifests itself as a stalled transmit since select() never returns
ready read event, although data was sent and should be available on socket.
In tcpdump it's seen as after some time, client's TCP ACK sequence number
starts to never increment (stuck at 2701889 in the table), then server
performs retransmits and so on, until connection timeout.
Notably, this is reproducible on loopback with low SO_SNDBUF set on server.

seq      | 2671497 2672309   2703049   2701889   2746781   2701889
seqN     | 2672309 2703049   2732861   2746781   2757673   2702005
last ack v 2672193 2701889*3 2701889*n 2701889*m 2701889*p 2701889
--t-->

The patch fixes the problem.

rozhuk.im-gmail.com added a subscriber: rozhuk.im-gmail.com.Dec 27 2018, 6:44 AM

Accept always TCP segment with SEG.SEQ==RCV.NXT
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 52194

head/sys/netinet/tcp_reass.c

Accept always TCP segment with SEG.SEQ==RCV.NXTClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 52194

head/sys/netinet/tcp_reass.c

Accept always TCP segment with SEG.SEQ==RCV.NXT
ClosedPublic
Actions

Revision Contents
Changeset List