Page MenuHomeFreeBSD

Accept always TCP segment with SEG.SEQ==RCV.NXT
ClosedPublic

Authored by tuexen on Dec 16 2018, 11:23 PM.

Details

Summary

If an out of order TCP segment with SEG.SEQ==RCV.NXT is received, handle it without allocating a queue entry or apply any other limit.

@rrs Please check the usage of logging, which is still missing.

@tuexen Check and test the case where end-point specific limits apply. The case with tcp_new_limits seem suspicious.

Test Plan

Use the attached test script for checking the case where the global limit applies. Make sure it runs on a system with hw.ncpu=1 and net.inet.tcp.reass.maxsegments=83, the minimal value. This script reproduces the issue reported by jhb@.

This is the problem based on my analysis: When TCP receives out of order segments it limits the resources being used. There is a global limit on the number of queue entries. If the limit is reached, no new queue entries are allocated. This applies currently also to segments with SEG.SEQ==RCV.NXT, which results in stalled TCP connections.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

tuexen created this revision.Dec 16 2018, 11:23 PM
hiren added a subscriber: hiren.Dec 17 2018, 5:13 AM

Can you please describe the actual reported problem?
Thanks!

tuexen edited the test plan for this revision. (Show Details)Dec 17 2018, 11:06 AM

Can you please describe the actual reported problem?
Thanks!

Problem description added.

tuexen updated this revision to Diff 52097.Dec 17 2018, 11:24 AM

Ensure that the condition SEG.SEQ==RCV.NXT is also considered when applying the local end-point based limit.

jtl added inline comments.Dec 17 2018, 4:11 PM
sys/netinet/tcp_reass.c
981 ↗(On Diff #52097)

Wouldn't it produce less code churn to move this to the top of the new_entry label?

rrs requested changes to this revision.Dec 17 2018, 4:55 PM
rrs added inline comments.
sys/netinet/tcp_reass.c
981 ↗(On Diff #52097)

I agree and also I am not sure that you want to return here. You need to
go through and spin through to see if you can make more progress.

Consider we have this situation
rcv_nxt = 1

reassembly queue entries for

100->2000, 3000-5000

Now in comes seg 1-100.

If I read this right we would recognize and drop into this code, which is fine, we
would advance forward to tp->rcv_nxt = 100. But we would not dequeue the
next packet(s) 100->2000 that could be added.

I would think we would want this code up to the top and then have it
setup to jump into the loop below...

This revision now requires changes to proceed.Dec 17 2018, 4:55 PM
rrs added inline comments.Dec 17 2018, 4:59 PM
sys/netinet/tcp_reass.c
981 ↗(On Diff #52097)

Yeah looking at the code, instead of the return
do

goto present;

That should do it.

jhb added inline comments.Dec 17 2018, 6:12 PM
sys/netinet/tcp_reass.c
981 ↗(On Diff #52097)

In the case that the segment is 1-100, won't it be merged with the existing 100-2000 queue entry and not have to allocate a new queue entry at all? I am mostly basing this on the comments than the code, but the comments imply that we only get to the 'new_entry' label if the packet can't be joined to an existing segment, so that means that there will always still be a hole after the current packet is appended to the socket buffer. (That is, it would have to be 1-98 or some other smaller range). If that is true, then I think Michael's patch is correct to return.

FWIW, I feel like at some point the older TCP reassembly code used to allocate a queue entry on the stack to use for the th_seq == rcv_nxt case to avoid this problem. Also, whatever fix we do here probably needs to be pushed out as an EN for 12.0. If the affected code was merged to 11.2, then we will need an EN there as well. (cpericva@ reported the same issue I had which is that he can't complete a 'git clone' of freebsd.git from scratch without getting a hang)

jhb added a subscriber: cperciva.Dec 17 2018, 6:12 PM
tuexen added inline comments.Dec 17 2018, 7:37 PM
sys/netinet/tcp_reass.c
981 ↗(On Diff #52097)

If the reassembly queue is 100-2000, 3000-5000 and we get 1-100, wouldn't we extend the first queue entry to the left and goto present?

The case I'm handling is the we get segment with SEG.SEQ==RCV.NXT, but this segment does not extend the first queue entry to the left. Therefore we can't deliver more and the return is fine. Am I'm missing something?

981 ↗(On Diff #52097)

Moving the code makes sense. Didn't realise this since I did the change in two steps. Will upload a new revision.

tuexen updated this revision to Diff 52126.Dec 17 2018, 7:41 PM

Address jtl's comment making the diff much smaller.

tuexen marked an inline comment as done.Dec 17 2018, 7:42 PM
jtl accepted this revision.Dec 17 2018, 8:31 PM

LGTM; would appreciate also hearing @rrs's opinion.

sys/netinet/tcp_reass.c
981 ↗(On Diff #52097)

FWIW, I feel like at some point the older TCP reassembly code used to allocate a queue entry on the stack to use for the th_seq == rcv_nxt case to avoid this problem.

Yes, and I think that was dangerous. In fact, we had a remote DoS SA due to that. I'm personally glad we have an alternate way to do the same thing (at least, now that @tuexen is fixing it :-) ).

I just tested this patch on a machine using the scenario jhb@ mentioned:
Running git clone https://github.com/freebsd/freebsd.git. It still fails.

So it seems that there are more issues than the one I mentioned above.
Will need to look into this in more detail...

rrs accepted this revision.Dec 17 2018, 9:43 PM
rrs added inline comments.
sys/netinet/tcp_reass.c
981 ↗(On Diff #52097)

Yeah your right.. in theory we should have joined it if it was aligned to another block
so the return is ok. Just moving it up I think is all that is needed...

This revision is now accepted and ready to land.Dec 17 2018, 9:43 PM
tuexen updated this revision to Diff 52133.Dec 17 2018, 11:04 PM

Also add an exception to the mbuf count limit rule. This is the one being triggered by the git clone https://github.com/freebsd/freebsd.git command. With this fix, it runs to completion without a problem.

This revision now requires review to proceed.Dec 17 2018, 11:04 PM
jtl accepted this revision.Dec 17 2018, 11:38 PM
This revision is now accepted and ready to land.Dec 17 2018, 11:38 PM
kevans added a subscriber: kevans.Dec 18 2018, 4:27 PM
emaste added a subscriber: emaste.Dec 18 2018, 5:13 PM

So it seems that there are more issues than the one I mentioned above.

So is it the case that this patch is needed to fix one issue, but another remains?

jhb added a comment.Dec 20 2018, 1:18 AM

So it seems that there are more issues than the one I mentioned above.

So is it the case that this patch is needed to fix one issue, but another remains?

No, Michael uploaded a newer diff after that comment that fixed the git clone fully. I think it fixes two places to allow retransmits to always make forward progress. Not sure why it hasn't been committed yet?

mjg added a subscriber: mjg.Dec 20 2018, 1:48 AM

I have a machine where I reliably fail to git clone postgres repository, it always gets stuck at about 20%. This patch fixes it.

In D18580#396953, @jhb wrote:

So it seems that there are more issues than the one I mentioned above.

So is it the case that this patch is needed to fix one issue, but another remains?

No, Michael uploaded a newer diff after that comment that fixed the git clone fully. I think it fixes two places to allow retransmits to always make forward progress. Not sure why it hasn't been committed yet?

I'm just trying to figure out if I need to do anything special due to this being a candidate for an EN (haven't done that before).

So it seems that there are more issues than the one I mentioned above.

So is it the case that this patch is needed to fix one issue, but another remains?

This patch now fixes three instances of the same problem. Nothing is outstanding anymore and testing show that the problem reported is fixed by this patch. From my view it is ready to be committed. Just trying to figure out if there is any special procedure due to the fact that this is a candidate for an EN...

This revision was automatically updated to reflect the committed changes.
jhb added a comment.Dec 20 2018, 4:50 PM

In terms of an EN, the only thing you probably need to do is reply to the commit with the reply sent to re@ explaining this should be an EN and which existing releases it applies to.

In D18580#397052, @jhb wrote:

In terms of an EN, the only thing you probably need to do is reply to the commit with the reply sent to re@ explaining this should be an EN and which existing releases it applies to.

Thanks for the information. They are aware of it now.

For the record, I observe this problem on unpatched 12.0 with nginx-tests,
which manifests itself as a stalled transmit since select() never returns
ready read event, although data was sent and should be available on socket.
In tcpdump it's seen as after some time, client's TCP ACK sequence number
starts to never increment (stuck at 2701889 in the table), then server
performs retransmits and so on, until connection timeout.
Notably, this is reproducible on loopback with low SO_SNDBUF set on server.

seq      | 2671497 2672309   2703049   2701889   2746781   2701889
seqN     | 2672309 2703049   2732861   2746781   2757673   2702005
last ack v 2672193 2701889*3 2701889*n 2701889*m 2701889*p 2701889
--t-->

The patch fixes the problem.