Properly save the original MSS during blackhole detection
AcceptedPublic
Actions

Authored by jtl on Jan 7 2016, 1:36 AM.

Details

Reviewers

Group Reviewers

Summary

Only save the original MSS once during blackhole detection. Previously, the code would overwrite it at every downward increment, preventing the code from properly restoring the original value.

(This was originally identified during the D3593 review, but was not fixed there.)

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 1976
Build 1984: arc lint + arc unit

Event Timeline

jtl updated this revision to Diff 11981.Jan 7 2016, 1:36 AM

jtl retitled this revision from to Properly save the original MSS during blackhole detection.

jtl updated this object.

jtl edited the test plan for this revision. (Show Details)

jtl added a reviewer: transport.

Herald edited edge metadata. · View Herald TranscriptJan 7 2016, 1:36 AM

Herald added a subscriber: imp. · View Herald Transcript

I'll try to look at this in a couple (working) days.

@jtl Help me understand this better. Do we really need to restore 'original' MSS value when trying to restore or the previous one that worked?
AFAIK, in current implementation, once we lower MTU because of blackhole, we don't raise it back even on successful transfer of packets. We don't do upward probing to get back to original MTU. It stays the same (or goes further down). We only unset TF2_PLPMTU_BLACKHOLE when we've tried enough with lowest possible value and we've failed.
So, for example, we start with 1448, detect blackhole and reduce to 1188. Do some packets successfully there and again detect blackhole which reduces further to 524. Now, if we have to restore MSS value, what should it be?

(or may be I am making this entire thing more complicated than it is because of my jet lagged brain ;-))

In D4810#102924, @hiren wrote:

@jtl Help me understand this better. Do we really need to restore 'original' MSS value when trying to restore or the previous one that worked?
AFAIK, in current implementation, once we lower MTU because of blackhole, we don't raise it back even on successful transfer of packets. We don't do upward probing to get back to original MTU. It stays the same (or goes further down). We only unset TF2_PLPMTU_BLACKHOLE when we've tried enough with lowest possible value and we've failed.
So, for example, we start with 1448, detect blackhole and reduce to 1188. Do some packets successfully there and again detect blackhole which reduces further to 524. Now, if we have to restore MSS value, what should it be?

(or may be I am making this entire thing more complicated than it is because of my jet lagged brain ;-))

My understanding of the overall intent of the design is:

If we are retransmitting, try lowering the MSS to see if we encountered a blackhole and a lower MSS will allow the retransmissions to succeed.
- If not, assume there is no blackhole and restore the original MSS.
- If the retransmission succeeds, assume there was a blackhole and keep using the new MSS until we hit more problems.

What is not clear is the intended behavior if we begin having to retransmit after we have switched to using the new MSS: should we eventually restore the MSS to the original value? Or, once blackhole detection has discovered a new MSS, should we switch to using that as our base MSS and never go higher than that?

In any case, my patch was intended to address the first prong of the feature: restoring the MSS when blackhole detection fails to find a new MSS that works. I think this is broken.

If I am reading the code correctly, it currently works like this:

MSS starts at 1448 (for example).

We retransmit once.

Before our second retransmission, we set TF2_PLPMTU_BLACKHOLE, and set tp->t_pmtud_saved_maxseg to tp->t_maxseg. So, at this point, tp->t_pmtud_saved_maxseg is 1448. We then lower the MSS to V_tcp_pmtud_blackhole_mss (1200, let's say).

We retransmit a second and third time.

Before our fourth retransmission, we set TF2_PLPMTU_BLACKHOLE (which is a no-op), and set tp->t_pmtud_saved_maxseg to tp->t_maxseg. So, at this point, tp->t_pmtud_saved_maxseg is 1200. We then lower the MSS to V_tcp_mssdflt (536, let's say) and clear TF2_PLPMTU_PMTUD (to deactivate PMTU discovery for the session).

We then retransmit a fourth time. While retransmitting (assuming we are retransmitting a number of bytes >= the new MSS), tcp_output will set TF2_PLPMTU_MAXSEG. And, assuming V_tcp_mssdflt > V_tcp_minmss (which it is, by default), tcp_output will set TF2_PLPMTU_PMTUD.

We then retransmit a fifth time.

Before our sixth retransmission, we set TF2_PLPMTU_BLACKHOLE (which is a no-op), and set tp->t_pmtud_saved_maxseg to tp->t_maxseg. So, at this point, tp->t_pmtud_saved_maxseg is 536. We then set the MSS to V_tcp_mssdflt (which is a no-op, since we already set this). We also clear TF2_PLPMTU_PMTUD.

We then retransmit a sixth time. (tcp_output() will probably set TF2_PLPMTU_PMTUD again.)

Before our seventh retransmission, we set the TF2_PLPMTU_PMTUD flag, clear the TF2_PLPMTU_BLACKHOLE flag, and "restore" the MSS to tp->t_pmtud_saved_maxseg. However, because we set tp->t_pmtud_saved_maxseg to tp->t_maxseg just before the sixth retransmission, this is a no-op.

So, our session is now permanently stuck with an MSS of 536, even if we just encountered a temporary reachability problem, rather than a blackhole.

BTW, before each subsequent retransmission, I think we will then oscillate between tp->t_maxseg = tp->t_pmtud_saved_maxseg and tp->t_maxseg = V_tcp_mssdflt. I don't think my patch addresses that.

So, the bottom line is that I think the algorithm here is broken and needs to be fixed. In the course of writing this analysis, I think I've found another problem to address (the oscillation problem). :-)

Please let me know if you agree with the analysis.

Apologies for a very late response.

Firstly, I agree that the code is total spaghetti and should be fixed. What I remember from looking at earlier was we (FreeBSD) should do following:

if we don't find a blackhole (i.e. lowering mss doesn't help in sending), we restore the mss.
if we find a blackhole and connection is humming along with lowered mss, we do not (by design) try to probe for higher mss. (There is an rfc (of course) for this which we do not support).

Looking at your analysis, you seem to suggest that we fail to restore mss evenif we don't find a blackhole? (i.e. 1) is not followed?) if so, that needs to be fixed.

In any case, as you volunteered (thanks a lot, btw) to fix this, I'd let you decide how to go ahead with this. As I do not have an easy way to test this atm.

Hiren,

This looks like a fix for the first case.. preserve the original MTU in case PMTU doesn't work rather than one of the intermediate steps during detection.

Thanks @kevin.bowling_kev009.com. You made me look at the actual code :p

@jonlooney_gmail.com This looks good. Your analysis and patch looks correct to me.

This revision is now accepted and ready to land.Feb 4 2016, 7:27 AM

Ping @jtl

I didn't know that this review existed when committing r322967, which contains the fix suggested here.

Revision Contents
Changeset List

Path

Size

sys/

netinet/

tcp_timer.c

13 lines

Diff 11981

View Options

Properly save the original MSS during blackhole detectionAcceptedPublicActions