
Change blackhole detection and mtu clamping trigger.
ClosedPublic

Authored by hiren on Aug 19 2015, 8:16 PM.
Tags
None

Details

Summary

Blackhole detection and mtu clamping currently begin the very first time an RTO fires.
This seems too aggressive and is probably not correct. This change makes them
trigger only when the retransmit also fails, i.e. after 2 chances.

The changes also make sure that each mtu probe stage (usually 1448 -> 1188 -> 524) gets 2 chances before clamping down further.

Test Plan

This is how it behaves now with simulated loss:

  1. A packet with mtu 1448 gets dropped.
  2. We receive 3 dup acks that tell us about the loss.
  3. 1st retransmit attempt with mtu 1448, which also gets dropped; t_rxtshift = 1.
  4. 2nd retransmit attempt with mtu 1448, which also gets dropped; t_rxtshift = 2.
  5. Now we decide to trigger blackhole mtu reduction and go from 1448 to 1188.
  6. 3rd retransmit attempt, now with the new mtu 1188, which gets dropped; t_rxtshift = 3.
  7. 4th retransmit attempt with mtu 1188, which also gets dropped; t_rxtshift = 4.
  8. At this point we further reduce the mtu to 524. 5th retransmit attempt, which gets dropped; t_rxtshift = 5.
  9. 6th retransmit attempt with mtu 524, which gets dropped; t_rxtshift = 6. Because 524 is net.inet.tcp.mssdflt, we don't clamp down any further.
  10. 7th retransmit attempt with mtu 524. If this gets dropped, we assume that this is not actually a blackhole, undo the clamping we've done, and increment net.inet.tcp.pmtud_blackhole_failed.

If a valid ack comes at any time during this, t_rxtshift gets reset to 0.
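
For readers who want to step through the sequence above, here is a minimal userland sketch of it. It is not the kernel code: struct pmtud_sim, the pmtud_sim_* functions and the stage table are hypothetical names made up for illustration, and the thresholds (clamp after the 2nd and 4th dropped retransmit, undo after the 7th) are read directly off the numbered steps, so the real t_rxtshift bookkeeping in the timer code may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one connection; this is NOT struct tcpcb. */
struct pmtud_sim {
	int mss;	/* segment size currently in use */
	int saved_mss;	/* size before any clamping, kept for the undo path */
	int rxtshift;	/* dropped retransmits since the last valid ack */
	int blackhole;	/* nonzero while a clamped size is in effect */
	int failed;	/* models net.inet.tcp.pmtud_blackhole_failed */
};

/* Probe stages taken from the example in the test plan. */
static const int pmtud_stages[] = { 1448, 1188, 524 };
#define	PMTUD_NSTAGES	(sizeof(pmtud_stages) / sizeof(pmtud_stages[0]))

static void
pmtud_sim_init(struct pmtud_sim *s)
{
	assert(s != NULL);
	s->mss = s->saved_mss = pmtud_stages[0];
	s->rxtshift = 0;
	s->blackhole = 0;
	s->failed = 0;
}

/* Return the next smaller stage, or the same size if already at the floor. */
static int
pmtud_next_stage(int mss)
{
	for (size_t i = 0; i + 1 < PMTUD_NSTAGES; i++)
		if (mss == pmtud_stages[i])
			return (pmtud_stages[i + 1]);
	return (mss);
}

/* A retransmit attempt was lost. */
static void
pmtud_sim_drop(struct pmtud_sim *s)
{
	assert(s != NULL);
	s->rxtshift++;
	if (s->rxtshift >= 2 && s->rxtshift < 6 && s->rxtshift % 2 == 0) {
		/* Two failures at this size: clamp down one stage. */
		if (!s->blackhole) {
			s->saved_mss = s->mss;
			s->blackhole = 1;
		}
		s->mss = pmtud_next_stage(s->mss);
	} else if (s->rxtshift >= 7 && s->blackhole) {
		/* Even the smallest size failed: not a blackhole, undo. */
		s->mss = s->saved_mss;
		s->blackhole = 0;
		s->failed++;
	}
}

/* A valid ack arrived: keep the current size and reset the shift. */
static void
pmtud_sim_ack(struct pmtud_sim *s)
{
	assert(s != NULL);
	s->rxtshift = 0;
}
```

Calling pmtud_sim_drop() seven times in a row on a freshly initialised state walks the size through 1448, 1188 and 524 and then back to 1448 with failed incremented once; calling pmtud_sim_ack() at any point keeps whatever size is in use and only resets the shift.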

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

hiren retitled this revision to Change blackhole detection and mtu clamping trigger.
hiren updated this object.
hiren edited the test plan for this revision. (Show Details)
hiren added a subscriber: network.
gnn added a reviewer: gnn.
This revision is now accepted and ready to land. Aug 19 2015, 9:05 PM

I wish our TCP comments all cited the RFC/section for the relevant behavior, to make it a lot easier to see whether implementation and theory line up. At least for commit messages it might be a good idea to cite them?

In D3434#69986, @bz wrote:

I wish our TCP comments all cited the RFC/section for the relevant behavior, to make it a lot easier to see whether implementation and theory line up. At least for commit messages it might be a good idea to cite them?

Apologies for not being clear about this.
RFC 4821 doesn't specify the implementation details. Section "2. Overview" says:

This document does not contain a complete description of an
implementation.  It only sketches details that do not affect
interoperability with other implementations....

I can mention this in the commit log.

I've discovered some anomalies with the current approach. I'll come back with more results/conclusions.

hiren edited edge metadata.

The intention of this feature is to keep cranking the mtu down until we can
successfully send data or until the mtu has been reduced to the predetermined
min mss. This was not happening with the earlier proposed patch: we could
only reduce the mtu once.

The current patch makes it possible to transition through, for example,
1448 -> 1188 -> 524 until we can successfully send a packet. (524 is the
predetermined min mss here.)
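
As a rough sanity check on where the example sizes could come from, here is a small sketch of the arithmetic. It rests on assumptions that are not stated in this review: that net.inet.tcp.pmtud_blackhole_mss is at its default of 1200, that net.inet.tcp.mssdflt is at its default of 536, that the starting MSS is 1460, and that every segment carries a 12-byte padded TCP timestamp option. The SIM_* macros and sim_* functions are hypothetical names, not kernel identifiers.

```c
#include <assert.h>

#define	SIM_BLACKHOLE_MSS	1200	/* assumed net.inet.tcp.pmtud_blackhole_mss */
#define	SIM_MSSDFLT		536	/* assumed net.inet.tcp.mssdflt */
#define	SIM_TSTAMP_OPTLEN	12	/* assumed padded timestamp option length */

/* Payload bytes per segment once the option space is subtracted. */
static int
sim_payload(int mss_ceiling)
{
	assert(mss_ceiling > SIM_TSTAMP_OPTLEN);
	return (mss_ceiling - SIM_TSTAMP_OPTLEN);
}

/* One clamping step: first down to the blackhole MSS, then to the floor. */
static int
sim_clamp_step(int mss_ceiling)
{
	assert(mss_ceiling > 0);
	if (mss_ceiling > SIM_BLACKHOLE_MSS)
		return (SIM_BLACKHOLE_MSS);
	if (mss_ceiling > SIM_MSSDFLT)
		return (SIM_MSSDFLT);
	return (mss_ceiling);	/* already at or below the floor */
}
```

Under those assumptions, sim_payload() applied to 1460, 1200 and 536 gives 1448, 1188 and 524, which would explain why the last stage is described as the mssdflt-derived minimum.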

jch added a subscriber: jch.

Any comments/concerns regarding this patch?

Improving the patch by making sure each probe stage gets 2 chances of recovery
before further clamping down of mtu.

sbruno added a reviewer: sbruno.
sbruno added a subscriber: sbruno.

This is an adequate solution to "searching" for a usable MTU when ICMP is blocked. Linux does a binary search starting at MIN and working its way up. This is a nice way to achieve something similar.

This revision is now accepted and ready to land. Sep 22 2015, 9:03 PM

@bz / others: any objection/comments? If not, I'll commit this in a few days.

This revision was automatically updated to reflect the committed changes.