Page MenuHomeFreeBSD

DCTCP implementation.
ClosedPublic

Authored by hiren on Aug 13 2014, 9:40 PM.
Tags
None
Referenced Files
F106677788: D604.diff
Fri, Jan 3, 6:31 PM
Unknown Object (File)
Sat, Dec 28, 1:38 PM
Unknown Object (File)
Sat, Dec 28, 1:35 PM
Unknown Object (File)
Sun, Dec 15, 8:33 PM
Unknown Object (File)
Tue, Dec 10, 12:34 PM
Unknown Object (File)
Oct 31 2024, 11:02 PM
Unknown Object (File)
Oct 31 2024, 10:49 PM
Unknown Object (File)
Oct 24 2024, 7:34 AM

Details

Summary

DCTCP (Datacenter TCP) implementation for FreeBSD by Midori Kato and Lars
Eggert.

Summary of the changes:

  1. Add a new congestion control module for dctcp.
  2. ECN (Explicit congestion notification) procession is done differently in

dctcp from RFC3168. Make necessary changes in tcp_input.c to accomodate it.

  1. Implement ODCTCP (One sided dctcp) where only sender or receiver is following

dctcp and not the other end.

http://tools.ietf.org/html/draft-bensley-tcpm-dctcp
https://eggert.org/students/kato-thesis.pdf

Submitted by: Midori Kato and Lars Eggert (Initial patch)
Improvements by: hiren (Commenting in the code mainly with other minor tweaks)
Tested by: hiren, Midori Kato

Test Plan

I've set up 3 boxes for testing. Dummynet node is the traffic shaper to help me
simulate traffic.

Setup:

Host A - sender (with proposed patch)
Host B - Dummynet (with r266941)
Host C - receiver (with proposed patch)

Connectivity:

Sender                         Dummynet             Receiver
Host A                          Host B                  Host C
192.168.10.10  <---> 192.168.10.11
                                 192.168.11.11 <-->  192.168.11.10

Sender Host A 192.168.10.10
Dummynet Host B 192.168.10.11, 192.168.11.11
Receiver Host C 192.168.11.10

On Sender/Receiver:

$ sysctl -w net.inet.tcp.cc.algorithm=dctcp
$ sysctl -w net.inet.tcp.ecn.enable=1

On Dummynet to setup forwarding:

$ sudo dn.sh add <src_ip> <src_iface> <dst_ip> <dst_iface> 1
$ sudo ./dn.sh add 192.168.10.10 192.168.11.10 1

Verify:

$ sudo ipfw show    
09994     0       0 pipe 9994 ip from 192.168.10.10 not 2049,22,12865 to 192.168.11.10 not dst-port 2049,22,12865 out 
09995     0       0 pipe 9995 ip from 192.168.11.10 not 2049,22,12865 to 192.168.10.10 not dst-port 2049,22,12865 out 
65535 18476 3441803 allow ip from any to any 

Start the traffic from sender (Host A):

$ sudo ./cap-netperf 192.168.10.10 igb0 192.168.11.10 bge1 192.168.10.11 1 dctcp

Check the o/p files generated by cap-netperf:

$ ls -lth 
total 998 
-rw-r--r--  1 root  users    54K May 30 00:00 192.168.10.10-192.168.11.10-fwd.pcap.gz
-rw-r--r--  1 root  users    27K May 30 00:00 192.168.10.10-192.168.11.10-fwd.sift.gz 

Generate the graphs:

$ sudo ../../vis-netperf 192.168.10.10-192.168.11.10
1 TCP connection(s) in the fwd trace
Extracting data for connection 1, direction fwd
Plotting every 1 sample(s) of connection 1, direction fwd
$ ls -lth total 1174
-rw-r--r--  1 root  users   175K May 30 00:05 192.168.10.10-192.168.11.10-1-fwd.pdf
-rw-r--r--  1 root  users    54K May 30 00:00 192.168.10.10-192.168.11.10-fwd.pcap.gz
-rw-r--r--  1 root  users    27K May 30 00:00 192.168.10.10-192.168.11.10-fwd.sift.gz

Look at the graph plotted in pdf to make sure we see correct behavior.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

hiren retitled this revision from to DCTCP implementation..
hiren updated this object.
hiren edited the test plan for this revision. (Show Details)
hiren added reviewers: gnn, lstewart.

I've been holding on to these changes for a long time now and would like to commit by this weekend.

gnn edited edge metadata.

Do it. So long as it's not on by default there is no more reason to wait.

This revision is now accepted and ready to land.Aug 28 2014, 10:19 PM
bz requested changes to this revision.Aug 29 2014, 1:18 PM
bz added a reviewer: bz.
bz added a subscriber: bz.

Mostly just a style review; not reviewed (most of) the functional changes (yet).

share/man/man4/cc_dctcp.4
2

Should have a - at the end.

32

If there is a 2014 copyright, how can the man page date be from last year?

88

Empty line never looks right in a man page

sys/modules/cc/cc_dctcp/Makefile
8

Patched twice! Only half of this is needed.

sys/netinet/cc/cc_dctcp.c
69

Looks like spacing is wrong for style #define<tab>...

77

Weird spacing here as well

193

Style suggest an empty line at beginning of a function without declarations

203

superfluous empty line

315

Another case of suggested blank one at beginning of function

sys/netinet/tcp_input.c
501

This is only used once; do we really need to make it a file-local inline function for that?

sys/netinet/tcp_output.c
175 ↗(On Diff #1095)

Indentation looks wrong? Is this just the web interface?

This revision now requires changes to proceed.Aug 29 2014, 1:18 PM
share/man/man4/cc_dctcp.4
2

Didn't get it. Do you mean it should be:
.\"-
?
I don't see it in any of the man pages that I checked.

32

I'll fix it.

88

Removed.

sys/modules/cc/cc_dctcp/Makefile
8

Bah! Removed.

sys/netinet/cc/cc_dctcp.c
69

Fixed.

77

Looks okay on terminal.

193

Fixed.

203

Fixed.

315

Fixed.

sys/netinet/tcp_output.c
175 ↗(On Diff #1095)

Yep, the web interface :-)

I'll wait for any other functional comments and then update the review.

hiren edited edge metadata.

Addressing comments from bz@

lstewart requested changes to this revision.Sep 1 2014, 8:33 AM
lstewart edited edge metadata.

Preliminary review suggests we'll need to involve Midori and/or Lars in order to rework some aspects of the module. I stopped reviewing in depth as some high level fundamental things need to get resolved first.

share/man/man4/cc_dctcp.4
39–46

I would suggest some editorial work on the man page as it reads oddly in places presumably because it was written by Midori who is a non native speaker e.g. congestion doesn't have a "depth" per se, "feedbacks".

Something like this might be a bit more succinct and informative for the intro paragraph:

"The DCTCP (data center TCP) congestion control algorithm aims to maximise throughput and minimise latency in data center networks by utilising the proportion of Explicit Congestion Notification (ECN) marks received from capable hardware as a congestion signal."

Sentence about RFC3168 does not belong nor does reference to concurrent data transfers.

sys/netinet/cc.h
151–152

This is not a suitable addition to the mod_cc KPI. Passing such specific arguments makes it far too DCTCP specific. The correct way to pass such information if required is within struct cc_var.

Also, it seems where this is hooked into the stack is a bit of a blanket catch all.

155

Why was this added to the KPI if the default stack behaviour of always marking ECT if peers have agreed to use ECN is also fine for DCTCP? Seems the diff can be reduced by removing this and associated code changes in the TCP stack.

sys/netinet/cc/cc_dctcp.c
305–306

If ECN was not successfully negotiated for the connection, should we abort using cc_dctcp and fall back to cc_newreno?

sys/netinet/tcp_input.c
1550–1576

This whole block of code should go into an inline ECN processing function.

It also seems that the hook really desired is a generic "any packet received" hook, rather than a "any packet received for a connection using ECN" hook. Would certainly make the mechanism more useful for other CC modules that wish to do weird and wonderful things.

This revision now requires changes to proceed.Sep 1 2014, 8:33 AM

Just a brief comment on the simple duplicate still.

sys/modules/cc/cc_dctcp/Makefile
2

This is still duplicated.

In D604#21, @bz wrote:

Just a brief comment on the simple duplicate still.

Wow. so weird. Here is how it looks on the machine:

% pwd
/home/hirenp/commit_head/sys/modules/cc/cc_dctcp
% svn diff

Index: Makefile

  • Makefile (revision 0)

+++ Makefile (working copy)
@@ -0,0 +1,7 @@
+# $FreeBSD$
+
+.PATH: ${.CURDIR}/../../../netinet/cc
+KMOD= cc_dctcp
+SRCS= cc_dctcp.c
+
+.include <bsd.kmod.mk>

Property changes on: Makefile


Added: svn:mime-type

-0,0 +1

+text/plain
\ No newline at end of property
Added: svn:keywords

-0,0 +1

+FreeBSD=%H
\ No newline at end of property
Added: svn:eol-style

-0,0 +1

+native
\ No newline at end of property
mon% cat Makefile

$FreeBSD$

.PATH: ${.CURDIR}/../../../netinet/cc
KMOD= cc_dctcp
SRCS= cc_dctcp.c

.include <bsd.kmod.mk>
%

share/man/man4/cc_dctcp.4
39–46

Manpage needs more love, totally agree. I didn't pay much attention in the hope to get back to it later. I also like your suggestion for intro para. I'll try to get it fixed up.

sys/netinet/cc.h
151–152

Sure, we can add these arguments to struct cc_var.

Blanket catch? Isn't it only applied when we match following condition?
if (tp->t_flags & TF_ECN_PERMIT)

i.e. only when we are doing ECN.

Am I misunderstanding something?

155

Good point, I'd need to ponder a bit more.

sys/netinet/cc/cc_dctcp.c
305–306

Yes, seems logical. I need to check how/if that case is handled.

sys/netinet/tcp_input.c
1550–1576

As I've pointed out in earlier comment, isn't this codeblock protected by:
if (tp->t_flags & TF_ECN_PERMIT) condition? Or Is it something I am not getting? Can you please clarify?

sys/netinet/cc.h
151–152

Right, catch every packet for ECN enabled connections. It is essentially a catch all and I don't like its generality. I think we need to either distill the essence of what dctcp needs to do with this hook into a more specific hook, or add a bit of metadata passed in to an existing hook in the cc_var struct. The alternative is to create a real catch all hook, which farms every packet received to the hook and gives the module ultimate flexibility to do "stuff" - something I'm not in favour of.

sys/netinet/cc/cc_dctcp.c
305–306

It isn't other than via benign neglect - ideally an algorithm should be able to remove itself from an active connection if some condition is not met, in this case ECN fails to be negotiated. Sounds like we need a new stack provided function a module can call to say "I give up ownership of this connection, please fall back to an appropriate default algorithm".

Are we abandoning this or going forwards? If forwards can we get Lars or Midori involved asap?

In D604#26, @gnn wrote:

Are we abandoning this or going forwards? If forwards can we get Lars or Midori involved asap?

I am working on it with Midori.

OK, if you're going to do a new patch/review then close this one, otherwise let us know and we'll keep this one open.

In D604#28, @gnn wrote:

OK, if you're going to do a new patch/review then close this one, otherwise let us know and we'll keep this one open.

We will add an updated patch here later this week.

hiren edited edge metadata.

Updating D604: DCTCP implementation.

This update includes:

  • ecnpkt_handler() is now a bit more specific wrt what it's doing. Instead of

passing around as arguments, TCP and IP header flags are now part of cc_var
flags.

  • ect_handler() is now removed as it's not changing anything from what we

have as default behavior.

  • Re-wording manpage a bit. (This may need more tweaking.)

Obligatory weekly bump!

I really want to commit this feature by end of this week if I do not hear back from anyone :-)

gnn edited edge metadata.
hiren updated this revision to Diff 3123.

Closed by commit rS277054 (authored by @hiren).