Page MenuHomeFreeBSD

First step to modularize TCP including two added tcp modules that can be played with.
ClosedPublic

Authored by rrs on Oct 31 2015, 11:39 PM.

Details

Summary

As previously discussed on the transport lists and call, here is a first cut
at modularizing the TCP stack. The functions still do not
include
tcp_input
or
pru_usrreq

Most all other functions are accounted for however (enough that we can try some different TCP functionality without
impact to the main TCP code).

Test Plan

Pound on it at NF as I develop several missing RFC's and other fun such at NF.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/dev/cxgb/ulp/tom/cxgb_listen.c
756 ↗(On Diff #9909)

Why store tp locally? There is a cost to adding it to the stack.

sys/dev/cxgbe/tom/t4_listen.c
1551 ↗(On Diff #9909)

Why store tp locally? There is a cost to adding it to the stack.

sys/netinet/tcp_input.c
233 ↗(On Diff #9909)

This is already defined in tcp_var.h. Why add a non-static declaration to tcp_input.c?

1768 ↗(On Diff #9909)

It looks like there is an accidental whitespace change here.

sys/netinet/tcp_subr.c
152 ↗(On Diff #9909)

It looks like there is an extra line here.

331 ↗(On Diff #9909)

You really only need a read lock here, correct?

394 ↗(On Diff #9909)

You should be able to increment at by just using strlen(f->fb->tcp_block_name). Or, you can use strlcpy(), which gives you size for free.

Alternatively, you could print this out line-by-line with SYSCTL_OUT(). You could simply allocate an 80-character buffer and fill it with data for each line. You could even print the stats that way.

405 ↗(On Diff #9909)

When will buffer ever be NULL?

431 ↗(On Diff #9909)

Why n+2?

555 ↗(On Diff #9909)

!= NULL?

all the rest I will fix shortly

sys/netinet/tcp_syncache.c
824 ↗(On Diff #9909)

Which is fine but it means you extend it so it continues to be used.. which is
why I did it this way.. I have no problems either way :-)

sys/netinet/tcp_timer.c
957 ↗(On Diff #9909)

I am going to fix all this code, the whole stop case is kludgy due to the
lack of a callout_async_drain() function. I have that review going elsewhere
and once we get that in place I am going to switch all this kludgy async drain
via timers out to use it. That will then make the stop have to return the number
of async_drains to expect...

sys/dev/cxgb/ulp/tom/cxgb_listen.c
756 ↗(On Diff #9909)

Ok I have changed this, but do you really think that the compiler with
-O2 would not optimize it.. I do things to *make it clear* whats going on
in code. No matter if you do

{

tp = foomacro()
 do_someting(tp)

}

or

{

do_something(foomacro())

}

The compiler underneath will optimize out such behavior.. and I do
think the old way illustrates clarity in what is going on with the code but
I am ambivalent about it in general so I changed it :-)

sys/dev/cxgbe/tom/t4_listen.c
1551 ↗(On Diff #9909)

same reason as above though again I changed it .. I don't like code
that is less clear.. but I am less caring about it :-)

sys/netinet/tcp_input.c
233 ↗(On Diff #9909)

That may have been a cut and paste error since I have no memory of
adding it explicitly :-)

1768 ↗(On Diff #9909)

ack

sys/netinet/tcp_subr.c
152 ↗(On Diff #9909)

ack

331 ↗(On Diff #9909)

yep good catch :-)

394 ↗(On Diff #9909)

There are always lots of alternatives.. I will take your strlen(f->) suggestion :-)

405 ↗(On Diff #9909)

your right .. TSNH :-)

431 ↗(On Diff #9909)

Because in theory it can grow after the return, so I always give a couple of more.
Its cheap and you less likely have a race where you don't get all that are there.

555 ↗(On Diff #9909)

I don't see the need to do != NULL, why?

Is not

if (blk->tcp_timer_stop_all)

the same thing has

if (blk->tcp_timer_stop_all != NULL)

And I have seen both methods used throughout the kernel.

578 ↗(On Diff #9909)

sure

604 ↗(On Diff #9909)

I will move back the lock, since we are going to wlock anyway.

611 ↗(On Diff #9909)

ok

1086 ↗(On Diff #9909)

I don't see that you can and assure that you get the
right pointers ref-count updated correctly!

sys/netinet/tcp_syncache.c
834 ↗(On Diff #9909)

ok

sys/netinet/tcp_usrreq.c
1384 ↗(On Diff #9909)

true

1620 ↗(On Diff #9909)

Ok, it adds a *lot* of code into a tiny function so why not (even though
I think such a thing would be a development error).

This updates all things from Jonathan's comments..

hiren added a reviewer: hiren.
hiren added a subscriber: hiren.

Randall,

Thanks a lot for this work. We need this so badly :-)

It looks good to me. There are a few tiny comments inline and a couple general comments:

  1. Documentation: We need to document this new-world-order in a better way. So anyone willing to add new things can follow along easily.
  2. Function header comments
  3. possibly a general comments block right at the start of tcp_input.c explaining the general code flow. (This may/can very well be a copy/paste of your commit-log message for this commit ;-))
  1. Out dated base: You seem to be using older (probably stable/10? ) as the base of this changes. You should update that to latest -head for obvious reasons. :-)

I am so excited to have this finally in our stack!

modules/tcp/fastpath/fastpath.c
187 ↗(On Diff #10066)

These comments where updated on -head by r264063. You may want to incorporate that to stay consistent with tcp_input.c's version of DELAY_ACK().

196 ↗(On Diff #10066)

It'd be great if you can add a few line of function header comments explaining the motivation and how this is fast or works better.
(Mainly for posterity).

204 ↗(On Diff #10066)

'have' or 'are' probably?

353 ↗(On Diff #10066)

Again a few lines of function header comments will help. :-)

426 ↗(On Diff #10066)

This comments block is missing r269766.

I think you should update your working branch to latest -head so we don't miss such tiny things :-)

rrs edited edge metadata.

Adds and updates comments. I did find a small difference below
the comments in the auto-buffer sizing so that too was fixed.

modules/tcp/fastpath/fastpath.c
187 ↗(On Diff #10066)

Ok, but thats the one thing about a clone (like this) keeping it in sync
is always fun. A better choice once this is all in is to break out the slow
path into a fixed function and have the module called from there.

I will sync the comment though :-)

196 ↗(On Diff #10066)

done

204 ↗(On Diff #10066)

fixed

353 ↗(On Diff #10066)

done

426 ↗(On Diff #10066)

Besides comment block we may well be missing functionality. The main
purpose of having these here are to demonstrate the module functionality.
I will look and see if its just comments, if so I will clone it over, if not
then I will think about if I want to spend the time updating these.
(This again begs that we get the tcp_do_segment() itself broken down
into smaller reusable chunks so you don't have this type of maintenance issues.

rrs edited edge metadata.

In another review, Han's commented on having source code in modules. Since
it probably is not a good idea lets move them to a sub-dir called netinet/tcp_stacks
and then just arrange the modules makefile to point over there.

Ok after reviewing a bit of the tcp_function_set routine, I updated it to
have a default no arguments to just show the default, the -l to
show the default with a *. No changes to the TCP stack itself here.. I think
we are about ready.. :-)

hiren edited edge metadata.

Looks good to me!

sys/dev/cxgb/ulp/tom/cxgb_listen.c
756 ↗(On Diff #9909)

Ok I have changed this, but do you really think that the compiler with

-O2 would not optimize it..

I would hope so, but I've seen compilers fail to optimize out assignments like this. In one case, we picked up a several percent performance gain by removing a variable assignment like this from a function in a critical path in one of our internal kernel modules.

So, while it is nice to trust the compiler to optimize, I prefer to help the compiler along when we can do so without sacrificing clarity. I think this is one of those cases.

sys/netinet/tcp_subr.c
555 ↗(On Diff #9909)

I don't see the need to do != NULL, why?

style(9) says to use != NULL. Someone called me on it (post-commit) recently. I suspect you'll get the same, especially as you compare with == NULL just lines later.

I'm trying to save you the post-commit style comments. :-)

Hi Randall,

I have a few specific comments in-line, and two other questions:

  • It looks like you dropped out the new user-space program to control the function selection. Was that intentional? If so, do you want me to rewrite the sysctl_net_inet_list_available() function so it will list the functions and inpcb information in the way you had the program list them?
  • I think it is a convention (although I don't know that it is required) to prefix structure members with an abbreviation of the structure type (e.g. td->td_ucred, tp->t_inpcb, etc.). Have you considered following that convention for the new structures you added?

Also, I had a thought about testing this. In particular, rwatson@ had indicated that he wondered how it would impact branch prediction, etc. If we can find a machine with a replicable TCP testing load, we should be able to use PMC to count various statistics (including the # of branch prediction misses, etc.) both with and without this change. I don't know if that will completely satisfy rwatson's concerns, but it will help quantify the present impact of this change.

Again, I really appreciate your work on this. When we're talking about style matters, you know everything else must be pretty good. ;-)

Jonathan

netinet/tcp_subr.c
351

Does this require a write lock to change? (I honestly don't know; what does the lock cover?)

401

Can't this be simplified to (at + 1)? Saves another strlen() walk of the buffer.

617

Why set this to NULL just before freeing?

netinet/tcp_syncache.c
823

I'm not sure we need to check for (blk != NULL) here. blk should always be non-NULL.

If you are concerned blk will be NULL we probably need to add a KASSERT.

netinet/tcp_subr.c
351

Well it probably should be a write lock.. I will change.

Build universe as also found a few problems.. will update.

401

sure.

617

Because I am paranoid :-)

netinet/tcp_syncache.c
823

ok

rrs edited edge metadata.

This updates for the latest comments and adds back the missing
user program that I inadvertently took out of the diff (by not
diffing in the right place :-D)

  • I still wonder why we need a new user-space program, when you could use sysctls to do the same thing.
  • I still wonder about the convention for naming structure members (see previous comment; e.g. tfs_pcbcnt vs. pcbcnt).
sys/netinet/tcp_subr.c
347 ↗(On Diff #10127)

I think we need a check for TCP_FUNC_BEING_REMOVED here. I thought it was there in a previous rev. Maybe I'm wrong.

usr.sbin/tcp_function_ctrl/tcp_function_ctrl.c
60 ↗(On Diff #10127)

"Error retrieving list size"? (Also, this isn't a list of "default functions".)

93 ↗(On Diff #10127)

name must be non-NULL at this point.

I still wonder why we need a new user-space program, when you could use sysctls to do the same thing.

  • Well if you look at an email Hiren asked me .. how do I tell the number of pcb's that are there attached to it? The -l option does that for you.. the rest is just fluff... I am pretty set on having a user space function at this point, it also allows us to add more things in the future. So stop wondering and go with the flow :-)

I still wonder about the convention for naming structure members (see previous comment; e.g. tfs_pcbcnt vs. pcbcnt).

  • I am not a picky guy, tell me what names you want and I will change them to that. I want to close this out and get it off my plate however so hurry up and specify what names you want. :-)
sys/netinet/tcp_subr.c
347 ↗(On Diff #10127)

Probably a good idea..

usr.sbin/tcp_function_ctrl/tcp_function_ctrl.c
60 ↗(On Diff #10127)

ok

93 ↗(On Diff #10127)

ok

In D4055#87001, @rrs wrote:

I still wonder why we need a new user-space program, when you could use sysctls to do the same thing.

  • Well if you look at an email Hiren asked me .. how do I tell the number of pcb's that are there attached to it? The -l option does that for you.. the rest is just fluff... I am pretty set on having a user space function at this point, it also allows us to add more things in the future. So stop wondering and go with the flow :-)

Yes, but there is nothing that the user-space program does that could not be done through a sysctl. You could trivially modify the net.inet.tcp.functions_available sysctl to also return the number of pcb's. In fact, I have offered to modify it to return the exact output the user-space program currently produces. At that point, we could do away with the program and the net.inet.tcp.functions_list_detail sysctl.

I am not opposed to a user-space program when it adds value. However, there is a cost to adding additional code that must be maintained. Here, the code doesn't seem to add value over what the sysctls provide (or could provide, with trivial changes).

You talk of it allowing us to do more in the future. Once we are ready to do things that require a user-space program, we can add one then. There is no reason to rush to add one at this time. If we add one now, it will require maintenance and we will need to consider backwards compatibility when we try to add new features to the program.

If you still think we really need a user-space program, I would ask that you at least get a second opinion on it. I could be wrong. Getting additional input may help clarify the correct direction.

I still wonder about the convention for naming structure members (see previous comment; e.g. tfs_pcbcnt vs. pcbcnt).

  • I am not a picky guy, tell me what names you want and I will change them to that. I want to close this out and get it off my plate however so hurry up and specify what names you want. :-)

Just a quick proposal:

struct tcp_function_block {
        char tfb_name[TCP_FUNCTION_NAME_LEN_MAX];
        int        (*tfb_tcp_output)(struct tcpcb *);
        void        (*tfb_tcp_do_segment)(struct mbuf *, struct tcphdr *,
                            struct socket *, struct tcpcb *,
                            int, int, uint8_t,
                            int);
        int     (*tfb_tcp_ctloutput)(struct socket *so, struct sockopt *sopt,
                            struct inpcb *inp, struct tcpcb *tp);
        /* Optional memory allocation/free routine */
        void        (*tfb_init)(struct tcpcb *);
        void        (*tfb_fini)(struct tcpcb *);
        /* Optional timers, must define all if you define one */
        void        (*tfb_tcp_timer_stop_all)(struct tcpcb *);
        int        (*tfb_tcp_timers_left)(struct tcpcb *);
        void        (*tfb_tcp_timer_activate)(struct tcpcb *,
                            uint32_t, u_int);
        int        (*tfb_tcp_timer_active)(struct tcpcb *, uint32_t);
        void        (*tfb_tcp_timer_stop)(struct tcpcb *, uint32_t);
        volatile uint32_t tfb_refcnt;
        uint32_t  tfb_flags;
};

struct tcp_function {
        TAILQ_ENTRY(tcp_function) tf_next;
        struct tcp_function_block *tf_fb;
};
sys/netinet/tcp_timer.c
862 ↗(On Diff #10127)

should call tp->t_fb->tcp_timer_activate.

Jonathan

I can be convinced *not* to have a user space program (though that is exactly what unix was founded upon, a
bunch of small programs that did things for you that could be put together.. I am an old timer so I harken
from those days)..

Send me the code you think would make the sysctl work and you can convince me.. assuming the output
of such a thing would be decent to look at.

sys/netinet/tcp_timer.c
862 ↗(On Diff #10127)

good catch.

rrs edited edge metadata.

This updates the latest round from Jonathan.

Note that the tfb_timer_stopall was changed to an int return. With the
new callout_async_drain() it will need to return the number of callout_async_drain() calls
that returned 0 so the drain code will know how many drain calls will be made.
(adding that code will get rid of the need for the timers_left() call too).

In D4055#87053, @rrs wrote:

Send me the code you think would make the sysctl work and you can convince me.. assuming the output
of such a thing would be decent to look at.

Sent via direct email. Thanks for considering it!

Adds Jonathan's code (with one minor modification to make it print right).

jtl edited edge metadata.

I think the infrastructure to manage TCP stacks is acceptable.

There are still some style nits I pointed out (primarily, inconsistent use of '(ptr)' vs. '(ptr != NULL)', and the reverse of that).

I also did not review the kernel module at all (and, rrs@ said he didn't expect that, since it was just an example). I think it should probably be committed to some "examples" or "sample" directory until someone reviews it and the community "adopts" it as something we want to support.

With those caveats, I think the infrastructure to manage TCP stacks is acceptable.

I think we all owe Randall some thanks, especially for working through so many comments. :-)

This revision is now accepted and ready to land.Nov 13 2015, 8:05 PM
hiren edited edge metadata.
In D4055#87218, @jtl wrote:

I think we all owe Randall some thanks, especially for working through so many comments. :-)

A huge 'Thank You' to both of you!
This is a great piece of work.

This revision was automatically updated to reflect the committed changes.