Add an ability to specify initial congestion window.
AbandonedPublic
Actions

Authored by hiren on Oct 9 2015, 5:51 PM.

Details

Reviewers

allanjude
rrs
wblock
lstewart
gnn
rwatson

Group Reviewers

manpages

Summary

Currently we have initial congestion window of 10mss by default.
Add a tunable to adjust that value on fly.

Please see
https://lists.freebsd.org/pipermail/freebsd-net/2014-March/038050.html

A top level net.inet.tcp.nonstandard knob is being added here and plan is
to add non-standard tunables like this one under it from now on.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 928
Build 928: arc lint + arc unit

Event Timeline

hiren updated this revision to Diff 9282.Oct 9 2015, 5:51 PM

hiren retitled this revision from to Add an ability to specify initial congestion window..

hiren updated this object.

hiren edited the test plan for this revision. (Show Details)

hiren added reviewers: lstewart, gnn.

hiren added a subscriber: network.

Herald added a subscriber: imp. · View Herald TranscriptOct 9 2015, 5:51 PM

hiren added reviewers: allanjude, rrs.Oct 9 2015, 5:52 PM

hiren added a reviewer: rwatson.

hiren removed a subscriber: imp.

I'd like a warning somewhere in docs or code that points out the risk of abusing this. That can be done after this is committed.

This revision is now accepted and ready to land.Oct 9 2015, 5:55 PM

In D3858#79721, @gnn wrote:

I'd like a warning somewhere in docs or code that points out the risk of abusing this. That can be done after this is committed.

Yes, I agree.
The point of having net.inet.tcp.nonstandard is also that only as far as I can see. When you are playing with it, you know you are on your own. :-)

I'll think about how should we express that in code/comments/docs. Any tips regarding that would be really useful to me.

j-nitrology.com added a subscriber: j-nitrology.com.Oct 9 2015, 10:32 PM

As a user, I'm a huge +1 on the ability to tune CWND's in particular, and customisability of kernel parameters in general, so thank you for pushing this @hiren.

As a user, I'm also -1 on it being placed somewhere special (*.nonstandard).

What exactly is a nonstandard.* tree trying to achieve? The feeling is that it's a message to the user that it is unsupported, or as @hiren mentioned, that "you're on your own". This raises a question:

In what sense are users "more on their own" modifying this tunable versus every/any other FreeBSD sysctl?

Further, in terms of extensibility, consistency and maintainability, how many other (existing and future) TCP/UDP/IP/* related tunables might be put in this non-standard .nonstandard sysctl branch?

But more importantly, on what basis will an individual committer decide whether or not it goes in there or not?

Should default socket buffer tunables be put there because a user can set too high/low values, effecting their own, and neighbouring networks?
Should tcp.msl be put there because a user can set it to 10ms ?
Should <insert sysctl> be put there because a user can degrade FreeBSD/TCP/forwarding/* performance by changing it?
Should <insert sysctl> be put there because a user could change the value and <insert infinite other impact scenarios here>

Instead of a 'special' place in the sysctl tree, why cant we just have the string "non-standard" or equivalent messaging in the sysctl description?

In fact, other than trying to absolve ourselves of responsibility for users modifying this parameter (not that we ever were), I don't believe this needs any special treatment at all.

For the moment I agree with Hiren's placement of this sysctl. Setting the cwnd in this way is actually dangerous if deployed on a large set of servers that face the Internet and therefore the sysctl tree itself ought to give notice. There will be other sysctls joining this one in the future as we continue to advance our work on the TCP stack and this is as good a place as any to keep such tweaks until they're fully tested in large deployments. Some such tweaks will always remain in this tree.

@gnn While I understand and tend to agree with the premise, I don't agree with the conclusion.

Not withstanding, unless other (more potentially devasting) existing sysctl's are also placed there, and even then, I can't see how this special case is warranted.

This is not to mention again the POLA violation, or the option to mark these experimental/nonstandard in the description, thereby satisfying the intent, without the associated downsides.

Also, was it not called *.experimental in previous versions of the patch? If we want a place to put 'temporary until widely tested' tunables, isn't that a better place/name?

What about making any experimental a primary branch ala experimental.tcp.foo ?

In D3858#79979, @koobs wrote:

@gnn While I understand and tend to agree with the premise, I don't agree with the conclusion.

Not withstanding, unless other (more potentially devasting) existing sysctl's are also placed there, and even then, I can't see how this special case is warranted.

This is not to mention again the POLA violation, or the option to mark these experimental/nonstandard in the description, thereby satisfying the intent, without the associated downsides.

Also, was it not called *.experimental in previous versions of the patch? If we want a place to put 'temporary until widely tested' tunables, isn't that a better place/name?

What about making any experimental a primary branch ala experimental.tcp.foo ?

If it is experimental, then put it under experimental.

In the case of the CWND, if it breaks conformance with the TCP RFCs, then it should stay under 'nonstandard'.

These seem like the best and easiest classifications. experimental for stuff that is from draft RFCs etc, and nonstandard for things that are for purposely going off spec.

In D3858#79981, @allanjude wrote:

In D3858#79979, @koobs wrote:

@gnn While I understand and tend to agree with the premise, I don't agree with the conclusion.

Not withstanding, unless other (more potentially devasting) existing sysctl's are also placed there, and even then, I can't see how this special case is warranted.

This is not to mention again the POLA violation, or the option to mark these experimental/nonstandard in the description, thereby satisfying the intent, without the associated downsides.

Also, was it not called *.experimental in previous versions of the patch? If we want a place to put 'temporary until widely tested' tunables, isn't that a better place/name?

What about making any experimental a primary branch ala experimental.tcp.foo ?

If it is experimental, then put it under experimental.

In the case of the CWND, if it breaks conformance with the TCP RFCs, then it should stay under 'nonstandard'.

These seem like the best and easiest classifications. experimental for stuff that is from draft RFCs etc, and nonstandard for things that are for purposely going off spec.

Thanks Allan,
Couldn't have said any better.

As George correctly mentioned, this tree is going to grow based on what else we do which doesn't comply to any existing standards/rfcs.

The whole point here is to give our users to ability to try something without recompiling their kernels.

jch added a subscriber: jch.Oct 12 2015, 6:48 AM

@koobs: The difference between TCP related sysctls and other OS sysctls is that TCP is by and large the product of IETF standards vs a bunch of ad hoc OS developers. By definition behaviour not covered in any of the IETF standards which relate to TCP are non-standard i.e. a clear indication to the user they are manipulating something which goes against documented wisdom. I am somewhat sympathetic to your argument that such sysctls should perhaps receive no special namespace - I was merely voicing a strong objection and alternative to Andre's "experimental" tree at the time it was floated and subsequently introduced. The experimental tree should absolutely die and "nonstandard" was my 2 second attempt at a sensible name for the tree - all gripes with the naming should be directed my way. The issue here is about giving the sys admin control over users/apps potentially asking the system to do crazy crap that can harm other network users. My thinking is that tcp.nonstandard.allowed adds an extra level of thought on behalf of the sysadmin before allowing.

@hiren: I really don't like having to AND V_tcp_nonstandard_allowed with each option to determine if they're allowed. Perhaps the patch can be reworked such that flags controlling non standard behaviour can only ever be set if nonstandard behaviour is allowed i.e. use a SYSCTL_PROC and fail to update the variable if allowed==0. As this mechanism is going to be with us for probably a long time, did you give any thought to implementing the more granular mechanism I posited in that old mailing list thread e.g. having the nonstandard.allowed sysctl take a list of ascii behaviours to provide granularity of control? I'm willing to be convinced otherwise, but I still think that is the right way forward with this. The control would then look like:

sysctl net.inet.tcp.nonstandard.allowed=initcwnd,crazyidea1
Which would then be translated as flags in a global bit field and be copied to the tcpcb on connection creation

Individual settings related to nonstanrd behaviours would be controlled via regular TCP sysctl options:
sysctl net.inet.tcp.initcwnd=32

Apps could setsockopt TCP_INITCWND to override the system default, and would get a errno back if the behvaiour was not currently enabled by the sysadmin. getsockopt would always work though and return the effective value.

And the test in the kernel for code paths based on non-standard options would be to check a bitmask to see if the initcwnd flag is set in the list of allowed nonstandard things to tweak. A uint32_t would give us room for 32 non standard behaviours which is probably a good starting point, although I could be convinced 64 bits is needed.

@lstewart No gripes, and thank you for the extra context :) I understand (and agree) that there is a need and desire to be clear that users are playing with something that isn't standardized. My only feeling was there might be another (better) way to achieve exactly that goal. I still think there is.

I've warmed 'a little' to a sub-tree, but it leaves open the issue of sysctl churn (and associated backwards compatibility issues) with tunables that start experimental/non-standard, which are subsequently standardized. This mechanism only makes sense for things that will never be.

Let's be careful not to conflate standard/non-standard with our system defaults. For some more context, Andre's intent for the experimental tree was to house things which were published within the IETF as experimental or draft status vs standards track. I argue that non-standard is a more appropriate grouping and in fact a superset of experimental, as it also encompasses anything we (the FreeBSD OS) choose to do which is not related to efforts within the IETF. If we choose to set the system default initial cwnd to 10 in a given branch of FreeBSD (as we have even though it is experimental as far as the IETF is concerned), that is orthogonal to standards compliance and orthogonal to whether an admin chooses to let an app request a different value via the tcp.nonstandard.allowed mechanism, which we are putting in place as a hoop to jump through to hopefully make people think twice about before changing.

As for things which are non standard becoming standard and possible sysctl churn, firstly that happens very rarely as TCP standards track development is very slow and conservative. Secondly, sysctl has a way to hide a node from the tree such that you can still write/read from it if you know it's there e.g. it got coded into a script or app, but is not returned by a walk of the tree. Thirdly, with all of that being said, I think the middle ground of only having tcp.nonstandard.allowed and then have options specific to behaviour controlled by the allowed option be under the regular tcp tree is a good compromise per the example in my previous comment. The only churn then when for example initcwnd=20 becomes standard is to change the default value. We may also have to handle the option no longer being in the list of things to allow if it's a feature that has become standard and we flip the switch to make it default for all connections, which may or may not be an issue depending on how the SYSCTL_PROC is coded and how consumers respond to an errno being returned after trying to set something which is no longer in the allowed list because it is considered standard and default allowed + enabled.

For the 'allowed' sysctl, maybe something like kern.random.harvest does:

#sysctl kern.random.harvest
kern.random.harvest.mask_symbolic: UMA,FS_ATIME,SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 11111111111
kern.random.harvest.mask: 2047

#sysctl kern.random.harvest.mask=2015
kern.random.harvest.mask: 2047 -> 2015

#sysctl kern.random.harvest
kern.random.harvest.mask_symbolic: UMA,FS_ATIME,SWI,INTERRUPT,NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 11111011111
kern.random.harvest.mask: 2015

In D3858#80538, @allanjude wrote:

For the 'allowed' sysctl, maybe something like kern.random.harvest does:

[snip]

Yup, that would work nicely.

Just to quickly comment on sysctl naming: I don't like putting things like "nonstandard" or "experimental" in sysctl names, because what is standardised or experimental frequently changes. Sysctl names are effectively ABIs, since the names get put in loader.conf, sysctl.conf, etc, and if you change the sysctl name in the kernel, you may break those configuration lines. For sysctl.conf, you at least get an error, but if a loader tunable changes names, it just silently stops working. There are other ways to document whether something is standardised -- e.g., source-code commands and in the man page -- that are probably better. We already have a TCP man page documenting a number of sysctls (I think?) and really the information should just be there, with suitable caveats. So I'd vote (if I were asked to vote) for just naming the sysctl the most descriptive thing.

Based on IRC discussion it seems that @emaste and @allanjude are okay with what @rwatson is proposing. Which I think is something like this:
Create a new sysctl net.inet.tcp.initcwnd and have all the "danger" associated with it in man 4 tcp and in some smartly crafted sysctl description string.

In D3858#80741, @hiren wrote:

Based on IRC discussion it seems that @emaste and @allanjude are okay with what @rwatson is proposing. Which I think is something like this:
Create a new sysctl net.inet.tcp.initcwnd and have all the "danger" associated with it in man 4 tcp and in some smartly crafted sysctl description string.

Yes, I think that's sensible. I don't think putting "nonstandard" or even "setting_this_will_blow_up_the_internet" in either the sysctl name for the setting itself, or a separate enable/disable, will help. People will cut and paste it from a random google search anyway and it will become magic FreeBSD "go-fast" lore.

If it's so dangerous that a warning in the man page and sysctl description is not sufficient then it ought to be hard to activate (e.g. requiring a kernel recompile).

So in the new world order we have net.inet.tcp.initcwnd=10, no master control switch and net.inet.tcp.experimental.* is no more. I'm an app developer and I come in and setsockopt TCP_INITCWND=100. Are we comfortable with saying the app developer knows best and not giving the sysadmin a mechanism to control? I don't care about people stupidly copying sysctl statements from the Internet because it requires a conscious choice for change and they have admin rights on the system, but are we comfortable with not having a mechanism to empower the sysadmin to control per vnet per socket changes to things which can have a non trivial influence beyond the socket and system?

I haven't spent long thinking about it as I'm in thesis mode, but personally I think I'm ok with and would welcome such a new world order. However, I would note it is a departure from the way we've treated such matters in the past, so there should be some acknowledgement and acceptance of that.

It's worth noting that you could set the initcwnd via the net.inet.tcp.slowstart_flightsize sysctl from r50673 16 years ago until it was removed in r226447. 9.2 was the first release without it, and we used it heavily at Limelight during its tenure. Its long lived existence and removal not receiving much notice should at least ease concerns a bit about users blowing things up.

While here my unsolicited opinion tends to agree with the most recent comments. It won't prevent the copy pasta turbo button users we might be concerned with. While auditing your sysctls through each release is a Good Thing, it is a minor annoyance having things change out from under you and something many users might not catch.

In D3858#80831, @lstewart wrote:

So in the new world order we have net.inet.tcp.initcwnd=10, no master control switch and net.inet.tcp.experimental.* is no more. I'm an app developer and I come in and setsockopt TCP_INITCWND=100. Are we comfortable with saying the app developer knows best and not giving the sysadmin a mechanism to control? I don't care about people stupidly copying sysctl statements from the Internet because it requires a conscious choice for change and they have admin rights on the system, but are we comfortable with not having a mechanism to empower the sysadmin to control per vnet per socket changes to things which can have a non trivial influence beyond the socket and system?

Would a net.inet.tcp.initcwnd_max not make sense to gain this control without adding additional hoops? If you wish to basically neuter the functionality of the setsockopt (which may make sense as the default), you set the max to match. We have an internal patch that implements a setsockopt initcwnd that we use fairly heavily via this method with good results.

net.inet.tcp.initcwnd=10
net.inet.tcp.initcwnd_max=10

So Lawrence,

I guess the question is do we allow a user to shoot-themselves and everyone else in the
foot that is on their network (when they do setsockopt(TCP_INITCWND=1000))?

On the one hand the thought of letting you do what you want may not be a bad approach (if it
is such a new-world-order.. the libertarian in me likes that)... However when the whole network
you are at collapses because you have this idiot running this on all his machines what then?

I have mixed feelings about this.. and in general we should probably keep with the old world
order.

I think I do agree with you on your approach to non-standard. Yeah it sounds weird to
a non-IETF'er but it seems like the right thing. And I too second your approach to make
it so you can't set things if the "non_standard" knob is not enabled. Of course that
means you have to wipe all the non-standard settings if someone disables the knob.. but
thats an implementation detail that I am sure Hiren can handle :-)

FWIW, consider the case that these days, it is often developers themselves that are deploying/provisioning/configuring the target servers

My "BOFH ops" side likes something about Jasons (@j-nitrology.com) init_cwnd{_max} idea to limit abuse (at the app layer), apart from the fact yet another sysctl is added.

I'm still not sure it achieves the kind of seat-belting that is purportedly required though, which I'm still not convinced on.

Additionally it poses another barrier to someone wanting to bump the value (unless suitable error messaging is presented, pointing them to the _max variable.)

This may well be a decent compromise.

Koobs:

The big difference here is if I set a typical sysctl in FreeBSD I can easily shoot-myself and
thats ok. I pay the price for my stupid setting of the sysctl. The big difference with some of
the TCP settings is that not only I, but those connected to my area of the Internet also will
pay the price.

The non-standard, though maybe arguably a bad name (but it sounds right to my IETF ears), is
a way of giving the user a signal that *not* only are they possibly shooting themselves in the
foot they are more than likely going to shoot others around them. I don't know if there is
a better name for it but lines in a manual page probably are not going to call the "red flag" attention
that the knob needs.

How does a developer classify these, thats pretty easy if you are dealing with an IETF standard, look
at the top of the document, does it say "Standards track" or does it say "Informational" or "Experimental".
So for ietf related things its not hard to decide.

Hmm maybe it could be something like
net.inet.tcp.ietf_standard
or
net.inet.tcp.ietf_nonstandard

:-)

@nrr I understand and agree with the desire and the distinction between impact on self-only vs others, but the reality is that the proposed mechanisms (all of them, including a separate sysctl tree, man page warnings, sysctl descriptions) do not actually solve or prevent that problem. More precisely and importantly, they can not.

The user will choose to set whatever value they want, no matter what mechanism we use to brace, protect, buffer, prevent-foot-shooting, all the way through to modifying the code and recompiling the kernel. They have to do that right now (unnecessarily) until this change lands. Us making it easier for anyone to experiment with FreeBSD in general, and with TCP parameters specifically, *without* jumping through hoops is the actual unsung goal of this change.

With the utmost genuine respect, FreeBSD isn't the IETF. I'm all for educating, informing and guiding users with the best information so they can leverage the system 'well', but not at the expense of making the user experience frustrating, obscure, or not just plain easy.

I think it's time to bring this to a close. Let's go with the suggestion from rwatson and others. I also agree that a tunable max limit isn't a bad idea but should prevent committing and closing this particular review.

@lstewart and @rrs : While I see your IETF point of view, I also think @rwatson has a valid point about sysctls being ABIs and @koobs point of keeping things simple.

@j-nitrology.com's suggestion of having a max upper-bound indeed seems like a good compromise to stop unintended abuse. Also thanks for reminding about existence of flightsize sysctl that we removed in 9.2. So apparently this is not a new thing but putting a functionality back into FreeBSD that we removed.

So, here is what I am going to do:
net.inet.tcp.initcwnd (initialized to 10)
net.inet.tcp.initcwnd_max (initialized to 64)
(Again, both are tunables that can be changed at run time)
And an update to tcp(4) man page.

I'll update the diff tomorrow if I don't hear any strong objection by then.

I have to be brief and can't respond to each comment as I'm about to hit the road for a wedding 7 hours away, but in short I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded. We can always add the safety belt in later (Robert's and others' concerns seem to have misunderstood the nature of the safety belt proposal w.r.t. sysctl churn but we can revisit another time).

Oh, and initcwnd should be in bytes, not MSS.

In D3858#81164, @lstewart wrote:

I have to be brief and can't respond to each comment as I'm about to hit the road for a wedding 7 hours away, but in short I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded. We can always add the safety belt in later (Robert's and others' concerns seem to have misunderstood the nature of the safety belt proposal w.r.t. sysctl churn but we can revisit another time).

I can leave the MAX aside right now but I don't understand your reasoning. Don't we cap values with such MAX defines all the time?

Or you don't like the fact that it will change a super-high value (anything > MAX set by the app) to MAX underneath the app developer?

Oh, and initcwnd should be in bytes, not MSS.

Sure but we specify that as number of packets/MSS right?
for initcwnd of 10, we currently we do:

tp->snd_cwnd = min(10 * tp->t_maxseg, max(2 * tp->t_maxseg, 14600));

and new proposal is to do:

tp->snd_cwnd = min(V_tcp_cwnd * tp->t_maxseg, max(2 * tp->t_maxseg, V_tcp_cwnd * 1460));

Which will do MSS to bytes conversion.

Or are you proposing something that I am not understanding?

Lawrence and I had an IRC chat after this and here is the summary:

In D3858#81168, @hiren wrote:

In D3858#81164, @lstewart wrote:

I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded.

There is no actual "max" limit for this. All limits depend on capacity of a link. So Lawrence's point is to live with whatever admin decides to set.
I am okay with that.

Oh, and initcwnd should be in bytes, not MSS.

Lawrence suggested that there is a drawback in the current approach of specifying initcwnd in MSS. If a connection starts out with lower than usual MSS, initcwnd would also come out to be lower than expected.

It should be specified in both number of MSS and bytes. And we should pick whatever is larger. In simplest form, something like:
max(initcwnd_segs * tp->t_maxseg, initicwnd_bytes)

But this still needs to thought out to make sure we cover all the cases and I suggest we take this up as a separate review/commit.

Coming back to the current patch, new plan is:

net.inet.tcp.initcwnd (initialized to 10)
And an update to tcp(4) man page.

Question: What should we do about existing net.inet.tcp.experimental.initcwnd10?
Axe it or keep it with a warning or something in the description string?

hiren edited edge metadata.Oct 23 2015, 12:35 AM

hiren added a subscriber: transport.

On net.inet.tcp.experimental.initcwnd10: if it's not shipped in a release, we can remove it as a bit of interface stability in 11-CURRENT. If it's not a tunable, then it will generate a warning on boot if set in sysctl.conf after kernel support is removed, as the sysctl won't be present. If it's also a tunable, there won't be a warning, and it will silently fail to operate. So, hopefully it's not in a release? If it is, we probably need something in the release notes about its removal, etc.

In D3858#83124, @rwatson wrote:

On net.inet.tcp.experimental.initcwnd10: if it's not shipped in a release, we can remove it as a bit of interface stability in 11-CURRENT. If it's not a tunable, then it will generate a warning on boot if set in sysctl.conf after kernel support is removed, as the sysctl won't be present. If it's also a tunable, there won't be a warning, and it will silently fail to operate. So, hopefully it's not in a release? If it is, we probably need something in the release notes about its removal, etc.

initcwnd10 in already in 10. And it's an ON/OFF runtime tunable which is set to ON by default.

So if understand your comment correctly, we remove it from -head and add mention it in release notes?

Also, behavior is not going to change as by default out of the box initcwnd would still be 10.

New approach based on comments:
Remove existing experiemtal initcwnd sysctl, add new initcwnd_in_mss and update tcp(4) manpage.

This revision now requires review to proceed.Oct 23 2015, 8:19 PM

Herald added a reviewer: manpages. · View Herald TranscriptOct 23 2015, 8:19 PM

hiren added a reviewer: wblock.Oct 23 2015, 9:31 PM

wblock added inline comments.Oct 25 2015, 3:21 PM

share/man/man4/tcp.4
458	s/Enable/Enable the/
459	s/Default/The default/ Not sure whether "kept" is necessary.
460	The dependencies in this sentence can be read several ways ("link capacity as it regulates"). Reordering it could help. I think but am not positive this works: Caution: .It Va initcwnd_in_mss regulates the burst of packets allowed to be sent in the first RTT. The value should be relative to the link capacity. The second sentence needs expansion. The original says "chosen carefully", but what that means here is not explained. An example could help, or something like "Start with small values for lower-capacity links." Or suggest a way to determine reasonable actual values.
461	Start new sentences on new lines: regulates the burst of packets allowed to be sent in the first RTT. Large bursts
462	Hmm. s/drops/loss/ ?

@wblock Thanks for your inputs, I agree to all but one. Updating the diffs in a bit.

Improve the manpage changes as suggested by wblock.

Herald edited edge metadata. · View Herald TranscriptOct 26 2015, 6:57 AM

In D3858#83098, @hiren wrote:

Lawrence and I had an IRC chat after this and here is the summary:

In D3858#81168, @hiren wrote:

In D3858#81164, @lstewart wrote:

I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded.

There is no actual "max" limit for this. All limits depend on capacity of a link. So Lawrence's point is to live with whatever admin decides to set.
I am okay with that.

Oh, and initcwnd should be in bytes, not MSS.

Lawrence suggested that there is a drawback in the current approach of specifying initcwnd in MSS. If a connection starts out with lower than usual MSS, initcwnd would also come out to be lower than expected.

It should be specified in both number of MSS and bytes. And we should pick whatever is larger. In simplest form, something like:
max(initcwnd_segs * tp->t_maxseg, initicwnd_bytes)

min, not max, as I was arguing to preserve the intent of RFC 3390's byte/segment burst limit calculation.

But this still needs to thought out to make sure we cover all the cases and I suggest we take this up as a separate review/commit.

Coming back to the current patch, new plan is:

net.inet.tcp.initcwnd (initialized to 10)
And an update to tcp(4) man page.

Question: What should we do about existing net.inet.tcp.experimental.initcwnd10?
Axe it or keep it with a warning or something in the description string?

I vote to axe the entire tcp.experimental tree and document in the release notes. For others, the history is that we shipped 9.3 and 10 with it (I protested to Andre and the RE team but was unsuccessful convincing them that this was a mistake) so we're stuck with that sysctl in the old branches, but IMO it should never have been there in the first place.

In D3858#83634, @lstewart wrote:

In D3858#83098, @hiren wrote:

Oh, and initcwnd should be in bytes, not MSS.

Lawrence suggested that there is a drawback in the current approach of specifying initcwnd in MSS. If a connection starts out with lower than usual MSS, initcwnd would also come out to be lower than expected.

It should be specified in both number of MSS and bytes. And we should pick whatever is larger. In simplest form, something like:
max(initcwnd_segs * tp->t_maxseg, initicwnd_bytes)

min, not max, as I was arguing to preserve the intent of RFC 3390's byte/segment burst limit calculation.

Ok. I'll create another review/discussion for this.

But this still needs to thought out to make sure we cover all the cases and I suggest we take this up as a separate review/commit.

Coming back to the current patch, new plan is:

net.inet.tcp.initcwnd (initialized to 10)
And an update to tcp(4) man page.

Question: What should we do about existing net.inet.tcp.experimental.initcwnd10?
Axe it or keep it with a warning or something in the description string?

I vote to axe the entire tcp.experimental tree and document in the release notes. For others, the history is that we shipped 9.3 and 10 with it (I protested to Andre and the RE team but was unsuccessful convincing them that this was a mistake) so we're stuck with that sysctl in the old branches, but IMO it should never have been there in the first place.

Thanks. I'll do that.

I am also planning to change the name from initcwnd_in_mss to initcwnd_segs. It seems better.

Planning to commit this in a day or so.