
Oct 27 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..
In D3858#83098, @hiren wrote:

Lawrence and I had an IRC chat after this and here is the summary:

In D3858#81168, @hiren wrote:

I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded.

There is no actual "max" limit for this. All limits depend on capacity of a link. So Lawrence's point is to live with whatever admin decides to set.
I am okay with that.

Oh, and initcwnd should be in bytes, not MSS.

Lawrence suggested that there is a drawback in the current approach of specifying initcwnd in MSS. If a connection starts out with lower than usual MSS, initcwnd would also come out to be lower than expected.

It should be specified in both number of MSS and bytes. And we should pick whatever is larger. In simplest form, something like:
max(initcwnd_segs * tp->t_maxseg, initcwnd_bytes)
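A minimal sketch of that selection in C, for illustration only. The helper name initcwnd_pick and its parameters are hypothetical and not taken from the patch; maxseg stands in for tp->t_maxseg. The multiplication is widened to 64 bits so that a large segment count saturates instead of wrapping.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from D3858): return the larger of the
 * segment-based and byte-based initial congestion window, in bytes.
 */
static uint32_t
initcwnd_pick(uint32_t initcwnd_segs, uint32_t maxseg, uint32_t initcwnd_bytes)
{
	/* Widen before multiplying so the product cannot wrap. */
	uint64_t by_segs = (uint64_t)initcwnd_segs * maxseg;

	/* Saturate at the largest value a 32-bit window can hold. */
	if (by_segs > UINT32_MAX)
		by_segs = UINT32_MAX;

	return (by_segs > initcwnd_bytes ? (uint32_t)by_segs : initcwnd_bytes);
}
```

With 10 segments and a byte floor of 14600, a connection with an MSS of 536 still starts at 14600 bytes rather than 5360, which is the low-MSS case described above.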

Oct 27 2015, 12:36 AM

Oct 15 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

I have to be brief and can't respond to each comment as I'm about to hit the road for a wedding 7 hours away, but in short I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded. We can always add the safety belt in later (Robert's and others' concerns seem to have misunderstood the nature of the safety belt proposal w.r.t. sysctl churn but we can revisit another time).

Oct 15 2015, 9:59 PM

Oct 14 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

So in the new world order we have net.inet.tcp.initcwnd=10, no master control switch and net.inet.tcp.experimental.* is no more. I'm an app developer and I come in and setsockopt TCP_INITCWND=100. Are we comfortable with saying the app developer knows best and not giving the sysadmin a mechanism to control? I don't care about people stupidly copying sysctl statements from the Internet because it requires a conscious choice for change and they have admin rights on the system, but are we comfortable with not having a mechanism to empower the sysadmin to control per vnet per socket changes to things which can have a non trivial influence beyond the socket and system?

Oct 14 2015, 10:09 PM
lstewart added a comment to D3858: Add an ability to specify initial congestion window..

For the 'allowed' sysctl, maybe something like kern.random.harvest does:

[snip]

Oct 14 2015, 12:24 AM
lstewart added a comment to D3858: Add an ability to specify initial congestion window..

Let's be careful not to conflate standard/non-standard with our system defaults. For some more context, Andre's intent for the experimental tree was to house things which were published within the IETF as experimental or draft status vs standards track. I argue that non-standard is a more appropriate grouping and in fact a superset of experimental, as it also encompasses anything we (the FreeBSD OS) choose to do which is not related to efforts within the IETF. If we choose to set the system default initial cwnd to 10 in a given branch of FreeBSD (as we have even though it is experimental as far as the IETF is concerned), that is orthogonal to standards compliance and orthogonal to whether an admin chooses to let an app request a different value via the tcp.nonstandard.allowed mechanism, which we are putting in place as a hoop to jump through to hopefully make people think twice before changing it.
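To make the separation concrete, here is a rough sketch of how a per-socket request could be gated, assuming a boolean admin switch. Everything in it is hypothetical: struct initcwnd_policy, initcwnd_request, and the choice of EPERM are illustrative and not part of the patch. The field nonstandard_allowed stands in for tcp.nonstandard.allowed, and sys_default_segs for the system default.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy state; field names are illustrative only. */
struct initcwnd_policy {
	uint32_t sys_default_segs;	/* system-wide default, e.g. 10 */
	bool	 nonstandard_allowed;	/* admin opt-in for per-socket overrides */
};

/*
 * Decide whether a per-socket request is honoured. The system default is
 * always acceptable; any other value requires the admin switch. No upper
 * bound is applied here.
 */
static int
initcwnd_request(const struct initcwnd_policy *p, uint32_t requested_segs,
    uint32_t *out)
{
	if (requested_segs != p->sys_default_segs && !p->nonstandard_allowed)
		return (EPERM);	/* leave *out untouched on refusal */

	*out = requested_segs;
	return (0);
}
```

The default value and the override permission are independent inputs here, which is the orthogonality being argued for: changing one does not imply anything about the other.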

Oct 14 2015, 12:12 AM

Oct 13 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

@koobs: The difference between TCP related sysctls and other OS sysctls is that TCP is by and large the product of IETF standards vs a bunch of ad hoc OS developers. By definition, behaviour not covered in any of the IETF standards which relate to TCP is non-standard, i.e. a clear indication to the user that they are manipulating something which goes against documented wisdom. I am somewhat sympathetic to your argument that such sysctls should perhaps receive no special namespace - I was merely voicing a strong objection and alternative to Andre's "experimental" tree at the time it was floated and subsequently introduced. The experimental tree should absolutely die and "nonstandard" was my 2 second attempt at a sensible name for the tree - all gripes with the naming should be directed my way. The issue here is about giving the sys admin control over users/apps potentially asking the system to do crazy crap that can harm other network users. My thinking is that tcp.nonstandard.allowed adds an extra level of thought on behalf of the sysadmin before allowing.

Oct 13 2015, 12:52 PM

Aug 25 2015

lstewart added a comment to D2970: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error..

This change seems inadequate given that we would have set TF_SENTFIN and updated snd_max. I haven't followed through all the implications of not reverting those changes, but if we're going to attempt a state rollback we'd better make sure we get it right. I'm also a bit unclear on some details in the original report given that an RTO would reset snd_nxt to snd_una and get us out of any permanent pickle. I'm not a fan of rollbacks in general as they're fragile. What's the use case where a rollback here matters?
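A toy model of the concern, written as a sketch rather than the stack's actual code. The struct and function names are hypothetical; TOY_SENTFIN stands in for TF_SENTFIN, and the model assumes that sending a FIN advances snd_nxt, sets the flag, and pulls snd_max forward.

```c
#include <stdint.h>

#define TOY_SENTFIN	0x1u	/* stands in for TF_SENTFIN */

/* Toy subset of sender state; not the real struct tcpcb. */
struct toy_tcpcb {
	uint32_t snd_una, snd_nxt, snd_max;
	uint32_t flags;
};

/* Wrap-safe "a is after b" comparison in sequence space. */
static int
seq_gt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) > 0);
}

/* Assumed effect of emitting a FIN: three pieces of state change. */
static void
toy_send_fin(struct toy_tcpcb *tp)
{
	tp->snd_nxt++;
	tp->flags |= TOY_SENTFIN;
	if (seq_gt(tp->snd_nxt, tp->snd_max))
		tp->snd_max = tp->snd_nxt;
}

/* Partial undo: only snd_nxt is reverted; flag and snd_max stay changed. */
static void
toy_undo_partial(struct toy_tcpcb *tp)
{
	tp->snd_nxt--;
}

/* Full undo: restore everything from a snapshot taken before the send. */
static void
toy_undo_full(struct toy_tcpcb *tp, const struct toy_tcpcb *saved)
{
	*tp = *saved;
}

/* On RTO the sender restarts from the oldest unacknowledged byte. */
static void
toy_rto(struct toy_tcpcb *tp)
{
	tp->snd_nxt = tp->snd_una;
}
```

In this model, toy_undo_partial leaves snd_max one ahead and the flag set, which is the kind of inconsistency a rollback would need to account for; toy_rto shows why a timeout resets snd_nxt regardless.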

Aug 25 2015, 4:34 PM
lstewart added a comment to D2970: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error..

As a side note, I really dislike the conflation of logical sequence space and data accounting used in many places in our stack. It's something that's fairly straightforward to address and I have some proof of concept patches I did a while ago which we should dust off at some point.

Aug 25 2015, 3:59 PM

Jun 17 2015

lstewart added a comment to D1761: Extend LRO support to accumulate more than 65535 bytes.

Ok, but that's anecdotal and gives us reviewers nothing to go on - without any methodology or raw data who knows whether the LRO change is solely responsible for the improvement and if it introduced any undesired side effects. It's also possible that with tuning, the same results could have been obtained without the "jumbo" LRO change.

Jun 17 2015, 11:52 PM
lstewart added a comment to D1761: Extend LRO support to accumulate more than 65535 bytes.

I hope I didn't delete it... from what I could see online, the "Abandon" Phabricator action is the means by which a reviewer indicates they have permanently rejected the patch (as opposed to suggesting changes).

Jun 17 2015, 10:31 PM
lstewart abandoned D1761: Extend LRO support to accumulate more than 65535 bytes.

Just because some hardware is capable of coalescing more than 64k of data doesn't mean we should feel obligated to support the functionality. I'd be curious to understand the anticipated use cases that led to hardware support being added. Without some compelling data to show that this is useful, I think this work should be put on ice until such time as it can be shown to be worthwhile. If such data exists, I'm willing to give it due consideration and revise my judgment, but at this stage I strongly suspect there is no workload we support or will support in the near future that would significantly benefit from raising the LRO chunk size above 64k vs the hacks required to make it work, so that's why I'm voting against this patch outright rather than suggesting changes. The real goal is to remove LRO entirely anyway, which I believe we have ideas on how to do e.g. packet batching techniques.

Jun 17 2015, 10:23 PM

Jun 5 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Yes, lowering the keepalive timer was how I was triggering this more quickly during investigation as with our default it took days at high load to trigger. I've also analysed a core dump with the tp in t_state 0, so it's not specific to TIMEWAIT either. I think I might know what's going on but will hopefully confirm my findings later today.

Jun 5 2015, 12:17 AM

Jun 2 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Randall accidentally misspoke. We're seeing tcp_timer_keep() fire with a tp in TIMEWAIT and t_inpcb==NULL. The rest of the tp looks sane indicating it hasn't been GCed. I'm still trying to understand how this is possible as the code looks correct to me, but I'm continuing to dig...

Jun 2 2015, 2:32 PM

May 27 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.
In D2079#49598, @jch wrote:

Thanks for your detailed comment.

First, you are right INP_INFO lock is not required by in_pcbdrop() but instead by in_pcbfree() (and in_pcbremlists() which is called only from in_pcbfree()). Call stack from tcp_timer_persist() to in_pcbfree() is indeed far from being obvious:

tcp_timer_keep()
  tcp_drop()
    tcp_close()
      sofree()
        tcp_usr_detach() (via pr->pr_usrreqs->pru_detach() in sofree())
          tcp_detach()
            in_pcbfree()
              in_pcbremlists()
May 27 2015, 12:13 AM

May 26 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

I'll prefix this by saying I'm not well versed in the finer points of PCBs and associated locking, and the locking guide in in_pcb.h is somewhat unclear on a few things to my mind. Apologies if this is all super obvious to others.

May 26 2015, 1:40 AM

May 22 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Sorry for coming very late to the party and I realise you've already committed the changes, but thought I'd ask my question here so that all the context relating to this work is in one place...

May 22 2015, 4:13 AM

Mar 18 2015

lstewart accepted D2089: Add flowid to siftr(4).
Mar 18 2015, 11:09 PM
lstewart added a comment to D2089: Add flowid to siftr(4).

I didn't think 0 was a valid flow id, in which case I think it would be useful to document the cases in which a consumer of SIFTR data might legitimately expect to see a flowid of 0 (if it's even possible - it might be that a flowid is always set in which case you don't need to document anything).

Mar 18 2015, 4:07 AM
lstewart accepted D2089: Add flowid to siftr(4).

Looks good, although you might want to document in the man page any caveats related to when the flow id might be 0 (I can't remember if there are any situations in which the flow id is not set?)

Mar 18 2015, 3:12 AM
lstewart requested changes to D2089: Add flowid to siftr(4).

Minor typo needs fixing.

Mar 18 2015, 12:20 AM

Sep 30 2014

lstewart added a comment to D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.

Thanks for the feedback guys. I'll update the patch shortly. Not sure why your comments never came through Gleb, but I definitely didn't see them until your follow up a few days ago.

Sep 30 2014, 8:36 PM

Sep 25 2014

lstewart added a comment to D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.

Last call - I'll commit the patch if I don't hear any objections in the next 48 hours.

Sep 25 2014, 7:24 PM

Sep 2 2014

lstewart added a comment to D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
In D711#6, @glebius wrote:

I find tautology in "socket option TCP_CCALGOOPT". All other socket options do not have abbreviation "OPT" in their names. I'd suggest to name it TCP_CCALGO or somehow other way.

Sep 2 2014, 8:58 AM
lstewart updated D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
Sep 2 2014, 8:31 AM
lstewart updated D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
Sep 2 2014, 8:31 AM
lstewart updated D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
Sep 2 2014, 8:31 AM
lstewart retitled D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support from to mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
Sep 2 2014, 8:29 AM
lstewart added inline comments to D604: DCTCP implementation..
Sep 2 2014, 12:14 AM

Sep 1 2014

lstewart requested changes to D604: DCTCP implementation..

Preliminary review suggests we'll need to involve Midori and/or Lars in order to rework some aspects of the module. I stopped reviewing in depth as some high level fundamental things need to get resolved first.

Sep 1 2014, 8:33 AM

Aug 4 2014

lstewart added a comment to D506: Merge PLPMTU blackhole detection from xnu..

Hi Sean,

Aug 4 2014, 12:48 AM