Aug 25 2016

lstewart committed rS304803: Pass the number of segments coalesced by LRO up the stack by repurposing the.
Aug 25 2016, 1:33 PM
lstewart closed D7564: LRO nsegs by committing rS304803: Pass the number of segments coalesced by LRO up the stack by repurposing the.
Aug 25 2016, 1:33 PM

Aug 24 2016

lstewart accepted D7074: FFL: Change type of tcp_output() len variable.
Aug 24 2016, 3:16 AM
lstewart accepted D7073: FFL: Change type of tcp_output() recwin, sendwin, and adv variables..
Aug 24 2016, 3:15 AM

Aug 18 2016

lstewart retitled D7564: LRO nsegs from to LRO nsegs.
Aug 18 2016, 3:15 PM

May 18 2016

lstewart added a comment to D6442: Make more use of arc4random() in the kernel..
In D6442#136542, @pfg wrote:

Hello;

The improvement is probably not huge but given that we can provide better randomness ... why not? :)

May 18 2016, 11:47 PM

Apr 29 2016

lstewart added a comment to D6105: Add cwnd and ssthresh recommendations to RFC 6675 support. While here, unify everything under one sysctl knob..
In D6105#130864, @hiren wrote:

But why do we need such finely grained control? I don't get it. Either it works or it doesn't. We shouldn't be doing 6675 piecemeal. We should be doing 6675 in full and enabled by default. Providing any level of minutiae beyond enabled/disabled is not only unnecessary but a bad idea IMO.

iirc, @rrs tried the patch and it didn't work well for his workload. I
am just trying to avoid such situations.

Apr 29 2016, 8:18 AM
lstewart added a comment to D6105: Add cwnd and ssthresh recommendations to RFC 6675 support. While here, unify everything under one sysctl knob..

But why do we need such finely grained control? I don't get it. Either it works or it doesn't. We shouldn't be doing 6675 piecemeal. We should be doing 6675 in full and enabled by default. Providing any level of minutiae beyond enabled/disabled is not only unnecessary but a bad idea IMO.

Apr 29 2016, 7:41 AM
lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

We probably can leave the cwnd resetting to later rexmt timeout or possible later fast retransmit (I think fast retransmit could kick in under some cases, if ENOBUFS happened); instead of resetting the cwnd immediately upon ENOBUFS.

Please leave the manipulation of cwnd as is so as to avoid conflating two different changes. The manipulation of cwnd on local drop has nothing to do with the subject of this particular change.

Yep, I am not going to delete the cwnd reset in this patch.

Apr 29 2016, 2:02 AM
lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

We probably can leave the cwnd resetting to later rexmt timeout or possible later fast retransmit (I think fast retransmit could kick in under some cases, if ENOBUFS happened); instead of resetting the cwnd immediately upon ENOBUFS.

Apr 29 2016, 1:31 AM
lstewart added a comment to D6105: Add cwnd and ssthresh recommendations to RFC 6675 support. While here, unify everything under one sysctl knob..

Why isn't there simply a do_rfc6675 knob that supersedes this and the previously committed work?

Apr 29 2016, 1:21 AM

Apr 21 2016

lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.
In D5872#128556, @hiren wrote:

I thought that had been fixed ages ago... oops.

Fixed? i.e. doing something other than setting cwnd to 1 seg?

Apr 21 2016, 5:16 AM
lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

I thought that had been fixed ages ago... oops. It should be calling cc_cong_signal() with a new congestion type. Just leave that line as is for the moment though as Mike says.

Apr 21 2016, 3:10 AM
lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

... but add a macro to check that the rexmit/persist timer is armed if appropriate! Should be added higher up though so that it is checked before all return statements in the vicinity.

Apr 21 2016, 1:53 AM

Apr 19 2016

lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

I agree with Mike's proposal (although FYI, I do believe tcp_output() will send an ACK on RTO). TCP ACKs are intentionally unreliable by design and setting the retransmit timer there is nonsense - either there is a bug elsewhere which needs to be fixed, or it is trying to paper over local ACK loss in a dubious manner. The ENOBUFS case should also become a thing of the past when the back pressure work goes in anyway. For the immediate change, perhaps replacing with a macro that expands to a KASSERT to double check the appropriate conditions for the retransmit or persist timers being set would be a good idea. The macro should be used elsewhere in tcp_output() and tcp_input() as well but that can be done in follow up commit(s).
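
As a rough illustration of the macro idea, here is a userland sketch in which assert() stands in for KASSERT. The struct, the field subset, the names tcp_timers_consistent() and TCP_TIMER_CHECK(), and the invariant itself (unacknowledged data implies that the retransmit or persist timer is armed) are all assumptions made for the example; the comment does not spell out the exact conditions to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a few fields of struct tcpcb. */
struct tcpcb_model {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_max;	/* highest sequence number sent */
	bool rexmt_armed;	/* retransmit timer pending */
	bool persist_armed;	/* persist timer pending */
};

/* Assumed invariant: outstanding data implies one of the timers is armed. */
static bool
tcp_timers_consistent(const struct tcpcb_model *tp)
{
	if (tp->snd_max == tp->snd_una)
		return (true);	/* nothing outstanding, nothing to check */
	return (tp->rexmt_armed || tp->persist_armed);
}

/* Userland analogue of a macro that expands to KASSERT(). */
#define	TCP_TIMER_CHECK(tp)	assert(tcp_timers_consistent(tp))
```

Placing such a check ahead of each return path would flag a missing timer at the point where it goes wrong rather than much later.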

Apr 19 2016, 2:36 AM

Mar 30 2016

lstewart added a comment to D5709: alq(9): Record any write failures and return the last in alq_close(9)..

Apologies for the delay in getting to this, still heads down wrapping up my PhD thesis. The comments from Kib and Mark all appear to have been addressed and the changes look good. I don't really have an opinion on the "report first or most recent" error issue.

Mar 30 2016, 1:09 AM

Feb 10 2016

lstewart added a comment to D5173: Rework initial congestion window calculation..

You're a bit "warmer" with the revised changes but still a fair ways off the mark. Apologies to anyone watching but I'm too time poor at the moment to engage in the proper but protracted back-and-forth public Phabricator discussion to resolve all the problems with this work. Perhaps another brief sync on IRC is in order and you can always summarise the chat logs here as context for others.

Feb 10 2016, 9:08 AM

Feb 2 2016

lstewart added a comment to D5124: Update <cc>_after_idle to take initcwnd_segments into account. .
In D5124#109833, @hiren wrote:

@lstewart I agree and I think it's time to improve the initcwnd handling code. But that'd be a separate commit.

What is your take on the problem at hand? Are you okay with the diffs? I'd like to get this in and possibly MFC for 10.3.

Feb 2 2016, 5:36 AM

Feb 1 2016

lstewart added a comment to D5124: Update <cc>_after_idle to take initcwnd_segments into account. .

Oops, that should of course be 4 segments, not 3 (though recall that we really need an initcwnd_bytes variable as well in order to fully capture the spirit of RFC 3390 and later RFCs - something perhaps you can add as part of this work).

Feb 1 2016, 5:35 AM
lstewart added a comment to D5124: Update <cc>_after_idle to take initcwnd_segments into account. .

I would suggest that the code to handle RFC3390 should be merged with the new code i.e. the net.inet.tcp.rfc3390 sysctl should become a SYSCTL_PROC and simply set V_tcp_initcwnd_segments=3 behind the scenes, and return the evaluated result of "V_tcp_initcwnd_segments==3" as the sysctl value.
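
A userland model of that suggestion, purely as a sketch: the real thing would be a SYSCTL_PROC handler in the kernel, which is not reproduced here. The function name sysctl_rfc3390_model(), the RFC3390_SEGS constant and the plain global standing in for V_tcp_initcwnd_segments are hypothetical. The constant is set to 4, following the correction posted a few minutes later, where the text above writes 3. Writing 0 is left as a no-op because the comment does not say what it should do.

```c
#include <stddef.h>

/* Assumption: segment count the legacy knob maps to (see lead-in). */
#define	RFC3390_SEGS	4

/* Stands in for V_tcp_initcwnd_segments; 10 is an arbitrary starting value. */
static unsigned int tcp_initcwnd_segments = 10;

/*
 * Model of a SYSCTL_PROC-style handler.  A non-NULL newval is a write;
 * the return value is what a read of the knob would report afterwards.
 */
static int
sysctl_rfc3390_model(const int *newval)
{
	/* Enabling the legacy knob just rewrites the segment count. */
	if (newval != NULL && *newval != 0)
		tcp_initcwnd_segments = RFC3390_SEGS;

	/* The reported value is derived, not stored separately. */
	return (tcp_initcwnd_segments == RFC3390_SEGS);
}
```

Because the reported value is computed from the segment count, the two knobs cannot drift out of sync.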

Feb 1 2016, 5:27 AM

Jan 12 2016

lstewart committed rS293713: Remove myself after having forgotten to do so post my previous large commit..
Jan 12 2016, 12:07 AM

Oct 27 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..
In D3858#83098, @hiren wrote:

Lawrence and I had an IRC chat after this and here is the summary:

In D3858#81168, @hiren wrote:

I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded.

There is no actual "max" limit for this. All limits depend on capacity of a link. So Lawrence's point is to live with whatever admin decides to set.
I am okay with that.

Oh, and initcwnd should be in bytes, not MSS.

Lawrence suggested that there is a drawback in the current approach of specifying initcwnd in MSS. If a connection starts out with lower than usual MSS, initcwnd would also come out to be lower than expected.

It should be specified in both number of MSS and bytes. And we should pick whatever is larger. In simplest form, something like:
max(initcwnd_segs * tp->t_maxseg, initcwnd_bytes)
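
Spelled out as a small standalone C function, again only as a sketch: initial_cwnd() is a hypothetical helper, its parameters stand in for the two proposed knobs and for tp->t_maxseg, and the 64-bit widening and 32-bit clamp are additions for the example rather than anything agreed in the discussion.

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose the initial congestion window in bytes as
 * the larger of (segments * MSS) and a byte count.
 */
static uint32_t
initial_cwnd(uint32_t initcwnd_segs, uint32_t maxseg, uint32_t initcwnd_bytes)
{
	/* Widen before multiplying so a large segment count cannot wrap. */
	uint64_t by_segs = (uint64_t)initcwnd_segs * maxseg;
	uint64_t cwnd = by_segs > initcwnd_bytes ? by_segs : initcwnd_bytes;

	/* Clamp to what a 32-bit window field can hold. */
	return (cwnd > UINT32_MAX ? UINT32_MAX : (uint32_t)cwnd);
}
```

With 10 segments and a byte setting of 14600, a connection with a 536-byte MSS still starts at 14600 bytes instead of 5360, which is the drawback of an MSS-only setting described above.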

Oct 27 2015, 12:36 AM

Oct 15 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

I have to be brief and can't respond to each comment as I'm about to hit the road for a wedding 7 hours away, but in short I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded. We can always add the safety belt in later (Robert's and others' concerns seem to have misunderstood the nature of the safety belt proposal w.r.t. sysctl churn but we can revisit another time).

Oct 15 2015, 9:59 PM

Oct 14 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

So in the new world order we have net.inet.tcp.initcwnd=10, no master control switch and net.inet.tcp.experimental.* is no more. I'm an app developer and I come in and setsockopt TCP_INITCWND=100. Are we comfortable with saying the app developer knows best and not giving the sysadmin a mechanism to control? I don't care about people stupidly copying sysctl statements from the Internet because it requires a conscious choice for change and they have admin rights on the system, but are we comfortable with not having a mechanism to empower the sysadmin to control per vnet per socket changes to things which can have a non trivial influence beyond the socket and system?

Oct 14 2015, 10:09 PM
lstewart added a comment to D3858: Add an ability to specify initial congestion window..

For the 'allowed' sysctl, maybe something like kern.random.harvest does:

[snip]

Oct 14 2015, 12:24 AM
lstewart added a comment to D3858: Add an ability to specify initial congestion window..

Let's be careful not to conflate standard/non-standard with our system defaults. For some more context, Andre's intent for the experimental tree was to house things which were published within the IETF as experimental or draft status vs standards track. I argue that non-standard is a more appropriate grouping and in fact a superset of experimental, as it also encompasses anything we (the FreeBSD OS) choose to do which is not related to efforts within the IETF. If we choose to set the system default initial cwnd to 10 in a given branch of FreeBSD (as we have even though it is experimental as far as the IETF is concerned), that is orthogonal to standards compliance and orthogonal to whether an admin chooses to let an app request a different value via the tcp.nonstandard.allowed mechanism, which we are putting in place as a hoop to jump through to hopefully make people think twice before changing it.

Oct 14 2015, 12:12 AM

Oct 13 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

@koobs: The difference between TCP related sysctls and other OS sysctls is that TCP is by and large the product of IETF standards vs a bunch of ad hoc OS developers. By definition, behaviour not covered in any of the IETF standards which relate to TCP is non-standard i.e. a clear indication to the user they are manipulating something which goes against documented wisdom. I am somewhat sympathetic to your argument that such sysctls should perhaps receive no special namespace - I was merely voicing a strong objection and alternative to Andre's "experimental" tree at the time it was floated and subsequently introduced. The experimental tree should absolutely die and "nonstandard" was my 2 second attempt at a sensible name for the tree - all gripes with the naming should be directed my way. The issue here is about giving the sysadmin control over users/apps potentially asking the system to do crazy crap that can harm other network users. My thinking is that tcp.nonstandard.allowed adds an extra level of thought on behalf of the sysadmin before allowing.

Oct 13 2015, 12:52 PM

Aug 25 2015

lstewart added a comment to D2970: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error..

This change seems inadequate given that we would have set TF_SENTFIN and updated snd_max. I haven't followed through all the implications of not reverting those changes, but if we're going to attempt a state rollback we'd better make sure we get it right. I'm also a bit unclear on some details in the original report given that an RTO would reset snd_nxt to snd_una and get us out of any permanent pickle. I'm not a fan of rollbacks in general as they're fragile. What's the use case where a rollback here matters?

Aug 25 2015, 4:34 PM
lstewart added a comment to D2970: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error..

As a side note, I really dislike the conflation of logical sequence space and data accounting used in many places in our stack. It's something that's fairly straightforward to address and I have some proof of concept patches I did a while ago which we should dust off at some point.

Aug 25 2015, 3:59 PM

Jun 17 2015

lstewart added a comment to D1761: Extend LRO support to accumulate more than 65535 bytes.

Ok, but that's anecdotal and gives us reviewers nothing to go on - without any methodology or raw data who knows whether the LRO change is solely responsible for the improvement and if it introduced any undesired side effects. It's also possible that with tuning, the same results could have been obtained without the "jumbo" LRO change.

Jun 17 2015, 11:52 PM
lstewart added a comment to D1761: Extend LRO support to accumulate more than 65535 bytes.

I hope I didn't delete it... from what I could see online, the "Abandon" Phabricator action is the means by which a reviewer indicates they have permanently rejected the patch (as opposed to suggesting changes).

Jun 17 2015, 10:31 PM
lstewart abandoned D1761: Extend LRO support to accumulate more than 65535 bytes.

Just because some hardware is capable of coalescing more than 64k of data doesn't mean we should feel obligated to support the functionality. I'd be curious to understand the anticipated use cases that led to hardware support being added. Without some compelling data to show that this is useful, I think this work should be put on ice until such time as it can be shown to be worthwhile. If such data exists, I'm willing to give it due consideration and revise my judgment, but at this stage I strongly suspect there is no workload we support or will support in the near future that would significantly benefit from raising the LRO chunk size above 64k vs the hacks required to make it work, so that's why I'm voting against this patch outright rather than suggesting changes. The real goal is to remove LRO entirely anyway, which I believe we have ideas on how to do e.g. packet batching techniques.

Jun 17 2015, 10:23 PM

Jun 5 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Yes, lowering the keepalive timer was how I was triggering this more quickly during investigation as with our default it took days at high load to trigger. I've also analysed a core dump with the tp in t_state 0, so it's not specific to TIMEWAIT either. I think I might know what's going on but will hopefully confirm my findings later today.

Jun 5 2015, 12:17 AM

Jun 2 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Randall accidentally misspoke. We're seeing tcp_timer_keep() fire with a tp in TIMEWAIT and t_inpcb==NULL. The rest of the tp looks sane indicating it hasn't been GCed. I'm still trying to understand how this is possible as the code looks correct to me, but I'm continuing to dig...

Jun 2 2015, 2:32 PM

May 27 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.
In D2079#49598, @jch wrote:

Thanks for your detailed comment.

First, you are right INP_INFO lock is not required by in_pcbdrop() but instead by in_pcbfree() (and in_pcbremlists() which is called only from in_pcbfree()). Call stack from tcp_timer_persist() to in_pcbfree() is indeed far from being obvious:

tcp_timer_keep()
tcp_drop()
tcp_close()
sofree()
tcp_usr_detach() (via pr->pr_usrreqs->pru_detach() in sofree())
tcp_detach()
in_pcbfree()
in_pcbremlists()
May 27 2015, 12:13 AM

May 26 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

I'll prefix this by saying I'm not well versed in the finer points of PCBs and associated locking, and the locking guide in in_pcb.h is somewhat unclear on a few things to my mind. Apologies if this is all super obvious to others.

May 26 2015, 1:40 AM

May 22 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Sorry for coming very late to the party and I realise you've already committed the changes, but thought I'd ask my question here so that all the context relating to this work is in one place...

May 22 2015, 4:13 AM

Mar 18 2015

lstewart accepted D2089: Add flowid to siftr(4).
Mar 18 2015, 11:09 PM
lstewart added a comment to D2089: Add flowid to siftr(4).

I didn't think 0 was a valid flow id, in which case I think it would be useful to document the cases in which a consumer of SIFTR data might legitimately expect to see a flowid of 0 (if it's even possible - it might be that a flowid is always set in which case you don't need to document anything).

Mar 18 2015, 4:07 AM
lstewart accepted D2089: Add flowid to siftr(4).

Looks good, although you might want to document in the man page any caveats related to when the flow id might be 0 (I can't remember if there are any situations in which the flow id is not set?)

Mar 18 2015, 3:12 AM
lstewart requested changes to D2089: Add flowid to siftr(4).

Minor typo needs fixing.

Mar 18 2015, 12:20 AM

Sep 30 2014

lstewart added a comment to D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.

Thanks for the feedback guys. I'll update the patch shortly. Not sure why your comments never came through Gleb, but I definitely didn't see them until your follow up a few days ago.

Sep 30 2014, 8:36 PM

Sep 25 2014

lstewart added a comment to D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.

Last call - I'll commit the patch if I don't hear any objections in the next 48 hours.

Sep 25 2014, 7:24 PM

Sep 2 2014

lstewart added a comment to D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
In D711#6, @glebius wrote:

I find tautology in "socket option TCP_CCALGOOPT". All other socket options do not have abbreviation "OPT" in their names. I'd suggest to name it TCP_CCALGO or somehow other way.

Sep 2 2014, 8:58 AM
lstewart updated D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
Sep 2 2014, 8:31 AM
lstewart updated D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
Sep 2 2014, 8:31 AM
lstewart updated D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
Sep 2 2014, 8:31 AM
lstewart retitled D711: mod_cc(9) per-algorithm per-connection {get|set}sockopt support from to mod_cc(9) per-algorithm per-connection {get|set}sockopt support.
Sep 2 2014, 8:29 AM
lstewart added inline comments to D604: DCTCP implementation..
Sep 2 2014, 12:14 AM

Sep 1 2014

lstewart requested changes to D604: DCTCP implementation..

Preliminary review suggests we'll need to involve Midori and/or Lars in order to rework some aspects of the module. I stopped reviewing in depth as some high level fundamental things need to get resolved first.

Sep 1 2014, 8:33 AM

Aug 4 2014

lstewart added a comment to D506: Merge PLPMTU blackhole detection from xnu..

Hi Sean,

Aug 4 2014, 12:48 AM