
lstewart (Lawrence Stewart)
User

User Details

User Since
May 11 2014, 7:08 PM (239 w, 5 d)

Recent Activity

Mon, Dec 3

lstewart added a comment to D18336: Update benchmarks/spp to v0.4.

Thanks all for the feedback... I'm a bit rusty on working with ports. Will commit sometime this week with my src commit bit hat on and "review/approved by:" if no further feedback materialises.

Mon, Dec 3, 5:45 PM

Mon, Nov 26

lstewart updated the diff for D18336: Update benchmarks/spp to v0.4.

Checked with one of the authors. Preference is to reference the Bitbucket URL in pkg-descr if it can only list a single URL.

Mon, Nov 26, 9:29 AM
lstewart updated the diff for D18336: Update benchmarks/spp to v0.4.

Full context diff, shuffle variable order and ditch DISTVERSION in favour of a custom variable to hold the commit hash used in the tarball directory name.

Mon, Nov 26, 8:41 AM
lstewart created D18336: Update benchmarks/spp to v0.4.
Mon, Nov 26, 7:53 AM

Oct 31 2018

lstewart added a comment to D17595: Fix handling of RST segments in SYN-RCVD state via the syn cache code path.

@tuexen Any thoughts on the TFO case?

Oct 31 2018, 2:52 AM

Oct 18 2018

lstewart added inline comments to D17595: Fix handling of RST segments in SYN-RCVD state via the syn cache code path.
Oct 18 2018, 8:45 PM

Jul 21 2018

lstewart accepted D16282: NULL out cc_data in pluggable TCP {cc}_cb_destroy.

I would prefer to leave the check in cc_cdg until we run to ground why the callback was getting hit twice (in the case someone has cc_cdg as default).

Jul 21 2018, 3:40 AM
lstewart added a comment to D16282: NULL out cc_data in pluggable TCP {cc}_cb_destroy.

@lstewart we debated who should NULL before submitting this, I think it can be moved out of the destructors to the places I put the KASSERTs but I'll have to double check everything. One issue I foresaw NULLing in the framework is that we have to trust that the cc_cb_destroy destructor did indeed free() and can't catch mistakes with a KASSERT. But writing a new CC isn't exactly common and will probably be cribbed from an existing one so maybe I was over thinking that angle.

Jul 21 2018, 2:52 AM

Jul 20 2018

lstewart added a comment to D16282: NULL out cc_data in pluggable TCP {cc}_cb_destroy.

If the cb_destroy virtual function shouldn't null the pointer, I assert that it shouldn't free it either.

There should be another cc_assert wrapper function that checks for not null, frees then nulls, after calling the virtual function.
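The wrapper suggested in this comment could look roughly like the userspace sketch below. It is an illustration under stated assumptions only: `struct cc_var`, `struct cc_algo`, `cc_cb_destroy_wrapper()` and `example_cb_destroy()` are simplified, hypothetical stand-ins rather than the kernel's definitions, and libc `free()` stands in for the kernel allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, pared-down per-connection congestion control state. */
struct cc_var {
	void *cc_data;	/* private data allocated on behalf of the CC module */
};

/* Hypothetical CC module descriptor with a single "virtual" hook. */
struct cc_algo {
	/* May tidy up what hangs off cc_data, but must not free cc_data. */
	void (*cb_destroy)(struct cc_var *ccv);
};

/* Example hook used only to show the calling convention. */
static int example_destroy_calls;

static void
example_cb_destroy(struct cc_var *ccv)
{
	(void)ccv;
	example_destroy_calls++;
}

/*
 * Framework-side wrapper: call the module's hook first, then check for
 * non-NULL, free and NULL the pointer in one place. Calling it twice is
 * harmless because the second call sees cc_data == NULL.
 */
static void
cc_cb_destroy_wrapper(const struct cc_algo *algo, struct cc_var *ccv)
{
	if (algo->cb_destroy != NULL)
		algo->cb_destroy(ccv);
	if (ccv->cc_data != NULL) {
		free(ccv->cc_data);
		ccv->cc_data = NULL;
	}
}
```

With ownership of the pointer kept in the wrapper, individual modules never free or NULL `cc_data` themselves, which is the symmetry the comment argues for.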

Jul 20 2018, 3:48 AM
lstewart added inline comments to D16282: NULL out cc_data in pluggable TCP {cc}_cb_destroy.
Jul 20 2018, 3:29 AM
lstewart added a comment to D16282: NULL out cc_data in pluggable TCP {cc}_cb_destroy.

When ABE was added (rS331214) to New Reno and the leak fixed (rS333699), some assumptions about its state as the default and always-ready cc seem to have changed; notably, it now has a constructor and destructor (newreno_cb_destroy) for per-connection state.

Jul 20 2018, 3:14 AM

Jun 8 2018

lstewart added a member for transport: lstewart.
Jun 8 2018, 6:28 PM

May 17 2018

lstewart committed rS333699: Plug a memory leak and potential NULL-pointer dereference introduced in r331214..
May 17 2018, 2:46 AM
lstewart closed D15358: Address memory leak in new reno cc module.
May 17 2018, 2:46 AM

May 15 2018

lstewart added inline comments to D15358: Address memory leak in new reno cc module.
May 15 2018, 1:45 AM

May 14 2018

lstewart updated the diff for D15358: Address memory leak in new reno cc module.

Final commit candidate, rebased against FreeBSD head r333598, and with M_ZERO removed from malloc call.

May 14 2018, 2:24 AM

May 10 2018

lstewart updated the diff for D15358: Address memory leak in new reno cc module.
May 10 2018, 10:55 AM
lstewart commandeered D15358: Address memory leak in new reno cc module.

newreno_plugleak_v3.diff with getsockopt(2)/setsockopt(2) updated.

May 10 2018, 10:34 AM
lstewart added a comment to D15358: Address memory leak in new reno cc module.

@wollman Many thanks for the historical and standards related context - greatly appreciated. I realised that I probably need to update getsockopt(2)/setsockopt(2) as well...

May 10 2018, 10:00 AM
lstewart updated subscribers of D15358: Address memory leak in new reno cc module.
In D15358#323834, @thj wrote:

Still to be tested, but I think something like this would address the leak and change memory allocation to being conditional on need: newreno_plugleak_v1.diff

This reads okay.

I ran through a loop with netcat and it doesn't leak (of course it wouldn't!) and tested setting the ABE beta values via setsockopt with a modified netcat (https://people.freebsd.org/~thj/diffs/ncabe.diff). This also doesn't leak.

May 10 2018, 1:41 AM

May 9 2018

lstewart added a comment to D15358: Address memory leak in new reno cc module.

Still to be tested, but I think something like this would address the leak and change memory allocation to being conditional on need: newreno_plugleak_v1.diff

May 9 2018, 3:43 AM

Mar 19 2018

lstewart closed D11616: Add support for TCP ABE draft-khademi-tcpm-alternativebackoff-ecn.
Mar 19 2018, 4:37 PM
lstewart committed rS331214: Add support for the experimental Internet-Draft "TCP Alternative Backoff with.
Mar 19 2018, 4:37 PM

Feb 7 2018

lstewart added a comment to D14141: fix underflow for cubic_cwnd().

@jason_eggnet.com I thought about this some more and while there is no doubt that overflow/underflow due to the function's inputs is possible and needs to be remedied, the cause of overflow in your test case is not in fact the time since congestion being too large, but the bogus value of K, which for a wmax of 1460 bytes (i.e. slightly more than 1 MSS) should be 205 per my quick check:

Feb 7 2018, 3:14 AM
lstewart added a comment to D14141: fix underflow for cubic_cwnd().

Oh and @jason_eggnet.com , regarding your test code, I structured things the way they are so that you can simply #include <netinet/cc/cc_cubic.h> into your userspace test.c to get access to the various window calculation inlines rather than copy/pasting them.

Feb 7 2018, 2:21 AM
lstewart added inline comments to D14141: fix underflow for cubic_cwnd().
Feb 7 2018, 2:13 AM
lstewart added a comment to D14141: fix underflow for cubic_cwnd().

This fix doesn't make sense to me. If it has been a long time since the last congestion epoch resulting in an overflow in the calculated congestion window, collapsing the window from <something_large> to 1 segment is a terrible idea. There's also no sound reason that cwnd shouldn't be allowed to shrink below 1 segment, and if that causes problems elsewhere, those places should be fixed.

Feb 7 2018, 2:10 AM

Aug 24 2017

lstewart committed rS322831: Only emit the trailing new line added in r322613 when not operating in quiet.
Aug 24 2017, 8:20 AM

Aug 18 2017

lstewart committed rS322643: An off-by-one error exists in sbuf_vprintf()'s use of SBUF_HASROOM() when an.
Aug 18 2017, 2:06 AM
lstewart closed D8535: Fix an off-by-one error in libsbuf's userland sbuf_vprintf() function. by committing rS322643: An off-by-one error exists in sbuf_vprintf()'s use of SBUF_HASROOM() when an.
Aug 18 2017, 2:06 AM
lstewart updated the diff for D8535: Fix an off-by-one error in libsbuf's userland sbuf_vprintf() function..

yoda be gone

Aug 18 2017, 12:31 AM

Aug 17 2017

lstewart added a comment to D8535: Fix an off-by-one error in libsbuf's userland sbuf_vprintf() function..

@cem Thanks!

Aug 17 2017, 10:53 PM
lstewart updated the diff for D8535: Fix an off-by-one error in libsbuf's userland sbuf_vprintf() function..

Update diff post r322614 commit.

Aug 17 2017, 8:06 AM
lstewart added a reviewer for D8535: Fix an off-by-one error in libsbuf's userland sbuf_vprintf() function.: cem.

Conrad: would you be willing to sanity check this sbuf change for me as well?

Aug 17 2017, 7:32 AM
lstewart committed rS322614: Implement simple record boundary tracking in sbuf(9) to avoid record splitting.
Aug 17 2017, 7:20 AM
lstewart closed D8536: Implement simple record boundary tracking in sbuf(9) by committing rS322614: Implement simple record boundary tracking in sbuf(9) to avoid record splitting.
Aug 17 2017, 7:20 AM
lstewart committed rS322613: The r322210 change to pgrep's PID delimiting behaviour causes pgrep's default.
Aug 17 2017, 6:36 AM
lstewart added a comment to D8536: Implement simple record boundary tracking in sbuf(9).
In D8536#249937, @cem wrote:

Looks fine. I think it would be good to note in the commit message that this is only for top-level sections and does not nest. It's obvious when you think about it, but, I like clarity.

Aug 17 2017, 1:10 AM

Aug 15 2017

lstewart added a reviewer for D8536: Implement simple record boundary tracking in sbuf(9): cem.
Aug 15 2017, 5:15 AM

Aug 8 2017

lstewart updated the diff for D8536: Implement simple record boundary tracking in sbuf(9).

Rebase patch against current head and address review feedback.

Aug 8 2017, 12:59 AM
lstewart updated the diff for D8535: Fix an off-by-one error in libsbuf's userland sbuf_vprintf() function..

The off-by-one error was incorrectly attributed to the condition that checks if vsnprintf() was successful at rendering all of the specified content into the sbuf.

Aug 8 2017, 12:45 AM
lstewart committed rS322210: pgrep naively appends the delimiter to all PIDs including the last.
Aug 8 2017, 12:31 AM
lstewart closed D8537: Suppress emission of delimiter after final PID when using pgrep's -d switch by committing rS322210: pgrep naively appends the delimiter to all PIDs including the last.
Aug 8 2017, 12:31 AM
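
The behaviour described in rS322210 (no delimiter after the final PID) can be illustrated with a small sketch. This is not pgrep's code: `join_pids()` is a hypothetical helper that emits the delimiter before every item except the first, which is one common way to avoid a trailing delimiter.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the PIDs into buf separated by delim, with no
 * delimiter after the last one. Returns the number of characters written,
 * not counting the terminating NUL. Stops early if buf is too small.
 */
static size_t
join_pids(char *buf, size_t buflen, const long *pids, size_t npids,
    const char *delim)
{
	size_t used = 0;

	if (buflen == 0)
		return (0);
	buf[0] = '\0';
	for (size_t i = 0; i < npids; i++) {
		/* The delimiter precedes every item except the first. */
		int n = snprintf(buf + used, buflen - used, "%s%ld",
		    i == 0 ? "" : delim, pids[i]);
		if (n < 0 || (size_t)n >= buflen - used) {
			buf[used] = '\0';	/* drop the partial item */
			break;
		}
		used += (size_t)n;
	}
	return (used);
}
```

Placing the delimiter in front of each item after the first means the loop needs no special case for the last element.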
lstewart updated the summary of D8535: Fix an off-by-one error in libsbuf's userland sbuf_vprintf() function..
Aug 8 2017, 12:21 AM

Aug 2 2017

lstewart accepted D11616: Add support for TCP ABE draft-khademi-tcpm-alternativebackoff-ecn.

Thanks Tom, looking good. Will wait a few days to see if any further feedback materialises.

Aug 2 2017, 2:13 PM

Jul 28 2017

lstewart added inline comments to D11616: Add support for TCP ABE draft-khademi-tcpm-alternativebackoff-ecn.
Jul 28 2017, 1:48 AM

Jul 27 2017

lstewart added a comment to D11616: Add support for TCP ABE draft-khademi-tcpm-alternativebackoff-ecn.

This looks good barring a couple of nits as noted. Please update tcp(4) and cc_newreno(4) appropriately.

Jul 27 2017, 2:56 AM

Jul 17 2017

lstewart added a comment to D11616: Add support for TCP ABE draft-khademi-tcpm-alternativebackoff-ecn.

I'm pretty sure this wouldn't compile as proposed. Please remember to build test patches against FreeBSD's svn head branch, as all work always gets committed to head first before potentially being backported later.

Jul 17 2017, 2:22 PM

Jul 11 2017

lstewart added a comment to D10894: First cut implementation of hybrid slow start.

Following up on some discussion held on the fringe of the recent developer summit at BSDCan, what this work is missing is some accompanying documentation to demonstrate the dynamic behaviour of the proposed implementation over some range of relevant network parameters, along with a critique of expected versus measured/observed behaviour. I'm not asking for a full blown academic paper - a stream of consciousness Google-doc with some data, a few X vs time plots and some comments would suffice. Happy to provide guidance if required - you know where to reach me.

Jul 11 2017, 6:16 AM

May 24 2017

lstewart accepted D10556: Update cubic constants.
May 24 2017, 12:25 AM · network

May 4 2017

lstewart requested changes to D10556: Update cubic constants.
May 4 2017, 5:06 AM · network

Apr 7 2017

lstewart added inline comments to D9668: Support estimated RTT for receive buffer auto resizing.
Apr 7 2017, 7:27 PM

Mar 19 2017

lstewart added a comment to D9668: Support estimated RTT for receive buffer auto resizing.

Apologies for the delay in getting to this.

Mar 19 2017, 9:33 AM

Feb 20 2017

lstewart added a reviewer for D9668: Support estimated RTT for receive buffer auto resizing: lstewart.
Feb 20 2017, 9:10 AM

Feb 17 2017

lstewart added inline comments to D9519: Don't zero out srtt after excess retransmits.
Feb 17 2017, 11:41 PM

Nov 17 2016

lstewart updated the diff for D8537: Suppress emission of delimiter after final PID when using pgrep's -d switch.

@kib: You prefer this?

Nov 17 2016, 4:48 AM

Nov 16 2016

lstewart updated D8536: Implement simple record boundary tracking in sbuf(9).
Nov 16 2016, 9:25 AM
lstewart retitled D8537 (previously untitled) to Suppress emission of delimiter after final PID when using pgrep's -d switch.
Nov 16 2016, 9:20 AM
lstewart retitled D8536 (previously untitled) to Implement simple record boundary tracking in sbuf(9).
Nov 16 2016, 8:59 AM
lstewart retitled D8535 (previously untitled) to Fix an off-by-one error in libsbuf's userland sbuf_vprintf() function.
Nov 16 2016, 8:42 AM

Nov 10 2016

lstewart added a comment to D8487: loader: zfs toplevel vdev must have spa set..

With this patch applied on top of svn head r308477, I built a custom memstick image, wiped the harddrives and reran my build guide steps including creation of the RAIDZ zpool with "-O checksum=skein" option and can confirm the system boots, so although I did not test without this patch applied, I think it's safe to say the patch fixed things given that there have been no other relevant changes in sys/boot between the revision I tested with previously (r307747) and r308477.

Nov 10 2016, 3:42 AM
lstewart added a comment to D7826: Disable loop unrolling in skein in sys/boot.

To clarify, I did not test prior to this commit, so am not sure if it ever worked. I merely looked up svn log history on sys/boot and this seemed like a plausible culprit assuming there has been a regression.

Nov 10 2016, 12:15 AM

Nov 9 2016

lstewart added a comment to D7826: Disable loop unrolling in skein in sys/boot.

Is it possible this broke UEFI booting from RAIDZ pools with checksum=skein? If I omit "-O checksum=skein" from my zpool create step everything works great, but with skein checksums, boot1 sees my pool (prints my pool name after "found the following pools" message) but hangs where it should have run loader.efi and then requires a hard reset.

Nov 9 2016, 2:38 PM

Oct 12 2016

lstewart accepted D8185: Add compile-time option to activate hhook(9) framework for TCP.
Oct 12 2016, 1:41 AM

Aug 25 2016

lstewart committed rS304803: Pass the number of segments coalesced by LRO up the stack by repurposing the.
Aug 25 2016, 1:33 PM
lstewart closed D7564: LRO nsegs by committing rS304803: Pass the number of segments coalesced by LRO up the stack by repurposing the.
Aug 25 2016, 1:33 PM

Aug 24 2016

lstewart accepted D7074: FFL: Change type of tcp_output() len variable.
Aug 24 2016, 3:16 AM
lstewart accepted D7073: FFL: Change type of tcp_output() recwin, sendwin, and adv variables..
Aug 24 2016, 3:15 AM

Aug 18 2016

lstewart retitled D7564 (previously untitled) to LRO nsegs.
Aug 18 2016, 3:15 PM

May 18 2016

lstewart added a comment to D6442: Make more use of arc4random() in the kernel..
In D6442#136542, @pfg wrote:

Hello;

The improvement is probably not huge but given that we can provide better randomness ... why not? :)

May 18 2016, 11:47 PM

Apr 29 2016

lstewart added a comment to D6105: Add cwnd and ssthresh recommendations to RFC 6675 support. While here, unify everything under one sysctl knob..
In D6105#130864, @hiren wrote:

But why do we need such finely grained control? I don't get it. Either it works or it doesn't. We shouldn't be doing 6675 piecemeal. We should be doing 6675 in full and enabled by default. Providing any level of minutiae beyond enabled/disabled is not only unnecessary but a bad idea IMO.

iirc, @rrs tried the patch and it didn't work well for his workload. I
am just trying to avoid such situations.

Apr 29 2016, 8:18 AM
lstewart added a comment to D6105: Add cwnd and ssthresh recommendations to RFC 6675 support. While here, unify everything under one sysctl knob..

But why do we need such finely grained control? I don't get it. Either it works or it doesn't. We shouldn't be doing 6675 piecemeal. We should be doing 6675 in full and enabled by default. Providing any level of minutiae beyond enabled/disabled is not only unnecessary but a bad idea IMO.

Apr 29 2016, 7:41 AM
lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

We probably can leave the cwnd resetting to later rexmt timeout or possible later fast retransmit (I think fast retransmit could kick in under some cases, if ENOBUFS happened); instead of resetting the cwnd immediately upon ENOBUFS.

Please leave the manipulation of cwnd as is so as to avoid conflating two different changes. The manipulation of cwnd on local drop has nothing to do with the subject of this particular change.

Yep, I am not going to delete the cwnd reset in this patch.

Apr 29 2016, 2:02 AM
lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

We probably can leave the cwnd resetting to later rexmt timeout or possible later fast retransmit (I think fast retransmit could kick in under some cases, if ENOBUFS happened); instead of resetting the cwnd immediately upon ENOBUFS.

Apr 29 2016, 1:31 AM
lstewart added a comment to D6105: Add cwnd and ssthresh recommendations to RFC 6675 support. While here, unify everything under one sysctl knob..

Why isn't there simply a do_rfc6675 knob that supersedes this and the previously committed work?

Apr 29 2016, 1:21 AM

Apr 21 2016

lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.
In D5872#128556, @hiren wrote:

I thought that had been fixed ages ago... oops.

Fixed? i.e. doing something other than setting cwnd to 1 seg?

Apr 21 2016, 5:16 AM
lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

I thought that had been fixed ages ago... oops. It should be calling cc_cong_signal() with a new congestion type. Just leave that line as is for the moment though as Mike says.

Apr 21 2016, 3:10 AM
lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

... but add a macro to check that the rexmit/persist timer is armed if appropriate! Should be added higher up though so that it is checked before all return statements in the vicinity.

Apr 21 2016, 1:53 AM

Apr 19 2016

lstewart added a comment to D5872: tcp: Don't prematurely drop receiving-only connections.

I agree with Mike's proposal (although FYI, I do believe tcp_output() will send an ACK on RTO). TCP ACKs are intentionally unreliable by design and setting the retransmit timer there is nonsense - either there is a bug elsewhere which needs to be fixed, or it is trying to paper over local ACK loss in a dubious manner. The ENOBUFS case should also become a thing of the past when the back pressure work goes in anyway. For the immediate change, perhaps replacing with a macro that expands to a KASSERT to double check the appropriate conditions for the retransmit or persist timers being set would be a good idea. The macro should be used elsewhere in tcp_output() and tcp_input() as well but that can be done in follow up commit(s).
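
A userspace sketch of the kind of assertion macro proposed here might look as follows. The comment does not spell out the "appropriate conditions", so the invariant used below (unacknowledged data implies that the retransmit or persist timer is armed) is an assumption, and `struct conn`, `conn_timers_ok()` and `CONN_ASSERT_TIMERS()` are hypothetical names, with `assert()` standing in for the kernel's KASSERT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, pared-down connection state; not the kernel's struct tcpcb. */
struct conn {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_max;	/* highest sequence number sent so far */
	bool rexmt_armed;	/* retransmit timer currently running */
	bool persist_armed;	/* persist timer currently running */
};

/* Assumed invariant: data in flight requires one of the two timers. */
static bool
conn_timers_ok(const struct conn *c)
{
	if (c->snd_max == c->snd_una)
		return (true);	/* nothing outstanding, no timer needed */
	return (c->rexmt_armed || c->persist_armed);
}

/* Checked on every exit path of the (hypothetical) output routine. */
#define	CONN_ASSERT_TIMERS(c)	assert(conn_timers_ok(c))
```

Wrapping the check in a macro lets it be dropped in before each return statement, and compiled out together with other assertions in non-debug builds.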

Apr 19 2016, 2:36 AM

Mar 30 2016

lstewart added a comment to D5709: alq(9): Record any write failures and return the last in alq_close(9)..

Apologies for the delay in getting to this, still heads down wrapping up my PhD thesis. The comments from Kib and Mark all appear to have been addressed and the changes look good. I don't really have an opinion on the "report first or most recent" error issue.

Mar 30 2016, 1:09 AM

Feb 10 2016

lstewart added a comment to D5173: Rework initial congestion window calculation..

You're a bit "warmer" with the revised changes but still a fair ways off the mark. Apologies to anyone watching but I'm too time poor at the moment to engage in the proper but protracted back-and-forth public Phabricator discussion to resolve all the problems with this work. Perhaps another brief sync on IRC is in order and you can always summarise the chat logs here as context for others.

Feb 10 2016, 9:08 AM

Feb 2 2016

lstewart added a comment to D5124: Update <cc>_after_idle to take initcwnd_segments into account. .
In D5124#109833, @hiren wrote:

@lstewart I agree and I think it's time to improve the initcwnd handling code. But that'd be a separate commit.

What is your take on the problem at hand? Are you okay with the diffs? I'd like to get this in and possibly MFC for 10.3.

Feb 2 2016, 5:36 AM

Feb 1 2016

lstewart added a comment to D5124: Update <cc>_after_idle to take initcwnd_segments into account. .

Oops, that should of course be 4 segments, not 3 (though recall that we really need an initcwnd_bytes variable as well in order to fully capture the spirit of RFC 3390 and later RFCs - something perhaps you can add as part of this work).

Feb 1 2016, 5:35 AM
lstewart added a comment to D5124: Update <cc>_after_idle to take initcwnd_segments into account. .

I would suggest that the code to handle RFC3390 should be merged with the new code i.e. the net.inet.tcp.rfc3390 sysctl should become a SYSCTL_PROC and simply set V_tcp_initcwnd_segments=3 behind the scenes, and return the evaluated result of "V_tcp_initcwnd_segments==3" as the sysctl value.

Feb 1 2016, 5:27 AM

Jan 12 2016

lstewart committed rS293713: Remove myself after having forgotten to do so post my previous large commit..
Jan 12 2016, 12:07 AM

Oct 27 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..
In D3858#83098, @hiren wrote:

Lawrence and I had an IRC chat after this and here is the summary:

In D3858#81168, @hiren wrote:

I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded.

There is no actual "max" limit for this. All limits depend on capacity of a link. So Lawrence's point is to live with whatever admin decides to set.
I am okay with that.

Oh, and initcwnd should be in bytes, not MSS.

Lawrence suggested that there is a drawback in the current approach of specifying initcwnd in MSS. If a connection starts out with lower than usual MSS, initcwnd would also come out to be lower than expected.

It should be specified in both number of MSS and bytes. And we should pick whatever is larger. In simplest form, something like:
max(initcwnd_segs * tp->t_maxseg, initcwnd_bytes)
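
As a hedged illustration of that expression, the sketch below computes the larger of the segment-based and byte-based settings in userspace C. `initial_cwnd()` and its parameter names are hypothetical (`maxseg` stands in for `tp->t_maxseg`), and the 64-bit intermediate with a clamp to `UINT32_MAX` is an added assumption to keep the multiplication from wrapping.

```c
#include <stdint.h>

/*
 * Return the initial congestion window in bytes: whichever is larger of
 * (segments * MSS) and the byte-based setting.
 */
static uint32_t
initial_cwnd(uint32_t initcwnd_segs, uint32_t maxseg, uint32_t initcwnd_bytes)
{
	/* Widen before multiplying so the product cannot wrap. */
	uint64_t by_segs = (uint64_t)initcwnd_segs * maxseg;
	uint64_t cwnd = by_segs > initcwnd_bytes ? by_segs : initcwnd_bytes;

	return (cwnd > UINT32_MAX ? UINT32_MAX : (uint32_t)cwnd);
}
```

For example, with 10 segments and a byte setting of 14600, a connection with an MSS of 536 still starts at 14600 bytes rather than 5360, which addresses the low-MSS drawback described above.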

Oct 27 2015, 12:36 AM

Oct 15 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

I have to be brief and can't respond to each comment as I'm about to hit the road for a wedding 7 hours away, but in short I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded. We can always add the safety belt in later (Robert's and others' concerns seem to have misunderstood the nature of the safety belt proposal w.r.t. sysctl churn but we can revisit another time).

Oct 15 2015, 9:59 PM

Oct 14 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

So in the new world order we have net.inet.tcp.initcwnd=10, no master control switch and net.inet.tcp.experimental.* is no more. I'm an app developer and I come in and setsockopt TCP_INITCWND=100. Are we comfortable with saying the app developer knows best and not giving the sysadmin a mechanism to control? I don't care about people stupidly copying sysctl statements from the Internet because it requires a conscious choice for change and they have admin rights on the system, but are we comfortable with not having a mechanism to empower the sysadmin to control per vnet per socket changes to things which can have a non trivial influence beyond the socket and system?

Oct 14 2015, 10:09 PM
lstewart added a comment to D3858: Add an ability to specify initial congestion window..

For the 'allowed' sysctl, maybe something like kern.random.harvest does:

[snip]

Oct 14 2015, 12:24 AM
lstewart added a comment to D3858: Add an ability to specify initial congestion window..

Let's be careful not to conflate standard/non-standard with our system defaults. For some more context, Andre's intent for the experimental tree was to house things which were published within the IETF as experimental or draft status vs standards track. I argue that non-standard is a more appropriate grouping and in fact a superset of experimental, as it also encompasses anything we (the FreeBSD OS) choose to do which is not related to efforts within the IETF. If we choose to set the system default initial cwnd to 10 in a given branch of FreeBSD (as we have even though it is experimental as far as the IETF is concerned), that is orthogonal to standards compliance and orthogonal to whether an admin chooses to let an app request a different value via the tcp.nonstandard.allowed mechanism, which we are putting in place as a hoop to jump through to hopefully make people think twice about before changing.

Oct 14 2015, 12:12 AM

Oct 13 2015

lstewart added a comment to D3858: Add an ability to specify initial congestion window..

@koobs: The difference between TCP related sysctls and other OS sysctls is that TCP is by and large the product of IETF standards vs a bunch of ad hoc OS developers. By definition, behaviour not covered in any of the IETF standards which relate to TCP is non-standard i.e. a clear indication to the user they are manipulating something which goes against documented wisdom. I am somewhat sympathetic to your argument that such sysctls should perhaps receive no special namespace - I was merely voicing a strong objection and alternative to Andre's "experimental" tree at the time it was floated and subsequently introduced. The experimental tree should absolutely die and "nonstandard" was my 2 second attempt at a sensible name for the tree - all gripes with the naming should be directed my way. The issue here is about giving the sys admin control over users/apps potentially asking the system to do crazy crap that can harm other network users. My thinking is that tcp.nonstandard.allowed adds an extra level of thought on behalf of the sysadmin before allowing.

Oct 13 2015, 12:52 PM

Aug 25 2015

lstewart added a comment to D2970: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error..

This change seems inadequate given that we would have set TF_SENTFIN and updated snd_max. I haven't followed through all the implications of not reverting those changes, but if we're going to attempt a state rollback we'd better make sure we get it right. I'm also a bit unclear on some details in the original report given that an RTO would reset snd_nxt to snd_una and get us out of any permanent pickle. I'm not a fan of rollbacks in general as they're fragile. What's the use case where a rollback here matters?

Aug 25 2015, 4:34 PM
lstewart added a comment to D2970: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error..

As a side note, I really dislike the conflation of logical sequence space and data accounting used in many places in our stack. It's something that's fairly straight forward to address and I have some proof of concept patches I did a while ago which we should dust off at some point.

Aug 25 2015, 3:59 PM

Jun 17 2015

lstewart added a comment to D1761: Extend LRO support to accumulate more than 65535 bytes.

Ok, but that's anecdotal and gives us reviewers nothing to go on - without any methodology or raw data who knows whether the LRO change is solely responsible for the improvement and if it introduced any undesired side effects. It's also possible that with tuning, the same results could have been obtained without the "jumbo" LRO change.

Jun 17 2015, 11:52 PM
lstewart added a comment to D1761: Extend LRO support to accumulate more than 65535 bytes.

I hope I didn't delete it... from what I could see online, the "Abandon" Phabricator action is the means by which a reviewer indicates they have permanently rejected the patch (as opposed to suggesting changes).

Jun 17 2015, 10:31 PM
lstewart abandoned D1761: Extend LRO support to accumulate more than 65535 bytes.

Just because some hardware is capable of coalescing more than 64k of data doesn't mean we should feel obligated to support the functionality. I'd be curious to understand the anticipated use cases that led to hardware support being added. Without some compelling data to show that this is useful, I think this work should be put on ice until such time as it can be shown to be worthwhile. If such data exists, I'm willing to give it due consideration and revise my judgment, but at this stage I strongly suspect there is no workload we support or will support in the near future that would significantly benefit from raising the LRO chunk size above 64k vs the hacks required to make it work, so that's why I'm voting against this patch outright rather than suggesting changes. The real goal is to remove LRO entirely anyway, which I believe we have ideas on how to do e.g. packet batching techniques.

Jun 17 2015, 10:23 PM

Jun 5 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Yes, lowering the keepalive timer was how I was triggering this more quickly during investigation as with our default it took days at high load to trigger. I've also analysed a core dump with the tp in t_state 0, so it's not specific to TIMEWAIT either. I think I might know what's going on but will hopefully confirm my findings later today.

Jun 5 2015, 12:17 AM

Jun 2 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Randall accidentally misspoke. We're seeing tcp_timer_keep() fire with a tp in TIMEWAIT and t_inpcb==NULL. The rest of the tp looks sane indicating it hasn't been GCed. I'm still trying to understand how this is possible as the code looks correct to me, but I'm continuing to dig...

Jun 2 2015, 2:32 PM

May 27 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.
In D2079#49598, @jch wrote:
Thanks for your detailed comment.
 
First, you are right INP_INFO lock is not required by `in_pcbdrop()` but instead by `in_pcbfree()` (and `in_pcbremlists()` which is called only from `in_pcbfree()`). Call stack from `tcp_timer_persist()` to `in_pcbfree()` is indeed far from being obvious:
tcp_timer_keep()
tcp_drop()
tcp_close()
sofree()
tcp_usr_detach() (via pr->pr_usrreqs->pru_detach() in sofree())
tcp_detach()
in_pcbfree()
in_pcbremlists()
May 27 2015, 12:13 AM

May 26 2015

lstewart added a comment to D2079: Fix TCP timers use-after-free old race conditions.

I'll prefix this by saying I'm not well versed in the finer points of PCBs and associated locking, and the locking guide in in_pcb.h is somewhat unclear on a few things to my mind. Apologies if this is all super obvious to others.

May 26 2015, 1:40 AM