- User Since
- May 11 2014, 7:08 PM (239 w, 5 d)
Mon, Dec 3
Thanks all for the feedback... I'm a bit rusty on working with ports. Will commit sometime this week with my src commit bit hat on and "review/approved by:" if no further feedback materialises.
Mon, Nov 26
Checked with one of the authors. Preference is to reference the Bitbucket URL in pkg-descr if it can only list a single URL.
Full context diff, shuffle variable order and ditch DISTVERSION in favour of a custom variable to hold the commit hash used in the tarball directory name.
Oct 31 2018
@tuexen Any thoughts on the TFO case?
Oct 18 2018
Jul 21 2018
Jul 20 2018
Jun 8 2018
May 17 2018
May 15 2018
May 14 2018
Final commit candidate, rebased against FreeBSD head r333598, and with M_ZERO removed from malloc call.
May 10 2018
newreno_plugleak_v3.diff with getsockopt(2)/setsockopt(2) updated.
@wollman Many thanks for the historical and standards related context - greatly appreciated. I realised that I probably need to update getsockopt(2)/setsockopt(2) as well...
May 9 2018
Still to be tested, but I think something like this would address the leak and change memory allocation to being conditional on need: newreno_plugleak_v1.diff
Mar 19 2018
Feb 7 2018
@jason_eggnet.com I thought about this some more and while there is no doubt that overflow/underflow due to the function's inputs is possible and needs to be remedied, the cause of overflow in your test case is not in fact the time since congestion being too large, but the bogus value of K, which for a wmax of 1460 bytes (i.e. slightly more than 1 MSS) should be 205 per my quick check:
Oh and @jason_eggnet.com , regarding your test code, I structured things the way they are so that you can simply #include <netinet/cc/cc_cubic.h> into your userspace test.c to get access to the various window calculation inlines rather than copy/pasting them.
This fix doesn't make sense to me. If it has been a long time since the last congestion epoch resulting in an overflow in the calculated congestion window, collapsing the window from <something_large> to 1 segment is a terrible idea. There's also no sound reason that cwnd shouldn't be allowed to shrink below 1 segment, and if that causes problems elsewhere, those places should be fixed.
Aug 24 2017
Aug 18 2017
yoda be gone
Aug 17 2017
Update diff post r322614 commit.
Conrad: would you be willing to sanity check this sbuf change for me as well?
Aug 15 2017
Aug 8 2017
Rebase patch against current head and address review feedback.
The off-by-one error was incorrectly attributed to the condition that checks if vsnprintf() was successful at rendering all of the specified content into the sbuf.
Aug 2 2017
Thanks Tom, looking good. Will wait a few days to see if any further feedback materialises.
Jul 28 2017
Jul 27 2017
This looks good barring a couple of nits as noted. Please update tcp(4) and cc_newreno(4) appropriately.
Jul 17 2017
I'm pretty sure this wouldn't compile as proposed. Please remember to build test patches against FreeBSD's svn head branch, as all work always gets comitted to head first before potentially being backported later.
Jul 11 2017
Following up on some discussion held on the fringe of the recent developer summit at BSDCan, what this work is missing is some accompanying documentation to demonstrate the dynamic behaviour of the proposed implementation over some range of relevant network parameters, along with a critique of expected versus measured/observed behaviour. I'm not asking for a full blown academic paper - a stream of consciousness Google-doc with some data, a few X vs time plots and some comments would suffice. Happy to provide guidance if required - you know where to reach me.
May 24 2017
May 4 2017
Apr 7 2017
Mar 19 2017
Apologies for the delay in getting to this.
Feb 20 2017
Feb 17 2017
Nov 17 2016
@kib: You prefer this?
Nov 16 2016
Nov 10 2016
With this patch applied on top of svn head r308477, I built a custom memstick image, wiped the harddrives and reran my build guide steps including creation of the RAIDZ zpool with "-O checksum=skein" option and can confirm the system boots, so although I did not test without this patch applied, I think it's safe to say the patch fixed things given that there have been no other relevant changes in sys/boot between the revision I tested with previously (r307747) and r308477.
To clarify, I did not test prior to this commit, so am not sure if it ever worked. I merely looked up svn log history on sys/boot and this seemed like a plausible culprit assuming there has been a regression.
Nov 9 2016
Is it possible this broke UEFI booting from RAIDZ pools with checksum=skein? If I omit "-O checksum=skein" from my zpool create step everything works great, but with skein checksums, boot1 sees my pool (prints my pool name after "found the following pools" message) but hangs where it should have run loader.efi and then requires a hard reset.
Oct 12 2016
Aug 25 2016
Aug 24 2016
Aug 18 2016
May 18 2016
Apr 29 2016
But why do we need such finely grained control? I don't get it. Either it works or it doesn't. We shouldn't be doing 6675 piecemeal. We should be doing 6675 in full and enabled by default. Providing any level of minutiae beyond enabled/disabled is not only unnecessary but a bad idea IMO.
Why isn't there simply a do_rfc6675 knob that supersedes this and the previously committed work?
Apr 21 2016
I thought that had been fixed ages ago... oops. It should be calling cc_cong_signal() with a new congestion type. Just leave that line as is for the moment though as Mike says.
... but add a macro to check that the rexmit/persist timer is armed if appropriate! Should be added higher up though so that it is checked before all return statements in the vicinity.
Apr 19 2016
I agree with Mike's proposal (although FYI, I do belive tcp_output() will send an ACK on RTO). TCP ACKs are intentionally unreliable by design and setting the retransmit timer there is nonsense - either there is a bug elsewhere which needs to be fixed, or it is trying to paper over local ACK loss in a dubious manner. The ENOBUFS case should also become a thing of the past when the back pressure work goes in any way. For the immediate change, perhaps replacing with a macro that expands to a KASSERT to double check the appropriate conditions for the retransmit or persist timers being set would be a good idea. The macro should be used elsewhere in tcp_output() and tcp_intput() as well but that can be done in follow up commit(s).
Mar 30 2016
Apologies for the delay in getting to this, still heads down wrapping up my PhD thesis. The comments from Kib and Mark all appear to have been addressed and the changes look good. I don't really have an opinion on the "report first or most recent" error issue.
Feb 10 2016
You're a bit "warmer" with the revised changes but still a fair ways off the mark. Apologies to anyone watching but I'm too time poor at the moment to engage in the proper but protracted back-and-forth public Phabricator discussion to resolve all the problems with this work. Perhaps another brief sync on IRC is in order and you can always summarise the chat logs here as context for others.
Feb 2 2016
Feb 1 2016
Oops, that should of course be 4 segments, not 3 (though recall that we really need an initcwnd_bytes variable as well in order to fully capture the spirit of the RFC 3390 and later RFCs - something perhaps you can add as part of this work).
I would suggest that the code to handle RFC3390 should be merged with the new code i.e. the net.inet.tcp.rfc3390 sysctl should become a SYSCTL_PROC and simply set V_tcp_initcwnd_segments=3 behind the scenes, and return the evaluated result of "V_tcp_initcwnd_segments==3" as the sysctl value.
Jan 12 2016
Oct 27 2015
Oct 15 2015
I have to be brief and can't respond to each comment as I'm about to hit the road for a wedding 7 hours away, but in short I disagree with having a max. If we're going to allow arbitrary settings of initcwnd regardless of having a safety belt to limit whether an unprivileged user can request a different value, it should be unbounded. We can always add the safety belt in later (Robert's and others' concerns seem to have misunderstood the nature of the safety belt proposal w.r.t. sysctl churn but we can revisit another time).
Oct 14 2015
So in the new world order we have net.inet.tcp.initcwnd=10, no master control switch and net.inet.tcp.experimental.* is no more. I'm an app developer and I come in and setsockopt TCP_INITCWND=100. Are we comfortable with saying the app developer knows best and not giving the sysadmin a mechanism to control? I don't care about people stupidly copying sysctl statements from the Internet because it requires a conscious choice for change and they have admin rights on the system, but are we comfortable with not having a mechanism to empower the sysadmin to control per vnet per socket changes to things which can have a non trivial influence beyond the socket and system?
Let's be careful not to conflate standard/non-standard with our system defaults. For some more context, Andre's intent for the experimental tree was to house things which were published within the IETF as experimental or draft status vs standards track. I argue that non-standard is a more appropriate grouping and in fact a superset of experimental, as it also encompasses anything we (the FreeBSD OS) choose to do which is not related to efforts within the IETF. If we choose to set the system default initial cwnd to 10 in a given branch of FreeBSD (as we have even though it is experimental as far as the IETF is concerned), that is orthogonal to standards compliance and orthogonal to whether an admin chooses to let an app request a different value via the tcp.nonstandard.allowed mechanism, which we are putting in place as a hoop to jump through to hopefully make people think twice about before changing.
Oct 13 2015
@koobs: The difference between TCP related sysctls and other OS sysctls is that TCP is by and large the product of IETF standards vs a bunch of ad hoc OS developers. By definition behaviour not covered in any of the IETF standards which relate to TCP are non-standard i.e. a clear indication to the user they are manipulating something which goes against documented wisdom. I am somewhat sympathetic to your argument that such sysctls should perhaps receive no special namespace - I was merely voicing a strong objection and alternative to Andre's "experimental" tree at the time it was floated and subsequently introduced. The experimental tree should absolutely die and "nonstandard" was my 2 second attempt at a sensible name for the tree - all gripes with the naming should be directed my way. The issue here is about giving the sys admin control over users/apps potentially asking the system to do crazy crap that can harm other network users. My thinking is that tcp.nonstandard.allowed adds an extra level of thought on behalf of the sysadmin before allowing.
Aug 25 2015
This change seems inadequate given that we would have set TF_SENTFIN and updated snd_max. I haven't followed through all the implications of not reverting those changes, but if we're going to attempt a state rollback we'd better make sure we get it right. I'm also a bit unclear on some details in the original report given that an RTO would reset snd_nxt to snd_una and get us out of any permanent pickle. I'm not a fan of rollbacks in general as they're fragile. What's the use case where a rollback here matters?
As a side note, I really dislike the conflation of logical sequence space and data accounting used in many places in our stack. It's something that's fairly straight forward to address and I have some proof of concept patches I did a while ago which we should dust off at some point.
Jun 17 2015
Ok, but that's anecdotal and gives us reviewers nothing to go on - without any methodology or raw data who knows whether the LRO change is solely responsible for the improvement and if it introduced any undesired side effects. It's also possible that with tuning, the same results could have been obtained without the "jumbo" LRO change.
I hope I didn't delete it... from what I could see online, the "Abandon" Phabricator action is the means by which a reviewer indicates they have permanently rejected the patch (as opposed to suggesting changes).
Just because some hardware is capable of coalescing more than 64k of data doesn't mean we should feel obligated to support the functionality. I'd be curious to understand the anticipated use cases that led to hardware support being added. Without some compelling data to show that this is useful, I think this work should be put on ice until such time as it can be shown to be worthwhile. If such data exists, I'm willing to give it due consideration and revise my judgment, but at this stage I strongly suspect there is no workload we support or will support in the near future that would significantly benefit from raising the LRO chunk size above 64k vs the hacks required to make it work, so that's why I'm voting against this patch outright rather than suggesting changes. The real goal is to remove LRO entirely anyway, which I believe we have ideas on how to do e.g. packet batching techniques.
Jun 5 2015
Yes, lowering the keepalive timer was how I was triggering this more quickly during investigation as with our default it took days at high load to trigger. I've also analysed a core dump with the tp in t_state 0, so it's not specific to TIMEWAIT either. I think I might know what's going on but will hopefully confirm my findings later today.
Jun 2 2015
Randall accidentally misspoke. We're seeing tcp_timer_keep() fire with a tp in TIMEWAIT and t_inpcb==NULL. The rest of the tp looks sane indicating it hasn't been GCed. I'm still trying to understand how this is possible as the code looks correct to me, but I'm continuing to dig...
May 27 2015
May 26 2015
I'll prefix this by saying I'm not well versed in the finer points of PCBs and associated locking, and the locking guide in in_pcb.h is somewhat unclear on a few things to my mind. Apologies if this is all super obvious to others.