As I wrote before, this race condition was driven by an issue in callout_stop() that is fixed in rS286880: callout_stop() should return 0 (fail) when the callout is currently (D3078). I will revert rS284245 as it is no longer needed.

Aug 18 2015, 10:21 AM

jch committed rS286880: callout_stop() should return 0 (fail) when the callout is currently.

callout_stop() should return 0 (fail) when the callout is currently

Aug 18 2015, 10:15 AM

jch closed D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable. by committing rS286880: callout_stop() should return 0 (fail) when the callout is currently.

Aug 18 2015, 10:15 AM

jch updated the diff for D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

Rebase on top of r286874.

Updating D3078: callout_stop() should return 0 when the callout is currently being serviced and

indeed unstoppable.

Aug 18 2015, 9:04 AM

jch committed rS286873: Make clear that TIME_WAIT timeout expiration is managed solely by.

Make clear that TIME_WAIT timeout expiration is managed solely by

Aug 18 2015, 8:27 AM

Aug 8 2015

jch committed rS286443: Fix a kernel assertion issue introduced with r286227:.

Fix a kernel assertion issue introduced with r286227:

Aug 8 2015, 8:40 AM

Aug 3 2015

jch committed rS286227: Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:.

Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:

Aug 3 2015, 12:14 PM

jch closed D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability by committing rS286227: Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:.

Aug 3 2015, 12:14 PM

jch updated the test plan for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Aug 3 2015, 8:40 AM

jch added inline comments to D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

Aug 3 2015, 8:18 AM

jch updated the diff for D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

Follow jhb's idea: Use 'not_running' instead of 'running'.

Updating D3078: callout_stop() should return 0 when the callout is currently being serviced and

indeed unstoppable.

Aug 3 2015, 8:17 AM

Aug 2 2015

jch added a comment to D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

In D3078#66062, @jhb wrote:
This must be a recent regression? The old code definitely checked for this case. For example, in stable/9:
if (!(c->c_flags & CALLOUT_PENDING)) {
    ...

Aug 2 2015, 8:16 PM

Jul 31 2015

jch added a reviewer for D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable.: jhb.

Jul 31 2015, 12:55 PM

jch added a comment to D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

I will push this change by the end of next week, so if you need more time please scream. As usual, comments are more than welcome even after the commit. Thanks.

Jul 31 2015, 12:55 PM

Jul 30 2015

jch added a comment to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

As this change is quite stable and I have addressed all the review comments, I plan to push it by the end of this week. As usual, please scream if you have something more to add. Moreover, comments are still welcome here even after this change is pushed. Thanks all for your time.

Jul 30 2015, 4:10 PM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

[tcp-scale]: Rebase on HEAD r286066

Jul 30 2015, 4:05 PM

Jul 14 2015

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

No need anymore to upgrade to the INP_INFO_RLOCK/INP_WLOCK
state in tcp_timer_rexmt(); we are already in this state.

Jul 14 2015, 1:56 PM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Adding jhb's suggested comment in syncache_socket().

Jul 14 2015, 1:50 PM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

[tcp-scale]: Add comment proposed by jhb about having two inps locked at
same time without the exclusive INP_INFO lock.

Jul 14 2015, 1:32 PM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Rebase on top of r285351.

Jul 14 2015, 12:37 PM

jch updated subscribers of D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

Jul 14 2015, 12:26 PM

jch added a comment to D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

Sorry @rrs, you are the latest one to have made big changes to callout, so I picked you first for this review. Tell me whether you have time for it or not.

Jul 14 2015, 12:26 PM

jch updated the test plan for D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

Jul 14 2015, 12:24 PM

jch retitled D3078: callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable. from to callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..

Jul 14 2015, 12:23 PM

Jun 29 2015

jch accepted D2946: Avoid a situation where we do not set persist timer after a zero window condition..

Jun 29 2015, 5:14 PM

jch added a comment to D2946: Avoid a situation where we do not set persist timer after a zero window condition..

I reviewed this patch as part of:

Jun 29 2015, 5:14 PM

Jun 19 2015

jch added a comment to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

In D2599#55428, @mat wrote:

Ooops, sorry, I was trying to remove myself from the subscribers here :-/

Jun 19 2015, 12:24 PM

jch reclaimed D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Jun 19 2015, 12:23 PM

jch added inline comments to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Jun 19 2015, 12:09 PM

Jun 18 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#49517, @lstewart wrote:

Leaving aside D2599 for the moment (which looks like good work and I will indeed take a look at it in detail - please include me on reviews for any TCP related work. I don't always get time to give them attention in the review window, but being aware of the work is very useful), I'm still not clear why tcp_drop(), and therefore the timers which call it, need the info lock in the new world order (in fact, I think my confusion also applies to the old world order. I was thinking that taking the reference on the inpcb in tcp_newtcpcb() means you now control when the inpcb can be GCed with respect to the timers executing which should allow simplification of the locking in the timers. It may even be the case that the reference you hold is irrelevant to the following thoughts...)

Jun 18 2015, 1:19 PM

Jun 17 2015

jch added a comment to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Hi guys, below is a quick update:

Jun 17 2015, 9:58 AM

Jun 13 2015

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Comment improvement from jhb

Jun 13 2015, 2:40 PM

jch added a comment to D2763: Fix a callout race condition introduced in TCP timers callouts with r281599..

The fact that callout_stop() can return 1 (i.e. callout successfully stopped) when this exact callout is just about to be run can be seen as a bug (/feature). Marc proposed a fix to me for this callout bug (/feature) and will ask @rrs whether it deserves to be fixed (/documented). Thanks again for your input, review and testing.
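
To make the return-value question concrete, here is a minimal userspace sketch. It is a toy model, not the kernel's callout code: struct toy_callout, both toy_callout_stop_*() functions and the idea that the callout is re-armed while its handler is being serviced are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's struct callout. */
struct toy_callout {
	bool pending;	/* an instance is scheduled and not yet picked up */
	bool running;	/* a handler instance is currently being serviced */
};

/*
 * Models the behavior seen as a bug (/feature): report success (1) as soon
 * as a pending instance was cancelled, even if a handler is being serviced.
 */
int
toy_callout_stop_old(struct toy_callout *c)
{
	bool was_pending = c->pending;

	c->pending = false;
	return (was_pending ? 1 : 0);
}

/*
 * Models the behavior asked for in D3078: still cancel the pending
 * instance, but report failure (0) when a handler is being serviced.
 */
int
toy_callout_stop_new(struct toy_callout *c)
{
	bool not_running = !c->running;
	bool was_pending = c->pending;

	c->pending = false;
	return ((was_pending && not_running) ? 1 : 0);
}
```

With a callout that is both pending and running, the first function returns 1 and the second returns 0. A caller that frees its data only after a return value of 1 is therefore safe only with the second one.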

Jun 13 2015, 2:35 PM

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

The race condition introduced with this change has been fixed as part of D2763: Fix a callout race condition introduced in TCP timers callouts with r281599. in HEAD and STABLE-10.

Jun 13 2015, 2:33 PM

Jun 12 2015

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

[tcp-scale]: Use INP_INFO_RLOCK in tcp_timer_discard()
Improve INP_INFO_LOCK assertions in cxgb/cxgbe tom

Jun 12 2015, 2:59 PM

Jun 11 2015

jch set the repository for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability to rS FreeBSD src repository - subversion.

Jun 11 2015, 3:47 PM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Rebased on svn path=/head/; revision=284266

Jun 11 2015, 3:47 PM

jch added a comment to D2763: Fix a callout race condition introduced in TCP timers callouts with r281599..

Patch pushed in both HEAD and 10-STABLE. And it is not too late for comments on this review, it is never too late for improvements. Thanks all for your time.

Jun 11 2015, 2:26 PM

jch committed rS284261: MFC r284245:.

MFC r284245:

Jun 11 2015, 1:44 PM

Jun 10 2015

jch committed rS284245: Fix a callout race condition introduced in TCP timers callouts with r281599..

Fix a callout race condition introduced in TCP timers callouts with r281599.

Jun 10 2015, 8:43 PM

jch closed D2763: Fix a callout race condition introduced in TCP timers callouts with r281599. by committing rS284245: Fix a callout race condition introduced in TCP timers callouts with r281599..

Jun 10 2015, 8:43 PM

jch added a comment to D2763: Fix a callout race condition introduced in TCP timers callouts with r281599..

In D2763#52995, @nitroboost-gmail.com wrote:

So far this is looking solid for us. Both with defaults and lowered keep alives on the same traffic patterns that caused the cores prior. Running with net.inet.tcp.per_cpu_timers = 1

Jun 10 2015, 6:23 PM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Add D2763 as dependency
Rebased change on r284151

Jun 10 2015, 5:54 PM

Jun 9 2015

jch updated D2763: Fix a callout race condition introduced in TCP timers callouts with r281599..

Jun 9 2015, 4:01 AM

jch updated D2763: Fix a callout race condition introduced in TCP timers callouts with r281599..

Jun 9 2015, 4:01 AM

jch set the repository for D2763: Fix a callout race condition introduced in TCP timers callouts with r281599. to rS FreeBSD src repository - subversion.

Jun 9 2015, 4:00 AM

jch added reviewers for D2763: Fix a callout race condition introduced in TCP timers callouts with r281599.: hiren, jhb, adrian.

Jun 9 2015, 4:00 AM

jch updated D2763: Fix a callout race condition introduced in TCP timers callouts with r281599..

Jun 9 2015, 3:59 AM

jch retitled D2763: Fix a callout race condition introduced in TCP timers callouts with r281599. from to Fix a callout race condition introduced in TCP timers callouts with r281599..

Jun 9 2015, 3:57 AM

Jun 8 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Here is my current patch to fix this issue (attachment: tcp-timer.patch, 5 KB). I am currently not able to reproduce it on HEAD with this patch applied. Let me know if it works for you. If it works well, I will create a review with this patch and also test it on stable/10.

Jun 8 2015, 7:32 AM

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Just for the record, below is how I got details on this issue:

Jun 8 2015, 6:16 AM

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#52025, @jch wrote:
In D2079#52021, @jch wrote:
In D2079#51973, @lstewart wrote:

Yes, lowering the keepalive timer was how I was triggering this more quickly during investigation as with our default it took days at high load to trigger. I've also analysed a core dump with the tp in t_state 0, so it's not specific to TIMEWAIT either. I think I might know what's going on but will hopefully confirm my findings later today.

Interesting. On my side I finally reproduced your exact issue:
panic: tcp_timer_keep: tp 0xfffff804210fc418 tp->t_inpcb == NULL
I just added debugging code to get a better view of the context (see below). It appears that:

TCP keep-alive timer was running

callout_stop(TT_KEEP) returned successfully

As no TCP callouts were apparently running tcp_discardcb() decided to directly free the tcpcb

Crash because a TT_KEEP callout was indeed still running and called afterward

I am digging this scenario...
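
The sequence above can be pictured with a small userspace sketch. Everything in it is hypothetical (struct toy_tcpcb, the toy_*() helpers, the `freed` flag that stands in for a real free); it only models the decision described above, where discard frees directly whenever the stop call reports success.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tcpcb with a single keep-alive callout. */
struct toy_tcpcb {
	bool keep_handler_in_flight;	/* TT_KEEP handler already picked up */
	bool freed;			/* set instead of really freeing memory */
};

/* Models a stop call that reports success although a handler is in flight. */
int
toy_stop_reports_success(struct toy_tcpcb *tp)
{
	(void)tp;
	return (1);
}

/* Models a stop call that reports failure (0) when a handler is in flight. */
int
toy_stop_reports_in_flight(struct toy_tcpcb *tp)
{
	return (tp->keep_handler_in_flight ? 0 : 1);
}

/* Discard frees at once only when the stop call says nothing is running. */
void
toy_discardcb(struct toy_tcpcb *tp, int (*stop)(struct toy_tcpcb *))
{
	if (stop(tp))
		tp->freed = true;	/* direct free */
	/* Otherwise the free has to wait until the handler is done. */
}

/* True when the late keep-alive handler would touch a freed tcpcb. */
bool
toy_timer_keep_touches_freed(const struct toy_tcpcb *tp)
{
	return (tp->keep_handler_in_flight && tp->freed);
}
```

With toy_stop_reports_success() the late handler ends up touching a freed tcpcb, which is the shape of the panic above. With toy_stop_reports_in_flight() the direct free is skipped.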

Jun 8 2015, 5:31 AM

Jun 5 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#52021, @jch wrote:
In D2079#51973, @lstewart wrote:

Yes, lowering the keepalive timer was how I was triggering this more quickly during investigation as with our default it took days at high load to trigger. I've also analysed a core dump with the tp in t_state 0, so it's not specific to TIMEWAIT either. I think I might know what's going on but will hopefully confirm my findings later today.

Interesting. On my side I finally reproduced your exact issue:
panic: tcp_timer_keep: tp 0xfffff804210fc418 tp->t_inpcb == NULL
I just added debugging code to get a better view of the context (see below). It appears that:

TCP keep-alive timer was running

callout_stop(TT_KEEP) returned successfully

As no TCP callouts were apparently running tcp_discardcb() decided to directly free the tcpcb

Crash because a TT_KEEP callout was indeed still running and called afterward

I am digging this scenario...

Jun 5 2015, 12:15 PM

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#51973, @lstewart wrote:

Yes, lowering the keepalive timer was how I was triggering this more quickly during investigation as with our default it took days at high load to trigger. I've also analysed a core dump with the tp in t_state 0, so it's not specific to TIMEWAIT either. I think I might know what's going on but will hopefully confirm my findings later today.

Jun 5 2015, 8:49 AM

Jun 4 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

I might have found a way to reproduce this issue: Set the TCP keep-alive timers very low:

Jun 4 2015, 9:20 PM

Jun 3 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#51293, @lstewart wrote:

Randall accidentally misspoke. We're seeing tcp_timer_keep() fire with a tp in TIMEWAIT and t_inpcb==NULL. The rest of the tp looks sane indicating it hasn't been GCed. I'm still trying to understand how this is possible as the code looks correct to me, but I'm continuing to dig...

Jun 3 2015, 2:56 PM

Jun 1 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#50233, @rrs wrote:

We don't use TOE (we use LRO though). The panics we have are the persist timer. Lawrence
has an idea though and is investigating that. Maybe he can turn something up.

The problem of course is it happens in production after hours of running at full load, so
no, we have not done an INVARIANT run. Let's see what Lawrence turns up. We will
probably hold off merging this to our next release until we can get the issue resolved ;-)

Jun 1 2015, 9:50 PM

May 29 2015

jch added inline comments to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

May 29 2015, 1:10 PM

May 28 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#50069, @rrs wrote:
We have put these changes into our NF caches, and we now are seeing
crashes that all relate to the removal of

if (inp == NULL) {
    // count race
    return;
}

We have several crashes under load with this, so it appears there
is some issue here that was not thought through.

I believe we will have to at least put the inp == NULL check back in
for our purposes, but someone may want to take a look at this
and see why it's happening.

(note we don't get the kassert since we don't have INVARIANT compiled
in; we just get a crash in the inp lock :-o)

May 28 2015, 9:40 PM

jch added a comment to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

Fixed all @jhb comments (so far). Thanks for your time.

May 28 2015, 10:35 AM

jch added inline comments to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

May 28 2015, 10:33 AM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

[tcp-scale]: Fix jhb's comment on ipi_gencnt/ipi_count access:

May 28 2015, 10:31 AM

jch added inline comments to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

May 28 2015, 10:19 AM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

[tcp-scale]: Fix jhb's comment on syncache_expand() comments.

May 28 2015, 10:19 AM

jch added inline comments to D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

May 28 2015, 9:57 AM

jch updated the diff for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

[tcp-scale]: Apply jhb's review comments on code comments.

May 28 2015, 9:56 AM

May 26 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#49517, @lstewart wrote:
...
Let's talk through tcp_timer_persist() which calls tcp_drop(). First point - as I understand things, given the ref taken in tcp_newtcpcb(), we know that any call to in_pcbrele*() from functions called by the timer will not GC the inpcb. So we call:
tp = tcp_drop(tp, ETIMEDOUT);
...
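
The reference argument in this quote can be illustrated with a toy reference counter. This is only a sketch under stated assumptions: struct toy_inpcb, toy_pcbref() and toy_pcbrele() are hypothetical names and do not reproduce the kernel's in_pcbref()/in_pcbrele*() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inpcb carrying a reference count. */
struct toy_inpcb {
	unsigned refcount;
	bool freed;	/* set instead of really freeing memory */
};

/* Takes one extra reference. */
void
toy_pcbref(struct toy_inpcb *inp)
{
	inp->refcount++;
}

/* Drops one reference; returns true only when the last one went away. */
bool
toy_pcbrele(struct toy_inpcb *inp)
{
	assert(inp->refcount > 0);
	if (--inp->refcount == 0) {
		inp->freed = true;
		return (true);
	}
	return (false);
}
```

As long as the extra reference is held, a release from elsewhere returns false and the object stays alive. Only the final release frees it.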

May 26 2015, 4:28 PM

jch added a reviewer for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability: lstewart.

May 26 2015, 5:49 AM

May 22 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

In D2079#48600, @lstewart wrote:

Sorry for coming very late to the party and I realise you've already committed the changes, but thought I'd ask my question here so that all the context relating to this work is in one place...

In the new world order with your changes, I'm a little unclear about the need for the INP_INFO_LOCK in any of the TCP timer code. Can you please comment on if the lock is needed or not, and if it is, help me understand why?

May 22 2015, 8:11 AM

May 20 2015

jch updated the test plan for D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

May 20 2015, 1:08 PM

jch updated D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability.

May 20 2015, 1:00 PM

jch retitled D2599: Decompose TCP INP_INFO lock to increase short-lived connections scalability from to Decompose TCP INP_INFO lock to increase short-lived connections scalability.

May 20 2015, 12:59 PM

May 15 2015

jch added a comment to D1982: Fix return errors in tcp_usrreq.c.

Change MFC-ed in stable/10 here rS282968: MFC r279821:. Closing this revision.

May 15 2015, 12:38 PM

jch closed D1982: Fix return errors in tcp_usrreq.c.

May 15 2015, 12:36 PM

jch committed rS282968: MFC r279821:.

MFC r279821:

May 15 2015, 12:35 PM

jch committed rS282964: MFC: r280904, r280990, r281599.

MFC: r280904, r280990, r281599

May 15 2015, 12:07 PM

Apr 16 2015

jch added a comment to D1563: Restore multi threaded callouts in the TCP stack.

I believe D2079: Fix TCP timers use-after-free old race conditions fixed the same use-after-free race condition as this patch. The differences are:

It uses the old callout API only (no callout_drain_async())
It does not use inp_lock to protect callouts to avoid the INP_INFO_WLOCK/INP_WLOCK LOR management burden

Apr 16 2015, 3:54 PM

jch closed D2079: Fix TCP timers use-after-free old race conditions.

Apr 16 2015, 2:49 PM

jch accepted D2079: Fix TCP timers use-after-free old race conditions.

Apr 16 2015, 1:50 PM

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

Review closed with rS281599: Fix an old and well-documented use-after-free race condition in commit.

Apr 16 2015, 10:13 AM

jch added inline comments to D2079: Fix TCP timers use-after-free old race conditions.

Apr 16 2015, 9:13 AM

jch updated the diff for D2079: Fix TCP timers use-after-free old race conditions.

Expand tcp_timer_stop() comment based on jhb review

Apr 16 2015, 9:12 AM

Apr 15 2015

jch added inline comments to D2079: Fix TCP timers use-after-free old race conditions.

Apr 15 2015, 2:47 PM

Apr 13 2015

jch updated the diff for D2079: Fix TCP timers use-after-free old race conditions.

Rebase patch on top of r281483

Apr 13 2015, 1:29 PM

jch updated the diff for D2079: Fix TCP timers use-after-free old race conditions.

Rebase patch on top of r281483

Apr 13 2015, 1:17 PM

Apr 8 2015

jch added a comment to D2079: Fix TCP timers use-after-free old race conditions.

I plan to propose this change to my mentor (@jhb) by the end of this week, thus @rrs and @bz please scream if you would like more time to study the functional side. (I was indeed expecting more questions/comments on this part).

Apr 8 2015, 7:10 AM

jch updated the test plan for D2079: Fix TCP timers use-after-free old race conditions.

Apr 8 2015, 7:04 AM