
Guarding against invalid CPU or mixed mode callout_reset and callout_reset_on
Abandoned · Public

Authored by rrs on Feb 18 2015, 1:40 PM.

Details

Summary

Hans has correctly pointed out that if the callout_reset() macro is used,
then c->c_cpu gets passed in. If the user has previously mixed in a call to
callout_reset_on(.., newcpu), and the callout's CPU switch is occurring when
callout_reset() runs, we could end up with a cpu set to CPUBLOCK, i.e.
MAXCPU, which is an invalid index into cc_cpu.

We also have holes in the cc_cpu array for NUMA, so let's also
validate that the CPU was initialized.
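
A rough sketch of the shape of that check (names like cc_inited and l_cpu
are the ones discussed in this review, not necessarily the committed code):

    /*
     * Sketch: reject CPUBLOCK (a reset racing a CPU switch) and any
     * cc_cpu slot that was never initialized (NUMA holes), falling
     * back to the callout's last recorded valid CPU.
     */
    if (cpu < 0 || cpu >= MAXCPU || CC_CPU(cpu)->cc_inited == 0)
            cpu = c->l_cpu;    /* CPUBLOCK == MAXCPU is caught here too */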

Test Plan

We can test that this works, i.e. that it does not break the system, and we
can also write a callout_test that switches back and forth between
callout_reset_on() and callout_reset().
No function that I can find in the kernel does this, but it is best to
be cautious and have this protection.
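
A rough sketch of such a test callback (hypothetical names; not part of
this diff):

    /* Ping-pong the callout between a specific CPU and the plain path. */
    static struct callout test_co;

    static void
    test_cb(void *arg)
    {
            static int flip;

            if (flip++ & 1)
                    callout_reset(&test_co, hz, test_cb, NULL);
            else
                    callout_reset_on(&test_co, hz, test_cb, NULL,
                        (curcpu + 1) % mp_ncpus);
    }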

Diff Detail

Lint: Skipped
Unit Tests: Skipped

Event Timeline

rrs retitled this revision from to Guarding against invalid CPU or mixed mode callout_reset and callout_reset_on.
rrs updated this object.
rrs edited the test plan for this revision. (Show Details)
rrs added reviewers: gnn, rwatson, jhb, kib, imp, hselasky.
kern/kern_timeout.c
963 (On Diff #3841)

Why can't this or these check(s) be done after callout_lock(), when we know c_cpu is valid?

This is for the callout_stop()/callout_reset() migration issue Hans pointed
out. We can't re-order the test, since then stop would fail in the locked case.

Separate the internal state-keeping flags into a new c_iflags field. Also fix
ng, which incorrectly accessed the flags directly and was failing to use
the macros (which do the right thing).

This will make the callout system resilient against misuse by KPI users: if
they fail to take a lock, it won't cause the callout system to lock up, since
the corruption will happen to the user's c_flags, not the c_iflags field.
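
Roughly, the split looks like this (a sketch using the existing bit names;
the user-visible macro keeps reading c_flags while the kernel's internal
bookkeeping moves to c_iflags):

    /* User-visible state stays in c_flags: */
    #define callout_active(c)   ((c)->c_flags & CALLOUT_ACTIVE)

    /* Internal bookkeeping moves to the new c_iflags: */
    #define callout_pending(c)  ((c)->c_iflags & CALLOUT_PENDING)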

This update takes jhb's suggestion: instead of splitting c_flags into two
fields, it makes callout_deactivate() a function, which takes the CC_LOCK
for any bit twiddling.
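
A sketch of that shape (hypothetical body; the point is only that the flag
update happens under the per-CPU callout lock):

    void
    callout_deactivate(struct callout *c)
    {
            struct callout_cpu *cc;

            cc = callout_lock(c);           /* takes CC_LOCK, spins during migration */
            c->c_flags &= ~CALLOUT_ACTIVE;
            CC_UNLOCK(cc);
    }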

sys/kern/kern_timeout.c
964

You should probably also add a check for CPU_ABSENT() like CPU_FOREACH() does.

1317

Can you refactor this code a bit and use cc_cce_cleanup()? I suggest you move "cc_exec_curr(cc, direct) = NULL;" out of cc_cce_cleanup(), because the places where it should be NULL are already NULL'ed, and use cc_cce_cleanup() instead of copy-pasting the code.

--HPS

sys/sys/_callout.h
62

Can we avoid adding this variable and instead check whether c_cpu == CPUBLOCK and, if so, not override c_cpu?

sys/kern/kern_timeout.c
964

There is no need; if the CPU is absent it would not be init'd, i.e. the init var would be 0.

sys/sys/_callout.h
62

I don't think that would work, since CPUBLOCK gives us special locking
characteristics and we need it set to single-gate the lockers during migration.

sys/kern/kern_timeout.c
964

What if a callout client sets an absent CPU value?

sys/sys/_callout.h
62

If the callout is being migrated, then surely we should not set the CPU number back to the old one by means of l_cpu? I think it would make more sense to use the new one, i.e. ignore a CPU change when cpu == CPUBLOCK.

sys/kern/kern_timeout.c
964

They can't. If the CPU is not initialized, then
you would hit the else at line 970.

1317

I will have this in the next patch. I like that idea, and I had
missed the fact that we NULL cc_exec after we are done, so
we can take that out. Of course, we then have to add it to the init routine.

sys/sys/_callout.h
62

This really becomes a toss-up. You either go back to the old one or go
to the new one, and of course it's only in effect *if* the CPU passed
in is invalid. So you are talking about a corner of a corner case:

  1. The cpu passed by the callout_reset_on() is invalid,

and

  2. the callout is currently migrating.

What do you do?

Leave it alone, i.e. make the cpu passed be equal to the new target CPU,

or

migrate back?

Either choice is probably not what the caller wanted; they wanted a different
CPU than either of those options.

So I don't really think it matters, IMO.

sys/kern/kern_timeout.c
964

OK, I see. And have you checked that timeout_cpu, which sets up cc_inited, is valid? Is it a tunable?

sys/sys/_callout.h
62

Hi,

In the cases where c->c_cpu is passed, it is obvious the driver wants the current or last selected CPU. Only then can we have CPUBLOCK passed.

In the other cases, where the driver gives a specific CPU, CPUBLOCK will simply not be passed.

--HPS

sys/kern/kern_timeout.c
964

s/cc_inited/timeout_cpu/

This addresses Hans's issue with falling back to the "previous" CPU: instead we
fall forward to the new one. The only way this occurs with CPUBLOCK in place
is if another thread holds the lock and is doing a cpu_switch, so we just update
the lcpu to have the new value before we do the switch.

Other changes look good.

sys/kern/kern_timeout.c
963

Move these checks to after the callout_lock() just below, when c_cpu is stable; then use "cpu = c->c_cpu" instead of "cpu = c->l_cpu" and eliminate the l_cpu variable?

sys/kern/kern_timeout.c
963

No, you can *not* do that. I don't see what you are
so worried about here. If the caller specifies an invalid CPU,
then you will crash when you try to lock an invalid, non-init'd mutex.
These tests catch two things: the migrating case, but also that
a caller is *not* specifying an invalid CPU.

The l_cpu variable adds one uint32_t, and I took out a uint32_t when
I moved cc_exec_next out of the array, so overall there is *no* change
in size compared to what was there previously (discounting holes, of course;
not sure where those are in this structure).

964

No, it's set as it always has been when the OS starts: timeout_cpu is always
CPU 0, and I see no reason to change that. We could, I suppose, have a sysctl
that would allow changing it; it's just a matter of adding a small function to take in the
timeout CPU, ensure it's valid, and then set it. But I won't do that as part of this patch,
since this is only fixing "issues", and that's a feature.

sys/kern/kern_timeout.c
963

Hi,
Maybe I misunderstand something.

The "l_cpu" field you added is per callout. cc_exec_next() is per CPU. We add 4 bytes to each callout everywhere in the kernel to catch a race the probably will not happen that frequently.

Did you misplace l_cpu, intending to put it into the cc_exec structure?

--HPS

As of this coming Friday I will finish up my testing. I am going to
contrast the performance penalty we get by using the lock in the deactivate
vs. splitting the field (all my testing so far has been with a split field, for the most part).

After getting those results I will make my final decision to go with either
jhb's lock in callout_deactivate() or a split field, and I will commit that. So please
speak up; I will post some results by Wednesday, so things can settle in.

R

sys/kern/kern_timeout.c
963

Nope, it's not in the wrong place. I just was not thinking; it can't be in cc_exec, it has
to be in the callout structure. So yes, it adds 4 bytes, for a race that does not often
happen or, besides the race, for the case where the caller specifies:

callout_reset_sbt_on(..., cpu=N) where N is below MAXCPU but is an *invalid* CPU.

This check serves two purposes, and if the caller specifies a wrong CPU (or you hit
the race) then you would be toast if you continued.

I don't see how the locks would work without this, either. You need to let cpu become
CPUBLOCK in order to handle the case where a racing thread spins waiting for the cpu switch
to complete.

If you want to protect against all conditions, you need the variable.

sys/kern/kern_timeout.c
585

why no CC_LOCK(cc) here?

sys/kern/kern_timeout.c
585

Well, what happens when one does a callout_lock(c) is that you are
actually doing a CC_LOCK(), but gated so that two threads do
*not* do a callout_cpu_switch() on the same c at the same time.

It's possible that c->c_cpu is CPUBLOCK; this indicates to the caller
of callout_lock() to spin, not lock, since the CC_LOCK is changing to
a new CPU.

It's a very subtle but needed thing. I am still not convinced that this
is a better mechanism than splitting the flags in two; I will run this under
a heavy load and compare the results with running it with a split flag.

The extra locks/unlocks may be in the noise, or they may stand out.
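
For reference, the gating described in that comment is roughly (a sketch of
the mechanism, not the exact kernel source):

    static struct callout_cpu *
    callout_lock(struct callout *c)
    {
            struct callout_cpu *cc;
            int cpu;

            for (;;) {
                    cpu = c->c_cpu;
                    if (cpu == CPUBLOCK) {
                            /* Migration in progress: spin, don't lock. */
                            cpu_spinwait();
                            continue;
                    }
                    cc = CC_CPU(cpu);
                    CC_LOCK(cc);
                    if (cpu == c->c_cpu)
                            break;          /* still the owner; lock is good */
                    /* The callout moved while we locked; drop and retry. */
                    CC_UNLOCK(cc);
            }
            return (cc);
    }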

OK, I have tested the former method (separate flags) that Hiren is currently using, and I
have tested the method suggested by jhb that uses a lock (shown here). There is
no detectable performance difference that I can see. The same machine with a
similar traffic load ran 40Gbps on both nights with virtually the same CPU load metrics.
Of course, our load is all elephants with almost no mice, but it will probably be
the same for large numbers of mice, since you most likely hope these timers
do *not* go off, and you don't call deactivate unless the timer goes off...

Hiren, have you tested this latest version of the patch? Speak up if you don't
want this committed, since I am ambivalent and don't care which way, as
both seem to fix the problem. Unless I hear otherwise, I will commit this this
coming weekend.
R

OK, so it has the callout_lock() inside callout_deactivate(). Perfect; that is what
I will commit this weekend :-)
R

In D1894#25, @rrs wrote:

Hiren, have you tested this latest version of the patch? Speak up if you don't want this committed, since I am ambivalent and don't care which way, as both seem to fix the problem. [...]

No, I've been testing the patch you provided to me with the separated c_flags/c_iflags. That has been working great in prod for us. I'd favor that one going in ;-)

Hmm, well, that's the one that jhb objected to; too complex to look at, I think, was his
reasoning. It works fine, IMO. Of course, to make it "simpler" you then have to
add the lock, which is a bit more overhead, but I can't measure the difference on
my systems.

So maybe we should go with the slightly more complex, lower-overhead version that
you have been testing, since it seems to have the most runtime (that's also
what I have been testing for the majority of my tests).

R

sys/kern/kern_timeout.c
963

Eh, I would just panic on an invalid CPU. Guessing which valid CPU to use when someone has made a mistake just hides their bug. It also simplifies your handling somewhat.

That CPUBLOCK gets passed in is most unfortunate. :)

OTOH, when that happens, you know that the caller isn't trying to migrate the callout and that it is passing in c->c_cpu, so you could depend on that and just use the regular callout_lock(), yes? (In fact, this argues for the non-*on() methods actually passing a cookie to say "don't move me" rather than accessing c_cpu directly, since in some cases they may "undo" a migration.) Given that, I think you don't need l_cpu at all. When you see CPUBLOCK, just ignore it. If you fix that, I would probably then change the non-on methods to always pass CPUBLOCK to fix the other issue of undoing a migration.

My only comment here (besides: great catch!) is whether any of those other macros are also thread-unsafe.

I.e., the manpage doesn't state that you need to hold any locks when calling them.

It isn't an immediate "should fix this as part of this review", but it may be useful to turn them into functions that include a lock check if they require a lock to be held; see the sketch below.
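
One hypothetical shape for such a function, assuming the callout was
initialized with callout_init_mtx() so c_lock points at a struct mtx
(callout_active_locked is an invented name, not an existing KPI):

    static int
    callout_active_locked(struct callout *c)
    {

            /* Fires (with INVARIANTS) if the caller does not hold the lock. */
            mtx_assert((struct mtx *)c->c_lock, MA_OWNED);
            return ((c->c_flags & CALLOUT_ACTIVE) != 0);
    }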

rrs edited edge metadata.

This updates us to:

a) What Hiren is running in production, i.e. split the c_flags into two, and avoid a lock in callout_deactivate().

b) Take jhb's suggestion and have the non-caring macro calls to the per-CPU base calls *not* pass in c->c_cpu, but instead a cookie, in this case -1. I did not want to use CPUBLOCK since that is already overloaded.
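
Illustratively, for one of the macros (the -1 cookie mentioned above; the
actual definitions live in sys/callout.h):

    #define callout_reset_sbt(c, sbt, pr, fn, arg, flags)               \
            callout_reset_sbt_on((c), (sbt), (pr), (fn), (arg), -1, (flags))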

Hopefully this will just about bring to ground all the bug portion of this, and I can soon start a *new* review adding callout_drain_async() ;-)