
Guarding against invalid CPU or mixed mode callout_reset and callout_reset_on
Abandoned · Public

Authored by rrs on Feb 18 2015, 1:40 PM.

Details

Summary

Hans has correctly pointed out that if the callout_reset() macro is used,
then c->c_cpu gets passed in. If the user has previously mixed in a call to
callout_reset_on(.., newcpu), and the callout's CPU switch is occurring when
callout_reset() runs, we could end up with a cpu set to CPUBLOCK, i.e.
MAXCPU, which is an invalid index into cc_cpu.

We also have holes in the cc_cpu array for NUMA, so let's also
validate that the CPU was initialized.
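
A rough sketch of the shape of that check (names like cc_inited and l_cpu
are the ones discussed in this review, not necessarily the committed code):

    /*
     * Sketch: reject CPUBLOCK (a reset racing a CPU switch) and any
     * cc_cpu slot that was never initialized (NUMA holes), falling
     * back to the callout's last recorded valid CPU.
     */
    if (cpu < 0 || cpu >= MAXCPU || CC_CPU(cpu)->cc_inited == 0)
            cpu = c->l_cpu;    /* CPUBLOCK == MAXCPU is caught here too */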

Test Plan

We can test that this works, i.e. that it does not break the system, and we
can also write a callout_test that switches back and forth between
callout_reset_on() and callout_reset().
No function that I can find in the kernel does this, but it is best to
be cautious and have this protection.
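
A rough sketch of such a test callback (hypothetical names; not part of
this diff):

    /* Ping-pong the callout between a specific CPU and the plain path. */
    static struct callout test_co;

    static void
    test_cb(void *arg)
    {
            static int flip;

            if (flip++ & 1)
                    callout_reset(&test_co, hz, test_cb, NULL);
            else
                    callout_reset_on(&test_co, hz, test_cb, NULL,
                        (curcpu + 1) % mp_ncpus);
    }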

Diff Detail

Lint: Skipped
Unit Tests: Skipped

Event Timeline

rrs retitled this revision from to Guarding against invalid CPU or mixed mode callout_reset and callout_reset_on.
rrs updated this object.
rrs edited the test plan for this revision. (Show Details)
rrs added reviewers: gnn, rwatson, jhb, kib, imp, hselasky.
kern/kern_timeout.c
963 (On Diff #3841)

Why can't this or these check(s) be done after callout_lock(), when we know c_cpu is valid?

This is for the callout_stop()/callout_reset() migration issue Hans pointed
out. We can't re-order the test, since then stop would fail in the locked case.

Separate the internal state-keeping flags into a new c_iflags field. Also fix
ng, which incorrectly accessed the flags directly and was failing to use
the macros (which do the right thing).

This will make the callout system resilient against misuse by KPI users: if
they fail to take a lock, it won't cause the callout system to lock up, since
the corruption will happen to the user's c_flags, not the c_iflags field.
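
Roughly, the split looks like this (a sketch using the existing bit names;
the user-visible macro keeps reading c_flags while the kernel's internal
bookkeeping moves to c_iflags):

    /* User-visible state stays in c_flags: */
    #define callout_active(c)   ((c)->c_flags & CALLOUT_ACTIVE)

    /* Internal bookkeeping moves to the new c_iflags: */
    #define callout_pending(c)  ((c)->c_iflags & CALLOUT_PENDING)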

This update takes jhb's suggestion: instead of splitting c_flags into two
fields, it makes callout_deactivate() a function, which takes the CC_LOCK
for any bit twiddling.
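
A sketch of that shape (hypothetical body; the point is only that the flag
update happens under the per-CPU callout lock):

    void
    callout_deactivate(struct callout *c)
    {
            struct callout_cpu *cc;

            cc = callout_lock(c);           /* takes CC_LOCK, spins during migration */
            c->c_flags &= ~CALLOUT_ACTIVE;
            CC_UNLOCK(cc);
    }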

sys/kern/kern_timeout.c
964

You should probably also add a check for CPU_ABSENT() like CPU_FOREACH() does.

1317

Can you refactor this code a bit and use cc_cce_cleanup()? I suggest you move "cc_exec_curr(cc, direct) = NULL;" out of cc_cce_cleanup(), because the places where it should be NULL are already NULL'ed, and use cc_cce_cleanup() instead of copy-pasting the code.

--HPS

sys/sys/_callout.h
62

Can we avoid adding this variable and instead check whether c_cpu == CPUBLOCK and, if so, not override c_cpu?

sys/kern/kern_timeout.c
964

There is no need; if the CPU is absent it would not be init'd, i.e. the init var would be 0.

sys/sys/_callout.h
62

I don't think that would work, since CPUBLOCK gives us special locking
characteristics and we need it set to single-gate the lockers during migration.

sys/kern/kern_timeout.c
964

What if a callout client sets an absent CPU value?

sys/sys/_callout.h
62

If the callout is being migrated, then surely we should not set the CPU number back to the old one by means of l_cpu? I think it would make more sense to use the new one, i.e. ignore a CPU change when cpu == CPUBLOCK.

sys/kern/kern_timeout.c
964

They can't. If the CPU is not initialized, then
you would hit the else at line 970.

1317

I will have this in the next patch. I like that idea, and I had
missed the fact that we NULL cc_exec after we are done, so
we can take that out. Of course, we then have to add it to the init routine.

sys/sys/_callout.h
62

This really becomes a toss-up. You either go back to the old one or go
to the new one, and of course it's only in effect *if* the CPU passed
in is invalid. So you are talking about a corner of a corner case:

  1. The cpu passed by the callout_reset_on() is invalid,

and

  2. the callout is currently migrating.

What do you do?

Leave it alone, i.e. make the cpu passed be equal to the new target CPU,

or

migrate back?

Either choice is probably not what the caller wanted; they wanted a different
CPU than either of those options.

So I don't really think it matters, IMO.

sys/kern/kern_timeout.c
964

OK, I see. And have you checked that timeout_cpu, which sets up cc_inited, is valid? Is it a tunable?

sys/sys/_callout.h
62

Hi,

In the cases where c->c_cpu is passed, it is obvious the driver wants the current or last selected CPU. Only then can we have CPUBLOCK passed.

In the other cases, where the driver gives a specific CPU, CPUBLOCK will simply not be passed.

--HPS

sys/kern/kern_timeout.c
964

s/cc_inited/timeout_cpu/

This addresses Hans's issue with falling back to the "previous" CPU: instead we
fall forward to the new one. The only way this occurs with CPUBLOCK in place
is if another thread holds the lock and is doing a cpu_switch, so we just update
the lcpu to have the new value before we do the switch.

Other changes look good.

sys/kern/kern_timeout.c
963

Move these checks to after the callout_lock() just below, when c_cpu is stable; then use "cpu = c->c_cpu" instead of "cpu = c->l_cpu" and eliminate the l_cpu variable?

sys/kern/kern_timeout.c
963

No, you can *not* do that. I don't see what you are
so worried about here. If the caller specifies an invalid CPU,
then you will crash when you try to lock an invalid, non-init'd mutex.
These tests catch two things: the migrating case, but also that
a caller is *not* specifying an invalid CPU.

The l_cpu variable adds one uint32_t, and I took out a uint32_t when
I moved cc_exec_next out of the array, so overall there is *no* change
in size compared to what was there previously (discounting holes, of course;
not sure where those are in this structure).

964

No, it's set as it always has been when the OS starts: timeout_cpu is always
CPU 0, and I see no reason to change that. We could, I suppose, have a sysctl
that would allow changing it; it's just a matter of adding a small function to take in the
timeout CPU, ensure it's valid, and then set it. But I won't do that as part of this patch,
since this is only fixing "issues", and that's a feature.

sys/kern/kern_timeout.c
963

Hi,
Maybe I misunderstand something.

The "l_cpu" field you added is per callout. cc_exec_next() is per CPU. We add 4 bytes to each callout everywhere in the kernel to catch a race the probably will not happen that frequently.

Did you misplace l_cpu, intending to put it into the cc_exec structure?

--HPS

As of this coming Friday I will finish up my testing. I am going to
contrast the performance penalty we get by using the lock in the deactivate
vs. splitting the field (all my testing so far has been with a split field, for the most part).

After getting those results I will make my final decision to go with either
jhb's lock in callout_deactivate() or a split field, and I will commit that. So please
speak up; I will post some results by Wednesday, so things can settle in.

R

sys/kern/kern_timeout.c
963

Nope, it's not in the wrong place. I just was not thinking; it can't be in cc_exec, it has
to be in the callout structure. So yes, it adds 4 bytes, for a race that does not often
happen or, besides the race, for the case where the caller specifies:

callout_reset_sbt_on(..., cpu=N) where N is below MAXCPU but is an *invalid* CPU.

This check serves two purposes, and if the caller specifies a wrong CPU (or you hit
the race) then you would be toast if you continued.

I don't see how the locks would work without this, either. You need to let cpu become
CPUBLOCK in order to handle the case where a racing thread spins waiting for the cpu switch
to complete.

If you want to protect against all conditions, you need the variable.

sys/kern/kern_timeout.c
585

why no CC_LOCK(cc) here?

sys/kern/kern_timeout.c
585

Well, what happens when one does a callout_lock(c) is that you are
actually doing a CC_LOCK(), but gated so that two threads do
*not* do a callout_cpu_switch() on the same c at the same time.

It's possible that c->c_cpu is CPUBLOCK; this indicates to the caller
of callout_lock() to spin, not lock, since the CC_LOCK is changing to
a new CPU.

It's a very subtle but needed thing. I am still not convinced that this
is a better mechanism than splitting the flags in two; I will run this under
a heavy load and compare the results with running it with a split flag.

The extra locks/unlocks may be in the noise, or they may stand out.
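
For reference, the gating described in that comment is roughly (a sketch of
the mechanism, not the exact kernel source):

    static struct callout_cpu *
    callout_lock(struct callout *c)
    {
            struct callout_cpu *cc;
            int cpu;

            for (;;) {
                    cpu = c->c_cpu;
                    if (cpu == CPUBLOCK) {
                            /* Migration in progress: spin, don't lock. */
                            cpu_spinwait();
                            continue;
                    }
                    cc = CC_CPU(cpu);
                    CC_LOCK(cc);
                    if (cpu == c->c_cpu)
                            break;          /* still the owner; lock is good */
                    /* The callout moved while we locked; drop and retry. */
                    CC_UNLOCK(cc);
            }
            return (cc);
    }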

OK, I have tested the former method (separate flags) that Hiren is currently using, and I
have tested the method suggested by jhb that uses a lock (shown here). There is
no detectable performance difference that I can see. The same machine with a
similar traffic load ran 40Gbps on both nights with virtually the same CPU load metrics.
Of course, our load is all elephants with almost no mice, but it will probably be
the same for large numbers of mice, since you most likely hope these timers
do *not* go off, and you don't call deactivate unless the timer goes off...

Hiren, have you tested this latest version of the patch? Speak up if you don't
want this committed, since I am ambivalent and don't care which way, as
both seem to fix the problem. Unless I hear otherwise, I will commit this this
coming weekend.
R

OK, so it has the callout_lock() inside callout_deactivate(). Perfect; that is what
I will commit this weekend :-)
R

In D1894#25, @rrs wrote:

Hiren, have you tested this latest version of the patch? Speak up if you don't want this committed, since I am ambivalent and don't care which way, as both seem to fix the problem. [...]

No, I've been testing the patch you provided to me with the separated c_flags/c_iflags. That has been working great in prod for us. I'd favor that one going in ;-)

Hmm, well, that's the one that jhb objected to; too complex to look at, I think, was his
reasoning. It works fine, IMO. Of course, to make it "simpler" you then have to
add the lock, which is a bit more overhead, but I can't measure the difference on
my systems.

So maybe we should go with the slightly more complex, lower-overhead version that
you have been testing, since it seems to have the most runtime (that's also
what I have been testing for the majority of my tests).

R

sys/kern/kern_timeout.c
963

Eh, I would just panic on an invalid CPU. Guessing which valid CPU to use when someone has made a mistake just hides their bug. It also simplifies your handling somewhat.

That CPUBLOCK gets passed in is most unfortunate. :)

OTOH, when that happens, you know that the caller isn't trying to migrate the callout and that it is passing in c->c_cpu, so you could depend on that and just use the regular callout_lock(), yes? (In fact, this argues for the non-*on() methods actually passing a cookie to say "don't move me" rather than accessing c_cpu directly, since in some cases they may "undo" a migration.) Given that, I think you don't need l_cpu at all. When you see CPUBLOCK, just ignore it. If you fix that, I would probably then change the non-on methods to always pass CPUBLOCK to fix the other issue of undoing a migration.

My only comment here (besides: great catch!) is whether any of those other macros are also thread-unsafe.

I.e., the manpage doesn't state that you need to hold any locks when calling them.

It isn't an immediate "should fix this as part of this review", but it may be useful to turn them into functions that include a lock check if they require a lock to be held; see the sketch below.
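
One hypothetical shape for such a function, assuming the callout was
initialized with callout_init_mtx() so c_lock points at a struct mtx
(callout_active_locked is an invented name, not an existing KPI):

    static int
    callout_active_locked(struct callout *c)
    {

            /* Fires (with INVARIANTS) if the caller does not hold the lock. */
            mtx_assert((struct mtx *)c->c_lock, MA_OWNED);
            return ((c->c_flags & CALLOUT_ACTIVE) != 0);
    }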

rrs edited edge metadata.

This updates us to:

a) What Hiren is running in production, i.e. split the c_flags into two, and avoid a lock in callout_deactivate().

b) Take jhb's suggestion and have the non-caring macro calls to the per-CPU base calls *not* pass in c->c_cpu, but instead a cookie, in this case -1. I did not want to use CPUBLOCK since that is already overloaded.
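
Illustratively, for one of the macros (the -1 cookie mentioned above; the
actual definitions live in sys/callout.h):

    #define callout_reset_sbt(c, sbt, pr, fn, arg, flags)               \
            callout_reset_sbt_on((c), (sbt), (pr), (fn), (arg), -1, (flags))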

Hopefully this will just about bring to ground all the bug portion of this, and I can soon start a *new* review adding callout_drain_async() ;-)