Page MenuHomeFreeBSD

callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable.
ClosedPublic

Authored by jch on Jul 14 2015, 12:23 PM.
Tags
None
Referenced Files
F108112764: D3078.id6920.diff
Tue, Jan 21, 12:43 PM
Unknown Object (File)
Fri, Jan 17, 11:46 PM
Unknown Object (File)
Fri, Jan 17, 4:36 PM
Unknown Object (File)
Fri, Jan 17, 12:16 PM
Unknown Object (File)
Sun, Jan 12, 2:51 AM
Unknown Object (File)
Sat, Jan 11, 1:12 AM
Unknown Object (File)
Mon, Jan 6, 6:21 AM
Unknown Object (File)
Dec 14 2024, 6:09 AM

Details

Summary

Currently callout_stop() returns 1 (success) even if the callout is currently
being serviced and indeed can't be possibly stopped.

A scenario to reproduce this case is:

  • the callout is being serviced and at same time,
  • callout_reset() is called on this callout, that adds CALLOUT_PENDING flag and then,
  • callout_stop() is called and return 1 (success) even if the callout is running and unstoppable.

This issue was caught up while makeing D2763 workaround, and discussed at BSDCan.
Once applied the D2763 workaround won't be needed anymore and I will revert it.

Test Plan
  • Tested with net.inet.tcp.per_cpu_timers set to 1 and 0
  • Tested with D2763 reverted

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jch retitled this revision from to callout_stop() should return 0 when the callout is currently being serviced and indeed unstoppable..
jch updated this object.
jch edited the test plan for this revision. (Show Details)
jch added a reviewer: rrs.

Sorry @rrs you are the latest one to have done big changes to callout thus I picked up you first for this review, Tell me if you have time or not for it.

I will push this change by end of next week, thus if you need more time please scream. As usual, comments are more than welcomed even after the commit. Thanks.

jch removed a subscriber: jhb.

This must be a recent regression? The old code definitely checked for this case. For example, in stable/9:

/*
 * If the callout isn't pending, it's not on the queue, so
 * don't attempt to remove it from the queue.  We can try to
 * stop it by other means however.
 */
if (!(c->c_flags & CALLOUT_PENDING)) {
        c->c_flags &= ~CALLOUT_ACTIVE;

        /*
         * If it wasn't on the queue and it isn't the current
         * callout, then we can't stop it, so just bail.
         */
        if (cc->cc_curr != c) {
                ...
        }

        /* Try to use lock if possible. */
        ...
        CTR3(KTR_CALLOUT, "failed to stop %p func %p arg %p",
            c, c->c_func, c->c_arg);
        CC_UNLOCK(cc);
        KASSERT(!sq_locked, ("sleepqueue chain still locked"));
        return (0);
sys/kern/kern_timeout.c
1153 ↗(On Diff #6920)

If you call the new variable 'not_running' you can drop a few !'s. Normally I would not like to have 'not_<foo>' variables, but we already have 'not_on_a_list', so I think it would be ok in this case. However, this is a minor suggestion. Feel free to ignore.

Hi @jhb,

In D3078#66062, @jhb wrote:

This must be a recent regression? The old code definitely checked for this case. For example, in stable/9:

if (!(c->c_flags & CALLOUT_PENDING)) {
    ...

I would rather say that this issue is here since the beginning of callout: :)

As you pointed out, when the CALLOUT_PENDING flags is not set and the callout is currently running, everything is fine and callout_stop() returns 0 (fail to stop callout). But the issue here is when the CALLOUT_PENDING flag is set and the callout is currently running, then callout_stop() currently unconditionally returns 1 (success to stop callout) even if the callout is running and is indeed unstoppable:

        /*
         * If the callout isn't pending, it's not on the queue, so
         * don't attempt to remove it from the queue.  We can try to
         * stop it by other means however.
         */
        if (!(c->c_iflags & CALLOUT_PENDING)) {
            ...
            /* No issue here */
        }

        ...
        /*
         * If we are asked to stop a callout which is currently in progress
         * and indeed impossible to stop then return 0.
         */
        running = !!(cc_exec_curr(cc, direct) == c);

        CC_UNLOCK(cc);
        return (!running); /* Avoid to return 1 (success to stop) unconditionally here */
}
jch edited edge metadata.

Follow jhb's idea: Use 'not_running' instead of 'running'.

Updating D3078: callout_stop() should return 0 when the callout is currently being serviced and

indeed unstoppable.

sys/kern/kern_timeout.c
1153 ↗(On Diff #6920)

It is a good idea, too many !'s are indeed confusing. Updated.

Rebase on top of r286874.

Updating D3078: callout_stop() should return 0 when the callout is currently being serviced and

indeed unstoppable.

This revision was automatically updated to reflect the committed changes.

Updated with hselaski inputs

Updating D3078: callout_stop() should return 0 when the callout is currently being serviced and

indeed unstoppable.

imp requested changes to this revision.Aug 27 2015, 3:12 PM
imp added a reviewer: imp.

Please make sure that rrs@, jhb@ and/or kib@ sign off on any further changes.
This code has been quite tricky and even if your changes are correct, we really
need the oversight in this critical bit of code.

This revision now requires changes to proceed.Aug 27 2015, 3:12 PM

Hi guys, after emails exchanges with @helasky and @kib I believe that if this change makes sense and improves some callout usages (i.e. in TCP timers) it is not worth it overall. In details:

  • I was not able to find a clear answer (from someone or from documentation) of what should be returned by callout_stop() in mpsafe mode, when the callout is both pending and being serviced. And at this level of uncertainty, the historical behaviour should prevail.
  • It all started with D2763, where it felt wrong to have to test both callout_reset() and callout_stop() return values to handle this specific case. But now, it feels more like the right way to do it, and not just a workaround for a callout_stop() bug.

    In conclusion, I will add documentation in callout(9) man page and callout_stop() code, to describe this specific and subtle case. I will put back D2763 in place and revert this change (expect if someone screams loud enough). It seems I am not the first one to have been beaten by changes in callout(9) subtle code, and I learn (the hard way) why.

    Hopefully I did not steal too much of your time. Thanks.
This revision was automatically updated to reflect the committed changes.

Change reverted in rS287305: Revert r286880: If at first this change made sense, it turns out, thanks for your time sharing the subtlety of callout_stop() in mpsafe mode.