Page MenuHomeFreeBSD

Fix race in cancel_delayed_work() in the LinuxKPI
Needs RevisionPublic

Authored by hselasky on Jan 7 2021, 11:13 AM.

Details

Reviewers
rstone
mav
kib
Summary

Ensure the case where timer is executing concurrently with cancel_delayed_work() is handled properly.

Found by: rstone@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking

Test Plan
  1. A delayed_work struct in the WORK_ST_TIMER state.
  2. Thread A calls mod_delayed_work()
  3. Thread B (a callout thread) simultaneously calls

linux_delayed_work_timer_fn()

The following sequence of events is possible:

A: Call linux_cancel_delayed_work()
A: Change state from TIMER TO CANCEL
B: Change state from CANCEL to TASK
B: taskqueue_enqueue() the task
A: taskqueue_cancel() the task
A: Call linux_queue_delayed_work_on(). This is a no-op because the
state is WORK_ST_TASK.

As a result, the delayed_work struct will never be invoked. This is
causing address resolution in ib_addr.c to stop permanently, as it
never tries to reschedule a task that it thinks is already scheduled.

Do you have a recommendation? Should we unconditionally
taskqueue_enqueue() when in the WORK_ST_TASK state and
linux_queue_delayed_work_on() is called? That is harmless for a
pending task but will break the deadlock if the race is lost.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

I don't think that spinning is the right answer. That could hold the CPU that the callout needs to make progress, resulting in a livelock.

I'm testing this patch internally. It has survived 12 hours of testing so far, which is a really good sign (usually I would see IB ARP deadlock within an hour or so):

https://people.freebsd.org/~rstone/patches/cancel-workqueue.diff

@rstone: It is not spinning, it is only re-evaluating the state when the race is detected.

There are a few murky things about Linux works which you should know about first:

First of all the work callback function is allowed to free the work itself, which it was called on. Only if the client does a cancel request on the work, we know that the memory is still there.

In the current code, issuing a cancel operation on a zero-initialized, but not initialized work, is supported. After your patch you depend on all works being cancelled have been init, due to the use of the callback's mutex.

Yes, you can solve this by locking the callback mutex too, to serialize the operations, like you did in your patch.

@rstone: TIMER state goes to CANCELLED, so it will not spin.

@rstone: You also patch linux_cancel_delayed_work_sync(), but I think that function is OK, because TIMER state goes to IDLE state, and then the timer CB has IDLE to IDLE, so that function is OK!

rstone requested changes to this revision.Jan 7 2021, 4:03 PM

This patch doesn't work. After the goto retry, nothing stops the cancelling thread from seeing state still in WORK_ST_CANCEL, and then the timer thread can come in and change it to TASK, causing the same bug.

We absolutely need a lock here for proper synchronization. I'm looking at the Linux implementation and it appears to assert that the underlying timer object be initialized (well, specifically it unconditionally calls del_timer on it and that asserts that it is initialized).

This revision now requires changes to proceed.Jan 7 2021, 4:03 PM

@rstone: If you have schedule delayed work racing with the cancel function by means of two threads, then yes you are right, but that is undefined behaviour (or an application problem).

I don't see how the timer can still be running in WORK_ST_CANCEL, when you assume that scheduling and cancelling threads are serialized, because in WORK_ST_TIMER we call linux_cancel_timer() which takes the mutex of the callout, which means the callout is only allowed to execute between the switch() statement and until linux_cancel_timer()! When you stop a mutex protected callout when its mutex is locked, the callback cannot be executing after you drop the mutex after callout_stop() !

--HPS

sys/compat/linuxkpi/common/src/linux_work.c
438

@rstone: I only see that the callout function callback can sneak in right here, but not later on!

sys/compat/linuxkpi/common/src/linux_work.c
438

The callout could have started but be preempted for whatever reason. callout_stop() cannot stop a callout that has started to run.

@rstone:

cannot stop a callout that has started to run

Yes it can! You are missing a piece of logic in the callout code:

	if (c_lock != NULL) {
		class->lc_lock(c_lock, lock_status);
		/*
		 * The callout may have been cancelled
		 * while we switched locks.
		 */
		if (cc_exec_cancel(cc, direct)) {
			class->lc_unlock(c_lock);
			goto skip;
		}
		/* The callout cannot be stopped now. */
		cc_exec_cancel(cc, direct) = true;
		if (c_lock == &Giant.lock_object) {

You see that cc_exec_cancel() is under the mutex, so this makes callout_stop() atomic and the callback will be stopped if it started to run!

--HPS

@rstone : ping - any updates on this issue?

sorry I got distracted by a different critical issue at work. Will be able to return to this next week

Sorry about the delay in coming back to this; I've had to deal with a number of critical issues at work. I ported the patch that we tested internally to main and posted it to this review:

https://reviews.freebsd.org/D28420