Extensive exercising of irdma reveals a problem with the current
implementation of mod_delayed_work(): if the task is currently running,
then it is not cancellable and mod_delayed_work() will return false.
This is most obviously a problem in irdma_sched_qp_flush_work(), which
avoids dropping the refcount if mod_delayed_work() returns false because
the expectation is that the task will be scheduled and this thread
'owns' that. After a closer look at the Linux implementation, that
expectation seems to be fine: Linux waits until the task is either
pending or idle before acting on it.
mod_delayed_work() could instead be rewritten to loop on the current
linux_cancel_delayed_work / linux_queue_delayed_work_on sequence as long
as they both return false, but inlining the logic feels like the more
correct thing to do.
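
For illustration only, the looping alternative can be sketched as a
single-threaded user-space toy model. Every name below (toy_work,
toy_cancel, toy_queue, toy_yield, toy_mod_delayed_work) is hypothetical
and is not part of LinuxKPI or of this patch; the model only assumes
that cancel and queue both return false while the task is running, and
it uses the Linux return convention for mod_delayed_work() (true if a
pending task was rescheduled, false if an idle task was queued).

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in LinuxKPI. */
enum toy_state { TOY_IDLE, TOY_PENDING, TOY_RUNNING };

struct toy_work {
	enum toy_state state;
	int run_polls;		/* polls left until a running task ends */
	unsigned long delay;	/* delay recorded by the last queue */
};

/* Cancel a pending task; a running task cannot be cancelled. */
static bool
toy_cancel(struct toy_work *w)
{
	if (w->state != TOY_PENDING)
		return (false);
	w->state = TOY_IDLE;
	return (true);
}

/* Queue an idle task; fails if it is running or already pending. */
static bool
toy_queue(struct toy_work *w, unsigned long delay)
{
	if (w->state != TOY_IDLE)
		return (false);
	w->state = TOY_PENDING;
	w->delay = delay;
	return (true);
}

/* Stand-in for yielding the CPU: lets the running task progress. */
static void
toy_yield(struct toy_work *w)
{
	if (w->state == TOY_RUNNING && --w->run_polls <= 0)
		w->state = TOY_IDLE;
}

/*
 * Retry the cancel/queue sequence for as long as both steps fail,
 * which in this model only happens while the task is running.
 * Returns true if a pending task was rescheduled and false if an
 * idle task was newly queued. *spins counts the retries.
 */
static bool
toy_mod_delayed_work(struct toy_work *w, unsigned long delay, int *spins)
{
	bool cancelled;

	for (;;) {
		cancelled = toy_cancel(w);
		if (toy_queue(w, delay))
			return (cancelled);
		/* Both returned false: the task is running. */
		toy_yield(w);
		(*spins)++;
	}
}
```

In this model a caller that sees false knows the task really was queued,
which is the guarantee the refcount handling described above relies on.
The sketch says nothing about locking or about how the real states are
tracked.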
This patch has run fine on my laptop, but i915kms doesn't stress this
enough to hit a scenario where the task was already running, so the
SDT probes never fire (nor does a thread get stuck in that loop). It's
also been run through typical workloads in irdma, but we haven't
specifically confirmed that the busy loop has been hit.
(Is maybe_yield() wrong here?)