Page MenuHomeFreeBSD

linuxkpi: fix mod_delayed_work() with running tasks
Needs ReviewPublic

Authored by kevans on Apr 8 2026, 2:03 PM.
Tags
None
Referenced Files
F160369220: D56308.diff
Tue, Jun 23, 7:15 PM
Unknown Object (File)
Sun, Jun 14, 8:38 AM
Unknown Object (File)
Wed, Jun 10, 8:06 PM
Unknown Object (File)
Sat, Jun 6, 4:21 AM
Unknown Object (File)
Sat, May 30, 3:31 AM
Unknown Object (File)
May 18 2026, 5:19 PM
Unknown Object (File)
May 18 2026, 9:51 AM
Unknown Object (File)
May 18 2026, 9:51 AM

Details

Reviewers
markj
kib
manu
wulf
Group Reviewers
Klara
Summary

Extensive exercising of irdma reveals a problem with the current
implementation of mod_delayed_work: if the task is currently running,
then it is not cancellable and mod_delayed_work() will return false.

This is most obviously a problem in irdma_sched_qp_flush_work(), which
avoids dropping the refcount if it returns false because the expectation
is that the task will be scheduled and this thread 'owns' that. After
a closer look at the Linux implementation, this seems to be fine: they
wait until the task is either pending or idle to act on it.

This could instead be rewritten to loop on the current
linux_cancel_delayed_work / linux_queue_delayed_work_on sequence as long
as they both return false, but inlining the logic feels like the more
correct thing to do.

This patch has run fine on my laptop, but i915kms doesn't stress this
enough to hit a scenario where the task was already running and the
SDT probes won't fire (nor does a thread get stuck in that loop). It's
also been run through typical workloads in irdma, but we haven't
specifically confirmed that the busy loop has been hit.

(Is maybe_yield() wrong here?)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 72055
Build 68938: arc lint + arc unit

Event Timeline

kevans requested review of this revision.Apr 8 2026, 2:03 PM
des added inline comments.
sys/compat/linuxkpi/common/src/linux_work.c
242

Should we assert that the lock is held?

sys/compat/linuxkpi/common/src/linux_work.c
539

What is the purpose of __aligned? I see it was copied.

658
kevans added inline comments.
sys/compat/linuxkpi/common/src/linux_work.c
539

I'll admit to not having noticed, and I don't see a reason for it. linux_update_state just passes individual states on through atomic_cmpxchg, so presumably on all of our platforms that's loaded into a register and there's no reaon for the table to over-aligned.

I'm not seeing any clues in rGca2ad6bd779fae857f27b028d736f506b66a528a that introduced the other ones, either. Maybe I'll just remove this one, then do another pass and strip the other ones out?

sys/compat/linuxkpi/common/src/linux_work.c
539

Sure.

  • Assert in the _locked variant
  • Drop mysterious __aligned
  • Fix typo
This revision is now accepted and ready to land.Apr 10 2026, 6:52 PM
sys/compat/linuxkpi/common/src/linux_work.c
567

Why is it ok to ignore the result?

618

This assignment is redundant.

sys/compat/linuxkpi/common/src/linux_work.c
567

In theory, since this is a delayed_work, almost all state updates should be under dwork->timer.mtx, including linux_delayed_work_timer_fn running via callout.

The exception is in linux_work_fn, but this is the WORK_ST_TIMER / WORK_ST_CANCEL branch, so we werr either waiting for the callout to fire, or waiting for the callout to be able to pickup the mutex and transition the state back to WORK_ST_TASK. Since we did callout_stop successfully, though, we know that's not the case and we shouldn't have any competition here.

I'd be tempted to assert both here and below that the transition was successful.

kevans marked 3 inline comments as done.

Drop a redundant assignment and add some atomic_cmpxchg assertions

This revision now requires review to proceed.May 19 2026, 3:27 AM

FWIW: I cherry picked this into local src git branch for use with -CURRENT and drm-kmod 6.12-lts branch sometime in May. 1002:73bf (RDNA2 amdgpu) works stable (~1-2 week uptime, then I usually reboot for newer src) with this, I haven't seen a regression from this. I usually torture this machine with all-core CPU stress with poudriere while doing desktop tasks as well.

FWIW: I cherry picked this into local src git branch for use with -CURRENT and drm-kmod 6.12-lts branch sometime in May. 1002:73bf (RDNA2 amdgpu) works stable (~1-2 week uptime, then I usually reboot for newer src) with this, I haven't seen a regression from this. I usually torture this machine with all-core CPU stress with poudriere while doing desktop tasks as well.

To clarify: did you have some instability that lead you to try this, or did you mostly apply it to be able to provide feedback?

FWIW: I cherry picked this into local src git branch for use with -CURRENT and drm-kmod 6.12-lts branch sometime in May. 1002:73bf (RDNA2 amdgpu) works stable (~1-2 week uptime, then I usually reboot for newer src) with this, I haven't seen a regression from this. I usually torture this machine with all-core CPU stress with poudriere while doing desktop tasks as well.

To clarify: did you have some instability that lead you to try this, or did you mostly apply it to be able to provide feedback?

tl;dr: applied for feedback, I do not hesitate to search for various fixes and apply them before they land,
this is the only LinuxKPI fix that I have in my local branch.

Full: It's complicated. I've never had the chance to test the hardware on FreeBSD without this PR. Last time I dropped FreeBSD it was due to instability with a different older Radeon (RX 580), and inability to debug it on my own (no known good hash to bisect to, no ddb on vt, no UART, etc), that time this review also did not exist. Now I Theseus shipped a different PC with e.g. a newer GPU (1002:73bf), decided to try -CURRENT again, and decided to look for any "not yet landed" changes to see if some are worthy to cherry pick before landing. Your review is the only LinuxKPI change that I applied. It doesn't seem to regress anything for me, as I haven't seen a panic since May, yet.

Consider this as an anecdote.

Perfect, thank you! I had forgotten about this, but I've also switched to amdgpu since the last time I was active on this and I also haven't run it without this applied. I'll redo my previous dtrace'ing here to see if this patch actually has any impact on amdgpu's usage at all -- if it doesn't hit either of the two busy- probes in this patch, then the patch is a net-0 for amdgpu and you'd have the same experience without it.

sys/compat/linuxkpi/common/src/linux_work.c
617

Are you sure it's ok to busy-loop here? We could spend a lot of time doing that if the target thread was preempted.

Is it not possible to sleep?

sys/compat/linuxkpi/common/src/linux_work.c
617

This was just matching what Linux seems to do; from their description:

This function is safe to call from any context including IRQ handler.

So I'm not sure that it's generally safe to sleep here, but then again... it also would be equally unsafe to lock dwork->timer.mtx if went to that extreme?

sys/compat/linuxkpi/common/src/linux_work.c
617

In general we don't support running much of the kernel in hard interrupt context, even though Linux does. They let you do all sorts of things, including allocating memory, which we do not support.

And yes, as you note, it'd be unsafe to lock a regular mutex here if this code could run in interrupt context.

sys/compat/linuxkpi/common/src/linux_work.c
617

Yeah, fair. I'll simplify this to an msleep() loop. I can wakeup() on state updates, but I guess based on tracing that this is an uncommon enough scenario that I should record that there's a waiter in the work_struct struct struct to avoid a lot of spurious overhead.