Page MenuHomeFreeBSD

linuxkpi: Mitigate a seqlock livelock
ClosedPublic

Authored by markj on Apr 21 2022, 3:55 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Mar 28, 9:44 AM
Unknown Object (File)
Mar 7 2024, 4:32 PM
Unknown Object (File)
Feb 20 2024, 9:27 AM
Unknown Object (File)
Dec 23 2023, 2:37 AM
Unknown Object (File)
Dec 12 2023, 10:20 AM
Unknown Object (File)
Nov 10 2023, 1:42 PM
Unknown Object (File)
Nov 7 2023, 3:04 AM
Unknown Object (File)
Nov 4 2023, 8:29 PM

Details

Summary

I'm open to feedback and suggestions for a different solution; apologies
for the long description.

I've seen a hang on a couple of systems using i915kms, apparently caused
by the following livelock:


1. Thread T1 holds a LinuxKPI spin mutex, which means that it called
local_bh_disable(), which means that it's pinned to the CPU, i.e.,
the scheduler can't migrate it to a different CPU.
2. T1 enters the seqlock-protected section in intel_engin_context_in()
or intel_engine_context_out().
3. T1 is interrupted by a timer interrupt, which schedules a callout
thread T2 on the same CPU. T2 executes rps_timer(), which calls
intel_engine_get_busy_time(), which spins on the seqlock owned by T1.
4. T1 never runs since it can't be migrated to another CPU, has a
lower scheduling priority than T2, and there is no priority
propagation mechanism which allows T2 to avoid starving T1.

On Linux the problem doesn't happen since write_seqlock_irqsave()
disables interrupts. In general LinuxKPI does not implement the same
irq/irqsave semantics: FreeBSD executes very little code in hard
interrupt context, so LinuxKPI drivers tend to follow the FreeBSD model
and execute interrupt handlers in threaded contexts. That is, most KPIs
do not disable interrupts even where Linux does.

So what can we do? local_bh_disable() doesn't actually disable tasklets
and softirq handlers, so its implementation seems wrong and maybe we can
just make it a no-op. Then T1 could be scheduled on another CPU. But
on Linux, local_bh_disable() disables preemption, so consumers might
assume that it provides consistent access to per-CPU data structures,
and we may well be on a system with a single CPU, especially now that
some folks are working on GVT-d.

The solution here is to enter a critical section when
write_seqlock_irqsave() is called. Then the write seqlock holder will
never be preempted. This won't work if something ever tries to acquire
a spin lock in a write_seqlock section, but no existing consumers do.
An alternative may be to have the reader yield the CPU if it fails too
many times, but Linux doesn't (need to) do this, nor does our native
seqlock implementation, so it feels too hacky.

Test Plan

Running this patch locally. No hangs yet, but I've only seen it twice.

I'm not sure why it started appearing, I think the bug has been there
for a while. Maybe I'm lucky.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 45299
Build 42187: arc lint + arc unit

Event Timeline

markj requested review of this revision.Apr 21 2022, 3:55 PM
markj added reviewers: linuxkpi, wulf, hselasky.

Sounds good to me. Then only fast IRQ's can interrupt, but those are not used by the LKPI at all, so this should be fine.

sys/compat/linuxkpi/common/include/linux/seqlock.h
104

Use lkpi_ as prefix for internal functions.

This revision is now accepted and ready to land.Apr 21 2022, 4:03 PM
markj marked an inline comment as done.
  • Use the lkpi_ prefix.
  • Add a comment hinting at why we disable preemption.
This revision now requires review to proceed.Apr 21 2022, 4:16 PM
This revision is now accepted and ready to land.Apr 21 2022, 4:31 PM

I've applied this to my local tree and so far it appears to address the livelock issue I was observed somewhat regularly under load (video/keyboard unresponsive, although mouse cursor still moved).

@emaste: I've seen something similar with amdgpu . Testing on 13-stable right now :-)

@emaste: I've seen something similar with amdgpu . Testing on 13-stable right now :-)

amdgpu doesn't use seqlocks at all, it seems. There is one use of write_seqlock() in the core DRM code, which might be affected, but the patch won't help since it only modifies write_seqlock_irqsave(). We could possibly provide the same behaviour for both write_seqlock() and write_seqlock_irqsave(), if you can confirm that it's the same bug. It'd be safe for at least the drm-kmod master branch.

When I saw the livelock, my display froze but I was able to ssh in from another machine and dump kernel stacks with procstat -kk, which is how I found the problem. So if it happens again for you, procstat -kka output would help confirm or deny whether it's the same problem.

@emaste: I've seen something similar with amdgpu . Testing on 13-stable right now :-)

amdgpu doesn't use seqlocks at all, it seems. There is one use of write_seqlock() in the core DRM code, which might be affected, but the patch won't help since it only modifies write_seqlock_irqsave(). We could possibly provide the same behaviour for both write_seqlock() and write_seqlock_irqsave(), if you can confirm that it's the same bug. It'd be safe for at least the drm-kmod master branch.

... to be clear, this patch will only help if you're using i915.

When I saw the livelock, my display froze but I was able to ssh in from another machine and dump kernel stacks with procstat -kk, which is how I found the problem. So if it happens again for you, procstat -kka output would help confirm or deny whether it's the same problem.

I have seen frozen screen several times with procstat -kk reporting about a thread spinning in rps_timer(). Will test the patch. Thanks!

In D35021#793662, @wulf wrote:

I have seen frozen screen several times with procstat -kk reporting about a thread spinning in rps_timer(). Will test the patch. Thanks!

Cool, very likely the same problem. I'll wait for you to test before committing.

I'm reasonably confident that this patch has addressed the issue I observed.

In D35021#793662, @wulf wrote:

I have seen frozen screen several times with procstat -kk reporting about a thread spinning in rps_timer(). Will test the patch. Thanks!

Cool, very likely the same problem. I'll wait for you to test before committing.

So far, so good.

In D35021#794295, @wulf wrote:
In D35021#793662, @wulf wrote:

I have seen frozen screen several times with procstat -kk reporting about a thread spinning in rps_timer(). Will test the patch. Thanks!

Cool, very likely the same problem. I'll wait for you to test before committing.

So far, so good.

Thanks.

I realized that this patch won't help packaged builds of drm-kmod on 13 since the modified functions are inline and packages for 13 are built against 13.0 sources. I'm not sure what we can do there. Some of the ports are using the 5.4 branch, which still contains seqlock.h, so probably we should apply a similar patch there. I can write and submit one if it makes sense.

This revision was automatically updated to reflect the committed changes.

Thank you for catching the bug!

I realized that this patch won't help packaged builds of drm-kmod on 13 since the modified functions are inline and packages for 13 are built against 13.0 sources. I'm not sure what we can do there. Some of the ports are using the 5.4 branch, which still contains seqlock.h, so probably we should apply a similar patch there. I can write and submit one if it makes sense.

rps_timer() first appeared in drm-kmod 5.8 and this issue looks as having the same age too. So probably, older drm-kmods does not require any fixes.