Page MenuHomeFreeBSD

sound: Fix KASSERT panics in chn_read() and chn_write()
Needs ReviewPublic

Authored by christos on Mon, Nov 10, 3:17 PM.
Tags
None
Referenced Files
F135546181: D53666.id166132.diff
Mon, Nov 10, 5:11 PM
F135545561: D53666.id.diff
Mon, Nov 10, 5:04 PM
F135545473: D53666.diff
Mon, Nov 10, 5:03 PM
F135544726: D53666.id166132.diff
Mon, Nov 10, 4:54 PM
F135544435: D53666.id.diff
Mon, Nov 10, 4:51 PM
F135542898: D53666.diff
Mon, Nov 10, 4:34 PM
Subscribers
None

Details

Summary

INVARIANTS kernels may trigger a KASSERT panic from sndbuf_acquire(), when
fuzzing write(2) using stress2, because of a race in chn_write().

In the case of chn_write(), what sndbuf_acquire() does is extend the
ready-to-read area of the buffer by a specified amount of bytes. The KASSERT in
question makes sure the number of bytes we want to extend the ready area by, is
less than or equal to the number of free bytes in the buffer. This makes sense,
because we cannot extend the ready area to something larger than what is
available (i.e., free) in the first place.

What chn_write() currently does for every write is; calculate the appropriate
write size, let's say X, unlock the channel, uiomove() X bytes to the channel's
buffer, lock the channel, and call sndbuf_acquire() to extend the ready area by
X bytes. The problem with this approach, however, is the following. Suppose an
empty channel buffer with a length of 1024 bytes, and 2 threads, (A) and (B),
where (B) is a higher-priority one. Suppose thread (A) wants to write 1024
bytes. It unlocks the channel and uiomove()s 1024 bytes to the channel buffer.
At the same time, thread (B) picks up the lock, and because it is higher
priority, it keeps dominating the lock for a few iterations. By the time thread
(A) picks up the lock again, it tries to call sndbuf_acquire() with an
size of 1024 bytes, which was calculated before it performed the uiomove(). In
this case, there is a very high chance that the buffer will not be empty, that
is, have a free area of 1024 bytes, as was the case when thread (A) started
executing, and so the KASSERT will trigger a panic because the condition (bytes
<= free) is not met.

To fix this, call sndbuf_acquire() and update the remaining total write size
before unlocking the channel. Do the same for chn_read() to avoid similar
panics from sndbuf_dispose().

Reported by: pho
Tested by: christos, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Test Plan

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 68508
Build 65391: arc lint + arc unit

Event Timeline

christos created this revision.
christos retitled this revision from sound: Fix KASSERT assertion panics to sound: Fix KASSERT panics in chn_read() and chn_write().Mon, Nov 10, 3:18 PM
christos edited the test plan for this revision. (Show Details)

I'm still running the test script for good measure, but haven't had a panic after more than 20 iterations of the test so far, whereas before the patch it would crash immediately.

@pho just informed me that there is an improvement, but we still panic in some cases. I'm looking into it... Below is his message:

I see a slight improvement in uptime, but ...

login: Nov 10 16:22:01 mercat1 su[3683]: pho to root on /dev/pts/0
pcm0: <Dummy Audio Device>
20251110 16:27:33 all (1/1): write2.sh
20251110 16:29:41 all (1/1): write2.sh
panic: sndbuf_acquire: count 208 > free 0
cpuid = 5
time = 1762788631
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0100404a30
vpanic() at vpanic+0x136/frame 0xfffffe0100404b60
panic() at panic+0x43/frame 0xfffffe0100404bc0
sndbuf_acquire() at sndbuf_acquire+0x1b3/frame 0xfffffe0100404c00
chn_write() at chn_write+0xeb/frame 0xfffffe0100404c60
dsp_io_ops() at dsp_io_ops+0x3a5/frame 0xfffffe0100404cc0
dsp_write() at dsp_write+0x22/frame 0xfffffe0100404ce0
devfs_write_f() at devfs_write_f+0xf3/frame 0xfffffe0100404d40
dofilewrite() at dofilewrite+0x81/frame 0xfffffe0100404d90
sys_write() at sys_write+0x127/frame 0xfffffe0100404e00
amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe0100404f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0100404f30

  • syscall (0, FreeBSD ELF64, syscall), rip = 0x823bbdb9a, rsp = 0x826d2ef48, rbp = 0x826d2efc0 ---

https://people.freebsd.org/~pho/stress/log/log0622.txt

I suspect that the added VM pressure might play a role here?

If so, you may try to add a few "sort /dev/zero &" to the load when
running the s4.sh script or you can run the test scenario from this
panic by:

cd /usr/src/tools/test/stress2/misc
./all.sh -m 30 write2.sh # for a 30 minutest test