Page MenuHomeFreeBSD

sound: Fix KASSERT panics in chn_read() and chn_write()
ClosedPublic

Authored by christos on Mon, Nov 10, 3:17 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 14, 6:14 AM
Unknown Object (File)
Fri, Nov 14, 5:38 AM
Unknown Object (File)
Thu, Nov 13, 12:49 AM
Unknown Object (File)
Thu, Nov 13, 12:48 AM
Unknown Object (File)
Thu, Nov 13, 12:44 AM
Unknown Object (File)
Thu, Nov 13, 12:42 AM
Unknown Object (File)
Thu, Nov 13, 12:30 AM
Unknown Object (File)
Mon, Nov 10, 5:11 PM
Subscribers
None

Details

Summary

INVARIANTS kernels may trigger a KASSERT panic from sndbuf_acquire(),
when fuzzing write(2) using stress2, because of a race in chn_write().

In the case of chn_write(), what sndbuf_acquire() does is extend the
ready-to-read area of the buffer by a specified amount of bytes. The
KASSERT in question makes sure the number of bytes we want to extend the
ready area by, is less than or equal to the number of free bytes in the
buffer. This makes sense, because we cannot extend the ready area to
something larger than what is available (i.e., free) in the first place.

What chn_write() currently does for every write is; calculate the
appropriate write size, let's say X, unlock the channel, uiomove() X
bytes to the channel's buffer, lock the channel, and call
sndbuf_acquire() to extend the ready area by X bytes. The problem with
this approach, however, is the following.

Suppose an empty channel buffer with a length of 1024 bytes, and 2
threads, (A) and (B), where (B) is a higher-priority one. Suppose thread
(A) wants to write 1024 bytes. It unlocks the channel and uiomove()s
1024 bytes to the channel buffer. At the same time, thread (B) picks up
the lock, and because it is higher priority, it keeps dominating the
lock for a few iterations. By the time thread (A) picks up the lock
again, it tries to call sndbuf_acquire() with an size of 1024 bytes,
which was calculated before it performed the uiomove(). In this case,
there is a very high chance that the buffer will not be empty, that is,
have a free area of 1024 bytes, as was the case when thread (A) started
executing, and so the KASSERT will trigger a panic because the condition
(bytes <= free) is not met.

Another scenario that can trigger a panic is the following: suppose a
buffer with a size of 4 bytes, and two threads: (A) and (B). In the
first iteration, thread (A) wants to write 2 bytes, while the buffer is
empty, BUT the pointer (sndbuf_getfreeptr()) is at the end (i.e.,
buf[3]). In the first iteration of the loop, because of the way we
calculate t, we'll end up writing only 1 byte, so after sz -= t, sz will
be 1, and so we'll need one more iteration in the inner loop, to write
the remaining 1 byte. Now we're at the end of the first loop, thread (A)
unlocks the channel, it has written 1 byte, it needs to write 1 more,
and the buffer is left with 3 empty slots. Now thread (B) picks up the
lock, and it wants to write 3 (or more) bytes. Eventually it writes the
3 bytes, and it leaves the buffer with 0 free slots. By the time thread
(A) picks up the lock again, and continues with the second iteration of
the inner loop, it will try to write the last byte, but sndbuf_acquire()
will panic because there is no free space anymore.

To fix this, get rid of the inner loop and calculate the write size on
each iteration. That means that in the scenarios explained above, we'll
end up entering the chn_sleep() case. Modify it as well, so that we do
not kill the channel if we need to sleep more.

Do the same for chn_read() to avoid possible similar panics from
sndbuf_dispose().

Reported by: pho
Tested by: christos, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Test Plan

Download and run: https://people.freebsd.org/~pho/s4.sh
Alternatively:

# kldload snd_dummy
$ cd tools/test/stress2/misc/
# ./all.sh -m 30 write2.sh

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 68508
Build 65391: arc lint + arc unit

Event Timeline

christos created this revision.
christos retitled this revision from sound: Fix KASSERT assertion panics to sound: Fix KASSERT panics in chn_read() and chn_write().Mon, Nov 10, 3:18 PM
christos edited the test plan for this revision. (Show Details)

I'm still running the test script for good measure, but haven't had a panic after more than 20 iterations of the test so far, whereas before the patch it would crash immediately.

@pho just informed me that there is an improvement, but we still panic in some cases. I'm looking into it... Below is his message:

I see a slight improvement in uptime, but ...

login: Nov 10 16:22:01 mercat1 su[3683]: pho to root on /dev/pts/0
pcm0: <Dummy Audio Device>
20251110 16:27:33 all (1/1): write2.sh
20251110 16:29:41 all (1/1): write2.sh
panic: sndbuf_acquire: count 208 > free 0
cpuid = 5
time = 1762788631
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0100404a30
vpanic() at vpanic+0x136/frame 0xfffffe0100404b60
panic() at panic+0x43/frame 0xfffffe0100404bc0
sndbuf_acquire() at sndbuf_acquire+0x1b3/frame 0xfffffe0100404c00
chn_write() at chn_write+0xeb/frame 0xfffffe0100404c60
dsp_io_ops() at dsp_io_ops+0x3a5/frame 0xfffffe0100404cc0
dsp_write() at dsp_write+0x22/frame 0xfffffe0100404ce0
devfs_write_f() at devfs_write_f+0xf3/frame 0xfffffe0100404d40
dofilewrite() at dofilewrite+0x81/frame 0xfffffe0100404d90
sys_write() at sys_write+0x127/frame 0xfffffe0100404e00
amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe0100404f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0100404f30

  • syscall (0, FreeBSD ELF64, syscall), rip = 0x823bbdb9a, rsp = 0x826d2ef48, rbp = 0x826d2efc0 ---

https://people.freebsd.org/~pho/stress/log/log0622.txt

I suspect that the added VM pressure might play a role here?

If so, you may try to add a few "sort /dev/zero &" to the load when
running the s4.sh script or you can run the test scenario from this
panic by:

cd /usr/src/tools/test/stress2/misc
./all.sh -m 30 write2.sh # for a 30 minutest test

sys/dev/sound/pcm/channel.c
454

Is this comment really true? What happens if two threads are writing to the channel concurrently?

sys/dev/sound/pcm/channel.c
454

Yeah, I was thinking about this comment as well, and I suppose this assumption might also, incidentally, be the reason for the race here, if it's true.

After some pen and paper debugging, it seems that my current approach does solve most cases, but there is still one case that can reproduce the panic, albeit quite rare. You need to run stress2 usually for about ~20 minutes with enough workload to hit the race. Suppose the following scenario: the buffer has a size of 4 bytes, and we have two threads: (A) and (B).
In the first iteration, thread (A) wants to write 2 bytes, while the buffer is empty, BUT the pointer (sndbuf_getfreeptr()) is at the end (i.e., buf[3]). In the first iteration of the loop, because of the way we calculate t, we'll end up writing only 1 byte, so after sz -= t, sz will be 1, and so we'll need one more iteration in the inner loop, to write the remaining 1 byte. Now we're at the end of the first loop, thread (A) unlocks the channel, it has written 1 byte, it needs to write 1 more, and the buffer is left with 3 empty slots. Now thread (B) picks up the lock, and it wants to write 3 (or more) bytes. Eventually it writes the 3 bytes, and it leaves the buffer with 0 free slots. By the time thread (A) picks up the lock again, and continues with the second iteration of the inner loop, it will try to write the last byte, but sndbuf_acquire() will panic because there is no free space anymore. I'm pretty sure this is what is going on.

A possible solution I can see is this: refactor chn_write() (and I suppose chn_read() too - I haven't tested it yet) to get rid of the inner loop and instead calculate the write size on every iteration. In the scenario I explained above, I suppose we would end up in the last case where we call chn_sleep(). I guess we will also need to stop killing the channel (by setting CHN_F_DEAD) if chn_sleep() returns EAGAIN, in case it needs to sleep more.

@markj @pho what do you think?

christos edited the test plan for this revision. (Show Details)
christos added reviewers: kib, jhb.

Do what I explained in the last message. stress2 has finished after 30 minutes
without panicking. I've yet to test chn_read() though.

@pho, can you also test please?

Do what I explained in the last message. stress2 has finished after 30 minutes
without panicking. I've yet to test chn_read() though.

@pho, can you also test please?

I ran the write(2) tests for 30 minutes followed by some corresponding read(2) tests.
LGTM

This revision is now accepted and ready to land.Wed, Nov 12, 5:27 PM
In D53666#1226459, @pho wrote:

Do what I explained in the last message. stress2 has finished after 30 minutes
without panicking. I've yet to test chn_read() though.

@pho, can you also test please?

I ran the write(2) tests for 30 minutes followed by some corresponding read(2) tests.
LGTM

Great! :-)

I have no knowledge of the dev/sound code. After the channel unlock, what guarantees the validity of the buffer that uiomove() operates on?

In D53666#1226600, @kib wrote:

I have no knowledge of the dev/sound code. After the channel unlock, what guarantees the validity of the buffer that uiomove() operates on?

What do you mean by validity?

In D53666#1226600, @kib wrote:

I have no knowledge of the dev/sound code. After the channel unlock, what guarantees the validity of the buffer that uiomove() operates on?

What do you mean by validity?

I mean that uiomove() operates on kernel memory that is supposed to be in correct state for it.

As I said, I do not know the dev/sound code. But your note about parallel thread manipulating the channel after unlock (which is expectedly allowed) raises the question what manipulations can be done while uiomove() needs to access the channel buffer in parallel.

In D53666#1226612, @kib wrote:
In D53666#1226600, @kib wrote:

I have no knowledge of the dev/sound code. After the channel unlock, what guarantees the validity of the buffer that uiomove() operates on?

What do you mean by validity?

I mean that uiomove() operates on kernel memory that is supposed to be in correct state for it.

As I said, I do not know the dev/sound code. But your note about parallel thread manipulating the channel after unlock (which is expectedly allowed) raises the question what manipulations can be done while uiomove() needs to access the channel buffer in parallel.

It doesn't really matter, whatever we write here will get processed further down the line by sound(4), and eventually go to the sound card's output. We just need to ensure that writes (and reads) happen at the correct buffer locations. Also, the parallel writing scenario I explain is really only a consideration in testing environments, as was the case now with stress2. In real world use-cases, usually a channel is operated on by a single thread.

In D53666#1226612, @kib wrote:
In D53666#1226600, @kib wrote:

I have no knowledge of the dev/sound code. After the channel unlock, what guarantees the validity of the buffer that uiomove() operates on?

What do you mean by validity?

I mean that uiomove() operates on kernel memory that is supposed to be in correct state for it.

As I said, I do not know the dev/sound code. But your note about parallel thread manipulating the channel after unlock (which is expectedly allowed) raises the question what manipulations can be done while uiomove() needs to access the channel buffer in parallel.

It doesn't really matter, whatever we write here will get processed further down the line by sound(4), and eventually go to the sound card's output. We just need to ensure that writes (and reads) happen at the correct buffer locations. Also, the parallel writing scenario I explain is really only a consideration in testing environments, as was the case now with stress2. In real world use-cases, usually a channel is operated on by a single thread.

It doesn't matter what happens usually, since we discussing the correctness of the kernel code. For instance, could this result in reading of uninitialized kernel memory (exposing its previous content)?

In D53666#1226622, @kib wrote:
In D53666#1226612, @kib wrote:
In D53666#1226600, @kib wrote:

I have no knowledge of the dev/sound code. After the channel unlock, what guarantees the validity of the buffer that uiomove() operates on?

What do you mean by validity?

I mean that uiomove() operates on kernel memory that is supposed to be in correct state for it.

As I said, I do not know the dev/sound code. But your note about parallel thread manipulating the channel after unlock (which is expectedly allowed) raises the question what manipulations can be done while uiomove() needs to access the channel buffer in parallel.

It doesn't really matter, whatever we write here will get processed further down the line by sound(4), and eventually go to the sound card's output. We just need to ensure that writes (and reads) happen at the correct buffer locations. Also, the parallel writing scenario I explain is really only a consideration in testing environments, as was the case now with stress2. In real world use-cases, usually a channel is operated on by a single thread.

It doesn't matter what happens usually, since we discussing the correctness of the kernel code. For instance, could this result in reading of uninitialized kernel memory (exposing its previous content)?

The buffer is allocated with with M_ZERO and a specific size by the device drivers when a channel is created.

In D53666#1226622, @kib wrote:
In D53666#1226612, @kib wrote:
In D53666#1226600, @kib wrote:

I have no knowledge of the dev/sound code. After the channel unlock, what guarantees the validity of the buffer that uiomove() operates on?

What do you mean by validity?

I mean that uiomove() operates on kernel memory that is supposed to be in correct state for it.

As I said, I do not know the dev/sound code. But your note about parallel thread manipulating the channel after unlock (which is expectedly allowed) raises the question what manipulations can be done while uiomove() needs to access the channel buffer in parallel.

It doesn't really matter, whatever we write here will get processed further down the line by sound(4), and eventually go to the sound card's output. We just need to ensure that writes (and reads) happen at the correct buffer locations. Also, the parallel writing scenario I explain is really only a consideration in testing environments, as was the case now with stress2. In real world use-cases, usually a channel is operated on by a single thread.

It doesn't matter what happens usually, since we discussing the correctness of the kernel code. For instance, could this result in reading of uninitialized kernel memory (exposing its previous content)?

The buffer is allocated with with M_ZERO and a specific size by the device drivers when a channel is created.

Ok. Can channel deallocated while unlocked?

In D53666#1226813, @kib wrote:

The buffer is allocated with with M_ZERO and a specific size by the device drivers when a channel is created.

Ok. Can channel deallocated while unlocked?

There are two cases:

  1. Hot-unload/detach: pcm_killchans(), which is responsible for killing all channels, will first wait until all I/O is stopped. See ch->inprog here in dsp_io_ops(), as well as how pcm_killchans() uses it.
  2. Calling close(2) on the device: I'm not sure about this one. Is it possible that close(2) and read(2)/write(2) can run concurrently?
  1. Calling close(2) on the device: I'm not sure about this one. Is it possible that close(2) and read(2)/write(2) can run concurrently?

Yes, close(2) and io syscalls can be run in parallel. But the file is only closed (as in d_close for devfs, or cdevpriv destroy) when the file ref count goes to zero. Since both read(2) and close(2) take the reference, effectively the file deallocation occurs by the syscall that fdrop() it the last.

In D53666#1226867, @kib wrote:
  1. Calling close(2) on the device: I'm not sure about this one. Is it possible that close(2) and read(2)/write(2) can run concurrently?

Yes, close(2) and io syscalls can be run in parallel. But the file is only closed (as in d_close for devfs, or cdevpriv destroy) when the file ref count goes to zero. Since both read(2) and close(2) take the reference, effectively the file deallocation occurs by the syscall that fdrop() it the last.

Right, so dsp_close() can de-allocate the channel (and as a result, the buffer) if it's a virtual channel (see vchan_destroy()). Would this be a problem here? If yes, I suppose we could an an inprog barrier in dsp_close() as well?

In D53666#1226867, @kib wrote:
  1. Calling close(2) on the device: I'm not sure about this one. Is it possible that close(2) and read(2)/write(2) can run concurrently?

Yes, close(2) and io syscalls can be run in parallel. But the file is only closed (as in d_close for devfs, or cdevpriv destroy) when the file ref count goes to zero. Since both read(2) and close(2) take the reference, effectively the file deallocation occurs by the syscall that fdrop() it the last.

Right, so dsp_close() can de-allocate the channel (and as a result, the buffer) if it's a virtual channel (see vchan_destroy()). Would this be a problem here? If yes, I suppose we could an an inprog barrier in dsp_close() as well?

I cannot answer your question (I would not read much of sound code now) beyond the information above.

In D53666#1227775, @kib wrote:
In D53666#1226867, @kib wrote:
  1. Calling close(2) on the device: I'm not sure about this one. Is it possible that close(2) and read(2)/write(2) can run concurrently?

Yes, close(2) and io syscalls can be run in parallel. But the file is only closed (as in d_close for devfs, or cdevpriv destroy) when the file ref count goes to zero. Since both read(2) and close(2) take the reference, effectively the file deallocation occurs by the syscall that fdrop() it the last.

Right, so dsp_close() can de-allocate the channel (and as a result, the buffer) if it's a virtual channel (see vchan_destroy()). Would this be a problem here? If yes, I suppose we could an an inprog barrier in dsp_close() as well?

I cannot answer your question (I would not read much of sound code now) beyond the information above.

I've been testing for a while and the current implementation seems fine for scenarios of syscalls running in parallel with dsp_close(). However, I did discover another race in dsp_close(), but it's actually unrelated. I've identified the problem and working on a fix. That being said, this patch should be good to go.