Page MenuHomeFreeBSD

sound: Fix panic caused by sleeping-channel destruction during asynchronous detach
ClosedPublic

Authored by christos on Apr 24 2024, 12:21 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Dec 14, 3:31 PM
Unknown Object (File)
Wed, Dec 11, 8:04 AM
Unknown Object (File)
Dec 5 2024, 6:17 AM
Unknown Object (File)
Dec 3 2024, 9:14 PM
Unknown Object (File)
Nov 20 2024, 3:14 PM
Unknown Object (File)
Nov 20 2024, 3:14 PM
Unknown Object (File)
Nov 20 2024, 3:14 PM
Unknown Object (File)
Nov 19 2024, 6:29 AM
Subscribers

Details

Summary

Currently we are force-destroying all channels unconditionally in
pcm_killchan(). However, since asynchronous audio device detach is
possible as of 44e128fe9d92, if we do not check whether the channel is
sleeping or not and forcefully kill it, we will get a panic from
cv_timedwait_sig() (called from chn_sleep()), because it will try to use
a freed lock/cv.

Modify pcm_killchan() (renamed to pcm_killchans() since that's a more
appropriate name now) to loop through the channel list and destroy only
the channels that are awake, otherwise wake up the sleeping thread and
try again. This loop is repeated until all channels are awakened and
destroyed.

The wakeup code has been moved from pcm_unregister() to pcm_killchans(),
as it makes the code more robust and self-contained. pcm_unregister()
only issues a chn_abort() to all channels

Reported by: KASAN
Fixes: 44e128fe9d92 ("sound: Implement asynchronous device detach")
Sponsored by: The FreeBSD Foundation
MFC after: 1 day

Test Plan

To reproduce it without the patch I did:

for i in $(seq 1 15); do cat /dev/dsp >/dev/null & cat /dev/random >/dev/dsp & done

and hot-unplug immediately. The panic is always something along the lines of:

panic: ASan: Invalid access, 8-byte read at 0xfffffe0002172c38, UMAUseAfterFree(fd)
cpuid = 0
time = 1713834635
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0xa5/frame 0xfffffe0046dab210
kdb_backtrace() at kdb_backtrace+0xc6/frame 0xfffffe0046dab370
vpanic() at vpanic+0x210/frame 0xfffffe0046dab510
panic() at panic+0xb5/frame 0xfffffe0046dab5e0
kasan_code_name() at kasan_code_name/frame 0xfffffe0046dab6b0
__mtx_lock_flags() at __mtx_lock_flags+0xd3/frame 0xfffffe0046dab790
_cv_timedwait_sig_sbt() at _cv_timedwait_sig_sbt+0x381/frame 0xfffffe0046dab8e0
chn_write() at chn_write+0x1a0/frame 0xfffffe0046dab940
dsp_io_ops() at dsp_io_ops+0x275/frame 0xfffffe0046dab990
dsp_write() at dsp_write+0x22/frame 0xfffffe0046dab9b0
devfs_write_f() at devfs_write_f+0x242/frame 0xfffffe0046dabac0
dofilewrite() at dofilewrite+0xf6/frame 0xfffffe0046dabb30
kern_writev() at kern_writev+0xb2/frame 0xfffffe0046dabbf0
sys_write() at sys_write+0x1bc/frame 0xfffffe0046dabd10
amd64_syscall() at amd64_syscall+0x39e/frame 0xfffffe0046dabf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0046dabf30
--- syscall (4, FreeBSD ELF64, write), rip = 0x38eff7cf29ba, rsp = 0x38eff5aec1b8, rbp = 0x38eff5aec650 ---
KDB: enter: panic
[ thread pid 1032 tid 100155 ]
Stopped at      kdb_enter+0x34: movq    $0,0x1f09d01(%rip)
db>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 57320
Build 54208: arc lint + arc unit

Event Timeline

Convert to do-while to avoid endless loop.

Turns out sometimes there is still an infinite loop/hang occurring for some reason when I apply the test command, but wait for a few seconds before unplugging. The loop gets stuck because found is always false. Will try to debug and fix it ASAP.

It seems that this was not really an infinite loop, but rather that it was
taking forever for the channel to become awake, possibly due to us looping
continuously. Introducing a small pause() seems to fix it. So far tests are
successfull, but I would like to hear from more people first.

sys/dev/sound/pcm/sound.c
697

BTW, I think chn_sleep() should assert that CHN_F_SLEEPING is not already set.

702

If a thread is sleeping on the CV, shouldn't we at least wake it?

710

What is this calculating?

It's possible for hz to be 100, in which case timo will be 0.

sys/dev/sound/pcm/sound.c
702

pcm_unregister() wakes up the sleeping channels. Is there a possibility where a channel might go to sleep in between the sleep-check in pcm_unregister() and here? If yes, then I can add a wakeup here as well.

710

Honestly, I took this calculation verbatim from dsp_oss_syncstart() because I didn't know what else to choose.

christos added inline comments.
sys/dev/sound/pcm/sound.c
697
sys/dev/sound/pcm/sound.c
696

The comment can be a bit more concise: we have to give a thread sleeping in chn_sleep() a chance to observe that the channel is dead.

702

So, at this point CHN_F_DEAD is not yet set, right? So pcm_unregister() will wake up a sleeping thread, and cv_timedwait_sig() will return 0, so chn_sleep() will return 0. Its callers call it in a loop. What prevents them from re-entering chn_sleep()?

It looks to me like you want to set CHN_F_DEAD earlier, so that the thread awoken in chn_sleep() will observe it and return an error.

710

I think pause(1) is fine here. I guess this code is trying to sleep for 5ms, the "modern" way to do that is pause_sbt(SBT_1MS * 5);, but here the time is arbitrary.

christos added inline comments.
sys/dev/sound/pcm/sound.c
702

pcm_unregister() also sets CHN_F_DEAD after calling CHN_BROADCAST:

		if (ch->flags & CHN_F_SLEEPING) {
			/*
			 * We are detaching, so do not wait for the timeout in
			 * chn_read()/chn_write(). Wake up the thread and kill
			 * the channel immediately.
			 */
			CHN_BROADCAST(&ch->intr_cv);
			ch->flags |= CHN_F_DEAD;
		}
sys/dev/sound/pcm/sound.c
702

pcm_unregister() also sets CHN_F_DEAD after calling CHN_BROADCAST

CHN_F_DEAD is only set conditionally there, and it's hard to follow whether one of the other functions called (e.g. chn_abort() sets any condition that would prevent the channel from going to sleep in between. I'd vote to set the CHN_F_DEAD here in pcm_killchans() explicitly, for both clarity and robustness. It won't hurt, and it makes the function more self-contained.

sys/dev/sound/pcm/sound.c
702

I do not object to this. Do you think it's better to modify the pcm_unregister() code to CHN_BROADCAST and set CHN_F_DEAD unconditionally for all channels, or simply move the block here, and only keep chn_abort() in pcm_unregister()?

christos marked 3 inline comments as done.

Address Mark's and Florian's comments:

  • Move wakeup code from pcm_unregister() to pcm_killchans().
  • Use pause_sbt() instead of pause().
sys/dev/sound/pcm/sound.c
702

Okay, I answered this by myself, but this way it also works and I agree that it is better design. Thanks.

Actually, my guess is that it would have had some merit to set CHN_F_DEAD unconditionally, early on in pcm_unregister(), beneath waking up sleeping threads. That would mean some code duplication, yes, but less interference from userspace and sleep waiting afterwards. But I don't have a case to back that up.

Maybe there should be a chn_shutdown() function to set CHN_F_DEAD and wake sleeping threads, if we need to do that multiple times.

sys/dev/sound/pcm/sound.c
702

I don't think it's correct now, as CHN_F_DEAD is still only set conditionally here, another thread could call chn_sleep() between break; and pcm_chn_remove().

Actually, my guess is that it would have had some merit to set CHN_F_DEAD unconditionally, early on in pcm_unregister(), beneath waking up sleeping threads. That would mean some code duplication, yes, but less interference from userspace and sleep waiting afterwards. But I don't have a case to back that up.

Maybe there should be a chn_shutdown() function to set CHN_F_DEAD and wake sleeping threads, if we need to do that multiple times.

Having the wakeup+CHN_F_DEAD code in both pcm_unregister() and pcm_killchans() was my initial approach, and as is expected, worked fine, but the current code also seems to work properly, at least so far. That being said, I do not object to implementing chn_shutdown() and using it in both pcm_unregister() and pcm_killchans(). Alternatively we can just set CHN_F_DEAD in the non-sleeping case as well, and avoid creating a new function, as mentioned in the last inline comment.

sys/dev/sound/pcm/sound.c
702

I will try setting CHN_F_DEAD unconditionally, but so far I tested this code and it works as expected.

@dev_submerge.ch So, both setting CHN_F_DEAD in the non-sleeping case, and implementing chn_shutdown() and calling it from pcm_unregister() and pcm_killchans() (see below) work fine. Do you think it's better to just with the latter?

Diff'd against current patch:

diff --git a/sys/dev/sound/pcm/channel.c b/sys/dev/sound/pcm/channel.c
index b4872fdb8037..cf9239839aca 100644
--- a/sys/dev/sound/pcm/channel.c
+++ b/sys/dev/sound/pcm/channel.c
@@ -1301,6 +1301,15 @@ chn_kill(struct pcm_channel *c)
 	return (0);
 }
 
+void
+chn_shutdown(struct pcm_channel *c)
+{
+	CHN_LOCKASSERT(c);
+
+	chn_wakeup(c);
+	c->flags |= CHN_F_DEAD;
+}
+
 int
 chn_setvolume_multi(struct pcm_channel *c, int vc, int left, int right,
     int center)
diff --git a/sys/dev/sound/pcm/channel.h b/sys/dev/sound/pcm/channel.h
index 21007454584e..c8d33c583188 100644
--- a/sys/dev/sound/pcm/channel.h
+++ b/sys/dev/sound/pcm/channel.h
@@ -264,6 +264,7 @@ int chn_poll(struct pcm_channel *c, int ev, struct thread *td);
 
 int chn_init(struct pcm_channel *c, void *devinfo, int dir, int direction);
 int chn_kill(struct pcm_channel *c);
+void chn_shutdown(struct pcm_channel *c);
 int chn_reset(struct pcm_channel *c, u_int32_t fmt, u_int32_t spd);
 int chn_setvolume_multi(struct pcm_channel *c, int vc, int left, int right,
     int center);
diff --git a/sys/dev/sound/pcm/sound.c b/sys/dev/sound/pcm/sound.c
index f0128b6b735d..0a88d732def7 100644
--- a/sys/dev/sound/pcm/sound.c
+++ b/sys/dev/sound/pcm/sound.c
@@ -686,6 +686,11 @@ pcm_killchans(struct snddev_info *d)
 		found = false;
 		CHN_FOREACH(ch, d, channels.pcm) {
 			CHN_LOCK(ch);
+			/*
+			 * Make sure no channel has went to sleep in the
+			 * meantime.
+			 */
+			chn_shutdown(ch);
 			/*
 			 * We have to give a thread sleeping in chn_sleep() a
 			 * chance to observe that the channel is dead.
@@ -695,13 +700,6 @@ pcm_killchans(struct snddev_info *d)
 				CHN_UNLOCK(ch);
 				break;
 			}
-			/*
-			 * Do not wait for the timeout in
-			 * chn_read()/chn_write(). Wake up the sleeping thread
-			 * and kill the channel.
-			 */
-			CHN_BROADCAST(&ch->intr_cv);
-			ch->flags |= CHN_F_DEAD;
 			CHN_UNLOCK(ch);
 		}
 
@@ -1029,6 +1027,11 @@ pcm_unregister(device_t dev)
 
 	CHN_FOREACH(ch, d, channels.pcm) {
 		CHN_LOCK(ch);
+		/*
+		 * Do not wait for the timeout in chn_read()/chn_write(). Wake
+		 * up the sleeping thread and kill the channel.
+		 */
+		chn_shutdown(ch);
 		chn_abort(ch);
 		CHN_UNLOCK(ch);
 	}

@dev_submerge.ch So, both setting CHN_F_DEAD in the non-sleeping case, and implementing chn_shutdown() and calling it from pcm_unregister() and pcm_killchans() (see below) work fine. Do you think it's better to just with the latter?

Diff looks good to me. Setting CHN_F_DEAD unconditionally in pcm_killchans() is a must I think, doing the same in pcm_unregister() is just me being cautious. I'd choose that, yes, but maybe @markj or @emaste have a more experienced opinion?

markj added inline comments.
sys/dev/sound/pcm/sound.c
715

Can we really have error != 0 here? If not I'd suggest making this an assertion.

This revision is now accepted and ready to land.Apr 28 2024, 4:06 PM
christos added inline comments.
sys/dev/sound/pcm/sound.c
715

Theoretically we can if we pass a channel that is not in the list. Realistically it won't happen.