Page MenuHomeFreeBSD

sound: Fix chn_trigger() and vchan_trigger() races
ClosedPublic

Authored by christos on Nov 5 2024, 9:27 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 23, 10:00 PM
Unknown Object (File)
Thu, Jan 16, 9:23 AM
Unknown Object (File)
Thu, Jan 16, 8:21 AM
Unknown Object (File)
Dec 8 2024, 11:54 PM
Unknown Object (File)
Dec 7 2024, 7:45 AM
Unknown Object (File)
Dec 7 2024, 6:54 AM
Unknown Object (File)
Dec 5 2024, 4:32 PM
Unknown Object (File)
Dec 4 2024, 10:10 PM
Subscribers

Details

Summary

Consider the following scenario:

  1. CHN currently has its trigger set to PCMTRIG_STOP.
  2. Thread A locks CHN, calls CHANNEL_TRIGGER(PCMTRIG_START), sets the trigger to PCMTRIG_START and unlocks.
  3. Thread B picks up the lock, calls CHANNEL_TRIGGER(PCMTRIG_ABORT) and returns a non-zero value, so it returns from chn_trigger() as well.
  4. Thread A picks up the lock and adds CHN to the list, which is _wrong_, because the last call to CHANNEL_TRIGGER() was with PCMTRIG_ABORT, meaning the channel is stopped, yet we are adding it to the list and marking it as started.

Another problematic scenario:

  1. Thread A locks CHN, sets the trigger to PCMTRIG_ABORT, and unlocks CHN. It then locks PCM and _removes_ CHN from the list.
  2. In the meantime, since thread A unlocked CHN, thread B has locked it, set the trigger to PCMTRIG_START, unlocked it, and is now blocking on PCM held by thread A.
  3. At the same time, thread C locks CHN, sets the trigger back to PCMTRIG_ABORT, unlocks CHN, and is also blocking on PCM. However, once thread A unlocks PCM, because thread C is higher-priority than thread B, it picks up the PCM lock instead of thread B, and because CHN is already removed from the list, and thread B hasn't added it back yet, we take a page fault in CHN_REMOVE() by trying to remove a non-existent element.

To fix the former scenario, set the channel trigger before the call to
CHANNEL_TRIGGER() (could also come after, doesn't really matter) and
check if anything changed one we lock CHN back.

To fix the latter scenario, use the SAFE variants of CHN_INSERT_HEAD()
and CHN_REMOVE(). A similar scenario can occur in vchan_trigger(), so do
the trigger setting after we've locked the parent channel.

Sponsored by: The FreeBSD Foundation
MFC after: 2 days

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/dev/sound/pcm/channel.c
2351

At this point, it's possible for c->trigger to have changed, no? Don't you need to re-check it before proceeding?

2352

I don't see any reason to unlock here, given that we're about to re-acquire the mutex a couple of lines below.

2358

Same here.

christos added inline comments.
sys/dev/sound/pcm/channel.c
2351

I didn't bumped into any such issue so far, even with stress2 running for a while.

2352

True.

sys/dev/sound/pcm/channel.c
2351

stress2 isn't a comprehensive sound subsystem concurrency test, and the absence of "issues" doesn't prove that multi-threaded code is correct.

The problem I'm trying to point out is very similar to the problem you're trying to fix: if you drop a mutex and re-acquire it later, you must make sure that nothing changed while the mutex was dropped. So, to "look" correct, it needs to be something like

if (c->trigger != PCMTRIG_START) {
    CHN_UNLOCK(c);
    PCM_LOCK(d);
    CHN_LOCK(c);
    if (c->trigger == PCMTRIG_START) {
        /* what to do? */
    } else {
        c->trigger = PCMTRIG_START;
        CHN_INSERT_HEAD(d, c, channels.pcm.busy); 
        PCM_UNLOCK(d);
        chn_syncstate(c);
    }
christos added inline comments.
sys/dev/sound/pcm/channel.c
2351

Yeah, this is right. I thought that locking PCM first would be enough, but indeed, the problem I describe in the commit message (adding/removing twice) would occur here as well, if a higher-priority thread came with the same go value while another thread was blocking on PCM.

christos marked an inline comment as done.

Address Mark's comment.

sys/dev/sound/pcm/channel.c
2349–2350

Isn't this always true since we check

if (go == c->trigger)
        return (0);

above?

2356

If we have races between CHN unlock and lock, this may be inconsistent with the last call to CHANNEL_TRIGGER(). In case that matters, we should probably set the value of c->trigger right after CHANNEL_TRIGGER(), check the value again here and issue a CHN_INSERT_HEAD_SAFE if it hasn't changed in the meantime. Or am I missing something?

sys/dev/sound/pcm/channel.c
2349–2350

It most likely is. I did some tests to make sure we're not missing some edge-case where this statement could be false, but I couldn't stumble upon such a case so far, so it should be fine to remove those checks.

2356

Even though I didn't manage to encounter such a race yet (not saying it can never occur though), I think the right way to go is to modify vchan_trigger() to work the same way as chn_trigger(), that is, once we lock the parent channel, check that nothing changed, and if it didn't, then set the trigger and modify the list. Otherwise simply return.

This seems to work fine (diff'd against the original code):

diff --git a/sys/dev/sound/pcm/vchan.c b/sys/dev/sound/pcm/vchan.c
index ed13a3c55961..36527cf781f9 100644
--- a/sys/dev/sound/pcm/vchan.c
+++ b/sys/dev/sound/pcm/vchan.c
@@ -143,32 +143,40 @@ vchan_trigger(kobj_t obj, void *data, int go)
 {
 	struct vchan_info *info;
 	struct pcm_channel *c, *p;
-	int ret, otrigger;
+	int ret;
 
 	info = data;
-
-	if (!PCMTRIG_COMMON(go) || go == info->trigger)
-		return (0);
-
 	c = info->channel;
 	p = c->parentchannel;
-	otrigger = info->trigger;
-	info->trigger = go;
 
 	CHN_LOCKASSERT(c);
+	if (!PCMTRIG_COMMON(go) || go == info->trigger)
+		return (0);
 
 	CHN_UNLOCK(c);
 	CHN_LOCK(p);
+	/*
+	 * Do nothing if another thread triggered the channel already while we
+	 * had dropped the mutex.
+	 */
+	if (info->trigger == go) {
+		CHN_UNLOCK(p);
+		return (0);
+	}
+	if (snd_verbose > 3) {
+		device_printf(c->dev, "%s() %s: calling go=0x%08x , "
+		    "prev=0x%08x\n", __func__, c->name, go, c->trigger);
+	}
+	info->trigger = go;
 
 	switch (go) {
 	case PCMTRIG_START:
-		if (otrigger != PCMTRIG_START)
-			CHN_INSERT_HEAD(p, c, children.busy);
+		CHN_INSERT_HEAD(p, c, children.busy);
 		break;
 	case PCMTRIG_STOP:
 	case PCMTRIG_ABORT:
-		if (otrigger == PCMTRIG_START)
-			CHN_REMOVE(p, c, children.busy);
+		CHN_REMOVE(p, c, children.busy);
 		break;
 	default:
 		break;
@@ -181,9 +189,7 @@ vchan_trigger(kobj_t obj, void *data, int go)
 	if (ret == 0 && go == PCMTRIG_START && VCHAN_SYNC_REQUIRED(c))
 		ret = vchan_sync(c);
 
-	CHN_UNLOCK(c);
 	CHN_UNLOCK(p);
-	CHN_LOCK(c);
 
 	return (ret);
 }
sys/dev/sound/pcm/channel.c
2356

I managed to crash with the patch above, so I'll work on it a bit more.

sys/dev/sound/pcm/channel.c
2318–2327

Moreover, this looks like it is meant to prevent multiple redundant calls to CHANNEL_TRIGGER(). This is not guaranteed to work currently, since we can have races between this code and the part where we actually set the c->trigger value.

2349–2350

We're holding a lock on the channel here, I suppose there shouldn't be any change to c->trigger? Please note that the similar check in the stop / abort case is not redundant, unlike here.

2356

I don't see how the vchan_trigger() thing is connected, could you elaborate?

My concern would be a race of something like:

  1. c->trigger has value PCMTRIG_STOP
  2. Thread A enters the first part of chn_trigger() and calls CHANNEL_TRIGGER(PCMTRIG_START), unlocks the channel
  3. Thread B enters chn_trigger() and calls CHANNEL_TRIGGER(PCMTRIG_ABORT), exits
  4. Thread A executes the second part, locks the channel, inserts the channel into the list

Then we are inconsistent with our last call to CHANNEL_TRIGGER(PCMTRIG_ABORT). Currently we check consistency between c->trigger and the channel list, but not with CHANNEL_TRIGGER() calls.

Address Florian's comments.

christos added inline comments.
sys/dev/sound/pcm/channel.c
2356

That's a good catch actually. I didn't take into account the case where CHANNEL_TRIGGER() would return non-zero. I'll update the diff shortly. Thanks!

LGTM - seems also easier to reason about the code now, at least to me.

sys/dev/sound/pcm/channel.c
2356

I'll update the diff shortly. Thanks!

Is that update the one you already uploaded, or do you intend to add another change?

christos added inline comments.
sys/dev/sound/pcm/channel.c
2356

It's this one.

This revision is now accepted and ready to land.Nov 24 2024, 8:20 PM