Page MenuHomeFreeBSD

sound: Fix chn_trigger() and vchan_trigger() races
Needs ReviewPublic

Authored by christos on Tue, Nov 5, 9:27 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Nov 20, 3:26 AM
Unknown Object (File)
Mon, Nov 18, 10:41 PM
Unknown Object (File)
Mon, Nov 18, 2:45 PM
Unknown Object (File)
Sun, Nov 17, 4:39 AM
Unknown Object (File)
Fri, Nov 15, 9:30 AM
Unknown Object (File)
Fri, Nov 8, 2:45 PM
Unknown Object (File)
Fri, Nov 8, 2:44 PM
Unknown Object (File)
Fri, Nov 8, 2:43 PM
Subscribers

Details

Summary

Suppose the following scenario:

  1. Thread A locks CHN, sets the trigger to ABORT, and unlocks CHN. It then locks PCM and _removes_ CHN from the list.
  2. In the meantime, since thread A unlocked CHN, thread B has locked it, set the trigger to START, unlocked it, and is now blocking on PCM held by thread A.
  3. At the same time, thread C locks CHN, sets the trigger back to ABORT, unlocks CHN, and is also blocking on PCM. However, when thread A unlocks PCM, because thread C is higher-priority than thread B, it picks up the PCM lock instead of thread B, and because CHN is already removed from the list, and thread B hasn't added it back yet, we take a page fault in CHN_REMOVE() by trying to remove a non-existent element.

To fix this, have PCM locked during both the trigger setting, as well as
adding/removing a channel to/from the list, so that we have a single
section instead of two.

The same scenario applies to vchan_trigger(), where the parent channel's
lock acts as the PCM one.

Sponsored by: The FreeBSD Foundation
MFC after: 2 days

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 60535
Build 57419: arc lint + arc unit

Event Timeline

sys/dev/sound/pcm/channel.c
2336

At this point, it's possible for c->trigger to have changed, no? Don't you need to re-check it before proceeding?

2337

I don't see any reason to unlock here, given that we're about to re-acquire the mutex a couple of lines below.

2362

Same here.

christos added inline comments.
sys/dev/sound/pcm/channel.c
2336

I didn't bumped into any such issue so far, even with stress2 running for a while.

2337

True.

sys/dev/sound/pcm/channel.c
2336

stress2 isn't a comprehensive sound subsystem concurrency test, and the absence of "issues" doesn't prove that multi-threaded code is correct.

The problem I'm trying to point out is very similar to the problem you're trying to fix: if you drop a mutex and re-acquire it later, you must make sure that nothing changed while the mutex was dropped. So, to "look" correct, it needs to be something like

if (c->trigger != PCMTRIG_START) {
    CHN_UNLOCK(c);
    PCM_LOCK(d);
    CHN_LOCK(c);
    if (c->trigger == PCMTRIG_START) {
        /* what to do? */
    } else {
        c->trigger = PCMTRIG_START;
        CHN_INSERT_HEAD(d, c, channels.pcm.busy); 
        PCM_UNLOCK(d);
        chn_syncstate(c);
    }
christos added inline comments.
sys/dev/sound/pcm/channel.c
2336

Yeah, this is right. I thought that locking PCM first would be enough, but indeed, the problem I describe in the commit message (adding/removing twice) would occur here as well, if a higher-priority thread came with the same go value while another thread was blocking on PCM.

christos marked an inline comment as done.

Address Mark's comment.

sys/dev/sound/pcm/channel.c
2332

Isn't this always true since we check

if (go == c->trigger)
        return (0);

above?

2343

If we have races between CHN unlock and lock, this may be inconsistent with the last call to CHANNEL_TRIGGER(). In case that matters, we should probably set the value of c->trigger right after CHANNEL_TRIGGER(), check the value again here and issue a CHN_INSERT_HEAD_SAFE if it hasn't changed in the meantime. Or am I missing something?

sys/dev/sound/pcm/channel.c
2332

It most likely is. I did some tests to make sure we're not missing some edge-case where this statement could be false, but I couldn't stumble upon such a case so far, so it should be fine to remove those checks.

2343

Even though I didn't manage to encounter such a race yet (not saying it can never occur though), I think the right way to go is to modify vchan_trigger() to work the same way as chn_trigger(), that is, once we lock the parent channel, check that nothing changed, and if it didn't, then set the trigger and modify the list. Otherwise simply return.

This seems to work fine (diff'd against the original code):

diff --git a/sys/dev/sound/pcm/vchan.c b/sys/dev/sound/pcm/vchan.c
index ed13a3c55961..36527cf781f9 100644
--- a/sys/dev/sound/pcm/vchan.c
+++ b/sys/dev/sound/pcm/vchan.c
@@ -143,32 +143,40 @@ vchan_trigger(kobj_t obj, void *data, int go)
 {
 	struct vchan_info *info;
 	struct pcm_channel *c, *p;
-	int ret, otrigger;
+	int ret;
 
 	info = data;
-
-	if (!PCMTRIG_COMMON(go) || go == info->trigger)
-		return (0);
-
 	c = info->channel;
 	p = c->parentchannel;
-	otrigger = info->trigger;
-	info->trigger = go;
 
 	CHN_LOCKASSERT(c);
+	if (!PCMTRIG_COMMON(go) || go == info->trigger)
+		return (0);
 
 	CHN_UNLOCK(c);
 	CHN_LOCK(p);
+	/*
+	 * Do nothing if another thread triggered the channel already while we
+	 * had dropped the mutex.
+	 */
+	if (info->trigger == go) {
+		CHN_UNLOCK(p);
+		return (0);
+	}
+	if (snd_verbose > 3) {
+		device_printf(c->dev, "%s() %s: calling go=0x%08x , "
+		    "prev=0x%08x\n", __func__, c->name, go, c->trigger);
+	}
+	info->trigger = go;
 
 	switch (go) {
 	case PCMTRIG_START:
-		if (otrigger != PCMTRIG_START)
-			CHN_INSERT_HEAD(p, c, children.busy);
+		CHN_INSERT_HEAD(p, c, children.busy);
 		break;
 	case PCMTRIG_STOP:
 	case PCMTRIG_ABORT:
-		if (otrigger == PCMTRIG_START)
-			CHN_REMOVE(p, c, children.busy);
+		CHN_REMOVE(p, c, children.busy);
 		break;
 	default:
 		break;
@@ -181,9 +189,7 @@ vchan_trigger(kobj_t obj, void *data, int go)
 	if (ret == 0 && go == PCMTRIG_START && VCHAN_SYNC_REQUIRED(c))
 		ret = vchan_sync(c);
 
-	CHN_UNLOCK(c);
 	CHN_UNLOCK(p);
-	CHN_LOCK(c);
 
 	return (ret);
 }
sys/dev/sound/pcm/channel.c
2343

I managed to crash with the patch above, so I'll work on it a bit more.

sys/dev/sound/pcm/channel.c
2318–2321

Moreover, this looks like it is meant to prevent multiple redundant calls to CHANNEL_TRIGGER(). This is not guaranteed to work currently, since we can have races between this code and the part where we actually set the c->trigger value.

2332

We're holding a lock on the channel here, I suppose there shouldn't be any change to c->trigger? Please note that the similar check in the stop / abort case is not redundant, unlike here.

2343

I don't see how the vchan_trigger() thing is connected, could you elaborate?

My concern would be a race of something like:

  1. c->trigger has value PCMTRIG_STOP
  2. Thread A enters the first part of chn_trigger() and calls CHANNEL_TRIGGER(PCMTRIG_START), unlocks the channel
  3. Thread B enters chn_trigger() and calls CHANNEL_TRIGGER(PCMTRIG_ABORT), exits
  4. Thread A executes the second part, locks the channel, inserts the channel into the list

Then we are inconsistent with our last call to CHANNEL_TRIGGER(PCMTRIG_ABORT). Currently we check consistency between c->trigger and the channel list, but not with CHANNEL_TRIGGER() calls.