
umtx: close race between umtxq_sleep() and umtxq_requeue()
Accepted · Public

Authored by glebius on Jun 9 2023, 11:05 PM.
Tags
None

Details

Summary

umtxq_requeue() changes the key, and umtxq_sleep() loads it again.
However, this load isn't synchronized by the mutex, so we can read a
stale key. Close the race by re-checking the key with the lock held.

Fixes: fd6ca665d206b74970e7c01d06ae06fed71500fc
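
To make the race concrete, here is a minimal sketch of the re-check idea from the summary (the names mirror the umtx code, but the retry label and exact placement are illustrative rather than the diff under review, and it assumes umtxq_requeue() only rewrites uq_key while holding the old chain's lock):

        /*
         * uq->uq_key may be rewritten by umtxq_requeue() while this thread
         * holds no chain lock, so a chain looked up from it can be stale.
         * Re-check under the chain lock and retry if the key has moved.
         */
retry:
        uc = umtxq_getchain(&uq->uq_key);
        mtx_lock(&uc->uc_lock);
        if (uc != umtxq_getchain(&uq->uq_key)) {
                /* umtxq_requeue() changed the key; drop the lock and retry. */
                mtx_unlock(&uc->uc_lock);
                goto retry;
        }
        /* uq->uq_key is now stable: changing it requires uc->uc_lock. */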

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 51986
Build 48877: arc lint + arc unit

Event Timeline

I'm not sure this patch is the best one, but the issue is real and the patch closes it. The problem reproduces easily when playing CS:GO.

An alternative patch, which doesn't use goto, could look like this:

@@ -832,10 +837,21 @@ umtxq_sleep(struct umtx_q *uq, const char *wmesg,
                        if (error != 0)
                                break;
                }
-               error = msleep_sbt(uq, &uc->uc_lock, PCATCH | PDROP, wmesg,
+               error = msleep_sbt(uq, &uc->uc_lock, PCATCH, wmesg,
                    sbt, 0, flags);
-               uc = umtxq_getchain(&uq->uq_key);
-               mtx_lock(&uc->uc_lock);
+               if ((nuc = umtxq_getchain(&uq->uq_key)) != uc) {
+                       /*
+                        * If umtxq_requeue() has changed the key we need to
+                        * change the lock.  To load the new key value we need
+                        * to hold the old key.  The lock order is the same as
+                        * in linux_futex_requeue(): first old, then new.
+                        */
+                       mtx_lock(&nuc->uc_lock);
+                       mtx_unlock(&uc->uc_lock);
+                       uc = nuc;
+                       counter_u64_add(umtxq_races, 1);
+               }

I'm testing it now with an event counter. The old patch has already been tested successfully.

The second patch definitely does some unnecessary relocking, but doesn't use goto.

It looks like someone woke up a sleeping thread while requeue() was running; perhaps the thread got a signal?
I think this should be fixed in another way, i.e., in requeue(), to ensure that uq_key is stable.
It would be nice to show the panic stack trace and add markj@ or kib@ as reviewers.

An alternative patch, which doesn't use goto, could look like this:

This is better: it guarantees that the thread will not wake up during the requeue, but the lock order most likely should be reversed.

An alternative patch, which doesn't use goto, could look like this:

This patch locks the new lock before unlocking the old one. I wonder what the lock order policy is there, and whether it can cause a deadlock?

This revision is now accepted and ready to land. Jun 12 2023, 4:22 PM
In D40481#922105, @mav wrote:

An alternative patch, which doesn't use goto, could look like this:

This patch locks the new lock before unlocking the old one. I wonder what the lock order policy is there, and whether it can cause a deadlock?

Good question. The lock order in umtxq_requeue() is first the old lock, then the new lock, and we follow that. However, is it possible that two requeue events happen while we are exiting the sleep? If so, the second event can swap the lock order. This is one reason I prefer the first patch over the second.

The other reason is that with the second (alternative) patch we do the extra work of double locking in umtxq_sleep() every time umtxq_requeue() has happened. With the first patch the actual race happens roughly 1 in 1000 times, so there is no performance impact, if we care about that.
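
To spell out the inversion concern, here is an illustrative timeline (added for clarity, not taken from the review; chains A, B and C are hypothetical):

        /*
         * Sleeper slept on chain A; with the second patch msleep_sbt()
         * re-locks A on wakeup.
         *
         *   requeue #1: locks A, then B; moves the sleeper, uq_key -> B
         *   requeue #2: locks B, then C; moves the sleeper, uq_key -> C
         *
         * The waking sleeper now holds A and acquires C (order A -> C),
         * while a concurrent requeue from C to A would lock C, then A.
         * The pairwise order is no longer the one umtxq_requeue()
         * established, which is the potential deadlock being asked about.
         */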

What I immediately do not like in the patch is that it seems to allow unbounded looping in the kernel. If requeueing is done in step with uc_lock, a user can potentially control how long the thread spins.

A similar issue with ll/sc took a lot of effort to solve, and I do not want more such code added. Why not check for errors before retrying?

In D40481#922256, @kib wrote:

What I immediately do not like in the patch is that it seems to allow unbounded looping in the kernel. If requeueing is done in step with uc_lock, a user can potentially control how long the thread spins.

A similar issue with ll/sc took a lot of effort to solve, and I do not want more such code added. Why not check for errors before retrying?

What errors?

In D40481#922256, @kib wrote:

What I immediately do not like in the patch is that it seems to allow unbounded looping in the kernel. If requeueing is done in step with uc_lock, a user can potentially control how long the thread spins.

A similar issue with ll/sc took a lot of effort to solve, and I do not want more such code added. Why not check for errors before retrying?

What errors?

The errors from msleep that break out of the outer loop.

In D40481#922261, @kib wrote:

The errors from msleep that break out of the outer loop.

I don't understand how this is going to help. The lock may change independently of what msleep returned, and then we return from the function holding the wrong lock.

I propose to use an alternative patch with busy/unbusy of the second lock.

In D40481#922261, @kib wrote:

The errors from msleep that break out of the outer loop.

I don't understand how this is going to help. The lock may change independently of what msleep returned, and then we return from the function holding the wrong lock.

I mean, change the lock and check the break-out conditions before looping again. In fact, I do not understand why you need to loop there (retrylock) but not in all other places that take the uc_lock.

Sorry for delay!

In D40481#922309, @kib wrote:

I mean, change the lock and check the break-out conditions before looping again.

That will overcomplicate the code, IMHO. The race does not happen often enough that its resolution needs to be optimized.

In fact, I do not understand why you need to loop there (retrylock) but not in all other places that take the uc_lock.

That's a good question! I think we don't panic anywhere else (yet) because there are no other places like umtxq_sleep() that take uq as the argument. Most take a key as the argument, so the value stays stable for the function's runtime. That prevents panicking from unlocking the wrong mutex, but it probably leaves other races in. For example, umtxq_busy() looks suspicious to me. The msleep would drop the lock and sleep on it. What if the lock changes? Is it not going to wake up then?

Sorry for delay!

In D40481#922309, @kib wrote:

I mean, change the lock and check the break-out conditions before looping again.

That will overcomplicate the code, IMHO. The race does not happen often enough that its resolution needs to be optimized.

I do not think so, but this is probably moot. See below.

In fact, I do not understand why you need to loop there (retrylock) but not in all other places that take the uc_lock.

That's a good question! I think we don't panic anywhere else (yet) because there are no other places like umtxq_sleep() that take uq as the argument. Most take a key as the argument, so the value stays stable for the function's runtime. That prevents panicking from unlocking the wrong mutex, but it probably leaves other races in. For example, umtxq_busy() looks suspicious to me. The msleep would drop the lock and sleep on it. What if the lock changes? Is it not going to wake up then?

In fact, I do not quite understand how this requeue is useful. A typical umtx blocking operation loops over umtxq_sleep(), and there it re-looks up the key on each iteration. Look at the simplest example, do_lock_normal().

If not that, I would say that the right fix is to delegate the requeueing to the victim thread: umtxq_requeue() records an indicator that the key needs to be changed and does a wakeup, then the umtxq_sleep() code checks for this bit after the wakeup and goes back to sleep on the new chain. The wakeup code would need to be fixed too, to remove the requeue indicator, otherwise wakeups could be lost.
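
A hedged sketch of that scheme (the field uq_requeue_key and the flag UQF_REQUEUED are hypothetical and do not exist in the umtx code; locking of the uq fields is elided, this only illustrates the hand-off being described):

        /*
         * In umtxq_requeue(), with the sleeper's current chain locked:
         * mark the sleeper instead of rewriting its key.
         */
        uq->uq_requeue_key = *newkey;           /* hypothetical field */
        uq->uq_flags |= UQF_REQUEUED;           /* hypothetical flag  */
        wakeup(uq);

        /* In umtxq_sleep(), after msleep_sbt() returns with the old chain locked: */
        if ((uq->uq_flags & UQF_REQUEUED) != 0) {
                uq->uq_flags &= ~UQF_REQUEUED;  /* consume the indicator */
                uq->uq_key = uq->uq_requeue_key;
                mtx_unlock(&uc->uc_lock);       /* drop the old chain */
                uc = umtxq_getchain(&uq->uq_key);
                mtx_lock(&uc->uc_lock);         /* take the new chain */
                /* ...and go back to sleep, now on the new chain. */
        }

        /*
         * As noted above, the regular wakeup path would also have to clear
         * the indicator, otherwise wakeups could be lost.
         */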

What's the path forward for this? This is still fixing an outstanding bug for me, but it requires manual effort to apply.

I'll note that qemu has a number of hangs in umtx code... I wonder if the emulation changes the timing...
I'll have to see if this band-aid helps that at all and report back.

What's the path forward for this? This is still fixing an outstanding bug for me, but it requires manual effort to apply.

Since I filed the patch two things happened:

  • Steam again stopped working on CURRENT (at least for me)
  • CS:GO was superseded by CS2. Does anybody know whether it works on FreeBSD with linux-steam-utils?

Since playing Counter-Strike is a low-priority thing for me :) and @kib also has valid concerns about the patch, it still sits in my TODO list.

Dan, what do you use the patch for? What software, and on what FreeBSD version, exercises the bug we address here?

Re: Warner, the bug manifests itself only in the Linux compat layer. AFAIK, QEMU won't use umtxq_requeue() as long as it is native.

Sorry, been MIA for a bit with $WORK

Whatever issue I had seems to have been overcome. It was also Steam-related, but with The Witcher 3.

Whatever issue I had seems to have been overcome. It was also Steam-related, but with The Witcher 3.

The issue hasn't been overcome, it is just hidden. In CS:GO there were only two maps where it reproduced quickly :) The problem is that we don't have ready tools to write a reproducer, because the code needs to be Linux code.

Is this possibly the same bug reported here? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280667

The PR author reports that their panic does not reproduce after applying this patch.

Kostik has raised a valid concern about the patch. It is a big improvement but not a bulletproof fix. Since we don't have a reproducer, it is really hard to work on a proper fix. If anybody is ready to take responsibility, I would agree to pushing this to main and merging it to stable/14. Otherwise it is just taking too long and has already affected a number of people.