
Fix thread suspend deadlock by minding TDF_SBDRY
ClosedPublic

Authored by cse_cem_gmail_com on May 22 2015, 12:01 AM.

Details

Reviewers
kib
jhb
Summary
Threads holding kernel resources must not be suspended. If a thread must
hold a kernel lock across an interruptible sleep, it must set the
TDF_SBDRY flag on itself so that the thread suspension code does not
suspend it and deadlock the machine.

This patch unconditionally blocks thread_suspend_check from suspending
threads marked SBDRY. Prior to this change, SBDRY only prevented SIGSTOP
suspensions from deadlocking these threads. However, the exact same
deadlock may occur via other process stopping mechanisms (e.g. kill,
abort(3), or trace).
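
The change in policy can be illustrated with a small userland model. This
is only a sketch derived from the description above, not the kernel
source: the MODEL_* bit values and the names defers_suspend_before and
defers_suspend_after are invented for illustration, and the real
thread_suspend_check does considerably more than this predicate.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration; the kernel's real values differ. */
#define MODEL_TDF_SBDRY        0x1u /* thread must not be stopped while sleeping */
#define MODEL_P_STOPPED_SIG    0x1u /* process is stopping because of a stop signal */
#define MODEL_P_STOPPED_SINGLE 0x2u /* process is stopping via thread_single() */

/* Before the patch: SBDRY is honored only for signal-initiated stops. */
bool
defers_suspend_before(unsigned td_flags, unsigned p_flag)
{
	return ((p_flag & MODEL_P_STOPPED_SIG) != 0 &&
	    (td_flags & MODEL_TDF_SBDRY) != 0);
}

/* After the patch: SBDRY is honored no matter why the process is stopping. */
bool
defers_suspend_after(unsigned td_flags, unsigned p_flag)
{
	(void)p_flag;	/* the reason for the stop no longer matters */
	return ((td_flags & MODEL_TDF_SBDRY) != 0);
}
```

In this model, an SBDRY thread in a process stopped through thread_single
is not protected by the old predicate but is protected by the new one.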

Here is an example scenario that can lead to system deadlock, before
this patch. We have a process with three threads:

 o thread 1 takes a vnode V exclusive, marks itself TDF_SBDRY and then
   sleeps interruptibly (with PCATCH) on some network operation (for
   example, NFS client)

 o thread 2 sleeps uninterruptibly, trying to lock V

 o thread 3 calls abort(3) and SIGABRT is delivered to it

Thread 3 arrives in thread_single(SINGLE_NO_EXIT) and sets
P_STOPPED_SINGLE on the process. Then it sets TDF_ASTPENDING |
TDF_NEEDSUSPCHK on the other two threads and wakes any that are sleeping
interruptibly. Finally, it suspends itself.

Thread 1 is woken from its interruptible sleep and invokes
thread_suspend_check(). Because the process was stopped by thread_single
and not SIGSTOP, P_STOPPED_SIG is not set on p_flag. Therefore
thread_suspend_check ignores TDF_SBDRY and suspends thread 1.

Thread 2 is not woken, as it is not marked interruptible. Because thread
1 still holds V exclusive and is suspended, thread 2 is now deadlocked.

Now we cannot make forward progress (even if thread 1's operation
completes, it cannot wake up a suspended thread) and any process trying
to access V is also doomed.
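
The scenario can be replayed in a toy userland model. This is a sketch
under simplifying assumptions, not kernel code: the struct, the enum, and
scenario_deadlocks are hypothetical names, and suspension is reduced to a
single state change per thread.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_state { RUNNABLE, SLEEP_INTR, SLEEP_UNINTR, SUSPENDED };

struct model_thread {
	enum model_state state;
	bool sbdry;	/* models TDF_SBDRY */
	bool holds_v;	/* holds vnode V exclusive */
	bool wants_v;	/* blocked waiting to lock V */
};

/*
 * Replay the three-thread scenario. honor_sbdry_always selects the
 * post-patch policy. Returns true if a thread waits for V while V's
 * holder is suspended, i.e. the deadlock described above.
 */
bool
scenario_deadlocks(bool honor_sbdry_always)
{
	struct model_thread t[3] = {
		{ .state = SLEEP_INTR, .sbdry = true, .holds_v = true },  /* thread 1 */
		{ .state = SLEEP_UNINTR, .wants_v = true },               /* thread 2 */
		{ .state = RUNNABLE },                                    /* thread 3 */
	};
	bool stopped_by_signal = false;	/* stop comes from thread_single */
	bool holder_suspended = false, have_waiter = false;
	size_t i;

	/* Thread 3 single-threads the process: only interruptible sleepers
	 * are woken, and each woken thread runs the suspend check. */
	for (i = 0; i < 2; i++) {
		if (t[i].state != SLEEP_INTR)
			continue;	/* thread 2 is never woken */
		bool defer = t[i].sbdry &&
		    (honor_sbdry_always || stopped_by_signal);
		/* A deferred thread stays schedulable and can later release V. */
		t[i].state = defer ? RUNNABLE : SUSPENDED;
	}
	t[2].state = SUSPENDED;	/* thread 3 suspends itself */

	for (i = 0; i < 3; i++) {
		if (t[i].holds_v && t[i].state == SUSPENDED)
			holder_suspended = true;
		if (t[i].wants_v)
			have_waiter = true;
	}
	return (holder_suspended && have_waiter);
}
```

Calling scenario_deadlocks(false) models the pre-patch policy and reports
the deadlock; scenario_deadlocks(true) models the patched policy, where
thread 1 is left unsuspended and V can eventually be released.
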
Test Plan

Some test code used to reproduce this condition is available at
http://pastie.org/private/tsdxyzoc8lubsvttsuzbcg .

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

cse_cem_gmail_com retitled this revision to Fix thread suspend deadlock by minding TDF_SBDRY.
cse_cem_gmail_com updated this object.
cse_cem_gmail_com edited the test plan for this revision.
cse_cem_gmail_com added reviewers: kib, jhb.
cse_cem_gmail_com added a subscriber: benno.
kib edited edge metadata.

This looks fine.

Please ask Peter Holm to run the usual set of tests for the single-threading changes.

This revision is now accepted and ready to land. May 22 2015, 8:02 AM

Verified deadlock with https://people.freebsd.org/~pho/pthread9.sh and confirmed fix.
Ran stress tests and a buildworld for a total of 24 hours.
No problems seen.

Committed as r283320. Thanks Konstantin, Peter.