Threads holding kernel resources may not be suspended. If a thread must hold a kernel lock and perform an interruptible sleep, it must set the TDF_SBDRY flag on itself so that the thread suspension code does not suspend it and deadlock the machine.

This patch unconditionally blocks thread_suspend_check from suspending threads marked SBDRY. Prior to this change, SBDRY only prevented SIGSTOP suspensions from deadlocking these threads. However, the exact same deadlock may occur via other process stopping mechanisms (e.g. kill, abort(3), or trace).

Here is an example scenario that can lead to system deadlock before this patch. We have a process with three threads:

o thread 1 takes a vnode V exclusive, marks itself TDF_SBDRY and then sleeps interruptibly (with PCATCH) on some network operation (for example, NFS client)
o thread 2 sleeps trying to lock V (uninterruptible)
o thread 3 calls abort(3) and SIGABRT is delivered to it

Thread 3 arrives in thread_single(SINGLE_NO_EXIT) and sets P_STOPPED_SINGLE on the process. Then it sets TDF_ASTPENDING | TDF_NEEDSUSPCHK on the other two threads and attempts to wake them. Finally, it suspends itself.

Thread 1 is woken from its interruptible sleep and invokes thread_suspend_check(). Because the process was stopped by thread_single and not SIGSTOP, P_STOPPED_SIG is not set on p_flag. Therefore thread_suspend_check ignores TDF_SBDRY and suspends thread 1.

Thread 2 is not woken, as it is not marked interruptible. Because thread 1 still holds V exclusive and is suspended, thread 2 is now deadlocked.

Now we cannot make forward progress (even if thread 1's operation completes, it cannot wake up a suspended thread) and any process trying to access V is also doomed.
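The change in the suspension decision described in the summary can be sketched as a small userspace model. This is not the kernel code: the struct layouts, the MODEL_ flag values and both function names are hypothetical and exist only for illustration. The real flags are defined in sys/proc.h and the real check lives in thread_suspend_check(). The model assumes the pre-patch rule was "honor SBDRY only when the process is stopped by a stop signal", as the summary states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only (not the kernel's values). */
#define MODEL_P_STOPPED_SIG    0x1u /* process stopped by a stop signal */
#define MODEL_P_STOPPED_SINGLE 0x2u /* process stopped via thread_single() */
#define MODEL_TDF_SBDRY        0x1u /* thread must not be suspended while sleeping */

struct model_proc {
	unsigned p_flag;
};

struct model_thread {
	unsigned td_flags;
	struct model_proc *td_proc;
};

/* Pre-patch rule: SBDRY is honored only for stop-signal stops. */
bool
model_should_suspend_old(const struct model_thread *td)
{
	unsigned stop;

	assert(td != NULL && td->td_proc != NULL);
	stop = td->td_proc->p_flag &
	    (MODEL_P_STOPPED_SIG | MODEL_P_STOPPED_SINGLE);
	if (stop == 0)
		return (false);	/* nothing asked the process to stop */
	if (stop == MODEL_P_STOPPED_SIG &&
	    (td->td_flags & MODEL_TDF_SBDRY) != 0)
		return (false);	/* deferred: thread may hold kernel locks */
	return (true);
}

/* Post-patch rule: an SBDRY thread is never suspended by this check. */
bool
model_should_suspend_new(const struct model_thread *td)
{
	unsigned stop;

	assert(td != NULL && td->td_proc != NULL);
	if ((td->td_flags & MODEL_TDF_SBDRY) != 0)
		return (false);
	stop = td->td_proc->p_flag &
	    (MODEL_P_STOPPED_SIG | MODEL_P_STOPPED_SINGLE);
	return (stop != 0);
}
```

In the scenario above, thread 1 has MODEL_TDF_SBDRY set and its process has only MODEL_P_STOPPED_SINGLE set. The old rule says to suspend it, which is what leaves V locked by a suspended thread, while the new rule does not.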
Details
Some test code used to reproduce this condition is available at
http://pastie.org/private/tsdxyzoc8lubsvttsuzbcg .
Diff Detail
- Repository: rS FreeBSD src repository - subversion
- Lint: Lint Passed
- Unit: No Test Coverage
Event Timeline
Comment Actions
This looks fine.
Please ask Peter Holm to run the usual set of tests for the single-threading changes.
Comment Actions
Verified deadlock with https://people.freebsd.org/~pho/pthread9.sh and confirmed fix.
Ran stress tests and a buildworld for a total of 24 hours.
No problems seen.