We have seen several cases of processes which have become "stuck" in kern_sigsuspend(). They exhibit these symptoms:
The kernel shows that the thread has blocked signals:
(kgdb) print $td->td_pflags
$1 = 0x80000101
(kgdb) print $td->td_sigblock_ptr
$2 = (void *) 0x801012038
(kgdb) print $td->td_sigblock_val
$3 = 0x10
The userspace side shows that it has unblocked signals:
#0  _sigsuspend () at _sigsuspend.S:4
4       _sigsuspend.S: No such file or directory.
(gdb) print *(int *)0x801012038
$1 = 0x0
The kernel's cached sigblock value (td_sigblock_val) is out of sync with the userspace value. Normally, this would be resolved on the next return from a syscall, when ast() reads the userspace sigblock value and does the right thing. However, a thread sleeping in kern_sigsuspend() while waiting for signals never gets that far, so the stale value is never refreshed and the thread deadlocks. (Because the cached value says the signals are blocked, cursig() reports that there are no signals, so kern_sigsuspend() simply loops without ever returning.)
We should probably consider adding a similar check to kern_sigtimedwait().
It appears this condition became noticeable sometime between main-c255535-gd189a74dfdcd and main-c256326-g24a8f6d36996; however, it occurs infrequently enough that it will be very hard to bisect.