This test almost always deadlocks in the CheriBSD Jenkins. Since
FreeBSD CI is green, I tried upstream HEAD, and it turns out the test
also deadlocks there sometimes if you run it in a loop. When it hangs,
truss shows the following output:
    <new thread 100160>
    sigfastblock(0x1,0x801812538) = 0 (0x0)
    _umtx_op(0x8014cf008,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
    _umtx_op(0x8018127c8,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
    _umtx_op(0x8010a6f38,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x0,0x0) = 0 (0x0)
    thr_kill(100160,SIGTHR) = 0 (0x0)
    SIGNAL 32 (SIGTHR) code=SI_LWP pid=913 uid=0
    sigreturn(0x7fffdfffdab0) EJUSTRETURN
    thr_wake(0x18740) = 0 (0x0)
    thr_wake(0x18740) = 0 (0x0)
    ^C_umtx_op(0x801812500,UMTX_OP_WAIT,0x18740,0x0,0x0) ERR#4 'Interrupted system call'
    SIGNAL 2 (SIGINT) code=SI_KERNEL
Attaching GDB reveals that thread 1 is blocked in pthread_join and thread 2
is inside pthread_cond_wait:
    Thread 2 (LWP 100132 of process 892):
    #0 _umtx_op_err () at /local/scratch/alr48/cheri/cheribsd/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
    #1 0x00000008010a2360 in _thr_umtx_timedwait_uint (mtx=0x8014f2008, id=id@entry=0, clockid=<optimized out>, abstime=<optimized out>, shared=<optimized out>, shared@entry=0) at /local/scratch/alr48/cheri/cheribsd/lib/libthr/thread/thr_umtx.c:247
    #2 0x0000000801099319 in _thr_sleep (curthread=curthread@entry=0x801812500, clockid=0, abstime=abstime@entry=0x0) at /local/scratch/alr48/cheri/cheribsd/lib/libthr/thread/thr_kern.c:199
    #3 0x000000080109498f in cond_wait_user (cvp=0x801829100, mp=0x8014d1008, abstime=0x0, cancel=1) at /local/scratch/alr48/cheri/cheribsd/lib/libthr/thread/thr_cond.c:320
    #4 cond_wait_common (cond=<optimized out>, cond@entry=0x102a270 <cond>, mutex=<optimized out>, mutex@entry=0x102a268 <mutex>, abstime=abstime@entry=0x0, cancel=cancel@entry=1) at /local/scratch/alr48/cheri/cheribsd/lib/libthr/thread/thr_cond.c:380
    #5 0x0000000801094c01 in __thr_cond_wait (cond=0x8014f2008, cond@entry=0x102a270 <cond>, mutex=0xf, mutex@entry=0x102a268 <mutex>) at /local/scratch/alr48/cheri/cheribsd/lib/libthr/thread/thr_cond.c:395
    #6 0x00000000010272d4 in destroy_after_cancel_threadfunc (arg=<optimized out>) at /local/scratch/alr48/cheri/cheribsd/contrib/netbsd-tests/lib/libpthread/t_cond.c:569
    #7 0x0000000801095b1b in thread_start (curthread=0x801812500) at /local/scratch/alr48/cheri/cheribsd/lib/libthr/thread/thr_create.c:309
    #8 0x0000000000000000 in ?? ()
    Backtrace stopped: Cannot access memory at address 0x7fffdfffe000

    Thread 1 (LWP 100073 of process 892):
    #0 _umtx_op_err () at /local/scratch/alr48/cheri/cheribsd/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:40
    #1 0x0000000801097853 in join_common (pthread=0x801812500, thread_return=thread_return@entry=0x0, abstime=abstime@entry=0x0, peek=<optimized out>) at /local/scratch/alr48/cheri/cheribsd/lib/libthr/thread/thr_join.c:147
    #2 0x000000080109758b in _thr_join (pthread=0x801812500, thread_return=0x2, thread_return@entry=0x0) at /local/scratch/alr48/cheri/cheribsd/lib/libthr/thread/thr_join.c:62
    #3 0x0000000001026f5b in atfu_destroy_after_cancel_body (tc=<optimized out>) at /local/scratch/alr48/cheri/cheribsd/contrib/netbsd-tests/lib/libpthread/t_cond.c:614
    #4 0x000000080107d11c in atf_tc_run (tc=0x102a250 <atfu_destroy_after_cancel_tc>, tc@entry=0x801819040, resfile=resfile@entry=0x801070b59 "%s: WARNING: %s\n") at /local/scratch/alr48/cheri/cheribsd/contrib/atf/atf-c/tc.c:1020
    #5 0x000000080107f19e in atf_tp_run (tp=tp@entry=0x7fffffffda28, tcname=tcname@entry=0x801819040 "destroy_after_cancel", resfile=<optimized out>) at /local/scratch/alr48/cheri/cheribsd/contrib/atf/atf-c/tp.c:201
    #6 0x000000080107fbc1 in run_tc (tp=0x7fffffffda28, p=0x7fffffffda40, exitcode=<optimized out>) at /local/scratch/alr48/cheri/cheribsd/contrib/atf/atf-c/detail/tp_main.c:504
    #7 controlled_main (argc=<optimized out>, argv=0x7fffffffead0, add_tcs_hook=0x1024d00 <atfu_tp_add_tcs>, exitcode=<optimized out>) at /local/scratch/alr48/cheri/cheribsd/contrib/atf/atf-c/detail/tp_main.c:574
    #8 atf_tp_main (argc=<optimized out>, argc@entry=2, argv=argv@entry=0x7fffffffead0, add_tcs_hook=0x1024d00 <atfu_tp_add_tcs>) at /local/scratch/alr48/cheri/cheribsd/contrib/atf/atf-c/detail/tp_main.c:604
    #9 0x0000000001024cf1 in main (argc=25240832, argc@entry=2, argv=0x2, argv@entry=0x7fffffffead0) at /local/scratch/alr48/cheri/cheribsd/contrib/netbsd-tests/lib/libpthread/t_cond.c:684
    #10 0x0000000001024ab2 in _start (ap=<optimized out>, cleanup=<optimized out>) at /local/scratch/alr48/cheri/cheribsd/lib/csu/amd64/crt1_c.c:75
With the current version of the test I get an EBUSY error about 2/3 of
the time (it seems the condvar still has waiters when the cancellation
happens). If I uncomment the fprintf statements, it passes most of the
time, but still occasionally fails with EBUSY.
Note: I added the pthread_mutex_isowned_np((pthread_mutex_t *)arg) check
to the cleanup callback since it seems like the mutex is usually not
owned when the thread is cancelled.
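
For reference, a minimal sketch of what such a guarded cleanup callback
can look like. The names unlock_if_owned and cleanup_demo are
hypothetical and are not taken from t_cond.c. pthread_mutex_isowned_np()
is the FreeBSD extension from <pthread_np.h>, so the check is compiled
only on FreeBSD; elsewhere the sketch assumes the mutex is held when the
handler runs.

```c
#include <pthread.h>
#ifdef __FreeBSD__
#include <pthread_np.h>	/* pthread_mutex_isowned_np() */
#endif

/* Hypothetical cleanup handler: arg is the mutex used with the condvar. */
static void
unlock_if_owned(void *arg)
{
	pthread_mutex_t *mtx = arg;

#ifdef __FreeBSD__
	/* Skip the unlock if this thread does not hold the mutex. */
	if (!pthread_mutex_isowned_np(mtx))
		return;
#endif
	pthread_mutex_unlock(mtx);
}

/*
 * Hypothetical driver: lock a mutex, run the handler via
 * pthread_cleanup_pop(1), then check that the mutex was released.
 * Returns 0 if the handler unlocked the mutex.
 */
int
cleanup_demo(void)
{
	pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
	int rc;

	pthread_mutex_lock(&mtx);
	pthread_cleanup_push(unlock_if_owned, &mtx);
	pthread_cleanup_pop(1);			/* runs unlock_if_owned() */

	rc = pthread_mutex_trylock(&mtx);	/* 0: mutex was free again */
	if (rc == 0)
		pthread_mutex_unlock(&mtx);
	pthread_mutex_destroy(&mtx);
	return (rc);
}
```

The guard only avoids unlocking a mutex the thread does not hold. It
does not explain why the mutex would be unowned at that point.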
I'm not sure whether the test is broken, or whether the libthr
implementation should ensure that there cannot be a lost wakeup and/or
leftover waiters on the condvar after a cancel.
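
To make the question concrete, here is a self-contained sketch of the
general cancel-then-destroy pattern. It is not the code from t_cond.c,
and every identifier in it (waiter, cancel_then_destroy, ready_cv, and
so on) is made up for illustration. The handshake uses a predicate so
that the main thread cannot miss the "waiter is ready" wakeup. POSIX
specifies that a thread cancelled inside pthread_cond_wait() reacquires
the mutex before its cleanup handlers run, which is what the
unconditional unlock in the handler assumes.

```c
#include <pthread.h>
#include <stdbool.h>

/* Single-shot state: cv is destroyed at the end of cancel_then_destroy(). */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool waiting;	/* set by the waiter while it holds mtx */
static bool released;	/* never set here, so the waiter blocks until cancelled */

static void
unlock_mutex(void *arg)
{
	pthread_mutex_unlock(arg);
}

static void *
waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mtx);
	pthread_cleanup_push(unlock_mutex, &mtx);
	waiting = true;
	pthread_cond_signal(&ready_cv);
	while (!released)
		pthread_cond_wait(&cv, &mtx);	/* cancellation point */
	pthread_cleanup_pop(1);
	return (NULL);
}

/* Returns 0 if the condvar could be destroyed after the cancel. */
int
cancel_then_destroy(void)
{
	pthread_t td;
	void *res;
	int error;

	pthread_mutex_lock(&mtx);
	error = pthread_create(&td, NULL, waiter, NULL);
	if (error != 0) {
		pthread_mutex_unlock(&mtx);
		return (error);
	}
	/* The predicate loop guards against lost and spurious wakeups. */
	while (!waiting)
		pthread_cond_wait(&ready_cv, &mtx);
	pthread_mutex_unlock(&mtx);

	pthread_cancel(td);
	error = pthread_join(td, &res);
	if (error != 0)
		return (error);
	if (res != PTHREAD_CANCELED)
		return (-1);
	/* EBUSY here would mean the condvar still appears to have a waiter. */
	return (pthread_cond_destroy(&cv));
}
```

Because the main thread can only observe waiting == true after the
waiter has released the mutex inside pthread_cond_wait(), the cancel is
always delivered to a thread that is blocked on the condvar (or about to
return from the wait). If this pattern still produced EBUSY or a hang,
that would point at the implementation rather than at the test.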