
Fix various race conditions leading to an assertion failure in thr_exit()
Abandoned, Public

Authored by sobomax on Dec 22 2024, 3:40 AM.
Tags
None

Details

Reviewers
None
Summary

This change is a combination of 3 changes:

commit 0cc9037296bcbe90211903f4cace291ceebe2fda
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Dec 22 03:00:54 2024 +0000

Release rtld_bind_lock() even if we decide to die. Other threads
might appreciate that.
[T_MAIN] "thr_new" in "libthr.so.3" ==> 0x4002281450 in "libc.so.7"
[T_MAIN] reloc_jmpslot: *0x400213b4f8 = 0x4002281450
[T_CRASHING] "_pthread_once" in "libc.so.7" ==> 0x40021304d0 in "libthr.so.3"
[T_CRASHING] reloc_jmpslot: *0x4002327798 = 0x40021304d0
[T_MAIN] "pthread_join" in "git" ==> 0x400212b210 in "libthr.so.3"
[T_MAIN] reloc_jmpslot: *0x597940 = 0x400212b210
[T_2] "_pthread_exit" in "libthr.so.3" ==> 0x4002129790 in "libthr.so.3"
[T_2] reloc_jmpslot: *0x400213b4c8 = 0x4002129790
[T_2] "__cxa_thread_call_dtors" in "libthr.so.3" ==> 0x40022ae1d0 in "libc.so.7"
[T_2] reloc_jmpslot: *0x400213b540 = 0x40022ae1d0
[T_2] "free" in "libthr.so.3" ==> 0x40022c4ab0 in "libc.so.7"
[T_2] reloc_jmpslot: *0x400213b498 = 0x40022c4ab0
[T_2] "_malloc_thread_cleanup" in "libthr.so.3" ==> 0x40023112a0 in "libc.so.7"
[T_2] reloc_jmpslot: *0x400213b550 = 0x40023112a0
[T_3] "madvise" in "libc.so.7" ==> 0x400227f8b0 in "libc.so.7"
[T_3] reloc_jmpslot: *0x4002327e50 = 0x400227f8b0
[T_3] "thr_exit" in "libthr.so.3" ==> 0x4002281170 in "libc.so.7"
[T_3] reloc_jmpslot: *0x400213b558 = 0x4002281170
[T_2] "thr_exit" in "libthr.so.3" ==> 0x4002281170 in "libc.so.7"
[T_2] reloc_jmpslot: *0x400213b558 = 0x4002281170
[T_CRASHING] "__sys_write" in "libthr.so.3" ==> 0x4002282830 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x400213b670 = 0x4002282830
ENTER(s):
[T_CRASHING] "strlen" in "libthr.so.3" ==> 0x40022a3ca0 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x400213b678 = 0x40022a3ca0
/usr/src/lib/libthr/thread/thr_mutex.c+629, __Tthr_mutex_trylock():     206
/usr/src/libexec/rtld-elf/rtld.c+1036, _rtld_bind():    3
EXITS(s):
/usr/src/lib/libthr/thread/thr_mutex.c+1001, mutex_unlock_common():     206
/usr/src/libexec/rtld-elf/rtld.c+1068, _rtld_bind():    2
Fatal error 'thread T_CRASHING exits with resources held (ll=0, cc=1)!' at line 331 in file /usr/src/lib/libthr/thread/thr_exit.c (errno = 0)
[T_CRASHING] "abort" in "libthr.so.3" ==> 0x40022ada40 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x400213b538 = 0x40022ada40
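The ENTER(s)/EXITS(s) tallies in the log above can be cross-checked against the panic message. The sketch below assumes, although the commit does not say so, that each ENTER(s) line counts critical-section entries at one call site, that each EXITS(s) line counts exits, and that cc in the panic is the thread's critical-section count. Under those assumptions, total entries minus total exits should equal cc. The helper cs_imbalance() and the log1_* tables are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given per-call-site entry and exit tallies
 * (as printed under ENTER(s) and EXITS(s)), return total entries
 * minus total exits.  A non-zero result means some critical section
 * was entered more often than it was left.
 */
static long
cs_imbalance(const unsigned *enters, size_t nenters,
    const unsigned *exits, size_t nexits)
{
	long balance = 0;
	size_t i;

	for (i = 0; i < nenters; i++)
		balance += enters[i];
	for (i = 0; i < nexits; i++)
		balance -= exits[i];
	return (balance);
}

/* Tallies copied from the first log: trylock/unlock, then _rtld_bind(). */
static const unsigned log1_enters[] = { 206, 3 };
static const unsigned log1_exits[] = { 206, 2 };
```

For the first log the totals are 209 and 208, so the helper returns 1, matching cc=1 in the panic; the _rtld_bind() tallies (3 entries, 2 exits) are where the counts diverge. Summing the tallies of the other two logs the same way also gives 1.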

commit 98a10f620b0f78adbf9f752d9676b3f990abbe1e
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Dec 22 02:28:17 2024 +0000

When our thread is being cancelled, mark it as leaving the critical
section in _thr_rtld_lock_release() even if the underlying
unlocking operation fails. This may happen if locks that are
owned by the main thread are already being destroyed. This
prevents us from tripping the assertion in thr_exit() about some
resources not being released.
[T_CRASHING] "_pthread_exit" in "libthr.so.3" ==> 0x4002142770 in "libthr.so.3"
[T_CRASHING] reloc_jmpslot: *0x4002154468 = 0x4002142770
[T_CRASHING] "getenv" in "libgcc_s.so.1" ==> 0x40024239f0 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x4002700fe8 = 0x40024239f0
[T_CRASHING] "memcpy" in "libgcc_s.so.1" ==> 0x4002417800 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x4002700f40 = 0x4002417800
[T_CRASHING] "dl_iterate_phdr" in "libgcc_s.so.1" ==> 0x400000dc90 in "ld-elf.so.1"
[T_CRASHING] reloc_jmpslot: *0x4002701020 = 0x400000dc90
[T_MAIN] "_ZNSt3__16thread6detachEv" in "ld" ==> 0x400224c390 in "libc++.so.1"
[T_MAIN] reloc_jmpslot: *0x3b78170 = 0x400224c390
[T_MAIN] "pthread_detach" in "libc++.so.1" ==> 0x4002142230 in "libthr.so.3"
[T_MAIN] reloc_jmpslot: *0x4002256f70 = 0x4002142230
[T_CRASHING] "memset" in "libgcc_s.so.1" ==> 0x4002417e90 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x4002701028 = 0x4002417e90
[T_CRASHING] "__cxa_thread_call_dtors" in "libthr.so.3" ==> 0x4002423700 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x40021544e0 = 0x4002423700
[T_CRASHING] "_ZNSt3__115__thread_structD1Ev" in "libc++.so.1" ==> 0x400224c8e0 in "libc++.so.1"
[T_CRASHING] reloc_jmpslot: *0x4002256f90 = 0x400224c8e0
[T_CRASHING] "_malloc_thread_cleanup" in "libthr.so.3" ==> 0x40024867e0 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x40021544f0 = 0x40024867e0
[T_CRASHING] "memmove" in "libc.so.7" ==> 0x4002417a70 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x400249cb10 = 0x4002417a70
[T_CRASHING] "__sys_write" in "libthr.so.3" ==> 0x40023f7d70 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x4002154610 = 0x40023f7d70
ENTER(s):
[T_CRASHING] "strlen" in "libthr.so.3" ==> 0x40024191d0 in "libc.so.7"
[T_CRASHING] reloc_jmpslot: *0x4002154618 = 0x40024191d0
/usr/src/lib/libthr/thread/thr_mutex.c+629, __Tthr_mutex_trylock():     160
/usr/src/libexec/rtld-elf/rtld.c+1036, _rtld_bind():    11
/usr/src/libexec/rtld-elf/rtld.c+4238, dl_iterate_phdr():       6
/usr/src/libexec/rtld-elf/rtld.c+4239, dl_iterate_phdr():       6
/usr/src/libexec/rtld-elf/rtld.c+4248, dl_iterate_phdr():       22
/usr/src/lib/libthr/thread/thr_list.c+323, _thr_try_gc():       1
EXITS(s):
/usr/src/lib/libthr/thread/thr_mutex.c+1001, mutex_unlock_common():     160
/usr/src/libexec/rtld-elf/rtld.c+1068, _rtld_bind():    11
/usr/src/libexec/rtld-elf/rtld.c+4244, dl_iterate_phdr():       21
/usr/src/libexec/rtld-elf/rtld.c+4253, dl_iterate_phdr():       6
/usr/src/libexec/rtld-elf/rtld.c+4254, dl_iterate_phdr():       6
/usr/src/lib/libthr/thread/thr_list.c+327, _thr_try_gc():       1
[T_MAIN] "rename" in "ld" ==> 0x40023f50f0 in "libc.so.7"
[T_MAIN] reloc_jmpslot: *0x3b78740 = 0x40023f50f0
Fatal error 'thread T_CRASHING exits with resources held (ll=0, cc=1)!' at line 333 in file /usr/src/lib/libthr/thread/thr_exit.c (errno = 0)

commit b2f12eb567aa3071f350ec7261ea33124e05b751
Author: Maksym Sobolyev <sobomax@sippysoft.com>
Date: Sun Dec 22 02:19:54 2024 +0000

Unify THR_CRITICAL_LEAVE() and use it everywhere.

Move the unwinder dlopen() into the main thread and open it with
RTLD_NOW. This prevents many potential nasty race conditions from
happening if one or several threads decide they need the unwinder
while in the _pthread_exit() path(s). This also allows us to GC the
dlopened object once we are done unwinding.
[T1_CRASH] "_pthread_exit" in "libthr.so.3" ==> 0x4002142580 in "libthr.so.3"
[T1_CRASH] reloc_jmpslot: *0x4002153cd8 = 0x4002142580
[T1_CRASH] "dlsym" in "libthr.so.3" ==> 0x400000ce20 in "ld-elf.so.1"
[T1_CRASH] reloc_jmpslot: *0x4002153d30 = 0x400000ce20
[T1_CRASH] "dladdr" in "libthr.so.3" ==> 0x400000d6f0 in "ld-elf.so.1"
[T1_CRASH] reloc_jmpslot: *0x4002153d38 = 0x400000d6f0
[T1_CRASH] "dlopen" in "libthr.so.3" ==> 0x400000cc40 in "ld-elf.so.1"
[T1_CRASH] reloc_jmpslot: *0x4002153d40 = 0x400000cc40
[T1_CRASH] dlopen_object name "/lib/libgcc_s.so.1" fd -1 refobj "/usr/bin/ld" lo_flags 0x2 mode 0x1
[MAIN] "_ZNSt3__16thread6detachEv" in "ld" ==> 0x400224b390 in "libc++.so.1"
[MAIN] reloc_jmpslot: *0x3b78170 = 0x400224b390
[MAIN] "pthread_detach" in "libc++.so.1" ==> 0x4002142110 in "libthr.so.3"
[MAIN] reloc_jmpslot: *0x4002255f70 = 0x4002142110
[T1_CRASH] "getenv" in "libgcc_s.so.1" ==> 0x40024229f0 in "libc.so.7"
[T1_CRASH] reloc_jmpslot: *0x40026fffe8 = 0x40024229f0
[T1_CRASH] "memcpy" in "libgcc_s.so.1" ==> 0x4002416800 in "libc.so.7"
[T1_CRASH] reloc_jmpslot: *0x40026fff40 = 0x4002416800
[T1_CRASH] "dl_iterate_phdr" in "libgcc_s.so.1" ==> 0x400000dc90 in "ld-elf.so.1"
[T1_CRASH] reloc_jmpslot: *0x4002700020 = 0x400000dc90
[T1_CRASH] "memset" in "libgcc_s.so.1" ==> 0x4002416e90 in "libc.so.7"
[T1_CRASH] reloc_jmpslot: *0x4002700028 = 0x4002416e90
[T1_CRASH] "__cxa_thread_call_dtors" in "libthr.so.3" ==> 0x4002422700 in "libc.so.7"
[T1_CRASH] reloc_jmpslot: *0x4002153d50 = 0x4002422700
[T1_CRASH] "_ZNSt3__115__thread_structD1Ev" in "libc++.so.1" ==> 0x400224b8e0 in "libc++.so.1"
[T1_CRASH] reloc_jmpslot: *0x4002255f90 = 0x400224b8e0
[T1_CRASH] "_malloc_thread_cleanup" in "libthr.so.3" ==> 0x40024857e0 in "libc.so.7"
[T1_CRASH] reloc_jmpslot: *0x4002153d60 = 0x40024857e0
[T1_CRASH] "memmove" in "libc.so.7" ==> 0x4002416a70 in "libc.so.7"
[T1_CRASH] reloc_jmpslot: *0x400249bb10 = 0x4002416a70
[MAIN] "rename" in "ld" ==> 0x40023f40f0 in "libc.so.7"
[MAIN] reloc_jmpslot: *0x3b78740 = 0x40023f40f0
[T1_CRASH] "__sys_write" in "libthr.so.3" ==> 0x40023f6d70 in "libc.so.7"
[T1_CRASH] reloc_jmpslot: *0x4002153e80 = 0x40023f6d70
ENTER(s):
[T1_CRASH] "strlen" in "libthr.so.3" ==> 0x40024181d0 in "libc.so.7"
[T1_CRASH] reloc_jmpslot: *0x4002153e88 = 0x40024181d0
/usr/src/lib/libthr/thread/thr_mutex.c+629, __Tthr_mutex_trylock():     160
/usr/src/libexec/rtld-elf/rtld.c+1036, _rtld_bind():    14
/usr/src/libexec/rtld-elf/rtld.c+3930, do_dlsym():      3
/usr/src/libexec/rtld-elf/rtld.c+4116, dladdr():        1
/usr/src/libexec/rtld-elf/rtld.c+3787, dlopen_object(): 1
/usr/src/libexec/rtld-elf/rtld.c+4238, dl_iterate_phdr():       5
/usr/src/libexec/rtld-elf/rtld.c+4239, dl_iterate_phdr():       5
/usr/src/libexec/rtld-elf/rtld.c+4248, dl_iterate_phdr():       21
/usr/src/lib/libthr/thread/thr_list.c+323, _thr_try_gc():       1
EXITS(s):
/usr/src/lib/libthr/thread/thr_mutex.c+1001, mutex_unlock_common():     160
/usr/src/libexec/rtld-elf/rtld.c+1068, _rtld_bind():    14
/usr/src/libexec/rtld-elf/rtld.c+4028, do_dlsym():      3
/usr/src/libexec/rtld-elf/rtld.c+4159, dladdr():        1
/usr/src/libexec/rtld-elf/rtld.c+4244, dl_iterate_phdr():       21
/usr/src/libexec/rtld-elf/rtld.c+4253, dl_iterate_phdr():       5
/usr/src/libexec/rtld-elf/rtld.c+4254, dl_iterate_phdr():       5
/usr/src/lib/libthr/thread/thr_list.c+327, _thr_try_gc():       1
[MAIN] "_ZNSt3__17promiseIvE10get_futureEv" in "ld" ==> 0x40021fb0a0 in "libc++.so.1"
[MAIN] reloc_jmpslot: *0x3b78498 = 0x40021fb0a0
[MAIN] "_ZNSt3__16futureIvEC1EPNS_17__assoc_sub_stateE" in "libc++.so.1" ==> 0x40021fad50 in "libc++.so.1"
[MAIN] reloc_jmpslot: *0x4002255778 = 0x40021fad50
Fatal error 'thread
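The third commit's idea, loading the unwinder once from the main thread with RTLD_NOW instead of on demand from whichever thread first reaches _pthread_exit(), can be sketched as below. This is not the libthr code: unwinder_preload() and unwinder_release() are hypothetical names, and the library path and the _Unwind_ForcedUnwind lookup are assumptions suggested by the "/lib/libgcc_s.so.1" entry in the log above. With RTLD_NOW, dlopen() performs all of the object's relocations before returning, so the unwinder's own PLT slots (the getenv, memcpy, dl_iterate_phdr and memset bindings seen in the log) would not need lazy binding later on the thread-exit path.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical cache for the unwinder library and one of its entry points. */
static void *unwinder_handle;
static void *unwinder_forced_unwind;

/*
 * Call from the main thread before any other thread is created.
 * RTLD_NOW asks the dynamic linker to perform all relocations at load
 * time, so exiting threads never trigger lazy binding inside the
 * unwinder.  Returns 0 on success, -1 on failure.
 */
static int
unwinder_preload(const char *path)
{
	if (unwinder_handle != NULL)
		return (0);	/* Already loaded: keep the call idempotent. */
	unwinder_handle = dlopen(path, RTLD_NOW);
	if (unwinder_handle == NULL)
		return (-1);
	unwinder_forced_unwind = dlsym(unwinder_handle, "_Unwind_ForcedUnwind");
	if (unwinder_forced_unwind == NULL) {
		dlclose(unwinder_handle);
		unwinder_handle = NULL;
		return (-1);
	}
	return (0);
}

/* Drop our reference once no thread can be unwinding any more. */
static void
unwinder_release(void)
{
	if (unwinder_handle != NULL) {
		dlclose(unwinder_handle);
		unwinder_handle = NULL;
		unwinder_forced_unwind = NULL;
	}
}
```

A caller would run something like unwinder_preload("/lib/libgcc_s.so.1") early in the main thread. dlclose() only unmaps the object when its reference count reaches zero, so if the library is also loaded as a regular dependency, unwinder_release() merely drops the extra reference.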

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

I have not looked at the rest of the changes yet; what I see already raises huge stop signs.

lib/libthr/thread/thr_rtld.c, line 178

How could the fact that the main thread is being destroyed affect the state of rtld locks?

If _thr_rwlock_unlock() reports an error, the lock was not unlocked. Decrementing the critical section count anyway is then wrong, because it would underflow the count and de facto leave the thread in a critical section forever.

That said, I find the claim itself, that _thr_rwlock_unlock() failed, quite suspicious. It must mean that the state of libthr was corrupted.
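The objection above can be illustrated with a toy model. All names here are hypothetical and none of this is libthr's code: the critical-section count is raised only when a lock is actually taken, so it should be lowered only when the lock is actually released. The sketch contrasts the behavior the review defends (leave only if the unlock succeeded) with the commit's proposal (leave unconditionally) on a release that has no matching acquire.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for a thread and an rtld lock; not libthr's types. */
struct toy_thread {
	int critical_count;
};

struct toy_lock {
	bool held;
};

/* Fails (returns -1) when the lock is not held, like a bogus unlock. */
static int
toy_unlock(struct toy_lock *l)
{
	if (!l->held)
		return (-1);
	l->held = false;
	return (0);
}

static void
toy_acquire(struct toy_thread *t, struct toy_lock *l)
{
	t->critical_count++;	/* Enter the critical section... */
	l->held = true;		/* ...then take the lock. */
}

/* Behavior the review defends: leave only if the unlock worked. */
static void
toy_release_checked(struct toy_thread *t, struct toy_lock *l)
{
	if (toy_unlock(l) == 0)
		t->critical_count--;
}

/* Behavior the second commit proposes: leave unconditionally. */
static void
toy_release_always(struct toy_thread *t, struct toy_lock *l)
{
	(void)toy_unlock(l);
	t->critical_count--;
}
```

After one balanced acquire/release, a spurious release leaves the count at 0 under the checked policy, but drives it to -1 under the unconditional one, after which every later enter/leave pair is off by one.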

libexec/rtld-elf/rtld.c, line 1050

As I understand it, this change is 'Release rtld_bind_lock() even if we decide to die. Other threads
might appreciate that.' In what way would other threads 'appreciate' this?

Note two things:

  • rtld_die() is just _exit(), so there are no blocking operations or memory stores on the execution path
  • this change is destructive, because the bind lock might be write-locked while a filter is being loaded. In that case the rtld state is not yet consistent, yet other threads waiting for the rtld bind lock would get access to it. The end result is UB, most likely the program corrupting itself or faulting with SIGSEGV etc., instead of a clean exit in rtld_die().
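The second point can be made concrete with another toy model (hypothetical names, not rtld's data structures, single-threaded so the outcome is deterministic): a loader updates two fields that must agree, and the write lock is what hides the intermediate state. While the lock stays held until _exit(), a waiting thread never gets in; if the fatal path drops the lock first, the waiting thread acquires it and observes the torn state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of state guarded by a write lock (not rtld's structures). */
struct toy_state {
	bool wlocked;	/* Write lock held by the loader. */
	int nobjs;	/* Object count... */
	int nlinked;	/* ...must equal the number of linked objects. */
};

/* Loader starts adding a filter: bumps the count, has not linked it yet. */
static void
loader_begin(struct toy_state *s)
{
	s->wlocked = true;
	s->nobjs++;
}

/* Fatal-path variant that drops the write lock before exiting. */
static void
die_releasing_lock(struct toy_state *s)
{
	s->wlocked = false;
}

/*
 * A waiting thread: returns false if it would still block on the lock;
 * otherwise reports through *consistent whether the invariant holds.
 */
static bool
reader_try(const struct toy_state *s, bool *consistent)
{
	if (s->wlocked)
		return (false);
	*consistent = (s->nobjs == s->nlinked);
	return (true);
}
```

In the model, reader_try() keeps returning false while the loader holds the lock; once die_releasing_lock() runs, it succeeds and reports the invariant nobjs == nlinked as broken.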

Mostly bogus. Sorry for the noise.