Page MenuHomeFreeBSD

libc, libthr: coordinate stubs for pthread_{suspend,resume}_all_np
AbandonedPublic

Authored by kevans on Oct 31 2024, 4:20 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 25, 8:36 PM
Unknown Object (File)
Wed, Jan 15, 6:50 PM
Unknown Object (File)
Tue, Jan 14, 4:08 PM
Unknown Object (File)
Sat, Jan 11, 7:53 AM
Unknown Object (File)
Sat, Jan 11, 4:51 AM
Unknown Object (File)
Dec 9 2024, 5:56 AM
Unknown Object (File)
Nov 30 2024, 11:57 AM
Unknown Object (File)
Nov 14 2024, 6:29 AM
Subscribers

Details

Summary

Our sanitizer StopTheWorld implementation needs to be able to freeze the
process so that it can analyze each thread, with the additional caveat that
it needs to be able to guarantee that none of the threads were in the middle
of dl_iterate_phdr() (or have sufficient locking to block it).

Break out pthread_{suspend,resume}_all_np so that we can just wrap our
StopTheWorld implementation in thread suspension to get them all to a point
that should be relatively safe. If libthr isn't linked into the process in
the first place, then nops are perfectly fine because we won't be calling
StopTheWorld() while iterating.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 60283
Build 57167: arc lint + arc unit

Event Timeline

lib/libc/gen/Symbol.map
461

This would not work; you need to export symbol with the same version both from libc and libthr.

Move libc symbols to the correct version

This revision is now accepted and ready to land.Oct 31 2024, 6:57 PM

Holding off on this one for now because I don't think it's really sufficient to fix my problem- I still have a deadlock because an application can take a (non-pthread, it seems) lock inside their dl_iterate_phdr() callback, the sanitizer ends up suspending the thread that holds that particular lock and then we will never be able to suspend the thread waiting inside dl_iterate_phdr().

Unless we'e OK with exporting some interface for sanitizers to take the bind lock (then suspend + drop the lock, resume after StopTheWorld) or otherwise ensure that dl_iterate_phdr() has been evacuated, I'm not sure how to solve this particular problem of the sanitizer needing to suspend everything and call dl_iterate_phdr() since we can't recurse on the bind lock.

We can export something like dl_iterate_phdr_locked() in private namespace, which would not take the phdr lock, if the problem is with it.

If the issue is with the bind lock, it is much harder. First, I do not want to export it in any form, because rtld locking is very much implementation detail. What I can export is some function, e.g. rtld_locked(void (*fn)(void *), void *arg) (in private namespace) which would call fn with all rtld locks held. Would it be useful?

In D47350#1081391, @kib wrote:

We can export something like dl_iterate_phdr_locked() in private namespace, which would not take the phdr lock, if the problem is with it.

I think this would be sufficient for lsan. The entire problem is that we fork off a 'tracer task' and want to dl_iterate_phdr() from it to process .data/.bss for scanning and we have no idea if any thread from the now-parent already had either of the prerequisite locks (since you need both phdr+bind) taken. The tracer task will remain single-threaded and won't personally try to dlopen() or anything, so it seems like it should be fine to totally subvert the locking?

If the issue is with the bind lock, it is much harder. First, I do not want to export it in any form, because rtld locking is very much implementation detail. What I can export is some function, e.g. rtld_locked(void (*fn)(void *), void *arg) (in private namespace) which would call fn with all rtld locks held. Would it be useful?

I did a probably naive hack job at pulling out a _dl_iterate_phdr_locked(), where we'll optionally drop the bind lock if some lockstate was passed in: https://people.freebsd.org/~kevans/rtld-iterate-locked.diff

It does exactly what I need it to do, if we can make it palatable for commit. I do note that the error == 0 conditional at the end of dl_iterate_phdr() is tautological; if we hit an error during traversal we'll return, so there's no path in which we don't execute that branch that I can imagine. I didn't touch it in this patch, though.

Failed Tests (7):
  LeakSanitizer-AddressSanitizer-x86_64 :: TestCases/leak_check_before_thread_started.cpp
  LeakSanitizer-AddressSanitizer-x86_64 :: TestCases/many_tls_keys_pthread.cpp
  LeakSanitizer-AddressSanitizer-x86_64 :: TestCases/swapcontext.cpp
  LeakSanitizer-Standalone-x86_64 :: TestCases/do_leak_check_override.cpp
  LeakSanitizer-Standalone-x86_64 :: TestCases/leak_check_before_thread_started.cpp
  LeakSanitizer-Standalone-x86_64 :: TestCases/many_tls_keys_pthread.cpp
  LeakSanitizer-Standalone-x86_64 :: TestCases/swapcontext.cpp


Testing Time: 225.69s

Total Discovered Tests: 126
  Unsupported: 46 (36.51%)
  Passed     : 73 (57.94%)
  Failed     :  7 (5.56%)

(edit: improved version at https://people.freebsd.org/~kevans/rtld-iterate-locked-v2.diff that doesn't expose the pointer to the lockstate arg, which would always be NULL for external callers)

Yes this is almost exactly what I mean by dl_iterate_phdr_locked().

The only think I need to note, dl_iterate_phdr_locked() is UB, you can get any behavior (practically: fault, infinite loop, or similar) from it.

Why abandon? This is useful in general.

In D47350#1084710, @kib wrote:

Why abandon? This is useful in general.

I wasn't sure if it was worth doing if I don't have a consumer in mind now that I have lsan off of it- I'm more than happy to go ahead and push it if you think it's still useful.

In D47350#1084710, @kib wrote:

Why abandon? This is useful in general.

I wasn't sure if it was worth doing if I don't have a consumer in mind now that I have lsan off of it- I'm more than happy to go ahead and push it if you think it's still useful.

IMO it does not make sense loosing this work.