Page MenuHomeFreeBSD

llvm: implement the bits missing for LSan on FreeBSD
Needs ReviewPublic

Authored by kevans on Feb 23 2024, 2:04 AM.

Details

Reviewers
emaste
dim
markj
Summary

FreeBSD is namely missing a StopTheWorld implementation, but there are other
odds and ends that need to be FreeBSD enabled to work.

The StopTheWorld implementation is generally based on the NetBSD one, which
is in turn based on the Linux one. We use rfork(RFPROC|RFMEM) to spawn off
a tracer thread that ptraces the target process and analyzes it.

Future work could perhaps use, e.g., a pthread with pthread_suspend_all_np,
but we don't really have a way at the moment to fetch a list of threads or
their registers that we currently obtain with PT_GETREGS.

StopTheWorld implementation and prior work done by Luoqi Chen (luoqi@),
with some portability fixes by kevans@.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 56175
Build 53063: arc lint + arc unit

Event Timeline

Oh, and I note that I've only tested this on amd64 and arm64 to make sure that both the rfork_thread and rfork paths work. LSan is incredibly useful when used in an ASan + libFuzzer combination.

Just be curious, should this be directly submitted to https://github.com/llvm/llvm-project/ ?

Yeah, it's fine to test this locally but it should really be submitted upstream. If this goes through changes upstream, due to comments, it's a bit of a burden to maintain downstream.

My hope was to get a sanity check on StopTheWorld, at least, beforehand since that's a FreeBSD implementation detail and it seems easier to collect reviews for that specifically here.

Have you tried running the lsan test suite with this change?

Have you tried running the lsan test suite with this change?

You would probably have to adjust the test suite anyway, because it won't run lsan tests on FreeBSD. I am 100% sure that upstream will first ask to add those tests :)

Have you tried running the lsan test suite with this change?

I'm working on it, but just trying to get the llvm test suite running has been... rough. I chose to redirect from your question to the tangentially related and still valuable: "Have you tried running the asan test suite with this change?" since we can know how much of the asan test suite has passed already.

After fumbling around and finally figuring out that I needed to configure llvm/ with -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" because some of the test prereqs are in llvm or clang, most of the tests that actually run fail because they're trying to do something like:

22:56 <@kevans91_>  70252 103012 Asan-aarch64-inline CALL  execve(0,0x81278f78,0x81278f98)
22:56 <@kevans91_>  70252 103012 Asan-aarch64-inline RET   execve -1 errno 9 Bad file descriptor

... and this is the realization that I just burned a not-insignificant chunk of time because this hasn't been upstreamed yet, three months later: https://cgit.freebsd.org/src/commit/?id=4c9a0adad18263ec8725d9bfc5f560c6ad1da8bd

The lsan tests reliably trigger the deadlock described above LockStuffAndStopTheWorld, so I can't really get any valid results from that one.

The asan tests see 9 regressions, but they all seem to stem from a single leak in libcxxrt. It allocates some thread-specific data here: https://github.com/libcxxrt/libcxxrt/blob/master/src/exception.cc#L420 -- but libthr won't call our dtor on normal program exit, so thread_cleanup() is never seen. Other implementations don't either, so we're not exactly unique in that regard.

Have you tried running the lsan test suite with this change?

I'm working on it, but just trying to get the llvm test suite running has been... rough. I chose to redirect from your question to the tangentially related and still valuable: "Have you tried running the asan test suite with this change?" since we can know how much of the asan test suite has passed already.

After fumbling around and finally figuring out that I needed to configure llvm/ with -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" because some of the test prereqs are in llvm or clang, most of the tests that actually run fail because they're trying to do something like:

22:56 <@kevans91_>  70252 103012 Asan-aarch64-inline CALL  execve(0,0x81278f78,0x81278f98)
22:56 <@kevans91_>  70252 103012 Asan-aarch64-inline RET   execve -1 errno 9 Bad file descriptor

... and this is the realization that I just burned a not-insignificant chunk of time because this hasn't been upstreamed yet, three months later: https://cgit.freebsd.org/src/commit/?id=4c9a0adad18263ec8725d9bfc5f560c6ad1da8bd

If you or @dim upload that change I can give it an approval and we can merge it.

Have you tried running the lsan test suite with this change?

I'm working on it, but just trying to get the llvm test suite running has been... rough. I chose to redirect from your question to the tangentially related and still valuable: "Have you tried running the asan test suite with this change?" since we can know how much of the asan test suite has passed already.

After fumbling around and finally figuring out that I needed to configure llvm/ with -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" because some of the test prereqs are in llvm or clang, most of the tests that actually run fail because they're trying to do something like:

22:56 <@kevans91_>  70252 103012 Asan-aarch64-inline CALL  execve(0,0x81278f78,0x81278f98)
22:56 <@kevans91_>  70252 103012 Asan-aarch64-inline RET   execve -1 errno 9 Bad file descriptor

... and this is the realization that I just burned a not-insignificant chunk of time because this hasn't been upstreamed yet, three months later: https://cgit.freebsd.org/src/commit/?id=4c9a0adad18263ec8725d9bfc5f560c6ad1da8bd

If you or @dim upload that change I can give it an approval and we can merge it.

It was committed past november, in https://github.com/llvm/llvm-project/commit/7440e4ed85aa9 , which was a fold-up of:

In D44030#1005624, @dim wrote:

Have you tried running the lsan test suite with this change?

I'm working on it, but just trying to get the llvm test suite running has been... rough. I chose to redirect from your question to the tangentially related and still valuable: "Have you tried running the asan test suite with this change?" since we can know how much of the asan test suite has passed already.

After fumbling around and finally figuring out that I needed to configure llvm/ with -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" because some of the test prereqs are in llvm or clang, most of the tests that actually run fail because they're trying to do something like:

22:56 <@kevans91_>  70252 103012 Asan-aarch64-inline CALL  execve(0,0x81278f78,0x81278f98)
22:56 <@kevans91_>  70252 103012 Asan-aarch64-inline RET   execve -1 errno 9 Bad file descriptor

... and this is the realization that I just burned a not-insignificant chunk of time because this hasn't been upstreamed yet, three months later: https://cgit.freebsd.org/src/commit/?id=4c9a0adad18263ec8725d9bfc5f560c6ad1da8bd

If you or @dim upload that change I can give it an approval and we can merge it.

It was committed past november, in https://github.com/llvm/llvm-project/commit/7440e4ed85aa9 , which was a fold-up of:

It also isn't in the current version in main, for some reason: https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp#L937

Hmm https://github.com/llvm/llvm-project/commit/691b12a2dcc12fa43517d23f2a9b6039616eebc8 moved this function out of the file, but reverted back to calling elf_aux_info()! That was really a mistake, I'll mention that upstream.

@devnexen_gmail.com I think you implemented https://github.com/llvm/llvm-project/commit/691b12a2dcc12fa43517d23f2a9b6039616eebc8, but the problem is that it specifcally did _not_ call elf_aux_info since that is intercepted, and might lead to segfaults.

In D44030#1005630, @dim wrote:

@devnexen_gmail.com I think you implemented https://github.com/llvm/llvm-project/commit/691b12a2dcc12fa43517d23f2a9b6039616eebc8, but the problem is that it specifcally did _not_ call elf_aux_info since that is intercepted, and might lead to segfaults.

Also, AT_EXECPATH doesn't return a pointer to the path, it copies it out to the buffer supplied:

...
The following values, defined in <sys/elf_common.h> can be
     requested (corresponding buffer sizes are specified in parenthesis):
...
     AT_EXECPATH   The path of executed program (MAXPATHLEN). This may not be
                   present if the process was initialized by fexecve(2) and
                   the namecache no longer contains the file's name.

Emphasis on 'buffer sizes are specified in parenthesis'