Page MenuHomeFreeBSD

Untangle jemalloc and mutexes initialization.
ClosedPublic

Authored by kib on Jan 27 2019, 4:51 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jan 19, 6:49 AM
Unknown Object (File)
Fri, Jan 17, 3:19 PM
Unknown Object (File)
Thu, Jan 16, 2:05 AM
Unknown Object (File)
Sat, Jan 11, 6:22 AM
Unknown Object (File)
Sat, Jan 11, 3:37 AM
Unknown Object (File)
Sat, Jan 11, 1:52 AM
Unknown Object (File)
Dec 17 2024, 11:33 AM
Unknown Object (File)
Dec 12 2024, 9:09 AM

Details

Summary

The need to use libc malloc(3) from some places in libthr always caused issues. For instance, per-thread key allocation was switched to use plain mmap(2) to get storage, because some third party mallocs used keys for implementation of calloc(3).

Even more important, libthr calls calloc(3) during initialization of pthread mutexes, and jemalloc uses pthread mutexes. jemalloc provides some way to both postpone the initialization, and to make initialization to use specialized allocator, but this is very fragile and often breaks. See the referenced PR for another example.

Add the small malloc implementation used by rtld, to libthr. Use it in thr_spec.c and for mutexes initialization. This avoids the issues with mutual dependencies between malloc and libthr in principle. The drawback is that some more allocations are not interceptable for alternate malloc implementations. There should be not too much memory use from this allocator, and the alternative, direct use of mmap(2) is obviously worse.

PR: 235211

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 22193

Event Timeline

Only disable cast alignment warning for malloc.c.

I went ahead and took the quite literal approach to turning Andrew's test case into an atf test and integrating it into stdlib tests: https://people.freebsd.org/~kevans/dynthr-test.diff -- I guess we could cut out 90% of it, but I'm not sure how much we care to.

I went ahead and took the quite literal approach to turning Andrew's test case into an atf test and integrating it into stdlib tests: https://people.freebsd.org/~kevans/dynthr-test.diff -- I guess we could cut out 90% of it, but I'm not sure how much we care to.

It just occurred to me that this was still in-progress from last night when I uploaded it; it minimally works to produce the failure, but I think I was going to throw another ATF_CHECK or two in.

I went ahead and took the quite literal approach to turning Andrew's test case into an atf test and integrating it into stdlib tests: https://people.freebsd.org/~kevans/dynthr-test.diff -- I guess we could cut out 90% of it, but I'm not sure how much we care to.

It just occurred to me that this was still in-progress from last night when I uploaded it; it minimally works to produce the failure, but I think I was going to throw another ATF_CHECK or two in.

Please create a review.

Two immediate notes: IMO it is _much_ better to split the file into two, so that you do not have to duplicate implicit build rules for the module. Second, it would be useful to add some comment explaining what do we test.

Also, the license probably should be spelled fully.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 29 2019, 10:40 PM
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state Needs Review.Jan 29 2019, 10:45 PM
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state Needs Review.Jan 29 2019, 10:46 PM
This revision was automatically updated to reflect the committed changes.
arichardson added inline comments.
head/lib/libthr/Makefile
61 ↗(On Diff #53407)

Unlikely to make a big code size difference, but couldn't we use the version from rtld when building a shared library?

head/lib/libthr/Makefile
61 ↗(On Diff #53407)

Do you mean exporting __crt_ symbols from rtld and sharing the allocator between rtld and libthr ?

I evaluated this option. The main argument against it is that rtld purpose is to do the dynamic linking and thats all. It should not provide any other runtime services. Of course this is somewhat blurry since e.g. rtld implements runtime part of TLS support, but it is significantly intertwined with dynamic libraries loading anyway. Secondary objection is that I would need to use rtld bind lock to synchronize libthr allocations.

Argument pro is that we would use the single arena for both rtld and libthr causing less fragmenation and other overhead..

So I decided to not over-complicate the setup.

This seems a good approach to make things work. Malloc typically uses mutexes, so it makes things much easier if mutexes do not use malloc.

The separate allocator may make the allocation pattern more like an array of mutexes, but this is likely a small effect and applications that deeply care about mutex performance shouldn't use pthread mutexes anyway (they should use parking_lot or a similar library that provides simple mutexes using 1 byte per mutex and a global contention resolution table).

Sharing the allocator between rtld and libthr may not be worth it. Apart from the complications and strange dependencies added by it, a substantial portion of the <2K savings in text size is going to be eaten by symbol names, GOT entries and PLT entries.