
libproc: implement symbol caching.
AbandonedPublic

Authored by rpaulo on Feb 17 2015, 2:53 AM.
Details

Reviewers
None
Group Reviewers
DTrace
Summary

For processes that loop frequently, like top(1), the measured cache hit
ratio was above 90%. For short-lived processes, like ls(1), the cache hit
ratio is around 50%.

While it would be possible to index the cache by GElf_Sym, it's
more effective to index it by address instead. The reason is that the
ELF symbol structure doesn't contain any information about VM mappings,
which is essential for lookups by address.
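
To illustrate what indexing by address can look like, here is a minimal sketch of a direct-mapped cache keyed on the runtime address. It is not the code from this revision: the names (symcache, symcache_lookup, symcache_insert) and the slot count and name length are hypothetical, chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYMCACHE_SLOTS   64	/* hypothetical cache size */
#define SYMCACHE_NAMELEN 64	/* hypothetical name buffer length */

struct symcache_entry {
	uintptr_t addr;			/* runtime address that was resolved */
	int       valid;		/* non-zero once the slot is filled */
	char      name[SYMCACHE_NAMELEN];
};

struct symcache {
	struct symcache_entry slots[SYMCACHE_SLOTS];
	unsigned long hits;
	unsigned long misses;
};

/* Map an address to a slot; each address has exactly one candidate slot. */
static size_t
symcache_slot(uintptr_t addr)
{
	return ((size_t)((addr >> 2) % SYMCACHE_SLOTS));
}

/* Return the cached name for addr, or NULL on a miss. */
const char *
symcache_lookup(struct symcache *c, uintptr_t addr)
{
	struct symcache_entry *e;

	assert(c != NULL);
	e = &c->slots[symcache_slot(addr)];
	if (e->valid && e->addr == addr) {
		c->hits++;
		return (e->name);
	}
	c->misses++;
	return (NULL);
}

/* Store (addr, name), overwriting whatever occupied the slot before. */
void
symcache_insert(struct symcache *c, uintptr_t addr, const char *name)
{
	struct symcache_entry *e;

	assert(c != NULL && name != NULL);
	e = &c->slots[symcache_slot(addr)];
	e->addr = addr;
	e->valid = 1;
	strncpy(e->name, name, sizeof(e->name) - 1);
	e->name[sizeof(e->name) - 1] = '\0';
}
```

A caller would try symcache_lookup() first and fall back to the full ELF lookup on NULL, then call symcache_insert() with the result. The hits and misses counters are the kind of bookkeeping that would produce hit ratios like the ones quoted above.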

Test Plan

Ran dtruss -s with and without this patch and confirmed that the output
was the same.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

rpaulo retitled this revision to "libproc: implement symbol caching".
rpaulo updated this object.
rpaulo edited the test plan for this revision.
rpaulo added a reviewer: DTrace.

We already read in the entire symbol table every time we do a lookup. Why not just pre-emptively cache the whole table whenever an object is loaded (during attach or after a dlopen)? If the ELF handle is kept open, you can build an array of symbols sorted by addr, and just use elf_strptr(3) to do name lookups instead of keeping truncated copies around. This will also make lookup-by-address quick, since a binary search can be used on the symbol array. It also makes it easy to fix an existing problem with symbol aliasing: currently, if a symbol has multiple aliases, we just return the first one we find (which is why pid$target::malloc:entry always returns "__malloc" as the probefunc instead of "malloc"). But if they're all grouped together, we can pick the "right" one more easily (e.g. use the symbol with fewer leading underscores). There are a few pid provider tests that fail because of this (see tst.weak1.c and tst.weak2.c).
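
The sorted-array idea in the comment above can be sketched as follows. This is only an illustration under stated assumptions: struct sym_ent, symtab_sort and symtab_lookup are hypothetical names, the table is a plain in-memory array rather than libelf data, and the "fewer leading underscores" tie-break is the heuristic suggested in the comment, applied at sort time so that the preferred alias comes first in each group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flattened symbol record; not a libproc or libelf type. */
struct sym_ent {
	uintptr_t   addr;	/* start address of the symbol */
	size_t      size;	/* size in bytes */
	const char *name;
};

static size_t
leading_underscores(const char *s)
{
	size_t n = 0;

	while (s[n] == '_')
		n++;
	return (n);
}

/* Order by address; among aliases, fewer leading underscores sorts first. */
static int
sym_cmp(const void *a, const void *b)
{
	const struct sym_ent *x = a, *y = b;
	size_t ux, uy;

	if (x->addr != y->addr)
		return (x->addr < y->addr ? -1 : 1);
	ux = leading_underscores(x->name);
	uy = leading_underscores(y->name);
	if (ux != uy)
		return (ux < uy ? -1 : 1);
	return (strcmp(x->name, y->name));
}

void
symtab_sort(struct sym_ent *syms, size_t n)
{
	qsort(syms, n, sizeof(*syms), sym_cmp);
}

/* Binary search for the symbol containing addr; NULL if none does. */
const struct sym_ent *
symtab_lookup(const struct sym_ent *syms, size_t n, uintptr_t addr)
{
	size_t lo = 0, hi = n, mid;

	assert(syms != NULL || n == 0);
	/* Find the first entry whose start address is greater than addr. */
	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (syms[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return (NULL);
	lo--;
	/* Step back to the first (preferred) alias at this address. */
	while (lo > 0 && syms[lo - 1].addr == syms[lo].addr)
		lo--;
	if (addr - syms[lo].addr >= syms[lo].size)
		return (NULL);
	return (&syms[lo]);
}
```

With a table containing both "__malloc" and "malloc" at the same address, a lookup inside that range returns the "malloc" entry, because the comparator places it ahead of its alias.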

lib/libproc/_libproc.h
39

This is kind of arbitrary. Large multithreaded programs (e.g. firefox) will probably result in a low hit/miss ratio.

43

This isn't really enough for C++ symbols; see r274637. At least there should be checks for truncation after the strlcpy() calls below.
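
A truncation check of the kind asked for above could look like the sketch below. strlcpy(3) returns the length of the source string, so a return value greater than or equal to the destination size means the copy was truncated. To keep the example buildable on systems without strlcpy(3), it uses a local stand-in (sketch_strlcpy, a hypothetical name) with the same semantics; on FreeBSD one would call strlcpy(3) directly. copy_symname is likewise a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with strlcpy(3) semantics: always NUL-terminates when
 * dstsize > 0 and returns strlen(src). */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t dstsize)
{
	size_t srclen = strlen(src);
	size_t n;

	if (dstsize != 0) {
		n = srclen < dstsize - 1 ? srclen : dstsize - 1;
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}

/* Copy a symbol name; return 0 on success, -1 if it was truncated. */
int
copy_symname(char *dst, size_t dstsize, const char *name)
{
	assert(dst != NULL && name != NULL);
	if (sketch_strlcpy(dst, name, dstsize) >= dstsize)
		return (-1);
	return (0);
}
```

On a -1 return the caller can decide whether to skip caching that symbol or fall back to an uncached lookup, rather than silently storing a truncated name.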

Well, we might not read the whole symbol table. We just read enough to find our symbol. If the symbol is at the end of the table, we have iterated the whole table like you said, but if it's in the beginning, it won't be as bad. I guess it depends on how many functions we end up resolving. libc.so has about 3096 symbols. I'm still wondering if it's ok to cache all of that in memory. Perhaps it's going to be less than 512k which would be ok even on ARM given the size of DTrace now. This would negatively impact the startup size, though.
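
As a rough check on the memory estimate above, the size of the symbol entries alone can be computed from the 64-bit ELF symbol layout, which is 24 bytes per entry. The sketch below uses a local mirror of that layout (sketch_elf64_sym, a hypothetical name) instead of a system header, and it deliberately ignores the string table and any per-entry cache overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 64-bit ELF symbol entry layout. */
struct sketch_elf64_sym {
	uint32_t      st_name;	/* offset into the string table */
	unsigned char st_info;
	unsigned char st_other;
	uint16_t      st_shndx;
	uint64_t      st_value;
	uint64_t      st_size;
};

static_assert(sizeof(struct sketch_elf64_sym) == 24,
    "64-bit ELF symbol entries are expected to be 24 bytes");

/* Bytes needed for nsyms symbol entries, excluding names. */
size_t
symtab_bytes(size_t nsyms)
{
	return (nsyms * sizeof(struct sketch_elf64_sym));
}
```

For the 3096 symbols mentioned above, symtab_bytes(3096) is 74304 bytes, roughly 73KB before counting symbol names.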

lib/libproc/_libproc.h
39

Yes, it's arbitrary but it could be easily changed based on further testing.

In D1863#5, @rpaulo wrote:

Well, we might not read the whole symbol table. We just read enough to find our symbol. If the symbol is at the end of the table, we have iterated the whole table like you said, but if it's in the beginning, it won't be as bad. I guess it depends on how many functions we end up resolving. libc.so has about 3096 symbols. I'm still wondering if it's ok to cache all of that in memory. Perhaps it's going to be less than 512k which would be ok even on ARM given the size of DTrace now. This would negatively impact the startup size, though.

libelf will copy the entire section contents into a malloc'ed buffer as soon as elf_getdata(3) is called for that section, even when the ELF handle is read-only. So as soon as libproc accesses a single symbol in a table, libelf makes a copy of the entire table. It seemed to me that one could just run qsort() on that, and cache symbols on a per-symbol-table basis rather than a per-symbol basis. This way there's no extra copying, and a binary search will perform better than a linked list for large applications, so this approach scales better.

So we already copy all of the symbol tables we touch. Based on the symbol table sizes in /usr/lib/debug, symbol tables average 16KB for /usr/bin, 6KB for /usr/sbin, 4KB for /usr/lib, 8KB for /lib and 21KB for /usr/local/lib.