Details

Reviewers

markj
emaste

Commits

rS357450: addr2line: Cache CU DIEs upon a successful address lookup.

Summary

Used RB_TREE to store cu info.
Translate function logic:
First lookup in cache
Then lookup in tree. If success update cache.
Then brute force. If success insert in tree and cache

Test Plan

Compared output with stock addr2line on 10000 random kernel addresses. The outputs are the same.
Compared performance with stock addr2line and gnu.
Memory usage on 10000 random addresses:
Tree: 77 MB
stock: 256 KB
GNU: 36 KB

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

tig_freebsdfoundation.org created this revision.Jan 29 2020, 8:55 PM

Herald added a subscriber: imp. · View Herald TranscriptJan 29 2020, 8:55 PM

tig_freebsdfoundation.org updated this revision to Diff 67510.Jan 30 2020, 2:04 PM

markj added inline comments.Jan 30 2020, 3:17 PM

contrib/elftoolchain/addr2line/addr2line.c
82 ↗	(On Diff #67510)	Why not add a DIE field to struct CU above? Otherwise you are duplicating the lowpc/hipc fields.
434 ↗	(On Diff #67510)	Do we need to keep the last_die cache anymore? Does it really help much?
518 ↗	(On Diff #67510)	I think this structure never gets freed? Before that kind of made sense since we were putting it in an (unused) hash table, but now it does not. That said, I think it makes sense for struct node and struct CU to be merged, in which case you would not want to free the CUs since they end up in the lookup tree.
525 ↗	(On Diff #67510)	I think we want to check for a match before doing any allocation, not after.
549 ↗	(On Diff #67510)	This label name doesn't really make sense to me: the code which does the cache insertion comes before the label.
599 ↗	(On Diff #67510)	On a cache hit, we will have cu == NULL here. Is that correct? Note that addr2line has -f and -i options that cause the (func \|\| inlines) check to be true. Make sure that this case still works.
818 ↗	(On Diff #67510)	This code should just be deleted.

I fixed the issues according to the comments. We're using CU as tree node now and only allocate CU when addr is in the CU's range. The code looks much cleaner now.

contrib/elftoolchain/addr2line/addr2line.c
525 ↗	(On Diff #67510)	Yes. We should just allocate cu here and add it to our tree.

markj added inline comments.Jan 31 2020, 4:11 PM

contrib/elftoolchain/addr2line/addr2line.c
120 ↗	(On Diff #67516)	As a matter of style I would move this above the #define of USAGE_MESSAGE. USAGE_MESSAGE is used by usage() below.
413 ↗	(On Diff #67516)	Extra newline.
433 ↗	(On Diff #67516)	Per style(9) variable declarations should be at the beginning of the function.
791 ↗	(On Diff #67516)	I don't really like the useless node. Instead of using it, can't we modify the lookup algorithm slightly to look at RB_LAST() if RB_NFIND() returns NULL?

I removed the useless node and now we check max node if nfind returns null. I guess useless node was indeed useless lol.

This is looking good.

contrib/elftoolchain/addr2line/addr2line.c
458 ↗	(On Diff #67558)	I would suggest putting this in a subroutine, culookup() or so.
543 ↗	(On Diff #67558)	I think the insertion should be done in the block where cu is allocated.
549 ↗	(On Diff #67558)	Again, do we really need the single-entry cache anymore?

tig_freebsdfoundation.org updated this revision to Diff 67570.Jan 31 2020, 6:49 PM

tig_freebsdfoundation.org updated this revision to Diff 67571.

Updated. Current code is passing my tests.

contrib/elftoolchain/addr2line/addr2line.c
549 ↗	(On Diff #67558)	I think we do because tree lookup is O(n) and it's O(1) if it's a single cache HIT. This is good for sequential lookups.

tig_freebsdfoundation.org marked an inline comment as not done.Jan 31 2020, 6:53 PM

markj added inline comments.Jan 31 2020, 7:01 PM

contrib/elftoolchain/addr2line/addr2line.c
549 ↗	(On Diff #67558)	I guess my real question is, does it make a difference in practice? That is, can you measure a significant benefit from it when input is sorted? I don't object to keeping it if so, but the goal here is really just to ensure that addr2line is not abysmally slow. A 1% improvement is not very important and not worth the extra complexity in my opinion.