Page MenuHomeFreeBSD

Load reloc modules (amd64, mips) faster by caching global syms (PR 192249)
ClosedPublic

Authored by cse_cem_gmail_com on Jan 29 2015, 3:25 PM.

Details

Summary

Add a special ELF SHN_ value (in the SHN_LOOS to SHN_HIOS range) to represent cached SHN_UNDEF globals. Cache the results of global lookup during relocatable object load.

Modify Dtrace to ignore these SHN_FBSD_CACHED symbols the same way it ignored SHN_UNDEF symbols before. I didn't find other users of SHN_UNDEF that might be negatively affected by this change.

Test Plan
time kldload foo.ko

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

cse_cem_gmail_com retitled this revision from to Load reloc modules (amd64, mips) faster by caching global syms (PR 192249).
cse_cem_gmail_com updated this object.
cse_cem_gmail_com edited the test plan for this revision. (Show Details)

When uploading diffs through the web can you please generate them with full context - e.g. git diff -U999999

Sorry, forgot on this one. I'll regenerate.

cse_cem_gmail_com edited edge metadata.

Add 9999 context, drop fbt_powerpc.c modification (it has since been deleted or merged with fbt.c).

Ping! Anything I can do to help further this along?

This is hack. Besides being a hack, I am not sure, how the hack is complete. You identified one place where the hack breaks things, but did you inspected all users of the module page tables ?

That said, I do not see module load being performance critical in any way.

In D1718#9, @kostikbel wrote:

This is hack.

So are modules on amd64, but that's not getting fixed any time soon. If amd64 kmods were properly linked, we could just do fast symbol hash lookups (solves our problem), and also we could then drop link_elf_obj.c (99% duplicate code with link_elf.c).

Besides being a hack, I am not sure, how the hack is complete. You identified one place where the hack breaks things, but did you inspected all users of the module page tables ?

Yes, we looked at all users that might be affected. Not sure what you mean by module page tables.

That said, I do not see module load being performance critical in any way.

Well, that's fine, but it affects us (and even zfs.ko). Priorities aside, we (FreeBSD) fix non-critical bugs, too.

In D1718#9, @kostikbel wrote:

This is hack.

So are modules on amd64, but that's not getting fixed any time soon. If amd64 kmods were properly linked, we could just do fast symbol hash lookups (solves our problem), and also we could then drop link_elf_obj.c (99% duplicate code with link_elf.c).

amd64 .o use for klds is good, while other architectures using .so are not so. amd64 modules do not have overhead of the PIC code, which is less important on amd64, but much more critical on i386, where PIC eats one register out of spare 6. The GOT redirection for non-static var access is paid everywhere. So the useful change is to convert rest of the arches to link_elf_obj.

Otherwise, you are trading run-time performance to the useless module load time optimization.

Besides being a hack, I am not sure, how the hack is complete. You identified one place where the hack breaks things, but did you inspected all users of the module page tables ?

Yes, we looked at all users that might be affected. Not sure what you mean by module page tables.

This should have been 'module symbol table', sorry. Fingers automatically typed the words I use most often :/.

That said, I do not see module load being performance critical in any way.

Well, that's fine, but it affects us (and even zfs.ko). Priorities aside, we (FreeBSD) fix non-critical bugs, too.

What bug does the patch fix ?

In D1718#11, @kostikbel wrote:

What bug does the patch fix ?

The bug where module loading is quite slow for modules with lots of symbols. For ZFS this is noticeable (a few seconds); for EFS at Isilon this is really really significant (30+ seconds of 100% CPU spinning around doing O(N^2) strcmp()s).

In D1718#11, @kostikbel wrote:

What bug does the patch fix ?

The bug where module loading is quite slow for modules with lots of symbols. For ZFS this is noticeable (a few seconds); for EFS at Isilon this is really really significant (30+ seconds of 100% CPU spinning around doing O(N^2) strcmp()s).

Ok. There are two possibilities:

  1. Introduce symbol cache used during the module load, and discarded afterward. You can use hash for the symbol names there, which would not help for the first lookup, but does provide more speedup for the duplicate lookups (more than your current patch).
  1. Do what you did, but revert the SHM_FBSD_CACHED brokeness after the linking is finished, so other places do not see this. In particular, the fbt.c change would be no longer needed, and I become much more convinced that other places are not broken.
In D1718#13, @kostikbel wrote:

Ok. There are two possibilities:

  1. Introduce symbol cache used during the module load, and discarded afterward. You can use hash for the symbol names there, which would not help for the first lookup, but does provide more speedup for the duplicate lookups (more than your current patch).

The current patch got us from 34.7s sys to 2.7s sys (kldload efs.ko) on the same system. I don't have cycles right now to try adding a hash soon (for the next month or so).

  1. Do what you did, but revert the SHM_FBSD_CACHED brokeness after the linking is finished, so other places do not see this. In particular, the fbt.c change would be no longer needed, and I become much more convinced that other places are not broken.

I was thinking of something like this. It seems pretty straightforward to implement. I'll update the patch to do so and drop fbt changes.

cse_cem_gmail_com edited edge metadata.

Updated to revert the SHN_FBSD_CACHED symtable changes after loading has completed, per (2) above.

sys/kern/link_elf_obj.c
893

Why do you only clean for the runtime loaded objects, and do not clean for the preloaded objects ?

IMO, the cleanup must be done much closer to the relocator, probably in relocate_file().

It would be nice to assert that kld_sx is exclusively owned while SHN_FBSD_CACHED hack is in the symbol table, but this might be not easy to implement.

sys/sys/elf_common.h
363

I think that the define deserves much more detailed explanation. Either there, or in the elf_obj_lookup(), to explicitly state the scope of the use, and the reasoning.

Move cleanup to the end of relocation; preloaded and runtime-loaded modules both go through this path.

Add clarifying comment about SHN_FBSD_CACHED usage, longevity, and intent.

(Actual:)

Move cleanup to the end of relocation; preloaded and runtime-loaded modules both go through this path.

Add clarifying comment about SHN_FBSD_CACHED usage, longevity, and intent.

Committed by kib@ in r281003. Thanks!

This revision is now accepted and ready to land.Apr 2 2015, 8:29 PM