They were introduced to take care of ifunc, but right now no architecture provides ifunc'ed variants. Since rtld uses memset extensively, this results in a pessimization. Should someone want to use ifunc here, they should provide a mandatory symbol (e.g., rtld_memset).
Probing with dtrace a binary which calls fexecve in a loop shows a roughly 4x reduction in time spent in memset on amd64.
Sizes seen when execing the program below:
memset: size 656
memset: size 656
memset: size 944
memset: size 656
memset: size 96
memset: size 240
memset: size 240
memset: size 208
memset: size 50624
memset: size 32
memset: size 56
CPU profile collected with:
dtrace -w -n 'profile:::profile-4999 /execname =="a.out"/ { @[usym(arg1)] = count(); } tick-5s { system("clear"); trunc(@, 40); printa("%40A %@16d\n", @); clear(@); }'
before:
ld-elf.so.1`matched_symbol     456
ld-elf.so.1`reloc_non_plt     1067
ld-elf.so.1`strcmp            1068
ld-elf.so.1`symlook_obj       1156
ld-elf.so.1`memset            1667
ld-elf.so.1`find_symdef       1675
after:
ld-elf.so.1`memset             460
ld-elf.so.1`matched_symbol     504
ld-elf.so.1`strcmp            1126
ld-elf.so.1`symlook_obj       1271
ld-elf.so.1`reloc_non_plt     1292
ld-elf.so.1`find_symdef       1659