Quoting from https://maskray.me/blog/2023-04-12-elf-hash-function: The System V Application Binary Interface (generic ABI) specifies the ELF object file format. When producing an output executable or shared object needing a dynamic symbol table (.dynsym), a linker generates a .hash section with type SHT_HASH to hold a symbol hash table. A DT_HASH tag is produced to hold the address of .hash. The function is supposed to return a value no larger than 0x0fffffff. Unfortunately, there is a bug. When unsigned long consists of more than 32 bits, the return value may be larger than UINT32_MAX. For instance, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, which is clearly unintended, as the function should behave the same way regardless of whether long represents a 32-bit integer or a 64-bit integer. Sponsored by: The FreeBSD Foundation
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
I read that thread. Wouldn't it be better/cleaner to change types to uint32_t, or even Elf32_Word?
sys/kern/link_elf.c | ||
---|---|---|
1480 ↗ | (On Diff #120191) | g is unused now? |
Still do not want to use Elf32Word?
BTW, what about re-indenting the function as the preparatory commit? You change more lines than keep intact.
sys/kern/link_elf.c | ||
---|---|---|
1474 ↗ | (On Diff #120193) | The last sentence from the comment is worth keeping as well, IMO. |
1479 ↗ | (On Diff #120193) | I mean, the h var type should be changed as well. |
- More kib feedback
- Reindent first (this diff is against the reindent, which will be committed independently first)
- edit libexec/rtld-elf/rtld.c instead of sys/kern/link_elf.c
Presumably we should make the same change in rtld.c and link_elf.c
Include both rtld and kernel.
I'd expect this to be three separate commits:
- reindent rtld
- rtld
- kernel
libexec/rtld-elf/rtld.c | ||
---|---|---|
1805–1806 | will include with the reindent commit |
LGTM. I made a post last night and did plan to make this FreeBSD change for my exercise, but you beat it to me :)
elf_hash((const unsigned char *)"ZZZZZW9p")) == 100000000. This is an all-ascii-alphanumeric identifier.
For extra fun, you may try ld -shared --hash-style=sysv and ensure 100000000 % nbucket != 0 (lld sets nbucket to the number of dynamic symbols) and see whether the dynamic symbol cannot be bound without this patch:)
For extra fun, you may try ld -shared --hash-style=sysv and ensure 100000000 % nbucket != 0 (lld sets nbucket to the number of dynamic symbols) and see whether the dynamic symbol cannot be bound without this patch:)
Indeed, I made a test object with two functions
void ZZZZZW9p(void) { } void fn(void) { }
which gives 3 dynamic symbols.
SysV hash section:
Hex dump of section '.hash': 0x00000210 03000000 03000000 01000000 00000000 ................ 0x00000220 02000000 00000000 00000000 00000000 ................
When linked with --hash-style=sysv, dlsym(dl, "ZZZZZW9p") returns null (i.e., fails). When linked wtih --hash-style=gnu it works. With this patch (D39517) applied to rtld it works for both hash styles.
it looks like bfd sets nbuckets=1 for my test object so isn't useful to create the problem.
Hex dump of section '.hash': 0x00000120 01000000 03000000 02000000 00000000 ................ 0x00000130 00000000 01000000 ........
but bfd added & 0xffffffff decades ago as mentioned in the blog - https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=32dfa85d9015feea4a06d423fe58f6eaf841456e