Page MenuHomeFreeBSD

rtld: fix SysV hash function overflow
ClosedPublic

Authored by emaste on Apr 12 2023, 2:10 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Nov 25, 1:04 AM
Unknown Object (File)
Oct 23 2024, 2:10 PM
Unknown Object (File)
Oct 18 2024, 4:35 AM
Unknown Object (File)
Oct 8 2024, 2:15 AM
Unknown Object (File)
Oct 4 2024, 10:05 AM
Unknown Object (File)
Oct 4 2024, 7:24 AM
Unknown Object (File)
Oct 1 2024, 8:51 PM
Unknown Object (File)
Sep 30 2024, 12:52 AM
Subscribers

Details

Summary
Quoting from https://maskray.me/blog/2023-04-12-elf-hash-function:

The System V Application Binary Interface (generic ABI) specifies the
ELF object file format. When producing an output executable or shared
object needing a dynamic symbol table (.dynsym), a linker generates a
.hash section with type SHT_HASH to hold a symbol hash table. A DT_HASH
tag is produced to hold the address of .hash.

The function is supposed to return a value no larger than 0x0fffffff.
Unfortunately, there is a bug. When unsigned long consists of more than
32 bits, the return value may be larger than UINT32_MAX. For instance,
elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns
0x100000002, which is clearly unintended, as the function should behave
the same way regardless of whether long represents a 32-bit integer or
a 64-bit integer.

Sponsored by:   The FreeBSD Foundation

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

emaste created this revision.

I read that thread. Wouldn't it be better/cleaner to change types to uint32_t, or even Elf32_Word?

sys/kern/link_elf.c
1479–1480

g is unused now?

Use 0x0fffffff to make it more obvious that this is 28 bits not 32.

Still do not want to use Elf32Word?

BTW, what about re-indenting the function as the preparatory commit? You change more lines than keep intact.

sys/kern/link_elf.c
1474

The last sentence from the comment is worth keeping as well, IMO.

1479

I mean, the h var type should be changed as well.

  • More kib feedback
  • Reindent first (this diff is against the reindent, which will be committed independently first)
  • edit libexec/rtld-elf/rtld.c instead of sys/kern/link_elf.c

Presumably we should make the same change in rtld.c and link_elf.c

libexec/rtld-elf/rtld.c
1805 ↗(On Diff #120196)
sys/kern/link_elf.c
1478
1479–1480

Include both rtld and kernel.
I'd expect this to be three separate commits:

  • reindent rtld
  • rtld
  • kernel
libexec/rtld-elf/rtld.c
1805 ↗(On Diff #120196)

will include with the reindent commit

This revision is now accepted and ready to land.Apr 12 2023, 3:45 PM

LGTM. I made a post last night and did plan to make this FreeBSD change for my exercise, but you beat it to me :)

elf_hash((const unsigned char *)"ZZZZZW9p")) == 100000000. This is an all-ascii-alphanumeric identifier.
For extra fun, you may try ld -shared --hash-style=sysv and ensure 100000000 % nbucket != 0 (lld sets nbucket to the number of dynamic symbols) and see whether the dynamic symbol cannot be bound without this patch:)

For extra fun, you may try ld -shared --hash-style=sysv and ensure 100000000 % nbucket != 0 (lld sets nbucket to the number of dynamic symbols) and see whether the dynamic symbol cannot be bound without this patch:)

Indeed, I made a test object with two functions

void ZZZZZW9p(void) { }
void fn(void) { }

which gives 3 dynamic symbols.

SysV hash section:

Hex dump of section '.hash':
  0x00000210 03000000 03000000 01000000 00000000 ................
  0x00000220 02000000 00000000 00000000 00000000 ................

When linked with --hash-style=sysv, dlsym(dl, "ZZZZZW9p") returns null (i.e., fails). When linked wtih --hash-style=gnu it works. With this patch (D39517) applied to rtld it works for both hash styles.

For extra fun, you may try ld -shared --hash-style=sysv and ensure 100000000 % nbucket != 0 (lld sets nbucket to the number of dynamic symbols) and see whether the dynamic symbol cannot be bound without this patch:)

Indeed, I made a test object with two functions

void ZZZZZW9p(void) { }
void fn(void) { }

which gives 3 dynamic symbols.

SysV hash section:

Hex dump of section '.hash':
  0x00000210 03000000 03000000 01000000 00000000 ................
  0x00000220 02000000 00000000 00000000 00000000 ................

When linked with --hash-style=sysv, dlsym(dl, "ZZZZZW9p") returns null (i.e., fails). When linked wtih --hash-style=gnu it works. With this patch (D39517) applied to rtld it works for both hash styles.

Did you tried linking the dso with ld.bfd?

it looks like bfd sets nbuckets=1 for my test object so isn't useful to create the problem.

Hex dump of section '.hash':
  0x00000120 01000000 03000000 02000000 00000000 ................
  0x00000130 00000000 01000000                   ........

but bfd added & 0xffffffff decades ago as mentioned in the blog - https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=32dfa85d9015feea4a06d423fe58f6eaf841456e