Page MenuHomeFreeBSD

Rewrite brk()/sbrk() to avoid using _end.
ClosedPublic

Authored by markj on Jun 4 2018, 2:03 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 25, 8:28 PM
Unknown Object (File)
Wed, Jan 22, 9:42 PM
Unknown Object (File)
Sun, Jan 19, 11:33 PM
Unknown Object (File)
Sun, Jan 19, 9:31 PM
Unknown Object (File)
Sun, Jan 19, 8:12 PM
Unknown Object (File)
Sat, Jan 11, 4:05 AM
Unknown Object (File)
Dec 23 2024, 1:54 PM
Unknown Object (File)
Dec 14 2024, 1:27 AM
Subscribers

Details

Summary

_end is a special symbol emitted by the static linker. It is supposed
to follow the .bss section in memory and is used by libc to set the
initial break (curbrk and minbrk). GNU ld and lld have differing
behaviour around this symbol, leading to incompatibilities when libc.so
and an executable using brk()/sbrk() are linked using different linkers.
For instance, libc.so may end up using its own internal definition of
_end instead of that of the executable. Some details are described in
PR 228574.

To avoid issues of compatibility, rewrite brk() and sbrk() to avoid
using _end. To accomplish this, we add a return value to the kernel's
break() system call and change its error handling slightly. Previously,
break() would return EINVAL if the new break address was smaller than
the address of the beginning of the data segment. With this change,
such an input has no effect. The syscall returns its notion of the
break address (always page-aligned), so break(0) can be used to query
the kernel for the current break address. I believe this change is
backwards-compatible. Further, note that brk() implementations
silently clamp the input break address to minbrk, so brk(0) behaves the
same now as it did before. The syscall itself is undocumented and not
exported by libc.

Before this change, the brk() and sbrk() libc functions were implemented
separately for each supported platform. This change consolidates the
implementations with a rewrite in C and adds a few simple tests. Both
brk() and sbrk() take care to initialize curbrk and minbrk upon first
use, incurring a penalty of one extra system call per process which uses
these interfaces. Given that brk() and sbrk() are rarely used today, I
think this is acceptable.

This change also adds sbrk.o to NOASM on all architectures. The sbrk()
system call is and has always been a no-op.

Test Plan

I added some basic regression tests.

A statically-linked executable using sbrk() still works on amd64.

The change appears to "fix" sbrk() in executables run using rtld direct exec.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 17007
Build 16875: arc lint + arc unit

Event Timeline

markj added reviewers: emaste, kib, brooks.
  • Remove brk.S from MDASM in a couple more places.
lib/libc/sys/brk.c
40

I suspect that if you use uintptr_t type for curbrk and minbrk, amount of casts would reduce.

sys/kern/syscalls.master
121

Wouldn't compat32 syscalls.master require change ?

sys/vm/vm_unix.c
88

Did you experimented with direct exec ? This expression returns rtld daddr, not the main binary. Might be, it is fine, after all, but it certainly changes the behavior of the code.

232

Does this need compat32 shims ?

markj marked 2 inline comments as done.Jun 4 2018, 5:16 PM
markj added inline comments.
sys/vm/vm_unix.c
88

I did some quick experiments. Without this change problems arise because, as you note, libc and the kernel's notion of curbrk differ. The executable's .bss comes before that of rtld in memory, so any attempt to extend the break with sbrk() results in EINVAL.

With this change, libc initializes curbrk to rtld's daddr, so libc and the kernel agree on the curbrk value and sbrk() functions normally.

232

I don't expect so, but I haven't tested yet. Similar to mmap(), I think we can just rely on the VM to return a 32-bit address.

  • Address some review feedback.
This revision is now accepted and ready to land.Jun 4 2018, 5:32 PM
This revision was automatically updated to reflect the committed changes.