Rewrite brk()/sbrk() to avoid using _end.
ClosedPublic
Actions

Authored by markj on Jun 4 2018, 2:03 PM.

Details

Reviewers

emaste
kib
brooks

Group Reviewers

manpages

Commits

rS334626: Reimplement brk() and sbrk() to avoid the use of _end.

Summary

_end is a special symbol emitted by the static linker. It is supposed
to follow the .bss section in memory and is used by libc to set the
initial break (curbrk and minbrk). GNU ld and lld have differing
behaviour around this symbol, leading to incompatibilities when libc.so
and an executable using brk()/sbrk() are linked using different linkers.
For instance, libc.so may end up using its own internal definition of
_end instead of that of the executable. Some details are described in
PR 228574.

To avoid issues of compatibility, rewrite brk() and sbrk() to avoid
using _end. To accomplish this, we add a return value to the kernel's
break() system call and change its error handling slightly. Previously,
break() would return EINVAL if the new break address was smaller than
the address of the beginning of the data segment. With this change,
such an input has no effect. The syscall returns its notion of the
break address (always page-aligned), so break(0) can be used to query
the kernel for the current break address. I believe this change is
backwards-compatible. Further, note that brk() implementations
silently clamp the input break address to minbrk, so brk(0) behaves the
same now as it did before. The syscall itself is undocumented and not
exported by libc.

Before this change, the brk() and sbrk() libc functions were implemented
separately for each supported platform. This change consolidates the
implementations with a rewrite in C and adds a few simple tests. Both
brk() and sbrk() take care to initialize curbrk and minbrk upon first
use, incurring a penalty of one extra system call per process which uses
these interfaces. Given that brk() and sbrk() are rarely used today, I
think this is acceptable.

This change also adds sbrk.o to NOASM on all architectures. The sbrk()
system call is and has always been a no-op.

Test Plan

I added some basic regression tests.

A statically-linked executable using sbrk() still works on amd64.

The change appears to "fix" sbrk() in executables run using rtld direct exec.

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 17001
Build 16869: arc lint + arc unit

Event Timeline

markj created this revision.Jun 4 2018, 2:03 PM

Harbormaster completed remote builds in B17001: Diff 43308.Jun 4 2018, 2:03 PM

markj edited the test plan for this revision. (Show Details)Jun 4 2018, 2:06 PM

markj added reviewers: emaste, kib, brooks.

Remove brk.S from MDASM in a couple more places.

Harbormaster completed remote builds in B17002: Diff 43309.Jun 4 2018, 2:08 PM

markj edited the test plan for this revision. (Show Details)Jun 4 2018, 2:11 PM

kib added inline comments.Jun 4 2018, 3:33 PM

lib/libc/sys/brk.c
40	I suspect that if you use uintptr_t type for curbrk and minbrk, amount of casts would reduce.
sys/kern/syscalls.master
121	Wouldn't compat32 syscalls.master require change ?
sys/vm/vm_unix.c
88	Did you experimented with direct exec ? This expression returns rtld daddr, not the main binary. Might be, it is fine, after all, but it certainly changes the behavior of the code.
232	Does this need compat32 shims ?

markj marked 2 inline comments as done.Jun 4 2018, 5:16 PM

markj added inline comments.

sys/vm/vm_unix.c
88	I did some quick experiments. Without this change problems arise because, as you note, libc and the kernel's notion of curbrk differ. The executable's .bss comes before that of rtld in memory, so any attempt to extend the break with sbrk() results in EINVAL. With this change, libc initializes curbrk to rtld's daddr, so libc and the kernel agree on the curbrk value and sbrk() functions normally.
232	I don't expect so, but I haven't tested yet. Similar to mmap(), I think we can just rely on the VM to return a 32-bit address.