Page MenuHomeFreeBSD

Rework the stack gap implementation
ClosedPublic

Authored by markj on Dec 30 2021, 10:41 PM.

Details

Summary

The previous implementation mapped a stack below the shared page and
inserted a randomly sized gap between PS_STRINGS and other process
metadata, below which the main stack stack begins. This has a couple of
problems:

  1. The size of the gap is included in the stack limit, which means that small RLIMIT_STACK values cannot be used.
  2. The gap is mapped, and thus mlockall(MCL_CURRENT) will wire it.

Fix the problem by moving the gap before PS_STRINGS. This means that
the gap region is unmapped and in particular is not covered by the stack
mapping. Thus RLIMIT_STACK no longer includes the size of the gap.

As a consequence, the location of PS_STRINGS is no longer fixed. If no
stack gap is configured then it is located in the same place as before.
Very old binaries depend on the location being fixed, but they predate
the introduction of any surviving 64-bit platform ports. Thus,
re-enable the stack gap only on 64-bit platforms (ASLR is not enabled
for elf32 by default anyway) and disallow enabling it for 32-bit
executables if COMPAT_43 is configured.

Since the PS_STRINGS location is now variable, add a field to struct
proc to store it. This field is copied on fork and initialized by image
activators.

Remove the newly KERN_STACKTOP sysctl and modify KERN_USRSTACK to return
the true top of the stack. This effectively reverts commits
78df56ccfcb4 ("libthr: Use kern.stacktop for thread stack calculation.")
and
a97d697122da ("kern_exec: Add kern.stacktop sysctl.")
The KERN_STACKTOP sysctl does not provide compatibility for old copies
of libthr.so in any case.

Modify setrlimit() to stop including the stack gap in RLIMIT_STACK, it
is no longer necessary to do so. Remove the vm_stkgap field from struct
vmspace, as it is no longer needed.

At this point we could go further and randomize more of the stack
address bits, but I didn't attempt it.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

[To write out something that was discussed elsewere]
I think that if we do not care about PS_STRINGS, and more generally, if we do not care about layout and placement of the execve data passed by kernel, except to be compatible enough with crt1.o to find argv/env/auxv, it makes sense to stop treating main thread' stack specially at all. The whole notion of 'gap' and split between execve data (strings) and stack proper can be abandoned. Just map the stack as non-fixed normal mapping subject to normal ASLR shuffling, when 'gap' control is enabled.

It should avoid stomping on the sbrk grow area automatically, and I do not see any other reason to keep it at top. On the other hand, it would greatly simplify things conceptually, and provide some code cleanup as well.

In D33704#765894, @kib wrote:

[To write out something that was discussed elsewere]
I think that if we do not care about PS_STRINGS, and more generally, if we do not care about layout and placement of the execve data passed by kernel, except to be compatible enough with crt1.o to find argv/env/auxv, it makes sense to stop treating main thread' stack specially at all. The whole notion of 'gap' and split between execve data (strings) and stack proper can be abandoned. Just map the stack as non-fixed normal mapping subject to normal ASLR shuffling, when 'gap' control is enabled.

I'm working on this, will post an updated patch soon(tm).

It should avoid stomping on the sbrk grow area automatically, and I do not see any other reason to keep it at top. On the other hand, it would greatly simplify things conceptually, and provide some code cleanup as well.

The wrinkle here is that the bounds of the data segment are not known at the time that exec_new_vmspace() is called.

The wrinkle here is that the bounds of the data segment are not known at the time that exec_new_vmspace() is called.

Stack can be allocated later. When loading PIE binary, ELF activator puts it into low half of the user address space, so there should be a plenty space for stack after the image is loaded (and the map is fully initialized, including limits and MAP_ASLR flag).

Fully randomize the stack:

  • Change the struct proc p_psstrings member to p_stacktop and introduce PROC_PS_STRINGS() to get the psstrings address for a particular process.
  • Remove exec_stackgap() and related machinery.
  • Move stack creation out of exec_new_vmspace(), defer it until the data segment bounds are fixed. Introduce exec_map_stack() and update image activators to use it.
  1. Updating D33704: Rework the stack gap implementation #
  2. Enter a brief description of the changes included in this update.
  3. The first line is used as subject, next lines as comment. #
  4. If you intended to create a new revision, use:
  5. $ arc diff --create
sys/kern/imgact_elf.c
211 ↗(On Diff #101415)

Why is it set to RD? First, COMPAT_43 is really meaningful for a.out binaries. Second, why disable toggling the stack_gap if 43 is configured?

I am not questioning default values.

221 ↗(On Diff #101415)

Should it be renamed? E.g. stack_loc_randomize

224 ↗(On Diff #101415)

This should be edited.

sys/sys/proc.h
706 ↗(On Diff #101415)

I am not sure that this is right. Consider rfork(RFMEM), these two processes cannot have different stacktop. So wouldn't it make sense to put it either to vmspace or to vm_map (I think vmspace is better).

I understand that the value is copied on fork, but for me it looks as a wrong location to keep it.

sys/kern/imgact_elf.c
221 ↗(On Diff #101415)

I think it should yes. We have a .pie_enable already, maybe .stack_enable?

markj added inline comments.
sys/kern/imgact_elf.c
211 ↗(On Diff #101415)

I was looking at i386_setup_lcall_gate(), which references the elf32 sysent, and I couldn't convince myself that it applied only to a.out binaries, so it seemed safer to simply disallow toggling entirely if COMPAT_43 is defined. I'm not sure about this.

221 ↗(On Diff #101415)

So I wonder about the purpose of having a separate sysctl. When is it useful to disable only stack randomization, aside from debugging? If debugging is the only reason, then I would add a new sysctl under the debug OID instead, in kern_exec.c.

Right now the value of stack_gap is actually ignored: exec_map_stack() only looks at MAP_ASLR and the image activator is not involved in the decision. To maintain this sysctl, we need some interface for image activators to indicate the desired policy, e.g., we could add a MAP_ASLR_STACK flag.

sys/sys/proc.h
706 ↗(On Diff #101415)

Putting it in vmspace makes sense to me, and it simplifies the MFC.

markj marked an inline comment as done.
  • Move p_stacktop to the vmspace.
  • For ELF, map the stack after rtld.
sys/kern/imgact_elf.c
211 ↗(On Diff #101415)

I think i386_setup_lcall_gate() references elf sysent due to combination of two issues:

  • a.out sysent can be in module
  • LDT entry should be preconfigured for the new LDT (might be this can be relaxed indeed by doing it for a.out images only, but I think it is a waste of efforts)

I would be curious to see a real case of elf i386 binary using lcall $7,$0 instead of of int $0x80. Anyway, this cannot be an argument for keeping the control read-only.

221 ↗(On Diff #101415)

The knob allows to ease diagnostic, for instance if the new stack randomization appears to have some rough edges. I do not like moving it under debug subtree, kern.elfXX.aslr provides good structuring and grouping of all related knobs.

sys/kern/imgact_elf.c
221 ↗(On Diff #101415)

Ok, I will pass it along with a new map flag then.

markj marked 3 inline comments as done.
  • Add MAP_ASLR_STACK, set it in the ELF image activator when appriproriate preserve it upon fork, clear it upon exec
  • Rename kern.elfN.aslr.stack_gap to just "stack", make it read-write always
  • Update security.7
  • Remove support for per-process handling of stack randomization, in procctl(2), proccontrol(1), elfctl(1)
  • Handle holes in the procctl command array by returning EINVAL
  • Fix an apparent bug where procctl(2) did not reject negative command indices

Restore non-ASLR stack gap

sys/kern/imgact_elf.c
1320 ↗(On Diff #101483)

Did you intended to write maxv = sv->sv_usrstack - lim_max(td, RLIMIT_STACK) ?

sys/kern/kern_exec.c
1201

stack_off can be int

1220

It is somewhat cryptic to use VMFS_ANY_SPACE. Why not check for MAP_ASLR_STACK?

sys/vm/vm_map.h
297 ↗(On Diff #101483)

I would note in comment that the stacktop is not page aligned.

markj added inline comments.
sys/kern/imgact_elf.c
1320 ↗(On Diff #101483)

I removed it on purpose, since with stack randomization there is no reason in principle to exclude that region(?). But it should at least be conditional on MAP_ASLR_STACK.

markj marked an inline comment as done.

Handle feedback

Manual page change looks understandable to me.

This revision is now accepted and ready to land.Jan 15 2022, 12:43 AM
share/man/man7/security.7
1077 ↗(On Diff #101487)

stack address randomization

sys/kern/imgact_elf.c
208 ↗(On Diff #101487)

stack address randomization

danfe added inline comments.
share/man/man7/security.7
1071 ↗(On Diff #101487)

I think the better (proper?) spelling is hyphenated: 64-bit (here and below).

markj added inline comments.
share/man/man7/security.7
1071 ↗(On Diff #101487)

FWIW here's an old LLVM review I just found again with some discussion of userland PS_STRINGS use
https://reviews.llvm.org/D19027