Details

Reviewers

mw
emaste
kib
dgr_semihalf.com
manu
vangyzen

Group Reviewers

manpages

Commits

rG1811c1e957ee: exec: Reimplement stack address randomization
rG758d98debec4: exec: Remove the stack gap implementation
rG706f4a81a812: exec: Introduce the PROC_PS_STRINGS() macro
rG5a8413e779a2: setrlimit: Remove special handling for RLIMIT_STACK with a stack gap
rG3fc21fdd5f8a: sysent: Add a sv_psstringssz field to struct sysentvec
rGf75b1ff6e569: Revert "libthr: Use kern.stacktop for thread stack calculation."
rG1544f5add8c7: Revert "kern_exec: Add kern.stacktop sysctl."

Summary

The previous implementation mapped a stack below the shared page and
inserted a randomly sized gap between PS_STRINGS and other process
metadata, below which the main stack stack begins. This has a couple of
problems:

The size of the gap is included in the stack limit, which means that small RLIMIT_STACK values cannot be used.
The gap is mapped, and thus mlockall(MCL_CURRENT) will wire it.

Fix the problem by moving the gap before PS_STRINGS. This means that
the gap region is unmapped and in particular is not covered by the stack
mapping. Thus RLIMIT_STACK no longer includes the size of the gap.

As a consequence, the location of PS_STRINGS is no longer fixed. If no
stack gap is configured then it is located in the same place as before.
Very old binaries depend on the location being fixed, but they predate
the introduction of any surviving 64-bit platform ports. Thus,
re-enable the stack gap only on 64-bit platforms (ASLR is not enabled
for elf32 by default anyway) and disallow enabling it for 32-bit
executables if COMPAT_43 is configured.

Since the PS_STRINGS location is now variable, add a field to struct
proc to store it. This field is copied on fork and initialized by image
activators.

Remove the newly KERN_STACKTOP sysctl and modify KERN_USRSTACK to return
the true top of the stack. This effectively reverts commits
78df56ccfcb4 ("libthr: Use kern.stacktop for thread stack calculation.")
and
a97d697122da ("kern_exec: Add kern.stacktop sysctl.")
The KERN_STACKTOP sysctl does not provide compatibility for old copies
of libthr.so in any case.

Modify setrlimit() to stop including the stack gap in RLIMIT_STACK, it
is no longer necessary to do so. Remove the vm_stkgap field from struct
vmspace, as it is no longer needed.

At this point we could go further and randomize more of the stack
address bits, but I didn't attempt it.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 43926
Build 40814: arc lint + arc unit

Event Timeline

markj created this revision.Dec 30 2021, 10:41 PM

Herald added a reviewer: manu. · View Herald TranscriptDec 30 2021, 10:41 PM

Herald added subscribers: jhibbits, andrew, imp. · View Herald Transcript

markj requested review of this revision.Dec 30 2021, 10:41 PM

Harbormaster completed remote builds in B43654: Diff 100769.Dec 30 2021, 10:41 PM

markj added a parent revision: D33703: exec: Simplify sv_copyout_strings implementations a bit.Dec 30 2021, 10:41 PM

kd added a subscriber: kd.Jan 10 2022, 9:54 AM

markj added a reviewer: vangyzen.Jan 12 2022, 4:14 PM

vangyzen added a subscriber: vangyzen.Jan 12 2022, 4:22 PM

dab added a subscriber: dab.Jan 12 2022, 4:25 PM

[To write out something that was discussed elsewere]
I think that if we do not care about PS_STRINGS, and more generally, if we do not care about layout and placement of the execve data passed by kernel, except to be compatible enough with crt1.o to find argv/env/auxv, it makes sense to stop treating main thread' stack specially at all. The whole notion of 'gap' and split between execve data (strings) and stack proper can be abandoned. Just map the stack as non-fixed normal mapping subject to normal ASLR shuffling, when 'gap' control is enabled.

It should avoid stomping on the sbrk grow area automatically, and I do not see any other reason to keep it at top. On the other hand, it would greatly simplify things conceptually, and provide some code cleanup as well.

akumar3_isilon.com added a subscriber: akumar3_isilon.com.Jan 12 2022, 6:09 PM

Rebase

Harbormaster completed remote builds in B43892: Diff 101382.Jan 12 2022, 6:44 PM

In D33704#765894, @kib wrote:

[To write out something that was discussed elsewere]
I think that if we do not care about PS_STRINGS, and more generally, if we do not care about layout and placement of the execve data passed by kernel, except to be compatible enough with crt1.o to find argv/env/auxv, it makes sense to stop treating main thread' stack specially at all. The whole notion of 'gap' and split between execve data (strings) and stack proper can be abandoned. Just map the stack as non-fixed normal mapping subject to normal ASLR shuffling, when 'gap' control is enabled.

I'm working on this, will post an updated patch soon(tm).

It should avoid stomping on the sbrk grow area automatically, and I do not see any other reason to keep it at top. On the other hand, it would greatly simplify things conceptually, and provide some code cleanup as well.

The wrinkle here is that the bounds of the data segment are not known at the time that exec_new_vmspace() is called.

In D33704#765960, @markj wrote:

The wrinkle here is that the bounds of the data segment are not known at the time that exec_new_vmspace() is called.

Stack can be allocated later. When loading PIE binary, ELF activator puts it into low half of the user address space, so there should be a plenty space for stack after the image is loaded (and the map is fully initialized, including limits and MAP_ASLR flag).

Fully randomize the stack:

Change the struct proc p_psstrings member to p_stacktop and introduce PROC_PS_STRINGS() to get the psstrings address for a particular process.
Remove exec_stackgap() and related machinery.
Move stack creation out of exec_new_vmspace(), defer it until the data segment bounds are fixed. Introduce exec_map_stack() and update image activators to use it.

Updating D33704: Rework the stack gap implementation #
Enter a brief description of the changes included in this update.
The first line is used as subject, next lines as comment. #
If you intended to create a new revision, use:
$ arc diff --create

Harbormaster completed remote builds in B43902: Diff 101415.Jan 13 2022, 4:09 PM

kib added inline comments.Jan 13 2022, 4:48 PM

sys/kern/imgact_elf.c
211	Why is it set to RD? First, COMPAT_43 is really meaningful for a.out binaries. Second, why disable toggling the stack_gap if 43 is configured? I am not questioning default values.
221	Should it be renamed? E.g. stack_loc_randomize
224	This should be edited.
sys/sys/proc.h
706 ↗	(On Diff #101415)	I am not sure that this is right. Consider rfork(RFMEM), these two processes cannot have different stacktop. So wouldn't it make sense to put it either to vmspace or to vm_map (I think vmspace is better). I understand that the value is copied on fork, but for me it looks as a wrong location to keep it.

emaste added inline comments.Jan 13 2022, 8:27 PM

sys/kern/imgact_elf.c
221	I think it should yes. We have a `.pie_enable` already, maybe `.stack_enable`?

markj marked an inline comment as done.Jan 14 2022, 5:23 PM

markj added inline comments.

sys/kern/imgact_elf.c
211	I was looking at i386_setup_lcall_gate(), which references the elf32 sysent, and I couldn't convince myself that it applied only to a.out binaries, so it seemed safer to simply disallow toggling entirely if COMPAT_43 is defined. I'm not sure about this.
221	So I wonder about the purpose of having a separate sysctl. When is it useful to disable only stack randomization, aside from debugging? If debugging is the only reason, then I would add a new sysctl under the debug OID instead, in kern_exec.c. Right now the value of `stack_gap` is actually ignored: exec_map_stack() only looks at MAP_ASLR and the image activator is not involved in the decision. To maintain this sysctl, we need some interface for image activators to indicate the desired policy, e.g., we could add a MAP_ASLR_STACK flag.
sys/sys/proc.h
706 ↗	(On Diff #101415)	Putting it in vmspace makes sense to me, and it simplifies the MFC.

Move p_stacktop to the vmspace.
For ELF, map the stack after rtld.

Harbormaster completed remote builds in B43926: Diff 101474.Jan 14 2022, 5:23 PM

kib added inline comments.Jan 14 2022, 5:53 PM

sys/kern/imgact_elf.c
211	I think i386_setup_lcall_gate() references elf sysent due to combination of two issues: a.out sysent can be in module LDT entry should be preconfigured for the new LDT (might be this can be relaxed indeed by doing it for a.out images only, but I think it is a waste of efforts) I would be curious to see a real case of elf i386 binary using lcall $7,$0 instead of of int $0x80. Anyway, this cannot be an argument for keeping the control read-only.
221	The knob allows to ease diagnostic, for instance if the new stack randomization appears to have some rough edges. I do not like moving it under debug subtree, kern.elfXX.aslr provides good structuring and grouping of all related knobs.

markj added inline comments.Jan 14 2022, 6:10 PM

sys/kern/imgact_elf.c
221	Ok, I will pass it along with a new map flag then.

Add MAP_ASLR_STACK, set it in the ELF image activator when appriproriate preserve it upon fork, clear it upon exec
Rename kern.elfN.aslr.stack_gap to just "stack", make it read-write always
Update security.7
Remove support for per-process handling of stack randomization, in procctl(2), proccontrol(1), elfctl(1)
Handle holes in the procctl command array by returning EINVAL
Fix an apparent bug where procctl(2) did not reject negative command indices

Herald added a reviewer: manpages. · View Herald TranscriptJan 14 2022, 7:57 PM

Harbormaster completed remote builds in B43928: Diff 101477.Jan 14 2022, 7:57 PM

Restore non-ASLR stack gap

Harbormaster completed remote builds in B43932: Diff 101483.Jan 14 2022, 8:56 PM

kib added inline comments.Jan 14 2022, 9:28 PM

sys/kern/imgact_elf.c
1334	Did you intended to write `maxv = sv->sv_usrstack - lim_max(td, RLIMIT_STACK)` ?
sys/kern/kern_exec.c
1196	stack_off can be int
1247	It is somewhat cryptic to use VMFS_ANY_SPACE. Why not check for MAP_ASLR_STACK?
sys/vm/vm_map.h
296	I would note in comment that the stacktop is not page aligned.

markj marked 4 inline comments as done.Jan 14 2022, 11:21 PM

markj added inline comments.

sys/kern/imgact_elf.c
1334	I removed it on purpose, since with stack randomization there is no reason in principle to exclude that region(?). But it should at least be conditional on MAP_ASLR_STACK.