Details

Reviewers

markj
alc
emaste

Commits

rS343964: Implement Address Space Layout Randomization (ASLR)
rS343607: Reserve a bit in the FreeBSD feature control note for marking the

Summary

With this change, randomization is applied to all non-fixed mappings.
By randomization I mean the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours
the superpage attributes.
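
A minimal sketch of the idea, not the patch itself: the random offset is taken as a whole number of pages (or superpages), so the requested alignment survives the randomization. The helper name `randomize_base`, its parameters and the page-size constants are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE      0x1000ULL    /* 4 KiB base page (assumed) */
#define SUPERPAGE_SIZE 0x200000ULL  /* 2 MiB superpage (assumed, amd64-like) */

/*
 * Hypothetical helper: shift `base` by a random number of alignment units.
 * `rnd` is a caller-supplied random value; `slots` is the number of
 * distinct positions allowed, which gives log2(slots) bits of entropy.
 */
static uint64_t
randomize_base(uint64_t base, uint32_t rnd, uint32_t slots, int superpage)
{
	uint64_t unit = superpage ? SUPERPAGE_SIZE : PAGE_SIZE;

	assert(slots != 0);
	assert(base % unit == 0);	/* caller passes an aligned base */
	return (base + (uint64_t)(rnd % slots) * unit);
}
```

Because the offset is always a multiple of the unit, a superpage-aligned request stays superpage aligned whatever random value is drawn.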

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation
prevents entropy injection. It is trivial to implement a strong mode
where failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.
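
The best-effort behaviour can be sketched on a toy address space of fixed-size slots. This is only an illustration under assumed names (`first_fit`, `place_best_effort`, `NSLOTS`); the real allocator works on vm_map entries, not on a boolean array.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 16	/* toy address space: 16 page-sized slots */

/* Return the first index >= start where `len` free slots fit, or -1. */
static int
first_fit(const bool used[NSLOTS], int start, int len)
{
	assert(len > 0 && start >= 0);
	for (int i = start; i + len <= NSLOTS; i++) {
		int j = 0;

		while (j < len && !used[i + j])
			j++;
		if (j == len)
			return (i);
	}
	return (-1);
}

/*
 * Best-effort placement: search from a random offset first, then fall
 * back to a plain first-fit from slot 0.  A "strong" mode would instead
 * report failure as soon as the randomized attempt fails.
 */
static int
place_best_effort(const bool used[NSLOTS], int rnd_start, int len)
{
	int at = first_fit(used, rnd_start, len);

	if (at == -1)
		at = first_fit(used, 0, len);
	return (at);
}
```

When the region above the random start is fragmented or full, the request still succeeds at a lower, non-random address, which is the trade-off described above.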

I have not yet fine-tuned the amount of entropy injected. Tuning it is
only a quantitative change that will not change the implementation.
The current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, the locality is implemented for
anonymous private mappings, which are automatically grouped until
fragmentation kicks in. The initial location for the anon group range
is, of course, randomized.
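
As a rough sketch of the grouping policy, assuming a toy structure in which only the field name anon_loc is taken from the review: the first anonymous mapping gets a randomized base and later ones are placed directly after it. The fallback taken once fragmentation kicks in is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Toy per-address-space state; hypothetical apart from the name anon_loc. */
struct toy_map {
	uint64_t anon_loc;	/* next address for grouped anon mappings, 0 = unset */
};

/*
 * Hypothetical grouping policy: the first anonymous mapping lands at a
 * randomized base, later ones are placed immediately after the previous
 * one so that neighbours can share page-table pages.
 */
static uint64_t
toy_map_anon(struct toy_map *m, uint64_t rnd_base, uint64_t len)
{
	uint64_t addr;

	assert(len != 0);
	if (m->anon_loc == 0)
		m->anon_loc = rnd_base;	/* randomized only once */
	addr = m->anon_loc;
	m->anon_loc = addr + len;	/* advance past the new mapping */
	return (addr);
}
```

Only the first call consumes randomness; every later mapping is adjacent to its predecessor, which is what keeps coalescing and superpage promotion possible.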

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
the small AS architectures (funny that 32bits is considered small).
This is tied with the question of following an application's hint
about the mmap(2) base address. Testing shows that ignoring the hint
does not affect the function of common applications, but I would expect
more demanding code could break. By default sbrk is preserved and mmap
hints are satisfied, which can be changed by using the
kern.elf{32,64}.aslr_care_sbrk sysctl.

Stack gap, W^X, shared page randomization, KASLR and other techniques
are explicitly out of scope of this work.

The paxtest results for the run with the patch applied and aggressively
tuned can be seen at https://www.kib.kiev.ua/kib/aslr/paxtest.log .
For comparison, the run on Fedora 23 on the same machine is at
https://www.kib.kiev.ua/kib/aslr/fedora.log .

ASLR is enabled on a per-ABI basis, and currently it is only enabled on
native i386 and amd64 (including compat 32bit) ABIs. I expect to test
and enable ASLR for armv6 and arm64 as well, later.

The procctl(2) control for ASLR is implemented, but I have not provided
a userspace wrapper around the syscall. In fact, the most reasonable
control needed is per-image and not per-process, but we have no
tradition to put the kernel-read attributes into the extattrs of binary,
so I am still pondering that part and this also explains the non-written
tool.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline


Add parsing of the feature control note both to the kernel loader and to the dynamic linker.
Allocate a bit in the feature control mask for disabling ASLR for the image.

Harbormaster completed remote builds in B21141: Diff 50870. Nov 21 2018, 11:58 AM

Update after the feature control note parsing bits were merged.

Harbormaster completed remote builds in B21186: Diff 51030. Nov 23 2018, 11:39 PM

markj added inline comments. Nov 24 2018, 10:30 PM

sys/vm/vm_map.c
1648 ↗	(On Diff #50504)	I think you need to initialize anon_loc to 0 in _vm_map_init() for this to work as intended? Even then, when curr_min_addr == 0, the amount of randomization applied to the initial anon mapping is quite small. For PIEs, libraries are loaded after the (random) base load address, but otherwise, the set of possible initial addresses is quite small.

kib marked an inline comment as done. Nov 25 2018, 10:05 AM

kib added inline comments.

sys/vm/vm_map.c
1648 ↗	(On Diff #50504)	I fixed several more bugs with anon_loc, e.g. copying it on fork. Also I added explicit setting of anon_loc on execution of the ELF binary in 'hard' mode, similar to the interpreter base address selection. I am not sure what you mean by the amount of randomization. Either rnd is applied or not; if it is applied, then some amount of entropy is guaranteed.

Some fixes for anon coalescing.

Rework selection of coalescing base address after failure in vm_map_find(), trying to randomize it first.
Fix map init and fork by zeroing and copying anon_loc.
Apply same randomization to anon_loc as for the interpreter if ASLR is enabled.

Harbormaster completed remote builds in B21196: Diff 51086. Nov 25 2018, 10:09 AM

markj added inline comments. Nov 25 2018, 7:00 PM

sys/vm/vm_map.c
1614 ↗	(On Diff #51086)	IMO it would be clearer if you rename "anon" to "coalesce".
1683 ↗	(On Diff #51086)	I guess you can just write `anon = update_anon` instead.
1700 ↗	(On Diff #51086)	Does it make sense to update anon_loc if `find_space == VMFS_NO_SPACE`?
1648 ↗	(On Diff #50504)	I mean that if the curr_min_addr is not randomized, the amount of entropy added is quite small. In the latest version this is still a problem for non-anonymous mappings in non-PIE binaries: the starting min address is constant (vm_daddr + lim(RLIMIT_DATA)), so the load address of libc.so, for example, can be guessed without much work. I am not sure if this is really a significant problem when the executable's address is not randomized, however.

kib marked 4 inline comments as done. Nov 25 2018, 7:43 PM

kib added inline comments.

sys/vm/vm_map.c
1648 ↗	(On Diff #50504)	I am still not sure about this. Do you mean that the amount of entropy allowed by the aslr_pages_rnd_XXX arrays is too small?

Rename anon to coalesce.
Simplify restart when coalesce failed. Inline the last helper.
goto done is never done for KERN_SUCCESS.
Do not update anon_loc for VMFS_NO_SPACE.

Harbormaster completed remote builds in B21200: Diff 51092. Nov 25 2018, 7:45 PM

markj added inline comments. Nov 25 2018, 8:36 PM

sys/vm/vm_map.c
1648 ↗	(On Diff #50504)	Indeed, it does not provide nearly as much entropy as the initial randomization of et_dyn_addr for PIEs or anon_loc. Consider that libc.so is mapped with VMFS_OPTIMAL_SPACE, so we will set addr += (arc4random() % 0x10) * 0x200000; For a non-PIE on amd64 this means that libc.so will get loaded somewhere in [0x800000000, 0x800200000], so the entropy added is quite minimal. PIEs do not have this problem.
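
To put a number on "quite minimal", reading the expression as a random slot index in [0, 0x10) multiplied by the 0x200000 superpage size: there are only 0x10 possible load addresses. The helper `entropy_bits` below is a hypothetical illustration, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Whole bits of entropy offered by `slots` equally likely positions. */
static unsigned
entropy_bits(uint64_t slots)
{
	unsigned bits = 0;

	assert(slots != 0);
	while (slots > 1) {
		slots >>= 1;
		bits++;
	}
	return (bits);
}
```

With 0x10 slots this gives 4 bits, so a guess at the libc.so load address in a non-PIE binary succeeds after at most 16 attempts.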

markj added inline comments. Nov 25 2018, 10:04 PM

sys/vm/vm_map.c
1658 ↗	(On Diff #51092)	Don't we need to reset curr_min_addr here too?

kib marked 2 inline comments as done. Nov 25 2018, 11:09 PM

kib added inline comments.

sys/vm/vm_map.c
1658 ↗	(On Diff #51092)	The intent is to make two normal passes without coalescing. Second pass resets curr_min_addr.
1648 ↗	(On Diff #50504)	Yes, this is how I want to keep it for now, disturbing the normal layout as little as possible for the PoC. On the other hand, since the PIE base, the ld.elf load address, and now the initial anon base are already 'hard' randomized, maybe it indeed does not make sense to keep that part of the entropy low. I think we will see after another exp run.

markj added inline comments. Nov 26 2018, 1:58 PM

sys/vm/vm_map.c
1589 ↗	(On Diff #51092)	I noticed that we are coalescing mappings in pipe_map. Is there any advantage to be gained by doing this? As a downside, I think the coalescing increases page table usage, especially since most pipe_map mappings are the same size.
1658 ↗	(On Diff #51092)	mm, we reset curr_min_addr only if en_aslr is set though.

Disable coalescing on submaps.
Remove bogus try reset on retry with coalescing disabled, since curr_min_addr recalculation depends on try == 2.

Harbormaster completed remote builds in B21230: Diff 51226. Nov 27 2018, 6:38 PM

markj added inline comments. Dec 3 2018, 8:39 PM

sys/vm/vm_map.c
1613 ↗	(On Diff #51226)	"When creating an anonymous mapping, try clustering with an existing anonymous mapping first."
1617 ↗	(On Diff #51226)	The text is a bit misleading since coalesce == false doesn't imply that coalescing failed. How about: "We make up to two attempts to find address space for a given find_space value. The first attempt may apply randomization or may cluster with an existing anonymous mapping. If this first attempt fails, perform a first-fit search of the available address space."
1651 ↗	(On Diff #51226)	Why is it necessary to set curr_min_addr here? We know try == 1, so after following the goto we will assign to curr_min_addr again.
2003 ↗	(On Diff #51226)	MAP_IS_SUB_MAP, for consistency with MAP_ENTRY_IS_SUB_MAP?
2028 ↗	(On Diff #51226)	Did you mean to clear the flag here?
1493 ↗	(On Diff #50504)	Sorry for being indecisive. Thinking some more, I think "clustering" actually makes more sense than "coalescing." Coalescing is the process of bringing together multiple entities that were previously separate, but in this case, the anonymous mappings are not separate to begin with.

Reword comments.
Rename variable and symbol.
Remove dup assignment.

Harbormaster completed remote builds in B21342: Diff 51553. Dec 3 2018, 9:15 PM

I suspect that the anon clustering will interact suboptimally with the jemalloc behaviour discussed in D16501 and elsewhere. In particular, jemalloc will unmap small regions of the address space, leaving holes. With clustering, those holes won't be reused since we no longer perform a first-fit search. IMO it would be worth reconsidering how anon_loc works; rather than advancing it after each successful clustering, maybe it should be constant after the initialization to a non-zero value, so that we attempt to fill holes with new mappings before extending the clustered region further. I do not think this needs to be done prior to commit though.
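
The difference between the two anon_loc policies discussed here can be sketched on a toy slot array. Everything in it (`toy_as`, `toy_mmap`, `toy_munmap`, the `advance` flag) is a hypothetical model of the behaviour described in the comment, not the vm_map code.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 32	/* toy address space */

struct toy_as {
	bool used[NSLOTS];
	int anon_loc;	/* slot where the next clustered search starts */
};

/* First index >= start where `len` free slots fit, or -1. */
static int
toy_find(const struct toy_as *as, int start, int len)
{
	for (int i = start; i + len <= NSLOTS; i++) {
		int j = 0;

		while (j < len && !as->used[i + j])
			j++;
		if (j == len)
			return (i);
	}
	return (-1);
}

/* Map `len` slots; if `advance` is set, move anon_loc past the mapping. */
static int
toy_mmap(struct toy_as *as, int len, bool advance)
{
	int at;

	assert(len > 0);
	at = toy_find(as, as->anon_loc, len);
	if (at == -1)
		return (-1);
	for (int j = 0; j < len; j++)
		as->used[at + j] = true;
	if (advance)
		as->anon_loc = at + len;
	return (at);
}

/* Free `len` slots starting at `at`, leaving a hole. */
static void
toy_munmap(struct toy_as *as, int at, int len)
{
	for (int j = 0; j < len; j++)
		as->used[at + j] = false;
}
```

In this model, with `advance` set, a hole left by toy_munmap() below anon_loc is skipped by the next mapping; with a constant anon_loc the search starts at the same base every time and the hole is filled first.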

sys/vm/vm_map.c
1634 ↗	(On Diff #51553)	"... or to cluster with an existing mapping."
1644 ↗	(On Diff #51553)	I'd consider calling this a "gap" instead, here and in the code (instead of "preserve").

preserve->gap.
Comment update.

Harbormaster completed remote builds in B21344: Diff 51555. Dec 3 2018, 9:59 PM

markj added inline comments. Dec 3 2018, 10:01 PM

sys/vm/vm_map.c
1589 ↗	(On Diff #51555)	I don't think we should update anon_loc if max_addr is specified (e.g., MAP_32BIT was passed).

Disable clustering if map is limited by max_addr.

Harbormaster completed remote builds in B21349: Diff 51561. Dec 3 2018, 11:54 PM

markj added inline comments. Dec 15 2018, 8:03 PM

sys/kern/imgact_elf.c
994 ↗	(On Diff #51561)	"Decide whether to"
sys/vm/vm_map.c
1652 ↗	(On Diff #51561)	I think it would be worth adding a counter for vm_map_findspace() failures, at least for the en_aslr case.

kib marked 2 inline comments as done. Dec 16 2018, 2:23 AM

kib added inline comments.

sys/vm/vm_map.c
1652 ↗	(On Diff #51561)	I added the counter for try == 2 restarts. IMO it is of limited usefulness because it is global, but I do not think it is worth adding the per-vmspace counters and the whole required infrastructure for it.

Grammar.
Add global restart counter.

Harbormaster completed remote builds in B21596: Diff 52070. Dec 16 2018, 2:23 AM

rozhuk.im-gmail.com added a subscriber: rozhuk.im-gmail.com. Jan 23 2019, 9:33 PM

emaste added inline comments. Jan 25 2019, 9:39 PM

sys/kern/imgact_elf.c
147 ↗	(On Diff #50504)	I don't follow what you mean by the "entire executable image."
sys/sys/elf_common.h
765–766 ↗	(On Diff #52070)	For ASLR having only opt-out is reasonable, IMO. For other bits (max_prot, W^X, etc.) initially we probably want both opt-in and opt-out, as there may be some time before we can enable features by default.
sys/vm/vm_map.c
1493 ↗	(On Diff #50504)	@markj are you suggesting even committing the coalescing separately?
sys/vm/vm_map.h
216 ↗	(On Diff #52070)	Let's comment this here too

Update the patch with the fixes made after Peter's testing.

Harbormaster completed remote builds in B22275: Diff 53435. Jan 30 2019, 7:13 PM

emaste added inline comments. Jan 30 2019, 8:11 PM

sys/sys/elf_common.h
766 ↗	(On Diff #53435)	Discussed on IRC, suggest either `NT_FREEBSD_FCTL_ASLR_DIS` or `NT_FREEBSD_FCTL_ASLR_DISABLE` - doesn't matter so much for this in isolation but want something that will have a regular pattern when we add `MAX_PROT` and other feature bits.

Rename the bit.

Harbormaster completed remote builds in B22279: Diff 53440. Jan 30 2019, 8:16 PM

emaste removed a reviewer: emaste. Jan 30 2019, 8:22 PM

emaste added inline comments. Jan 30 2019, 8:26 PM

sys/sys/elf_common.h
766 ↗	(On Diff #53440)	Flag name LGTM

emaste added a reviewer: emaste. Jan 30 2019, 9:13 PM

emaste added inline comments. Jan 31 2019, 1:25 AM

sys/kern/imgact_elf.c
142 ↗	(On Diff #53440)	"enable" is more common in the tree than "enabled" (27 to 3 on my laptop) and IMO preferable. Also, should we have a `kern.elfN.aslr.*` tree? So e.g. `kern.elf64.aslr.enable`, perhaps `kern.elf64.aslr.pie_enable`?

Tweak sysctls.

Harbormaster completed remote builds in B22288: Diff 53453. Jan 31 2019, 2:04 AM

This revision was not accepted when it landed; it landed in state Needs Review. Jan 31 2019, 3:45 PM

Closed by commit rS343607: Reserve a bit in the FreeBSD feature control note for marking the (authored by kib).

This revision was automatically updated to reflect the committed changes.

kib reopened this revision. Jan 31 2019, 4:25 PM

Regen patch after the bit definition was committed.

Harbormaster completed remote builds in B22297: Diff 53480. Jan 31 2019, 4:32 PM

gor_clogic.com.ua added a subscriber: gor_clogic.com.ua. Feb 7 2019, 3:57 PM

This revision was not accepted when it landed; it landed in state Needs Review. Feb 10 2019, 5:19 PM

Closed by commit rS343964: Implement Address Space Layout Randomization (ASLR) (authored by kib).

This revision was automatically updated to reflect the committed changes.

dougm added a subscriber: dougm. Mar 23 2019, 1:45 AM

dougm added inline comments.

head/sys/vm/vm_map.c
1665	If max_addr != 0, you've guaranteed that addr + length <= max_addr, but after this modification to addr, the guarantee won't hold and you may return an address beyond max_addr. Or so it seems to me.

kib added inline comments. Mar 23 2019, 9:49 AM

head/sys/vm/vm_map.c
1665	You mean, in the situation where vm_map_maxaddr(map) > max_addr. Please see D19688.
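
The concern raised about line 1665 can be illustrated with a small bounds check. This is a sketch under assumptions: `fits_after_gap` and its parameters are hypothetical, and it does not show how D19688 resolves the issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: after a random gap is added to `addr`, does the
 * range [addr + gap, addr + gap + length) still respect `max_addr`?
 * max_addr == 0 means "no limit", as in the comment above.
 */
static bool
fits_after_gap(uint64_t addr, uint64_t gap, uint64_t length,
    uint64_t max_addr)
{
	uint64_t start, end;

	assert(length != 0);
	start = addr + gap;
	if (start < addr)		/* wrapped around */
		return (false);
	end = start + length;
	if (end < start)		/* wrapped around */
		return (false);
	return (max_addr == 0 || end <= max_addr);
}
```

A mapping that fits exactly below max_addr before the gap is applied no longer fits once any non-zero gap is added, which is the situation described in the comment.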

akumar3_isilon.com added a subscriber: akumar3_isilon.com. Oct 12 2021, 3:13 PM

ASLR
Closed Public
Revision Contents
Changeset List

Diff 53745

head/sys/amd64/amd64/elf_machdep.c

head/sys/arm/arm/elf_machdep.c

head/sys/compat/freebsd32/freebsd32_misc.c

head/sys/compat/ia32/ia32_sysvec.c

head/sys/i386/i386/elf_machdep.c

head/sys/kern/imgact_elf.c

head/sys/kern/kern_exec.c

head/sys/kern/kern_fork.c

head/sys/kern/kern_procctl.c

head/sys/sys/imgact.h

head/sys/sys/proc.h

head/sys/sys/procctl.h

head/sys/sys/sysent.h

head/sys/vm/vm_map.h

head/sys/vm/vm_map.c

head/usr.bin/proccontrol/proccontrol.c
