
ASLR
ClosedPublic

Authored by kib on Mar 10 2016, 3:55 PM.

Details

Summary

With this change, randomization is applied to all non-fixed mappings.
By randomization I mean the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours
the superpage attributes.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation
prevents entropy injection. It is trivial to implement a strong mode
where failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.
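
The best-effort behaviour described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the kernel code: the address space is a small array of page flags, and the names range_free(), first_fit() and place_best_effort() are hypothetical. The random value is passed in as a parameter so that the behaviour is reproducible.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if pages [start, start + len) exist and are all free. */
static bool
range_free(const bool *used, size_t npages, size_t start, size_t len)
{
	if (len > npages || start > npages - len)
		return (false);
	for (size_t i = start; i < start + len; i++) {
		if (used[i])
			return (false);
	}
	return (true);
}

/* Lowest start of a free run of len pages, or -1 if there is none. */
static long
first_fit(const bool *used, size_t npages, size_t len)
{
	for (size_t start = 0; start + len <= npages; start++) {
		if (range_free(used, npages, start, len))
			return ((long)start);
	}
	return (-1);
}

/*
 * Hypothetical best-effort placement: shift the first-fit result by
 * (rnd % rnd_pages) pages.  If the shifted range is occupied or out of
 * bounds, fall back to the plain first-fit result instead of failing.
 * *randomized reports which path was taken.
 */
static long
place_best_effort(const bool *used, size_t npages, size_t len,
    size_t rnd_pages, unsigned long rnd, bool *randomized)
{
	long base = first_fit(used, npages, len);
	size_t cand;

	*randomized = false;
	if (base < 0 || rnd_pages == 0)
		return (base);
	cand = (size_t)base + (size_t)(rnd % rnd_pages);
	if (range_free(used, npages, cand, len)) {
		*randomized = true;
		return ((long)cand);
	}
	return (base);
}
```

A strong mode, in this model, would return -1 where the sketch falls back to base.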

I have not yet fine-tuned the amount of entropy injected. Tuning it is
only a quantitative change that will not change the implementation.
The current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, the locality is implemented for
anonymous private mappings, which are automatically grouped until
fragmentation kicks in. The initial location for the anon group range
is, of course, randomized.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
the small AS architectures (funny that 32bits is considered small).
This is tied with the question of following an application's hint
about the mmap(2) base address. Testing shows that ignoring the hint
does not affect the function of common applications, but I would expect
more demanding code could break. By default sbrk is preserved and mmap
hints are satisfied, which can be changed by using the
kern.elf{32,64}.aslr_care_sbrk sysctl.

Stack gap, W^X, shared page randomization, KASLR and other techniques
are explicitly out of scope of this work.

The paxtest results for the run with the patch applied and aggressively
tuned can be seen at https://www.kib.kiev.ua/kib/aslr/paxtest.log .
For comparison, the run on Fedora 23 on the same machine is at
https://www.kib.kiev.ua/kib/aslr/fedora.log .

ASLR is enabled on a per-ABI basis, and currently it is only enabled on
native i386 and amd64 (including compat 32bit) ABIs. I expect to test
and enable ASLR for armv6 and arm64 as well, later.

The procctl(2) control for ASLR is implemented, but I have not provided
a userspace wrapper around the syscall. In fact, the most reasonable
control needed is per-image and not per-process, but we have no
tradition of putting kernel-read attributes into the extattrs of a
binary, so I am still pondering that part, and this also explains the
unwritten tool.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden.
kib updated this revision to Diff 50870.Nov 21 2018, 11:58 AM

Add parsing of the feature control note both to the kernel loader and to the dynamic linker.
Allocate a bit in the feature control mask for disabling ASLR for the image.

kib updated this revision to Diff 51030.Nov 23 2018, 11:39 PM

Update after the feature control note parsing bits were merged.

markj added inline comments.Nov 24 2018, 10:30 PM
sys/vm/vm_map.c
1648 ↗(On Diff #50504)

I think you need to initialize anon_loc to 0 in _vm_map_init() for this to work as intended?

Even then, when curr_min_addr == 0, the amount of randomization applied to the initial anon mapping is quite small. For PIEs, libraries are loaded after the (random) base load address, but otherwise, the set of possible initial addresses is quite small.

kib marked an inline comment as done.Nov 25 2018, 10:05 AM
kib added inline comments.
sys/vm/vm_map.c
1648 ↗(On Diff #50504)

I fixed several more bugs with anon_loc, e.g. copying it on fork. Also I added explicit setting of anon_loc on execution of the ELF binary in 'hard' mode, similar to the interpreter base address selection.

I am not sure what you mean by the amount of randomization. Either rnd is applied or it is not; if it is applied, then some amount of entropy is guaranteed.

kib updated this revision to Diff 51086.Nov 25 2018, 10:07 AM

Some fixes for anon coalescing.

  • Rework selection of coalescing base address after failure in vm_map_find(), trying to randomize it first.
  • Fix map init and fork by zeroing and copying anon_loc.
  • Apply same randomization to anon_loc as for the interpreter if ASLR is enabled.
markj added inline comments.Nov 25 2018, 7:00 PM
sys/vm/vm_map.c
1614 ↗(On Diff #51086)

IMO it would be clearer if you rename "anon" to "coalesce".

1683 ↗(On Diff #51086)

I guess you can just write anon = update_anon instead.

1700 ↗(On Diff #51086)

Does it make sense to update anon_loc if find_space == VMFS_NO_SPACE?

1648 ↗(On Diff #50504)

I mean that if the curr_min_addr is not randomized, the amount of entropy added is quite small. In the latest version this is still a problem for non-anonymous mappings in non-PIE binaries: the starting min address is constant (vm_daddr + lim(RLIMIT_DATA)), so the load address of libc.so, for example, can be guessed without much work. I am not sure if this is really a significant problem when the executable's address is not randomized, however.

kib marked 4 inline comments as done.Nov 25 2018, 7:43 PM
kib added inline comments.
sys/vm/vm_map.c
1648 ↗(On Diff #50504)

I am still not sure about this. Do you mean that the amount of entropy allowed by the aslr_pages_rnd_XXX arrays is too small?

kib updated this revision to Diff 51092.Nov 25 2018, 7:44 PM

Rename anon to coalesce.
Simplify restart when coalesce failed. Inline the last helper.
goto done is never done for KERN_SUCCESS.
Do not update anon_loc for VMFS_NO_SPACE.

markj added inline comments.Nov 25 2018, 8:36 PM
sys/vm/vm_map.c
1648 ↗(On Diff #50504)

Indeed, it does not provide nearly as much entropy as the initial randomization of et_dyn_addr for PIEs or anon_loc. Consider that libc.so is mapped with VMFS_OPTIMAL_SPACE, so we will set

*addr += (arc4random() % 0x10) * 0x200000;

For a non-PIE on amd64 this means that libc.so will get loaded somewhere in [0x800000000, 0x800200000], so the entropy added is quite minimal. PIEs do not have this problem.
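
The arithmetic behind that expression can be checked with a small sketch. The constants are taken from the quoted expression; the helper names rnd_offset() and entropy_bits() are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define	SUPERPAGE_SIZE	0x200000ULL	/* multiplier in the quoted expression */
#define	NSLOTS		0x10u		/* modulus in the quoted expression */

/* Offset added to *addr for a given random value. */
static uint64_t
rnd_offset(uint32_t rnd)
{
	return ((uint64_t)(rnd % NSLOTS) * SUPERPAGE_SIZE);
}

/*
 * Bits of entropy when one of nslots values is chosen uniformly.
 * Assumes nslots is a power of two.
 */
static unsigned
entropy_bits(uint32_t nslots)
{
	unsigned bits = 0;

	while (nslots > 1) {
		nslots >>= 1;
		bits++;
	}
	return (bits);
}
```

With a modulus of 0x10 there are only 16 possible superpage-aligned offsets, so entropy_bits(NSLOTS) is 4: an attacker needs at most 16 guesses for a mapping placed this way.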

markj added inline comments.Nov 25 2018, 10:04 PM
sys/vm/vm_map.c
1658 ↗(On Diff #51092)

Don't we need to reset curr_min_addr here too?

kib marked 2 inline comments as done.Nov 25 2018, 11:09 PM
kib added inline comments.
sys/vm/vm_map.c
1648 ↗(On Diff #50504)

Yes, this is how I want to keep it for now, disturbing the normal layout as little as possible for the PoC. On the other hand, since the PIE base, the ld.elf load address, and now the initial anon base are already 'hard' randomized, maybe it indeed does not make sense to keep that part of the entropy low. In fact I think we will see after another exp run.

1658 ↗(On Diff #51092)

The intent is to make two normal passes without coalescing. Second pass resets curr_min_addr.

markj added inline comments.Nov 26 2018, 1:58 PM
sys/vm/vm_map.c
1589 ↗(On Diff #51092)

I noticed that we are coalescing mappings in pipe_map. Is there any advantage to be gained by doing this? As a downside, I think the coalescing increases page table usage, especially since most pipe_map mappings are the same size.

1658 ↗(On Diff #51092)

mm, we reset curr_min_addr only if en_aslr is set though.

kib updated this revision to Diff 51226.Nov 27 2018, 6:37 PM

Disable coalescing on submaps.
Remove bogus try reset on retry with coalescing disabled, since curr_min_addr recalculation depends on try == 2.

markj added inline comments.Dec 3 2018, 8:39 PM
sys/vm/vm_map.c
1493 ↗(On Diff #50504)

Sorry for being indecisive. Thinking some more, I think "clustering" actually makes more sense than "coalescing." Coalescing is the process of bringing together multiple entities that were previously separate, but in this case, the anonymous mappings are not separate to begin with.

1613 ↗(On Diff #51226)

"When creating an anonymous mapping, try clustering with an existing anonymous mapping first."

1617 ↗(On Diff #51226)

The text is a bit misleading since coalesce == false doesn't imply that coalescing failed. How about:

"We make up to two attempts to find address space for a given find_space value. The first attempt may apply randomization or may cluster with an existing anonymous mapping. If this first attempt fails, perform a first-fit search of the available address space."

1651 ↗(On Diff #51226)

Why is it necessary to set curr_min_addr here? We know try == 1, so after following the goto we will assign to curr_min_addr again.

2003 ↗(On Diff #51226)

MAP_IS_SUB_MAP, for consistency with MAP_ENTRY_IS_SUB_MAP?

2028 ↗(On Diff #51226)

Did you mean to clear the flag here?

kib updated this revision to Diff 51553.Dec 3 2018, 9:13 PM
kib marked 5 inline comments as done.

Reword comments.
Rename variable and symbol.
Remove dup assignment.

markj accepted this revision.Dec 3 2018, 9:43 PM

I suspect that the anon clustering will interact suboptimally with the jemalloc behaviour discussed in D16501 and elsewhere. In particular, jemalloc will unmap small regions of the address space, leaving holes. With clustering, those holes won't be reused since we no longer perform a first-fit search. IMO it would be worth reconsidering how anon_loc works; rather than advancing it after each successful clustering, maybe it should be constant after the initialization to a non-zero value, so that we attempt to fill holes with new mappings before extending the clustered region further. I do not think this needs to be done prior to commit though.

sys/vm/vm_map.c
1634 ↗(On Diff #51553)

"... or to cluster with an existing mapping."

1644 ↗(On Diff #51553)

I'd consider calling this a "gap" instead, here and in the code (instead of "preserve").
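
The hole-reuse concern in the comment above can be illustrated with a toy model. This is a sketch under stated assumptions, not the vm_map code: the address space is an array of page flags, anon_loc is a page index, and find_free(), mark() and cluster_alloc() are hypothetical names. The advance flag switches between moving the hint after each successful clustering and keeping it constant.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Lowest start >= 'start' of a free run of len pages, or -1.
 * Assumes len > 0.
 */
static long
find_free(const bool *used, size_t npages, size_t start, size_t len)
{
	size_t run = 0;

	for (size_t i = start; i < npages; i++) {
		run = used[i] ? 0 : run + 1;
		if (run == len)
			return ((long)(i + 1 - len));
	}
	return (-1);
}

/* Mark pages [start, start + len) as used (val true) or free (val false). */
static void
mark(bool *used, size_t start, size_t len, bool val)
{
	for (size_t i = start; i < start + len; i++)
		used[i] = val;
}

/*
 * Clustered allocation: search upward from *anon_loc.  With advance set,
 * the hint moves past the new mapping, so holes below it are skipped.
 * With advance clear, the hint stays put and holes are refilled first.
 */
static long
cluster_alloc(bool *used, size_t npages, size_t *anon_loc, size_t len,
    bool advance)
{
	long at = find_free(used, npages, *anon_loc, len);

	if (at < 0)
		return (-1);
	mark(used, (size_t)at, len, true);
	if (advance)
		*anon_loc = (size_t)at + len;
	return (at);
}
```

In this model, two allocations followed by an unmap of the first leave a hole at the original anon_loc. The advancing variant places the third allocation above the cluster, while the constant variant reuses the hole.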

kib updated this revision to Diff 51555.Dec 3 2018, 9:58 PM
kib marked 2 inline comments as done.

preserve->gap.
Comment update.

markj added inline comments.Dec 3 2018, 10:01 PM
sys/vm/vm_map.c
1589 ↗(On Diff #51555)

I don't think we should update anon_loc if max_addr is specified (e.g., MAP_32BIT was passed).

kib updated this revision to Diff 51561.Dec 3 2018, 11:52 PM
kib marked an inline comment as done.

Disable clustering if map is limited by max_addr.

markj added inline comments.Dec 15 2018, 8:03 PM
sys/kern/imgact_elf.c
994 ↗(On Diff #51561)

"Decide whether to"

sys/vm/vm_map.c
1652 ↗(On Diff #51561)

I think it would be worth adding a counter for vm_map_findspace() failures, at least for the en_aslr case.

kib marked 2 inline comments as done.Dec 16 2018, 2:23 AM
kib added inline comments.
sys/vm/vm_map.c
1652 ↗(On Diff #51561)

I added the counter for try == 2 restarts.

IMO it is of limited usefulness because it is global, but I do not think it is worth adding the per-vmspace counters and the whole required infrastructure for it.

kib updated this revision to Diff 52070.Dec 16 2018, 2:23 AM
kib marked an inline comment as done.

Grammar.
Add global restart counter.

emaste added inline comments.Jan 25 2019, 9:39 PM
sys/kern/imgact_elf.c
147 ↗(On Diff #50504)

I don't follow what you mean by the "entire executable image."

sys/sys/elf_common.h
765–766 ↗(On Diff #52070)

For ASLR having only opt-out is reasonable, IMO.

For other bits (max_prot, W^X, etc.) initially we probably want both opt-in and opt-out, as there may be some time before we can enable features by default.

sys/vm/vm_map.c
1493 ↗(On Diff #50504)

@markj are you suggesting even committing the coalescing separately?

sys/vm/vm_map.h
216 ↗(On Diff #52070)

Let's comment this here too

kib updated this revision to Diff 53435.Jan 30 2019, 7:13 PM

Update the patch with the fixes made after Peter's testing.

emaste added inline comments.Jan 30 2019, 8:11 PM
sys/sys/elf_common.h
766 ↗(On Diff #53435)

Discussed on IRC, suggest either NT_FREEBSD_FCTL_ASLR_DIS or NT_FREEBSD_FCTL_ASLR_DISABLE - doesn't matter so much for this in isolation but want something that will have a regular pattern when we add MAX_PROT and other feature bits.

kib updated this revision to Diff 53440.Jan 30 2019, 8:16 PM

Rename the bit.

emaste removed a reviewer: emaste.Jan 30 2019, 8:22 PM
emaste added inline comments.Jan 30 2019, 8:26 PM
sys/sys/elf_common.h
766 ↗(On Diff #53440)

Flag name LGTM

emaste added inline comments.Jan 31 2019, 1:25 AM
sys/kern/imgact_elf.c
142 ↗(On Diff #53440)

"enable" is more common in the tree than "enabled" (27 to 3 on my laptop) and IMO preferable.

Also, should we have a kern.elfN.aslr.* tree? So e.g. kern.elf64.aslr.enable, perhaps kern.elf64.aslr.pie_enable?

kib updated this revision to Diff 53453.Jan 31 2019, 2:03 AM
kib marked an inline comment as done.

Tweak sysctls.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 31 2019, 3:45 PM
This revision was automatically updated to reflect the committed changes.
kib reopened this revision.Jan 31 2019, 4:25 PM
kib updated this revision to Diff 53480.Jan 31 2019, 4:32 PM

Regen patch after the bit definition was committed.

This revision was not accepted when it landed; it landed in state Needs Review.Feb 10 2019, 5:19 PM
Closed by commit rS343964: Implement Address Space Layout Randomization (ASLR) (authored by kib, committed by ).
This revision was automatically updated to reflect the committed changes.
dougm added a subscriber: dougm.Mar 23 2019, 1:45 AM
dougm added inline comments.
head/sys/vm/vm_map.c
1665

If max_addr != 0, you've guaranteed that *addr + length <= max_addr, but after this modification to *addr, the guarantee won't hold and you may return an address beyond max_addr. Or so it seems to me.

kib added inline comments.Mar 23 2019, 9:49 AM
head/sys/vm/vm_map.c
1665

You mean, in the situation where vm_map_maxaddr(map) > max_addr. Please see D19688.
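
The condition at issue in this exchange can be written down as a small check. This is only a sketch of the invariant, not the change made in D19688; fits_below() and apply_offset_checked() are hypothetical names, and max_addr == 0 is taken to mean "no limit", as in the comment above.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [addr, addr + length) ends at or below max_addr (0 = no limit). */
static bool
fits_below(uint64_t addr, uint64_t length, uint64_t max_addr)
{
	if (max_addr == 0)
		return (true);
	return (length <= max_addr && addr <= max_addr - length);
}

/*
 * Apply a random offset only if the shifted range still honours
 * max_addr and the addition does not wrap; otherwise keep the
 * unshifted address, which the caller already validated.
 */
static uint64_t
apply_offset_checked(uint64_t addr, uint64_t length, uint64_t offset,
    uint64_t max_addr)
{
	uint64_t shifted = addr + offset;

	if (shifted < addr || !fits_below(shifted, length, max_addr))
		return (addr);
	return (shifted);
}
```

The point is that the bound has to be re-checked after the address is modified: a range that fit before the shift is not guaranteed to fit afterwards.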