Authored by kib on Mar 10 2016, 3:55 PM.



With this change, randomization is applied to all non-fixed mappings.
By randomization I mean the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours
the superpage attributes.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation
prevents entropy injection. It is trivial to implement a strong mode
where failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.
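The best-effort behaviour described above can be sketched with a toy model. Everything here is illustrative: the names (find_space_aslr, range_free) and the single-busy-region "address space" are invented for this example, not the actual vm_map code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE	0x1000ULL

/* Toy address space with a single occupied region. */
static const uint64_t busy_start = 0x1800000, busy_end = 0x6000000;

static bool
range_free(uint64_t start, uint64_t len)
{
	return (start + len <= busy_start || start >= busy_end);
}

/* Plain first-fit: scan upward page by page until a hole is found. */
static uint64_t
first_fit(uint64_t min, uint64_t len)
{
	uint64_t a;

	for (a = min; !range_free(a, len); a += PAGE_SIZE)
		;
	return (a);
}

/*
 * Best-effort ASLR: offset the search start by a bounded random number
 * of pages; if that randomized spot is unusable due to fragmentation,
 * fall back to first-fit and give up on the entropy for this request.
 */
uint64_t
find_space_aslr(uint64_t min, uint64_t len, uint64_t rnd_pages, uint32_t rnd)
{
	uint64_t start;

	start = min + (uint64_t)(rnd % rnd_pages) * PAGE_SIZE;
	if (range_free(start, len))
		return (start);		/* randomized attempt succeeded */
	return (first_fit(min, len));	/* best effort: no entropy */
}
```

A "strong" mode would simply return a failure code instead of calling first_fit() in the fallback branch.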

I have not fine-tuned the amount of entropy injected right now. Tuning it
is only a quantitative change that will not affect the implementation.
The current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, the locality is implemented for
anonymous private mappings, which are automatically grouped until
fragmentation kicks in. The initial location for the anon group range
is, of course, randomized.
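The grouping of anonymous mappings can be modelled with a cursor, similar in spirit to the patch's anon_loc. The constants and helper name here are invented for illustration; in the real code the initial base is randomized and the limit comes from the map.

```c
#include <stdint.h>

#define ANON_BASE	0x10000000ULL	/* randomized in the real code */
#define ANON_END	0x10004000ULL	/* end of the toy cluster run */

static uint64_t anon_loc;	/* cluster cursor, 0 = not yet chosen */

/*
 * Return the address for a new anonymous mapping of 'len' bytes,
 * appending it to the existing group so that page tables and
 * superpages stay shared.  Return 0 when the group cannot be
 * extended; the caller then falls back to a normal search.
 */
uint64_t
place_anon(uint64_t len)
{
	uint64_t a;

	if (anon_loc == 0)
		anon_loc = ANON_BASE;
	if (anon_loc + len > ANON_END)
		return (0);	/* fragmentation kicked in */
	a = anon_loc;
	anon_loc += len;
	return (a);
}
```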

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing room on
the small-address-space architectures (funny that 32 bits is considered
small).
This is tied to the question of following an application's hint
about the mmap(2) base address. Testing shows that ignoring the hint
does not affect the function of common applications, but I would expect
more demanding code could break. By default sbrk is preserved and mmap
hints are satisfied, which can be changed by using the
kern.elf{32,64}.aslr_care_sbrk sysctl.

Stack gap, W^X, shared page randomization, KASLR and other techniques
are explicitly out of scope of this work.

The paxtest results for the run with the patch applied and aggressively
tuned can be seen at the .
For comparison, the run on Fedora 23 on the same machine is at .

ASLR is enabled on a per-ABI basis, and currently it is only enabled on
native i386 and amd64 (including compat 32bit) ABIs. I expect to test
and enable ASLR for armv6 and arm64 as well, later.

The procctl(2) control for ASLR is implemented, but I have not provided
a userspace wrapper around the syscall. In fact, the most reasonable
control needed is per-image and not per-process, but we have no
tradition of putting kernel-read attributes into the extattrs of a binary,
so I am still pondering that part, and this also explains the non-written

Diff Detail

rS FreeBSD src repository - subversion

Event Timeline


Add parsing of the feature control note both to the kernel loader and to the dynamic linker.
Allocate a bit in the feature control mask for disabling ASLR for the image.

Update after the feature control note parsing bits were merged.


I think you need to initialize anon_loc to 0 in _vm_map_init() for this to work as intended?

Even then, when curr_min_addr == 0, the amount of randomization applied to the initial anon mapping is quite small. For PIEs, libraries are loaded after the (random) base load address, but otherwise, the set of possible initial addresses is quite small.

kib marked an inline comment as done. Nov 25 2018, 10:05 AM
kib added inline comments.

I fixed several more bugs with anon_loc, e.g. copying it on fork. Also I added explicit setting of anon_loc on execution of the ELF binary in 'hard' mode, similar to the interpreter base address selection.

I am not sure what you mean by the amount of randomization. Either randomization is applied or not; if it is applied, then some minimum amount of entropy is guaranteed.

Some fixes for anon coalescing.

  • Rework selection of coalescing base address after failure in vm_map_find(), trying to randomize it first.
  • Fix map init and fork by zeroing and copying anon_loc.
  • Apply same randomization to anon_loc as for the interpreter if ASLR is enabled.

IMO it would be clearer if you rename "anon" to "coalesce".


I guess you can just write anon = update_anon instead.


I mean that if the curr_min_addr is not randomized, the amount of entropy added is quite small. In the latest version this is still a problem for non-anonymous mappings in non-PIE binaries: the starting min address is constant (vm_daddr + lim(RLIMIT_DATA)), so the load address of, for example, can be guessed without much work. I am not sure if this is really a significant problem when the executable's address is not randomized, however.


Does it make sense to update anon_loc if find_space == VMFS_NO_SPACE?

kib marked 4 inline comments as done. Nov 25 2018, 7:43 PM
kib added inline comments.

I am still not sure about this. Do you mean that the amount of entropy allowed by the aslr_pages_rnd_XXX arrays is too small?

Rename anon to coalesce.
Simplify the restart when coalescing fails; inline the last helper.
goto done is never done for KERN_SUCCESS.
Do not update anon_loc for VMFS_NO_SPACE.


Indeed, it does not provide nearly as much entropy as the initial randomization of et_dyn_addr for PIEs or anon_loc. Consider that is mapped with VMFS_OPTIMAL_SPACE, so we will set

*addr += (arc4random() % 0x10) * 0x200000;

For a non-PIE on amd64 this means that will get loaded somewhere in [0x800000000, 0x801e00000] (one of 16 superpage-aligned slots), so the entropy added is quite minimal. PIEs do not have this problem.
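The arithmetic can be sanity-checked with a small self-contained snippet. The constants mirror the quoted line; the helper names and the 0x800000000 base are assumptions for this illustration.

```c
#include <stdint.h>

#define NONPIE_BASE	0x800000000ULL	/* assumed fixed amd64 non-PIE base */
#define SLOTS		0x10U		/* arc4random() % 0x10 */
#define SUPERPAGE	0x200000ULL	/* 2 MB superpage */

/* Highest address the quoted randomization step can produce. */
uint64_t
highest_load_addr(void)
{
	return (NONPIE_BASE + (uint64_t)(SLOTS - 1) * SUPERPAGE);
}

/* Entropy injected, in bits: log2 of the number of slots. */
unsigned
entropy_bits(void)
{
	unsigned bits = 0, s;

	for (s = SLOTS; s > 1; s >>= 1)
		bits++;
	return (bits);	/* 16 slots -> 4 bits */
}
```

Four bits of entropy is tiny compared to the dozens of bits PIEs get from base randomization, which is the concern raised here.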


Don't we need to reset curr_min_addr here too?

kib marked 2 inline comments as done. Nov 25 2018, 11:09 PM
kib added inline comments.

The intent is to make two normal passes without coalescing. Second pass resets curr_min_addr.


Yes, this is how I want to keep it for now, disturbing the normal layout as little as possible for the PoC. On the other hand, since the PIE base, the ld.elf load address, and now the initial anon base are already 'hard' randomized, maybe it indeed does not make sense to keep that part of the entropy low. I think we will see after another exp run.


I noticed that we are coalescing mappings in pipe_map. Is there any advantage to be gained by doing this? As a downside, I think the coalescing increases page table usage, especially since most pipe_map mappings are the same size.


mm, we reset curr_min_addr only if en_aslr is set though.

Disable coalescing on submaps.
Remove bogus try reset on retry with coalescing disabled, since curr_min_addr recalculation depends on try == 2.


Sorry for being indecisive. Thinking some more, I think "clustering" actually makes more sense than "coalescing." Coalescing is the process of bringing together multiple entities that were previously separate, but in this case, the anonymous mappings are not separate to begin with.


"When creating an anonymous mapping, try clustering with an existing anonymous mapping first."


The text is a bit misleading since coalesce == false doesn't imply that coalescing failed. How about:

"We make up to two attempts to find address space for a given find_space value. The first attempt may apply randomization or may cluster with an existing anonymous mapping. If this first attempt fails, perform a first-fit search of the available address space."


Why is it necessary to set curr_min_addr here? We know try == 1, so after following the goto we will assign to curr_min_addr again.


MAP_IS_SUB_MAP, for consistency with MAP_ENTRY_IS_SUB_MAP?


Did you mean to clear the flag here?

kib marked 5 inline comments as done.

Reword comments.
Rename variable and symbol.
Remove dup assignment.

I suspect that the anon clustering will interact suboptimally with the jemalloc behaviour discussed in D16501 and elsewhere. In particular, jemalloc will unmap small regions of the address space, leaving holes. With clustering, those holes won't be reused since we no longer perform a first-fit search. IMO it would be worth reconsidering how anon_loc works; rather than advancing it after each successful clustering, maybe it should be constant after the initialization to a non-zero value, so that we attempt to fill holes with new mappings before extending the clustered region further. I do not think this needs to be done prior to commit though.
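The hole-reuse concern can be shown with a tiny slot-granular model (all names invented here): a bump cursor like anon_loc never revisits freed space, while a first-fit search fills the hole.

```c
#include <stdbool.h>

#define SLOTS	8
static bool used[SLOTS];	/* toy address space: 8 equal-size slots */
static int cursor;		/* analogous to anon_loc */

/* Clustered placement: always extend past the last allocation. */
int
alloc_clustered(void)
{
	if (cursor >= SLOTS)
		return (-1);
	used[cursor] = true;
	return (cursor++);
}

/* First-fit placement: reuse the lowest free slot, filling holes. */
int
alloc_first_fit(void)
{
	int i;

	for (i = 0; i < SLOTS; i++)
		if (!used[i]) {
			used[i] = true;
			return (i);
		}
	return (-1);
}

void
free_slot(int i)
{
	used[i] = false;
}
```

After allocating slots 0-2 and freeing slot 1, the clustered allocator hands out slot 3, leaving the hole at 1 to be found only by a first-fit pass, which matches the jemalloc interaction described above.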


"... or to cluster with an existing mapping."


I'd consider calling this a "gap" instead, here and in the code (instead of "preserve").

kib marked 2 inline comments as done.

Comment update.


I don't think we should update anon_loc if max_addr is specified (e.g., MAP_32BIT was passed).

kib marked an inline comment as done.

Disable clustering if map is limited by max_addr.


"Decide whether to"


I think it would be worth adding a counter for vm_map_findspace() failures, at least for the en_aslr case.

kib marked 2 inline comments as done. Dec 16 2018, 2:23 AM
kib added inline comments.

I added the counter for try == 2 restarts.

IMO it is of limited usefulness because it is global, but I do not think it is worth adding per-vmspace counters and the whole infrastructure required for them.

kib marked an inline comment as done.

Add global restart counter.


I don't follow what you mean by the "entire executable image."

765–766 ↗(On Diff #52070)

For ASLR having only opt-out is reasonable, IMO.

For other bits (max_prot, W^X, etc.) initially we probably want both opt-in and opt-out, as there may be some time before we can enable features by default.


@markj are you suggesting even committing the coalescing separately?


Let's comment this here too

Update the patch with the fixes made after Peter's testing.

766 ↗(On Diff #53435)

Discussed on IRC, suggest either NT_FREEBSD_FCTL_ASLR_DIS or NT_FREEBSD_FCTL_ASLR_DISABLE - doesn't matter so much for this in isolation but want something that will have a regular pattern when we add MAX_PROT and other feature bits.

766 ↗(On Diff #53440)

Flag name LGTM


"enable" is more common in the tree than "enabled" (27 to 3 on my laptop) and IMO preferable.

Also, should we have a kern.elfN.aslr.* tree? So e.g. kern.elf64.aslr.enable, perhaps kern.elf64.aslr.pie_enable?

kib marked an inline comment as done.

Tweak sysctls.

This revision was not accepted when it landed; it landed in state Needs Review. Jan 31 2019, 3:45 PM
This revision was automatically updated to reflect the committed changes.

Regen patch after the bit definition was committed.

This revision was not accepted when it landed; it landed in state Needs Review. Feb 10 2019, 5:19 PM
This revision was automatically updated to reflect the committed changes.
dougm added inline comments.
1665 ↗(On Diff #53745)

If max_addr != 0, you've guaranteed that *addr + length <= max_addr, but after this modification to *addr, the guarantee won't hold and you may return an address beyond max_addr. Or so it seems to me.

1665 ↗(On Diff #53745)

You mean, in the situation where vm_map_maxaddr(map) > max_addr. Please see D19688.