
arm64: add KASAN support
Needs Review · Public

Authored by kevans on Sep 25 2022, 6:42 PM.


This entails:

  • Marking some obvious candidates for __nosanitizeaddress
  • Similar trap frame markings as amd64, for similar reasons
  • Shadow map implementation

The shadow map implementation is roughly similar to what was done on
amd64, with some exceptions. Attempting to use available space at
preinit_map_va + PMAP_PREINIT_MAPPING_SIZE (up to the end of that range,
as depicted in the physmap) results in odd failures, so we instead
search the physmap for free regions that we can carve out, fragmenting
the shadow map as necessary to try and fit as much as we need for the
initial kernel map. pmap_bootstrap_san() is thus called after
pmap_bootstrap(), which still included some technically reserved areas
of the memory map that needed to be included in the DMAP.

The odd failure noted above may be a bug, but I haven't investigated it
all that much.

Initial work by: mhorne
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.

Diff Detail

rG FreeBSD src repository
Lint Passed
No Test Coverage
Build Status
Buildable 47559
Build 44446: arc lint + arc unit

Event Timeline

I also still haven't put too much thought into why arm64 KASAN seems to use way more stack than amd64 KASAN.


This is broken with a 16k PAGE_SIZE as there are only 2 L0 entries with both already created.


I don't think you need PMAP_SAN_PTE_BITS on table entries. It just sets bits in ignored fields.

I also wonder if we could reuse pmap_bootstrap_dmap_l1_table here. I don't think it's quite correct as it assumes there are no page tables present, so the second user will fail, but it should allow KASAN + 16k PAGE_SIZE.


I'm not sure we need this (or other cases in pmap.c). We should never be modifying a valid entry in the bootstrap process.

37 ↗(On Diff #110954)

I wonder if we should just build this file with compile-with "${NORMAL_C:N-fsanitize*}" given it's used in the sanitizers to get a stack trace
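For reference, a sys/conf files entry of the shape being suggested might look like the following (the path is a placeholder; the real entry is for the file under review):

```
arm64/arm64/some_stack_file.c	standard	\
	compile-with "${NORMAL_C:N-fsanitize*}"
```

The `:N-fsanitize*` modifier filters any `-fsanitize` flags out of the normal C compile line, so the file the sanitizers use to capture stack traces is itself built uninstrumented.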


Missing license


Why the commented out return statement?


Why is this needed?


Why so many? We use 6 on amd64.

kevans marked 7 inline comments as done.

Address review feedback.

This incorporates a version of Andy's suggestion;
dmap_bootstrap_state->bootstrapped was added to skip earlier stages of table
bootstrap and applied to all uses of the state, just to keep the semantics
consistent across the board (but I'm not tied to it...). I didn't rename
pmap_bootstrap_dmap_l1_table, either. It works with 4k pages, but I can't test
16k pages because that seems to put the kernel up to 37M, which can't fit in
the single L2 table that we have at this point (it blows up in .bss zeroing).

Having thought about it a bit more, I think we should make the struct dmap_bootstrap_state global and share it between the different callers (probably renaming it to pmap_bootstrap_state). The pmap_bootstrap_dmap_* functions could then be used by pmap_bootstrap_l2 and pmap_bootstrap_l3, with some work to check they're not trashing an existing mapping (easy after the DMAP region is available).

I marked a few changes that could be committed before the main KASAN change. They are also needed by KCSAN which is buildable with one compiler change to allow the thread sanitizer to be used on arm64.

283 ↗(On Diff #110969)

Can you split this out to a new review as we need it for other sanitizers, e.g. KCSAN

489 ↗(On Diff #110969)

Can you split the bus_space_set_multi_stream and bus_space_set_region_stream bits to a new review as they don't depend on KASAN


Can you split this out to a new review as we need it for other sanitizers, e.g. KCSAN

59 ↗(On Diff #110969)

Do we really need this annotation anywhere except stack_capture()?


Elsewhere you write defined(KASAN) || defined(KMSAN), which I think is ok, but to be consistent we should use that condition here too.


Extra newline.


I think this bit could be committed as-is.

37 ↗(On Diff #110954)

That seems reasonable.


We still want to build sys/arm64/arm64/support_ifunc.c, but otherwise the symbol's only declared here in the !sanitizer branch below.


This is what I haven't really spent too much time digging into; with fewer than 12 it crashes and burns elsewhere, so this is perhaps papering over something else.

kevans marked 5 inline comments as done.

Second round of review feedback; rebased on top of Andy's pmap bootstrap cleanup

59 ↗(On Diff #110969)

We discussed this a bit OOB and I removed the annotation from all three of these, as stack_save_td() doesn't support capturing a running thread. I may add back in a comment to note that x86 prevents KASAN/KMSAN there potentially due to inconsistencies arising from that scenario, but it's unclear if there's any plan/desire to change this.


Sorry, I had actually forgotten this in the first round, but this is done now as well.


Landed early in f2963b530e17


I'm still tracking down an issue here. With these values, I see:

  • 6: Panics (fault in a nofault region) as sp drops below the thread's stack allocation under load in random processes; sometimes this is just a couple pages below (always past the guard page), other times it's many many pages.
  • 8: Panics become much rarer, but processes are randomly blown away under load somewhat infrequently (either SIGABRT or SIGSEGV)
  • 12: Generally fine

The behavior at 12 pages would lead me to think it's just higher stack usage, but jumping 3+ pages below the stack seems suspicious enough to warrant further investigation.

When I examine the stack to try and figure out where we may have been when things blew up, the results are... not great. The only addresses I can find are for data_abort and something else along the trap path, and the stack is otherwise sprayed with 0x5a ('Z'?) repeatedly in between other occurrences of data_abort and <mumble>'s addresses. I couldn't track down where that pattern might be coming from.

I hacked in a quick td_faultcnt to track recursion into data_abort, to see whether we were actually recursively faulting, asserting that it does not rise above 2 (user trap into kernel trap) -- this is not the case.


All of the leaf entries in the shadow map should also have ATTR_SW_DBM set. That bit effectively marks the mapping as dirty, since shadow map entries are always writeable. Without it, the first write to the mapping will cause the DBM bit to be set; if the platform performs DBM management in software, it'll trap and pmap_fault() will fail to fix up the mapping.


Extra newline.