Support arm64 stage2 TLB invalidation
Needs ReviewPublic

Authored by andrew on Nov 3 2022, 6:56 PM.

To invalidate stage 2 mappings on arm64 we may need to call into the
hypervisor so add a function pointer that bhyve can use to implement

Sponsored by: The FreeBSD Foundation

Diff Detail

rG FreeBSD src repository
Lint Passed
No Test Coverage
Build Status
Buildable 48369
Build 45255: arc lint + arc unit

Event Timeline

andrew requested review of this revision. Nov 3 2022, 6:56 PM

I am curious. So the bhyve port to arm64 does not run in EL2 mode?

This revision is now accepted and ready to land. Nov 3 2022, 11:58 PM

Most of it is in the kernel. Only the virtual machine switcher, stage 2 TLB handling, cache handling, and support to read a few EL2 registers need to live in EL2.

Is reclaim_pv_chunk_domain() still missing stage2 handling?


Extra newline.


What about this call? It's reachable from pmap_remove().


If the TLB invalidations are not completed before the removal of the PV entries, then a race can arise where a stale TLB entry still gives access to a physical page that has been recycled for a new use. Doing this would require the introduction of a delayed invalidation mechanism, like we have on amd64. (I wouldn't be surprised if introducing that mechanism turned out to be beneficial to performance on arm64.)

  • Rebase on D37302
  • Remove an extra space
  • Create a common entry point for stage 1 and 2 invalidation functions
  • Implement stage 2 page invalidation with a range callback
This revision now requires review to proceed. Nov 7 2022, 4:19 PM
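As a rough illustration of the "common entry point" and "range callback" items in the update above, the dispatch could look something like the userland sketch below. Every identifier here (`pmap_sketch`, `s2_invalidate_range_hook`, `pmap_invalidate_range_sketch`, and so on) is hypothetical and chosen for illustration; the names and signatures in the actual diff may differ, and the real stage 1 path issues TLBI instructions rather than bumping a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the identifiers used in the revision. */
enum pmap_stage_sketch { SKETCH_STAGE1, SKETCH_STAGE2 };

struct pmap_sketch {
	enum pmap_stage_sketch	pm_stage;
	uint64_t		pm_id;	/* ASID for stage 1, VMID for stage 2 */
};

/* Hook that a hypervisor module would install; NULL until it does so. */
typedef void (*s2_invalidate_range_t)(uint64_t vmid, uint64_t sva,
    uint64_t eva);
static s2_invalidate_range_t s2_invalidate_range_hook = NULL;

/* Counters standing in for the real invalidation work. */
static unsigned s1_calls, s2_calls;

static void
s1_invalidate_range(uint64_t asid, uint64_t sva, uint64_t eva)
{
	/* A kernel would issue stage 1 TLBI instructions here. */
	(void)asid; (void)sva; (void)eva;
	s1_calls++;
}

/* Example callback; a hypervisor would call into EL2 from here. */
static void
example_s2_invalidate_range(uint64_t vmid, uint64_t sva, uint64_t eva)
{
	(void)vmid; (void)sva; (void)eva;
	s2_calls++;
}

/* Common entry point: pick the stage 1 or stage 2 path per pmap. */
static void
pmap_invalidate_range_sketch(struct pmap_sketch *pmap, uint64_t sva,
    uint64_t eva)
{
	if (pmap->pm_stage == SKETCH_STAGE1) {
		s1_invalidate_range(pmap->pm_id, sva, eva);
	} else {
		/* A stage 2 pmap only exists once the hook is installed. */
		assert(s2_invalidate_range_hook != NULL);
		s2_invalidate_range_hook(pmap->pm_id, sva, eva);
	}
}
```

The point of the indirection is that the generic pmap code never needs to know how the hypervisor reaches EL2; it only needs a callback to be registered before any stage 2 pmap is used.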

Superpages are disabled for stage 2 mappings (see D37299) so this should never happen.

Add stage 2 TLB invalidation to pmap_enter


It would be strange to add DI to arm64, given that the arch provides TLB invalidation broadcast facilities.

BTW, Intel is trying to introduce an experimental broadcast facility as well, but I am not aware of a usable implementation in the available hardware.


Suppose that we are destroying a large number of mappings, say, for example, ten or twenty thousand pages within an address space. At that point, you don't want to be doing individual, i.e., per-page, invalidations, because you are invalidating several times more mappings than the TLB can hold. Instead, you want to issue a single invalidation for all mappings belonging to the ASID. In order to do such batching correctly, you need the delayed invalidation mechanism from amd64, or something close to it.
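The choice described above, between per-page invalidations and a single invalidation for the whole ASID, can be sketched as follows. This is a simplified userland model: the threshold value, the structure, and all function names are assumptions made for illustration, and it deliberately omits the delayed invalidation machinery that the comment says is needed to do this batching safely in pmap_remove.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u
/* Assumed cutoff; a real value would depend on the TLB size. */
#define SKETCH_BATCH_THRESHOLD	512u

/* Counters standing in for the TLBI instructions a kernel would issue. */
struct tlb_stats_sketch {
	unsigned page_invalidations;
	unsigned asid_invalidations;
};

static void
invalidate_page_sketch(struct tlb_stats_sketch *st, uint16_t asid,
    uint64_t va)
{
	(void)asid; (void)va;
	st->page_invalidations++;
}

static void
invalidate_asid_sketch(struct tlb_stats_sketch *st, uint16_t asid)
{
	(void)asid;
	st->asid_invalidations++;
}

/*
 * Invalidate [sva, eva).  Small ranges are invalidated page by page;
 * ranges covering more pages than the threshold fall back to a single
 * invalidation of every mapping belonging to the ASID.
 */
static void
invalidate_range_batched_sketch(struct tlb_stats_sketch *st, uint16_t asid,
    uint64_t sva, uint64_t eva)
{
	uint64_t npages, va;

	npages = (eva - sva) / SKETCH_PAGE_SIZE;
	if (npages > SKETCH_BATCH_THRESHOLD) {
		invalidate_asid_sketch(st, asid);
		return;
	}
	for (va = sva; va < eva; va += SKETCH_PAGE_SIZE)
		invalidate_page_sketch(st, asid, va);
}
```

With these assumed numbers, removing twenty thousand pages costs one ASID-wide invalidation instead of twenty thousand per-page ones.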

I don't remember the exact details now, since it has been a while since I looked at the Linux code in this area, but they were making an effort to reduce the number of broadcast invalidations because of their indirect costs on large-scale machines.


Do we need to perform the TLB invalidation if we can guarantee that the TLB will not be used with the given ASID/VMID in the current pmap epoch? If not, we could skip the TLB invalidations as long as we perform one before reusing the ASID/VMID.

For a hypervisor we can know this, e.g., we call pmap_remove_pages just before calling vmspace_free, where the refcount should be zero. Because of this we could likely skip TLB invalidation in pmap_remove_pages, but would need a way to know the pmap will not be reused.


No, we do not need to perform a TLB invalidation if we can guarantee that the ASID will not be used again within the same epoch. However, you cannot make that guarantee with pmap_remove because the ASID may be active on other cores, e.g., you are running a multithreaded application. Only pmap_remove_pages can guarantee that the ASID is not active on other cores, and it already exploits this by performing a single TLB invalidation for the ASID.
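To make the epoch argument concrete, here is a simplified model of an ASID allocator with epochs. It is a sketch under stated assumptions, not the arm64 pmap code: the structures, the `dead` flag, and every function name are hypothetical. The property it illustrates is that an ASID value is only handed out a second time after the epoch has advanced, and advancing the epoch comes with a full TLB flush, so a pmap that is guaranteed never to run again in its epoch can skip its own invalidations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical allocator state; ASIDs 1 .. max-1 are usable per epoch. */
struct asid_set_sketch {
	uint32_t	epoch;
	uint16_t	next;
	uint16_t	max;
	unsigned	full_flushes;	/* stands in for a TLB-wide flush */
};

struct pmap_epoch_sketch {
	uint32_t	epoch;	/* epoch in which the ASID was assigned */
	uint16_t	asid;
	bool		dead;	/* assumed flag: pmap will never run again */
};

/*
 * Assign an ASID.  When the set is exhausted, start a new epoch and
 * flush the whole TLB, so stale entries tagged with a recycled ASID
 * value cannot survive into the new epoch.
 */
static void
asid_alloc_sketch(struct asid_set_sketch *set, struct pmap_epoch_sketch *pmap)
{
	if (set->next == set->max) {
		set->epoch++;
		set->next = 1;
		set->full_flushes++;
	}
	pmap->asid = set->next++;
	pmap->epoch = set->epoch;
}

/*
 * A live pmap may be active on other cores and must invalidate.  A dead
 * pmap can rely on the flush that precedes any reuse of its ASID value.
 */
static bool
needs_invalidation_sketch(const struct pmap_epoch_sketch *pmap)
{
	return (!pmap->dead);
}
```

As the comment above notes, only a caller that can prove the ASID is inactive on every core, which is the pmap_remove_pages case, could set something like the `dead` flag in this model.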

The place where we could be clever, and seek to reduce/batch TLB invalidations without needing a delayed invalidation mechanism, is pmap_protect, which could reduce the cost of setting up copy-on-write during fork. The potential race condition that exists within pmap_remove, which we prevent with the delayed invalidation mechanism on amd64, cannot arise within pmap_protect.