
Support arm64 stage2 TLB invalidation
ClosedPublic

Authored by andrew on Nov 3 2022, 6:56 PM.
Details

Summary

To invalidate stage 2 mappings on arm64 we may need to call into the
hypervisor, so add a function pointer that bhyve can use to implement
this.

Sponsored by: The FreeBSD Foundation

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

andrew requested review of this revision. Nov 3 2022, 6:56 PM

I am curious. So the bhyve port to arm64 does not run in EL2 mode?

This revision is now accepted and ready to land. Nov 3 2022, 11:58 PM

Most of it is in the kernel. Only the virtual machine switcher, stage 2 TLB handling, cache handling, and support to read a few EL2 registers need to live in EL2.

Is reclaim_pv_chunk_domain() still missing stage2 handling?

sys/arm64/arm64/pmap.c
1650

Extra newline.

3307

What about this call? It's reachable from pmap_remove().

sys/arm64/arm64/pmap.c
3465–3471

If the TLB invalidations are not completed before the removal of the PV entries, then a race can arise where a stale TLB entry still gives access to a physical page that has been recycled for a new use. Deferring the invalidations here would therefore require the introduction of a delayed invalidation mechanism, like we have on amd64. (I wouldn't be surprised if introducing that mechanism turned out to be beneficial to performance on arm64.)

  • Rebase on D37302
  • Remove an extra space
  • Create a common entry point for stage 1 and 2 invalidation functions
  • Implement stage 2 page invalidation with a range callback
This revision now requires review to proceed. Nov 7 2022, 4:19 PM
sys/arm64/arm64/pmap.c
3307

Superpages are disabled for stage 2 mappings (see D37299) so this should never happen.

Add stage 2 TLB invalidation to pmap_enter

sys/arm64/arm64/pmap.c
3465–3471

It would be strange to add DI (delayed invalidation) to arm64, given that the architecture provides TLB invalidation broadcast facilities.

BTW, Intel is trying to introduce an experimental broadcast facility as well; I am not aware of a usable implementation existing in available hardware.

sys/arm64/arm64/pmap.c
3465–3471

Suppose that we are destroying a large number of mappings, say, for example, ten or twenty thousand pages within an address space. At that point, you don't want to be doing individual, i.e., per-page, invalidations, because you are invalidating several times more mappings than the TLB can hold. Instead, you want to issue a single invalidation for all mappings belonging to the ASID. In order to do such batching correctly, you need the delayed invalidation mechanism from amd64, or something close to it.

I don't remember the exact details now, since it has been a while since I looked at the Linux code in this area, but they were making an effort to reduce the number of broadcast invalidations because of their indirect costs on large-scale machines.

sys/arm64/arm64/pmap.c
3465–3471

Do we need to perform the TLB invalidation if we can guarantee that the TLB will not be used with the given ASID/VMID in the current pmap epoch? If so, we could skip the TLB invalidations as long as we perform one before reusing the ASID/VMID.

For a hypervisor we can know this, e.g. we call pmap_remove_pages just before calling vmspace_free, where the refcount should be zero. Because of this we could likely skip TLB invalidation in pmap_remove_pages, but we would need a way to know that the pmap will not be reused.

sys/arm64/arm64/pmap.c
3465–3471

No, we do not need to perform a TLB invalidation if we can guarantee that the ASID will not be used again within the same epoch. However, you cannot make that guarantee with pmap_remove because the ASID may be active on other cores, e.g., you are running a multithreaded application. Only pmap_remove_pages can guarantee that the ASID is not active on other cores, and it already exploits this by performing a single TLB invalidation for the ASID.

The place where we could be clever, and seek to reduce/batch TLB invalidations without needing a delayed invalidation mechanism, is pmap_protect, which could reduce the cost of setting up copy-on-write during fork. The potential race condition that exists within pmap_remove, which we prevent with the delayed invalidation mechanism on amd64, cannot arise within pmap_protect.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 15 2023, 12:02 PM
This revision was automatically updated to reflect the committed changes.