
Allow us to mark a pmap as dead
Needs Review, Public

Authored by andrew on Jun 21 2021, 5:54 PM.

Details

Reviewers
alc
kib
markj
manu
Summary

This allows us to skip TLB invalidation, e.g. in bhyve when a VM has
shut down.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 40041
Build 36930: arc lint + arc unit

Event Timeline

There is no use of pmap_pre_destroy() in the patch.

Is skipping TLB invalidations on a dead pmap really that significant an effect?

sys/arm64/arm64/pmap.c:1182

Perhaps you want to assert that kernel_pmap is never marked as dead.

  • KASSERT that we don't mark the kernel pmap as dead
  • Remove a PMAP_ASSERT_STAGE1 that shouldn't be part of this patch
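
For readers without the diff at hand, the idea can be sketched as a small userspace model. This is not the patch itself: `struct model_pmap`, the `pm_dead` flag, `tlbi_count`, and the `model_*` functions are hypothetical names chosen for illustration, and `assert(3)` stands in for the kernel's KASSERT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct pmap. */
struct model_pmap {
	bool		pm_dead;	/* assumed flag name; set once, never cleared */
	unsigned long	tlbi_count;	/* invalidations issued, for illustration */
};

/* Stand-in for kernel_pmap, which must never be marked dead. */
static struct model_pmap model_kernel_pmap;

/* Mark a pmap as dead so that later invalidations can be skipped. */
static void
model_pmap_pre_destroy(struct model_pmap *pmap)
{
	/* Mirrors the requested KASSERT: the kernel pmap is never dead. */
	assert(pmap != &model_kernel_pmap);
	pmap->pm_dead = true;
}

/* Invalidate one page unless the pmap has been marked dead. */
static void
model_pmap_invalidate_page(struct model_pmap *pmap, unsigned long va)
{
	(void)va;		/* the model does not track addresses */
	if (pmap->pm_dead)
		return;		/* dead pmap: skip the TLB invalidation */
	pmap->tlbi_count++;
}
```

In this model, an invalidation issued before `model_pmap_pre_destroy()` is counted, and any issued afterwards is skipped.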

You can see the use in bhyve in https://github.com/CTSRD-CHERI/freebsd-morello/commit/d12af6f53e9c9ba08d80974be2fbce7fa9381f95#diff-1c77bb82662186164af0461f49f764096e3c3bd3d3f324a2db019d242cbdb3de

Is there anywhere in the vm code where it would make sense to call pmap_pre_destroy()? If so, I can look at making it an MI interface.

I do not think the dead state in the proposed interpretation can be set earlier than in pmap_remove_pages(), but there it is already useless. We cannot set it earlier because pmap_remove_pages() is the first call into pmap on the path of the vmspace destruction. On the other hand, we do not destroy vmspace until only the current thread has this pmap active, and no new activations of this pmap can occur. So nothing should invalidate the pmap on remote CPUs.

For the bhyve patch above, why is it useful? As I understand it, this vmspace/pmap is never going to be activated at all, so why does claiming that TLB invalidation is not needed help? Also, on ARMv8, SMP TLB invalidations do not require an IPI, which is why I was surprised that such an optimization is ever helpful.

In D30845#694140, @kib wrote:

For the bhyve patch above, why is it useful? As I understand it, this vmspace/pmap is never going to be activated at all, so why does claiming that TLB invalidation is not needed help? Also, on ARMv8, SMP TLB invalidations do not require an IPI, which is why I was surprised that such an optimization is ever helpful.

It's useful in bhyve as we need to call into the hypervisor to perform the cache invalidation. In bhyve I'm either moving from invalid -> valid or destroying the pmap. For the former case there is no need for TLB invalidation, and in the latter it can be handled when we reset the VMID space.
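
The two cases described above can be written down as a small decision function. This is only a sketch of the reasoning in the comment, not code from the bhyve patch: the enum, its values, and `model_stage2_update()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for a stage 2 page table update. */
enum model_action {
	MODEL_NO_TLBI,			/* nothing could have been cached */
	MODEL_TLBI,			/* call into the hypervisor to invalidate */
	MODEL_DEFER_TO_VMID_RESET	/* stale entries are dealt with when the
					   VMID space is reset */
};

/*
 * Decide what to do when an entry changes.
 * old_valid: the entry was valid before the update.
 * pmap_dead: the pmap has been marked dead (the VM is being destroyed).
 */
static enum model_action
model_stage2_update(bool old_valid, bool pmap_dead)
{
	if (!old_valid)
		return (MODEL_NO_TLBI);		/* invalid -> valid */
	if (pmap_dead)
		return (MODEL_DEFER_TO_VMID_RESET);
	return (MODEL_TLBI);
}
```

Only the last case pays for a call into the hypervisor, which is the cost the dead flag is meant to avoid.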

In D30845#694140, @kib wrote:

For the bhyve patch above, why is it useful? As I understand it, this vmspace/pmap is never going to be activated at all, so why does claiming that TLB invalidation is not needed help? Also, on ARMv8, SMP TLB invalidations do not require an IPI, which is why I was surprised that such an optimization is ever helpful.

My understanding is that there is a cost to the invalidation on other cores even though shootdown is handled by the hardware. The invalidation ASID and address are broadcast to the other cores, and I doubt that many core implementations would have a dedicated port to the TLB structure for handling these incoming invalidations, so invalidations might delay TLB lookups by instructions running on the core.

In D30845#694140, @kib wrote:

I do not think the dead state in the proposed interpretation can be set earlier than in pmap_remove_pages(), but there it is already useless. We cannot set it earlier because pmap_remove_pages() is the first call into pmap on the path of the vmspace destruction. On the other hand, we do not destroy vmspace until only the current thread has this pmap active, and no new activations of this pmap can occur. So nothing should invalidate the pmap on remote CPUs.

Is bhyve actually taking advantage of pmap_remove_pages()? I glanced at the file that Andrew sent a link to, and it appears to me that bhyve performs a series of vm_map_remove()s, which call pmap_remove(), before performing vmspace_free(). So, pmap_remove_pages() never gets called. Whereas pmap_remove_pages() does a single TLB invalidation on the whole address space, pmap_remove() is performing a TLB invalidation per valid PTE destroyed.
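
To make the difference in invalidation counts concrete, here is a toy model of the two teardown paths as characterized above. It is an illustration only: `struct model_space`, `MODEL_NPTES`, and both functions are hypothetical, and the real pmap code is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one valid flag per PTE slot in a tiny address space. */
#define MODEL_NPTES 8

struct model_space {
	bool		valid[MODEL_NPTES];
	unsigned	tlbi_page;	/* per-page invalidations issued */
	unsigned	tlbi_all;	/* whole-address-space invalidations issued */
};

/* pmap_remove()-style: one invalidation per valid PTE destroyed. */
static void
model_remove_range(struct model_space *s, size_t start, size_t end)
{
	assert(start <= end);
	for (size_t i = start; i < end && i < MODEL_NPTES; i++) {
		if (s->valid[i]) {
			s->valid[i] = false;
			s->tlbi_page++;
		}
	}
}

/* pmap_remove_pages()-style: clear every entry, then invalidate once. */
static void
model_remove_all(struct model_space *s)
{
	for (size_t i = 0; i < MODEL_NPTES; i++)
		s->valid[i] = false;
	s->tlbi_all++;
}
```

With four valid entries, the range-based path issues four invalidations in this model, while the whole-space path issues one regardless of how many entries were valid.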

In D30845#694803, @alc wrote:

Whereas pmap_remove_pages() does a single TLB invalidation on the whole address space ...

BTW I asked myself why this invalidation is performed by pmap_remove_pages() at all, at least on amd64. There is no more userspace that can activate our pmap. So for the same reason that delayed invalidation (DI) is not entered there, can we avoid the invalidation? It would occur after the final teardown of the vmspace, on the last context switch anyway.

I convinced myself that the invalidation prevents any speculative loads of PTEs and higher paging structures, but this is a relatively weak reason for an IPI.

A couple thoughts:

  1. I'm curious to know if the nested page table pmap is active on the current processor when bhyve is destroying the guest physical address space. If so, it ought to be calling pmap_remove_pages(), whether it is running on arm64 or any other architecture. That said, pmap_remove_pages() won't help if the guest physical address space is wired memory to allow direct access to, e.g., SR-IOV devices.
  2. I wonder whether adding a current activation count to the pmap, and resetting the pmap's ASID instead of performing TLB invalidations when that count is zero, wouldn't be a better, more general solution if pmap_remove_pages() can't be used.
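
The second idea could look roughly like the following userspace sketch. All names here (`struct model_pmap`, `active`, `model_asid_alloc()`, `model_pmap_invalidate_all()`) are hypothetical, and the allocator is a bare counter: a real ASID allocator also has to handle rollover and flush before an ASID is reused, which is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified pmap with an activation count. */
struct model_pmap {
	int		active;		/* CPUs currently running on this pmap */
	unsigned	asid;		/* current address space identifier */
	unsigned	tlbi_count;	/* invalidations actually issued */
};

static unsigned model_next_asid = 1;

/* Hand out a fresh ASID; rollover handling is deliberately omitted. */
static unsigned
model_asid_alloc(void)
{
	return (model_next_asid++);
}

/*
 * Invalidate the whole address space. If no CPU has the pmap active,
 * switch to a fresh ASID instead: entries tagged with the old ASID can
 * no longer match, so no TLB invalidation is issued.
 */
static void
model_pmap_invalidate_all(struct model_pmap *pmap)
{
	assert(pmap->active >= 0);
	if (pmap->active == 0) {
		pmap->asid = model_asid_alloc();
		return;
	}
	pmap->tlbi_count++;
}
```

The trade-off in this sketch is that an inactive pmap consumes ASIDs faster, in exchange for never issuing invalidations while it is inactive.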