amd64 pmap: microoptimize local shootdowns for PCID PTI configurations
When pmap operates in PTI mode, we must reload %cr3 on return to
userspace. In non-PCID mode the reload always flushes all non-global
TLB entries and we take advantage of it by only invalidating the KPT
TLB entries (there is no cached UPT entries at all).
In PCID mode, we flush both KPT and UPT TLB explicitly, but we can
take advantage of the fact that PCID mode command to reload %cr3
includes a flag to flush/not flush target TLB. In particular, we can
avoid the flush for UPT, instead record that load of pc_ucr3 into %cr3
on return to usermode should be flushing. This is done by providing
either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is
automatically reset to all-1s on return to usermode.
Similarly, we can avoid flushing UPT TLB on context switch, replacing
it by setting pc_ucr3_load_mask. This unifies INVPCID and non-INVPCID
PTI ifunc, leaving only 4 cases instead of 6. This trick is also
applicable both to the TLB shootdown IPI handlers, since handlers
interrupt the target thread.
But then we need to check pc_curpmap in handlers, and this would
reopen the same race for INVPCID machines as was fixed in r306350 for
non-INVPCID. To not introduce the same bug, unconditionally do
spinlock_enter() in pmap_activate().
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25483