When pmap operates in PTI mode, we must reload %cr3 on return to userspace. In non-PCID mode the reload always flushes all non-global TLB entries, and we take advantage of it by only invalidating the kpt TLB entries (there are no upt entries at all).
In PCID mode, we flush both kpt and upt TLB explicitly, but we can take advantage of the fact that in PCID mode the %cr3 reload includes a flag that selects whether the target TLB is flushed. In particular, we can avoid the explicit flush for upt and instead record that the load of pc_ucr3 into %cr3 on return to usermode must flush. This is done by providing either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is automatically reset to all-1s on return to usermode.
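The deferred flush described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the sketch_* names and the SKETCH_* constants are hypothetical, and it assumes the flush/no-flush flag is bit 63 of the value loaded into %cr3, with pc_ucr3 stored with that bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch; not the kernel's definitions. */
#define SKETCH_CR3_NOFLUSH (UINT64_C(1) << 63)   /* assumed no-flush flag */
#define SKETCH_MASK_KEEP   (~UINT64_C(0))        /* all-1s: load as stored */
#define SKETCH_MASK_FLUSH  (~SKETCH_CR3_NOFLUSH) /* strip flag: load flushes */

/* Hypothetical stand-in for the relevant per-CPU fields. */
struct sketch_pcpu {
	uint64_t ucr3;           /* user page table root, PCID, no-flush flag */
	uint64_t ucr3_load_mask; /* applied to ucr3 on return to usermode */
	uint64_t last_cr3_load;  /* models the value written to %cr3 */
};

/* Instead of flushing the user TLB now, request a flush on the next load. */
static void
sketch_defer_user_flush(struct sketch_pcpu *pc)
{
	pc->ucr3_load_mask = SKETCH_MASK_FLUSH;
}

/*
 * Models the return to usermode: load ucr3 through the mask, then reset
 * the mask to all-1s. Returns true if this load flushed the user TLB.
 */
static bool
sketch_return_to_user(struct sketch_pcpu *pc)
{
	uint64_t val;

	assert(pc->ucr3_load_mask == SKETCH_MASK_KEEP ||
	    pc->ucr3_load_mask == SKETCH_MASK_FLUSH);
	val = pc->ucr3 & pc->ucr3_load_mask;
	pc->last_cr3_load = val;
	pc->ucr3_load_mask = SKETCH_MASK_KEEP;
	return ((val & SKETCH_CR3_NOFLUSH) == 0);
}
```

In this model, a call to sketch_defer_user_flush() makes exactly one subsequent sketch_return_to_user() flush; later returns keep the TLB entries until another flush is requested.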
Similarly, we can avoid flushing upt TLB on context switch, replacing it by setting pc_ucr3_load_mask. This unifies the INVPCID and non-INVPCID PTI ifuncs, leaving only 4 cases instead of 6. This trick is also applicable to the TLB shootdown IPI handlers, since the handlers interrupt the target thread.
But then we need to check pc_curpmap in the handlers, and this would reopen the same race for INVPCID machines as was fixed in r306350 for non-INVPCID. To avoid reintroducing that bug, unconditionally do spinlock_enter() in pmap_activate().
[This is motivated by the same VMware TLB shootdown article, but not exactly]