MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode invalidations
r368649 fixed a regression in r362031 that was MFC-ed to stable/12 as
part of r362572.  r362031 reordered the IPI send and the local TLB flush
in TLB invalidations.
Without this fix we've been seeing problems with stale memory content
where changes done under a mutex were not immediately observed by
another thread after taking the same mutex.  Those inconsistencies were
correlated with copy-on-write faults for pages containing the data.
The change needed some adaptations as I elected to skip two significant
intermediate changes:
- r363195 / dc43978a, amd64: allow parallel shootdown IPIs
- r363311 / 3ec7e169, amd64 pmap: microoptimize local shootdowns for PCID PTI configurations
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33413