amd64 pmap: reorder IPI send and local TLB flush in TLB invalidations.
Right now code first flushes all local TLB entries that needs to be
flushed, then signals IPI to remote cores, and then waits for
acknowledgements while spinning idle. In the VMWare article 'Don’t
shoot down TLB shootdowns!' it was noted that the time spent spinning
is lost, and can be more usefully used doing local TLB invalidation.
We could use the same invalidation handler for local TLB as for
remote, but typically for pmap == curpmap we can use INVLPG for locals
instead of INVPCID on remotes, since we cannot control context
switches on them. Due to that, keep the local code and provide the
callbacks to be called from smp_targeted_tlb_shootdown() after IPIs
are fired but before spin wait starts.
Reviewed by: alc, cem, markj, Anton Rang <rang at acm.org>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D25188