When we issue shootdown IPIs, we first assign zero to pm_gens to indicate the need to flush on the next context switch in case our IPI misses the context, next we read pm_active. On context switch we set our bit in pm_active, then we read pm_gen. It is crucial that both threads see the memory in the program order, otherwise invalidation thread might read pm_active bit as zero and the context switching thread might read pm_gen as zero.
IA32 allows CPU for both reads to see zero. We must use the barriers between write and read. The pm_active bit set is already locked, so only the invalidation functions need it.
I never saw it in real life, or at least I do not have a good reproduction case. I found this during code inspection when hunting for the Xen TLB issue reported by cperciva.