Page MenuHomeFreeBSD

am64 pmap: microoptimize local shootdowns for PCID PTI configurations
Needs ReviewPublic

Authored by kib on Sat, Jun 27, 2:25 AM.

Details

Reviewers
alc
markj
Summary

When pmap operates in PTI mode, we must reload %cr3 on return to userspace. In non-PCID mode the reload always flushes all non-global TLB entries and we take advantage of it by only invalidating the kpt TLB entries (there is no upt entries at all).

In PCID mode, we flush both kpt and upt TLB explicitly, but we can take advantage of the fact that PCID mode command to reload %cr3 includes a flag to flush/not flush target TLB. In particular, we can avoid the flush for upt, instead record that load of pc_ucr3 into %cr3 on return to usermode should be flushing. This is done by providing either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is automatically reset to all-1s on return to usermode.

Similarly, we can avoid flushing upt TLB on context switch, replacing it by setting pc_ucr3_load_mask. This unifies INVPCID and non-INVPCID PTI ifunc, leaving only 4 cases instead of 6. This trick is also applicable both to the TLB shootdown IPI handlers, since handlers interrupt the target thread.

But then we need to check pc_curpmap in handlers, and this would reopen the same race for INVPCID machines as was fixed in r306350 for non-INVPCID. To not introduce the same bug, unconditionally do spinlock_enter() in pmap_activate().

[This is motivated by the same VMWare TLB shootdown article, but not exactly]

Test Plan

I did very light checks, but I did run jvm on the patched kernel (before, jvms were very sensitive to TLB invalidation bugs).

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 32026

Event Timeline

kib created this revision.Sat, Jun 27, 2:25 AM
kib requested review of this revision.Sat, Jun 27, 2:25 AM
markj added a comment.Sun, Jun 28, 8:07 PM

Just to make sure that I understand: the idea is to reduce the amount of work done in invalidation handlers, by deferring the flush of upt TLB entries until the interrupted thread is about to switch back to user mode?

sys/amd64/amd64/pmap.c
8960

Assert that interrupts are disabled in pmap_activate_sw()?

8961

I do not really understand why spinlock_*() are used instead of MD intr_disable/enable().

kib added a comment.Sun, Jun 28, 8:33 PM

Just to make sure that I understand: the idea is to reduce the amount of work done in invalidation handlers, by deferring the flush of upt TLB entries until the interrupted thread is about to switch back to user mode?

I would formulate it differently, that the patch removes unneeded work from both the pmap_invalidateXXX() functions and from the shootdown IPI handlers. The observation is that for PCID+PTI configs, we must reload %cr3 when returning to usermode anyway. This reload can handle the invalidation as well, so there is no need to either reload %cr3 (when INVPCID instruction is not available) or do INVPCID in functions and handlers.

sys/amd64/amd64/pmap.c
8961

spinlock_enter == interrupt disable + critical enter. Parts of the PCID recalculations on the context switch need just the disable of the context switch, so that we are not preempted in the middle of it. So the code asserts critical sections.

On the other hand, for the reason explained in the comment above, non-context switch use of pmap_activate() must use this trick to prevent handling IPIs during non-consistent state of the PCPU/hw.

In principle just disabling interrupts would serve, but most important case of context switch has both interrupts disabled and critical section around, so I do not want to lower assertions in the code.

I might add interrupt disable asserts.

kib updated this revision to Diff 73818.Sun, Jun 28, 9:33 PM

Add asserts about interrupt enable state.
Collape pmap_activate_sw_pcid_pti1() into the caller.

In D25483#563260, @kib wrote:

Just to make sure that I understand: the idea is to reduce the amount of work done in invalidation handlers, by deferring the flush of upt TLB entries until the interrupted thread is about to switch back to user mode?

I would formulate it differently, that the patch removes unneeded work from both the pmap_invalidateXXX() functions and from the shootdown IPI handlers. The observation is that for PCID+PTI configs, we must reload %cr3 when returning to usermode anyway. This reload can handle the invalidation as well, so there is no need to either reload %cr3 (when INVPCID instruction is not available) or do INVPCID in functions and handlers.

Thanks, I see.

As I understand, for correctness, the return-from-exception code must restore the default ucr3 mask (PMAP_UCR3_NOMASK). If it fails to do so, a subsequent exception+return to usermode will use a stale mask and thus will fail to flush upt TLB entries. Is that right?

kib added a comment.Sun, Jun 28, 11:04 PM

As I understand, for correctness, the return-from-exception code must restore the default ucr3 mask (PMAP_UCR3_NOMASK). If it fails to do so, a subsequent exception+return to usermode will use a stale mask and thus will fail to flush upt TLB entries. Is that right?

PMAP_UCR3_NOMASK means that no invalidation is done on reloading UCR3. For correctness, we must never return to usermode without invalidation if mask is not NOMASK, be it exception, interrupt or syscall return. But if we return with invalidation, it does not make sense to not reset the mask, since we invalidated and doing more invalidation without a reason would only spend CPU cycles.