Details

Reviewers

alc
markj

Commits

rS363311: amd64 pmap: microoptimize local shootdowns for PCID PTI configurations

Summary

When pmap operates in PTI mode, we must reload %cr3 on return to userspace. In non-PCID mode the reload always flushes all non-global TLB entries and we take advantage of it by only invalidating the kpt TLB entries (there is no upt entries at all).

In PCID mode, we flush both kpt and upt TLB explicitly, but we can take advantage of the fact that PCID mode command to reload %cr3 includes a flag to flush/not flush target TLB. In particular, we can avoid the flush for upt, instead record that load of pc_ucr3 into %cr3 on return to usermode should be flushing. This is done by providing either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is automatically reset to all-1s on return to usermode.

Similarly, we can avoid flushing upt TLB on context switch, replacing it by setting pc_ucr3_load_mask. This unifies INVPCID and non-INVPCID PTI ifunc, leaving only 4 cases instead of 6. This trick is also applicable both to the TLB shootdown IPI handlers, since handlers interrupt the target thread.

But then we need to check pc_curpmap in handlers, and this would reopen the same race for INVPCID machines as was fixed in r306350 for non-INVPCID. To not introduce the same bug, unconditionally do spinlock_enter() in pmap_activate().

[This is motivated by the same VMWare TLB shootdown article, but not exactly]

Test Plan

I did very light checks, but I did run jvm on the patched kernel (before, jvms were very sensitive to TLB invalidation bugs).

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Jun 27 2020, 2:25 AM

Herald added a subscriber: imp. · View Herald TranscriptJun 27 2020, 2:25 AM

kib requested review of this revision.Jun 27 2020, 2:25 AM

Harbormaster completed remote builds in B32003: Diff 73734.Jun 27 2020, 2:25 AM

Just to make sure that I understand: the idea is to reduce the amount of work done in invalidation handlers, by deferring the flush of upt TLB entries until the interrupted thread is about to switch back to user mode?

sys/amd64/amd64/pmap.c
8937 ↗	(On Diff #73734)	Assert that interrupts are disabled in pmap_activate_sw()?
8938 ↗	(On Diff #73734)	I do not really understand why spinlock_*() are used instead of MD intr_disable/enable().

In D25483#563256, @markj wrote:

Just to make sure that I understand: the idea is to reduce the amount of work done in invalidation handlers, by deferring the flush of upt TLB entries until the interrupted thread is about to switch back to user mode?

I would formulate it differently, that the patch removes unneeded work from both the pmap_invalidateXXX() functions and from the shootdown IPI handlers. The observation is that for PCID+PTI configs, we must reload %cr3 when returning to usermode anyway. This reload can handle the invalidation as well, so there is no need to either reload %cr3 (when INVPCID instruction is not available) or do INVPCID in functions and handlers.

sys/amd64/amd64/pmap.c
8938 ↗	(On Diff #73734)	spinlock_enter == interrupt disable + critical enter. Parts of the PCID recalculations on the context switch need just the disable of the context switch, so that we are not preempted in the middle of it. So the code asserts critical sections. On the other hand, for the reason explained in the comment above, non-context switch use of pmap_activate() must use this trick to prevent handling IPIs during non-consistent state of the PCPU/hw. In principle just disabling interrupts would serve, but most important case of context switch has both interrupts disabled and critical section around, so I do not want to lower assertions in the code. I might add interrupt disable asserts.

Add asserts about interrupt enable state.
Collape pmap_activate_sw_pcid_pti1() into the caller.

Harbormaster completed remote builds in B32026: Diff 73818.Jun 28 2020, 9:33 PM

In D25483#563260, @kib wrote:

In D25483#563256, @markj wrote:

Just to make sure that I understand: the idea is to reduce the amount of work done in invalidation handlers, by deferring the flush of upt TLB entries until the interrupted thread is about to switch back to user mode?

I would formulate it differently, that the patch removes unneeded work from both the pmap_invalidateXXX() functions and from the shootdown IPI handlers. The observation is that for PCID+PTI configs, we must reload %cr3 when returning to usermode anyway. This reload can handle the invalidation as well, so there is no need to either reload %cr3 (when INVPCID instruction is not available) or do INVPCID in functions and handlers.

Thanks, I see.

As I understand, for correctness, the return-from-exception code must restore the default ucr3 mask (PMAP_UCR3_NOMASK). If it fails to do so, a subsequent exception+return to usermode will use a stale mask and thus will fail to flush upt TLB entries. Is that right?

In D25483#563292, @markj wrote:

As I understand, for correctness, the return-from-exception code must restore the default ucr3 mask (PMAP_UCR3_NOMASK). If it fails to do so, a subsequent exception+return to usermode will use a stale mask and thus will fail to flush upt TLB entries. Is that right?

PMAP_UCR3_NOMASK means that no invalidation is done on reloading UCR3. For correctness, we must never return to usermode without invalidation if mask is not NOMASK, be it exception, interrupt or syscall return. But if we return with invalidation, it does not make sense to not reset the mask, since we invalidated and doing more invalidation without a reason would only spend CPU cycles.

In D25483#563300, @kib wrote:

For correctness, we must never return to usermode without invalidation if mask is not NOMASK, be it exception, interrupt or syscall return.

Ok, this is the point I was trying to clarify. Can we assert this somehow, perhaps upon entry to kernel mode while handling an interrupt?

sys/amd64/amd64/mp_machdep.c
654 ↗	(On Diff #73818)	I think we can have ucr3_load_mask != NOMASK only if the thread was interrupted while in kernel mode. Is that correct?
sys/amd64/amd64/pmap.c
8832 ↗	(On Diff #73818)	Perhaps write, /* See the comment in pmap_invalidate_page_pcid(). */

This revision is now accepted and ready to land.Jul 6 2020, 5:44 PM

alc added inline comments.Jul 6 2020, 6:17 PM

sys/amd64/amd64/pmap.c
2531 ↗	(On Diff #73818)	Can you please replace "trivial" by a phrase that describes the intended meaning of "trivial"?
8952 ↗	(On Diff #73818)	"most typical" -> "most common"
8953 ↗	(On Diff #73818)	-> "a context switch"

In D25483#565572, @markj wrote:

In D25483#563300, @kib wrote:

For correctness, we must never return to usermode without invalidation if mask is not NOMASK, be it exception, interrupt or syscall return.

Ok, this is the point I was trying to clarify. Can we assert this somehow, perhaps upon entry to kernel mode while handling an interrupt?

There is no easy place for such assertion. We have many assembler trampolines (one for each block of 32 interrupts, two for each exception, specific handlers for vectored, timer, and shootdown IPIs) which call into individual C handlers.

The only place where it is relatively straightforward to add such assertion is at syscall entry, where we can assert that mask == NOMASK, since we either came from usermode or were context-switched.

I will think about your proposal more, but right now I do not have a good answer to it.

sys/amd64/amd64/mp_machdep.c
654 ↗	(On Diff #73818)	Yes, this should be true.

Edited comments.

This revision now requires review to proceed.Jul 6 2020, 7:56 PM

Harbormaster completed remote builds in B32168: Diff 74134.Jul 6 2020, 7:56 PM

kib retitled this revision from am64 pmap: microoptimize local shootdowns for PCID PTI configurations to amd64 pmap: microoptimize local shootdowns for PCID PTI configurations.Jul 6 2020, 7:57 PM

alc added inline comments.Jul 7 2020, 6:33 PM

sys/amd64/amd64/pmap.c
2770–2771 ↗	(On Diff #74134)	Can't this "if" statement be moved below, outside the enclosing "if" statement? That would eliminate a duplicate "if" statement in the below "else".
8955 ↗	(On Diff #74134)	"so the IPI is delayed until after ..."

Simplify pmap_invalidate_all_pcid().
Grammar in the comment.

Harbormaster completed remote builds in B32185: Diff 74167.Jul 7 2020, 7:07 PM

markj accepted this revision.Jul 9 2020, 4:38 PM

This revision is now accepted and ready to land.Jul 9 2020, 4:38 PM

I would encourage you to create a followup patch that provides an overview of the various TLB management implementations. I've got time now to help with refining the text if you can generate a first draft.

sys/amd64/amd64/mp_machdep.c
625 ↗	(On Diff #74167)	Everywhere else you write this as `smp_tlb_pmap == PCPU_GET(curpmap)`.
628 ↗	(On Diff #74167)	The variable "ucr3" no longer serves to simplify this code, as it is only used once. (The below function, invlpg_invpcid_handler(), similarly does without a "ucr3" variable.)

kib marked 2 inline comments as done.Jul 10 2020, 9:31 AM

kib added inline comments.

sys/amd64/amd64/mp_machdep.c
628 ↗	(On Diff #74167)	kcr3 is also not too much needed. I kept pcid since otherwith load_cr3() line becomes too unreadable.

Simplify invltlb_pcid_handler().

This revision now requires review to proceed.Jul 10 2020, 9:34 AM

Harbormaster completed remote builds in B32224: Diff 74271.Jul 10 2020, 9:34 AM

In D25483#566493, @alc wrote:

I would encourage you to create a followup patch that provides an overview of the various TLB management implementations. I've got time now to help with refining the text if you can generate a first draft.

I will do this.

Currently Peter finished testing D25510, and Mark already reviewed that patch. I will commit that first, then rebase D25483 (this one, the patches extensively conflict), and work on the overview in parallel.

In D25483#566623, @kib wrote:

In D25483#566493, @alc wrote:

I would encourage you to create a followup patch that provides an overview of the various TLB management implementations. I've got time now to help with refining the text if you can generate a first draft.

I will do this.

Currently Peter finished testing D25510, and Mark already reviewed that patch. I will commit that first, then rebase D25483 (this one, the patches extensively conflict), and work on the overview in parallel.

Okay, then I will look at D25510 in the next 24 hours.

Rebase after IPI commit.

Harbormaster completed remote builds in B32333: Diff 74460.Jul 14 2020, 8:52 PM

This revision was not accepted when it landed; it landed in state Needs Review.Jul 18 2020, 6:20 PM

Closed by commit rS363311: amd64 pmap: microoptimize local shootdowns for PCID PTI configurations (authored by kib). · Explain Why

This revision was automatically updated to reflect the committed changes.

kib added a commit: rS363311: amd64 pmap: microoptimize local shootdowns for PCID PTI configurations.

alc added inline comments.Jul 19 2020, 12:57 AM

head/sys/amd64/amd64/exception.S
529	This is unrelated to the original purpose of the change, but I was reading the code, so ... Why do this store unconditionally? Why not save only after the below "je" and restore immediately after the move to %cr3? Then, you wouldn't need the SCRATCH_RAX store and load after IDTVEC(fast_syscall).
539	Reiterating the above, this can be deleted.
550	And, this can be deleted.

kib marked 3 inline comments as done.Jul 19 2020, 9:55 AM

kib added inline comments.

head/sys/amd64/amd64/exception.S
529	D25722

amd64 pmap: microoptimize local shootdowns for PCID PTI configurations
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 74630

head/sys/amd64/amd64/exception.S

head/sys/amd64/amd64/genassym.c

head/sys/amd64/amd64/machdep.c

head/sys/amd64/amd64/mp_machdep.c

head/sys/amd64/amd64/pmap.c

head/sys/amd64/include/pcpu.h

head/sys/amd64/include/pmap.h

amd64 pmap: microoptimize local shootdowns for PCID PTI configurationsClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 74630

head/sys/amd64/amd64/exception.S

head/sys/amd64/amd64/genassym.c

head/sys/amd64/amd64/machdep.c

head/sys/amd64/amd64/mp_machdep.c

head/sys/amd64/amd64/pmap.c

head/sys/amd64/include/pcpu.h

head/sys/amd64/include/pmap.h

amd64 pmap: microoptimize local shootdowns for PCID PTI configurations
ClosedPublic
Actions

Revision Contents
Changeset List