Page Menu
Home
FreeBSD
Search
Configure Global Search
Log In
Files
F100670437
D25815.diff
No One
Temporary
Actions
View File
Edit File
Delete File
View Transforms
Subscribe
Mute Notifications
Flag For Later
Award Token
Size
8 KB
Referenced Files
None
Subscribers
None
D25815.diff
View Options
diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -2693,28 +2693,155 @@
invltlb_glob();
}
}
-#ifdef SMP
/*
- * For SMP, these functions have to use the IPI mechanism for coherence.
+ * The amd64 pmap uses different approaches to TLB invalidation
+ * depending on the kernel configuration, available hardware features,
+ * and known hardware errata. The kernel configuration option that
+ * has the greatest operational impact on TLB invalidation is PTI,
+ * which is enabled automatically on affected Intel CPUs. The most
+ * impactful hardware features are first PCID, and then INVPCID
+ * instruction presence. PCID usage is quite different for PTI
+ * vs. non-PTI.
*
- * N.B.: Before calling any of the following TLB invalidation functions,
- * the calling processor must ensure that all stores updating a non-
- * kernel page table are globally performed. Otherwise, another
- * processor could cache an old, pre-update entry without being
- * invalidated. This can happen one of two ways: (1) The pmap becomes
- * active on another processor after its pm_active field is checked by
- * one of the following functions but before a store updating the page
- * table is globally performed. (2) The pmap becomes active on another
- * processor before its pm_active field is checked but due to
- * speculative loads one of the following functions stills reads the
- * pmap as inactive on the other processor.
- *
- * The kernel page table is exempt because its pm_active field is
- * immutable. The kernel page table is always active on every
- * processor.
+ * * Kernel Page Table Isolation (PTI or KPTI) is used to mitigate
+ * the Meltdown bug in some Intel CPUs. Under PTI, each user address
+ * space is served by two page tables, user and kernel. The user
+ * page table only maps user space and a kernel trampoline. The
+ * kernel trampoline includes the entirety of the kernel text but
+ * only the kernel data that is needed to switch from user to kernel
+ * mode. The kernel page table maps the user and kernel address
+ * spaces in their entirety. It is identical to the per-process
+ * page table used in non-PTI mode.
+ *
+ * User page tables are only used when the CPU is in user mode.
+ * Consequently, some TLB invalidations can be postponed until the
+ * switch from kernel to user mode. In contrast, the user
+ * space part of the kernel page table is used for copyout(9), so
+ * TLB invalidations on this page table cannot be similarly postponed.
+ *
+ * The existence of a user mode page table for the given pmap is
+ * indicated by a pm_ucr3 value that differs from PMAP_NO_CR3, in
+ * which case pm_ucr3 contains the %cr3 register value for the user
+ * mode page table's root.
+ *
+ * * The pm_active bitmask indicates which CPUs currently have the
+ * pmap active. A CPU's bit is set on context switch to the pmap, and
+ * cleared on switching off this CPU. For the kernel page table,
+ * the pm_active field is immutable and contains all CPUs. The
+ * kernel page table is always logically active on every processor,
+ * but not necessarily in use by the hardware, e.g., in PTI mode.
+ *
+ * When requesting invalidation of virtual addresses with
+ * pmap_invalidate_XXX() functions, the pmap sends shootdown IPIs to
+ * all CPUs recorded as active in pm_active. Updates to and reads
+ * from pm_active are not synchronized, and so they may race with
+ * each other. Shootdown handlers are prepared to handle the race.
+ *
+ * * PCID is an optional feature of the long mode x86 MMU where TLB
+ * entries are tagged with the 'Process ID' of the address space
+ * they belong to. This feature provides a limited namespace for
+ * process identifiers, 12 bits, supporting 4095 simultaneous IDs
+ * total.
+ *
+ * Allocation of a PCID to a pmap is done by an algorithm described
+ * in section 15.12, "Other TLB Consistency Algorithms", of
+ * Vahalia's book "Unix Internals". A PCID cannot be allocated for
+ * the whole lifetime of a pmap in pmap_pinit() due to the limited
+ * namespace. Instead, a per-CPU, per-pmap PCID is assigned when
+ * the CPU is about to start caching TLB entries from a pmap,
+ * i.e., on the context switch that activates the pmap on the CPU.
+ *
+ * The PCID allocator maintains a per-CPU, per-pmap generation
+ * count, pm_gen, which is incremented each time a new PCID is
+ * allocated. On TLB invalidation, the generation counters for the
+ * pmap are zeroed, which signals the context switch code that the
+ * previously allocated PCID is no longer valid. Effectively,
+ * zeroing any of these counters triggers a TLB shootdown for the
+ * given CPU/address space, due to the allocation of a new PCID.
+ *
+ * Zeroing can be performed remotely. Consequently, if a pmap is
+ * inactive on a CPU, then a TLB shootdown for that pmap and CPU can
+ * be initiated by an ordinary memory access to reset the target
+ * CPU's generation count within the pmap. The CPU initiating the
+ * TLB shootdown does not need to send an IPI to the target CPU.
+ *
+ * * PTI + PCID. The available PCIDs are divided into two sets: PCIDs
+ * for complete (kernel) page tables, and PCIDs for user mode page
+ * tables. A user PCID value is obtained from the kernel PCID value
+ * by setting the highest bit, 11, to 1 (0x800 == PMAP_PCID_USER_PT).
+ *
+ * User space page tables are activated on return to user mode, by
+ * loading pm_ucr3 into %cr3. If the PCPU(ucr3_load_mask) requests
+ * clearing bit 63 of the loaded ucr3, this effectively causes
+ * complete invalidation of the user mode TLB entries for the
+ * current pmap. In which case, local invalidations of individual
+ * pages in the user page table are skipped.
+ *
+ * * Local invalidation, all modes. If the requested invalidation is
+ * for a specific address or the total invalidation of a currently
+ * active pmap, then the TLB is flushed using INVLPG for a kernel
+ * page table, and INVPCID(INVPCID_CTXGLOB)/invltlb_glob() for a
+ * user space page table(s).
+ *
+ * If the INVPCID instruction is available, it is used to flush entries
+ * from the kernel page table.
+ *
+ * * mode: PTI disabled, PCID present. The kernel reserves PCID 0 for its
+ * address space, all other 4095 PCIDs are used for user mode spaces
+ * as described above. A context switch allocates a new PCID if
+ * the recorded PCID is zero or the recorded generation does not match
+ * the CPU's generation, effectively flushing the TLB for this address space.
+ * Total remote invalidation is performed by zeroing pm_gen for all CPUs.
+ * local user page: INVLPG
+ * local kernel page: INVLPG
+ * local user total: INVPCID(CTX)
+ * local kernel total: INVPCID(CTXGLOB) or invltlb_glob()
+ * remote user page, inactive pmap: zero pm_gen
+ * remote user page, active pmap: zero pm_gen + IPI:INVLPG
+ * (Both actions are required to handle the aforementioned pm_active races.)
+ * remote kernel page: IPI:INVLPG
+ * remote user total, inactive pmap: zero pm_gen
+ * remote user total, active pmap: zero pm_gen + IPI:(INVPCID(CTX) or
+ * reload %cr3)
+ * (See note above about pm_active races.)
+ * remote kernel total: IPI:(INVPCID(CTXGLOB) or invltlb_glob())
+ *
+ * PTI enabled, PCID present.
+ * local user page: INVLPG for kpt, INVPCID(ADDR) or (INVLPG for ucr3)
+ * for upt
+ * local kernel page: INVLPG
+ * local user total: INVPCID(CTX) or reload %cr3 for kpt, clear PCID_SAVE
+ * on loading UCR3 into %cr3 for upt
+ * local kernel total: INVPCID(CTXGLOB) or invltlb_glob()
+ * remote user page, inactive pmap: zero pm_gen
+ * remote user page, active pmap: zero pm_gen + IPI:(INVLPG for kpt,
+ * INVPCID(ADDR) for upt)
+ * remote kernel page: IPI:INVLPG
+ * remote user total, inactive pmap: zero pm_gen
+ * remote user total, active pmap: zero pm_gen + IPI:(INVPCID(CTX) for kpt,
+ * clear PCID_SAVE on loading UCR3 into $cr3 for upt)
+ * remote kernel total: IPI:(INVPCID(CTXGLOB) or invltlb_glob())
+ *
+ * No PCID.
+ * local user page: INVLPG
+ * local kernel page: INVLPG
+ * local user total: reload %cr3
+ * local kernel total: invltlb_glob()
+ * remote user page, inactive pmap: -
+ * remote user page, active pmap: IPI:INVLPG
+ * remote kernel page: IPI:INVLPG
+ * remote user total, inactive pmap: -
+ * remote user total, active pmap: IPI:(reload %cr3)
+ * remote kernel total: IPI:invltlb_glob()
+ * Since on return to user mode, the reload of %cr3 with ucr3 causes
+ * TLB invalidation, no specific action is required for user page table.
+ *
+ * EPT. EPT pmaps do not map KVA, all mappings are userspace.
+ * XXX TODO
*/
+#ifdef SMP
/*
* Interrupt the cpus that are executing in the guest context.
* This will force the vcpu to exit and the cached EPT mappings
File Metadata
Details
Attached
Mime Type
text/plain
Expires
Sat, Oct 19, 4:00 AM (21 h, 38 m)
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
14258787
Default Alt Text
D25815.diff (8 KB)
Attached To
Mode
D25815: amd64 pmap: add comment explaining TLB invalidation modes.
Attached
Detach File
Event Timeline
Log In to Comment