D25815.id79332.diff
Index: sys/amd64/amd64/pmap.c
===================================================================
--- sys/amd64/amd64/pmap.c
+++ sys/amd64/amd64/pmap.c
@@ -2695,23 +2695,138 @@
#ifdef SMP
/*
- * For SMP, these functions have to use the IPI mechanism for coherence.
+ * The amd64 pmap uses different approaches to TLB invalidation
+ * depending on the kernel configuration, available hardware features,
+ * and known hardware errata. For SMP, immediate invalidations have
+ * to use the IPI mechanism for TLB coherence.
+ *
+ * The configuration option with the greatest operational impact is
+ * PTI, which is enabled automatically on affected Intel CPUs. The
+ * relevant hardware features are PCID and, where present, the INVPCID
+ * instruction. PCID usage differs considerably between PTI and non-PTI.
*
- * N.B.: Before calling any of the following TLB invalidation functions,
- * the calling processor must ensure that all stores updating a non-
- * kernel page table are globally performed. Otherwise, another
- * processor could cache an old, pre-update entry without being
- * invalidated. This can happen one of two ways: (1) The pmap becomes
- * active on another processor after its pm_active field is checked by
- * one of the following functions but before a store updating the page
- * table is globally performed. (2) The pmap becomes active on another
- * processor before its pm_active field is checked but due to
- * speculative loads one of the following functions stills reads the
- * pmap as inactive on the other processor.
- *
- * The kernel page table is exempt because its pm_active field is
- * immutable. The kernel page table is always active on every
- * processor.
+ * * Kernel Page Table Isolation (PTI or KPTI) is used to mitigate the
+ * Meltdown bug in some Intel CPUs. Under PTI, each user address
+ * space is served by two page tables, user and kernel. The user
+ * page table only maps user space and a kernel trampoline. The
+ * kernel trampoline includes the entirety of the kernel text but
+ * only the kernel data that is needed to switch from user to kernel
+ * mode. The kernel page table maps the user and kernel address
+ * spaces in their entirety. It is identical to the per-process
+ * page table allocated in non-PTI mode.
+ *
+ * Note that the user space part of the kernel page table is used for
+ * copyout(9) and must be kept TLB-coherent. User page tables are
+ * only used when the CPU is in user mode, so some invalidations
+ * can be postponed until the switch from kernel mode to user mode.
+ *
+ * The presence of a user mode page table for a given pmap is indicated
+ * by a pm_ucr3 value other than PMAP_NO_CR3, in which case pm_ucr3
+ * holds the %cr3 register value for the user mode page table root.
+ *
+ * * The pm_active bitmask indicates which CPUs currently have the pmap
+ * active. A CPU's bit is set when it context switches to the pmap and
+ * cleared when it switches away. For the kernel page table, the
+ * pm_active field is immutable and contains all CPUs. The kernel page
+ * table is always logically active on every processor, but not
+ * necessarily present in hardware, e.g. in PTI mode.
+ *
+ * When invalidation of virtual addresses is requested through the
+ * pmap_invalidate_XXX() functions, pmap sends shootdown IPIs to all
+ * CPUs recorded in pm_active. Updates to pm_active are not
+ * synchronized, so reading it is necessarily racy. Shootdown
+ * handlers are prepared to handle the race.
+ *
+ * * PCID is an optional feature of the long mode x86 MMU where TLB
+ * entries are tagged with the 'Process ID' of the address space
+ * they belong to. PCID provides a limited namespace for process
+ * identifiers: 12 bits, 4096 simultaneous IDs in total.
+ *
+ * Allocation of a PCID to a pmap is done by an algorithm described in
+ * Vahalia's book "Unix Internals", section 15.12 "Other TLB
+ * Consistency Algorithms". A PCID cannot be allocated for the whole
+ * pmap lifetime in pmap_pinit() due to the limited namespace.
+ * Instead, a per-CPU, per-pmap PCID is assigned when a CPU is about
+ * to start caching TLB entries from a pmap, i.e. on the context
+ * switch which activates the pmap on the CPU.
+ *
+ * The PCID allocator maintains a per-CPU, per-pmap generation count
+ * pm_gen which is incremented each time a new PCID is allocated.
+ * On invalidation, the generation counters for the pmap are zeroed,
+ * which signals to the context switch code that the already allocated
+ * PCID is no longer valid. Allocating a new PCID then implies a TLB
+ * shootdown for the given CPU and address space.
+ * Zeroing can be performed remotely.
+ *
+ * * PTI + PCID. The available PCIDs are divided into two sets: PCIDs for
+ * complete (kernel) page tables, and PCIDs for usermode page tables.
+ * The user PCID value is obtained from the kernel PCID value by setting
+ * the highest bit, bit 11, to 1 (0x800 == PMAP_PCID_USER_PT).
+ *
+ * Userspace page tables are activated on return to usermode, by loading
+ * pm_ucr3 into %cr3. If PCPU(ucr3_load_mask) requests clearing
+ * bit 63 of the loaded ucr3, this effectively causes total invalidation
+ * of the usermode TLB. If ucr3_load_mask is set, then local invalidations
+ * of individual pages in the user page table are skipped.
+ *
+ * * Local invalidation, all modes. If the requested invalidation is for
+ * a specific address, or is a total invalidation for a pmap that is
+ * currently active, pmap explicitly flushes the TLB using INVLPG for
+ * the kernel page table, and INVPCID(INVPCID_CTXGLOB)/invltlb_glob().
+ *
+ * If the INVPCID instruction is available, it is used to flush entries
+ * from the kernel page table.
+ *
+ * * Mode: PTI disabled, PCID present. The kernel reserves PCID 0 for its
+ * address space; the other 4095 PCIDs are used for usermode spaces
+ * as described above. A context switch allocates a new PCID if the
+ * recorded PCID is zero or the recorded generation does not match the
+ * CPU generation, effectively flushing the TLB for this address space.
+ * Total remote invalidation is performed by zeroing pm_gen for all CPUs.
+ * local user page: INVLPG
+ * local kernel page: INVLPG
+ * local user total: INVPCID(CTX)
+ * local kernel total: INVPCID(CTXGLOB) or invltlb_glob()
+ * remote user page inactive: zero pm_gen
+ * remote user page active: zero pm_gen + IPI:INVLPG
+ * remote kernel page: IPI:INVLPG
+ * remote user total inactive: zero pm_gen
+ * remote user total active: zero pm_gen + IPI:(INVPCID(CTX) or
+ * reload %cr3)
+ * remote kernel total: IPI:(INVPCID(CTXGLOB) or invltlb_glob())
+ *
+ * PTI enabled, PCID present.
+ * local user page: INVLPG for kpt, INVPCID(ADDR) or (INVLPG for ucr3)
+ * for upt
+ * local kernel page: INVLPG
+ * local user total: INVPCID(CTX) or reload %cr3 for kpt, clear PCID_SAVE
+ * on loading UCR3 into %cr3 for upt
+ * local kernel total: INVPCID(CTXGLOB) or invltlb_glob()
+ * remote user page inactive: zero pm_gen
+ * remote user page active: zero pm_gen + IPI:(INVLPG for kpt,
+ * INVPCID(ADDR) for upt)
+ * remote kernel page: IPI:INVLPG
+ * remote user total inactive: zero pm_gen
+ * remote user total active: zero pm_gen + IPI:(INVPCID(CTX) for kpt,
+ * clear PCID_SAVE on loading UCR3 into %cr3 for upt)
+ * remote kernel total: IPI:(INVPCID(CTXGLOB) or invltlb_glob())
+ *
+ * No PCID.
+ * local user page: INVLPG
+ * local kernel page: INVLPG
+ * local user total: reload %cr3
+ * local kernel total: invltlb_glob()
+ * remote user page inactive: -
+ * remote user page active: IPI:INVLPG
+ * remote kernel page: IPI:INVLPG
+ * remote user total inactive: -
+ * remote user total active: IPI:(reload %cr3)
+ * remote kernel total: IPI:invltlb_glob()
+ * Since the reload of %cr3 with ucr3 on return to usermode causes
+ * TLB invalidation, no specific action is required for upt.
+ *
+ * EPT. EPT pmaps do not map KVA; all mappings are userspace.
+ * XXX TODO
*/
/*
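
The generation-based PCID scheme described in the new comment can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel's code: the struct names, `pcid_activate()`, `pcid_invalidate()`, and the rule that the per-CPU generation advances when the 12-bit namespace wraps are all hypothetical choices made for illustration. The comment itself only states that a context switch allocates a new PCID when the recorded generation does not match the CPU generation, and that invalidation zeroes pm_gen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCID_KERN	0u	/* PCID 0 is reserved for the kernel */
#define PCID_OVERMAX	0x1000u	/* 12-bit namespace, valid IDs are 0..4095 */

/* Hypothetical per-CPU allocator state (not the kernel's layout). */
struct cpu_pcid_state {
	uint32_t gen;	/* current generation on this CPU, never 0 */
	uint32_t next;	/* next unused PCID on this CPU */
};

/* Hypothetical per-pmap, per-CPU record. */
struct pmap_pcid {
	uint32_t pcid;	/* PCID last assigned on this CPU */
	uint32_t gen;	/* 0 means "invalidated, needs a new PCID" */
};

/*
 * Model of activating a pmap on a CPU at context switch.
 * Returns true if the recorded PCID is still valid, so TLB entries
 * tagged with it may be reused; returns false if a fresh PCID was
 * assigned, which means nothing cached under the old one is used.
 */
static bool
pcid_activate(struct cpu_pcid_state *cpu, struct pmap_pcid *pm)
{
	assert(cpu->gen != 0);
	if (pm->gen != 0 && pm->gen == cpu->gen)
		return (true);
	if (cpu->next == PCID_OVERMAX) {
		/*
		 * Assumption: on namespace exhaustion, start a new
		 * generation, which retires every PCID handed out so far.
		 */
		cpu->gen++;
		if (cpu->gen == 0)
			cpu->gen = 1;	/* 0 is reserved as "invalid" */
		cpu->next = PCID_KERN + 1;
	}
	pm->pcid = cpu->next++;
	pm->gen = cpu->gen;
	return (false);
}

/*
 * Model of a total invalidation for a CPU where the pmap is not
 * active: zeroing the generation forces a new PCID on the next
 * activation. No IPI is modeled here.
 */
static void
pcid_invalidate(struct pmap_pcid *pm)
{
	pm->gen = 0;
}
```

In this model, activating a pmap twice in a row reuses its PCID, while activating it after `pcid_invalidate()` hands out a new one. That is the sense in which zeroing pm_gen stands in for a TLB flush on CPUs where the pmap is not currently active.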
D25815: amd64 pmap: add comment explaining TLB invalidation modes.