D25815.id74979.diff

Index: sys/amd64/amd64/pmap.c
===================================================================
--- sys/amd64/amd64/pmap.c
+++ sys/amd64/amd64/pmap.c
@@ -2444,23 +2444,136 @@
#ifdef SMP
/*
- * For SMP, these functions have to use the IPI mechanism for coherence.
+ * Pmap operates in several modes of TLB invalidation, depending on
+ * the kernel configuration, and on the available hardware features
+ * and known errata. For SMP, immediate invalidations have to use the
+ * IPI mechanism for TLB coherence.
+ *
+ * The high-impact configuration option is PTI, which is enabled
+ * automatically on affected Intel CPUs. The hardware features are
+ * mainly PCID, and then INVPCID instruction presence. PCID usage is
+ * quite different for PTI vs. non-PTI.
*
- * N.B.: Before calling any of the following TLB invalidation functions,
- * the calling processor must ensure that all stores updating a non-
- * kernel page table are globally performed. Otherwise, another
- * processor could cache an old, pre-update entry without being
- * invalidated. This can happen one of two ways: (1) The pmap becomes
- * active on another processor after its pm_active field is checked by
- * one of the following functions but before a store updating the page
- * table is globally performed. (2) The pmap becomes active on another
- * processor before its pm_active field is checked but due to
- * speculative loads one of the following functions stills reads the
- * pmap as inactive on the other processor.
- *
- * The kernel page table is exempt because its pm_active field is
- * immutable. The kernel page table is always active on every
- * processor.
+ * * Kernel Page Table Isolation (PTI or KPTI) is a trick where each
+ * user address space is served by two page tables, user and kernel.
+ * User page table only maps user space and the trampoline needed to
+ * switch from user to kernel mode (FreeBSD maps the whole kernel text
+ * as well). Kernel page table maps all user and kernel space, and
+ * is the only page table allocated in non-PTI mode. PTI is used to
+ * mitigate Meltdown bug in some Intel CPUs.
+ *
+ * Note that user space part of the kernel page tables is used for
+ * copyout(9) and needs to maintain TLB coherency. User page tables
+ * are only used when CPU is in user mode, and some invalidations
+ * can be postponed until the switch from kernel mode to user mode.
+ *
+ * Presence of the usermode pagetable for the given pmap is indicated
+ * by pm_ucr3 value different from PMAP_NO_CR3, in which case it contains
+ * the %cr3 register value for user mode page tables root.
+ *
+ * * The pm_active bitmask indicates which CPUs have pmap active
+ * currently, the bit is set on context switch to, and cleared on
+ * switching off this CPU. For kernel page table, pm_active field
+ * is immutable and contains all CPUs. The kernel page table is
+ * always logically active on every processor, but not necessarily
+ * present in hardware, e.g. in PTI mode.
+ *
+ * When requesting invalidation of virtual addresses with
+ * pmap_invalidate_XXX() functions, pmap sends shootdown IPIs to all
+ * CPUs recorded in pm_active. The pm_active updates are not
+ * synchronized, and reading it is necessarily racy. Shootdown
+ * handlers are prepared to handle the race.
+ *
+ * * PCID is an optional feature of the long mode x86 MMU where TLB
+ * entries are tagged with the 'Process ID' of the address space
+ * they belong to. PCID provides limited namespace for process
+ * identifiers, 12 bits, 4096 simultaneous IDs total.
+ *
+ * Allocation of PCID to pmap is done by an algorithm described in
+ * Vahalia's book "Unix Internals", section 15.12 "Other TLB
+ * Consistency Algorithms". PCID cannot be allocated for the whole
+ * pmap lifetime in pmap_pinit() due to the limited namespace.
+ * Instead, a per-CPU, per-pmap PCID is assigned when CPU is about
+ * to start caching TLB entries from a pmap, i.e. on the context
+ * switch which activates the pmap on the CPU.
+ *
+ * The PCID allocator maintains a per-CPU, per-pmap generation count
+ * pm_gen which is incremented each time a new PCID is allocated.
+ * On invalidation, the generation counters for the pmap are zeroed,
+ * which signals the context switch code that the already allocated PCID
+ * is no longer valid. The effect is a TLB shootdown for the given
+ * CPU/address space, due to the allocation of a new PCID.
+ * Zeroing can be performed remotely.
+ *
+ * * PTI + PCID. The available PCIDs are divided into two sets: PCIDs for
+ * complete (kernel) page tables, and PCIDs for usermode page tables.
+ * User PCID value is obtained from the kernel PCID value by setting the
+ * highest bit 11 to 1 (0x800 == PMAP_PCID_USER_PT).
+ *
+ * Userspace page tables are activated on return to usermode, by loading
+ * pm_ucr3 into %cr3. If the PCPU(ucr3_load_mask) requests clearing the
+ * bit 63 of loaded ucr3, this effectively causes total invalidation of
+ * the usermode TLB. If ucr3_load_mask is set, then local invalidations
+ * of individual pages in user page table are skipped.
+ *
+ * * Local invalidation, all modes. When invalidation of a specific
+ * address, or total invalidation, is requested for a currently active
+ * pmap, pmap explicitly flushes the TLB using INVLPG for the kernel page
+ * table, and INVPCID(INVPCID_CTXGLOB)/invltlb_glob().
+ *
+ * If INVPCID instruction is available, it is used to flush entries
+ * from kernel page table.
+ *
+ * * mode: PTI disabled, PCID present. Kernel reserves PCID 0 for its
+ * address space, all other 4095 PCIDs are used for usermode spaces
+ * as described above. Context switch allocates new PCID if
+ * recorded pcid is zero or recorded generation does not match CPU
+ * generation, effectively flushing TLB for this address space.
+ * Total remote invalidation is performed by zeroing pm_gen for all CPUs.
+ * local user page: INVLPG
+ * local kernel page: INVLPG
+ * local user total: INVPCID(CTX)
+ * local kernel total: INVPCID(CTXGLOB) or invltlb_glob()
+ * remote user page inactive: zero pm_gen
+ * remote user page active: zero pm_gen + IPI:INVLPG
+ * remote kernel page: IPI:INVLPG
+ * remote user total inactive: zero pm_gen
+ * remote user total active: zero pm_gen + IPI:(INVPCID(CTX) or
+ * reload %cr3)
+ * remote kernel total: IPI:(INVPCID(CTXGLOB) or invltlb_glob())
+ *
+ * PTI enabled, PCID present.
+ * local user page: INVLPG for kpt, INVPCID(ADDR) or (INVLPG for ucr3)
+ * for upt
+ * local kernel page: INVLPG
+ * local user total: INVPCID(CTX) or reload %cr3 for kpt, clear PCID_SAVE
+ * on loading UCR3 into %cr3 for upt
+ * local kernel total: INVPCID(CTXGLOB) or invltlb_glob()
+ * remote user page inactive: zero pm_gen
+ * remote user page active: zero pm_gen + IPI:(INVLPG for kpt,
+ * INVPCID(ADDR) for upt)
+ * remote kernel page: IPI:INVLPG
+ * remote user total inactive: zero pm_gen
+ * remote user total active: zero pm_gen + IPI:(INVPCID(CTX) for kpt,
+ * clear PCID_SAVE on loading UCR3 into %cr3 for upt)
+ * remote kernel total: IPI:(INVPCID(CTXGLOB) or invltlb_glob())
+ *
+ * No PCID.
+ * local user page: INVLPG
+ * local kernel page: INVLPG
+ * local user total: reload %cr3
+ * local kernel total: invltlb_glob()
+ * remote user page inactive: -
+ * remote user page active: IPI:INVLPG
+ * remote kernel page: IPI:INVLPG
+ * remote user total inactive: -
+ * remote user total active: IPI:(reload %cr3)
+ * remote kernel total: IPI:invltlb_glob()
+ * Since on return to usermode, the reload of %cr3 with ucr3 causes
+ * TLB invalidation, no specific action is required for upt.
+ *
+ * EPT. EPT pmaps do not map KVA, all mappings are userspace.
+ * XXX TODO
*/
/*
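
Illustration (not part of the patch): the generation-based PCID scheme described in the comment above can be modelled in a few lines of userland C. This is only a sketch under stated assumptions. All sk_* names are hypothetical, each struct sk_pmap stands for one CPU's slot of a pmap, and the wrap-around policy (restart at PCID 1 and bump the CPU generation) is an assumption chosen to match the "recorded generation does not match CPU generation" rule. It is not the pmap.c implementation. The value 0x800 mirrors PMAP_PCID_USER_PT, and bit 63 of %cr3 is the "preserve cached entries" bit that the comment calls PCID_SAVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PCID_KERN     0u          /* PCID 0 is reserved for the kernel */
#define SK_PCID_USER_PT  0x800u      /* bit 11, same value as PMAP_PCID_USER_PT */
#define SK_PCID_OVERMAX  0x800u      /* assumed limit: kernel-side PCIDs 1..0x7ff */
#define SK_CR3_PCID_SAVE (1ULL << 63) /* bit 63: keep TLB entries on %cr3 load */

struct sk_cpu {			/* hypothetical per-CPU allocator state */
	uint32_t pcid_next;	/* next PCID to hand out on this CPU */
	uint32_t pcid_gen;	/* current generation, never 0 */
};

struct sk_pmap {		/* hypothetical per-CPU slot of one pmap */
	uint32_t pm_pcid;	/* PCID recorded at last allocation, 0 if none */
	uint32_t pm_gen;	/* generation at allocation, 0 means invalidated */
};

static void
sk_cpu_init(struct sk_cpu *cpu)
{
	cpu->pcid_next = 1;
	cpu->pcid_gen = 1;
}

/*
 * Context switch to pm on this CPU.  Returns true when the recorded PCID
 * is still valid, so cached TLB entries may be reused; returns false when
 * a new PCID had to be allocated, which implies a flush for that PCID.
 */
static bool
sk_pcid_activate(struct sk_cpu *cpu, struct sk_pmap *pm)
{
	if (pm->pm_pcid != SK_PCID_KERN && pm->pm_gen == cpu->pcid_gen)
		return (true);
	if (cpu->pcid_next == SK_PCID_OVERMAX) {
		/* Namespace exhausted: start over, outdating all old PCIDs. */
		cpu->pcid_next = 1;
		if (++cpu->pcid_gen == 0)
			cpu->pcid_gen = 1;
	}
	pm->pm_pcid = cpu->pcid_next++;
	pm->pm_gen = cpu->pcid_gen;
	return (false);
}

/* Invalidation of an inactive pmap: zero the generation, no IPI needed. */
static void
sk_pcid_invalidate(struct sk_pmap *pm)
{
	pm->pm_gen = 0;
}

/* PTI + PCID: the user PCID is the kernel PCID with bit 11 set. */
static uint32_t
sk_user_pcid(uint32_t kern_pcid)
{
	return (kern_pcid | SK_PCID_USER_PT);
}

/* Compose a %cr3 value; clearing bit 63 flushes entries for this PCID. */
static uint64_t
sk_cr3(uint64_t root, uint32_t pcid, bool preserve_tlb)
{
	assert(pcid < 0x1000);	/* PCID field is 12 bits wide */
	return (root | pcid | (preserve_tlb ? SK_CR3_PCID_SAVE : 0));
}
```

In this model a switch-in would load sk_cr3(root, pm_pcid, sk_pcid_activate(cpu, pm)). A zeroed pm_gen therefore turns the next activation into a flush, without touching the remote CPU at invalidation time.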
