Overview
This new PMAP implementation for MIPS64 is based on the AMD64 PMAP and brings in many of that code's features and changes, including superpages and PV chunk and list locking. In addition, it adds per-Page Table Entry (PTE) referenced bit emulation and better "Machine Check" exception handling and recovery, and it uses a large page (16K) for the kernel thread stack. The new PMAP implementation is enabled at compile time by adding "options MIPS64_NEW_PMAP" to the kernel config file. The larger kernel thread stack is enabled by adding "options KSTACK_LARGE_PAGE".
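As an illustration, the two options named above might appear together in a kernel config file as follows. This is only a fragment; the rest of the configuration is omitted, and whether KSTACK_LARGE_PAGE requires MIPS64_NEW_PMAP is not stated here.

```
# Fragment of a kernel config file (everything else omitted)
options 	MIPS64_NEW_PMAP		# enable the new MIPS64 pmap
options 	KSTACK_LARGE_PAGE	# use a 16K page for the kernel thread stack
```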
Additional Software PTE Bits
To support referenced bit emulation and superpages, some of the otherwise unused bits in the MIPS64 PTE are used:
Bit 59: Software Valid Bit (PTE_SV)
Bits 56-58: Page Size Index (PTE_PS_16K, PTE_PS_64K, PTE_PS_256K, PTE_PS_1M, PTE_PS_4M, PTE_PS_16M, or PTE_PS_64M)
See sys/mips/include/pte.h for more information.
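The bit positions listed above can be sketched in C as follows. The EX_-prefixed names and helper functions are hypothetical and exist only for illustration; the authoritative definitions are the ones in sys/mips/include/pte.h.

```c
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the text above. */
#define EX_PTE_SV_SHIFT     59                          /* software valid bit */
#define EX_PTE_SV           (UINT64_C(1) << EX_PTE_SV_SHIFT)
#define EX_PTE_PS_SHIFT     56                          /* page size index, bits 56-58 */
#define EX_PTE_PS_MASK      (UINT64_C(0x7) << EX_PTE_PS_SHIFT)

/* Return nonzero if the software valid bit is set in the PTE. */
static inline int
ex_pte_sw_valid(uint64_t pte)
{
	return ((pte & EX_PTE_SV) != 0);
}

/* Extract the 3-bit page size index from the PTE. */
static inline unsigned
ex_pte_get_ps_idx(uint64_t pte)
{
	return ((unsigned)((pte & EX_PTE_PS_MASK) >> EX_PTE_PS_SHIFT));
}

/* Return a copy of the PTE with its page size index replaced by idx. */
static inline uint64_t
ex_pte_set_ps_idx(uint64_t pte, unsigned idx)
{
	return ((pte & ~EX_PTE_PS_MASK) |
	    ((uint64_t)(idx & 0x7) << EX_PTE_PS_SHIFT));
}
```

Keeping the software bits behind small accessors like these means the rest of the code does not need to know where in the PTE they live.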
Per-PTE Referenced Bit Emulation
The hardware valid bit is repurposed as a referenced bit. Managed PTEs are created with the hardware valid bit cleared but with the software valid bit set. On a TLB exception, the software valid bit is checked and, if it is set, the hardware valid/referenced bit is set in both the TLB and the page table for the entry. Therefore, the hardware valid bit is now a per-PTE referenced bit, and the PMAP features that were missing because they required a referenced bit are now supported.
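The emulation described above might look roughly like the following. This is a minimal sketch, not the actual handler: the function and macro names are hypothetical, the hardware valid bit is assumed to sit at bit 1 as in the MIPS EntryLo format, and the TLB update that the real code performs is reduced to a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit 59 is from the text, bit 1 is an assumption. */
#define EX_PTE_V    (UINT64_C(1) << 1)    /* hardware valid, used as "referenced" */
#define EX_PTE_SV   (UINT64_C(1) << 59)   /* software valid */

/*
 * Called on a TLB exception for the PTE that maps the faulting address.
 * Returns true if the fault was only a "first reference" and was resolved
 * by setting the hardware valid/referenced bit.
 */
static bool
ex_ref_emulate(uint64_t *ptep)
{
	uint64_t pte = *ptep;

	if ((pte & EX_PTE_SV) == 0)
		return (false);     /* not a valid mapping: normal page fault path */
	if ((pte & EX_PTE_V) != 0)
		return (false);     /* already referenced: fault has another cause */
	*ptep = pte | EX_PTE_V;     /* mark the page table entry as referenced */
	/* A real handler would also write the updated entry into the TLB here. */
	return (true);
}

/* Report whether the mapping was referenced, clearing the bit afterwards. */
static bool
ex_test_and_clear_ref(uint64_t *ptep)
{
	bool was_referenced = (*ptep & EX_PTE_V) != 0;

	*ptep &= ~EX_PTE_V;         /* the next access faults and sets it again */
	return (was_referenced);
}
```

Clearing the hardware valid bit while leaving the software valid bit set is what lets the next access be observed: the access faults, the handler sees a valid mapping, and the bit is set again.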
Automatic Promotion of Superpages
The page size index bits in the PTE indicate the size of the page (from 4K to 64M). Since only three bits are used, the 1K and 256M MIPS page sizes are not represented. Currently only the 1M page size is used. The page size index may be easily converted into a page mask for the page mask register by doing the following: (((1 << ((page_size_idx) << 1)) - 1) << TLBMASK_SHIFT) where TLBMASK_SHIFT is 12.
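The conversion above can be written out as a small C helper. The function names are hypothetical, TLBMASK_SHIFT is taken as 12 as stated in the text, and the byte-size helper assumes that index 0 corresponds to the 4K page size with each step multiplying the size by four.

```c
#include <stdint.h>

#define EX_TLBMASK_SHIFT  12   /* value given in the text above */

/* Convert a 3-bit page size index into a page mask register value. */
static inline uint64_t
ex_ps_idx_to_pagemask(unsigned page_size_idx)
{
	return (((UINT64_C(1) << (page_size_idx << 1)) - 1) << EX_TLBMASK_SHIFT);
}

/* Page size in bytes, assuming index 0 is 4K and each index is 4x larger. */
static inline uint64_t
ex_ps_idx_to_bytes(unsigned page_size_idx)
{
	return (UINT64_C(4096) << (page_size_idx << 1));
}
```

Under these assumptions the 1M page size is index 4, which gives a mask of 0xff shifted left by TLBMASK_SHIFT, and index 0 gives a mask of zero.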
On MIPS64, a 2M superpage is actually an even and odd pair of 1M pages that together act as a single 2M superpage mapping. This allows the VM layer to treat it as a single 2M superpage, so it does not have to deal with the way the TLB is implemented on MIPS64 in a special machine-dependent way. The 4K pages are still mapped in the TLB as before (i.e., even and odd pages that are contiguous in virtual memory share a single TLB entry in hardware but may map non-contiguous physical memory).
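The even/odd pairing can be illustrated with a short C sketch. The structure and function names are hypothetical, and the sketch assumes the superpage is backed by a 2M-aligned, physically contiguous region, so the odd 1M page starts exactly 1M after the even one.

```c
#include <stdint.h>

#define EX_SIZE_1M  (UINT64_C(1) << 20)
#define EX_SIZE_2M  (UINT64_C(1) << 21)

/* One TLB entry: a virtual base plus an even and an odd physical page. */
struct ex_tlb_pair {
	uint64_t vbase;     /* 2M-aligned virtual base of the superpage */
	uint64_t pa_even;   /* physical base of the even 1M page */
	uint64_t pa_odd;    /* physical base of the odd 1M page */
};

/* Build the pair for the superpage containing va, backed at pa_base. */
static struct ex_tlb_pair
ex_superpage_pair(uint64_t va, uint64_t pa_base)
{
	struct ex_tlb_pair p;

	p.vbase = va & ~(EX_SIZE_2M - 1);
	p.pa_even = pa_base;                /* first 1M half */
	p.pa_odd = pa_base + EX_SIZE_1M;    /* second 1M half */
	return (p);
}

/* Translate va through the pair: bit 20 of the offset selects the half. */
static uint64_t
ex_translate(const struct ex_tlb_pair *p, uint64_t va)
{
	uint64_t off = va - p->vbase;
	uint64_t half = (off & EX_SIZE_1M) ? p->pa_odd : p->pa_even;

	return (half + (off & (EX_SIZE_1M - 1)));
}
```

Because the two halves are physically adjacent, the result is the same as adding the 2M offset to the physical base, which is why the VM layer can treat the pair as one 2M mapping.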
Automatic promotion of superpages can be enabled by setting the tunable "vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic promotion is disabled because the superpage support is not completely stable. Further testing and debugging is needed before it is enabled by default. Some preliminary results have been obtained, however:
GUPS benchmark: GUPS (Giga Updates Per Second) measures how frequently the system can issue updates to randomly generated memory locations in large allocations.
CPU time used: 69.958790 seconds -> 43.632812 seconds (37.6% improvement)
GUP/s: 0.001919604 Billion (10^9) Updates per second [GUP/s] -> 0.003074518 Billion (10^9) Updates per second [GUP/s] (60.2% improvement)
Kernel Build: Compiling and building the FreeBSD MIPS64 kernel on native hardware took 4.2% less time. (vm.pmap.pde.demotions: 134, vm.pmap.pde.mappings: 577, vm.pmap.pde.p_failures: 3611, vm.pmap.pde.promotions: 2386)
Both results were obtained on an Ubiquiti EdgeRouter Lite running FreeBSD-Current, my reference platform.