New, experimental PMAP implementation for MIPS64
Abandoned, Public

Authored by sbruno on May 13 2015, 3:16 PM.

Details

Reviewers
sson
Commits
rS362898: MFC r362379:
Summary

Overview

This new PMAP implementation is based on the AMD64 PMAP and brings many of that code's features and changes to MIPS64, including superpages and PV chunk and list locking. In addition, it adds per-PTE (Page Table Entry) referenced bit emulation, better "Machine Check" exception handling and recovery, and a large page size (16K) for the kernel thread stack. The new pmap implementation for MIPS64 is enabled at compile time by adding "options MIPS64_NEW_PMAP" to the kernel config file. The larger kernel thread stack is enabled by adding "options KSTACK_LARGE_PAGE".

Additional Software PTE Bits

To support referenced bit emulation and superpages, some of the unused bits in the MIPS64 PTE are used:

Bit 59: Software Valid Bit (PTE_SV)
Bits 56-58: Page Size Index (PTE_PS_16K, PTE_PS_64K, PTE_PS_256K, PTE_PS_1M, PTE_PS_4M, PTE_PS_16M, or PTE_PS_64M)

See sys/mips/include/pte.h for more information.
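
The bit positions listed above could be expressed roughly as follows. This is a minimal sketch and not the contents of sys/mips/include/pte.h: every SKETCH_/sketch_ name is hypothetical, and it assumes that a page size index of 0 denotes the base 4K page, which the text above does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the list above. */
#define SKETCH_PTE_SV_SHIFT  59   /* Software Valid bit */
#define SKETCH_PTE_SV        (UINT64_C(1) << SKETCH_PTE_SV_SHIFT)
#define SKETCH_PTE_PS_SHIFT  56   /* Page Size Index, bits 56-58 */
#define SKETCH_PTE_PS_MASK   (UINT64_C(7) << SKETCH_PTE_PS_SHIFT)

/* Non-zero if the software valid bit is set in the PTE. */
static inline int
sketch_pte_is_sw_valid(uint64_t pte)
{
	return ((pte & SKETCH_PTE_SV) != 0);
}

/* Extract the 3-bit page size index from the PTE. */
static inline unsigned
sketch_pte_ps_index(uint64_t pte)
{
	return ((unsigned)((pte & SKETCH_PTE_PS_MASK) >> SKETCH_PTE_PS_SHIFT));
}

/* Return a copy of the PTE with its page size index replaced by idx. */
static inline uint64_t
sketch_pte_set_ps_index(uint64_t pte, unsigned idx)
{
	assert(idx <= 7);	/* only three bits are available */
	return ((pte & ~SKETCH_PTE_PS_MASK) |
	    ((uint64_t)idx << SKETCH_PTE_PS_SHIFT));
}
```

Keeping the accessors in one place would make it easier to move the software bits later, for example if the field positions have to change to fit a different TLB lo layout.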

Per-PTE Referenced Bit Emulation

The hardware valid bit is repurposed as a referenced bit. Managed PTEs are created with the hardware valid bit cleared but with the software valid bit set. On a TLB exception, the software valid bit is checked and, if it is set, the hardware valid/referenced bit is set in both the TLB and the page table for the entry. The hardware valid bit is therefore now a per-PTE referenced bit, and the PMAP features that were missing because they required a referenced bit are now supported.
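
The emulation described above can be sketched in a few lines of C. This is an illustration under stated assumptions, not the code from the patch: the function names are hypothetical, the hardware valid bit is assumed to sit at bit 1 (its position in the MIPS EntryLo registers), and the TLB update that a real handler would perform is only mentioned in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define SKETCH_PTE_V   (UINT64_C(1) << 1)    /* hardware valid, reused as "referenced" */
#define SKETCH_PTE_SV  (UINT64_C(1) << 59)   /* software valid */

/* A managed PTE starts out software-valid but hardware-invalid. */
static inline uint64_t
sketch_make_managed_pte(uint64_t other_bits)
{
	assert((other_bits & (SKETCH_PTE_V | SKETCH_PTE_SV)) == 0);
	return (other_bits | SKETCH_PTE_SV);
}

/*
 * Called on a TLB invalid exception. Returns true if the fault was only
 * a first reference and was handled by marking the PTE referenced, or
 * false if the mapping is not valid at all and needs a real page fault.
 */
static inline bool
sketch_tlb_invalid_fault(uint64_t *pte)
{
	if ((*pte & SKETCH_PTE_SV) == 0)
		return (false);
	*pte |= SKETCH_PTE_V;	/* a real handler would also update the TLB */
	return (true);
}

/* The hardware valid bit doubles as the referenced bit. */
static inline bool
sketch_pte_is_referenced(uint64_t pte)
{
	return ((pte & SKETCH_PTE_V) != 0);
}

/* Clearing the reference makes the next access trap again. */
static inline void
sketch_pte_clear_reference(uint64_t *pte)
{
	*pte &= ~SKETCH_PTE_V;
}
```

The cost of this scheme is one extra TLB exception per page after each time the reference is cleared; in exchange, the pmap gains a per-PTE referenced bit without any hardware support.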

Automatic Promotion of Superpages

The page size index bits in the PTE indicate the page size (from 4K to 64M). Since only three bits are used, the 1K and 256M MIPS page sizes are not represented. Currently only the 1M page size is used. The page size index is easily converted into a page mask for the page mask register with the following expression: (((1 << ((page_size_idx) << 1)) - 1) << TLBMASK_SHIFT), where TLBMASK_SHIFT is 12.
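
As a worked example, the conversion can be wrapped in a small helper. This is a sketch only: the function names are hypothetical, the shift value of 12 is simply the one quoted in the text above and should be confirmed against pte.h, and the mapping of index 0 to the 4K base page (with each step multiplying the size by four) is an assumption that is consistent with the formula but not spelled out in the summary.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in the text above; confirm against pte.h before relying on it. */
#define SKETCH_TLBMASK_SHIFT  12

/* Convert a 3-bit page size index into a page mask value. */
static inline uint32_t
sketch_page_mask(unsigned page_size_idx)
{
	assert(page_size_idx <= 7);
	return (((UINT32_C(1) << (page_size_idx << 1)) - 1) <<
	    SKETCH_TLBMASK_SHIFT);
}

/* Page size in bytes, assuming index 0 is 4K and each step is 4x larger. */
static inline uint64_t
sketch_page_size(unsigned page_size_idx)
{
	assert(page_size_idx <= 7);
	return (UINT64_C(4096) << (page_size_idx << 1));
}
```

Under these assumptions, index 0 yields a mask of 0 (a plain 4K page), index 4 yields 0xFF shifted left by the mask shift (a 1M page), and index 7 corresponds to 64M.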

On MIPS64, a 2MB superpage is actually an even and odd pair of 1M pages that act as a single 2MB superpage mapping. This allows the VM layer to treat it as a single 2MB superpage, so it does not have to deal with the way the TLB is implemented on MIPS64 in a special machine-dependent way. The 4K pages are still mapped in the TLB as before (i.e., even and odd contiguous pages in virtual memory share a single TLB entry in hardware but may map non-contiguous physical memory).
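
The even/odd pairing can be illustrated with a short sketch. The structure and function names below are hypothetical and do not come from the patch; the sketch assumes both the virtual and physical base of the superpage are 2MB-aligned, so that the odd 1M page immediately follows the even one in physical memory.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_1M  (UINT64_C(1) << 20)
#define SKETCH_2M  (UINT64_C(1) << 21)

/* One TLB entry: a virtual base plus an even and an odd physical page. */
struct sketch_tlb_pair {
	uint64_t va_base;	/* 2MB-aligned virtual address */
	uint64_t pa_even;	/* backs the lower 1M half */
	uint64_t pa_odd;	/* backs the upper 1M half */
};

/* Build the pair for a 2MB superpage with contiguous physical memory. */
static inline struct sketch_tlb_pair
sketch_superpage_pair(uint64_t va, uint64_t pa)
{
	struct sketch_tlb_pair p;

	assert((va & (SKETCH_2M - 1)) == 0);
	assert((pa & (SKETCH_2M - 1)) == 0);
	p.va_base = va;
	p.pa_even = pa;
	p.pa_odd = pa + SKETCH_1M;
	return (p);
}

/* Translate a virtual address that falls inside the pair. */
static inline uint64_t
sketch_translate(const struct sketch_tlb_pair *p, uint64_t va)
{
	uint64_t off;

	assert(va >= p->va_base && va - p->va_base < SKETCH_2M);
	off = va - p->va_base;
	/* Bit 20 of the offset selects the odd or the even 1M page. */
	return (((off & SKETCH_1M) ? p->pa_odd : p->pa_even) +
	    (off & (SKETCH_1M - 1)));
}
```

Because the two halves are physically contiguous in the superpage case, the pair behaves exactly like one 2MB mapping from the VM layer's point of view.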

Automatic promotion of superpages can be enabled by setting the tunable "vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic promotion is disabled because the superpage support is not completely stable. Further testing and debugging is needed before it is enabled by default. Some preliminary results have been obtained, however:

GUPS benchmark: GUPS (Giga Updates Per Second) measures how frequently a system can issue updates to randomly generated memory locations in large allocations.

CPU time used: 69.958790 seconds -> 43.632812 seconds (37.6% improvement)
GUP/s: 0.001919604 Billion(10^9) Updates per second [GUP/s] -> 0.003074518 Billion(10^9) Updates per second [GUP/s] (60.2% improvement)
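
For reference, the two percentages can be reproduced from the raw numbers. The helpers below are a hypothetical sketch, assuming the CPU time figure is a relative reduction and the GUP/s figure is a relative increase.

```c
#include <assert.h>

/* Relative reduction, in percent, of a "lower is better" metric. */
static inline double
sketch_pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return ((before - after) / before * 100.0);
}

/* Relative increase, in percent, of a "higher is better" metric. */
static inline double
sketch_pct_increase(double before, double after)
{
	assert(before > 0.0);
	return ((after - before) / before * 100.0);
}
```

Calling sketch_pct_reduction(69.958790, 43.632812) gives roughly 37.6, and sketch_pct_increase(0.001919604, 0.003074518) gives roughly 60.2, matching the figures quoted above.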

Kernel Build: The FreeBSD MIPS64 kernel compiles and builds on native hardware in 4.2% less time. (vm.pmap.pde.demotions: 134, vm.pmap.pde.mappings: 577, vm.pmap.pde.p_failures: 3611, vm.pmap.pde.promotions: 2386)

Both results were obtained on an Ubiquiti EdgeRouter Lite running FreeBSD-Current, my reference platform.

Event Timeline

sson retitled this revision from to New, experimental PMAP implementation for MIPS64.
sson updated this object.
sson edited the test plan for this revision.
sson set the repository for this revision to rS FreeBSD src repository - subversion.
sson added a project: MIPS.
sson added a subscriber: Unknown Object (MLST).
sbruno added a reviewer: sson.
sbruno added a subscriber: sbruno.

Going to update this to be applicable to HEAD after some build testing. Some of this review has already been committed by various folks.

Update review to head.

However, the build of libkvm fails because the pte.h changes are causing
compile-time asserts to fire. I suspect that a simple change is required.

Currently fails to buildworld due to changes to pte.h:

===> lib/libkvm (obj,all,install)
cc -isystem /var/tmp/mips.mips64/home/sbruno/fbsd_head/tmp/usr/include -L/var/tmp/mips.mips64/home/sbruno/fbsd_head/tmp/usr/lib -B/var/tmp/mips.mips64/home/sbruno/fbsd_head/tmp/usr/lib --sysroot=/var/tmp/mips.mips64/home/sbruno/fbsd_head/tmp -B/var/tmp/mips.mips64/home/sbruno/fbsd_head/tmp/usr/bin  -O -pipe -DLIBC_SCCS -I/home/sbruno/fbsd_head/lib/libkvm -G0   -MD  -MF.depend.kvm_minidump_mips.o -MTkvm_minidump_mips.o -std=gnu99 -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign     -c /home/sbruno/fbsd_head/lib/libkvm/kvm_minidump_mips.c -o kvm_minidump_mips.o
In file included from /home/sbruno/fbsd_head/lib/libkvm/kvm_minidump_mips.c:49:
/home/sbruno/fbsd_head/lib/libkvm/kvm_mips.h:85: error: size of array '__assert_7' is negative
*** Error code 1

Pull in CHERI change to fix compile time errors. Update and compile
tested world/kernel via mips64 target and MALTA64 under qemu.

ref:
https://github.com/CTSRD-CHERI/cheribsd/commit/ca8924546faff411d2539c4cd30bd15aaa232824#diff-1f37a6f8b69678b7700abccf0d51facb

Fix mips32 build.

Boot test of kernel/world via MALTA under QEMU is successful.

Copy/paste a couple of quick functions from pmap.c to pmap_mips64.c,
and now the new option compiles and boots under MALTA64.

lib/libkvm/kvm_mips.h
52

In this and the PFN mask changes in the kernel headers, I really wonder whether we can safely make these changes. I understand they give us more upper software bits, and that on CHERI these are safe because the software and hardware are harmonized here, but are we quite sure that there's no real hardware where the physical address layout might require being able to decode these bits of the PFN?

sys/conf/options.mips
103

Can we get a timeline on deprecation of the old pmap, and note that the option is temporary?

115

I would really rather just see us only use large kernel stack pages. It just makes more sense. I don't think the option here is healthy for us. Do we really support hardware which doesn't use PageMask?

sys/mips/include/pmap.h
184

Are these ifdefs correct? I am having a hard time making sense of what changed here and why.

sys/mips/include/pte.h
82

Again, really not sure we can safely do this on all systems. Robert is right to be cautious.

sys/mips/include/vmparam.h
109

I'm surprised by this, since there seems to half-be infrastructure for other levels. What's the plan here?

sys/mips/mips/pmap.c
812

pte_is_valid or whatever here, as was done elsewhere? Consistency would help get over all the PTE bit renaming.

3459

So why all the new PTE fields? Again, confused.

sys/mips/mips/tlb.c
43

Necessary?

sys/mips/mips/trap.c
646

Can you explain to me why this happens with the new pmap? Is it related to superpages somehow? It seems worrying. Again, I must be missing something?

sys/vm/vm_glue.c
315

This is a good change to make stack allocation a bit more semantic, flexible, clear, etc. It's also very self-contained, and I'd suggest it be reviewed and committed separately.

I committed the pagemask enumeration/printing logic to -head a while ago. We should gather some dmesg's from various embedded MIPS boards.

IIRC, some of the mips24kc parts were only doing 4k pages.. :(

This superpage/new pmap implementation really only applies to mips64. To support superpages on mips32, the page table may need to be changed a bit. Currently there isn't enough room in the PTE for the additional bits needed, e.g., a bit for referenced bit emulation (reference/valid bit), a superpage flag, etc.

FYI, I do have a patch kicking around for large kernel stack support that uses multiple wired TLB entries that might be useful for mips32 with only 4k page support. Of course, I don't know if blowing the kernel stack (e.g. using NFS) is a problem for mips32 or not. Brooks had it committed to CheriBSD at one time so I know it is in that repository.

Seems this patch is a lot more "clang friendly" as the MIPS64 clang compiled kernel can now at least *get* to trying to start init:

sbruno@tasty.ysv:~ % qemu-system-mips64 -m 512M -M malta -kernel /var/tmp/mips.mips64/home/sbruno/mips64-clang/sys/MALTA64/kernel -hda ./mips64_clang.img -nographic
WARNING: Image format was not specified for './mips64_clang.img' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
entry: platform_start()
cmd line: /var/tmp/mips.mips64/home/sbruno/mips64-clang/sys/MALTA64/kernel  
envp:
        memsize = 268435456
        ememsize = 536870912
        modetty0 = 38400n8r
memsize = 268435456 (0x10000000)
ememsize = 536870912
Cache info:
  icache is virtual
  picache_stride    = 8192
  picache_loopcount = 4
  pdcache_stride    = 4096
  pdcache_loopcount = 8
cpu0: MIPS Technologies processor v160.130
  MMU: Standard TLB, 48 entries (4K 16K 64K 256K 1M 4M 16M 64M 256M pg sizes)
(4K 16K 64K 256K 1M 16M 64M 256M pg sizes)
  L1 i-cache: 4 ways of 256 sets, 32 bytes per line
  L1 d-cache: 4 ways of 256 sets, 32 bytes per line
  L2 cache: disabled
  Config1=0xdea3519b<PerfCount,WatchRegs,EJTAG,FPU>
  Config2=0x80000000
Physical memory chunk(s):
0x72a000 - 0xfffffff, 260923392 bytes (63702 pages)
0x90000000 - 0x9fffffff, 268435456 bytes (65536 pages)
Maxmem is 0xa0000000
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #9 r305890M: Fri Sep 16 22:52:02 UTC 2016
    sbruno@tasty.ysv.freebsd.org:/var/tmp/mips.mips64/home/sbruno/mips64-clang/sys/MALTA64 mips
clang version 4.0.0 (trunk 281608)
Preloaded elf kernel "kernel" at 0xffffffff8071e100.
real memory  = 536870912 (524288K bytes)
Physical memory chunk(s):
0x00849000 - 0x0fffffff, 259747840 bytes (63415 pages)
0x90000000 - 0x9f2b8fff, 254513152 bytes (62137 pages)
avail memory = 511668224 (487MB)
random: entropy device external interface
null: <full device, null device, zero device>
mem: <memory>
nfslock: pseudo-device
nexus0: <MIPS32 root nexus>
random: harvesting attach, 8 bytes (4 bits) from nexus0
gt0: <GT64120 chip> on nexus0
pcib0: <GT64120 PCI bridge> on gt0
pci0: <PCI bus> on pcib0
pci0: domain=0, physical bus=0
found-> vendor=0x11ab, dev=0x4620, revid=0x10
        domain=0, bus=0, slot=0, func=0
        class=06-00-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0280, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        intpin=a, irq=0
        map[14]: type Prefetchable Memory, range 32, base 0x1000000, size 24, memory disabled
        map[18]: type Memory, range 32, base 0x1c000000, size 26, memory disabled
        map[1c]: type Memory, range 32, base 0x1f000000, size 24, memory disabled
        map[20]: type Memory, range 32, base 0x14000000, size 26, memory disabled
        map[24]: type I/O Port, range 32, base 0x14000000, size 26, port disabled
pcib0: no IRQ mapping for 0/0/0/1
found-> vendor=0x8086, dev=0x7110, revid=0x00
        domain=0, bus=0, slot=10, func=0
        class=06-01-00, hdrtype=0x00, mfdev=1
        cmdreg=0x0000, statreg=0x0200, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
found-> vendor=0x8086, dev=0x7111, revid=0x00
        domain=0, bus=0, slot=10, func=1
        class=01-01-80, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0280, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        map[20]: type I/O Port, range 32, base 0, size  4, port disabled
found-> vendor=0x8086, dev=0x7112, revid=0x01
        domain=0, bus=0, slot=10, func=2
        class=0c-03-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0000, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        intpin=d, irq=0
        map[20]: type I/O Port, range 32, base 0, size  5, port disabled
pcib0: no IRQ mapping for 0/10/2/4
found-> vendor=0x8086, dev=0x7113, revid=0x03
        domain=0, bus=0, slot=10, func=3
        class=06-80-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0280, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        intpin=a, irq=0
pcib0: no IRQ mapping for 0/10/3/1
found-> vendor=0x1022, dev=0x2000, revid=0x10
        domain=0, bus=0, slot=11, func=0
        class=02-00-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0280, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x06 (1500 ns), maxlat=0xff (63750 ns)
        intpin=a, irq=0
        map[10]: type I/O Port, range 32, base 0, size  5, port disabled
        map[14]: type Memory, range 32, base 0, size  5, memory disabled
found-> vendor=0x1013, dev=0x00b8, revid=0x00
        domain=0, bus=0, slot=18, func=0
        class=03-00-00, hdrtype=0x00, mfdev=0
        cmdreg=0x0000, statreg=0x0000, cachelnsz=0 (dwords)
        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
        map[10]: type Prefetchable Memory, range 32, base 0, size 25, memory disabled
        map[14]: type Memory, range 32, base 0, size 12, memory disabled
obio0 irq 0 at device 0.0 on pci0
uart0: <16550 or compatible> on obio0
uart0: console (9600,n,8,1)
uart0: fast interrupt
uart0: PPS capture mode: DCD
random: harvesting attach, 8 bytes (4 bits) from uart0
random: harvesting attach, 8 bytes (4 bits) from obio0
pci0: <bridge, PCI-ISA> at device 10.0 (no driver attached)
atapci0: <Intel PIIX4 UDMA33 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376 at device 10.1 on pci0
atapci0: Lazy allocation of 0x10 bytes rid 0x20 type 4 at 0x100
ata0: <ATA channel> at channel 0 on atapci0
random: harvesting attach, 8 bytes (4 bits) from ata0
ata1: <ATA channel> at channel 1 on atapci0
random: harvesting attach, 8 bytes (4 bits) from ata1
random: harvesting attach, 8 bytes (4 bits) from atapci0
pci0: <serial bus, USB> at device 10.2 (no driver attached)
pci0: <bridge> at device 10.3 (no driver attached)
le0: <AMD PCnet-PCI> irq 10 at device 11.0 on pci0
le0: Lazy allocation of 0x20 bytes rid 0x10 type 4 at 0x120
le0: 16 receive buffers, 4 transmit buffers
le0: bpf attached
le0: Ethernet address: 52:54:00:12:34:56
random: harvesting attach, 8 bytes (4 bits) from le0
vgapci0: <VGA-compatible display> at device 18.0 on pci0
random: harvesting attach, 8 bytes (4 bits) from vgapci0
random: harvesting attach, 8 bytes (4 bits) from pci0
random: harvesting attach, 8 bytes (4 bits) from pcib0
random: harvesting attach, 8 bytes (4 bits) from gt0
clock0: <Generic MIPS32 ticker> on nexus0
Timecounter "MIPS32" frequency 100000000 Hz quality 800
Event timer "MIPS32" frequency 100000000 Hz quality 800
random: harvesting attach, 8 bytes (4 bits) from clock0
Device configuration finished.
Timecounters tick every 10.000 msec
lo0: bpf attached
tcp_init: net.inet.tcp.tcbhashsize auto tuned to 8192
ata0: reset tp1 mask=03 ostat0=50 ostat1=00
ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata0: stat1=0x00 err=0x00 lsb=0xff msb=0xff
ata0: reset tp2 stat0=50 stat1=00 devices=0x1
ata1: reset tp1 mask=03 ostat0=50 ostat1=00
ata1: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
ata1: stat1=0x00 err=0x00 lsb=0xff msb=0xff
ata1: reset tp2 stat0=00 stat1=00 devices=0x10000
pass0 at ata0 bus 0 scbus0 target 0 lun 0
pass0: <QEMU HARDDISK 2.5+> ATA-7 device
pass0: Serial Number QM00001
pass0: 33.300MB/s transfers (UDMA2, PIO 8192bytes)
pass1 at ata1 bus 0 scbus1 target 0 lun 0
pass1: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
pass1: Serial Number QM00003
pass1: 13.300MB/s transfers (WDMA1, ATAPI 12bytes, PIO 65534bytes)
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <QEMU HARDDISK 2.5+> ATA-7 device
ada0: Serial Number QM00001
ada0: 33.300MB/s transfers (UDMA2, PIO 8192bytes)
ada0: 1024MB (2097152 512 byte sectors)
taskqgroup_adjust failed cnt: 1 stride: 1 mp_ncpus: 1 smp_started: 0
taskqgroup_adjust failed cnt: 1 stride: 1 mp_ncpus: 1 smp_started: 0
Trying to mount root from ufs:ada0 []...
GEOM: new disk cd0
GEOM: new disk ada0
cd0 at ata1 bus 0 scbus1 target 0 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00003
cd0: 13.300MB/s transfers (WDMA1, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
warning: no time-of-day clock registered, system time will not be set accurately
start_init: trying /sbin/init
ADDRESS_SPACE_ERR: pid 1 tid 100001 (init), uid 0: pc 0x12002fa78 got a read fault (type 0x4) at 0xffffffff806e91f0
Trapframe Register Dump:
        zero: 0 at: 0xffffffff806e8510  v0: 0   v1: 0
        a0: 0x7fffffedf0        a1: 0x8 a2: 0   a3: 0x120110000
        a4: 0x3 a5: 0   a6: 0   a6: 0
        t0: 0   t1: 0   t2: 0   t3: 0
        t8: 0   t9: 0x12002fa08 s0: 0x7fffffede0        s1: 0x7fffffedc8
        s2: 0   s3: 0x2 s4: 0   s5: 0
        s6: 0   s7: 0   k0: 0   k1: 0
        gp: 0x120142de0 sp: 0x7fffffed50        s8: 0x7fffffed50        ra: 0x120005afc
        sr: 0x8084b3    mullo: 0x18     mulhi: 0        badvaddr: 0xffffffff806e91f0
        cause: 0x10     pc: 0x12002fa78

sys/mips/include/pte.h
82

I think this comment and the related change is unrelated to the new pmap code and shouldn't be merged.

sys/mips/include/pte.h
82

I assumed, perhaps wrongly (I have not bothered to do the math, nor exhaustively tried to wade through them all) that this was necessary to support adding all the new software bits.

jhb added inline comments.
sys/mips/include/pte.h
86

Is this used?

142

If we go back to using 55 for TLBLO_SWSHIFT_BITS, this comment needs to be updated (52 -> 54)

188

I wonder if we can accommodate the existing 55-bit TLB lo layout by shifting these bits all up by 2 and losing the currently unused 60/61 bits?

289

I think we should move the 'pte_cache_bits' that is now in upstream here as an inline as well.

608

Are these two still needed?

sys/mips/mips/cpu.c
411

The committed version of this in FreeBSD was moved out of the branch but I think it should be moved back in (it doesn't make sense if you don't have a TLB). Also, the upstream version is missing the printf for 4M.