
amd64 pmap: LA57 AKA 5-level paging
Needs Review, Public

Authored by kib on Jun 14 2020, 10:01 PM.

Details

Reviewers
alc
markj
jhb
grehan
gnn
Group Reviewers
bhyve
Summary

Since LA57 was moved into the main SDM document with revision 072, it seems that we should support it, and silicon is coming.

This patch makes pmap support both LA48 and LA57 hardware. The page table depth is selected at startup; the kernel always receives control from the loader with 4-level paging. It is not clear how the UEFI spec will adapt to LA57; for instance, firmware could sometimes hand over control already in LA57 mode.

Switching from LA48 to LA57 requires turning off long mode, requesting LA57 in CR4, and then re-entering long mode. This is somewhat delicate and is done in pmap_bootstrap_la57(). AP startup in LA57 mode is much easier: we only need to set a bit in CR4 and load the right value into CR3.
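
The ordering constraint described above can be illustrated with a small software model. This is only a sketch, not the code in pmap_bootstrap_la57(): struct cpu_model, write_cr4() and switch_to_la57() are hypothetical names invented for the illustration, and the model ignores everything that makes the real transition delicate (running from identity-mapped memory, dropping to 32-bit code, reloading segments). The bit positions (CR0.PG bit 31, CR4.LA57 bit 12, EFER.LME bit 8, CPUID leaf 7 ECX bit 16) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG          (UINT64_C(1) << 31) /* paging enable */
#define CR4_PAE         (UINT64_C(1) << 5)  /* physical address extension */
#define CR4_LA57        (UINT64_C(1) << 12) /* 57-bit linear addresses */
#define EFER_LME        (UINT64_C(1) << 8)  /* long mode enable */
#define CPUID7_ECX_LA57 (UINT32_C(1) << 16) /* CPUID.(EAX=7,ECX=0):ECX */

/* Hypothetical register model; not a FreeBSD structure. */
struct cpu_model {
	uint64_t cr0, cr3, cr4, efer;
};

/* Report LA57 support given the ECX output of CPUID leaf 7, subleaf 0. */
static bool
cpu_has_la57(uint32_t cpuid7_ecx)
{
	return ((cpuid7_ecx & CPUID7_ECX_LA57) != 0);
}

/* Long mode is active when both EFER.LME and CR0.PG are set. */
static bool
long_mode_active(const struct cpu_model *c)
{
	return ((c->efer & EFER_LME) != 0 && (c->cr0 & CR0_PG) != 0);
}

/* Model the rule that CR4.LA57 cannot change while long mode is active. */
static bool
write_cr4(struct cpu_model *c, uint64_t val)
{
	if (long_mode_active(c) && ((c->cr4 ^ val) & CR4_LA57) != 0)
		return (false);	/* real hardware raises #GP here */
	c->cr4 = val;
	return (true);
}

/* The sequence from the summary, applied to the model. */
static bool
switch_to_la57(struct cpu_model *c, uint64_t pml5_phys)
{
	c->cr0 &= ~CR0_PG;			/* 1. leave long mode */
	if (!write_cr4(c, c->cr4 | CR4_LA57))	/* 2. request LA57 */
		return (false);
	c->cr3 = pml5_phys;			/* 3. point CR3 at the PML5 page */
	c->cr0 |= CR0_PG;			/* 4. re-enter long mode */
	return (long_mode_active(c));
}
```

In this model, calling write_cr4() with the LA57 bit flipped while paging is still enabled fails, which is why the BSP has to take the detour through disabled paging. An AP that has not enabled paging yet can set the bit directly.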

I decided not to change the kernel map for now. A single PML5 entry is created that points to the existing kernel_pml4 (KML4Phys) page, plus a PML5 entry that creates our recursive mapping for vtopte()/vtopde(). This decision is motivated by the fact that we cannot overcommit KVA, so the larger space there is unusable until machines start providing wider physical memory addressing. Another reason is that I do not want to break our fragile autotuning, so KVA expansion is not included in this first step.
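
To see why one PML5 entry is enough to keep the existing kernel map reachable, it helps to split a virtual address into its per-level indices. The sketch below is only an illustration; pt_index() and the macro names are hypothetical and are not the helpers used in the patch. It assumes 4 KB pages and 9 index bits per level, as on amd64.

```c
#include <stdint.h>

#define PAGE_SHIFT	12	/* 4 KB pages */
#define PT_INDEX_BITS	9	/* 512 entries per page table page */
#define PT_INDEX_MASK	((UINT64_C(1) << PT_INDEX_BITS) - 1)

/*
 * Index of va within the page table page at the given level:
 * 0 = PT, 1 = PD, 2 = PDP, 3 = PML4, 4 = PML5.
 * (Hypothetical helper, written for this illustration.)
 */
static unsigned
pt_index(uint64_t va, unsigned level)
{
	return ((unsigned)((va >> (PAGE_SHIFT + PT_INDEX_BITS * level)) &
	    PT_INDEX_MASK));
}
```

Every upper-half LA48 address is sign-extended from bit 47, so bits 48 through 56 are all ones and pt_index(va, 4) is 511 for all of them. For example, 0xffffffff80000000 decomposes into PML5 index 511, PML4 index 511 and PDP index 510, while 0xffff800000000000, the lowest upper-half LA48 address, has PML5 index 511 and PML4 index 256.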

On the other hand, (very) large address space is definitely immediately useful for some userspace applications.

For userspace, numbering of page table entries (or page table pages) is always done for 5-level structures, even if we operate in 4-level mode. The pmap_is_la57() function is added to report the mode of the specified pmap. This is not done to allow simultaneous 4- and 5-level paging (which the hardware does not allow), but to accommodate EPT, which has a separate level control and in principle might not allow 5-level EPT even though x86 paging supports it. Anyway, it does not seem critical to have 5-level EPT support now.

elfcontrol and proccontrol allow requesting or disabling LA57 for a specific binary, for ABI compatibility.

Bhyve, efirt, suspend/resume, and large map are adapted to LA57 but not tested.

PID              START                END PRT  RES PRES REF SHD FLAG TP PATH
 17           0x400000           0x426000 r-x   38   39   1   0 CN-- vn /bin/sh
 17           0x626000           0x629000 rw-    3    3   1   0 C--- df 
 17        0x800626000        0x800648000 r-x   34   36   2   0 CN-- vn /libexec/ld-elf.so.1
 17        0x800648000        0x80066b000 rw-   28   28   1   0 C--- df 
 17        0x80066b000        0x80066c000 r--    1    1   3   0 ---- dv 
 17        0x80066c000        0x800706000 rw-   50   50   1   0 C--- df 
 17        0x800848000        0x80084a000 rw-    2    2   1   0 CN-- df 
 17        0x80084a000        0x80087e000 r-x   52   55   2   0 CN-- vn /lib/libedit.so.7
 17        0x80087e000        0x800a7e000 ---    0    0   0   0 CN-- -- 
 17        0x800a7e000        0x800a80000 rw-    2    0   1   0 CN-- vn /lib/libedit.so.7
 17        0x800a80000        0x800a84000 rw-    1    1   1   0 CN-- df 
 17        0x800a84000        0x800c4f000 r-x  355  384   4   0 CN-- vn /lib/libc.so.7
 17        0x800c4f000        0x800e4e000 ---    0    0   0   0 CN-- -- 
 17        0x800e4e000        0x800e5d000 rw-   15    0   1   0 CN-- vn /lib/libc.so.7
 17        0x800e5d000        0x801087000 rw-   17   17   1   0 CN-- df 
 17        0x801087000        0x8010e0000 r-x   89   94   2   0 CN-- vn /lib/libncursesw.so.8
 17        0x8010e0000        0x8012df000 ---    0    0   0   0 CN-- -- 
 17        0x8012df000        0x8012e5000 rw-    6    0   1   0 CN-- vn /lib/libncursesw.so.8
 17        0x8012e5000        0x8018e5000 rw-   12   12   1   0 CN-- df 
 17   0xffffffdffff000   0xfffffffffdf000 ---    0    0   0   0 ---- -- 
 17   0xfffffffffdf000   0xfffffffffff000 rw-    6    6   1   0 C--D df 
 17   0xfffffffffff000  0x100000000000000 r-x    1    1   4   0 ---- ph
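
The last mapping in the listing above ends at 0x100000000000000, which is 2^56, the top of the lower canonical half with 57-bit linear addresses. The arithmetic can be checked with a short sketch; user_va_limit() and is_canonical() are hypothetical helper names made up for this illustration, not functions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* First address above the lower canonical half for la_bits-wide addresses. */
static uint64_t
user_va_limit(unsigned la_bits)
{
	return (UINT64_C(1) << (la_bits - 1));
}

/*
 * An address is canonical when bits 63 .. (la_bits - 1) are all zeros
 * or all ones.
 */
static bool
is_canonical(uint64_t va, unsigned la_bits)
{
	uint64_t top = va >> (la_bits - 1);

	return (top == 0 || top == (UINT64_MAX >> (la_bits - 1)));
}
```

Here user_va_limit(57) is 0x100000000000000 and user_va_limit(48) is 0x800000000000. The stack mapping at 0xfffffffffdf000 is canonical for 57-bit addresses but not for 48-bit ones, so the listing can only come from a process running in LA57 mode.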

Tested by: pho (LA48 hw)

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 32492

Event Timeline

kib created this revision.Jun 14 2020, 10:01 PM
kib requested review of this revision.Jun 14 2020, 10:01 PM
kib edited the summary of this revision.Jun 14 2020, 10:06 PM
kib updated this revision to Diff 73792.Jun 27 2020, 10:50 PM
kib edited the summary of this revision.
kib removed a reviewer: gnn.

Handle wakeup.
Update description of ptepindex.

kib updated this revision to Diff 74014.Jul 2 2020, 9:08 AM

bhyve: Handle guest's LA57 paging mode.
This is in fact independent of the rest of the patch, since the guest can set the bit in %cr4 at will.

Noted by: grehan
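
As an illustration of the bhyve change described above, a hypervisor can classify the guest's paging mode purely from the guest's CR0, CR4 and EFER values, which is why it does not depend on the host pmap. The sketch below is an assumption about the general shape of such a check, not the code from the diff; enum guest_paging_mode and guest_paging_mode() are hypothetical names.

```c
#include <stdint.h>

#define CR0_PG		(UINT64_C(1) << 31)
#define CR4_PAE		(UINT64_C(1) << 5)
#define CR4_LA57	(UINT64_C(1) << 12)
#define EFER_LME	(UINT64_C(1) << 8)

/* Hypothetical names for this illustration. */
enum guest_paging_mode {
	GUEST_PAGING_FLAT,	/* paging disabled */
	GUEST_PAGING_32,	/* 2-level, 32-bit entries */
	GUEST_PAGING_PAE,	/* 3-level PAE */
	GUEST_PAGING_64,	/* 4-level long mode */
	GUEST_PAGING_64_LA57	/* 5-level long mode */
};

/* Classify the guest paging mode from its control registers and EFER. */
static enum guest_paging_mode
guest_paging_mode(uint64_t cr0, uint64_t cr4, uint64_t efer)
{
	if ((cr0 & CR0_PG) == 0)
		return (GUEST_PAGING_FLAT);
	if ((cr4 & CR4_PAE) == 0)
		return (GUEST_PAGING_32);
	if ((efer & EFER_LME) != 0)
		return ((cr4 & CR4_LA57) != 0 ?
		    GUEST_PAGING_64_LA57 : GUEST_PAGING_64);
	return (GUEST_PAGING_PAE);
}
```

Any code that walks guest page tables on behalf of the guest needs the same distinction, since the walk starts one level higher in the 5-level case.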

emaste added inline comments.Jul 2 2020, 1:40 PM
sys/amd64/amd64/elf_machdep.c
154

To me, _checker seems a bit unclear. IMO boolean functions should answer a question, e.g. _is_la57 or _la57_supported or _la57_wanted, as appropriate.

sys/sys/elf_common.h
799

I assume that similar changes will come to other archs, e.g. for Arm 48 / 52. If we're going to offer similar control there, is there a more MI name we could use that's applicable everywhere (even if in the Arm case LA48 would still apply)? I don't have a great idea though; things incorporating "smaller" or "legacy" or whatnot are all relative to something else, and a term that stands alone is preferable.

kib added inline comments.Jul 2 2020, 2:06 PM
sys/amd64/amd64/elf_machdep.c
154

freebsd_brand_info_la57_img_compat ?

sys/sys/elf_common.h
799

Are you referring to ARM 8.2 'large VA'? From what I remember, they do it by increasing the page size to 64k (or doing something equivalent to that). I doubt that we would ever support such a page size on arm64.

I was not able to find an extension, in anything up to 8.6, that would increase the number of page table levels.

Still, if you have a proposal to rename the bit, I will apply it of course. I cannot propose anything better than LA_GEN1.

emaste added inline comments.Jul 2 2020, 2:20 PM
sys/amd64/amd64/elf_machdep.c
154

sounds good

sys/sys/elf_common.h
799

Ah, yes, so Intel only for the time being.

LA_GEN1 is a fine name if we think it will indeed become MI in the future, but probably unnecessary.

kib marked 2 inline comments as done.Jul 2 2020, 2:51 PM
grehan added inline comments.Jul 6 2020, 7:01 AM
sys/amd64/vmm/intel/vmx.c
1872

The same change is needed in usr.sbin/bhyve/gdb.c:guest_paging_info()

kib updated this revision to Diff 74104.Jul 6 2020, 7:29 AM
kib marked an inline comment as done.

Handle bhyve/gdb.c

grehan accepted this revision.Jul 6 2020, 7:42 AM

bhyve bits look fine.

kib added a subscriber: pho.Sat, Jul 18, 9:19 PM
kib updated this revision to Diff 74807.Wed, Jul 22, 5:11 PM

Fix initialization of sv_sigcode_base/sv_timekeep_base for LA48 sv sysent.

kib edited the summary of this revision.Wed, Jul 22, 5:11 PM