FreeBSD

i386 4/4G split
Closed, Public

Authored by kib on Mar 9 2018, 7:35 PM.

Details

Summary

The change makes the user and kernel address spaces on i386 independent, giving each the full 4G of usable virtual addresses, except for one PDE at the top. That PDE is used for the trampoline, the per-CPU trampoline stacks, and the system structures that must always be mapped, namely the IDT, GDT, common TSS and LDT, and the process-private TSS and LDT if allocated.
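To make the arithmetic of the split concrete, the sketch below computes where the reserved top PDE begins and therefore how much VA each address space keeps. It assumes the standard i386 paging geometry (a PDE spans 4 MiB without PAE and 2 MiB with PAE); the macro and function names are illustrative and are not the ones defined in vmparam.h.

```c
#include <stdint.h>

/*
 * Illustrative arithmetic only (hypothetical names, not vmparam.h).
 * Assumption: without PAE a PDE maps 4 MiB (1024 PTEs x 4 KiB);
 * with PAE a PDE maps 2 MiB (512 PTEs x 4 KiB).
 */
#define VA_SPACE_TOP   (UINT64_C(1) << 32)   /* 4 GiB of virtual addresses */
#define PDE_SPAN_NOPAE (UINT64_C(4) << 20)   /* 4 MiB per PDE */
#define PDE_SPAN_PAE   (UINT64_C(2) << 20)   /* 2 MiB per PDE */

/*
 * First VA of the reserved top PDE (trampoline, per-CPU trampoline
 * stacks, IDT/GDT/TSS/LDT), which is mapped in both address spaces.
 */
static uint64_t
trampoline_base(uint64_t pde_span)
{
	return (VA_SPACE_TOP - pde_span);
}

/* Bytes usable by user space, and separately by the kernel: [0, base). */
static uint64_t
usable_va_bytes(uint64_t pde_span)
{
	return (trampoline_base(pde_span));
}
```

Under these assumptions, user space and the kernel each get 0xFFC00000 bytes (4 GiB minus 4 MiB) without PAE, or 0xFFE00000 bytes with PAE.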

Because the kernel text and data are mapped 1:1, it was possible to eliminate the assembler part of locore.S that bootstraps the initial page table and KPTmap. That code is rewritten in C and moved into pmap_cold(). The comment in vmparam.h explains the KVA layout.
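As an illustration of the kind of bootstrap work that becomes easy to express in C once VA equals PA, here is a minimal sketch that identity-maps low physical memory with 4 MiB superpages in a non-PAE page directory and then translates addresses through it. It assumes PSE is available and models the page directory as a plain array; it does not reproduce pmap_cold(). The bit values are the architectural ones; the names are borrowed for readability and redefined locally.

```c
#include <stdint.h>

/* Non-PAE i386 page directory geometry and PDE bits (architectural values). */
#define NPDEPG   1024u        /* PDEs per page directory */
#define PDRSHIFT 22u          /* log2(4 MiB) */
#define PG_V     0x001u       /* present */
#define PG_RW    0x002u       /* writable */
#define PG_PS    0x080u       /* PDE maps a 4 MiB page (requires PSE) */

typedef uint32_t pd_entry_t;

/* Identity-map physical [0, nbytes) with superpages, so that VA == PA. */
static void
identity_map_4m(pd_entry_t pd[NPDEPG], uint64_t nbytes)
{
	uint64_t npde = (nbytes + (UINT64_C(1) << PDRSHIFT) - 1) >> PDRSHIFT;

	for (uint32_t i = 0; i < npde && i < NPDEPG; i++)
		pd[i] = (i << PDRSHIFT) | PG_PS | PG_RW | PG_V;
}

/* Walk the sketch page directory; returns 1 and sets *pa if va is mapped. */
static int
translate(const pd_entry_t pd[NPDEPG], uint32_t va, uint32_t *pa)
{
	pd_entry_t pde = pd[va >> PDRSHIFT];
	uint32_t mask = (1u << PDRSHIFT) - 1;   /* offset within 4 MiB */

	if ((pde & (PG_V | PG_PS)) != (PG_V | PG_PS))
		return (0);
	*pa = (pde & ~mask) | (va & mask);
	return (1);
}
```

With a 1:1 map, a pointer computed before paging is enabled stays valid afterwards, which is what lets such setup code run as ordinary C.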

There is no PCID mechanism available in protected mode, so each switch from user to kernel mode and back completely flushes the TLB, except for the trampoline PTD region. On the other hand, TLB invalidations for userspace become trivial, because the IPI handlers switch page tables, and context switches no longer need to reload %cr3.
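A toy model can show the cost described above: without PCID, writing %cr3 discards every non-global TLB entry, so a syscall that enters and leaves through the trampoline pays two full flushes. For illustration only, the model assumes the shared trampoline region is mapped with the global (PG_G) bit and CR4.PGE is enabled, which is one way its entries could survive the reloads; the types and functions are hypothetical, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8

struct tlb_entry {
	bool     valid;
	bool     global;      /* assumption: set only for trampoline pages */
	uint32_t vpn;         /* virtual page number */
};

struct tlb {
	struct tlb_entry e[TLB_SLOTS];
	unsigned flushes;     /* number of %cr3 writes modelled */
};

/* Model of writing %cr3 without PCID: drop every non-global entry. */
static void
load_cr3(struct tlb *t)
{
	for (size_t i = 0; i < TLB_SLOTS; i++)
		if (!t->e[i].global)
			t->e[i].valid = false;
	t->flushes++;
}

/* One user -> kernel -> user round trip through the trampoline. */
static void
syscall_round_trip(struct tlb *t)
{
	load_cr3(t);          /* trampoline switches to the kernel page table */
	/* ... kernel work runs with a cold TLB ... */
	load_cr3(t);          /* trampoline switches back to the user page table */
}

static size_t
valid_entries(const struct tlb *t)
{
	size_t n = 0;

	for (size_t i = 0; i < TLB_SLOTS; i++)
		n += t->e[i].valid;
	return (n);
}
```

After one round trip, only entries marked global are still valid, and every user translation has to be refilled by the page walker.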

copyout(9) was rewritten to use vm_fault_quick_hold(). To handle vm_fault_disable_faults(), a small change was needed in vm_fault_quick_hold() so that it only calls pmap_extract_and_hold() and does not enter vm_fault_hold() when we are in no-fault mode. This is controlled by an additional flag instead of only testing td_pflags, because I fear a KPI change for third-party consumers of vm_fault_quick_hold(), in particular the port of the Linux DRM drivers. Another issue for the new copyout(9) is compatibility with wiring user buffers around sysctl handlers. This explains the two kinds of locks for the copyout PTEs and the accounting of vslock() calls.
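The page-at-a-time structure of such a copyout can be sketched in user space. In this model, hold_user_page() is a hypothetical stand-in for vm_fault_quick_hold(): in no-fault mode it only succeeds for pages that are already resident (the pmap_extract_and_hold() path), otherwise it "faults" the page in. The copy is split at page boundaries because each user page is held separately. Locking, unholding, the kernel mapping window and vslock() accounting are deliberately left out, and all names are assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define USER_PAGES       4u

struct user_page {
	bool    resident;   /* already mapped, so a pmap lookup would find it */
	uint8_t data[SKETCH_PAGE_SIZE];
};

/* Simulated user address space: VA 0 .. USER_PAGES * page size - 1. */
static struct user_page user_mem[USER_PAGES];

/* Hypothetical hold: NULL if uva is invalid or would require a fault. */
static struct user_page *
hold_user_page(uintptr_t uva, bool nofault)
{
	size_t idx = uva / SKETCH_PAGE_SIZE;

	if (idx >= USER_PAGES)
		return (NULL);
	if (!user_mem[idx].resident) {
		if (nofault)
			return (NULL);             /* only the pmap lookup is allowed */
		user_mem[idx].resident = true; /* stand-in for vm_fault_hold() */
	}
	return (&user_mem[idx]);
}

/* Copy len bytes from kernel memory to user address uaddr, page by page. */
static int
copyout_sketch(const void *kaddr, uintptr_t uaddr, size_t len, bool nofault)
{
	const uint8_t *src = kaddr;

	while (len > 0) {
		size_t off = uaddr % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - off;  /* stop at page boundary */
		struct user_page *pg;

		if (chunk > len)
			chunk = len;
		pg = hold_user_page(uaddr, nofault);
		if (pg == NULL)
			return (EFAULT);
		memcpy(pg->data + off, src, chunk);
		/* A real implementation would unhold the page here. */
		src += chunk;
		uaddr += chunk;
		len -= chunk;
	}
	return (0);
}
```

In no-fault mode, a copy that touches a non-resident page fails with EFAULT instead of sleeping in the fault handler, which is the behavior the extra flag preserves.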

The change was motivated by the need to implement the Meltdown mitigation, but instead of KPTI, the full split is done. The i386 architecture already shows sizing problems; in particular, it is impossible to link clang and lld with debug info. I expect the issues caused by the virtual address space limits to only get worse, and the split gives the platform a longer life.

Test Plan

The patch has received only limited testing. I want to get feedback before spending resources on validating it.

Diff Detail

Lint: skipped
Unit tests: skipped

Event Timeline

kib edited the test plan for this revision.
kib added reviewers: alc, markj, jhb, jeff.
kib edited the summary of this revision.
kib added a subscriber: pho.

Create guard page for thread0 stack (discussed with bde).
Correct numeric values in the KVA map.

git complains when applying the patch, FYI:

<stdin>:595: trailing whitespace.
     * XXX only needs to be invlpg(0) but that doesn't work on the 386 
<stdin>:1354: trailing whitespace.
IDTVEC(mchk)
<stdin>:2459: space before tab in indent.
        setidt(IDT_BP, &IDTVEC(bpt), SDT_SYS386IGT, SEL_UPL,
<stdin>:2491: space before tab in indent.
        setidt(IDT_SYSCALL, &IDTVEC(int0x80_syscall),
error: sys/i386/conf/X: No such file or directory

Should be fixed in the branch.

gnu/usr.bin/gdb/kgdb/trgt_i386.c
292

#define?

sys/dev/dcons/dcons_os.c
316

Not a big deal, but I prefer consistency: #if __amd64__ ... #else __i386__, to match the above?

sys/i386/i386/exception.s
94

"address of linking" seems a bit nonstandard - I'd probably use "the linked address" or "the address at which it is linked"

321

As an aside, "new" NetBSD executables should no longer be here.

sys/i386/i386/machdep.c
4

For cases where there is already a 4-clause license, I think we should add a new 2-clause FreeBSD license for our changes.

kib marked 3 inline comments as done. Mar 16 2018, 1:40 PM
kib added inline comments.
sys/dev/dcons/dcons_os.c
316

I do not understand what you propose there.

sys/i386/i386/exception.s
321

I changed it to '"new" a.out executables'.

sys/dev/dcons/dcons_os.c
316

dcons_crom.c has

#ifdef __amd64__
        ...
#else /* __i386__ */
        ...
#endif

vs here

#ifdef __i386__
        ...
#else /* __amd64__ */
        ...
#endif

Absent a reason in context why one order is preferable to the other, it seems we should be consistent and always put the same one first.

kib marked 2 inline comments as done. Mar 16 2018, 1:56 PM

Update the patch to today's snapshot.

Most important is the fix to the trampoline stack top calculation. Other than that, there are assorted less important fixes and comment updates.

There are still two known bugs, one reported by Peter Holm, and one which I see in the double fault handler. WIP.

Fix double-fault handler, and handling of faults at the doreti path.

Today's snapshot: bugs fixed. Known issue with the VGA BIOS and vm86.

Today's snapshot: lots of bugs fixed; in particular, vm86 mode should no longer obliterate the calling thread's stack.

Passes smoke tests in QEMU (defaults; invoked as qemu-system-i386 -hda image.i386)

Regenerate after today's commits.

Today's snapshot, supposedly final.

Final version as tested by pho.

gnu/usr.bin/gdb/kgdb/trgt_i386.c
34

vm/pmap.h has always included machine/pmap.h.

sys/i386/include/vmparam.h
172

What do you mean by "system allocation"?

173

aligned is misspelled.

kib marked 2 inline comments as done. Mar 31 2018, 7:50 PM
kib added inline comments.
sys/i386/include/vmparam.h
172

Allocation of KVA chunks for laying out the CPU system structures. It sits underneath the KVA which is then handed to the MI VM.
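A simple way to picture this "system allocation" is a bump allocator that carves aligned chunks for CPU structures (GDT, TSS and similar) from the start of a KVA range, leaving the remainder for the machine-independent VM. This is a hedged sketch of the idea only: the struct, the function and the sizes in the usage are hypothetical, not the allocator in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical early KVA arena: [next, end) is still unallocated. */
struct kva_arena {
	uintptr_t next;
	uintptr_t end;
};

/*
 * Carve size bytes aligned to align (a power of two).
 * Returns the chunk start, or 0 if the request does not fit.
 */
static uintptr_t
kva_carve(struct kva_arena *a, uintptr_t size, uintptr_t align)
{
	uintptr_t start;

	assert(align != 0 && (align & (align - 1)) == 0);
	start = (a->next + align - 1) & ~(align - 1);
	/* Reject wrap-around and requests that run past the arena end. */
	if (start < a->next || start > a->end || a->end - start < size)
		return (0);
	a->next = start + size;
	return (start);
}
```

Once the system structures are placed, the range from the arena's next to its end is what would be handed to the MI VM.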

Two notes by alc.

This patch contains WIP for optimizing copyout(9). It boots, but the optimization is only applied to fuword and fuword16.

Bugfixes for fast copyout.

Whitespace nit while applying to my test tree:

<stdin>:1221: trailing whitespace.

<stdin>:1286: trailing whitespace.

<stdin>:1593: trailing whitespace.

<stdin>:2005: trailing whitespace.
#include <i386/i386/copyout_fast.s>

Fix ERESTART for lcall $7,$0 syscalls.
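Background for this fix, as a sketch: on ERESTART the saved user %eip is moved back over the instruction that entered the kernel, so the syscall is re-executed on return. The two i386 entry instructions differ in length: "int $0x80" encodes as CD 80 (2 bytes), while "lcall $7,$0" encodes as 9A plus a 32-bit offset and a 16-bit selector (7 bytes). The frame below is a simplified stand-in, and keeping the entry instruction length in tf_err is an assumption of this sketch; the error path is also simplified (no carry flag handling).

```c
#include <stdint.h>

#define SKETCH_ERESTART (-1)   /* stand-in for the kernel's ERESTART */

/* Simplified trap frame; only the fields this sketch needs. */
struct sketch_frame {
	uint32_t tf_eip;   /* user PC, saved just past the entry instruction */
	uint32_t tf_err;   /* assumption: length of the entry instruction */
	uint32_t tf_eax;   /* syscall return value or error number */
};

static void
set_syscall_retval(struct sketch_frame *f, int error, uint32_t rv)
{
	if (error == 0)
		f->tf_eax = rv;
	else if (error == SKETCH_ERESTART)
		f->tf_eip -= f->tf_err;   /* back up over int $0x80 or lcall */
	else
		f->tf_eax = (uint32_t)error;
}
```

If the length were hardcoded to 2 bytes, a restarted lcall would resume 5 bytes too far into the instruction, which is why the two entry paths need different adjustments.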

Do not forget to map early KPTmap.

Fix several uses of KERNBASE in ppc, syscons and acpica.

Submitted by: bde

Handle KERNBASE in mptable.c

This revision is now accepted and ready to land. May 15 2018, 1:49 PM