i386 4/4G split
ClosedPublic

Authored by kib on Mar 9 2018, 7:35 PM.

Details

Summary

This change makes the user and kernel address spaces on i386 independent, giving each the full 4G of usable virtual addresses, except for one PDE at the top. That PDE is used for the trampoline and per-CPU trampoline stacks, and for system structures that must always be mapped: the IDT, GDT, common TSS and LDT, and the process-private TSS and LDT if allocated.
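
To make the cost of the reserved PDE concrete, here is a small arithmetic sketch in C. It assumes that one PDE maps 4 MiB with non-PAE page tables and 2 MiB with PAE; the TOY_* names and helper functions are hypothetical and only illustrate the arithmetic. They do not reproduce the KVA map documented in vmparam.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: one PDE maps 4 MiB (non-PAE) or 2 MiB (PAE). */
#define TOY_NBPDR_NONPAE	(UINT64_C(1) << 22)
#define TOY_NBPDR_PAE		(UINT64_C(1) << 21)
#define TOY_VA_SIZE		(UINT64_C(1) << 32)	/* 4 GiB */

/* Start address of the last PDE-sized region of a 32-bit address space. */
static uint32_t
toy_top_pde_base(uint64_t nbpdr)
{
	assert(nbpdr != 0 && nbpdr < TOY_VA_SIZE);
	return ((uint32_t)(TOY_VA_SIZE - nbpdr));
}

/* Bytes of virtual address space left below the reserved top PDE. */
static uint64_t
toy_usable_bytes(uint64_t nbpdr)
{
	assert(nbpdr != 0 && nbpdr < TOY_VA_SIZE);
	return (TOY_VA_SIZE - nbpdr);
}
```

Under these assumptions, reserving a single 4 MiB PDE leaves 4092 MiB of the 4 GiB space to each of the user and kernel address spaces.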

Using a 1:1 mapping for the kernel text and data made it possible to eliminate the assembler part of locore.S that bootstrapped the initial page table and KPTmap. That code is rewritten in C and moved into pmap_cold(). The comment in vmparam.h explains the KVA layout.
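
As an illustration of what a 1:1 mapping means at the page table level, the following is a minimal userland sketch, not the pmap_cold() code from the patch. The function name, the TOY_* constants and the flat ptes[] array are assumptions made for the example; only the x86 meaning of the two low PTE bits (present and writable) is taken as given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_PG_V	0x001u	/* present bit of an x86 PTE */
#define TOY_PG_RW	0x002u	/* writable bit of an x86 PTE */

/*
 * Fill ptes[] so that the pages covering [start, end) are mapped 1:1,
 * i.e. each entry holds a physical address equal to the virtual one.
 * ptes[] is indexed by page number relative to start.  Returns the
 * number of entries written (at most nptes).
 */
static size_t
toy_map_identity(uint32_t *ptes, size_t nptes, uint32_t start, uint32_t end)
{
	uint64_t pa;	/* 64-bit so the loop cannot wrap near 4 GiB */
	size_t i;

	assert(ptes != NULL);
	i = 0;
	for (pa = start & ~(uint32_t)(TOY_PAGE_SIZE - 1);
	    pa < end && i < nptes; pa += TOY_PAGE_SIZE)
		ptes[i++] = (uint32_t)pa | TOY_PG_V | TOY_PG_RW;
	return (i);
}
```

Because the virtual and physical addresses coincide, such a table can be built with plain address arithmetic before paging is enabled, which is the kind of property that lets bootstrap code live in C rather than assembler.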

There is no PCID mechanism available in protected mode, so each kernel/user switch, in either direction, completely flushes the TLB, except for the trampoline PTD region. TLB invalidation for userspace becomes trivial, because IPI handlers switch page tables. On the other hand, context switches no longer need to reload %cr3.

copyout(9) was rewritten to use vm_fault_quick_hold(). To handle vm_fault_disable_faults(), a small change to vm_fault_quick_hold() was needed: in no-fault mode it only calls pmap_extract_and_hold() and does not enter vm_fault_hold(). This is controlled by an additional flag instead of only testing td_pflags, because I am wary of changing the KPI for third-party consumers of vm_fault_quick_hold(), in particular the port of the Linux drm drivers. Another issue for the new copyout(9) is compatibility with wiring user buffers around sysctl handlers. This explains the two kinds of locks for copyout ptes and the accounting of the vslock() calls.
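
The no-fault behaviour described above can be modelled with a toy in userland C. This is only a sketch of the control flow: struct toy_page, toy_quick_hold() and the EFAULT return value are hypothetical and are not the kernel interfaces or the patch itself. The resident check stands in for the cheap lookup-and-hold step, and the final branch stands in for the slow fault path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool	resident;	/* a valid mapping already exists */
	int	hold_count;	/* number of holds taken on the page */
};

/*
 * Toy model: try the cheap lookup first, and enter the slow fault
 * path only when the caller has not asked for no-fault mode.
 */
static int
toy_quick_hold(struct toy_page *p, bool nofault)
{
	assert(p != NULL);
	if (p->resident) {
		/* Fast path: the page is mapped, just hold it. */
		p->hold_count++;
		return (0);
	}
	if (nofault)
		return (EFAULT);	/* must not fault, so fail instead */
	/* Slow path: pretend the fault handler brought the page in. */
	p->resident = true;
	p->hold_count++;
	return (0);
}
```

Passing the mode as an explicit argument mirrors the choice of an additional flag: existing callers that never set it keep their current behaviour.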

The change was motivated by the need to implement the Meltdown mitigation, but instead of KPTI the full split is done. The i386 architecture already shows sizing problems; in particular, it is impossible to link clang and lld with debugging. I expect that the issues due to the virtual address space limits will only get worse, and the split extends the life of the platform.

Test Plan

The patch has received only limited testing. I want to get feedback before spending resources on validating it.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 15906

Event Timeline

kib created this revision.Mar 9 2018, 7:35 PM
kib edited the summary of this revision. (Show Details)Mar 11 2018, 10:36 PM
kib edited the test plan for this revision. (Show Details)
kib added reviewers: alc, markj, jhb, jeff.
kib edited the summary of this revision. (Show Details)
kib added a subscriber: pho.
kib updated this revision to Diff 40250.Mar 13 2018, 4:29 PM

Create guard page for thread0 stack (discussed with bde).
Correct numeric values in the KVA map.

git complaints upon applying the patch, FYI:

<stdin>:595: trailing whitespace.
     * XXX only needs to be invlpg(0) but that doesn't work on the 386 
<stdin>:1354: trailing whitespace.
IDTVEC(mchk)
<stdin>:2459: space before tab in indent.
        setidt(IDT_BP, &IDTVEC(bpt), SDT_SYS386IGT, SEL_UPL,
<stdin>:2491: space before tab in indent.
        setidt(IDT_SYSCALL, &IDTVEC(int0x80_syscall),
error: sys/i386/conf/X: No such file or directory
kib added a comment.Mar 15 2018, 10:48 AM

git complaints upon applying the patch, FYI:

<stdin>:595: trailing whitespace.
     * XXX only needs to be invlpg(0) but that doesn't work on the 386 
<stdin>:1354: trailing whitespace.
IDTVEC(mchk)
<stdin>:2459: space before tab in indent.
        setidt(IDT_BP, &IDTVEC(bpt), SDT_SYS386IGT, SEL_UPL,
<stdin>:2491: space before tab in indent.
        setidt(IDT_SYSCALL, &IDTVEC(int0x80_syscall),
error: sys/i386/conf/X: No such file or directory

Should be fixed in the branch.

kib updated this revision to Diff 40315.Mar 15 2018, 11:51 AM

Minor edits.

emaste added inline comments.Mar 16 2018, 1:22 AM
gnu/usr.bin/gdb/kgdb/trgt_i386.c
293

#define?

sys/dev/dcons/dcons_os.c
316

not a big deal but prefer consistency - #if __amd64__ ... #else __i386__ to match above?

sys/i386/i386/exception.s
92

"address of linking" seems a bit nonstandard - I'd probably use "the linked address" or "the address at which it is linked"

312

Aside, "new" NetBSD executables should no longer be here.

sys/i386/i386/machdep.c
4

For cases where there is already a 4-clause license I think we should add a new 2-clause FreeBSD for ours.

kib marked 3 inline comments as done.Mar 16 2018, 1:40 PM
kib added inline comments.
sys/dev/dcons/dcons_os.c
316

I do not understand what you propose there.

sys/i386/i386/exception.s
312

I changed it to '"new" a.out executables'.

emaste added inline comments.Mar 16 2018, 1:45 PM
sys/dev/dcons/dcons_os.c
316

dcons_crom.c has

#ifdef __amd64__
        ...
#else /* __i386__ */
        ...
#endif

vs here

#ifdef __i386__
        ...
#else /* __amd64__ */
        ...
#endif

Absent a reason where in context one or the other is preferable, it seems we should be consistent and always have one of the two first.

kib marked 2 inline comments as done.Mar 16 2018, 1:56 PM
kib updated this revision to Diff 40344.Mar 16 2018, 1:59 PM

Update patch to today's snapshot.

Most important is the fix to the trampoline stack top calculation. Other than that, there are assorted less important fixes and comment updates.

Two bugs are still known: one reported by Peter Holm, and one that I see in the double fault handler. WIP.

kib updated this revision to Diff 40362.Mar 16 2018, 9:30 PM

Fix double-fault handler, and handling of faults at the doreti path.

kib updated this revision to Diff 40391.Mar 17 2018, 8:43 PM

Today's snapshot: fixed bugs. Known issue with vga bios and vm86.

kib updated this revision to Diff 40640.Mar 23 2018, 3:56 PM

Today's snapshot, lots of bugs fixed; in particular, vm86 mode should no longer obliterate the calling thread's stack.

Passes smoke tests in QEMU (defaults; invoked as qemu-system-i386 -hda image.i386)

kib updated this revision to Diff 40687.Mar 24 2018, 1:57 PM

Regenerate after today's commits.

kib updated this revision to Diff 40793.Mar 27 2018, 2:28 PM

Today's snapshot, supposedly final.

kib updated this revision to Diff 40938.Mar 31 2018, 10:23 AM

Final version as tested by pho.

alc added inline comments.Mar 31 2018, 7:08 PM
gnu/usr.bin/gdb/kgdb/trgt_i386.c
34

vm/pmap.h has always included machine/pmap.h.

sys/i386/include/vmparam.h
172

What do you mean by "system allocation"?

173

aligned is misspelled.

kib marked 2 inline comments as done.Mar 31 2018, 7:50 PM
kib added inline comments.
sys/i386/include/vmparam.h
172

Allocation of KVA chunks for the purpose of laying out cpu system structures. It underlies the KVA which is then handed to MI VM.

kib updated this revision to Diff 40951.Mar 31 2018, 7:51 PM

Two notes by alc.

This patch contains WIP for optimization of copyout(9). It boots but the optimization is only applied to fuword and fuword16.

kib updated this revision to Diff 40953.Mar 31 2018, 11:19 PM

Fast copyout.

kib updated this revision to Diff 40969.Apr 1 2018, 8:07 PM

Bugfixes for fast copyout.

emaste added a comment.Apr 2 2018, 2:13 PM

Whitespace nit while applying to my test tree:

<stdin>:1221: trailing whitespace.

<stdin>:1286: trailing whitespace.

<stdin>:1593: trailing whitespace.

<stdin>:2005: trailing whitespace.
#include <i386/i386/copyout_fast.s>
kib updated this revision to Diff 41107.Apr 4 2018, 8:18 PM

Fix ERESTART for lcall $7,$0 syscalls.

kib updated this revision to Diff 41141.Apr 5 2018, 5:49 PM

Do not forget to map early KPTmap.

kib updated this revision to Diff 41168.Apr 6 2018, 10:15 AM

Fix several uses of KERNBASE in ppc, syscons and acpica.

Submitted by: bde

kib updated this revision to Diff 41202.Apr 6 2018, 7:56 PM

Handle KERNBASE in mptable.c

kib accepted this revision.May 15 2018, 1:49 PM
This revision is now accepted and ready to land.May 15 2018, 1:49 PM
kib closed this revision.May 15 2018, 1:49 PM