Page MenuHomeFreeBSD

amd64 UEFI boot: stop copying staging area to 2M phys
ClosedPublic

Authored by kib on Jul 9 2021, 4:08 PM.

Details

Summary

instead activate kernel in-place whenever the EFI boot env found a free chunk.

The advantages are that we:

  • do not silently destroy EFI RT code/data or any other memory that BIOS still needs at runtime, if it happens to be located at 2M/kernel size.
  • do not corrupt trampoline if it happens to be allocated at the location that is copied to
  • the 1G 1:1 all over VA trick is eliminated, kernel and other loaded blobs can be located anywhere below 4G
  • the AP startup is streamlined, AP now directly loads %cr3 with pointer to kernel page table without intermediate page, which removed the need to reserve low memory at early boot, and made startup compatible with the new boot scheme

The new kernel is compatbile with an old loader.

Loader code currently support both old and new handoff scheme. Loader automaticly detects supported handoff scheme for booting kernel. A new loader command copy_staging does that manually. Do copy_staging enable to switch loader into legacy mode, copy_staging disable sets new mode, 'auto' uses autodetected. Without argument, it prints current mode.

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Staging area realloc debugged.
Added safety check to avoid AllocPages() call after ExitBootServices().
Several other loader bug fixes.

Added pmap_page_alloc_below_4g().
Kernel patch split into more readable chunks.
Per-commit view is available at https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=shortlog;h=refs/heads/kpages1

Tested AP startup rework or non-EFI real hw.

I tested this patch on my Sandy Bridge Samsung laptop. It doesn't boot.

I tested both the previous version and today's updated version of the patch by applying it to the main git branch and building an amd64-memstick image.

  • With copy_staging disabled (default), the laptop reboots immediately after the bootloader displays the EFI framebuffer information.
  • With copy_staging enabled, the laptop freezes after the EFI framebuffer information is displayed, but does not reboot (same situation as before without D30828 applied).
  • In legacy mode (BIOS instead of UEFI), the laptop boots fine like it did before this patch.

I fixed my problem: when constructing the page table, the index "i" was defined as int, and it overflowed. After changing the type of "i" from int to uint64, it boots.

I added a few more comments, but only to the parts of code that I read.
Are fixes from D30828 going to be implemented?

stand/efi/loader/arch/amd64/elf64_freebsd.c
108

"i" would otherwise overflow during virtual page table construction

170

It would be great if return values like this were not ignored.

248

i was defined as int and could overflow

264

i overflowed when defined as int

265
kib marked 5 inline comments as done.Jul 14 2021, 10:07 AM

Are fixes from D30828 going to be implemented?

I think that it would still be useful, for copy case which is not going away in near future. But I believe that more work is needed for D30828; for now I fixate on nocopy and then we will discuss D30828

stand/efi/loader/arch/amd64/elf64_freebsd.c
248

Indeed, there was an overflow, but it is not there. i itself cannot overflow, NPTEPG is just 512.

265

I explicitly casted i to pd_entry_t, I believe this is the self-explaining way of fixing the issue.

kib marked 2 inline comments as done.

Loader:

  • Overflow fix.
  • Check and report errors allocating trampoline and trampoline page table.
stand/efi/loader/bootinfo.c
401

I looked at the UEFI specification, and it says that some boot services may actually be gone even if ExitBootServices() does not return success.

I looked at:

  • UEFI 2.0, which does not seem to mention what happens when ExitBootServices() returns a failure
  • UEFI 2.3.1 mentions that only GetMemoryMap() may be called after the first call to ExitBootServices()
  • UEFI 2.8 mentions that only Memory Allocation Services may be called after the first call to ExitBootServices()

So it seems like GetMemoryMap() is the only boot service function that can be expected to work after ExitBootServices() is called and fails.

My laptop freezes, whenever printf() or panic() is called after ExitBootServices() because printing to screen apparently uses boot services. On my laptop, ExitBootServices() succeeds the first time it is called. I wonder if eliminating all calls to boot service functions other than GetMemoryMap() after the first unsuccessful call to ExitBootServices() would fix boot freeze problems on some computers.

stand/efi/loader/copy.c
370–371

I don't think it is possible to print messages on the screen after ExitBootServices() has been called.

kib marked an inline comment as done.Jul 14 2021, 3:56 PM
kib added inline comments.
stand/efi/loader/copy.c
370–371

On screen perhaps not, on the directly driven serial port we can (which is e.g. my test configuration). There not much can be done anyway at this point.

kib marked an inline comment as done.
kib added reviewers: markj, emaste, alc, tsoome.
kib removed subscribers: markj, emaste.

Add autodetection of the kernel type WRT copying of the staging area. Infer it from the type and size of the kernphys symbol. I am very happy with this hack because it allows us to not depend on any special notes or rather expensive and fragile mechanism. Instead, loader directly tests for the present functionality.

WIth this added, patch should be functionally complete, except Xen is not sorted out.

Restore automatic state of copy_staging if kernel exec failed.

Tested, to the best of my ability, with an HP ProBook 440 G7 – one of the HP models that feature in bug 255073.

I'm delighted to confirm that the computer boots, with a build of 14.0-CURRENT (main-n247985-d1a0eab9fbc). Observations, thanks to some private guidance:

  1. if I copy_staging enable at the loader prompt, then boot leads to the symptom that was originally reported in bug 255073 – boot fails
  2. the kernel panics in response to service zfs onestart (I imagine that this is off-topic; I'll update the OS).

The kernel panic after service zfs onestart seems to be caused by this patch. I can reproduce it every time I run the command from an installation media on a KVM virtual machine.

  • Without this patch, service zfs onestart succeeds.
  • With this patch and copy_staging enable, the service also starts.
  • With this patch, and with the default value of copy_staging, the kernel panics every time I run service zfs onestart. The error message is panic: vm_radix_insert: key 7fffffffc16 is already present.

The kernel panic after service zfs onestart seems to be caused by this patch. I can reproduce it every time I run the command from an installation media on a KVM virtual machine.

  • Without this patch, service zfs onestart succeeds.
  • With this patch and copy_staging enable, the service also starts.
  • With this patch, and with the default value of copy_staging, the kernel panics every time I run service zfs onestart. The error message is panic: vm_radix_insert: key 7fffffffc16 is already present.

This is another bug due to compat mapping of zero address at KERNBASE. As result kernel page table pages indexes were off by one, which matters for promotion of KVA mappings. zfs.ko has large enough text (>4M) to trigger promotion on load.

Fix indexes for kernel page table pages.

stand/efi/loader/copy.c
238

I think some enum for these values would be worthwhile.

240
375
378
sys/amd64/amd64/machdep.c
1630 ↗(On Diff #92306)

Out of paranoia, perhaps CR3_PCID_SAVE should be masked off too.

sys/amd64/amd64/pmap.c
1813 ↗(On Diff #92306)
2088 ↗(On Diff #92306)

Should be conditional on zeroed too, otherwise we might as well not specify zeroed at all.

kib marked 8 inline comments as done.Jul 17 2021, 5:36 PM
kib added inline comments.
sys/amd64/amd64/machdep.c
1630 ↗(On Diff #92306)

I am quite sure that we cannot read CR3_PCID_SAVE back from %CR3. It is not defined in SDM as a bit in the register, Intel only talks about 'operation that has bit 63 set in the source operand' as something that preserves TLB.

In fact, we already relay on that in when storing SAVED_UCR3.

Also, loader does not enable PCID at all, and I am sure that firmware cannot safely pass control to OS with PCID enabled either.

I can add it if you insist, but this is excessive IMO.

kib marked an inline comment as done.

Add symbolic names for copy_staging values.
Fix pmap_page_alloc_below_4g() to use zeroed for bzero() call.
Comments fixes and style.

sys/amd64/amd64/machdep.c
1630 ↗(On Diff #92306)

Ok, I don't insist at all. I just wondered if it would be sensible to mask off any upper flag bits, but I didn't look at the CR3 layout closely.

stand/efi/loader/copy.c
384

I wonder if this should be larger. We allocate at least 1 page per CPU, for example, for pcpu. Doesn't that come from this region?

sys/amd64/amd64/machdep.c
1637 ↗(On Diff #92312)

Logically, shouldn't it be (vm_paddr_t)((vm_offset_t)hammer_time - KERNBASE)?

kib marked 3 inline comments as done.

Set slop to 8M for amd64, and 0 for other arches.
Add 'staging_slop' loader command to set slop to any value.

Make kernphys calculation pedantically type-correct.

In vmparam.h, the KERNLOAD definition implies that the kernel is still always loaded at 0x200000.

sys/amd64/amd64/machdep.c
1688 ↗(On Diff #92446)

IMO it would be nice to provide a KERNSTART or kernvirt (corresponding to kernphys) equal to KERNBASE+NBPDR.

sys/amd64/amd64/mpboot.S
98 ↗(On Diff #92446)

Why does PGE now need to be set here? In particular, the load_cr4() in init_secondary_tail() which sets CR4_PGE no longer flushes the TLB, I believe.

kib marked 2 inline comments as done.

KERNSTART
Remove second set of CR4_PGE
Update comment about KERNLOAD
Do not corrupt low memory doing AP startup if EFI boot

markj added inline comments.
stand/efi/loader/copy.c
217

Strange indentation here.

This revision is now accepted and ready to land.Jul 22 2021, 5:58 PM
kib marked an inline comment as done.Jul 23 2021, 4:37 PM
kib added inline comments.
sys/amd64/amd64/machdep.c
1688 ↗(On Diff #92446)

Yes, I pondered this as well.

sys/amd64/amd64/mpboot.S
98 ↗(On Diff #92446)

Because I am switching to the kernel page table which in some configurations have PG_G bit set in PTEs. If not enabled there, CPU reacts with #gp when paging is enabled.

I removed reload of %cr4 for PGE from the amd64-specific block in init_secondary_tail(). There is no need to flush TLB, we are already on the right page table.

kib updated this revision to Diff 92713.

Rebase after some trivial bits were committed.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 27 2021, 5:20 PM
This revision was automatically updated to reflect the committed changes.
kib updated this revision to Diff 92816.

Rebase after AP startup bits were committed upstream.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 31 2021, 1:54 PM
This revision was automatically updated to reflect the committed changes.
kib updated this revision to Diff 93050.

Rebase after kernphys commit

This revision is now accepted and ready to land.Aug 8 2021, 5:56 AM