Page MenuHomeFreeBSD

loader.efi: Fix when staging moves late
ClosedPublic

Authored by imp on Fri, Jun 5, 5:29 AM.
Tags
None
Referenced Files
F159433360: D57462.id179291.diff
Sun, Jun 14, 1:01 AM
Unknown Object (File)
Sat, Jun 13, 10:45 AM
Unknown Object (File)
Sat, Jun 13, 2:02 AM
Unknown Object (File)
Sat, Jun 13, 1:34 AM
Unknown Object (File)
Sat, Jun 13, 12:50 AM
Unknown Object (File)
Fri, Jun 12, 9:59 PM
Unknown Object (File)
Fri, Jun 12, 8:06 PM
Unknown Object (File)
Fri, Jun 12, 6:05 PM

Details

Summary

Prior to this commit, we'd compute the page tables and have the last
entries point to the staging area. We'd then add some more metadata to
the image and boot. This assumed the staging area didn't need to move
for this last bit of data.

However, if we go over the staging limit, when we copyin new data, we
grow the staging area, usually be moving it to a lower address. This
overage usually happens when we're loading modules and so things work
out nicely. Sometimes we're close to the limit, and we need to do this
growing inside bi_load, after we've computed the page table, making the
page table wrong, and the code we jump to random rather than the btext
routine we normally start at.

To fix this, move computation of the table (but not its allocation) to
after bi_load, but before we call the trampoline.

This problem was most observed when loading microcode for many peole,
but Gleb reproduced the error with a set of modules that didn't include
ucode.

This bug hunt was greatly assisted by Claude who looked at the crash
from the EFI boot loader and surmised that we weren't jumping to the
code we thought we were jumping to. After inspecting the code, I asked
claude how corruption could happen (I thought overwriting the page
table), but claude notice the possibility that staging might change
after we computed the page table, and this fix is the result. Claude
didn't suggest a diff, but did provide many helpful clues that lead me
to this fix.

Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

imp requested review of this revision.Fri, Jun 5, 5:29 AM

Thanks!

stand/efi/loader/arch/amd64/elf64_freebsd.c
158–159

Just a style suggestion: set it to false in this branch and don't initialize it at the declaration line.

vladlen added inline comments.
stand/efi/loader/arch/amd64/elf64_freebsd.c
200

Moving the bi_load() call does not solve all problems related to possible staging area relocation.
The fourth parameter (exit_bs - exit boot services) is set to true, and inside, after exiting
boot services in bi_load_efi_data(), there is code that may cause staging area relocation -
calls to md_copymodules(), md_align().
This code was added later and should likely be placed before calling bi_load_efi_data();
otherwise, problems due to staging area relocation will persist.

stand/efi/loader/arch/amd64/elf64_freebsd.c
200

one of the quick solutions is assign staging_slop=0 after exiting boot services. It prevents staging area copy (it is anyway impossible)
and all allocation will be done inside of the slop (it is why it was introduced - to allocate inside all structures after boot services exist and before kernel get full control of memory).

stand/efi/loader/arch/amd64/elf64_freebsd.c
200

Also the code in bi_load() can call printf() after bi_load_efi_data() (md_copymodules() -> arch_copyin() -> efi_copyin() -> efi_check_space() -> printf("efi_check_space: Unable to expand staging area\n");, but boot services have been already exited and behaviour is undefined (use unallocated structures by UEFI).

kevans added inline comments.
stand/efi/loader/arch/amd64/elf64_freebsd.c
200

We disable boot service use when we ExitBootServices, so the printf is not a practical concern.

stand/efi/loader/arch/amd64/elf64_freebsd.c
177

The printf's should be done after the last move, otherwise the values shown are misleading.

But then, I do not understand what code would update the PT4 value itself after the move?

Hm, so the problem might be, instead, that extending the stage area overwrites the allocation for the page tables?
Then wouldn't the effect of the patch that now page tables fill overwrite the staging area?

IMO the fix is to put the page table pages into staging, and get the pointer to top-level page table page from the staging area. After that, we can fill page tables.

stand/efi/loader/arch/amd64/elf64_freebsd.c
200

We disable boot service use when we ExitBootServices, so the printf is not a practical concern.

printf uses UEFI call available only for Boot Services, but biit services structures are not allocated anymore, the behaviour is undefined (it is not simple return with error code). Even if loader code has error code, it can not inform.

stand/efi/loader/arch/amd64/elf64_freebsd.c
177

We can move the printf to kust before we exit boot services, but the exact placement is tricky...

Inside the PT4 we encode the staging value, so it has to be the final value. Pt4 isn't overwritten in yhe bad scenario

200

This is 100% the right place. After bi_load, staging can't move. It's right before we jump to the trampoline. Nothing moves it here

The printf issue is the complication I said above.

In D57462#1316594, @kib wrote:

Hm, so the problem might be, instead, that extending the stage area overwrites the allocation for the page tables?

No. The value encoded in the table are wrong. The table isn't overwritten.

Then wouldn't the effect of the patch that now page tables fill overwrite the staging area?

No. Since we can't know the final staging value until later. My patch moves it.

IMO the fix is to put the page table pages into staging, and get the pointer to top-level page table page from the staging area. After that, we can fill page tables.

Since we allocate the tables, we won't have a collision. Putting it in staging is trickier. Having it right after the stack works well since we replace the stack soon after we start. It's what I did for LinuxBoot and it works well there. We don't have to manage it in the modules area, etc after boot.

This revision is now accepted and ready to land.Fri, Jun 5, 6:41 PM

So I'll see if I can move where we print the staging value..

stand/efi/loader/arch/amd64/elf64_freebsd.c
200

That is to say: the call is after bi_load. The concern you have is because it was before bi_load before. Now that it's after, that concern goes away. That is the exact bug I'm fixing here.

230

This is where we use staging, and it must be after bi_load().

Move printf into main code
Add commented out PageFree since we could free the memory
if boot services was still running. It might be better to
use malloc here to get the memory to get the memory map
since we could free it after boot services stops running.
The EFI loader allocates a big array of pages before starting
for routine allocations while we run.

This revision now requires review to proceed.Fri, Jun 5, 6:56 PM

Wouldn't the ia32 loader need this too?

Wouldn't the ia32 loader need this too?

Yes. We need very similar changes there.

In D57462#1316706, @imp wrote:
In D57462#1316594, @kib wrote:

IMO the fix is to put the page table pages into staging, and get the pointer to top-level page table page from the staging area. After that, we can fill page tables.

Since we allocate the tables, we won't have a collision. Putting it in staging is trickier. Having it right after the stack works well since we replace the stack soon after we start. It's what I did for LinuxBoot and it works well there. We don't have to manage it in the modules area, etc after boot.

In the debug output, it already is above the staging area. However, that's not guaranteed. But, we can grow the staging up, it will grow down if we're above it instead of overwriting, and if we're below it and it's growing down, it won't overwrite it because UEFI won't give out the same address twice. If we can do neither, then we'll fail inside of efi_check_space(), printing a message (maybe several), eventually failing before we try to boot. This isn't new behavior.

So I think it's safe as we do it and doesn't introduce any new failure modes that we're not observing today.

Oh, and aarch64 isn't affected because we don't create a page table for the handoff. Likewise with riscv and arm.

also take a stab at ia32, no clue if it works or compiles

Wouldn't the ia32 loader need this too?

Updated to do this. Please take a look.

Can confirm that it boots fine with the things I mentioned below fixed.
Thanks!

stand/efi/loader/arch/i386/elf64_freebsd.c
102
204–205

These need to be moved to the new if statement below.

stand/efi/loader/bootinfo.c
217

(<machine/_inttypes.h> need to be included above for this)

fix format and ia32 build issues.

stand/efi/loader/bootinfo.c
217

For various historical reasons, we usually do %j. Can you confirm that, and the other changes, worked?

stand/efi/loader/bootinfo.c
217

Ah. Yes, it builds now.

Is this ready to go? And do we still need 16f21c5af35002b8361ffb2e83ff3c92cd899a3a ?

I'm holding off on starting 15.1-RC3 builds until we can get this landed but I am very eager to get them going.

stand/efi/loader/arch/i386/elf64_freebsd.c
93

Accidental blank line?

stand/efi/loader/arch/i386/elf64_freebsd.c
93

yup. Removed on the version I just pushed.

This revision was not accepted when it landed; it landed in state Needs Review.Sat, Jun 6, 1:25 AM
This revision was automatically updated to reflect the committed changes.