Prior to this commit, we'd compute the page tables and have the last
entries point to the staging area. We'd then add some more metadata to
the image and boot. This assumed the staging area didn't need to move
for this last bit of data.

However, if we go over the staging limit when we copyin new data, we
grow the staging area, usually by moving it to a lower address. This
overage usually happens while we're loading modules, before the page
table is computed, so things work out nicely. Sometimes, though, we're
close to the limit and the growing happens inside bi_load, after we've
computed the page table. That leaves the page table wrong, and we jump
to random code rather than to the btext routine we normally start at.

To fix this, move computation of the table (but not its allocation) to
after bi_load, but before we call the trampoline.

For many people, this problem was observed most often when loading
microcode, but Gleb reproduced the error with a set of modules that
didn't include ucode.

This bug hunt was greatly assisted by Claude, who looked at the crash
from the EFI boot loader and surmised that we weren't jumping to the
code we thought we were jumping to. After inspecting the code, I asked
Claude how corruption could happen (I suspected something was
overwriting the page table), but Claude noticed the possibility that
staging might move after we computed the page table, and this fix is
the result. Claude didn't suggest a diff, but did provide many helpful
clues that led me to this fix.

Sponsored by:	Netflix