Page MenuHomeFreeBSD

Unify boot1 with loader
ClosedPublic

Authored by eric_metricspace.net on Apr 21 2017, 2:57 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 25, 2:58 AM
Unknown Object (File)
Wed, Jan 22, 11:23 PM
Unknown Object (File)
Tue, Jan 21, 6:22 AM
Unknown Object (File)
Fri, Jan 17, 1:20 AM
Unknown Object (File)
Thu, Jan 16, 12:29 AM
Unknown Object (File)
Mon, Jan 13, 5:27 PM
Unknown Object (File)
Mon, Jan 13, 5:27 PM
Unknown Object (File)
Sun, Jan 12, 7:18 PM

Details

Summary

This patch is a redo of the EFI refactoring, with a lot of the complexity removes. It basically refactors boot1 to use the same IO code as loader.

This opens the way to a lot of follow-up work, including GELI support for EFI, volume selection in boot1, and other things.

Test Plan

This has been tested on qemu so far with a fairly complex setup. It has also been on use on real hardware for several months.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This review is a little out of date and is much more ambitious than what I've just done in https://reviews.freebsd.org/D11820 . which also sets down this path, but not as completely. I fear that will make this a little harder, but maybe not by too much.

In D10447#245058, @imp wrote:

This review is a little out of date and is much more ambitious than what I've just done in https://reviews.freebsd.org/D11820 . which also sets down this path, but not as completely. I fear that will make this a little harder, but maybe not by too much.

Please note a couple of things:

  1. The GELI stuff depends on this patch, and trying to rebase it to something else is potentially a whole lot of work
  2. Allan Jude has already indicated his approval of this patch (pending some issues that have been fixed)
  3. One version or another of this patch has been used on real hardware for at least a year now
In D10447#245058, @imp wrote:

This review is a little out of date and is much more ambitious than what I've just done in https://reviews.freebsd.org/D11820 . which also sets down this path, but not as completely. I fear that will make this a little harder, but maybe not by too much.

Please note a couple of things:

  1. The GELI stuff depends on this patch, and trying to rebase it to something else is potentially a whole lot of work
  2. Allan Jude has already indicated his approval of this patch (pending some issues that have been fixed)
  3. One version or another of this patch has been used on real hardware for at least a year now

Now that this is rebased to head, I can hold off a little on my changes if they would be disruptive. That was my goal in giving the heads up: is this so stale it doesn't matter, or is it close and I'll upend the apple cart.

What's your timeline to having this pushed in?

imp requested changes to this revision.Aug 3 2017, 2:41 AM

We need to restore the priority scheme that was here before. At most, we can have zfs boot environments take precedence over the old behavior (since ZFS tends to be self-describing). But the old code used to do something similar and had all kinds of problems, which is why the preferred stuff was put in to begin with. It must be restored, or we must implement the UEFI boot manager protocol (either is acceptable, but the latter is likely beyond the scope).

sys/boot/efi/boot1/boot1.c
532–537 ↗(On Diff #31486)

This structure (find one and boot it) has some problems. It doesn't try to implement the UEFI boot manager protocol, and its structure makes it hard to create a priority list of items to try to load, which the old code made somewhat easy (easy enough we implemented simple boot next protocol to work around issues we were having the the BIOS magically changing which partition it booted off of, but I digress).

It also should prefer the device we've been loaded off of to other devices in the system. It doesn't seem to do that, but plow through the list until if finds something, anything, that's bootable. This is a mistake and one that must be corrected before it's committed. With this behavior, you'll get whatever is first in the UEFI list of devices, which could be that thumb drive you plugged in that happens to have a bootable image on it just because it comes before other drives in the list.

564 ↗(On Diff #31486)

This comment seems stale since we have all those return (0) above here. This hierarchy that was in the old code is critical and needs to be preserved.

This revision now requires changes to proceed.Aug 3 2017, 2:41 AM

Also, this patch is too big. There's many smaller bits that can easily be committed now which would trim maybe half of the changes here from this change. Without objection, I'll do that. The interstwingled changes here will also make it hard to bisect if there's any bugs in this refactor, which experience has shown will be important for a change of this size.

Also, the current change in size for boot1.efi from these changes is going to present problems for the upgrade path since we foolishly chose such small FAT partitions in the past, in violation of the UEFI standard (though to be honest, even at work where we have acres of disk space, I only make 50MB partitions). Some thought may need to be given to the upgrade path since the old installs won't be able to upgrade to the new boot1.efi.

In D10447#245408, @imp wrote:

Also, the current change in size for boot1.efi from these changes is going to present problems for the upgrade path since we foolishly chose such small FAT partitions in the past, in violation of the UEFI standard (though to be honest, even at work where we have acres of disk space, I only make 50MB partitions). Some thought may need to be given to the upgrade path since the old installs won't be able to upgrade to the new boot1.efi.

Just as an remark - the UEFI standard does not really specify the size of the ESP, there are suggestions from vendors and some limiting factors (like minimum size for FAT32). For zpool create -B patch, I did pick 256MB as an default.

I will pull out some components of the patch and make some smaller ones.

Regarding the MSDOSFS image size issue, I've had a patch to increase the size out for some time, however, there's been ongoing indecision on how to deal with that issue. It really needs to get resolved soon, so this effort can move forward.

In D10447#245408, @imp wrote:

Also, the current change in size for boot1.efi from these changes is going to present problems for the upgrade path since we foolishly chose such small FAT partitions in the past, in violation of the UEFI standard (though to be honest, even at work where we have acres of disk space, I only make 50MB partitions). Some thought may need to be given to the upgrade path since the old installs won't be able to upgrade to the new boot1.efi.

Just as an remark - the UEFI standard does not really specify the size of the ESP, there are suggestions from vendors and some limiting factors (like minimum size for FAT32). For zpool create -B patch, I did pick 256MB as an default.

Since it specifies it should be FAT32, that specifies a minimum size. There's no additional minimum required by the standard, true. We're cheating in many ways and it is now going to bite us.

I will pull out some components of the patch and make some smaller ones.

Regarding the MSDOSFS image size issue, I've had a patch to increase the size out for some time, however, there's been ongoing indecision on how to deal with that issue. It really needs to get resolved soon, so this effort can move forward.

Yes. I know. It should have been fixed before 11.1, honestly. I thought it had been, or I'd have driven that aspect to completion faster. I've been complaining about the size issue now for a full year.

I will pull out some components of the patch and make some smaller ones.

I'm happy to help in that process. Some of these are no-brainers (like moving devicename.c, which is 100% independent or the EFI time function enhancement). I'm happy to mine the patch for these things and commit in such a way that rebasing is easy (this is easy to do with git, since I don't have to push to make sure the rebase is easy after a change). I honestly think it would be faster for me to do that than to have more review cycles here for the stuff that's more mundane and generated no comments to date. But I'm happy to defer too.

In D10447#245717, @imp wrote:

I will pull out some components of the patch and make some smaller ones.

I'm happy to help in that process. Some of these are no-brainers (like moving devicename.c, which is 100% independent or the EFI time function enhancement). I'm happy to mine the patch for these things and commit in such a way that rebasing is easy (this is easy to do with git, since I don't have to push to make sure the rebase is easy after a change). I honestly think it would be faster for me to do that than to have more review cycles here for the stuff that's more mundane and generated no comments to date. But I'm happy to defer too.

that's a long way of saying "I'm happy to take the non-contentious parts of this patch so you have more times to deal with the issue I outlined"

I will return to the size change very soon.

Okay, I've broken this one up into a number of smaller ones. I will wait for the last of them to go in, then I'll rebase and we can go from there.

eric_metricspace.net edited edge metadata.
eric_metricspace.net edited the test plan for this revision. (Show Details)

Rebased to HEAD after committing portions of it as independent patches.

sys/boot/efi/boot1/Makefile
18 ↗(On Diff #31601)

What triggers this?

84 ↗(On Diff #31601)

Can you you use SRCTOP/OBJTOP instead here?

sys/boot/efi/boot1/boot1.c
204 ↗(On Diff #31601)

Don't bother to change this. I'm about to commit something that will conflict in that it removes all these functions.

376 ↗(On Diff #31601)

through here. None of this needs to be done and will conflict.

499 ↗(On Diff #31601)

My comment from an earlier review moved: we still need to actually implement preferential treatment for where boot1.efi was loaded from.

eric_metricspace.net added inline comments.
sys/boot/efi/boot1/Makefile
18 ↗(On Diff #31601)

Likely remnant of older changes. Removed in my repo

eric_metricspace.net added inline comments.
sys/boot/efi/boot1/boot1.c
499 ↗(On Diff #31601)

Take a look at the actual probe statements. This function scans everything looking for the boot partition, which should preserve the old behavior.

sys/boot/efi/boot1/boot1.c
499 ↗(On Diff #31601)

Right. I see one loop that matches handles (which will never be the same). Then I see a loop for each partition. I see tis code then almost replicated for fd and cd inline. Then I see it do another probe of all the devices. So is that how we get the current behavior? It's quite confusing and very uncommented what it's doing here.

I'm still unconvinced that it's actually doing the right thing. And it's making it harder for any scheme that walks through the lists looking for the highest value partition to boot off of (which are mods that we have at work).

also, why does it set currdev for the hddev, but not the cddev or fddev cases? That seems wrong to me and deserves at least a comment, but I suspect that the cddev and fddev code isn't actually working and we really need it for all three.

I forwarded ported this, but we're missing functionality that's been added in the past year.
See https://reviews.freebsd.org/D12161 for my forward port.
Functionality added in the year that's not in this forward port (or this current review):
o fallback to any device, that's not present.
o parsing /boot.config
o And efi_main() is being picked up from some place weird (since I moved it since this was posted, not surprising)
I'm sure there's some other things missing

I'll cherry pick some of these things to reduce trivial diffs....

In D10447#252525, @imp wrote:

I forwarded ported this, but we're missing functionality that's been added in the past year.
See https://reviews.freebsd.org/D12161 for my forward port.
Functionality added in the year that's not in this forward port (or this current review):
o fallback to any device, that's not present.

I'm still certain that load_preferred does what you're talking about.

o parsing /boot.config

boot1.c, line 555

o And efi_main() is being picked up from some place weird (since I moved it since this was posted, not surprising)

That was because efi_main.c got moved out of libefi to loader. I moved it back.

eric_metricspace.net added inline comments.
sys/boot/efi/boot1/boot1.c
499 ↗(On Diff #31601)

load_preferred filters for devices that are on the same device handle as the loaded image. It first tries ZFS, then all drives (and all) partitions, then attempts to probe the loaded image handle itself, then falls back to the device path. load_all goes through everything looking for something it can boot.

In any case, I stole the structure of the code from loader, so it's consistent with the behavior of loader.

Overhauled preferred device detection

! In D10447#254852, @eric_metricspace.net wrote:
That was because efi_main.c got moved out of libefi to loader. I moved it back.

Yea, they need to be two separate instances of efi_main because they do different things with the arguments and in the setup in general, especially for the efi boot manager changes (which need to grab the raw arguments which cannot be assumed to be ASCII or UCS2 or anything really. So I think that's too much refactoring. It's not really needed for the rest of the stuff to work.

Fixed issues with preferred device detection. This includes detailed analysis of the code via debug messages to make sure it's doing the right thing. Preferred devices should now be correctly detected.

Also, confirmed working with the loader efi_main.

This should be good to go, pending necessary testing on various setups, of course.

Also, someone please do a check for stray debug printfs. I am notoriously bad at spotting those.

Merged from HEAD, corrected a trivial merge conflict

Finally got time to do QEMU tests on this. Found some issues with setting the DeviceHandle on the loaded image. I fixed it for UFS detection. I also seem to have introduced a regression in ZFS preferred device detection, which I'm working to fix.

Fixed issues with setting image_handle->DeviceHandle incorrectly. Correct behavior confirmed on all my QEMU tests.

I'll take another look at it this week. I have on my plate unifying the efi_main routines that we have in the tree and I have for my uefi boot manager work since they are somewhat similar. Once that's complete, I think the biggest obstacle to getting this into the tree will be behind us since that's the biggest source of conflicts at the moment.

Also, just deployed to my laptop (multi-disk ZFS pool), and obviously it works.

Thanks for rebasing and testing functionality. I know it's taken a while, but these changes are (a) large and (b) conflict with other large work...

In D10447#260523, @imp wrote:

I'll take another look at it this week. I have on my plate unifying the efi_main routines that we have in the tree and I have for my uefi boot manager work since they are somewhat similar. Once that's complete, I think the biggest obstacle to getting this into the tree will be behind us since that's the biggest source of conflicts at the moment.

Can we possibly land this and then follow up with the EFI boot selection stuff? This is holding up GELI, and I'm getting people asking me when it'll land after my vBSDcon talk.

If we land this then we can move forward with GELI in parallel to your stuff.

We can't land this first and then do my stuff. That would break already committed work in boot1's copy of efi_main.c.

This commit is also kinda too big to land all at once still, but I'll pull in as much as I can as I merge the efi_main's together to make the changes more bite-sized and bisectable should there be issues. There's been much grumbling of late about huge commits landing that break things that are impossible to bisect, so I'll have to break this up to ensure I won't be fielding complaints like that.

In D10447#260527, @imp wrote:

We can't land this first and then do my stuff. That would break already committed work in boot1's copy of efi_main.c.

This commit is also kinda too big to land all at once still, but I'll pull in as much as I can as I merge the efi_main's together to make the changes more bite-sized and bisectable should there be issues. There's been much grumbling of late about huge commits landing that break things that are impossible to bisect, so I'll have to break this up to ensure I won't be fielding complaints like that.

The unification of efi_mains is already accomplished in this patch.

I suppose I can try to separate things out again, but at this point I've sunk more work into fixing conflicts than I did into the GELI work itself.

arc patch can't apply this patch.

The diffs don't make sense to me.... The latest tree (and even some fairly old trees) have changes that would require diffs to appear here that don't.... Can you make sure that things here are up to date? I've hit a snag in my efforts and was trying to bring these in when I noticed that.

Checking patch sys/boot/efi/loader/main.c...
Checking patch sys/boot/efi/libefi/efizfs.c...
Checking patch sys/boot/efi/libefi/devpath.c...
Checking patch sys/boot/efi/include/efizfs.h...
Checking patch sys/boot/efi/include/efilib.h...
Applied patch sys/boot/efi/loader/main.c cleanly.
Applied patch sys/boot/efi/libefi/efizfs.c cleanly.
Applied patch sys/boot/efi/libefi/devpath.c cleanly.
Applied patch sys/boot/efi/include/efizfs.h cleanly.
Applied patch sys/boot/efi/include/efilib.h cleanly.
 OKAY  Successfully committed patch.
Checking patch head/sys/boot/efi/loader/loader_efi.h...
error: head/sys/boot/efi/loader/loader_efi.h: does not exist in index
Checking patch head/sys/boot/efi/loader/Makefile...
error: head/sys/boot/efi/loader/Makefile: does not exist in index
Checking patch head/sys/boot/efi/loader/devicename.c...
error: head/sys/boot/efi/loader/devicename.c: does not exist in index
Checking patch head/sys/boot/efi/libefi/devicename.c...
error: head/sys/boot/efi/libefi/devicename.c: does not exist in index
Checking patch head/sys/boot/efi/libefi/Makefile...
error: head/sys/boot/efi/libefi/Makefile: does not exist in index
Checking patch head/sys/boot/efi/include/efilib.h...
error: head/sys/boot/efi/include/efilib.h: does not exist in index

Patch Failed!

The raw diff also looks messed up (it doesn't seem to include the head/... paths that gave arc patch heartburn). These issues may be the source of the mismatch between what I think the path does and what eric has maintained it does.... it's a phab failure of some sort.

Merged from current and updated. No conflicts, so I think the tests are still good.

This now cleanly applies with 'arc patch' for me

sys/boot/efi/boot1/Makefile
40 ↗(On Diff #33747)

excess newline

sys/boot/efi/boot1/Makefile.fat
5 ↗(On Diff #33747)

excess newline here

i've started integrating these changes, and have noticed a few problems...

sys/boot/efi/boot1/Makefile
86–94 ↗(On Diff #33747)

None of these changes are needed.

111 ↗(On Diff #33747)

because this is already here, LIBSTAND is already properly defined, and we don't need to add the .PATH statements.

119 ↗(On Diff #33747)

This is good, however.

145 ↗(On Diff #33747)

I have another change that hoists this up into Makefile.inc

sys/boot/efi/boot1/Makefile.fat
5 ↗(On Diff #33747)

In the integration I'm doing, I'm just dropping this change.

sys/boot/efi/boot1/boot1.c
461–465 ↗(On Diff #33747)

This shouldn't be deleted.

466–477 ↗(On Diff #33747)

This shouldn't be deleted either.

Do you want me to fix these, or do you want me to just sit tight?

eric_metricspace.net marked 8 inline comments as done.

Addressed review comments

Addressed review comments

Thanks for the updated. I'll pull this into the work I've done (I did that with a previous revision as well)

I'll look closely to see if there's anything else that's a show stopper. My quick spot check just shows niggles that can be handled after the commit and/or cleaned up prior to the commit. I'll get this in front of the boot1/loader changes I'm planning for efi boot manager. Though I'm wondering more and more why we even have boot1.... But that question isn't quite ripe to explore.

Note: this needs to get a test before it's merged, because I did modify the code. But a basic smoke-test ought to do it.

I wonder why we even have boot1 anymore after these changes.... It's turning into a mini loader.efi. I wonder why we don't just kill it entirely. We don't need boot volume selection (that's supposed to be the EFI boot BIOS's job). We seem to duplicate a lot of code just to load /boot/loader.efi off of UFS or ZFS partitions. However, we could just use loader.efi directly (though some of the root detection code would need to be tweaked). boot1 used to be small, limited and basically unchanging (so updates weren't needed). Now that boot1.efi has hitched its wagon to loader.efi, I wonder why have an extra boot stage that seems to be good at messing up it's partition guessing...

I'm going to go ahead and commit this (after adding back the bits I keep saying were bogusly deleted), but I'm thinking strongly that we consider just moving to loader.efi installed into \efi\freebsd\loader.efi starting in FreeBSD 12 since we'll be able to point efibootmgr at it by then.

In D10447#262729, @imp wrote:

I wonder why we even have boot1 anymore after these changes.... It's turning into a mini loader.efi. I wonder why we don't just kill it entirely. We don't need boot volume selection (that's supposed to be the EFI boot BIOS's job). We seem to duplicate a lot of code just to load /boot/loader.efi off of UFS or ZFS partitions. However, we could just use loader.efi directly (though some of the root detection code would need to be tweaked). boot1 used to be small, limited and basically unchanging (so updates weren't needed). Now that boot1.efi has hitched its wagon to loader.efi, I wonder why have an extra boot stage that seems to be good at messing up it's partition guessing...

I'm going to go ahead and commit this (after adding back the bits I keep saying were bogusly deleted), but I'm thinking strongly that we consider just moving to loader.efi installed into \efi\freebsd\loader.efi starting in FreeBSD 12 since we'll be able to point efibootmgr at it by then.

I have had the same thought since ... long time;) the only reason given has been that it is easier to have signed boot1. I still think it will be reasonable to remove it and use just loader binary.

That's how the EFI boot stuff originally functioned. At some point, boot1.efi got added, but in the very beginning, you just installed loader.efi to the ESP.

That's how the EFI boot stuff originally functioned. At some point, boot1.efi got added, but in the very beginning, you just installed loader.efi to the ESP.

My point is that boot1 was originally a tiny thing we could toss on a partition and get EFI booting. Not very well supported, but hey it did it's best and it didn't change enough to bother with.

Now, it's gotten to the point where it's got everything except scripting that /boot/loader.efi has... why continue down that path... let's just use /boot/loader :)

I'm thinking for us, the best transition point would be when we start having efibootmgr... which I'm doing code reviews on right now...

Addressed review comments

Thanks for the updated. I'll pull this into the work I've done (I did that with a previous revision as well)

I'll look closely to see if there's anything else that's a show stopper. My quick spot check just shows niggles that can be handled after the commit and/or cleaned up prior to the commit. I'll get this in front of the boot1/loader changes I'm planning for efi boot manager. Though I'm wondering more and more why we even have boot1.... But that question isn't quite ripe to explore.

[Edit: bogus comment about FICL being there removed]

In D10447#262876, @imp wrote:

Addressed review comments

Thanks for the updated. I'll pull this into the work I've done (I did that with a previous revision as well)

I'll look closely to see if there's anything else that's a show stopper. My quick spot check just shows niggles that can be handled after the commit and/or cleaned up prior to the commit. I'll get this in front of the boot1/loader changes I'm planning for efi boot manager. Though I'm wondering more and more why we even have boot1.... But that question isn't quite ripe to explore.

  • Looking at the symbols defined / referenced in boot1 there's the full ficl/forth interpreter as well... (I have this code rebased, with some of the removals reinserted).. will need to check further... -

[edited: this was due to the refactor I've done, fixed]

I'll give it one last round of testing over lunch.

In D10447#262876, @imp wrote:

Addressed review comments

Thanks for the updated. I'll pull this into the work I've done (I did that with a previous revision as well)

I'll look closely to see if there's anything else that's a show stopper. My quick spot check just shows niggles that can be handled after the commit and/or cleaned up prior to the commit. I'll get this in front of the boot1/loader changes I'm planning for efi boot manager. Though I'm wondering more and more why we even have boot1.... But that question isn't quite ripe to explore.

  • Looking at the symbols defined / referenced in boot1 there's the full ficl/forth interpreter as well... (I have this code rebased, with some of the removals reinserted).. will need to check further... -

[edited: this was due to the refactor I've done, fixed]

I'll give it one last round of testing over lunch.

Not sure how best to share my rebased version... ah, let's do a bogus review...

https://reviews.freebsd.org/D12659

This revision was automatically updated to reflect the committed changes.