Remove global uint64_t pool_guid and instead iterate over all pools that efizfs_get_zfsinfo_list() provides. Make insert_zfs() sort probed pools so that pool(s) probed at the boot device are at the head of the list. Try to boot from pool(s)s that do not match the boot device criteria, unless a a strict boot_policy is configured. This fixes two scenarios. First scenario is that if the very first device to probe is a spare member of a pool, it will be used to instantiate a pool but (pd->pd_handle == boot_img->DeviceHandle) won't be true, thus global pool_guid won't be populated and ZFS boot won't be tried. Second scenario is that potentially we may have several pools, and the bootable one was not probed first.
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped - Build Status
Buildable 70421 Build 67304: arc lint + arc unit
Event Timeline
So I agree in part, and disagree in part for this.
I've always thought the global pool_guid was bad, but never was motivated to fix it.
I disagree that this is always desirable behavior. There are definitely times we want to only try the drive we booted off of. The classic example is when we're booting the installer: we don't want to randomly pick something else because of device ordering in UEFI (so boot 15.0 to do an upgrade or repair and but it boots off the zpool you are trying to repair). There's other times you want to fail for sure if we can't boot off this device.
However, I also agree there's times we want to search. When you know there's only one bootable system, you want to boot it.
So I think we need a policy knob that can be set in the environment (or the fallback env we read from the ESP) that dictates booting policy. We'd set it to 'boot-device-only' (to pick a name) when we're in an installer or 'loader matched to the root' setup. And we'd set it to something like 'prefer-boot' or 'any-bootable-zpool' or 'any-ufs' or 'any' (not sure the set of valid values here, but you get the idea). This would let both sets of users get what they want, and have more of the use cases handled by default (or with trivial re-tooling).
I generally like the structure here. It would be easy to implement the knob I want in main.c, I think, so I'll take a stab at doing that. But I want to think through all the weird cases that I know people have chatted with me about over the years before I make a more concrete proposal. There's time you want to only do the boot device, times you want to prefer it (which I think should be the default) and times you want to do them in any order.
Do we also want to augment probe_zfs_currdev. The current 'sanity_check()' function for it is too weak. I think for ZFS we want to see if there's a BE before we try it / select it.
tl;dr: Almost the right fix. We could likely get an acceptably close fix if we just tried the zi that matched the bootdev first, then went looking. Especially if we exclude all zpools that don't have a boot environment. I'm not sure how the sanity check actually works now, to be honest, with BEs, so I'd like to dive into the code a bit more and understand that better. That likely can be done with a zi_boot_dev bool in zfsinfo_t, and setting it when we insert it into the list.
So I'll look at this first thing in the morning... It's late here now...
Echoing Warner, I'd like to notice that we've had problems with TrueNAS Core when it mounted ZFS root from a wrong pool called boot-pool on systems that happen to have two. IIRC root mounting code does not (or did not) verify the pool guid, so what we saw many times is that we got kernel from one (proper) pool, but root form another, causing some nasty problems. While it is a different problem, I worry that this change may extend it to loader also.
So I can think of really only two policies (the first two) that would be useful, and a bunch that might not be:
loader.boot-policy:
"bootdev" Only boot off the bootdev (unpatched)
"bootdev-preferred" Try the bootdev first, otherwise any will do
"any" Boot off any bootable device you find (doesnt't try bootdev first, so like bootdev-preferred, but w/o any determinism due to bios ordering)
"zfs+<above>" Boot anything you find that's zfs. Could also do ufs. Not sure what this usefully adds over the first two though
Or we could call it 'strict' and 'relaxed' to keep it simpler. The installer would need to set 'strict' I'd think, via the ESP \efi\freebsd\loader.env. I think the rest could go with 'relaxed'.
I usually don't sweat names too much, but here it's more of a 'default root search' policy, so maybe a better name is needed....
What about ordering the zfsinfo? If (pd->pd_handle == boot_img->DeviceHandle) add entry to head, otherwise add it to the tail.
https://reviews.freebsd.org/D55107 might be a reasonable first stab at modifying this.
That would work as well... I worked up a settable policy, but that would be an improvement over this. Take a look at the straw-man code in the review I just posted to see which path might be better... If you insert it first, then we'd still need the on_bootdev flag, but we'd just break when we found the first one that's false in strict mode.
If you do the head insert, I'll fix my followup.
OK. I've added the following to this patch to implement the 'insert at head for bootdev devices'
diff --git a/stand/efi/libefi/efizfs.c b/stand/efi/libefi/efizfs.c index 563cf841143a..20c6354cc58b 100644 --- a/stand/efi/libefi/efizfs.c +++ b/stand/efi/libefi/efizfs.c @@ -76,7 +76,7 @@ efizfs_get_guid_by_handle(EFI_HANDLE handle, uint64_t *guid) } static void -insert_zfs(EFI_HANDLE handle, uint64_t guid) +insert_zfs(EFI_HANDLE handle, uint64_t guid, bool head) { zfsinfo_t *zi; @@ -84,7 +84,10 @@ insert_zfs(EFI_HANDLE handle, uint64_t guid) if (zi != NULL) { zi->zi_handle = handle; zi->zi_pool_guid = guid; - STAILQ_INSERT_TAIL(&zfsinfo, zi, zi_link); + if (head) + STAILQ_INSERT_HEAD(&zfsinfo, zi, zi_link); + else + STAILQ_INSERT_TAIL(&zfsinfo, zi, zi_link); } } @@ -110,7 +113,8 @@ efi_zfs_probe(void) snprintf(devname, sizeof(devname), "%s%dp%d:", efipart_hddev.dv_name, hd->pd_unit, pd->pd_unit); if (zfs_probe_dev(devname, &guid, false) == 0) - insert_zfs(pd->pd_handle, guid); + insert_zfs(pd->pd_handle, guid, + zi->zi_handle == boot_img->DeviceHandle); } } }