
zfsboot: drvsize() may be unusable on some systems
ClosedPublic

Authored by tsoome on May 3 2017, 8:08 PM.

Details

Summary

A user reported the following errors:
error 1
error 1
gptzfsboot: error 1 lba 4294967288
gptzfsboot: error 1 lba 1
gptzfsboot: no ZFS pools located, can't boot

The first two errors above come from issuing INT13 with EAX=0x4800 (AH=48h),
meaning we need to check whether EDD is available and use EAX=0x800 (AH=08h) if it is not.

As a workaround I'm using a similar idea to the one in biosdisk.c: first probe
with ah=8h, then check if we have EDD.
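
The probe order described above can be sketched roughly as follows. This is a minimal model and not the committed code: `struct fake_bios`, `chs_sectors()` and `probe_sectors()` are hypothetical names, and the struct stands in for the register results a boot block would get back from real INT 13h calls (AH=08h legacy geometry, AH=41h EDD installation check, AH=48h extended drive parameters).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of what the BIOS reports for one drive. In a real
 * boot block these values would come from INT 13h register results.
 */
struct fake_bios {
	bool     ah08_ok;       /* AH=08h succeeded (carry clear) */
	uint16_t ah08_cx;       /* CX as returned by AH=08h */
	uint16_t ah08_dx;       /* DX as returned by AH=08h */
	bool     edd_present;   /* AH=41h installation check passed */
	bool     ah48_ok;       /* AH=48h succeeded */
	uint64_t ah48_sectors;  /* total sectors reported by AH=48h */
};

/* Decode the legacy AH=08h geometry into a sector count. */
static uint64_t
chs_sectors(uint16_t cx, uint16_t dx)
{
	/* CH holds cylinder bits 0-7, CL bits 6-7 hold cylinder bits 8-9. */
	uint32_t cyl = (((cx & 0x00c0u) << 2) | (cx >> 8)) + 1;
	/* DH is the maximum head number, so add one for the head count. */
	uint32_t hds = (dx >> 8) + 1;
	/* CL bits 0-5 hold the maximum sector number (sectors start at 1). */
	uint32_t sec = cx & 0x3f;

	return ((uint64_t)cyl * hds * sec);
}

/* Returns the best known size in sectors, or 0 if even AH=08h failed. */
static uint64_t
probe_sectors(const struct fake_bios *b)
{
	uint64_t size;

	if (!b->ah08_ok)
		return (0);	/* caller has to pick a fallback size */
	size = chs_sectors(b->ah08_cx, b->ah08_dx);

	if (!b->edd_present)
		return (size);	/* never issue AH=48h without EDD */

	if (b->ah48_ok && b->ah48_sectors > size)
		size = b->ah48_sectors;
	return (size);
}
```

With the largest legacy geometry (CX=0xffff, DX=0xfe00) and no EDD, this model reports 1024 * 255 * 63 sectors. With EDD present, the larger AH=48h value wins.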

Note that we would like to see the correct disk size info, but we *may*
get away with anything >64MB, so that on a whole-disk setup we could at least
test 2 zfs pool labels without freaking out the INT13 interface.
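
To illustrate why an inexact size can still be good enough, here is a simplified sketch of where the four ZFS vdev labels sit on a device. It assumes the usual layout of 256 KiB labels, two at the front and two at the end. `label_offset()` and `FALLBACK_SIZE` are hypothetical names for this example, not identifiers from zfsboot.c. The offsets of the two front labels do not depend on the device size at all, so a guessed size only affects the two trailing labels.

```c
#include <stdint.h>

#define LABEL_SIZE    (256u * 1024u)          /* one vdev label is 256 KiB */
#define LABEL_COUNT   4u                      /* two in front, two at the end */
#define FALLBACK_SIZE (64ull * 1024 * 1024)   /* assumed size if probing fails */

/*
 * Byte offset of label l on a device of psize bytes. Simplified: psize is
 * assumed to already be a multiple of LABEL_SIZE.
 */
static uint64_t
label_offset(uint64_t psize, unsigned l)
{
	uint64_t off = (uint64_t)l * LABEL_SIZE;

	/* Labels 2 and 3 live in the last 512 KiB of the device. */
	if (l >= LABEL_COUNT / 2)
		off += psize - (uint64_t)LABEL_COUNT * LABEL_SIZE;
	return (off);
}
```

Labels 0 and 1 land at byte offsets 0 and 262144 whether the device is reported as 64 MB or 1 TB. Only labels 2 and 3 move with the reported size.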

If we get away with initial disk probing, then we have partition sizes from
the partition table and we should be able to complete the disk probing.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

tsoome created this revision. May 3 2017, 8:08 PM
julian accepted this revision. May 4 2017, 3:32 AM
julian added a subscriber: julian.

Don't see issues in this code, but see the other comment.

sys/boot/i386/zfsboot/zfsboot.c
475 ↗(On Diff #27992)

In the very oldest code (27 years ago) we looked at the table to decide a default size, and if we couldn't do any better, we just went ahead and used that. I also once had test code that didn't trust it and instead did a binary-type search.

I also saw one version of BIOS (back then) that didn't clear the error status but only updated it if there WAS an error. It was up to the caller to make sure you went in with the error set to zero, which I remember lost me a day or so of time (but I don't remember the details).
The result was that after an error, every following command appeared to get an error. Your output reminds me of that.

This revision is now accepted and ready to land. May 4 2017, 3:32 AM
tsoome added a comment. May 4 2017, 5:15 AM

This fix did remove the first 2 errors, but the remaining 2 are still there. However, they appear to be related to something else: it looks like we are wrapping around 0 there, as the reported lba 4294967288 is 0x00000000fffffff8. I think the remaining ones have to be dealt with in separate issues/commits anyhow.
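
The wrap can be reproduced with plain unsigned arithmetic. The sketch below assumes the LBA was computed by subtracting 8 sectors from a 32-bit size that was 0. That particular combination is a guess made for illustration (the comment above only establishes that the value is 0xfffffff8), and `lba_from_end()` and `lba_from_end_checked()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* LBA that lies `back` sectors before the end of the disk, unchecked. */
static uint32_t
lba_from_end(uint32_t total_sectors, uint32_t back)
{
	/* Unsigned subtraction wraps modulo 2^32 instead of going negative. */
	return (total_sectors - back);
}

/* Same computation, but refusing to wrap; returns false if it would. */
static bool
lba_from_end_checked(uint32_t total_sectors, uint32_t back, uint32_t *lba)
{
	if (back > total_sectors)
		return (false);
	*lba = total_sectors - back;
	return (true);
}
```

Under that assumption, `lba_from_end(0, 8)` yields 4294967288 (0xfffffff8), the value in the error message, while the checked variant reports the problem instead of handing a bogus LBA to the BIOS.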

This revision was automatically updated to reflect the committed changes.