loader: remove open_disk cache
ClosedPublic

Authored by tsoome on Feb 22 2017, 10:41 PM.

Details

Summary

The recent rework on implementing support to read disk/partition size from
partition table and implementing more strict validation for reads in biosdisk.c
did reveal the problem with open_disk cache. To save the memory and based on
the fact that with ufs there was only one file system reader active at the time,
the open_disk data was cached and shared. The cache implementation is still
there and active, however, it really does not do any good in case of zfs,
because the disk devices are always open with pools.

Moreover, sharing the open disk info in case of multiple file systems on the
same disk can render the pool unreadable because the last partition size will
be stored in open_disk structure and since this structure is shared, the pool
IO will be limited to the last partition size.

The fix is relatively simple and straight forward - we just remove the cache
part, ensuring every opened disk device will get its own private instance of
open disk info and the structure data will not be disturbed.

Test Plan

To replicate the issue, it will help to construct a bit extreme layout
of the partitioning. What I did was following:

  1. Allocate most of the disk from start for zpool.
  2. Allocate relatively small partition after the pool.

Populate the /boot directory to the pool and verify the loader will start.

Now copy copy some data to the pool, to trigger the rootbp updates, the
dataset create is needed as well, so we could hope we get the rootbp
pointing far enough.

Confirm the boot loader will fail to read the pool, resulting with
MOS read errors.

The better proof can be achieved by printing out the disk/partition
size we get from disk_ioctl in biosdisk.c - so it can be verified that
indeed, in case of the shared open disk info, the wrong size is used,
namely, the size of the last partition of the particular disk.

Since the fix is removing the shared info, every open device will get
its private instance and the problem is gone.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
tsoome retitled this revision from to loader: remove open_disk cache.Feb 22 2017, 10:41 PM
tsoome updated this object.
tsoome edited the test plan for this revision. (Show Details)
tsoome added reviewers: allanjude, imp.
smh accepted this revision.Feb 23 2017, 9:13 AM

LGTM

This revision is now accepted and ready to land.Feb 23 2017, 9:13 AM
tsoome updated this revision to Diff 26288.Mar 15 2017, 8:37 PM

ptable_print needs to open the nested partition with disk_open,
as the state is not shared any more.

This revision now requires review to proceed.Mar 15 2017, 8:37 PM
allanjude accepted this revision.Mar 15 2017, 8:40 PM

Approved By: allanjude

This revision is now accepted and ready to land.Mar 15 2017, 8:40 PM
smh added inline comments.Mar 16 2017, 10:27 AM
sys/boot/mips/beri/loader/beri_disk_sdcard.c
60 ↗(On Diff #26288)

Is this alignment issue a reviews problem or actually in the code?

tsoome updated this revision to Diff 26311.Mar 16 2017, 11:49 AM

whitespace cleanup

This revision now requires review to proceed.Mar 16 2017, 11:49 AM
tsoome marked an inline comment as done.Mar 16 2017, 11:50 AM
tsoome added inline comments.
sys/boot/mips/beri/loader/beri_disk_sdcard.c
60 ↗(On Diff #26288)

After careful check - it was space issue, thanks.

This revision was automatically updated to reflect the committed changes.
tsoome marked an inline comment as done.