Page MenuHomeFreeBSD

libsa/zfs: refactor vdev tree for better resiliency against stale disks
ClosedPublic

Authored by glebius on Aug 15 2025, 5:42 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Oct 11, 8:07 AM
Unknown Object (File)
Sat, Oct 11, 8:07 AM
Unknown Object (File)
Sat, Oct 11, 8:07 AM
Unknown Object (File)
Sat, Oct 11, 8:06 AM
Unknown Object (File)
Sat, Oct 11, 12:45 AM
Unknown Object (File)
Fri, Oct 10, 7:16 AM
Unknown Object (File)
Thu, Oct 9, 7:01 PM
Unknown Object (File)
Thu, Oct 9, 3:52 PM
Subscribers

Details

Summary

Before this change in vdev_insert() we would avoid inserting a duplicate
vdev to the list of children, however this duplicate being unlinked from
the parent is still stored on the global list with initialized v_guid.
Such leaked duplicate can later be returned by vdev_find(). After
6dd0803ffd31 such leaked vdev may be freed or pointing to a freed parent,
which leads to a loader crash. Note that the leak problem was there
before 6dd0803ffd31.

First, in vdev_insert() free conflicting vdev and return the existing one.
Update callers accordingly. There is only one caller that actually may
encounter this condition.

Second, eliminate global list of vdevs and make vdev_find() to work
recursively on the tree that a caller must provide. Of course, a chance
of GUID collision between members of different pools is extremely low. The
main motivation here is just to increase code robustness and fully isolate
the data structures of different pools being tasted by the loader, and
make easier debugging of bugs like the one being fixed.

Fixes: 6dd0803ffd31c60a84488d06928813353c6303d3

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Seems OK. Just one nit:

stand/libsa/zfs/zfsimpl.c
841

Do we really need SAFE here? Recursed vdev_find() won't modify anything, while from third-party modification this may be insufficient.

This revision is now accepted and ready to land.Aug 18 2025, 4:34 PM
stand/libsa/zfs/zfsimpl.c
841

Of course we need! It is not for the recursive call, but for the assignment to the main cycle variable inside the cycle.

stand/libsa/zfs/zfsimpl.c
841

Ah. I see. Or we could just use a different variable there.

Pass list, not vdev, to vdev_find(). No functional change against
previous version. To be used in later revisions.

This revision now requires review to proceed.Aug 19 2025, 7:45 PM
mav added inline comments.
stand/libsa/zfs/zfsimpl.c
841

I would still prefer separate variable below instead of _SAFE here.

This revision is now accepted and ready to land.Aug 19 2025, 11:32 PM