Page MenuHomeFreeBSD

Fix vdev_geom_attach_by_guids for partitioned disks
ClosedPublic

Authored by asomers on Apr 11 2017, 11:27 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 22, 7:45 PM
Unknown Object (File)
Oct 17 2024, 7:43 AM
Unknown Object (File)
Oct 17 2024, 7:43 AM
Unknown Object (File)
Oct 17 2024, 7:42 AM
Unknown Object (File)
Oct 17 2024, 7:22 AM
Unknown Object (File)
Sep 30 2024, 9:24 PM
Unknown Object (File)
Sep 30 2024, 3:57 AM
Unknown Object (File)
Sep 29 2024, 2:09 AM
Subscribers

Details

Summary

Fix vdev_geom_attach_by_guids for partitioned disks

When opening a vdev whose path is unknown, vdev_geom must find a geom
provider with a label whose guids match the desired vdev. However, due to
partitioning, it is possible that two non-synonomous providers will share
two labels. For example, if the first partition starts at the beginning of
the drive, then ada0 and ada0p1 will share the first label. More troubling,
if the last partition runs to the end of the drive, then ada0p3 and ada0
will share the last label. If vdev_geom opens ada0 when it should've opened
ada0p3, then the pool won't be readable. If it opens ada0 when it should've
opened ada0p1, then it will corrupt some other partition when it writes the
3rd and 4th labels.

Fix this situation by modifying the search algorithm to prefer geom
providers that have all four labels intact. If no such provider exists,
then open whichever provider has the most.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I have no serious objections, this patch should probably work, but can not say I like it for two reasons:

  • I don't like idea of counting number of labels. It may work, but I'd prefer some other mechanism to identify that we are detecting labels in wrong offset.
  • vdev_geom_attach_by_guids() code is already broken from the point of GEOM locking -- it drops topology lock while iterating over the list of providers. Your patch additionally reduces race window by postponing the vdev_geom_attach() call.
In D10365#214878, @mav wrote:

I have no serious objections, this patch should probably work, but can not say I like it for two reasons:

  • I don't like idea of counting number of labels. It may work, but I'd prefer some other mechanism to identify that we are detecting labels in wrong offset.

I tried a few other approaches before I settled on this one. First I based it on the label's asize, reasoning that a vdev opened as the whole disk when it should've been just the partition would have an asize that doesn't match the provider. However, that approach doesn't work for two reasons. First, the label's asize actually refers to the top-level vdev's size, not this particular device's size. Thus, in a heterogeneous pool, some vdev's labels' asizes won't match their physical size. Secondly, this would cause problems when a provider's size changes, as in the case of expanding an iSCSI LUN. My second approach was to use the label's path field and specifically ensure that any partition identifier matches the geom provider. But I didn't like that idea, because it would require ZFS to grow knowledge of partitioning, and of multiple partition schemes. It also wouldn't work for VMs that use one of the host's physical partitions for their pool. In that case, the device would have a label that could be imported from either the VM or the host, but not both, depending on where the pool was created. My third approach was to add a label_offset field to the label, which would record the offset of that label from the beginning of the device. That should always work, even in the case of LUN expansion. However, it would require adding a field to the on-disk format of the label. I didn't want to do that, so I settled on the approach here instead.

  • vdev_geom_attach_by_guids() code is already broken from the point of GEOM locking -- it drops topology lock while iterating over the list of providers. Your patch additionally reduces race window by postponing the vdev_geom_attach() call.

Yeah, that's a problem. Do you have any ideas about how to fix it?

In D10365#214878, @mav wrote:
  • vdev_geom_attach_by_guids() code is already broken from the point of GEOM locking -- it drops topology lock while iterating over the list of providers. Your patch additionally reduces race window by postponing the vdev_geom_attach() call.

Yeah, that's a problem. Do you have any ideas about how to fix it?

I had no good idea for complete fix from when been there. This code works different from what other GEOM classes do, that is why it does not fit. What's about keeping the present logic, I am not sure it will work either, because after you open exclusively ada0, you may be unable to do the same for ada0p1.

In D10365#214910, @mav wrote:
In D10365#214878, @mav wrote:
  • vdev_geom_attach_by_guids() code is already broken from the point of GEOM locking -- it drops topology lock while iterating over the list of providers. Your patch additionally reduces race window by postponing the vdev_geom_attach() call.

Yeah, that's a problem. Do you have any ideas about how to fix it?

I had no good idea for complete fix from when been there. This code works different from what other GEOM classes do, that is why it does not fit. What's about keeping the present logic, I am not sure it will work either, because after you open exclusively ada0, you may be unable to do the same for ada0p1.

ZFS isn't completely different from how gmirror or graid3 work. It might be possible to copy their tasting strategy. Certainly beyond the scope of this change though. Are you comfortable with this change as is?

ZFS isn't completely different from how gmirror or graid3 work. It might be possible to copy their tasting strategy.

Most of GEOM classes do tasting from single geom even thread. It allows legally drop the lock. vdev_geom does it from arbitrary thread.

Certainly beyond the scope of this change though. Are you comfortable with this change as is?

I'll survive it.

This revision was automatically updated to reflect the committed changes.