Changeset View
Standalone View
sys/geom/mirror/g_mirror_ctl.c
Show First 20 Lines • Show All 976 Lines • ▼ Show 20 Lines | for (i = 0; i < (u_int)*nargs; i++) { | ||||
if (g_mirror_ndisks(sc, -1) == sc->sc_ndisks) { | if (g_mirror_ndisks(sc, -1) == sc->sc_ndisks) { | ||||
sx_xunlock(&sc->sc_lock); | sx_xunlock(&sc->sc_lock); | ||||
G_MIRROR_DEBUG(1, | G_MIRROR_DEBUG(1, | ||||
"All disks connected in %s, skipping.", | "All disks connected in %s, skipping.", | ||||
sc->sc_name); | sc->sc_name); | ||||
continue; | continue; | ||||
} | } | ||||
sc->sc_ndisks = g_mirror_ndisks(sc, -1); | sc->sc_ndisks = g_mirror_ndisks(sc, -1); | ||||
if (sc->sc_state == G_MIRROR_DEVICE_STATE_RUNNING && | |||||
g_mirror_ndisks(sc, G_MIRROR_DISK_STATE_ACTIVE) > 0) { | |||||
/* Bump genid after forgetting disconnected disks */ | |||||
sc->sc_bump_id |= G_MIRROR_BUMP_GENID; | |||||
g_mirror_event_send(sc, 0, | |||||
G_MIRROR_EVENT_WAIT | G_MIRROR_EVENT_DEVICE); | |||||
} else { | |||||
LIST_FOREACH(disk, &sc->sc_disks, d_next) { | LIST_FOREACH(disk, &sc->sc_disks, d_next) { | ||||
g_mirror_update_metadata(disk); | g_mirror_update_metadata(disk); | ||||
} | |||||
markj: What's the point of this part of the change? If g_mirror_ndisks(sc, -1) != sc->sc_ndisks, then… | |||||
Done Inline ActionsI don't know the exact motivation for this change. Cindy? g_mirror_ndisks(sc, -1) != sc->sc_ndisks can sometimes occur at runtime, when a disk disappears (STATE_DISCONNECTED) or is manually removed (g_mirror_ctl_remove). But it isn't clear to me if that can race with this function or not. If it is an invariant, maybe we should MPASS(sc->sc_state == G_MIRROR_DEVICE_STATE_STARTING);. cem: I don't know the exact motivation for this change. Cindy?
`g_mirror_ndisks(sc, -1) != sc… | |||||
Not Done Inline ActionsArgh ... my previous reply was misaligned. Here you go:
cindy.yang_isilon.com: Argh ... my previous reply was misaligned. Here you go:
> I added this bit b/c during testing… | |||||
Not Done Inline ActionsSorry for being picky, but gmirror has historically been pretty tricky to reason about, so I really want to make sure I understand the intent of each change. Can you elaborate on what "failed to add" means? In general, any time a mirror is in a degraded state, the remaining components should have a larger genid or syncid than the missing components. If a component is kicked out while the mirror is running, we should bump the genid. If the mirror is started with one or more components missing, we bump the syncid (upon the first write to the mirror). Unrelated to that, the condition in the if statement is kind of weird. How can we have a running mirror with 0 active disks? Why do we not just always bump the genid?
Currently it does that only if the mirror is degraded, so "force" isn't really true.
The "insert" verb also updates sc->ndisks but doesn't update genid. I agree that there is no harm in updating the genid here, but I don't see the point of it given that this change also adds code to kick out mirrors with inconsistent md_all fields. markj: Sorry for being picky, but gmirror has historically been pretty tricky to reason about, so I… | |||||
Not Done Inline ActionsNo problem. Thanks for reviewing this patch. The "failed to add" scenario actually came out of a field issue. On a factory fresh node, during early boot a couple of SEDs dropped out while gmirror tried to replicate/sync the root device from the primary copy onto the secondary copy. It happened twice in a row and finally the 3rd attempt succeeded and the root mirror transitioned to RUNNING. The data I saw was that one of the disconnected components had the same genid as the that of the valid ones. I think you're correct w.r.t. using the highest syncid as a tie breaker to pick out the correct component. All we are interested is that genid gets incremented when the mirror is degraded. As our userspace daemon runs the rebalance routine as often as it can, forget() is called to proactively update genid as needed. This is a nice trick for gmirror to outright reject old component on the next boot based on genid comparison. I think the g_mirror_ndisks(sc, G_MIRROR_DISK_STATE_ACTIVE) > 0 is me being paranoid and trying to avoid hitting the assert in g_mirror_bump_genid(). cindy.yang_isilon.com: No problem. Thanks for reviewing this patch.
The "failed to add" scenario actually came out of… | |||||
Done Inline ActionsShould I drop this section, at least for the initial upstream revision? cem: Should I drop this section, at least for the initial upstream revision? | |||||
Done Inline ActionsSounds good to me. cindy.yang_isilon.com: Sounds good to me. | |||||
Done Inline ActionsThanks, I'd also prefer to drop it for this revision, as it's logically somewhat separate. markj: Thanks, I'd also prefer to drop it for this revision, as it's logically somewhat separate. | |||||
} | } | ||||
sx_xunlock(&sc->sc_lock); | sx_xunlock(&sc->sc_lock); | ||||
} | } | ||||
} | } | ||||
static void | static void | ||||
g_mirror_ctl_stop(struct gctl_req *req, struct g_class *mp, int wipe) | g_mirror_ctl_stop(struct gctl_req *req, struct g_class *mp, int wipe) | ||||
{ | { | ||||
▲ Show 20 Lines • Show All 96 Lines • Show Last 20 Lines |
What's the point of this part of the change? If g_mirror_ndisks(sc, -1) != sc->sc_ndisks, then we started with a missing disk, in which case we must have bumped the syncid on the remaining disks, so we will not use the lost disk's metadata if it reappears.