Make root mount wait mechanism smarter, by making it wait only if it's actually
neccessary to find the root device.
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 581 Build 581: arc lint + arc unit
Event Timeline
sys/kern/vfs_mountroot.c | ||
---|---|---|
687–688 | This looks like it's more for debugging than for an end user. Maybe put this under boot verbose? The reason is that it's not an error until it's an error, meaning that if we're going to wait for the device when it's not there yet, then printing errors is confusing. Putting it under boot verbose is a nice way for people to get insight into why a boot may be stalled... |
sys/kern/vfs_mountroot.c | ||
---|---|---|
687–688 | Actually, it was a debugging printf that snuck in. Removed. |
sys/kern/vfs_mountroot.c | ||
---|---|---|
945 | Please come with much shorter name for the function. | |
965 | I think this is not needed, and probably even dangerous, for the nfs case. I have no better suggestion than ugly strstr("nfs") hack, but it is still much better than preventing nfs mount due to the issues with the local storage subsystem. | |
985 | This blank line is not needed. |
sys/kern/vfs_mountroot.c | ||
---|---|---|
965 | For the non-NFS case it's definitely needed, because without it, partitions don't get created before we check for the device presence, and we end up doing the full wait in all cases, defying the purpose of this patch. For the NFS case, this code is not reached - just like for ZFS, we call vfs_mountroot_wait() and return few lines earlier. |
Alexander, just to be extra safe, do you perhaps see a scenario - like with graid - where this could break stuff?
Is this code used for all boot mounting, or only for root FS? If only root, then I worry about case when root is single disk, available immediately, while there is some other FS on graid, which incompleteness forces delay for 30 seconds before device will be registered. Won't other FS mounting fail in this case?
That's a good question. Yeah, I think things might break in this case. Do you have any idea how to fix it?
One idea would be to put this vfs_mountroot_wait_if_neccessary() into vfs_donmount(). It should work, but it's ugly. Another would be to add a "root mount hold counter" and export it via sysctl, and then in the mountcritlocal script do the "mount -a", as it's done now, and if it fails, wait for the sysctl to go to zero and retry.
I know very little about all this mounting code, so hardly I am an authority here. But I think that "root mount hold" should be waited in any case, the only question is how late can we postpone it. Some applications may access raw devices without mounting, and for them it would be better if device detection process was really completed before their start. I think it should be safe to proceed with root (or other critical FS) mounting if we have required devices, but at some point before starting random daemons we still may prefer to stop and wait for the rest of holds.
Instead of hold counter it could be better to export list of holds, as it is printed now on console. It is not always informative, but at least something.
I would not rework the patch. While mav@'s comments are good, they go beyond the simple problem that the hold solved. The hold only solved the holding off of mounting root while interrupt-driven discovery was still ongoing. You patch, trasz@, avoids that we wait mounting root when there was no point waiting, as the device was already there. That is a good change.
User space could never assume that devices are there when root gets mounted, though I'm sure this assumption has snuck into our own code that gets run from rc.d. The user space problem isn't solved by making lists of devices that can hold the root mount -- that's pretty much any block device. Such a list does not solve the fundamental problem that's hot plug.
GEOM in particular is very nasty to work with when it comes to creating partitions and suddenly having all kinds of gems appear due to pre-existing data.
What I'm thinking off is an interface that allows use space to wait for the kernel to become silent with respect to block device discovery and/or GEOM tasting. This interface can be used in the various mount scripts under /etc/rc.d as barriers.
Anyway: I don't think reworking the patch is a good idea, irrespective what else we may want to do.
Marcel, I agree a better interface is needed, eventually. However, this change without the mountcritlocal part can break people's setups, and that's something I'd like to avoid.
That said, I think it would be better to commit it in two parts, first the already approved fix, and then the new sysctl and rc scripts.