Initialize metaslab range trees in metaslab_init
Motivation
We've noticed several zloop crashes within Delphix generated
due to the following sequence of events:
- A device gets expanded and new metaslabas are allocated for it. These metaslabs go through metaslab_init() but haven't gone through metaslab_sync_done() yet. This meas that the only range tree that's actually set is the ms_allocatable. All the others are NULL.
- A vdev_initialization is issues and vdev_initialize_thread starts processing one of these new metaslabs of the expanded vdev.
- As part of vdev_initialize_calculate_progress() we call into metaslab_load() and metaslab_load_impl() which in turn tries to dereference the metaslabs trees that are still NULL and therefore we crash.
The same failure can come up from the vdev_trim code paths.
This Patch
We considered the following solutions to deal with this issue:
- Add logic to vdev_initialize/trim to skip those new metaslabs. We decided against this as it would be good to avoid exposing this lower-level detail to higer-level operations.
- Have metaslab_load_impl() return early for new metaslabs and thus never touch those range_trees that are NULL at that time. This seemed more of a work-around for the bug and not a clear-cut solution.
- Refactor our logic so all metaslabs have their range_trees created at the time of their creatin in metaslab_init().
In this patch we decided to go with [C] because:
(1) It doesn't expose more metaslab details to higher level
operations such as vdev initialize and trim.
(2) The current behavior of creating the range trees lazily
in `metaslab_sync_done()` is unnecessarily complicated.
(3) Always initializing the metaslab range_trees makes other
parts of the codebase cleaner. For example, we used to use `ms_freed` as the reference value for knowing whether all the range_trees have been initialized. Now we no longer need to do that check in most places (and in the few that we do we use the `ms_new` boolean field now which is more readable).
Side Changes
Probably due to a mismerge we set ms_loaded to B_TRUE twice
in metasloab_load_impl(). In this patch we remove the extraneous
assignment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11737