zed: Ensure spare activation after kernel-initiated device removal

Description

In addition to hotplug events, the kernel itself may mark a failing vdev
as REMOVED. This was observed in a customer report and reproduced by
forcing the NVMe host driver to disable the device after a failed reset
caused by a command timeout. In such cases, the spare was not activated
because the device had already transitioned to the REMOVED state before
zed processed the event.

To address this, explicitly attempt hot spare activation when the
kernel marks a device as REMOVED.
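
The idea described above can be illustrated with a minimal, self-contained
sketch. It is not the actual zed code: every type and function name below
(vdev_state_t, event_t, pool_t, try_activate_spare, handle_event) is
hypothetical and exists only to model the decision "a state change to
REMOVED should also trigger a spare activation attempt".

```c
#include <stdbool.h>

/* Hypothetical, simplified vdev states; not the real ZFS enum. */
typedef enum { VDEV_ONLINE, VDEV_FAULTED, VDEV_REMOVED } vdev_state_t;

/* Hypothetical event kinds delivered to the daemon. */
typedef enum { EV_HOTPLUG_REMOVE, EV_STATE_CHANGE } event_kind_t;

typedef struct {
	event_kind_t kind;
	vdev_state_t new_state;	/* only meaningful for EV_STATE_CHANGE */
} event_t;

/* Toy pool: one data vdev and at most one hot spare. */
typedef struct {
	vdev_state_t state;
	bool spare_available;
	bool spare_active;
} pool_t;

/* Attach the spare if one is free; returns true when it was activated. */
static bool
try_activate_spare(pool_t *p)
{
	if (p->spare_active || !p->spare_available)
		return (false);
	p->spare_available = false;
	p->spare_active = true;
	return (true);
}

/*
 * Handle one event. Assumption for this sketch: a kernel-initiated
 * transition arrives as a state change whose new state is REMOVED,
 * and is treated like a hotplug removal for spare activation.
 */
static bool
handle_event(pool_t *p, const event_t *ev)
{
	switch (ev->kind) {
	case EV_HOTPLUG_REMOVE:
		p->state = VDEV_REMOVED;
		return (try_activate_spare(p));
	case EV_STATE_CHANGE:
		p->state = ev->new_state;
		if (ev->new_state == VDEV_REMOVED)
			return (try_activate_spare(p));
		return (false);
	}
	return (false);
}
```

In this model, a repeated REMOVED event is harmless: once the spare is
active, try_activate_spare() returns false and leaves the pool unchanged.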

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17187

Details

Provenance
Ameer Hamza <ahamza@ixsystems.com> Authored on Mar 28 2025, 7:48 PM
GitHub <noreply@github.com> Committed on Mar 28 2025, 7:48 PM
Parents
rGdd2a46b5e634: config: cache results of kernel checks (#17106)