mlx5: Decrease FW init timeout from 120 seconds to 5 seconds

Description

When encountering a failed NIC, the mlx5 driver will wait up to 120
seconds for the firmware to respond. This timeout is absurdly long, and
leads to boot times of 40 minutes to over an hour on our servers when a
NIC fails. Because the mlx5 driver is still the best match for the
failed device, it attempts to attach to the NIC multiple times (once
for each driver loaded after mlx5), and waits the full 2 minutes on
each attempt. The accumulated delay then triggers watchdog timeouts in
our environment, rendering servers with a failed NIC entirely
unbootable without manual intervention.
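
The arithmetic behind those boot times can be sketched as follows. This is
only an illustration: the function names are hypothetical, and the attempt
count of 20 used in the note below is an assumption chosen because it
reproduces the 40 minute figure; the real count depends on how many drivers
are loaded after mlx5.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: worst-case seconds spent waiting on a dead NIC,
 * assuming every attach attempt runs into the full firmware init timeout.
 */
long
worst_case_attach_delay_s(long attach_attempts, long fw_init_timeout_s)
{
	assert(attach_attempts >= 0 && fw_init_timeout_s >= 0);
	return (attach_attempts * fw_init_timeout_s);
}

/* Print the delay under the old (120 s) and new (5 s) timeouts. */
void
print_attach_delays(long attach_attempts)
{
	printf("%ld attempts: %ld s at 120 s each, %ld s at 5 s each\n",
	    attach_attempts,
	    worst_case_attach_delay_s(attach_attempts, 120),
	    worst_case_attach_delay_s(attach_attempts, 5));
}
```

With an assumed 20 attach attempts, the old timeout adds up to 2400 seconds
(40 minutes), while the new one adds up to 100 seconds.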

Note that FW_INIT_WARN_MESSAGE_INTERVAL must also be decreased, as
it must be less than the init timeout.
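
A minimal sketch of why the warn interval has to stay below the timeout,
using simulated time instead of real sleeps. It is not the driver's code:
every SKETCH_ and sketch_ name is hypothetical, and only the 5 second
timeout comes from this commit; the warn and poll intervals are assumed
values.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values in milliseconds; warn and poll values are assumptions. */
#define SKETCH_FW_INIT_TIMEOUT_MS	5000
#define SKETCH_FW_WARN_INTERVAL_MS	2000
#define SKETCH_FW_POLL_INTERVAL_MS	2

/* The constraint from the note above, enforced at compile time. */
_Static_assert(SKETCH_FW_WARN_INTERVAL_MS < SKETCH_FW_INIT_TIMEOUT_MS,
    "warn interval must be less than the init timeout");

/* Simulated firmware: ready once ready_after_ms have elapsed, never if < 0. */
struct sketch_fw {
	long ready_after_ms;
};

bool
sketch_fw_ready(const struct sketch_fw *fw, long elapsed_ms)
{
	return (fw->ready_after_ms >= 0 && elapsed_ms >= fw->ready_after_ms);
}

/*
 * Poll until the firmware is ready or timeout_ms of simulated time passes.
 * Returns 0 on success and -1 on timeout; *warnings counts the progress
 * messages a real driver would have logged while waiting.
 */
int
sketch_wait_fw_init(const struct sketch_fw *fw, long timeout_ms, long warn_ms,
    int *warnings)
{
	long elapsed_ms = 0;
	long next_warn_ms = warn_ms;

	assert(timeout_ms > 0 && warn_ms > 0);
	*warnings = 0;
	while (!sketch_fw_ready(fw, elapsed_ms)) {
		if (elapsed_ms >= timeout_ms)
			return (-1);
		if (elapsed_ms >= next_warn_ms) {
			(*warnings)++;
			next_warn_ms += warn_ms;
		}
		/* Stands in for sleeping one poll interval. */
		elapsed_ms += SKETCH_FW_POLL_INTERVAL_MS;
	}
	return (0);
}
```

In this sketch, a dead device with a 5000 ms timeout and a 2000 ms warn
interval times out after two warnings. If the warn interval were longer
than the timeout (for example an arbitrary 20000 ms), the wait would give
up before a single warning was counted.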

Reviewed by: kib (initial version, before reducing warn interval)
Sponsored by: Netflix

Details

Provenance
gallatin authored on Jun 29 2025, 8:51 PM
Parents
rGce929c4769c9: ddb: print FIN flag only one when printing BBLog entries