
cam: Fail the disk if READ CAPACITY returns 4/2 asc/ascq
Closed, Public

Authored by imp on Jul 8 2025, 11:17 PM.

Details

Summary

HGST disks that are sick return 44/0 (INTERNAL TARGET FAILURE) for
START UNIT, which we ignore, and then 4/2 (LOGICAL UNIT NOT READY,
INITIALIZING COMMAND REQUIRED) on READ CAPACITY. START UNIT should be
enough for READ CAPACITY to either succeed or return UNIT ATTENTION.
Instead, we get NOT READY + 4/2 back. I've seen this on several models
of HGST drives. Although the timeout is 5s for READ_CAPACITY, we wait
the full 30s for READ_CAPACITY_16. This stalls booting: we start to
taste as soon as we release the final hold, but the tasting means
g_wait_idle() now takes over 5 minutes to clear, since we do this for
all the opens.
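As a rough illustration of the check described above, the sketch below classifies raw SCSI sense data and reports whether a READ CAPACITY failure looks like the NOT READY + 4/2 case. It is a standalone, hypothetical sketch, not the CAM code in this change: `extract_sense()`, `read_capacity_should_fail_disk()` and `SENSE_KEY_NOT_READY` are invented names, and requiring the NOT READY sense key in addition to ASC/ASCQ 4/2 is an assumption based on the summary. The byte offsets follow the SPC fixed (0x70/0x71) and descriptor (0x72/0x73) sense formats; in-kernel code would use CAM's existing sense-parsing helpers instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sense key value from SPC; the macro name is hypothetical. */
#define SENSE_KEY_NOT_READY 0x02

struct sense_triplet {
	uint8_t key;	/* sense key (low 4 bits) */
	uint8_t asc;	/* additional sense code */
	uint8_t ascq;	/* additional sense code qualifier */
};

/*
 * Hypothetical helper: pull the sense key, ASC and ASCQ out of raw
 * sense data in fixed (0x70/0x71) or descriptor (0x72/0x73) format.
 * Returns false if the buffer is too short or the format is unknown.
 */
static bool
extract_sense(const uint8_t *sense, size_t len, struct sense_triplet *out)
{
	if (len < 1)
		return false;
	switch (sense[0] & 0x7f) {	/* mask off the VALID bit */
	case 0x70:
	case 0x71:			/* fixed format */
		if (len < 14)
			return false;
		out->key = sense[2] & 0x0f;
		out->asc = sense[12];
		out->ascq = sense[13];
		return true;
	case 0x72:
	case 0x73:			/* descriptor format */
		if (len < 4)
			return false;
		out->key = sense[1] & 0x0f;
		out->asc = sense[2];
		out->ascq = sense[3];
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical policy check: should a failed READ CAPACITY with this
 * sense data fail the disk outright instead of being retried?
 * NOT READY + 4/2 right after START UNIT means the drive never came
 * ready, so retrying only burns another full timeout.
 */
static bool
read_capacity_should_fail_disk(const uint8_t *sense, size_t len)
{
	struct sense_triplet st;

	if (!extract_sense(sense, len, &st))
		return false;	/* unknown sense: keep default handling */
	return (st.key == SENSE_KEY_NOT_READY &&
	    st.asc == 0x04 && st.ascq == 0x02);
}
```

Returning false for short or unrecognized sense data leaves the normal retry path in charge, so only this specific 4/2 case escalates to failing the disk.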

Perhaps both should use 5s. The READ_CAPACITY_16 code has used either
60s or 30s since it was originally committed in 2003, but that original
commit does not explain why (whether there was a reason, or whether the
value was just arbitrary). Perhaps both should be more like 3s. That
would also be less bothersome and would reduce the tasting failure time
to 30s or so. But there's no sense in repeated failures, especially
since there's no way to re-taste a failure that was due to this. It's
better to leave the timeouts alone here (though changing them might be
warranted) and fail the periph. Changing the timeouts is orthogonal to
this problem.
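A back-of-the-envelope model of the numbers above, assuming the stall is dominated by READ CAPACITY(16) commands that each run to their full timeout one after another. N, the number of such timed-out commands across all the opens, is inferred from the quoted figures rather than measured:

```latex
T_{\text{stall}} \approx N \cdot t_{\text{timeout}}, \qquad
N \gtrsim \frac{300\ \text{s}}{30\ \text{s}} = 10, \qquad
t_{\text{timeout}} = 3\ \text{s} \;\Rightarrow\; T_{\text{stall}} \approx 10 \cdot 3\ \text{s} = 30\ \text{s}.
```

Under this model a shorter timeout only scales the stall down linearly, while failing the periph on the first 4/2 avoids paying the timeout N times.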

Perhaps we should fail the periph when START UNIT fails with the same
codes we check in the READ CAPACITY path. I'm reluctant to make such a
global change since it's in cam_periph, and there seems to be no good
way to flag that we want this behavior. It's also a bit magical when it
runs.

Sponsored by: Netflix

Diff Detail

Repository: rG FreeBSD src repository
Lint: Not Applicable
Unit Tests: Not Applicable

Event Timeline

imp requested review of this revision. (Jul 8 2025, 11:17 PM)
This revision was not accepted when it landed; it landed in state Needs Review. (Jul 10 2025, 5:07 PM)
This revision was automatically updated to reflect the committed changes.