
zfsd: fault disks that generate too many I/O delay events
ClosedPublic

Authored by asomers on Nov 28 2023, 11:07 PM.

Details

Summary

If ZFS reports that a disk had at least 8 I/O operations over 60s that
were each delayed by at least 30s (implying a queue depth > 5 or I/O
aggregation, obviously), fault that disk. Disks that respond this
slowly can degrade the entire system's performance.

MFC after: 2 weeks
Sponsored by: Axcient
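
As a rough illustration of the criterion above, here is a minimal sliding-window sketch. The class and member names are hypothetical; this is not zfsd's actual implementation, which lives in its CaseFile machinery.

```cpp
// Hypothetical sketch of the delay-fault criterion; illustrative only,
// not zfsd's actual code or API.
#include <cstddef>
#include <ctime>
#include <deque>

class DelayWindow
{
	static constexpr time_t	WINDOW_SECS = 60;	// look-back window
	static constexpr std::size_t FAULT_COUNT = 8;	// delayed I/Os to fault
	std::deque<time_t>	m_events;		// timestamps of delay events

public:
	// Record one "I/O delayed by >= 30s" event and report whether the
	// disk has now crossed the fault threshold: at least 8 such events
	// within any 60s window.
	bool ShouldFault(time_t now)
	{
		m_events.push_back(now);
		// Age out events older than the 60s window.
		while (!m_events.empty() &&
		    now - m_events.front() > WINDOW_SECS)
			m_events.pop_front();
		return (m_events.size() >= FAULT_COUNT);
	}
};
```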

Test Plan

Unit and integration tests added. The change has been run in production for 4 months, where it faults about 0.2% of HDDs annually.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

delphij added inline comments.
cddl/usr.sbin/zfsd/case_file.h:239

Should these be converted to const ints instead of being an enum?

This revision is now accepted and ready to land. Nov 29 2023, 6:47 AM
asomers added inline comments.
cddl/usr.sbin/zfsd/case_file.h:239

Good question. They certainly could be. I suspect that it was @gibbs who initially created them as enums. BTW, in a future commit I'm planning to make these customizable via vdev properties, as suggested by @allanjude . I'll convert them to const int at that time.
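
For readers following along, the difference under discussion looks roughly like this. The constant names and values below are made up for illustration, not the actual contents of case_file.h.

```cpp
// Hypothetical names and values; not the actual case_file.h contents.

// Enum-hack style, common in older C++ code:
enum {
	DELAY_EVENT_COUNT = 8,		// delayed I/Os before faulting
	DELAY_WINDOW_SECS = 60,		// sliding-window length
};

// The const-int alternative delphij suggests; typed, scoped, and easy
// to replace later with a value read from a vdev property:
static constexpr int kDelayEventCount = 8;
static constexpr int kDelayWindowSecs = 60;
```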

imp added a comment.

I'd have liked to see these thresholds be configurable, for people who want to fault things more or less aggressively.
But as is, it's fine; it's a nice-to-have, not something absolutely required.

In D42825#976834, @imp wrote:

I'd have liked to see these thresholds be configurable, for people who want to fault things more or less aggressively.
But as is, it's fine; it's a nice-to-have, not something absolutely required.

Yes, definitely. I'm going to make them configurable next, using Allan Jude's suggested method.

imp added a comment.

FYI: One of the things I have planned for after the first of the year is to start publishing, from the CAM I/O scheduler, latencies that exceed some threshold, either static (> 30s) or dynamic (> 5 MAD from the median). I'd planned on using this stream to accumulate, in real time, a notion of an unresponsive disk, and to use that to inform our system's continued use of the disk. Interesting to see parallel work.
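
For context, the dynamic threshold mentioned here (more than 5 MADs above the median) could be computed along these lines. This is a hand-rolled sketch with hypothetical function names, not the CAM I/O scheduler's actual code.

```cpp
// Hypothetical MAD-based outlier test; not CAM iosched code.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

static double
median(std::vector<double> v)
{
	if (v.empty())
		return (0.0);
	std::sort(v.begin(), v.end());
	std::size_t n = v.size();
	return (n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0);
}

// True if 'latency' exceeds the median of recent samples by more than
// five median absolute deviations (MADs).
static bool
latency_is_outlier(const std::vector<double> &recent, double latency)
{
	if (recent.empty())
		return (false);
	double med = median(recent);
	std::vector<double> dev;
	dev.reserve(recent.size());
	for (double x : recent)
		dev.push_back(std::fabs(x - med));
	double mad = median(dev);
	return (latency > med + 5.0 * mad);
}
```

A MAD-based cutoff has the nice property of being robust to the outliers it is trying to detect, unlike a mean/standard-deviation cutoff, which the slow I/Os themselves would inflate.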