CAM: Common Access Method
I've tried the opposite approach: adding a LOWPRIO flag instead, using it only for background operations in a few places, and marking BIOs without it as high-priority in ATA/SCSI. But while testing it I noticed that disk random IOPS drop to almost non-NCQ levels on a mix of different priorities, and I measure the same on both WD and HGST drives. I don't understand what is going on there. Maybe I am missing something, but that is an unacceptable trade-off to me. I've uploaded my present patch in case somebody wishes to play with it, but I probably won't commit it in this state.
Priority works on top of the tag, affecting only commands tagged as SIMPLE. The ORDERED and HEAD tags still keep their function, since their fencing semantics are mandatory, while priority is a softer hint for the scheduler.
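To make the rule above concrete, here is a minimal sketch in C. The enum names and the `effective_prio()` helper are hypothetical, invented for illustration; they are not the CAM definitions or anything from the patch.

```c
/* Hypothetical stand-ins for the queue tag types; NOT the CAM definitions. */
enum tag_kind { TAG_SIMPLE, TAG_ORDERED, TAG_HEAD_OF_QUEUE };

/* Hypothetical priority hint levels. */
enum prio_hint { PRIO_NONE, PRIO_LOW, PRIO_HIGH };

/*
 * Return the priority hint that would take effect for a command.
 * ORDERED and HEAD tags have mandatory fencing semantics, so the
 * soft priority hint is only meaningful for SIMPLE-tagged commands.
 */
static enum prio_hint
effective_prio(enum tag_kind tag, enum prio_hint requested)
{
	if (tag != TAG_SIMPLE)
		return (PRIO_NONE);
	return (requested);
}
```

Under these assumptions, `effective_prio(TAG_SIMPLE, PRIO_HIGH)` keeps the hint, while the same request on an ORDERED or HEAD command drops it and leaves ordering to the tag alone.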
I think that the intention of the feature from the manufacturers is for background sync and scrub workloads, not filesystem consistency operations. Regarding SAS, tags have always been the mechanism for setting priority and creating barriers. Ordered tags, head-of-queue tags, etc. mpr/mps support these, but they're largely unused because BIO_ORDERED was removed from FreeBSD. I'm not aware of SAS adopting the same SATA priority scheme.
More experiments with SATA WD REDs show that priorities there behave more like absolute priorities with a deadline. On a WD20EFRX-68E under a heavy random workload I see low-priority requests, in the presence of high-priority ones, all delayed for about a second, while on a WD80EFZX-68U they are all delayed for about 5 seconds. Such a big difference makes me think it is unusable for differentiating sync vs. async requests, but it should still be good for differentiating read/write from scrub/initialization/etc. Unfortunately I still haven't found a capable SAS drive to check there, but considering that SATL maps one directly onto the other, I suppose they should have the same (absolute) semantics.
Sat, Oct 24
Fri, Oct 23
It would be trivial to request high priority for synchronous writes in bwrite() and, if desired, for synchronous reads in bread(). That would take effect for several filesystems.
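As a rough sketch of that idea, assuming a simplified buffer model: `struct xbuf`, `XB_ASYNC`, `XB_HIPRIO` and `xbuf_mark_for_write()` below are all hypothetical names made up for illustration. They are not the real `struct buf` fields or flags, and this is not how bwrite() is actually implemented.

```c
/* Hypothetical flag bits; these are NOT real FreeBSD buf or bio flags. */
#define XB_ASYNC	0x01u	/* caller does not wait for completion */
#define XB_HIPRIO	0x02u	/* ask the lower layers for high priority */

/* Hypothetical, heavily simplified buffer descriptor. */
struct xbuf {
	unsigned int flags;
};

/*
 * Called on the write path before the request is issued: a write the
 * caller will wait on (i.e. not async) gets the high-priority hint.
 */
static void
xbuf_mark_for_write(struct xbuf *bp)
{
	if ((bp->flags & XB_ASYNC) == 0)
		bp->flags |= XB_HIPRIO;
}
```

The point of doing it at this layer is that every filesystem going through the common buffer-cache write path would pick up the hint without per-filesystem changes.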
Interesting. I have patches to iosched that mark metadata requests and put them at the top of the queue, but they don't try to prioritize in the drive. They don't handle writes, though (we don't need them, but it's one of the reasons I've not committed)... It gives a modest boost to open latency, but not as much as the async open Chuck is working on.
Thu, Oct 22
Sat, Oct 17
I've had good luck running this for 2 or 3 firmware cycles now at Netflix... We're down 3/4 in panics, and at least part of the remaining 1/4 appears to be due to a small mismerge of two files, so they were out of sync with the rest of the state machine...
Jul 21 2020
I'm deploying this to one or two of the machines where we see panics from this every few days.
Jun 30 2020
I only looked at mpr, but this looks good to me. It's good you have a way to recreate it. I have random machines failing with this (about 0.001%/day), but can't find one to recreate it on. I like this approach, and it is similar to the one I took with other commands and target reset. Thanks for fixing.