
nvme: Improve timeout action
Accepted · Public

Authored by imp on Mon, May 13, 11:42 PM.

Details

Reviewers
chs
chuck
mav
jhb
Summary

Today, we call the ISR's processing routine every time the timeout handler
runs. This is wasteful and races the real ISR. It can also delay the real
ISR, adding more noise to request latency than the underlying hardware
actually delivers.

Instead, introduce a soft deadline at 1% of the actual timeout for I/O and
admin transactions: 300ms and 600ms, respectively. The oldest transaction is
at the head of the queue, so if it has been outstanding longer than that, we
start calling _nvme_qpair_process_completions to see whether we might be
missing interrupts. This almost always avoids racing the ISR, except when
something is actually getting stuck.

Sponsored by: Netflix
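
For concreteness, here is a minimal, self-contained C sketch of the soft-deadline check described above. The structure and function names (struct qpair, struct tracker, should_poll_completions) are illustrative assumptions, not the driver's actual code; only the 1%-of-timeout rule comes from the change itself.

/*
 * Hypothetical, self-contained model of the soft-deadline idea; names and
 * fields are illustrative, not the driver's.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NVME_IO_TIMEOUT_MS    30000   /* 30 s I/O timeout (default) */
#define NVME_ADMIN_TIMEOUT_MS 60000   /* 60 s admin timeout (default) */

struct tracker {
	uint64_t submit_ms;           /* when the oldest request was queued */
};

struct qpair {
	bool is_admin;
	struct tracker *oldest;       /* head of the outstanding-request queue */
};

/*
 * Called from the periodic timeout handler.  Only poll for completions (the
 * "call the ISR" fallback) once the oldest outstanding request has aged past
 * 1% of its hard timeout: 300 ms for I/O, 600 ms for admin.  Until then,
 * trust the interrupt path and stay out of its way.
 */
static bool
should_poll_completions(const struct qpair *qp, uint64_t now_ms)
{
	uint64_t hard_ms, soft_ms;

	if (qp->oldest == NULL)
		return (false);       /* nothing outstanding */

	hard_ms = qp->is_admin ? NVME_ADMIN_TIMEOUT_MS : NVME_IO_TIMEOUT_MS;
	soft_ms = hard_ms / 100;      /* 1% soft deadline */

	return (now_ms - qp->oldest->submit_ms >= soft_ms);
}

int
main(void)
{
	struct tracker t = { .submit_ms = 0 };
	struct qpair io_qp = { .is_admin = false, .oldest = &t };

	printf("poll at 100 ms: %d\n", should_poll_completions(&io_qp, 100)); /* 0 */
	printf("poll at 400 ms: %d\n", should_poll_completions(&io_qp, 400)); /* 1 */
	return (0);
}

Built as a plain userland program, this prints 0 at 100ms and 1 at 400ms for an I/O queue, matching the 300ms soft deadline above.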

Test Plan

This is lightly tested on a bunch of good drives and one cranky drive.

Sometimes I wonder why the I/O timeout is 30s and not something like 3s.

We were bringing up PCIe Gen 4 NVMe drives and saw annoying latency issues with the hardware and (alas) the software. This seems to fix it, but I have only a few miles on this code... enough, I think, in the normal path to open it up for review.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 57698
Build 54586: arc lint + arc unit

Event Timeline

imp requested review of this revision. Mon, May 13, 11:42 PM
imp added reviewers: chs, chuck, mav, jhb.

I see no problems, but I have difficulty believing that timeout handlers running 1-2 times per second per queue pair can have any visible effect. I'm also not happy to see a second place where the timeouts are calculated. And 99/100 also looks quite arbitrary.

This revision is now accepted and ready to land. Tue, May 14, 12:44 AM

Yeah, I didn't like it either.
I could move the timeouts into the qpair; then there'd be no more computing the values in a second place... I'd have to plumb the sysctl through too... but it would be cleaner. I could also put the soft timeout in there, so it wouldn't be so arbitrary.
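
For illustration, a minimal sketch of what caching the timeouts in the queue pair might look like, so the hard timeout and the soft deadline are derived in one place. The field and function names here (qpair_init_timeouts, soft_deadline_ms) are hypothetical, not the driver's.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct qpair {
	bool is_admin;
	uint32_t timeout_ms;        /* hard timeout for this queue's commands */
	uint32_t soft_deadline_ms;  /* 1% of the hard timeout, derived once here */
};

/* Compute both values in one place when the queue pair is set up. */
static void
qpair_init_timeouts(struct qpair *qp, uint32_t io_timeout_ms,
    uint32_t admin_timeout_ms)
{
	qp->timeout_ms = qp->is_admin ? admin_timeout_ms : io_timeout_ms;
	qp->soft_deadline_ms = qp->timeout_ms / 100;
}

int
main(void)
{
	struct qpair adminq = { .is_admin = true };
	struct qpair ioq = { .is_admin = false };

	/* 30 s and 60 s defaults, as if read from the existing sysctls. */
	qpair_init_timeouts(&adminq, 30000, 60000);
	qpair_init_timeouts(&ioq, 30000, 60000);
	printf("admin soft deadline: %u ms\n", adminq.soft_deadline_ms); /* 600 */
	printf("io soft deadline:    %u ms\n", ioq.soft_deadline_ms);    /* 300 */
	return (0);
}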

I had counters showing that we hit the race a few times an hour on each queue when the system was under load. That surprised me too; it seems like it should not have. So far this seems better, since we now almost never call the ISR as a fallback. I still need to run the detailed before/after latency tests to see the final effects, but preliminary data suggests it's quite good. I do agree, though, that the next layer of detail would help explain why I'm seeing improved results.