I have some other ioat(4) work to upstream that was waiting for 11 to branch,
but I don't think this fix should wait.

ioat(4): Split timer into poll and shrink functions

Poll should happen quickly, while shrink should happen infrequently.
Protect is_completion_pending with submit_lock.
Sponsored by: EMC / Isilon Storage Division
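A rough userspace sketch of the split, assuming pthreads in place of callout(9) and mutex(9). struct chan, chan_poll(), chan_shrink(), the two periods and the shrink heuristic are all hypothetical; this is not the driver's code. It only shows two independent paths with different cadences, and is_completion_pending being read and tested under submit_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical periods: the poll path runs often, the shrink path rarely. */
#define POLL_PERIOD_MS   1
#define SHRINK_PERIOD_MS 10000

struct chan {
	pthread_mutex_t submit_lock;  /* protects every field below */
	bool is_completion_pending;   /* set by submitters, read by poll */
	unsigned ring_order;          /* ring holds 1 << ring_order slots */
	unsigned min_order;           /* never shrink below this */
	unsigned active;              /* descriptors currently outstanding */
};

/*
 * Poll path (every POLL_PERIOD_MS): cheap check whether completions may
 * be waiting. Returns true if the caller should go process events.
 */
static bool chan_poll(struct chan *c)
{
	bool pending;

	pthread_mutex_lock(&c->submit_lock);
	pending = c->is_completion_pending;
	pthread_mutex_unlock(&c->submit_lock);
	return pending;
}

/*
 * Shrink path (every SHRINK_PERIOD_MS): halve the ring if nothing is
 * pending and fewer than half the slots are in use (illustrative rule).
 */
static void chan_shrink(struct chan *c)
{
	pthread_mutex_lock(&c->submit_lock);
	assert(c->ring_order >= c->min_order);
	if (!c->is_completion_pending && c->ring_order > c->min_order &&
	    c->active < (1u << (c->ring_order - 1)))
		c->ring_order--;
	pthread_mutex_unlock(&c->submit_lock);
}
```

In a kernel version each function would presumably be driven by its own callout, rescheduled at its own period, so a slow shrink never delays a poll.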
ioat(4): Serialize ioat_reset_hw invocations
Sponsored by: EMC / Isilon Storage Division
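One possible shape for serializing resets, sketched in userspace C with pthreads. Since the reset can sleep for a long time, this sketch does not hold a lock across it; instead a "resetting" flag is claimed under a short-held lock, and later callers wait on a condition variable. The names (chan_reset_hw(), resetting, reset_cv, the counters) are hypothetical and not taken from the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct chan {
	pthread_mutex_t lock;      /* short-held; never held across the sleep */
	pthread_cond_t  reset_cv;  /* broadcast when a reset finishes */
	bool resetting;            /* true while one thread owns the reset */
	int  in_reset;             /* threads inside the reset body */
	int  max_in_reset;         /* high-water mark, for checking */
	int  resets_done;
};

static void chan_init(struct chan *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->reset_cv, NULL);
	c->resetting = false;
	c->in_reset = c->max_in_reset = c->resets_done = 0;
}

static void chan_reset_hw(struct chan *c)
{
	struct timespec delay = { 0, 2 * 1000 * 1000 }; /* stand-in for the slow reset */

	/* Wait for any reset in progress, then claim ownership. */
	pthread_mutex_lock(&c->lock);
	while (c->resetting)
		pthread_cond_wait(&c->reset_cv, &c->lock);
	c->resetting = true;
	pthread_mutex_unlock(&c->lock);

	/* Only the owner reaches this point, so it may sleep freely. */
	c->in_reset++;
	if (c->in_reset > c->max_in_reset)
		c->max_in_reset = c->in_reset;
	nanosleep(&delay, NULL);
	c->resets_done++;
	c->in_reset--;

	/* Release ownership and wake any waiting resetters. */
	pthread_mutex_lock(&c->lock);
	c->resetting = false;
	pthread_cond_broadcast(&c->reset_cv);
	pthread_mutex_unlock(&c->lock);
}

static void *reset_thread(void *arg)
{
	chan_reset_hw(arg);
	return NULL;
}

/* Start n (at most 8) concurrent resets and wait for all of them. */
static void run_resets(struct chan *c, int n)
{
	pthread_t tids[8];

	assert(n <= 8);
	for (int i = 0; i < n; i++) {
		int rc = pthread_create(&tids[i], NULL, reset_thread, c);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
}
```

Calling run_resets(&c, 4) should leave resets_done at 4 with max_in_reset never above 1.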
ioat(4): Block asynchronous work during HW reset

Fix the race between ioat_reset_hw and ioat_process_events.

HW reset isn't protected by a lock because it can sleep for a long time
(40.1 ms). This resulted in a race where we would process bogus parts
of the descriptor ring as if they had completed. This looked like
duplicate completions on old events, if your ring had looped at least
once.

Block callout and interrupt work while reset runs so the completion end
of things does not observe indeterminate state and process invalid parts
of the ring.
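A minimal sketch of the blocking idea, in userspace C with a pthread mutex standing in for the driver's lock. Both completion sources (callout and interrupt) are assumed to funnel into one chan_process_events(), which bails out while a reset is marked in progress. struct chan, cleanup_lock, hw_done and the reset_begin/reset_end helpers are hypothetical; the same lock also keeps the two asynchronous sources from racing each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct chan {
	pthread_mutex_t cleanup_lock; /* taken by both callout and interrupt paths */
	bool resetting;               /* set for the whole duration of a reset */
	unsigned tail;                /* next descriptor to retire */
	unsigned hw_done;             /* stand-in for the HW completion index */
	unsigned completions;         /* completion callbacks delivered */
};

/* Common completion path for the callout and the interrupt handler. */
static bool chan_process_events(struct chan *c)
{
	pthread_mutex_lock(&c->cleanup_lock);
	if (c->resetting) {
		/* Ring state is indeterminate mid-reset: touch nothing. */
		pthread_mutex_unlock(&c->cleanup_lock);
		return false;
	}
	while (c->tail != c->hw_done) {
		c->tail++;
		c->completions++;
	}
	pthread_mutex_unlock(&c->cleanup_lock);
	return true;
}

static void chan_reset_begin(struct chan *c)
{
	pthread_mutex_lock(&c->cleanup_lock);
	c->resetting = true;
	pthread_mutex_unlock(&c->cleanup_lock);
}

static void chan_reset_end(struct chan *c)
{
	pthread_mutex_lock(&c->cleanup_lock);
	c->tail = c->hw_done = 0;   /* ring restarts from a known state */
	c->resetting = false;
	pthread_mutex_unlock(&c->cleanup_lock);
}
```

With the flag set, a garbage completion index seen during reset is simply ignored instead of being retired as a batch of bogus completions.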
Start the channel with a manually implemented ioat_null() to keep other
submitters quiesced while we wait for the channel to start (100 us).
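A hedged sketch of the "manual null descriptor" start, in userspace C. The assumption here is that the regular submit path would drop submit_lock on return, so the start routine queues a null descriptor by hand and keeps submit_lock held while it polls for up to 100 us; other submitters simply block on the lock meanwhile. chan_start(), chan_hw_active() (a simulated status read), the ring layout and OP_NULL are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define START_TIMEOUT_US 100   /* wait up to 100 us for the channel */
#define RING_SLOTS       16    /* hypothetical, power of two */

enum { OP_NONE = 0, OP_NULL = 1 };

struct chan {
	pthread_mutex_t submit_lock;  /* held by every submitter */
	unsigned head;                /* next free ring slot (free-running) */
	int ring_op[RING_SLOTS];      /* opcode queued in each slot */
	unsigned polls;               /* status reads performed so far */
	unsigned start_after;         /* simulated: HW reports active on this poll */
};

static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Simulated channel-status read; real code would read a device register. */
static bool chan_hw_active(struct chan *c)
{
	return ++c->polls >= c->start_after;
}

/* Returns 0 once the channel reports active, -1 on timeout. */
static int chan_start(struct chan *c)
{
	uint64_t deadline;
	int error = -1;

	pthread_mutex_lock(&c->submit_lock);
	/*
	 * Open-coded null descriptor: queue it directly so submit_lock stays
	 * held for the whole wait and no other submitter can slip in.
	 */
	c->ring_op[c->head % RING_SLOTS] = OP_NULL;
	c->head++;

	deadline = now_us() + START_TIMEOUT_US;
	for (;;) {
		if (chan_hw_active(c)) {
			error = 0;
			break;
		}
		if (now_us() >= deadline)
			break;
	}
	pthread_mutex_unlock(&c->submit_lock);
	return error;
}
```

The lock is released on both the success and the timeout path, so a channel that never starts cannot wedge later submitters.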
r295605 may have made the race between ioat_reset_hw and
ioat_process_events wider, but I believe it already existed before that
revision. ioat_process_events can be invoked by two asynchronous
sources: callout (softclock) and device interrupt. Those could race
each other, to the same effect.
Sponsored by: EMC / Isilon Storage Division