It all started when I found that, due to the use of a single counter for
both IOAT_DMAENGINE_REF and IOAT_ACTIVE_DESCR_REF, ioat_reset_hw() always
hangs if any consumer exists, even one that is not doing anything. I first
thought to split the counters, but then dropped the active descriptor
counter completely, comparing the queue head and tail instead. Aside from
fixing reset, this allowed removing some atomics/locking per request.
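In rough pseudocode the idea looks like this (a minimal sketch; the names
and signature are mine, not the actual driver code). If the submit side
only ever advances the head and the cleanup side only ever advances the
tail, the number of active descriptors is just their difference, with
unsigned arithmetic taking care of wraparound, and "idle" is simply
head == tail -- no separate reference count needed:

```c
#include <stdint.h>

/*
 * Hypothetical sketch: head and tail are free-running 32-bit counters.
 * head is incremented by the submitter, tail by the completion path.
 * Unsigned subtraction gives the in-flight count even across wrap.
 */
static inline uint32_t
ring_active(uint32_t head, uint32_t tail)
{

	return (head - tail);
}
```

Each side updates only its own counter, which is why the per-request
atomics on a shared refcount can go away.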
The next issue was excessive overhead from starting/stopping the callout
each time the queue goes active/idle. Since you said it is a workaround
for a hardware bug, I left it in, but I changed its logic so that the
callout is not stopped every time. Under a high request rate it is
cheaper to let it run and fire occasionally than to stop and restart it
for each request. It also allowed removing the complicated locking dance
used to update is_completion_pending, which is no longer needed.
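The rearm decision can be sketched like this (hypothetical names and
shape, not the actual handler): instead of callout_stop()/callout_reset()
around every submit, the periodic handler itself decides whether to
rearm, and simply lets the callout die when the ring has drained:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the periodic handler's tail end: after
 * processing completions (omitted), rearm only while descriptors
 * remain in flight.  An idle channel lets the callout expire on its
 * own, so the submit path never has to touch callout state.
 */
static bool
poll_tick(uint32_t head, uint32_t tail)
{

	/* Rearm the callout iff work is still outstanding. */
	return (head != tail);
}
```

The submit path only needs to (re)start the callout when the queue goes
from empty to non-empty, which is rare under load.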
As part of that, I also completely decoupled cleanup_lock and
submit_lock, since their previous relation caused LORs in some usage
patterns. I still left the WITNESS hint on attach, but changed its
direction, since that order is more likely to occur when consumer code
sends a new request based on a previous request's completion.
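For illustration, the hint amounts to taking the two locks once in the
expected order at attach time so the order gets recorded. A minimal
userland sketch with pthread mutexes standing in for the kernel mutexes
(the direction shown assumes a consumer submitting from its completion
callback, i.e. cleanup before submit; all names here are mine):

```c
#include <pthread.h>

/* Stand-ins for the driver's cleanup_lock and submit_lock. */
static pthread_mutex_t cleanup_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t submit_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical sketch: acquire and release the locks once, nested in
 * the expected order, so a lock-order checker learns
 * cleanup_lock -> submit_lock and flags the reverse nesting later.
 */
static void
establish_lock_order(void)
{

	pthread_mutex_lock(&cleanup_lock);
	pthread_mutex_lock(&submit_lock);
	pthread_mutex_unlock(&submit_lock);
	pthread_mutex_unlock(&cleanup_lock);
}
```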
I also closed some race conditions and issues around device attach/detach.
I also added a trick to avoid calling ioat_get_chansts() on every
ioat_process_events() call, since it appeared quite heavy in the
profiler. I roughly replicated what the Linux driver does -- polling it
only on much rarer timer events. Please let me know if you think this is
wrong, since I still have no hardware documentation (I am trying to get
it).
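The rate limiting itself is trivial; something along these lines (the
constant and names are mine, purely illustrative): the fast path works
without the register read, and the heavy status read is only considered
due after enough ticks have passed since the last one:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: gate the expensive channel-status read so it
 * happens at most once per POLL_TICKS ticks, instead of on every
 * event-processing pass.  now and last_read are free-running tick
 * counts.
 */
#define	POLL_TICKS	100

static bool
chansts_read_due(uint64_t now, uint64_t last_read)
{

	return (now - last_read >= POLL_TICKS);
}
```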
Also, while there, I found and removed several unneeded variables. The
hw_head variable seemed to be a full copy of head, so I unified them.
Let me know if I missed something.