MFC r354241: Some more taskqueue optimizations.
- Optimize enqueue for two task priority values by adding new tq_hint
field, pointing to the last task inserted into the middle of the list.
In case of more then two priority values it should halve average search.
- Move tq_active insert/remove out of the taskqueue_run_locked loop.
Instead of dirtying few shared cache lines per task introduce different
mechanism to drain active tasks, based on task sequence number counter,
that uses only cache lines already present in cache. Since the new
mechanism does not need ordering, switch tq_active from TAILQ to LIST.
- Move static and dynamic struct taskqueue fields into different cache
lines. Move lock into its own cache line, so that heavy lock spinning
by multiple waiting threads would not affect the running thread.
- While there, correct some TQ_SLEEP() wait messages.
This change fixes certain ZFS write workloads, causing huge congestion
on taskqueue lock. Those workloads combine some large block writes to
saturate the pool and trigger allocation throttling, which uses higher
priority tasks to requeue the delayed I/Os, with many small blocks to
generate deep queue of small tasks for taskqueue to sort.
Sponsored by: iXsystems, Inc.