Reduce the interrupt latency caused by the steal_idle code
in tdq_idled() by removing the spinlock_enter()/spinlock_exit()
wrapper around it. Preemption events are pretty rare and can
be handled by detecting them and restarting the search.
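
As a rough sketch of that pattern (not the committed sched_ule.c
code; struct tdq_sketch, busiest_cpu() and steal_search() are
hypothetical stand-ins for struct tdq and sched_highest()), the
search runs with preemption enabled and simply redoes itself if the
local switch count moved underneath it:

#include <stdatomic.h>

/* Hypothetical, simplified per-CPU run-queue state. */
struct tdq_sketch {
        atomic_int      load;           /* running + queued threads */
        atomic_uint     switchcnt;      /* bumped on every context switch */
};

/* Stand-in for the topology-aware sched_highest() walk. */
static int
busiest_cpu(struct tdq_sketch *tdqs, int ncpus, int self)
{
        int cpu, best = -1, bestload = 1;

        for (cpu = 0; cpu < ncpus; cpu++) {
                int l = atomic_load(&tdqs[cpu].load);

                /* Only a load above 1 leaves a thread free to steal. */
                if (cpu != self && l > bestload) {
                        bestload = l;
                        best = cpu;
                }
        }
        return (best);
}

/*
 * Search with preemption left enabled.  If the local switch count
 * changed while we were searching, we were preempted and the result
 * may be stale, so search again rather than blocking interrupts.
 */
static int
steal_search(struct tdq_sketch *tdqs, int ncpus, int self)
{
        unsigned int before;
        int victim;

        do {
                before = atomic_load(&tdqs[self].switchcnt);
                victim = busiest_cpu(tdqs, ncpus, self);
        } while (atomic_load(&tdqs[self].switchcnt) != before);

        return (victim);
}
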
Poll for recently assigned threads and switch to them before
calling tdq_lock_pair(). This happens frequently enough that
it is worth avoiding the acquisition of the extra lock which
will get immediately dropped.
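
In terms of the same hypothetical tdq_sketch fields, the check is
just a load test on the local queue before any cross-CPU locking
(the real code would go on to call tdq_lock_pair()):

static int
have_assigned_work(struct tdq_sketch *self)
{
        /*
         * A nonzero local load means another CPU queued a thread to us
         * while we were searching; run it rather than acquiring the
         * victim's lock only to drop it again.
         */
        return (atomic_load(&self->load) > 0);
}
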
If the CPU returned by sched_highest() no longer has a thread
to steal, restart the search from the beginning rather than
resuming the search at the same topology level. Things have
changed enough in the time that it has taken to get to this
point that the previous results are stale. The extra time
spent by restarting the search at the beginning does not
matter so much now that interrupts are not blocked, and
this code runs as part of the idle thread anyway.
Straighten out the control flow in tdq_idled() so that it
is not quite so convoluted.
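
Putting the pieces together, a rough sketch of the intended flow,
again with the hypothetical helpers from above and with the locking
and the actual thread migration elided:

static int
idle_steal(struct tdq_sketch *tdqs, int ncpus, int self)
{
        int victim;

        for (;;) {
                if (have_assigned_work(&tdqs[self]))
                        return (self);          /* run what we were handed */

                victim = steal_search(tdqs, ncpus, self);
                if (victim < 0)
                        return (-1);            /* nothing worth stealing */

                /* ... tdq_lock_pair() would lock both queues here ... */

                if (atomic_load(&tdqs[victim].load) <= 1) {
                        /*
                         * The victim's spare thread vanished while we
                         * were getting here; the earlier search results
                         * are stale, so restart from the very beginning.
                         */
                        continue;
                }

                /* ... move one thread from tdqs[victim] to tdqs[self] ... */
                return (victim);
        }
}
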
Fix an off-by-one error in the CPU_SEARCH_HIGHEST section
of cpu_search(). There are only transferable threads
when tdq_load > 1, since a CPU with a currently running
thread and an empty queue will have a load of 1.
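
Using the same hypothetical load field, the corrected condition (the
real check lives in the CPU_SEARCH_HIGHEST case of cpu_search())
amounts to:

static int
has_transferable(struct tdq_sketch *tdq)
{
        /*
         * The load counts the currently running thread, so a load of
         * exactly 1 means busy with an empty queue; only load > 1
         * leaves a thread that can actually be moved.
         */
        return (atomic_load(&tdq->load) > 1);
}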