sched_ule: Re-implement stealing on top of runq common code
Stop using internal knowledge of runqueues. Remove duplicate
boilerplate parts.
Concretely, runq_steal() and runq_steal_from() are now implemented on
top of runq_findq().
Besides considerably simplifying the code, this change also brings an
algorithmic improvement: previously, set bits in the runqueue's status
words were found by testing each bit individually in a loop instead of
using ffsl()/bsfl() (except for the first set bit of each status word).
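As an illustration of the difference only (not the actual kernel code),
the two scanning strategies can be sketched as follows. The function
names, the output array and the step counters are hypothetical, and the
GCC/Clang builtin __builtin_ffsl() stands in for ffsl():

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical "old style" scan: test every bit position of every word.
 * Stores the indices of set bits in 'out' (which the caller must size
 * for the worst case) and returns the number of bit tests performed.
 */
static size_t
scan_bit_by_bit(const unsigned long *words, size_t nwords, int *out,
    size_t *nout)
{
	size_t tests = 0;

	*nout = 0;
	for (size_t w = 0; w < nwords; w++) {
		for (int b = 0; b < WORD_BITS; b++) {
			tests++;
			if (words[w] & (1UL << b))
				out[(*nout)++] = (int)w * WORD_BITS + b;
		}
	}
	return (tests);
}

/*
 * Hypothetical "new style" scan: jump directly from one set bit to the
 * next with find-first-set. Returns the number of find-first-set calls,
 * which equals the number of set bits.
 */
static size_t
scan_with_ffs(const unsigned long *words, size_t nwords, int *out,
    size_t *nout)
{
	size_t steps = 0;

	*nout = 0;
	for (size_t w = 0; w < nwords; w++) {
		unsigned long word = words[w];

		while (word != 0) {
			/* ffsl() numbers bits from 1; convert to 0-based. */
			int b = __builtin_ffsl((long)word) - 1;

			steps++;
			out[(*nout)++] = (int)w * WORD_BITS + b;
			word &= word - 1;	/* Clear the lowest set bit. */
		}
	}
	return (steps);
}
```

Both functions report the same bit indices in the same order; the first
performs one test per bit position, the second one step per set bit.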
This change also makes it more apparent that runq_steal_from() treats
the first thread with the highest priority specially (which
runq_steal() does not).
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45388