thread: move nthread management out of tid_alloc
While this adds more work in the single-threaded case, it also enables SMP-related speedups.
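A minimal userspace sketch of the idea, assuming the thread count becomes a standalone C11 atomic counter while TID allocation keeps its own lock. Every name here except tid_alloc is hypothetical (nthreads, max_threads, thread_count_inc, thread_count_dec, tid_lock, thread_create_sketch), and this is not the kernel code. Reserving a slot now costs a separate atomic operation, which is the extra single-threaded work. The limit check no longer needs the TID lock, which is where SMP gains could come from.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical limit and counter; not the kernel's actual variables. */
static const int max_threads = 4;
static atomic_int nthreads = 0;

/* Reserve a thread slot with one atomic op, without taking tid_lock. */
static bool thread_count_inc(void)
{
	int prev = atomic_fetch_add(&nthreads, 1);
	if (prev >= max_threads) {
		/* Over the limit: undo the speculative increment. */
		atomic_fetch_sub(&nthreads, 1);
		return false;
	}
	return true;
}

static void thread_count_dec(void)
{
	int prev = atomic_fetch_sub(&nthreads, 1);
	assert(prev > 0); /* underflow would mean a bookkeeping bug */
	(void)prev;
}

/* tid_alloc now only hands out IDs, serialized by its own lock. */
static pthread_mutex_t tid_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_tid = 100;

static int tid_alloc(void)
{
	pthread_mutex_lock(&tid_lock);
	int tid = next_tid++;
	pthread_mutex_unlock(&tid_lock);
	return tid;
}

/* Thread creation: counting and TID allocation are separate steps. */
static int thread_create_sketch(void)
{
	if (!thread_count_inc())
		return -1;
	return tid_alloc();
}
```

With max_threads at 4, the first four calls to thread_create_sketch() return TIDs 100 through 103. The fifth returns -1 and leaves the count at 4. After one thread_count_dec(), the next call succeeds again.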