Change stack_save_td() to support stack capture for running threads, and
introduce a return value.

Reimplement stack capture for running threads on x86 using
smp_rendezvous instead of an NMI. It has become too difficult to deal
with all of the possible scheduler states, and we cannot capture a stack
while interrupts are disabled anyway.
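A minimal userspace sketch of the control flow described above, assuming a simplified model. The structures, the `sketch_*` names, the `EBUSY` return value and the per-CPU `sketch_curthread[]` table are all hypothetical and are not FreeBSD's definitions. FreeBSD's smp_rendezvous() runs an action function on every CPU together. Here that is modeled as a sequential loop that passes the CPU number explicitly. If the thread is not running, its saved frames are stable and are copied directly. If it is running, a rendezvous action on the target CPU captures the stack only if the thread is still current there. The return value reports whether capture succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel structures; not FreeBSD's definitions. */
#define SKETCH_MAXDEPTH 8
#define SKETCH_NOCPU (-1)
#define SKETCH_NCPU 4

struct sketch_stack {
	int depth;
	unsigned long pcs[SKETCH_MAXDEPTH];
};

struct sketch_thread {
	int oncpu;                 /* CPU the thread runs on, or SKETCH_NOCPU */
	int depth;                 /* number of frames in its call chain */
	unsigned long pcs[SKETCH_MAXDEPTH];
};

/* Simulated per-CPU "current thread" table (NULL means idle). */
static struct sketch_thread *sketch_curthread[SKETCH_NCPU];

struct capture_req {
	struct sketch_thread *td;
	struct sketch_stack *st;
	int cpu;                   /* CPU the thread was seen running on */
	bool captured;
};

static void
copy_frames(struct sketch_stack *st, const struct sketch_thread *td)
{
	st->depth = td->depth;
	memcpy(st->pcs, td->pcs, sizeof(td->pcs[0]) * (size_t)td->depth);
}

/*
 * Action run "on" each CPU during the simulated rendezvous. Only the
 * target CPU does work, and only if the thread is still current there.
 */
static void
capture_action(int cpu, void *arg)
{
	struct capture_req *req = arg;

	if (cpu != req->cpu || sketch_curthread[cpu] != req->td)
		return;
	copy_frames(req->st, req->td);
	req->captured = true;
}

/* Sequential model of a rendezvous: invoke the action for every CPU. */
static void
sketch_rendezvous(void (*action)(int, void *), void *arg)
{
	for (int cpu = 0; cpu < SKETCH_NCPU; cpu++)
		action(cpu, arg);
}

/* Returns 0 on success, or EBUSY (hypothetical) if capture failed. */
static int
sketch_stack_save_td(struct sketch_stack *st, struct sketch_thread *td)
{
	struct capture_req req = { td, st, td->oncpu, false };

	if (td->oncpu == SKETCH_NOCPU) {
		/* Not running: saved frames are stable, read them directly. */
		copy_frames(st, td);
		return (0);
	}
	sketch_rendezvous(capture_action, &req);
	return (req.captured ? 0 : EBUSY);
}
```

In this model a caller must check the return value, since a running thread may have left its CPU before the action ran. Whether a real implementation retries or reports failure in that case is not stated above, and this sketch simply reports it.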