powerpc: micro-optimize cpu_switch()
Since the non-volatile registers are restored at the end of cpu_switchin (of
the new thread) they're free for us to use for our own purposes. Load the
PCB_FLAGS into a non-volatile register so it's preserved across the C
function calls that manage FPU and altivec state. This removes 4 loads from
each file. Might be a trivial performance improvement (~12 clock cycles per
context switch).
MFC after: 3 weeks