Use a temporary array instead of the arg array y[] for calling
kernel_rem_pio2(). This simplifies analysis of aliasing and thus
results in better code for the usual case where kernel_rem_pio2()
is not called. In particular, when __ieee854_rem_pio2[f]() is inlined,
it normally results in y[] being returned in registers. I couldn't
get this to work using the restrict qualifier.
In float precision, this saves 2-3% in most cases on amd64 and i386
(A64) despite it not being inlined in float precision yet. In double
precision, this has high variance, with an average gain of 2% for
amd64 and 0.7% for i386 (but a much larger gain for usual cases) and
some losses.