amd64: get rid of the pessimized bcopy in syscall arg copy
The code was unnecessarily conditionally copying either 5 or 6 args.
It can blindly copy 6, which also means the size is known at compilation
time and the operation can be depessimized.
Note the entire syscall handling code is rather slow.
Tested on Skylake, sample result for getppid (calls/s):
without pti: 7310106 -> 10653569
with pti: 3304843 -> 4148306
Some syscalls (like read) did not note any difference, other have typically
very modest wins.