cpu_fetch_syscall_args becomes jmp-free for the common case (modulo sv_mask, which is going away soon). This bumps the getuid rate on Broadwell from 105mln to 107.5mln.
This pessimizes syscalls with 6 or more arguments, which are a small minority.
The cpu_set_syscall_retval change removes 3 branches in favor of 1 and prevents reloads of ->td_frame.
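
For illustration only, the shape of the cpu_set_syscall_retval change might look roughly like the sketch below. It is not the kernel code: every sk_-prefixed name, the simplified frame layout and the constant values are hypothetical stand-ins, and __builtin_expect assumes GCC or Clang. The frame pointer is loaded into a local once, and success is tested with a single predicted branch ahead of the rare cases.

```c
#include <stdint.h>

/* Hypothetical stand-ins; names and values are assumptions for this sketch. */
#define SK_PSL_C	0x1u	/* carry flag: reports an error to userspace */
#define SK_ERESTART	(-1)
#define SK_EJUSTRETURN	(-2)

struct sk_frame {
	uint64_t rax, rdx, rip, rflags;
	uint64_t insn_len;	/* length of the syscall instruction */
};

struct sk_thread {
	struct sk_frame *frame;
	uint64_t retval[2];
};

static void
sk_set_syscall_retval(struct sk_thread *td, int error)
{
	/* Load the frame pointer once instead of re-reading td->frame. */
	struct sk_frame *frame = td->frame;

	/* Single branch on the common path: the syscall succeeded. */
	if (__builtin_expect(error == 0, 1)) {
		frame->rax = td->retval[0];
		frame->rdx = td->retval[1];
		frame->rflags &= ~(uint64_t)SK_PSL_C;
		return;
	}

	/* Rare paths keep the multi-way dispatch. */
	switch (error) {
	case SK_ERESTART:
		frame->rip -= frame->insn_len;	/* re-execute the syscall */
		break;
	case SK_EJUSTRETURN:
		break;				/* leave the frame untouched */
	default:
		frame->rax = (uint64_t)error;
		frame->rflags |= SK_PSL_C;
		break;
	}
}
```

The success path touches only the local frame pointer, so the compiler has no reason to reload it after each store.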
Note the entire mechanism should be reworked: as it is, there is a function call to amd64_syscall, then 3 indirect calls (fetch, the syscall itself, set). The initial call could instead be an indirect call to a routine which does the fetch/set work and includes the inlined amd64_syscall work.
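
A rough sketch of what such a rework could look like, under stated assumptions: the sk_-prefixed types, the two-entry syscall table and the SK_ENOSYS value are all hypothetical and exist only to make the example self-contained. The per-ABI descriptor exposes one entry point; fetch and set are inlined into it, so the only indirect call left inside is the syscall itself.

```c
#include <stdint.h>

#define SK_NREGS	6
#define SK_ENOSYS	78	/* hypothetical error for an unknown syscall */

struct sk_thread {
	uint64_t regs[SK_NREGS];	/* stand-in for register arguments */
	uint64_t code;			/* syscall number */
	uint64_t retval;		/* value produced by the syscall */
	uint64_t result;		/* stand-in for the return register */
	int	 failed;		/* stand-in for the carry flag */
};

typedef int (*sk_call_t)(struct sk_thread *, uint64_t *);

/* Hypothetical per-ABI descriptor: one entry point, no fetch/set hooks. */
struct sk_abi {
	void (*entry)(struct sk_thread *);
};

static int
sk_getuid(struct sk_thread *td, uint64_t *args)
{
	(void)args;
	td->retval = 1001;
	return (0);
}

static int
sk_always_fails(struct sk_thread *td, uint64_t *args)
{
	(void)td;
	(void)args;
	return (22);
}

static const sk_call_t sk_table[] = { sk_getuid, sk_always_fails };

/* Inlined into the entry routine rather than reached via a pointer. */
static inline void
sk_fetch(struct sk_thread *td, uint64_t *args)
{
	for (int i = 0; i < SK_NREGS; i++)
		args[i] = td->regs[i];
}

static inline void
sk_set(struct sk_thread *td, int error)
{
	td->failed = (error != 0);
	td->result = error == 0 ? td->retval : (uint64_t)error;
}

static void
sk_native_entry(struct sk_thread *td)
{
	uint64_t args[SK_NREGS];
	int error;

	sk_fetch(td, args);
	if (td->code < sizeof(sk_table) / sizeof(sk_table[0]))
		error = sk_table[td->code](td, args);	/* only indirect call left */
	else
		error = SK_ENOSYS;
	sk_set(td, error);
}

static const struct sk_abi sk_native_abi = { .entry = sk_native_entry };
```

The trap handler would then make a single indirect call through the descriptor, e.g. sk_native_abi.entry(td), in place of the direct call followed by three indirect ones.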