The asm version should probably be removed and rewritten in C. It can, however, serve as a template for later copyinstr changes.
Commit message:
amd64: mostly depessimize copystr
- remove a forward branch in the common case
- replace xchg + lodsb/stosb loop with simple movs
A simple test on an Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, copying /foo/bar/baz in a loop,
goes from 295715863 ops/s to 465807408 ops/s.
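
For reference, a minimal sketch of what a C replacement for the asm routine might look like. This is not the committed code: the name copystr_sketch is hypothetical, and the semantics (copy at most len bytes including the terminator, return ENAMETOOLONG if the string does not fit, report the byte count through an optional lencopied pointer) are assumed from the usual copystr contract. The loop is a plain byte-by-byte copy, roughly the "simple movs" shape the commit describes.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical C stand-in for the amd64 assembly copystr.
 * Copies a NUL-terminated string of at most len bytes (terminator
 * included) from src to dst. Returns 0 on success, or ENAMETOOLONG if
 * no terminator was found within len bytes. If lencopied is non-NULL,
 * it receives the number of bytes written, terminator included.
 */
int
copystr_sketch(const void *src, void *dst, size_t len, size_t *lencopied)
{
	const char *s = src;
	char *d = dst;
	size_t i;
	int error = ENAMETOOLONG;

	for (i = 0; i < len; i++) {
		/* Plain load/store per byte; stop once the NUL is copied. */
		if ((d[i] = s[i]) == '\0') {
			i++;		/* count the terminator */
			error = 0;
			break;
		}
	}
	if (lencopied != NULL)
		*lencopied = i;
	return (error);
}
```

Copying "/foo/bar/baz" into a 16-byte buffer with this sketch returns 0 and reports 13 bytes; with a 4-byte limit it returns ENAMETOOLONG after writing 4 bytes.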