- _fault handlers for both primitives are identical, provide just one
- change the copying scheme to match memcpy (in particular jump avoidance for the most common case of a size that is a multiple of 8)
- stop re-reading pcb address on exit, just store it locally (in r9)
This last bit can be applied to all other primitives as well. As a side note, the need to set an onfault handler can be removed altogether by looking up the faulting RIP at page fault time and comparing it against the userspace access routines. I have a prototype which encloses the rep sections with labels for identification.
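To illustrate the RIP lookup idea, here is a minimal sketch in plain C. It is not the prototype: struct uaccess_range, uaccess_fixup_lookup and the table layout are all hypothetical names made up for this example. It assumes each rep section is bracketed by a start and an end label and has a fixup address to resume at when a fault lands inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one labelled user-access region. */
struct uaccess_range {
	uintptr_t start;	/* label placed before the rep section */
	uintptr_t end;		/* label placed after it (exclusive) */
	uintptr_t fixup;	/* where to resume if a fault lands inside */
};

/*
 * Return the fixup address if rip falls inside one of the labelled
 * regions, or 0 if the fault did not come from a user-access routine.
 * A linear scan is assumed to be good enough for a handful of entries.
 */
static uintptr_t
uaccess_fixup_lookup(const struct uaccess_range *tbl, size_t n, uintptr_t rip)
{
	for (size_t i = 0; i < n; i++) {
		/* Sanity check: a region must not be empty or inverted. */
		assert(tbl[i].start < tbl[i].end);
		if (rip >= tbl[i].start && rip < tbl[i].end)
			return (tbl[i].fixup);
	}
	return (0);
}
```

With something along these lines the page fault path would consult the table instead of a per-thread onfault pointer, so the primitives would not have to store and clear the handler on every call.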
Regardless of the above there is more cleanup to be done here.