ARM64 copyinout improvements
The first of set of patches.
Use wider load/stores when aligned buffer is being copied.
In a simple test:
dd if=/dev/zero of=/dev/null bs=1M count=1024
the performance jumped from 410MB/s up to 3.6GB/s.
TODO:
- better handling of unaligned buffers (WiP)
- implement similar mechanism to bzero
Submitted by: Dominik Ermel <der@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Reviewed by: kib, andrew, emaste
Differential Revision: https://reviews.freebsd.org/D5664