
lib/libc/aarch64/string: add memcpy SIMD implementation
ClosedPublic

Authored by getz on Aug 9 2024, 1:18 PM.

Details

Summary

I noticed that the arm-optimized-routines in /contrib already include a
SIMD-optimized memcpy.

This patch makes libc on aarch64 use the SIMD variant instead of the
scalar-optimized variant.
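
For readers unfamiliar with the distinction, here is a minimal C sketch of the difference between a scalar and a SIMD copy loop. It is not the assembly from arm-optimized-routines. The function names are hypothetical, the buffers are assumed not to overlap, and on non-aarch64 hosts the 16-byte step falls back to a plain memcpy so the sketch still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Scalar-style copy (hypothetical helper): one 8-byte word per iteration. */
static void
copy_scalar_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n >= 8) {
        uint64_t w;

        memcpy(&w, s, sizeof(w));   /* 8-byte load */
        memcpy(d, &w, sizeof(w));   /* 8-byte store */
        d += 8;
        s += 8;
        n -= 8;
    }
    while (n > 0) {                 /* byte-wise tail */
        *d++ = *s++;
        n--;
    }
}

/* SIMD-style copy (hypothetical helper): one 16-byte vector per iteration. */
static void
copy_simd_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n >= 16) {
#if defined(__aarch64__)
        vst1q_u8(d, vld1q_u8(s));   /* 128-bit load and store via NEON */
#else
        memcpy(d, s, 16);           /* portable stand-in on other hosts */
#endif
        d += 16;
        s += 16;
        n -= 16;
    }
    while (n > 0) {                 /* byte-wise tail */
        *d++ = *s++;
        n--;
    }
}
```

Both helpers produce the same result; the SIMD version simply moves twice as many bytes per load/store pair. A production memcpy additionally handles small sizes, alignment, and large copies with separate code paths.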

The benchmarks below were generated with fuz's strperf utility.
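
As a rough illustration of what a throughput measurement of this kind involves, the following C sketch times repeated memcpy calls of a fixed chunk size and reports bytes per second. It is not strperf and makes no claim about how strperf measures; the function name `memcpy_throughput` and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Copy roughly `total` bytes in chunks of `chunk` bytes and return the
 * observed rate in bytes per second. Returns 0.0 if allocation fails or
 * the elapsed time is too short to measure.
 */
static double
memcpy_throughput(size_t chunk, size_t total)
{
    struct timespec t0, t1;
    unsigned char *src, *dst;
    volatile unsigned char sink = 0;
    size_t copies, i;
    double secs;

    assert(chunk > 0 && total >= chunk);
    src = malloc(chunk);
    dst = malloc(chunk);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return 0.0;
    }
    memset(src, 0xa5, chunk);
    memset(dst, 0, chunk);          /* touch the pages before timing */

    copies = total / chunk;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < copies; i++) {
        src[0] = (unsigned char)i;  /* make every copy observably distinct */
        memcpy(dst, src, chunk);
        sink = dst[0];              /* keep the compiler from eliding the copy */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    secs = (double)(t1.tv_sec - t0.tv_sec) +
        (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    return secs > 0.0 ? (double)(copies * chunk) / secs : 0.0;
}
```

Calling `memcpy_throughput(64, (size_t)64 << 20)` and `memcpy_throughput(4096, (size_t)64 << 20)` gives per-size rates comparable in spirit to the B/s columns, though a single run like this lacks the repetition and statistics (n=20, p-values) reported in the tables.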

os: FreeBSD
arch: arm64
cpu: ARM Neoverse-V1 r1p1
        │ memcpyScalar │             memcpySIMD              │
        │    sec/op    │   sec/op     vs base                │
64         30.71µ ± 0%   22.47µ ± 1%  -26.83% (p=0.000 n=20)
4k         7.875µ ± 0%   4.069µ ± 0%  -48.33% (p=0.000 n=20)
256k       6.608µ ± 0%   5.126µ ± 0%  -22.43% (p=0.000 n=20)
16m        512.0µ ± 0%   503.0µ ± 0%   -1.75% (p=0.000 n=20)
1g         41.42m ± 0%   39.73m ± 0%   -4.08% (p=0.000 n=20)
geomean    127.7µ        98.70µ       -22.68%

        │ memcpyScalar │              memcpySIMD               │
        │     B/s      │      B/s       vs base                │
64        7.582Gi ± 0%   10.362Gi ± 1%  +36.68% (p=0.000 n=20)
4k        29.57Gi ± 0%    57.22Gi ± 0%  +93.55% (p=0.000 n=20)
256k      35.23Gi ± 0%    45.42Gi ± 0%  +28.91% (p=0.000 n=20)
16m       29.11Gi ± 0%    29.62Gi ± 0%   +1.78% (p=0.000 n=20)
1g        23.02Gi ± 0%    24.00Gi ± 0%   +4.26% (p=0.000 n=20)
geomean   22.12Gi         28.60Gi       +29.33%

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A76 r4p1
        │ memcpyScalar │             memcpySIMD              │
        │    sec/op    │   sec/op     vs base                │
64         51.55µ ± 0%   46.25µ ± 0%  -10.29% (p=0.000 n=20)
4k         9.866µ ± 0%   7.253µ ± 0%  -26.48% (p=0.000 n=20)
256k       7.044µ ± 0%   7.793µ ± 0%  +10.64% (p=0.000 n=20)
16m        3.523m ± 6%   3.707m ± 5%        ~ (p=0.602 n=20)
1g         209.3m ± 1%   211.3m ± 1%   +0.93% (p=0.035 n=20)
geomean    305.1µ        289.9µ        -4.97%

        │ memcpyScalar │              memcpySIMD              │
        │     B/s      │     B/s       vs base                │
64        4.516Gi ± 0%   5.035Gi ± 0%  +11.48% (p=0.000 n=20)
4k        23.60Gi ± 0%   32.10Gi ± 0%  +36.02% (p=0.000 n=20)
256k      33.05Gi ± 0%   29.88Gi ± 0%   -9.62% (p=0.000 n=20)
16m       4.230Gi ± 5%   4.020Gi ± 5%        ~ (p=0.602 n=20)
1g        4.556Gi ± 1%   4.514Gi ± 1%   -0.92% (p=0.035 n=20)
geomean   9.255Gi        9.739Gi        +5.23%

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A78C r0p0
        │ memcpyScalar │             memcpySIMD             │
        │    sec/op    │   sec/op     vs base               │
64         67.58µ ± 0%   64.87µ ± 0%  -4.00% (p=0.000 n=20)
4k         14.42µ ± 0%   14.43µ ± 0%       ~ (p=0.478 n=20)
256k       14.68µ ± 1%   14.76µ ± 1%       ~ (p=0.192 n=20)
16m        1.513m ± 1%   1.500m ± 1%       ~ (p=0.301 n=20)
1g         86.77m ± 2%   87.08m ± 1%       ~ (p=0.640 n=20)
geomean    284.9µ        282.7µ       -0.78%

        │ memcpyScalar │             memcpySIMD              │
        │     B/s      │     B/s       vs base               │
64        3.445Gi ± 0%   3.589Gi ± 0%  +4.17% (p=0.000 n=20)
4k        16.15Gi ± 0%   16.14Gi ± 0%       ~ (p=0.478 n=20)
256k      15.86Gi ± 1%   15.77Gi ± 1%       ~ (p=0.192 n=20)
16m       9.850Gi ± 1%   9.931Gi ± 1%       ~ (p=0.301 n=20)
1g        10.99Gi ± 2%   10.95Gi ± 1%       ~ (p=0.640 n=20)
geomean   9.909Gi        9.987Gi       +0.78%
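
The "vs base" columns are relative changes, and the sec/op and B/s tables are two views of the same measurement: a negative change in sec/op means the SIMD variant is faster, and the matching B/s change follows from inverting the time ratio. A small C sketch of that arithmetic, with hypothetical helper names and inputs taken from the rounded table values:

```c
#include <assert.h>

/* Relative change of `value` against `base`, in percent. */
static double
pct_change(double base, double value)
{
    assert(base != 0.0);
    return (value - base) / base * 100.0;
}

/*
 * Convert a percent change in time per operation into the equivalent
 * percent change in throughput: throughput scales with 1 / time.
 */
static double
throughput_pct_from_time_pct(double time_pct)
{
    double time_ratio = 1.0 + time_pct / 100.0;

    assert(time_ratio > 0.0);
    return (1.0 / time_ratio - 1.0) * 100.0;
}
```

For the Neoverse-V1 64-byte row, `pct_change(30.71, 22.47)` gives about -26.83, matching the table. For the 4k row, `throughput_pct_from_time_pct(-48.33)` gives about +93.5, in line with the reported +93.55%; small differences in the last digit arise because the tables are computed from unrounded measurements.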

Test Plan

No regressions were observed in the test suite; all tests pass.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 58975
Build 55862: arc lint + arc unit