This is a port of the scalar optimized variant of strspn for amd64 to
aarch64. It uses a lookup table (LUT) to speed up the function. A SIMD
variant is still under development.
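
For illustration, the LUT approach can be sketched in C roughly as
follows. This is a minimal sketch, not the committed aarch64 assembly:
the name strspn_lut is hypothetical, and the 256-entry byte table is an
assumption about the general technique rather than the exact table
layout used in the port.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical illustration of a LUT-based strspn; not the libc symbol
 * and not the committed assembly.
 */
size_t
strspn_lut(const char *s, const char *set)
{
	/* One flag per possible byte value, all initially false. */
	bool table[256] = { false };
	const unsigned char *p;
	size_t i;

	/*
	 * Mark every byte that occurs in the accept set.  The entry for
	 * NUL is never set, so the scan below stops at the end of s.
	 */
	for (p = (const unsigned char *)set; *p != '\0'; p++)
		table[*p] = true;

	/* Count leading bytes of s whose table entry is set. */
	for (i = 0; table[(unsigned char)s[i]]; i++)
		;

	return (i);
}
```

Building the table costs one pass over the set, after which each byte of
the string needs only a single table load, independent of the set
length.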

Performance benchmarks are, as usual, generated by strperf:

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A76 r4p1
        │ strspnScalar │             strspnSIMD              │
        │    sec/op    │   sec/op     vs base                │
Short1      240.3µ ± 1%     152.0µ ± 0%   -36.73% (p=0.000 n=20)
Mid1       146.06µ ± 0%     83.33µ ± 0%   -42.95% (p=0.000 n=20)
Long1      112.77µ ± 0%     70.46µ ± 1%   -37.52% (p=0.000 n=20)
Short5      298.5µ ± 2%     253.9µ ± 0%   -14.94% (p=0.000 n=20)
Mid5        161.7µ ± 1%     110.4µ ± 0%   -31.77% (p=0.000 n=20)
Long5      112.75µ ± 0%     68.38µ ± 1%   -39.35% (p=0.000 n=20)
Short20     527.1µ ± 1%     514.8µ ± 0%    -2.33% (p=0.000 n=20)
Mid20       227.0µ ± 1%     180.3µ ± 0%   -20.59% (p=0.000 n=20)
Long20     112.78µ ± 0%     68.29µ ± 2%   -39.45% (p=0.000 n=20)
Short40     840.1µ ± 1%     869.8µ ± 1%    +3.53% (p=0.000 n=20)
Mid40       314.7µ ± 1%     275.6µ ± 0%   -12.42% (p=0.000 n=20)
Long40     112.82µ ± 0%     68.13µ ± 2%   -39.61% (p=0.000 n=20)
geomean     212.9µ          153.9µ        -27.70%

        │ strspnScalar │             strspnSIMD              │
        │     B/s      │     B/s      vs base                │
Short1     496.1Mi ± 1%    784.1Mi ± 0%   +58.06% (p=0.000 n=20)
Mid1       816.2Mi ± 0%   1430.6Mi ± 0%   +75.28% (p=0.000 n=20)
Long1      1.032Gi ± 0%    1.652Gi ± 1%   +60.04% (p=0.000 n=20)
Short5     399.4Mi ± 2%    469.5Mi ± 0%   +17.57% (p=0.000 n=20)
Mid5       737.0Mi ± 1%   1080.2Mi ± 0%   +46.56% (p=0.000 n=20)
Long5      1.033Gi ± 0%    1.702Gi ± 1%   +64.88% (p=0.000 n=20)
Short20    226.2Mi ± 1%    231.5Mi ± 0%    +2.38% (p=0.000 n=20)
Mid20      525.1Mi ± 1%    661.2Mi ± 0%   +25.92% (p=0.000 n=20)
Long20     1.032Gi ± 0%    1.705Gi ± 2%   +65.15% (p=0.000 n=20)
Short40    141.9Mi ± 1%    137.1Mi ± 1%    -3.41% (p=0.000 n=20)
Mid40      378.8Mi ± 1%    432.5Mi ± 0%   +14.18% (p=0.000 n=20)
Long40     1.032Gi ± 0%    1.709Gi ± 2%   +65.60% (p=0.000 n=20)
geomean    559.9Mi         774.4Mi        +38.30%

os: FreeBSD
arch: arm64
cpu: ARM Neoverse-V1 r1p1
        │ strspnScalar │             strspnSIMD              │
        │    sec/op    │   sec/op     vs base                │
Short1      177.2µ ± 1%     112.2µ ± 1%   -36.67% (p=0.000 n=20)
Mid1       100.87µ ± 0%     48.44µ ± 0%   -51.97% (p=0.000 n=20)
Long1       90.16µ ± 0%     50.55µ ± 0%   -43.93% (p=0.000 n=20)
Short5      231.8µ ± 1%     181.6µ ± 2%   -21.65% (p=0.000 n=20)
Mid5       115.49µ ± 2%     60.70µ ± 2%   -47.44% (p=0.000 n=20)
Long5       90.16µ ± 0%     36.59µ ± 0%   -59.41% (p=0.000 n=20)
Short20     426.9µ ± 0%     436.4µ ± 3%         ~ (p=0.149 n=20)
Mid20       168.0µ ± 0%     126.0µ ± 0%   -24.99% (p=0.000 n=20)
Long20      90.17µ ± 0%     36.59µ ± 0%   -59.43% (p=0.000 n=20)
Short40     702.9µ ± 0%     758.4µ ± 1%    +7.89% (p=0.000 n=20)
Mid40       244.3µ ± 0%     213.4µ ± 0%   -12.65% (p=0.000 n=20)
Long40      90.22µ ± 0%     36.68µ ± 1%   -59.34% (p=0.000 n=20)
geomean     164.4µ          102.4µ        -37.73%

        │ strspnScalar │             strspnSIMD              │
        │    MiB/s     │    MiB/s     vs base                │
Short1       705.3 ± 1%     1113.6 ± 1%   +57.89% (p=0.000 n=20)
Mid1        1.239k ± 0%     2.580k ± 0%  +108.21% (p=0.000 n=20)
Long1       1.386k ± 0%     2.473k ± 0%   +78.36% (p=0.000 n=20)
Short5       539.3 ± 1%      688.4 ± 2%   +27.64% (p=0.000 n=20)
Mid5        1.082k ± 2%     2.059k ± 2%   +90.28% (p=0.000 n=20)
Long5       1.386k ± 0%     3.416k ± 0%  +146.39% (p=0.000 n=20)
Short20      292.8 ± 0%      286.4 ± 3%         ~ (p=0.149 n=20)
Mid20        744.2 ± 0%      992.2 ± 0%   +33.31% (p=0.000 n=20)
Long20      1.386k ± 0%     3.416k ± 0%  +146.46% (p=0.000 n=20)
Short40      177.8 ± 0%      164.8 ± 1%    -7.31% (p=0.000 n=20)
Mid40        511.7 ± 0%      585.7 ± 0%   +14.48% (p=0.000 n=20)
Long40      1.385k ± 0%     3.407k ± 1%  +145.95% (p=0.000 n=20)
geomean      760.4          1.221k        +60.59%

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A78C r0p0
        │ strspnScalar │             strspnSIMD              │
        │    sec/op    │   sec/op     vs base                │
Short1      341.0µ ± 0%     223.6µ ± 0%   -34.42% (p=0.000 n=20)
Mid1       201.66µ ± 1%     99.97µ ± 1%   -50.43% (p=0.000 n=20)
Long1      169.75µ ± 0%     91.16µ ± 0%   -46.30% (p=0.000 n=20)
Short5      413.9µ ± 0%     294.1µ ± 1%   -28.95% (p=0.000 n=20)
Mid5        220.5µ ± 0%     117.2µ ± 1%   -46.86% (p=0.000 n=20)
Long5      169.73µ ± 0%     75.27µ ± 0%   -55.65% (p=0.000 n=20)
Short20     784.7µ ± 0%     352.3µ ± 1%   -55.11% (p=0.000 n=20)
Mid20       323.5µ ± 0%     135.3µ ± 0%   -58.16% (p=0.000 n=20)
Long20     169.79µ ± 0%     74.71µ ± 0%   -56.00% (p=0.000 n=20)
Short40    1288.4µ ± 0%     432.8µ ± 1%   -66.41% (p=0.000 n=20)
Mid40       467.8µ ± 0%     160.1µ ± 0%   -65.78% (p=0.000 n=20)
Long40     169.84µ ± 0%     74.77µ ± 0%   -55.97% (p=0.000 n=20)
geomean     310.3µ          146.5µ        -52.80%

        │ strspnScalar │             strspnSIMD              │
        │     B/s      │     B/s      vs base                │
Short1     349.6Mi ± 0%    533.1Mi ± 0%   +52.49% (p=0.000 n=20)
Mid1       591.1Mi ± 1%   1192.5Mi ± 1%  +101.73% (p=0.000 n=20)
Long1      702.3Mi ± 0%   1307.8Mi ± 0%   +86.22% (p=0.000 n=20)
Short5     288.0Mi ± 0%    405.3Mi ± 1%   +40.74% (p=0.000 n=20)
Mid5       540.5Mi ± 0%   1017.2Mi ± 1%   +88.19% (p=0.000 n=20)
Long5      702.3Mi ± 0%   1583.7Mi ± 0%  +125.49% (p=0.000 n=20)
Short20    151.9Mi ± 0%    338.4Mi ± 1%  +122.76% (p=0.000 n=20)
Mid20      368.5Mi ± 0%    880.9Mi ± 0%  +139.02% (p=0.000 n=20)
Long20     702.1Mi ± 0%   1595.7Mi ± 0%  +127.28% (p=0.000 n=20)
Short40    92.53Mi ± 0%   275.44Mi ± 1%  +197.69% (p=0.000 n=20)
Mid40      254.8Mi ± 0%    744.7Mi ± 0%  +192.24% (p=0.000 n=20)
Long40     701.9Mi ± 0%   1594.3Mi ± 0%  +127.14% (p=0.000 n=20)
geomean    384.1Mi         813.9Mi       +111.87%