This is a port of the scalar optimized variant of strspn from amd64 to
aarch64. It uses a lookup table (LUT) to speed up the function. A SIMD
variant is still under development.
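
For readers unfamiliar with the technique, here is a minimal C sketch of a LUT-based strspn. It only illustrates the general idea and is not the assembly in this change: the name `strspn_lut` is hypothetical, and the table layout (one `bool` per byte value) is an assumption chosen for clarity.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative LUT-based strspn. The name strspn_lut and the
 * 256-entry bool table are assumptions made for this sketch.
 */
size_t
strspn_lut(const char *s, const char *accept)
{
	bool table[256] = { false };
	const unsigned char *p;
	size_t n;

	/*
	 * Mark every byte that occurs in the accept set. The NUL
	 * terminator is never marked, so it also ends the scan below.
	 */
	for (p = (const unsigned char *)accept; *p != '\0'; p++)
		table[*p] = true;

	/* One table load per input byte, independent of the set size. */
	p = (const unsigned char *)s;
	for (n = 0; table[p[n]]; n++)
		;

	return (n);
}
```

Building the table costs time proportional to the set length, after which each input byte needs a single lookup, so the approach tends to pay off most on longer strings.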
Performance benchmarks are, as usual, generated by strperf:
```
os: FreeBSD
arch: arm64
cpu: ARM Cortex-A76 r4p1
│ strspnScalar │ strspnSIMD │
│ sec/op │ sec/op vs base │
Short1 240.3µ ± 1% 152.0µ ± 0% -36.73% (p=0.000 n=20)
Mid1 146.06µ ± 0% 83.33µ ± 0% -42.95% (p=0.000 n=20)
Long1 112.77µ ± 0% 70.46µ ± 1% -37.52% (p=0.000 n=20)
Short5 298.5µ ± 2% 253.9µ ± 0% -14.94% (p=0.000 n=20)
Mid5 161.7µ ± 1% 110.4µ ± 0% -31.77% (p=0.000 n=20)
Long5 112.75µ ± 0% 68.38µ ± 1% -39.35% (p=0.000 n=20)
Short20 527.1µ ± 1% 514.8µ ± 0% -2.33% (p=0.000 n=20)
Mid20 227.0µ ± 1% 180.3µ ± 0% -20.59% (p=0.000 n=20)
Long20 112.78µ ± 0% 68.29µ ± 2% -39.45% (p=0.000 n=20)
Short40 840.1µ ± 1% 869.8µ ± 1% +3.53% (p=0.000 n=20)
Mid40 314.7µ ± 1% 275.6µ ± 0% -12.42% (p=0.000 n=20)
Long40 112.82µ ± 0% 68.13µ ± 2% -39.61% (p=0.000 n=20)
geomean 212.9µ 153.9µ -27.70%
│ strspnScalar │ strspnSIMD │
│ B/s │ B/s vs base │
Short1 496.1Mi ± 1% 784.1Mi ± 0% +58.06% (p=0.000 n=20)
Mid1 816.2Mi ± 0% 1430.6Mi ± 0% +75.28% (p=0.000 n=20)
Long1 1.032Gi ± 0% 1.652Gi ± 1% +60.04% (p=0.000 n=20)
Short5 399.4Mi ± 2% 469.5Mi ± 0% +17.57% (p=0.000 n=20)
Mid5 737.0Mi ± 1% 1080.2Mi ± 0% +46.56% (p=0.000 n=20)
Long5 1.033Gi ± 0% 1.702Gi ± 1% +64.88% (p=0.000 n=20)
Short20 226.2Mi ± 1% 231.5Mi ± 0% +2.38% (p=0.000 n=20)
Mid20 525.1Mi ± 1% 661.2Mi ± 0% +25.92% (p=0.000 n=20)
Long20 1.032Gi ± 0% 1.705Gi ± 2% +65.15% (p=0.000 n=20)
Short40 141.9Mi ± 1% 137.1Mi ± 1% -3.41% (p=0.000 n=20)
Mid40 378.8Mi ± 1% 432.5Mi ± 0% +14.18% (p=0.000 n=20)
Long40 1.032Gi ± 0% 1.709Gi ± 2% +65.60% (p=0.000 n=20)
geomean 559.9Mi 774.4Mi +38.30%
os: FreeBSD
arch: arm64
cpu: ARM Neoverse-V1 r1p1
│ strcspnScalar │ strcspnSIMD │
│ sec/op │ sec/op vs base │
Short0 172.46µ ± 1% 96.38µ ± 0% -44.12% (p=0.000 n=20)
Mid0 97.96µ ± 0% 25.10µ ± 2% -74.37% (p=0.000 n=20)
Long0 90.099µ ± 0% 3.031µ ± 1% -96.64% (p=0.000 n=20)
Short1 178.9µ ± 1% 130.3µ ± 0% -27.15% (p=0.000 n=20)
Mid1 100.51µ ± 1% 31.66µ ± 0% -68.50% (p=0.000 n=20)
Long1 90.110µ ± 0% 4.536µ ± 0% -94.97% (p=0.000 n=20)
Short5 229.0µ ± 1% 199.5µ ± 0% -12.86% (p=0.000 n=20)
Mid5 113.85µ ± 0% 65.27µ ± 1% -42.67% (p=0.000 n=20)
Long5 90.14µ ± 0% 36.74µ ± 0% -59.24% (p=0.000 n=20)
Short20 397.3µ ± 0% 459.1µ ± 1% +15.55% (p=0.000 n=20)
Mid20 163.4µ ± 0% 132.7µ ± 1% -18.78% (p=0.000 n=20)
Long20 90.16µ ± 0% 36.96µ ± 1% -59.01% (p=0.000 n=20)
Short40 638.1µ ± 0% 790.6µ ± 0% +23.91% (p=0.000 n=20)
Mid40 238.6µ ± 0% 222.4µ ± 1% -6.80% (p=0.000 n=20)
Long40 90.19µ ± 0% 36.96µ ± 0% -59.02% (p=0.000 n=20)
geomean 150.6µ 62.93µ -58.22%
│ strcspnScalar │ strcspnSIMD │
│ MiB/s │ MiB/s vs base │
Short0 724.8 ± 1% 1297.0 ± 0% +78.94% (p=0.000 n=20)
Mid0 1.276k ± 0% 4.980k ± 2% +290.25% (p=0.000 n=20)
Long0 1.387k ± 0% 41.238k ± 1% +2872.41% (p=0.000 n=20)
Short1 698.8 ± 1% 959.2 ± 0% +37.27% (p=0.000 n=20)
Mid1 1.244k ± 1% 3.948k ± 0% +217.45% (p=0.000 n=20)
Long1 1.387k ± 0% 27.557k ± 0% +1886.50% (p=0.000 n=20)
Short5 545.9 ± 1% 626.5 ± 0% +14.76% (p=0.000 n=20)
Mid5 1.098k ± 0% 1.915k ± 1% +74.43% (p=0.000 n=20)
Long5 1.387k ± 0% 3.402k ± 0% +145.35% (p=0.000 n=20)
Short20 314.6 ± 0% 272.3 ± 1% -13.46% (p=0.000 n=20)
Mid20 765.2 ± 0% 942.1 ± 1% +23.12% (p=0.000 n=20)
Long20 1.386k ± 0% 3.382k ± 1% +143.94% (p=0.000 n=20)
Short40 195.9 ± 0% 158.1 ± 0% -19.29% (p=0.000 n=20)
Mid40 523.9 ± 0% 562.1 ± 1% +7.29% (p=0.000 n=20)
Long40 1.386k ± 0% 3.382k ± 0% +144.03% (p=0.000 n=20)
geomean 829.9 1.986k +139.35%
os: FreeBSD
arch: arm64
cpu: ARM Cortex-A78C r0p0
│ strspnScalar │ strspnSIMD │
│ sec/op │ sec/op vs base │
Short1 341.0µ ± 0% 223.6µ ± 0% -34.42% (p=0.000 n=20)
Mid1 201.66µ ± 1% 99.97µ ± 1% -50.43% (p=0.000 n=20)
Long1 169.75µ ± 0% 91.16µ ± 0% -46.30% (p=0.000 n=20)
Short5 413.9µ ± 0% 294.1µ ± 1% -28.95% (p=0.000 n=20)
Mid5 220.5µ ± 0% 117.2µ ± 1% -46.86% (p=0.000 n=20)
Long5 169.73µ ± 0% 75.27µ ± 0% -55.65% (p=0.000 n=20)
Short20 784.7µ ± 0% 352.3µ ± 1% -55.11% (p=0.000 n=20)
Mid20 323.5µ ± 0% 135.3µ ± 0% -58.16% (p=0.000 n=20)
Long20 169.79µ ± 0% 74.71µ ± 0% -56.00% (p=0.000 n=20)
Short40 1288.4µ ± 0% 432.8µ ± 1% -66.41% (p=0.000 n=20)
Mid40 467.8µ ± 0% 160.1µ ± 0% -65.78% (p=0.000 n=20)
Long40 169.84µ ± 0% 74.77µ ± 0% -55.97% (p=0.000 n=20)
geomean 310.3µ 146.5µ -52.80%
│ strspnScalar │ strspnSIMD │
│ B/s │ B/s vs base │
Short1 349.6Mi ± 0% 533.1Mi ± 0% +52.49% (p=0.000 n=20)
Mid1 591.1Mi ± 1% 1192.5Mi ± 1% +101.73% (p=0.000 n=20)
Long1 702.3Mi ± 0% 1307.8Mi ± 0% +86.22% (p=0.000 n=20)
Short5 288.0Mi ± 0% 405.3Mi ± 1% +40.74% (p=0.000 n=20)
Mid5 540.5Mi ± 0% 1017.2Mi ± 1% +88.19% (p=0.000 n=20)
Long5 702.3Mi ± 0% 1583.7Mi ± 0% +125.49% (p=0.000 n=20)
Short20 151.9Mi ± 0% 338.4Mi ± 1% +122.76% (p=0.000 n=20)
Mid20 368.5Mi ± 0% 880.9Mi ± 0% +139.02% (p=0.000 n=20)
Long20 702.1Mi ± 0% 1595.7Mi ± 0% +127.28% (p=0.000 n=20)
Short40 92.53Mi ± 0% 275.44Mi ± 1% +197.69% (p=0.000 n=20)
Mid40 254.8Mi ± 0% 744.7Mi ± 0% +192.24% (p=0.000 n=20)
Long40 701.9Mi ± 0% 1594.3Mi ± 0% +127.14% (p=0.000 n=20)
geomean 384.1Mi 813.9Mi +111.87%
```