This is a port of the scalar optimized variant of strcspn for amd64 to
aarch64. It uses a lookup table (LUT) to speed up the function. A SIMD
variant is still under development.
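
For readers unfamiliar with the technique, the LUT idea can be sketched in
portable C roughly as follows. This is only an illustration under stated
assumptions, not the committed assembly: the name strcspn_lut and the
256-entry bool table are hypothetical, chosen here for clarity.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative sketch only. strcspn_lut is a hypothetical name, and the
 * table layout is an assumption, not the layout used by the port.
 */
size_t
strcspn_lut(const char *s, const char *set)
{
	bool reject[256] = { false };
	const unsigned char *p;

	/* NUL always stops the scan, so mark it up front. */
	reject[0] = true;

	/* Mark every byte that occurs in the set. */
	for (p = (const unsigned char *)set; *p != '\0'; p++)
		reject[*p] = true;

	/* One table lookup per input byte instead of a scan of the set. */
	for (p = (const unsigned char *)s; !reject[*p]; p++)
		;

	return ((size_t)(p - (const unsigned char *)s));
}
```

Building the table costs one pass over the set, after which each input
byte is classified with a single load regardless of the set's length.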

Performance benchmarks are, as usual, generated by strperf:

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A76 r4p1
        │ strcspnScalar │              strcspnSIMD               │
        │    sec/op     │      sec/op       vs base              │
Short0      241.7µ ± 0%      131.7µ ± 0%    -45.49% (p=0.000 n=20)
Mid0       145.39µ ± 0%      39.67µ ± 0%    -72.71% (p=0.000 n=20)
Long0     113.487µ ± 0%      4.438µ ± 0%    -96.09% (p=0.000 n=20)
Short1      246.2µ ± 0%      144.9µ ± 0%    -41.14% (p=0.000 n=20)
Mid1       146.43µ ± 0%      47.50µ ± 0%    -67.56% (p=0.000 n=20)
Long1     113.478µ ± 0%      6.594µ ± 0%    -94.19% (p=0.000 n=20)
Short5      297.5µ ± 0%      276.6µ ± 1%     -7.04% (p=0.000 n=20)
Mid5        161.6µ ± 0%      116.4µ ± 1%    -27.98% (p=0.000 n=20)
Long5      113.67µ ± 0%      68.12µ ± 1%    -40.07% (p=0.000 n=20)
Short20     522.0µ ± 0%      534.1µ ± 0%     +2.32% (p=0.000 n=20)
Mid20       225.9µ ± 0%      185.8µ ± 0%    -17.77% (p=0.000 n=20)
Long20     113.70µ ± 0%      68.21µ ± 0%    -40.00% (p=0.000 n=20)
Short40     828.1µ ± 0%      899.3µ ± 0%     +8.60% (p=0.000 n=20)
Mid40       312.4µ ± 0%      284.1µ ± 0%     -9.06% (p=0.000 n=20)
Long40     113.74µ ± 0%      68.28µ ± 1%    -39.96% (p=0.000 n=20)
geomean     200.9µ           91.70µ         -54.37%

        │ strcspnScalar │              strcspnSIMD               │
        │      B/s      │       B/s         vs base              │
Short0     493.3Mi ± 0%     904.9Mi ± 0%    +83.45% (p=0.000 n=20)
Mid0       819.9Mi ± 0%    3004.7Mi ± 0%   +266.47% (p=0.000 n=20)
Long0      1.026Gi ± 0%    26.231Gi ± 0%  +2457.08% (p=0.000 n=20)
Short1     484.2Mi ± 0%     822.6Mi ± 0%    +69.90% (p=0.000 n=20)
Mid1       814.1Mi ± 0%    2509.8Mi ± 0%   +208.30% (p=0.000 n=20)
Long1      1.026Gi ± 0%    17.654Gi ± 0%  +1620.84% (p=0.000 n=20)
Short5     400.7Mi ± 0%     431.0Mi ± 1%     +7.58% (p=0.000 n=20)
Mid5       737.7Mi ± 0%    1024.4Mi ± 1%    +38.86% (p=0.000 n=20)
Long5      1.024Gi ± 0%     1.709Gi ± 1%    +66.87% (p=0.000 n=20)
Short20    228.4Mi ± 0%     223.2Mi ± 0%     -2.26% (p=0.000 n=20)
Mid20      527.6Mi ± 0%     641.6Mi ± 0%    +21.60% (p=0.000 n=20)
Long20     1.024Gi ± 0%     1.707Gi ± 0%    +66.68% (p=0.000 n=20)
Short40    144.0Mi ± 0%     132.6Mi ± 0%     -7.92% (p=0.000 n=20)
Mid40      381.6Mi ± 0%     419.7Mi ± 0%     +9.97% (p=0.000 n=20)
Long40     1.024Gi ± 0%     1.705Gi ± 1%    +66.57% (p=0.000 n=20)
geomean    593.2Mi          1.270Gi        +119.14%

os: FreeBSD
arch: arm64
cpu: ARM Neoverse-V1 r1p1
        │ strcspnScalar │              strcspnSIMD               │
        │    sec/op     │      sec/op       vs base              │
Short0     172.46µ ± 1%      96.38µ ± 0%    -44.12% (p=0.000 n=20)
Mid0        97.96µ ± 0%      25.10µ ± 2%    -74.37% (p=0.000 n=20)
Long0      90.099µ ± 0%      3.031µ ± 1%    -96.64% (p=0.000 n=20)
Short1      178.9µ ± 1%      130.3µ ± 0%    -27.15% (p=0.000 n=20)
Mid1       100.51µ ± 1%      31.66µ ± 0%    -68.50% (p=0.000 n=20)
Long1      90.110µ ± 0%      4.536µ ± 0%    -94.97% (p=0.000 n=20)
Short5      229.0µ ± 1%      199.5µ ± 0%    -12.86% (p=0.000 n=20)
Mid5       113.85µ ± 0%      65.27µ ± 1%    -42.67% (p=0.000 n=20)
Long5       90.14µ ± 0%      36.74µ ± 0%    -59.24% (p=0.000 n=20)
Short20     397.3µ ± 0%      459.1µ ± 1%    +15.55% (p=0.000 n=20)
Mid20       163.4µ ± 0%      132.7µ ± 1%    -18.78% (p=0.000 n=20)
Long20      90.16µ ± 0%      36.96µ ± 1%    -59.01% (p=0.000 n=20)
Short40     638.1µ ± 0%      790.6µ ± 0%    +23.91% (p=0.000 n=20)
Mid40       238.6µ ± 0%      222.4µ ± 1%     -6.80% (p=0.000 n=20)
Long40      90.19µ ± 0%      36.96µ ± 0%    -59.02% (p=0.000 n=20)
geomean     150.6µ           62.93µ         -58.22%

        │ strcspnScalar │              strcspnSIMD               │
        │     MiB/s     │      MiB/s        vs base              │
Short0       724.8 ± 1%      1297.0 ± 0%    +78.94% (p=0.000 n=20)
Mid0        1.276k ± 0%      4.980k ± 2%   +290.25% (p=0.000 n=20)
Long0       1.387k ± 0%     41.238k ± 1%  +2872.41% (p=0.000 n=20)
Short1       698.8 ± 1%       959.2 ± 0%    +37.27% (p=0.000 n=20)
Mid1        1.244k ± 1%      3.948k ± 0%   +217.45% (p=0.000 n=20)
Long1       1.387k ± 0%     27.557k ± 0%  +1886.50% (p=0.000 n=20)
Short5       545.9 ± 1%       626.5 ± 0%    +14.76% (p=0.000 n=20)
Mid5        1.098k ± 0%      1.915k ± 1%    +74.43% (p=0.000 n=20)
Long5       1.387k ± 0%      3.402k ± 0%   +145.35% (p=0.000 n=20)
Short20      314.6 ± 0%       272.3 ± 1%    -13.46% (p=0.000 n=20)
Mid20        765.2 ± 0%       942.1 ± 1%    +23.12% (p=0.000 n=20)
Long20      1.386k ± 0%      3.382k ± 1%   +143.94% (p=0.000 n=20)
Short40      195.9 ± 0%       158.1 ± 0%    -19.29% (p=0.000 n=20)
Mid40        523.9 ± 0%       562.1 ± 1%     +7.29% (p=0.000 n=20)
Long40      1.386k ± 0%      3.382k ± 0%   +144.03% (p=0.000 n=20)
geomean      829.9           1.986k        +139.35%

os: FreeBSD
arch: arm64
cpu: ARM Cortex-A78C r0p0
        │ strcspnScalar │              strcspnSIMD               │
        │    sec/op     │      sec/op       vs base              │
Short0      335.0µ ± 0%      174.1µ ± 0%    -48.03% (p=0.000 n=20)
Mid0       199.72µ ± 1%      53.89µ ± 0%    -73.02% (p=0.000 n=20)
Long0     169.648µ ± 0%      5.949µ ± 0%    -96.49% (p=0.000 n=20)
Short1      339.7µ ± 0%      231.1µ ± 0%    -31.97% (p=0.000 n=20)
Mid1       200.14µ ± 0%      68.14µ ± 0%    -65.95% (p=0.000 n=20)
Long1      169.65µ ± 0%      10.07µ ± 0%    -94.06% (p=0.000 n=20)
Short5      389.4µ ± 0%      304.6µ ± 1%    -21.78% (p=0.000 n=20)
Mid5        215.5µ ± 0%      123.6µ ± 1%    -42.67% (p=0.000 n=20)
Long5      169.64µ ± 0%      75.37µ ± 0%    -55.57% (p=0.000 n=20)
Short20     686.7µ ± 0%      356.2µ ± 1%    -48.12% (p=0.000 n=20)
Mid20       314.8µ ± 0%      136.2µ ± 0%    -56.73% (p=0.000 n=20)
Long20     169.66µ ± 0%      75.30µ ± 0%    -55.62% (p=0.000 n=20)
Short40    1187.6µ ± 0%      440.5µ ± 1%    -62.91% (p=0.000 n=20)
Mid40       458.0µ ± 0%      161.0µ ± 0%    -64.85% (p=0.000 n=20)
Long40     169.79µ ± 0%      75.27µ ± 0%    -55.67% (p=0.000 n=20)
geomean     284.0µ           95.35µ         -66.43%

        │ strcspnScalar │              strcspnSIMD               │
        │      B/s      │       B/s         vs base              │
Short0     355.8Mi ± 0%     684.7Mi ± 0%    +92.42% (p=0.000 n=20)
Mid0       596.9Mi ± 1%    2211.9Mi ± 0%   +270.58% (p=0.000 n=20)
Long0      702.7Mi ± 0%   20038.6Mi ± 0%  +2751.73% (p=0.000 n=20)
Short1     350.9Mi ± 0%     515.8Mi ± 0%    +47.00% (p=0.000 n=20)
Mid1       595.6Mi ± 0%    1749.4Mi ± 0%   +193.70% (p=0.000 n=20)
Long1      702.7Mi ± 0%   11833.8Mi ± 0%  +1584.08% (p=0.000 n=20)
Short5     306.1Mi ± 0%     391.3Mi ± 1%    +27.84% (p=0.000 n=20)
Mid5       553.1Mi ± 0%     964.8Mi ± 1%    +74.43% (p=0.000 n=20)
Long5      702.7Mi ± 0%    1581.6Mi ± 0%   +125.08% (p=0.000 n=20)
Short20    173.6Mi ± 0%     334.7Mi ± 1%    +92.77% (p=0.000 n=20)
Mid20      378.7Mi ± 0%     875.1Mi ± 0%   +131.08% (p=0.000 n=20)
Long20     702.6Mi ± 0%    1583.1Mi ± 0%   +125.31% (p=0.000 n=20)
Short40    100.4Mi ± 0%     270.6Mi ± 1%   +169.61% (p=0.000 n=20)
Mid40      260.3Mi ± 0%     740.5Mi ± 0%   +184.52% (p=0.000 n=20)
Long40     702.1Mi ± 0%    1583.8Mi ± 0%   +125.58% (p=0.000 n=20)
geomean    419.7Mi          1.221Gi        +197.86%