string: add strrchr scalar, baseline implementation
Closed · Public

Authored by fuz on Oct 16 2023, 1:08 AM.

Details

Reviewers

mjg
kib

Commits

rGc105cd8426bb: share/man/man7/simd.7: document strrchr scalar, baseline implementation
rG9b1a851e1ed0: lib/libc/amd64/string: add strrchr scalar, baseline implementation
rG2ed514a220ed: lib/libc/amd64/string: add strrchr scalar, baseline implementation
rGdd1c2e887c1f: share/man/man7/simd.7: document strrchr scalar, baseline implementation

Summary

The baseline implementation is very straightforward,
while the scalar implementation suffers from register pressure
and the need to use SWAR techniques similar to those used for
strchr().
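
For readers unfamiliar with SWAR ("SIMD within a register"), the idea can be sketched in portable C. This is an illustration only, not the assembly from this revision: the names swar_strrchr, load8, zero_bytes and highest_byte are hypothetical, and the sketch sidesteps alignment handling by assuming the string lives in a buffer that stays readable, 8 bytes at a time, through the chunk containing the terminator (for example a zero-padded array).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL
#define LOW7 0x7f7f7f7f7f7f7f7fULL

/* Assemble 8 bytes into a word with byte i at bits 8i..8i+7.
 * ASSUMPTION: p[0..7] is readable (caller pads the buffer). */
static uint64_t load8(const char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)(unsigned char)p[i] << (8 * i);
    return w;
}

/* Exact per-byte zero test: bit 8i+7 is set iff byte i of x is 0.
 * The add cannot carry between bytes, so there are no false positives. */
static uint64_t zero_bytes(uint64_t x)
{
    return ~(((x & LOW7) + LOW7) | x | LOW7);
}

/* Index of the highest flagged byte in a nonzero mask from zero_bytes(). */
static int highest_byte(uint64_t m)
{
    int i = 0;
    while (m >>= 8)
        i++;
    return i;
}

/* Hypothetical SWAR strrchr over a padded buffer (see assumption above). */
char *swar_strrchr(const char *s, int c)
{
    const uint64_t pattern = ONES * (unsigned char)c; /* c in every byte */
    const char *last = NULL;

    for (;; s += 8) {
        uint64_t w = load8(s);
        uint64_t nul = zero_bytes(w);             /* terminator bytes */
        uint64_t match = zero_bytes(w ^ pattern); /* bytes equal to c */

        if (nul != 0)
            match &= nul ^ (nul - 1); /* drop matches past the first NUL */
        if (match != 0)
            last = s + highest_byte(match);
        if (nul != 0)
            return (char *)last;
    }
}

/* Usage: the sketch should agree with the libc strrchr(). */
void swar_strrchr_check(void)
{
    char buf[32] = "lib/libc/amd64/string"; /* zero-padded to 32 bytes */

    assert(swar_strrchr(buf, '/') == strrchr(buf, '/'));
    assert(swar_strrchr(buf, '\0') == strrchr(buf, '\0'));
    assert(swar_strrchr(buf, 'z') == NULL);
}
```

Unlike strchr(), the loop cannot stop at the first match: it has to remember the most recent match until the terminator is seen, which is one source of the extra live state mentioned above.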

Performance is ok-ish: slower than glibc, but glibc gets to use AVX-512,
which this implementation does not. See this commit for results:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ strrchr.pre.out │          strrchr.scalar.out          │        strrchr.baseline.out         │
        │     sec/op      │    sec/op     vs base                │   sec/op     vs base                │
Short        111.51µ ± 0%    82.39µ ± 1%  -26.11% (p=0.000 n=20)   45.19µ ± 0%  -59.48% (p=0.000 n=20)
Mid           66.19µ ± 0%    23.44µ ± 0%  -64.59% (p=0.000 n=20)   10.59µ ± 0%  -84.00% (p=0.000 n=20)
Long         51.422µ ± 0%   15.932µ ± 0%  -69.02% (p=0.000 n=20)   5.972µ ± 0%  -88.39% (p=0.000 n=20)
geomean       72.40µ         31.33µ       -56.72%                  14.19µ       -80.40%

        │ strrchr.pre.out │          strrchr.scalar.out           │          strrchr.baseline.out          │
        │       B/s       │     B/s       vs base                 │      B/s       vs base                 │
Short        1.044Gi ± 0%   1.413Gi ± 1%   +35.34% (p=0.000 n=20)    2.576Gi ± 0%  +146.76% (p=0.000 n=20)
Mid          1.759Gi ± 0%   4.967Gi ± 0%  +182.42% (p=0.000 n=20)   10.996Gi ± 0%  +525.18% (p=0.000 n=20)
Long         2.264Gi ± 0%   7.307Gi ± 0%  +222.76% (p=0.000 n=20)   19.493Gi ± 0%  +761.03% (p=0.000 n=20)
geomean      1.608Gi        3.715Gi       +131.07%                   8.204Gi       +410.23%

os: Linux
arch: x86_64
cpu:
        │ strrchr.glibc.out │
        │      sec/op       │
Short           28.91µ ± 2%
Mid             8.588µ ± 0%
Long            2.113µ ± 0%
geomean         8.064µ

        │ strrchr.glibc.out │
        │        B/s        │
Short          4.027Gi ± 2%
Mid            13.56Gi ± 0%
Long           55.10Gi ± 0%
geomean        14.44Gi

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

fuz created this revision. · Oct 16 2023, 1:08 AM

Herald added a subscriber: imp. · Oct 16 2023, 1:08 AM

fuz requested review of this revision. · Oct 16 2023, 1:08 AM

Harbormaster completed remote builds in B53996: Diff 128812. · Oct 16 2023, 1:08 AM

lib/libc/amd64/string/strrchr.S: restore weak alias to rindex

lib/libc/string/strrchr.c has a weak alias of strrchr() to rindex().
I forgot about this alias when adding the SIMD implementation, leading
to an undefined symbol rindex(). Add the weak alias to the SIMD code
to fix this issue.
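
As a rough illustration of what such a weak alias looks like, here is a C sketch using the GCC/Clang alias attribute on an ELF target. The names demo_strrchr and demo_rindex are hypothetical stand-ins chosen to avoid clashing with libc; the actual change is made in the assembly file and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the optimised routine: a plain byte loop. */
char *demo_strrchr(const char *s, int c)
{
    const char *last = NULL;

    do {
        if (*s == (char)c)
            last = s;       /* remember the most recent match */
    } while (*s++ != '\0'); /* the terminator itself is also checked */
    return (char *)last;
}

/* demo_rindex is a weak symbol resolving to the same code as demo_strrchr.
 * ASSUMPTION: GCC or Clang on an ELF platform (the attribute is an extension). */
char *demo_rindex(const char *, int)
    __attribute__((weak, alias("demo_strrchr")));

/* Usage: both names give the same answer. */
void demo_alias_check(void)
{
    const char *p = "a/b/c";

    assert(demo_rindex(p, '/') == demo_strrchr(p, '/'));
    assert(demo_rindex(p, '/') == p + 3);
}
```

Without the alias, any caller of the second name would fail to link, which is the undefined-symbol failure described above.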

Harbormaster completed remote builds in B54862: Diff 131163. · Dec 8 2023, 11:56 AM

This revision was not accepted when it landed; it landed in state Needs Review. · Dec 25 2023, 2:26 PM

Closed by commit rGdd1c2e887c1f: share/man/man7/simd.7: document strrchr scalar, baseline implementation (authored by fuz).

This revision was automatically updated to reflect the committed changes.

fuz added a commit: rGdd1c2e887c1f: share/man/man7/simd.7: document strrchr scalar, baseline implementation.

fuz added a commit: rG2ed514a220ed: lib/libc/amd64/string: add strrchr scalar, baseline implementation.

fuz added a commit: rG9b1a851e1ed0: lib/libc/amd64/string: add strrchr scalar, baseline implementation. · Jan 24 2024, 7:45 PM

fuz added a commit: rGc105cd8426bb: share/man/man7/simd.7: document strrchr scalar, baseline implementation.