
lib/libc/amd64/string: add strrchr scalar, baseline implementation
Closed, Public

Authored by fuz on Oct 16 2023, 1:08 AM.

Details

Summary

The baseline implementation is very straightforward,
while the scalar implementation suffers from register pressure
and the need to use SWAR techniques similar to those used for
strchr().
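
To illustrate what "SWAR techniques" means here, the following is a minimal C sketch of a word-at-a-time strrchr(). It is not the assembly from this revision: the names swar_strrchr, zero_bytes, and highest_byte are hypothetical, and the code assumes a little-endian machine such as amd64. Like the assembly, it reads whole aligned 8-byte words and may therefore read a few bytes past the terminator; an aligned word cannot straddle a page, so this does not fault, but it is formally out of bounds in ISO C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SWAR_ONES 0x0101010101010101ULL
#define SWAR_LOW7 0x7f7f7f7f7f7f7f7fULL

/* Bit 7 of each result byte is set iff that byte of x is zero (exact,
 * no carries cross byte boundaries). */
static uint64_t zero_bytes(uint64_t x)
{
	return ~(((x & SWAR_LOW7) + SWAR_LOW7) | x | SWAR_LOW7);
}

/* Index of the highest byte with a bit set; m must be nonzero. */
static unsigned highest_byte(uint64_t m)
{
	unsigned i = 0;

	while ((m >>= 8) != 0)
		i++;
	return i;
}

/* Hypothetical SWAR strrchr(); assumes little-endian byte order. */
char *swar_strrchr(const char *s, int c)
{
	const uint64_t pattern = SWAR_ONES * (unsigned char)c;
	const char *last = NULL;

	/* Head: go byte by byte until s is 8-byte aligned. */
	for (; (uintptr_t)s % 8 != 0; s++) {
		if (*s == (char)c)
			last = s;
		if (*s == '\0')
			return (char *)last;
	}

	for (;;) {
		uint64_t w, nul, hit;

		memcpy(&w, s, sizeof(w));	/* aligned 8-byte load */
		nul = zero_bytes(w);		/* NUL bytes in this word */
		hit = zero_bytes(w ^ pattern);	/* bytes equal to c */

		/* Discard matches that lie beyond the first NUL byte. */
		if (nul != 0)
			hit &= nul ^ (nul - 1);
		if (hit != 0)
			last = s + highest_byte(hit);
		if (nul != 0)
			return (char *)last;
		s += sizeof(w);
	}
}
```

Unlike strchr(), the loop cannot stop at the first match. It has to carry the most recent match forward until the terminator is seen, and that extra state is probably part of the register pressure mentioned above.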

Performance is ok-ish: slower than glibc, but glibc gets to use AVX-512,
which this implementation doesn't. See this commit for results:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ strrchr.pre.out │          strrchr.scalar.out          │        strrchr.baseline.out         │
        │     sec/op      │    sec/op     vs base                │   sec/op     vs base                │
Short        111.51µ ± 0%    82.39µ ± 1%  -26.11% (p=0.000 n=20)   45.19µ ± 0%  -59.48% (p=0.000 n=20)
Mid           66.19µ ± 0%    23.44µ ± 0%  -64.59% (p=0.000 n=20)   10.59µ ± 0%  -84.00% (p=0.000 n=20)
Long         51.422µ ± 0%   15.932µ ± 0%  -69.02% (p=0.000 n=20)   5.972µ ± 0%  -88.39% (p=0.000 n=20)
geomean       72.40µ         31.33µ       -56.72%                  14.19µ       -80.40%

        │ strrchr.pre.out │          strrchr.scalar.out           │          strrchr.baseline.out          │
        │       B/s       │     B/s       vs base                 │      B/s       vs base                 │
Short        1.044Gi ± 0%   1.413Gi ± 1%   +35.34% (p=0.000 n=20)    2.576Gi ± 0%  +146.76% (p=0.000 n=20)
Mid          1.759Gi ± 0%   4.967Gi ± 0%  +182.42% (p=0.000 n=20)   10.996Gi ± 0%  +525.18% (p=0.000 n=20)
Long         2.264Gi ± 0%   7.307Gi ± 0%  +222.76% (p=0.000 n=20)   19.493Gi ± 0%  +761.03% (p=0.000 n=20)
geomean      1.608Gi        3.715Gi       +131.07%                   8.204Gi       +410.23%

os: Linux
arch: x86_64
cpu:
        │ strrchr.glibc.out │
        │      sec/op       │
Short           28.91µ ± 2%
Mid             8.588µ ± 0%
Long            2.113µ ± 0%
geomean         8.064µ

        │ strrchr.glibc.out │
        │        B/s        │
Short          4.027Gi ± 2%
Mid            13.56Gi ± 0%
Long           55.10Gi ± 0%
geomean        14.44Gi

Sponsored by: The FreeBSD Foundation

Test Plan

Passes unit tests; no changes in FreeBSD's test suite when
enabling the new code (both scalar and baseline). See also D41520.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

fuz requested review of this revision. Oct 16 2023, 1:08 AM
  • lib/libc/amd64/string/strrchr.S: restore weak alias to rindex

lib/libc/string/strrchr.c has a weak alias of strrchr() to rindex().
I forgot about this alias when adding the SIMD implementation, leading
to an undefined symbol rindex(). Add the weak alias to the SIMD code
to fix this issue.
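
For readers unfamiliar with weak aliases, here is a small sketch of the idea in C, assuming an ELF target and a GCC or Clang compiler. The names my_strrchr and my_rindex are hypothetical, chosen so they do not clash with libc. FreeBSD's C sources typically spell this with the __weak_reference macro from <sys/cdefs.h>; an assembly file needs the equivalent symbol directives instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real implementation. */
char *my_strrchr(const char *s, int c)
{
	const char *last = NULL;

	do {
		if (*s == (char)c)
			last = s;
	} while (*s++ != '\0');
	return (char *)last;
}

/*
 * Export a second, weak symbol that resolves to the same code.
 * Without this declaration, callers of my_rindex() would fail to
 * link with an undefined symbol.
 */
char *my_rindex(const char *, int)
    __attribute__((weak, alias("my_strrchr")));
```

Because the alias is weak, a program that defines its own function of the same name still links; its strong definition takes precedence.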

This revision was not accepted when it landed; it landed in state Needs Review. Dec 25 2023, 2:26 PM
This revision was automatically updated to reflect the committed changes.