lib/libc/amd64/string: add memrchr scalar, baseline implementation
Closed, Public

Authored by fuz on Dec 6 2023, 2:06 PM.

Details

Summary

The scalar implementation is fairly simplistic and performs only
moderately better than the generic C implementation (about 27% less
time per operation by geomean in the benchmark below). It could be
improved by using the same algorithm as memchr, but that would
have been considerably more complicated.
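
For context, a byte-at-a-time memrchr in portable C might look like the
sketch below. This is only an illustration of the kind of generic
implementation the scalar code is measured against, not the actual libc
source; the name memrchr_generic_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reference sketch: return a pointer to the last
 * occurrence of byte c in the first n bytes of s, or NULL if absent.
 */
static void *
memrchr_generic_sketch(const void *s, int c, size_t n)
{
	const unsigned char *p;

	assert(s != NULL || n == 0);

	/* Start one past the end and walk backwards one byte at a time. */
	p = (const unsigned char *)s + n;
	while (n-- > 0) {
		if (*--p == (unsigned char)c)
			return ((void *)p);
	}

	return (NULL);
}
```

Scanning from the end means the first match found is the last
occurrence, so the loop can return immediately.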

The baseline implementation performs well (about 86% less time per
operation by geomean) and operates similarly to timingsafe_memcmp.
Benchmark results follow:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ memrchr.pre.out │          memrchr.scalar.out          │        memrchr.baseline.out         │
        │     sec/op      │    sec/op     vs base                │   sec/op     vs base                │
Short        120.95µ ± 0%    98.08µ ± 0%  -18.90% (p=0.000 n=20)   37.75µ ± 1%  -68.79% (p=0.000 n=20)
Mid          74.374µ ± 0%   48.394µ ± 0%  -34.93% (p=0.000 n=20)   9.120µ ± 0%  -87.74% (p=0.000 n=20)
Long         52.181µ ± 0%   38.607µ ± 0%  -26.01% (p=0.000 n=20)   4.110µ ± 0%  -92.12% (p=0.000 n=20)
geomean       77.72µ         56.80µ       -26.91%                  11.23µ       -85.55%

        │ memrchr.pre.out │          memrchr.scalar.out           │          memrchr.baseline.out           │
        │       B/s       │      B/s       vs base                │      B/s       vs base                  │
Short        985.6Mi ± 0%   1215.4Mi ± 0%  +23.31% (p=0.000 n=20)   3158.2Mi ± 1%   +220.42% (p=0.000 n=20)
Mid          1.565Gi ± 0%    2.406Gi ± 0%  +53.68% (p=0.000 n=20)   12.765Gi ± 0%   +715.52% (p=0.000 n=20)
Long         2.231Gi ± 0%    3.015Gi ± 0%  +35.16% (p=0.000 n=20)   28.323Gi ± 0%  +1169.56% (p=0.000 n=20)
geomean      1.498Gi         2.050Gi       +36.82%                   10.37Gi        +592.26%
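
To give a rough idea of why a vectorised version can be this much
faster, here is a minimal C sketch using SSE2 intrinsics. It assumes
that "baseline" refers to the x86-64 baseline instruction set, which
includes SSE2. It is not the committed assembly and does not claim to
mirror its structure; the name memrchr_sse2_sketch is hypothetical, and
__builtin_clz is a GCC/Clang builtin. On targets without SSE2 only the
bytewise loop is compiled.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical sketch: last occurrence of c in s[0..n), or NULL. */
static void *
memrchr_sse2_sketch(const void *s, int c, size_t n)
{
	const unsigned char *base = s;

	assert(s != NULL || n == 0);

#if defined(__SSE2__)
	const unsigned char *p = base + n;
	const __m128i needle = _mm_set1_epi8((char)c);

	/* Compare 16 bytes at a time, moving from the end to the start. */
	while (n >= 16) {
		__m128i chunk, eq;
		unsigned int mask;

		p -= 16;
		n -= 16;
		chunk = _mm_loadu_si128((const __m128i *)p);
		eq = _mm_cmpeq_epi8(chunk, needle);
		mask = (unsigned int)_mm_movemask_epi8(eq);
		if (mask != 0) {
			/* Highest set bit is the last match in the chunk. */
			return ((void *)(p + (31 - __builtin_clz(mask))));
		}
	}
#endif

	/* Fewer than 16 bytes remain at the front: check them bytewise. */
	while (n > 0) {
		n--;
		if (base[n] == (unsigned char)c)
			return ((void *)(base + n));
	}

	return (NULL);
}
```

Each loop iteration checks 16 bytes with one compare and one mask
extraction, which is consistent with the gap between the scalar and
baseline columns widening as the buffers get longer.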

New unit tests to cover this function are provided, too.

Test Plan

Passes the newly added unit tests; no new kyua failures.

Diff Detail

Repository
rG FreeBSD src repository