lib/libc/amd64/string: add memrchr scalar, baseline implementation
ClosedPublic

Authored by fuz on Dec 6 2023, 2:06 PM.

Details

Summary

The scalar implementation is fairly simple and performs only
slightly better than the generic C implementation. It could be
improved by using the same algorithm as memchr, but that would
have made it a lot more complicated.
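
For orientation, a simple backward scan of the kind described above can be sketched in C as follows. This is only an illustration of the semantics of memrchr, not the committed amd64 assembly; the name memrchr_sketch is hypothetical.

```c
#include <stddef.h>

/*
 * Illustrative sketch only: scan the buffer one byte at a time from
 * the end and return a pointer to the last byte equal to c, or NULL
 * if there is none.  memrchr_sketch is a hypothetical name.
 */
void *
memrchr_sketch(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	unsigned char ch = (unsigned char)c;

	while (n > 0) {
		n--;
		if (p[n] == ch)
			return ((void *)(p + n));
	}

	return (NULL);
}
```

For example, searching "abcabc" for 'b' with a length of 6 yields a pointer to offset 4, the last occurrence.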

The baseline implementation performs well, cutting time per
operation by 69% to 92% in the benchmarks below, and is similar to
timingsafe_memcmp in the way it operates. See the usual place
for benchmark results:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ memrchr.pre.out │          memrchr.scalar.out          │        memrchr.baseline.out         │
        │     sec/op      │    sec/op     vs base                │   sec/op     vs base                │
Short        120.95µ ± 0%    98.08µ ± 0%  -18.90% (p=0.000 n=20)   37.75µ ± 1%  -68.79% (p=0.000 n=20)
Mid          74.374µ ± 0%   48.394µ ± 0%  -34.93% (p=0.000 n=20)   9.120µ ± 0%  -87.74% (p=0.000 n=20)
Long         52.181µ ± 0%   38.607µ ± 0%  -26.01% (p=0.000 n=20)   4.110µ ± 0%  -92.12% (p=0.000 n=20)
geomean       77.72µ         56.80µ       -26.91%                  11.23µ       -85.55%

        │ memrchr.pre.out │          memrchr.scalar.out           │          memrchr.baseline.out           │
        │       B/s       │      B/s       vs base                │      B/s       vs base                  │
Short        985.6Mi ± 0%   1215.4Mi ± 0%  +23.31% (p=0.000 n=20)   3158.2Mi ± 1%   +220.42% (p=0.000 n=20)
Mid          1.565Gi ± 0%    2.406Gi ± 0%  +53.68% (p=0.000 n=20)   12.765Gi ± 0%   +715.52% (p=0.000 n=20)
Long         2.231Gi ± 0%    3.015Gi ± 0%  +35.16% (p=0.000 n=20)   28.323Gi ± 0%  +1169.56% (p=0.000 n=20)
geomean      1.498Gi         2.050Gi       +36.82%                   10.37Gi        +592.26%
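
To give a rough idea of why an SSE2 (baseline) variant can be so much faster than a bytewise scan, here is a minimal sketch using compiler intrinsics. It is not the committed assembly and makes no claim about how that code handles alignment or the buffer ends: this version assumes unaligned 16-byte loads and a bytewise loop for the remaining head, and it relies on the GCC/Clang builtin __builtin_clz. The name memrchr_sse2_sketch is hypothetical.

```c
#include <emmintrin.h>
#include <stddef.h>

/*
 * Illustrative sketch only: compare 16 bytes at a time, walking from
 * the end of the buffer toward the start.  memrchr_sse2_sketch is a
 * hypothetical name; the real implementation is written in assembly.
 */
void *
memrchr_sse2_sketch(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	unsigned char ch = (unsigned char)c;
	const __m128i needle = _mm_set1_epi8((char)ch);

	/* Full 16-byte chunks, last chunk first. */
	while (n >= 16) {
		__m128i chunk = _mm_loadu_si128((const __m128i *)(p + n - 16));
		unsigned int mask =
		    (unsigned int)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));

		if (mask != 0) {
			/* The highest set bit marks the last match in the chunk. */
			return ((void *)(p + n - 16 + (31 - __builtin_clz(mask))));
		}
		n -= 16;
	}

	/* Fewer than 16 bytes remain at the start of the buffer. */
	while (n > 0) {
		n--;
		if (p[n] == ch)
			return ((void *)(p + n));
	}

	return (NULL);
}
```

Each loop iteration checks 16 bytes with one compare and one mask extraction, which is consistent with the gap between the Short and Long results above growing with buffer length.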

New unit tests to cover this function are provided, too.
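
The tests added by this change are not reproduced here. As a hedged sketch of the general approach such tests can take, the following compares a candidate implementation against a trivial reference over a range of start offsets, lengths, and match positions. All names (memrchr_fn, ref_memrchr, check_memrchr) are hypothetical and the structure is an assumption, not a description of the actual test file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function pointer type matching memrchr's signature. */
typedef void *(*memrchr_fn)(const void *, int, size_t);

/* Trivial reference: bytewise scan from the end. */
void *
ref_memrchr(const void *s, int c, size_t n)
{
	const unsigned char *p = s;

	while (n > 0) {
		n--;
		if (p[n] == (unsigned char)c)
			return ((void *)(p + n));
	}
	return (NULL);
}

/* Returns the number of cases in which fn disagrees with the reference. */
int
check_memrchr(memrchr_fn fn)
{
	unsigned char buf[64 + 16];
	int failures = 0;

	for (size_t off = 0; off < 16; off++) {
		for (size_t len = 0; len <= 64; len++) {
			/* No match inside the range, one decoy just past it. */
			memset(buf, 'a', sizeof(buf));
			if (off + len < sizeof(buf))
				buf[off + len] = 'b';
			if (fn(buf + off, 'b', len) != NULL)
				failures++;

			for (size_t pos = 0; pos < len; pos++) {
				/* Last match at pos, earlier decoy at the start. */
				buf[off] = 'b';
				buf[off + pos] = 'b';
				if (fn(buf + off, 'b', len) !=
				    ref_memrchr(buf + off, 'b', len))
					failures++;
				buf[off] = 'a';
				buf[off + pos] = 'a';
			}
		}
	}

	return (failures);
}
```

Varying the start offset exercises different alignments, and the decoy bytes catch implementations that read past the end of the range or return the first match instead of the last.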

Test Plan

Passes the newly added unit tests; no new kyua failures.

Diff Detail

Repository
rG FreeBSD src repository