string: add memrchr scalar, baseline implementation
ClosedPublic

Authored by fuz on Dec 6 2023, 2:06 PM.

Details

Reviewers

kib
mjg

Commits

rGff7799e00311: lib/libc/amd64/string: add memrchr() scalar, baseline implementation
rGacb47064d658: lib/libc/tests/string: add memrchr unit tests
rGfb197a4f7751: lib/libc/amd64/string: add memrchr() scalar, baseline implementation
rG691ff1832e09: lib/libc/tests/string: add memrchr unit tests

Summary

The scalar implementation is fairly simplistic and performs only
modestly better than the generic C implementation (about 27% less
time per op, geomean). It could be improved by using the same
algorithm as memchr, but that would have been a lot more complicated.
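
For orientation, here is a minimal sketch of what a byte-at-a-time backward scan looks like in portable C. It only illustrates the semantics of memrchr(); it is not the generic libc source nor the committed scalar assembly, and the name memrchr_ref is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical reference routine: return a pointer to the last
 * occurrence of the byte c in the first n bytes of s, or NULL if
 * the byte does not occur.
 */
static void *
memrchr_ref(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	unsigned char ch = (unsigned char)c;

	/* Walk backwards from the final byte towards the first one. */
	while (n > 0) {
		n--;
		if (p[n] == ch)
			return ((void *)(p + n));
	}

	return (NULL);
}
```

One comparison and one branch per byte is what keeps a loop like this slow on long buffers, which is the cost a wider implementation tries to avoid.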

The baseline implementation performs well and is similar to
timingsafe_memcmp in the way it operates. See the usual place
for benchmark results:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ memrchr.pre.out │          memrchr.scalar.out          │        memrchr.baseline.out         │
        │     sec/op      │    sec/op     vs base                │   sec/op     vs base                │
Short        120.95µ ± 0%    98.08µ ± 0%  -18.90% (p=0.000 n=20)   37.75µ ± 1%  -68.79% (p=0.000 n=20)
Mid          74.374µ ± 0%   48.394µ ± 0%  -34.93% (p=0.000 n=20)   9.120µ ± 0%  -87.74% (p=0.000 n=20)
Long         52.181µ ± 0%   38.607µ ± 0%  -26.01% (p=0.000 n=20)   4.110µ ± 0%  -92.12% (p=0.000 n=20)
geomean       77.72µ         56.80µ       -26.91%                  11.23µ       -85.55%

        │ memrchr.pre.out │          memrchr.scalar.out           │          memrchr.baseline.out           │
        │       B/s       │      B/s       vs base                │      B/s       vs base                  │
Short        985.6Mi ± 0%   1215.4Mi ± 0%  +23.31% (p=0.000 n=20)   3158.2Mi ± 1%   +220.42% (p=0.000 n=20)
Mid          1.565Gi ± 0%    2.406Gi ± 0%  +53.68% (p=0.000 n=20)   12.765Gi ± 0%   +715.52% (p=0.000 n=20)
Long         2.231Gi ± 0%    3.015Gi ± 0%  +35.16% (p=0.000 n=20)   28.323Gi ± 0%  +1169.56% (p=0.000 n=20)
geomean      1.498Gi         2.050Gi       +36.82%                   10.37Gi        +592.26%
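
To give a feel for why a vectorised backward scan scales so much better on the Mid and Long inputs, here is a rough C sketch using SSE2 intrinsics. It assumes that "baseline" refers to the SSE2 level of amd64; it is not the committed assembly and makes no attempt to mirror how that code handles alignment or the head of the buffer. The name memrchr_sketch is hypothetical, and __builtin_clz is a GCC/Clang builtin.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical sketch: last occurrence of c in s[0..n), or NULL. */
static void *
memrchr_sketch(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	unsigned char ch = (unsigned char)c;

#if defined(__SSE2__)
	const __m128i needle = _mm_set1_epi8((char)ch);

	/* Compare 16 bytes at a time, starting from the end. */
	while (n >= 16) {
		__m128i chunk = _mm_loadu_si128(
		    (const __m128i *)(const void *)(p + n - 16));
		unsigned mask = (unsigned)_mm_movemask_epi8(
		    _mm_cmpeq_epi8(chunk, needle));

		if (mask != 0) {
			/* The highest set bit marks the last match. */
			int idx = 31 - __builtin_clz(mask);
			return ((void *)(p + n - 16 + idx));
		}
		n -= 16;
	}
#endif

	/* Fewer than 16 bytes remain (or no SSE2): scan bytewise. */
	while (n > 0) {
		n--;
		if (p[n] == ch)
			return ((void *)(p + n));
	}

	return (NULL);
}
```

Unaligned loads keep the sketch short. The loop never reads outside s[0..n), so it needs no special handling at page boundaries.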

New unit tests to cover this function are provided, too.
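
As an illustration of the kind of property such tests can check, the sketch below runs a memrchr-like function over every combination of small buffer offsets, lengths, and match positions. It is a standalone example under stated assumptions, not the committed tests; check_memrchr and memrchr_fn are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memrchr_fn)(const void *, int, size_t);

/*
 * Hypothetical checker: returns the number of cases in which fn
 * did not behave like memrchr().  Decoy bytes just outside the
 * searched range catch out-of-bounds matches, and an extra match
 * at the start catches implementations that return the first
 * match instead of the last.
 */
static int
check_memrchr(memrchr_fn fn)
{
	unsigned char buf[16 + 64 + 1];
	int failures = 0;

	for (size_t off = 1; off <= 16; off++) {
		for (size_t len = 0; len <= 64; len++) {
			/* No match inside the range, decoys outside it. */
			memset(buf, 'x', sizeof(buf));
			buf[off - 1] = 'y';
			buf[off + len] = 'y';
			if (fn(buf + off, 'y', len) != NULL)
				failures++;

			for (size_t pos = 0; pos < len; pos++) {
				memset(buf, 'x', sizeof(buf));
				buf[off - 1] = 'y';
				buf[off + len] = 'y';
				buf[off] = 'y';		/* earlier match */
				buf[off + pos] = 'y';	/* expected match */
				if (fn(buf + off, 'y', len) !=
				    buf + off + pos)
					failures++;
			}
		}
	}

	return (failures);
}
```

Varying the offset from 1 to 16 exercises every alignment relative to a 16-byte boundary, which is where vectorised code tends to go wrong.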

Test Plan

Passes the newly added unit tests; no new kyua failures.