lib/libc/amd64/string: add baseline implementation of memcmp, bcmp
Closed, Public

Authored by fuz on Aug 13 2023, 3:59 PM.

Details

Summary

This changeset adds a baseline implementation of memcmp and bcmp
for amd64. The same code is used for both functions with conditional
code where the behaviour differs (we need more precise output for the
memcmp case).
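
As a rough illustration of sharing one body between the two functions, here is a minimal C sketch. It is not the assembly from this changeset; `cmp_core`, `my_memcmp`, `my_bcmp` and the `precise` flag are hypothetical names chosen for this example.

```c
#include <stddef.h>

/*
 * Hypothetical shared core (not the code under review).  `precise`
 * selects memcmp-style output (difference of the first mismatching
 * bytes) over bcmp-style output (any nonzero value on mismatch).
 */
static inline int
cmp_core(const void *a, const void *b, size_t n, int precise)
{
	const unsigned char *p = a, *q = b;

	for (size_t i = 0; i < n; i++) {
		if (p[i] != q[i])
			return (precise ? p[i] - q[i] : 1);
	}
	return (0);
}

/* The my_ prefix avoids clashing with the libc symbols. */
int
my_memcmp(const void *a, const void *b, size_t n)
{
	return (cmp_core(a, b, n, 1));
}

int
my_bcmp(const void *a, const void *b, size_t n)
{
	return (cmp_core(a, b, n, 0));
}
```

Because `precise` is a compile-time constant at each call site, a compiler can drop the unused branch, which loosely mirrors conditional assembly in a shared source file.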

FreeBSD documents that memcmp returns the difference between the
mismatching characters. Slightly faster code would be possible could
we relax this requirement to the ISO/IEC 9899:1999 requirement of
merely returning a negative/positive integer or zero.
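
To make the difference between the two contracts concrete, the following C sketch implements only the relaxed ISO C contract. It is an assumption-laden illustration, not a proposal for this changeset: `memcmp_sign_only` and `load_be64` are hypothetical names, and no performance claim is made for this portable version. The point is that a sign-only result can be derived from a whole-word comparison without locating the exact mismatching byte.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Load 8 bytes as a big-endian integer, so that unsigned integer order
 * matches lexicographic byte order.
 */
static uint64_t
load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return (v);
}

/* Returns only <0, 0, or >0; the magnitude carries no meaning. */
int
memcmp_sign_only(const void *a, const void *b, size_t n)
{
	const unsigned char *p = a, *q = b;

	while (n >= 8) {
		uint64_t x = load_be64(p), y = load_be64(q);

		if (x != y)
			return (x < y ? -1 : 1);
		p += 8;
		q += 8;
		n -= 8;
	}
	/* Compare the remaining 0..7 bytes one at a time. */
	for (; n > 0; p++, q++, n--) {
		if (*p != *q)
			return (*p < *q ? -1 : 1);
	}
	return (0);
}
```

Callers that only test the sign of the result behave the same under either contract; callers that use the magnitude depend on the documented FreeBSD behaviour.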

__FBSDID is dropped in anticipation of @imp's announced change.

The changes are documented in simd(7).

In addition to this change, extend the memcmp test suite entry to
accept an externally defined memcmp function to simplify the
development of additional test cases.
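
One way a test can accept an externally supplied memcmp is to take the function under test as a pointer. The sketch below assumes that shape; it is not the actual test suite code, and `memcmp_fn_t`, `run_memcmp_cases` and `always_zero` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef int (*memcmp_fn_t)(const void *, const void *, size_t);

/* Reduce a result to -1, 0 or 1 so only the sign is checked. */
static int
sign(int x)
{
	return ((x > 0) - (x < 0));
}

/* A deliberately broken stand-in, used to show the harness catching errors. */
int
always_zero(const void *a, const void *b, size_t n)
{
	(void)a;
	(void)b;
	(void)n;
	return (0);
}

/* Run a few fixed cases against `fn`; returns the number of failures. */
int
run_memcmp_cases(memcmp_fn_t fn)
{
	static const struct {
		const char *a, *b;
		size_t n;
		int want;
	} cases[] = {
		{ "", "", 0, 0 },
		{ "abc", "abc", 3, 0 },
		{ "abc", "abd", 3, -1 },
		{ "abd", "abc", 3, 1 },
		{ "abc", "abd", 2, 0 },
	};
	int failures = 0;

	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		if (sign(fn(cases[i].a, cases[i].b, cases[i].n)) != cases[i].want)
			failures++;
	}
	return (failures);
}
```

Calling `run_memcmp_cases(memcmp)` exercises the libc implementation, while passing another function with the same signature exercises an implementation still under development.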

Performance is better than bionic and glibc, except for long strings,
where the two are up to about 13% faster. This could be because they use
SSE4 ptest, which we cannot use in a baseline kernel.
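
For readers unfamiliar with the constraint: baseline amd64 guarantees SSE2 but not SSE4.1, where ptest lives. A common SSE2-only way to find a mismatch in a 16-byte block combines pcmpeqb with pmovmskb. The C sketch below shows that general technique with intrinsics. It is an illustration under assumptions, not the assembly from this changeset; `first_mismatch16` and `cmp16` are hypothetical names, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Index of the first differing byte in two 16-byte blocks, or 16 if equal. */
unsigned
first_mismatch16(const unsigned char *a, const unsigned char *b)
{
#if defined(__SSE2__)
	__m128i x = _mm_loadu_si128((const __m128i *)a);
	__m128i y = _mm_loadu_si128((const __m128i *)b);
	/* pcmpeqb sets equal byte lanes to 0xff; pmovmskb takes one bit per lane. */
	unsigned eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(x, y));
	unsigned ne = ~eq & 0xffffu;

	if (ne == 0)
		return (16);
	/* The lowest set bit marks the first mismatching byte. */
	return ((unsigned)__builtin_ctz(ne));
#else
	/* Scalar fallback for targets without SSE2. */
	for (unsigned i = 0; i < 16; i++) {
		if (a[i] != b[i])
			return (i);
	}
	return (16);
#endif
}

/* memcmp-style result for exactly 16 bytes: difference of the mismatching bytes. */
int
cmp16(const void *a, const void *b)
{
	const unsigned char *p = a, *q = b;
	unsigned i = first_mismatch16(p, q);

	return (i == 16 ? 0 : p[i] - q[i]);
}
```

With ptest, an equality check on a block can set the flags directly, whereas the SSE2 route needs the extra mask extraction shown here even when the blocks turn out to be equal.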

        │ memcmp.baseline.out │            memcmp.bionic.out             │            memcmp.scalar.out             │
        │       sec/op        │    sec/op     vs base                    │    sec/op     vs base                    │
Short             26.41µ ± 0%    65.81µ ± 0%  +149.15% (p=0.000 n=30+20)    61.40µ ± 0%  +132.46% (p=0.000 n=30+20)
Mid               8.175µ ± 0%   21.077µ ± 1%  +157.82% (p=0.000 n=30+20)   13.580µ ± 1%   +66.12% (p=0.000 n=30+20)
Long              3.469µ ± 0%    3.055µ ± 6%   -11.92% (p=0.000 n=30+20)    4.807µ ± 0%   +38.58% (p=0.000 n=30+20)
geomean           9.082µ         16.18µ        +78.19%                      15.89µ        +74.91%

        │ memcmp.baseline.out │            memcmp.bionic.out            │            memcmp.scalar.out            │
        │         B/s         │     B/s       vs base                   │     B/s       vs base                   │
Short            4.407Gi ± 0%   1.769Gi ± 0%  -59.86% (p=0.000 n=30+20)   1.896Gi ± 0%  -56.98% (p=0.000 n=30+20)
Mid             14.240Gi ± 0%   5.523Gi ± 1%  -61.21% (p=0.000 n=30+20)   8.572Gi ± 1%  -39.80% (p=0.000 n=30+20)
Long             33.56Gi ± 0%   38.10Gi ± 6%  +13.53% (p=0.000 n=30+20)   24.22Gi ± 0%  -27.84% (p=0.000 n=30+20)
geomean          12.82Gi        7.194Gi       -43.88%                     7.328Gi       -42.83%

os: Linux
arch: x86_64
cpu: 
        │ memcmp.glibc.out │
        │      sec/op      │
Short          32.29µ ± 1%
Mid            10.25µ ± 0%
Long           3.111µ ± 0%
geomean        10.10µ

        │ memcmp.glibc.out │
        │       B/s        │
Short         3.605Gi ± 1%
Mid           11.36Gi ± 0%
Long          37.42Gi ± 0%
geomean       11.53Gi

Test Plan

Passes the test suite.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

fuz requested review of this revision. Aug 13 2023, 3:59 PM
fuz created this revision.

@ngie Would you like me to take out the test suite bits from this one, too?

I only have cosmetic remarks about the code, which I'm going to spare you this time.

Commit this without the tests and add them to a separate review (maybe even do D41520).

This revision is now accepted and ready to land. Aug 21 2023, 8:28 AM