
lib/libc/amd64/string: add memccpy, strncat scalar, baseline implementations
Closed, Public

Authored by fuz on Dec 4 2023, 10:17 PM.

Details

Summary

Based on the strlcpy code from D42863, this DR adds a SIMD-enhanced
implementation of memccpy for amd64. A scalar implementation that calls
into memchr and memcpy is provided as well. strncat is then
reimplemented on top of strlen and memccpy, so that it benefits from
the enhanced implementations of both.
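
For illustration, the composition described above can be sketched in C.
This is only a sketch of the idea, not the amd64 assembly from this
revision; the names sketch_memccpy and sketch_strncat are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch: memccpy expressed through memchr and memcpy.
 * (Hypothetical name; the real code in this revision is assembly.)
 */
void *
sketch_memccpy(void *restrict dst, const void *restrict src, int c, size_t len)
{
	/* Look for the first occurrence of c within the first len bytes. */
	const unsigned char *hit = memchr(src, c, len);

	if (hit != NULL) {
		/* Copy up to and including c; return the byte after it in dst. */
		size_t n = (size_t)(hit - (const unsigned char *)src) + 1;

		memcpy(dst, src, n);
		return ((unsigned char *)dst + n);
	}

	/* c was not found: copy all len bytes and signal that with NULL. */
	memcpy(dst, src, len);
	return (NULL);
}

/*
 * Sketch: strncat expressed through strlen and memccpy.
 * (Hypothetical name.)
 */
char *
sketch_strncat(char *restrict dst, const char *restrict src, size_t n)
{
	char *tail = dst + strlen(dst);
	char *end = sketch_memccpy(tail, src, '\0', n);

	/* No NUL within the first n bytes of src: terminate by hand. */
	if (end == NULL)
		tail[n] = '\0';
	return (dst);
}
```

Because all of the work happens inside strlen, memchr, memcpy and
memccpy, any faster implementation of those routines speeds up the
composed functions without further changes.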

Please note that this code does not behave exactly like the C
implementation of memccpy for overlapping inputs. However, ISO/IEC
9899:1999 does not allow overlapping inputs for this function, and the C
code makes no provision for them either: it just proceeds byte by byte,
which may or may not do the expected thing for some overlaps. memccpy(3)
does not document whether overlapping inputs are supported.
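
As a rough illustration of what "proceeds byte by byte" means, a
portable memccpy can be written as the loop below. This is a minimal
sketch under the assumption that the C version is a plain byte loop; it
is not copied from the libc source, and the name bytewise_memccpy is
hypothetical.

```c
#include <stddef.h>

/*
 * Sketch: byte-by-byte memccpy (hypothetical name).
 * Overlapping buffers are not supported, as the restrict qualifiers say.
 */
void *
bytewise_memccpy(void *restrict dst, const void *restrict src, int c, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	unsigned char stop = (unsigned char)c;

	while (len-- > 0) {
		/* Each byte is stored before the next one is loaded. */
		if ((*d++ = *s++) == stop)
			return (d);
	}
	return (NULL);
}
```

A loop like this stores each byte before it loads the next, while an
implementation that copies in larger blocks loads a whole block before
storing it. For overlapping buffers the two orders can therefore produce
different results, which is why neither should be relied on there.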

New unit tests are added to cover memccpy in more detail.

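The performance is up to 21x better than the C code (+2162.92%
throughput for the baseline implementation in the Long benchmark). The
scalar implementation also performs well, except on short strings,
where it is about 18% slower than the C code.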

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ memccpy.pre.out │          memccpy.scalar.out          │        memccpy.baseline.out         │
        │     sec/op      │    sec/op     vs base                │   sec/op     vs base                │
Short         92.24µ ± 0%   109.23µ ± 0%  +18.42% (p=0.000 n=20)   66.93µ ± 0%  -27.44% (p=0.000 n=20)
Mid          52.091µ ± 0%   16.617µ ± 1%  -68.10% (p=0.000 n=20)   8.008µ ± 1%  -84.63% (p=0.000 n=20)
Long         80.934µ ± 0%   11.611µ ± 0%  -85.65% (p=0.000 n=20)   3.577µ ± 0%  -95.58% (p=0.000 n=20)
geomean       72.99µ         27.62µ       -62.16%                  12.42µ       -82.98%

        │ memccpy.pre.out │           memccpy.scalar.out           │          memccpy.baseline.out           │
        │       B/s       │      B/s       vs base                 │      B/s       vs base                  │
Short        1.262Gi ± 0%    1.066Gi ± 0%   -15.55% (p=0.000 n=20)    1.739Gi ± 0%    +37.82% (p=0.000 n=20)
Mid          2.235Gi ± 0%    7.006Gi ± 1%  +213.49% (p=0.000 n=20)   14.537Gi ± 1%   +550.49% (p=0.000 n=20)
Long         1.438Gi ± 0%   10.026Gi ± 0%  +597.03% (p=0.000 n=20)   32.550Gi ± 0%  +2162.92% (p=0.000 n=20)
geomean      1.595Gi         4.215Gi       +164.25%                   9.371Gi        +487.59%

Test Plan

Passes the newly added unit tests; no new failures in the rest of the
Kyua test suite.

Diff Detail

Repository
rG FreeBSD src repository