
lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations
ClosedPublic

Authored by fuz on Aug 31 2023, 3:44 PM.

Details

Summary

As part of an ongoing FreeBSD Foundation project to enhance libc
string functions with SIMD on amd64, enhance timingsafe_bcmp(3).
As usual, two implementations, selectable by ARCHLEVEL (see simd(7)),
are provided: one (scalar) without SIMD and one (baseline) using SSE/SSE2.
AVX or AVX-512 implementations may follow in a future changeset.

The implementation is straightforward and similar to that of
memcmp(3). The code has been written to use only instructions that
Intel specifies as having data operand independent timing.
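
For readers unfamiliar with the technique, the following is a minimal portable C sketch of a comparison that does not branch on buffer contents. It is not the assembly under review, and the name ts_bcmp_sketch is hypothetical. It only illustrates the idea of accumulating differences instead of returning at the first mismatch. A C compiler is free to transform such a loop, so C source alone does not guarantee constant-time behaviour.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration only, not the code in this revision.
 * Compares n bytes and returns 0 if they are all equal, 1 otherwise.
 * The loop has no early exit: every byte pair is read regardless of
 * where (or whether) the buffers differ.
 */
static int
ts_bcmp_sketch(const void *b1, const void *b2, size_t n)
{
	const unsigned char *p1 = b1, *p2 = b2;
	unsigned char acc = 0;
	size_t i;

	/* XOR is nonzero exactly when two bytes differ; OR keeps any hit. */
	for (i = 0; i < n; i++)
		acc |= (unsigned char)(p1[i] ^ p2[i]);

	/* Collapse the accumulator to 0 or 1 only after the loop. */
	return (acc != 0);
}
```

Unlike memcmp(3), the result carries no ordering information, which is what allows the accumulate-then-test structure.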

Performance is good: relative to the generic C implementation (the
“pre” benchmark set), the geometric mean run time drops by about 61%
for scalar and about 72% for baseline.

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
            │ memcmp.pre.out │          memcmp.scalar.out          │         memcmp.baseline.out         │
            │     sec/op     │   sec/op     vs base                │   sec/op     vs base                │
TsBcmpShort     101.65µ ± 1%   56.70µ ± 1%  -44.22% (p=0.000 n=20)   36.65µ ± 0%  -63.95% (p=0.000 n=20)
TsBcmpMid       29.106µ ± 0%   8.412µ ± 0%  -71.10% (p=0.000 n=20)   7.028µ ± 0%  -75.85% (p=0.000 n=20)
TsBcmpLong      13.974µ ± 0%   5.096µ ± 0%  -63.53% (p=0.000 n=20)   3.481µ ± 0%  -75.09% (p=0.000 n=20)
geomean          34.58µ        13.44µ       -61.12%                  9.643µ       -72.11%

            │ memcmp.pre.out │           memcmp.scalar.out            │          memcmp.baseline.out           │
            │      B/s       │      B/s       vs base                 │      B/s       vs base                 │
TsBcmpShort     1.145Gi ± 1%    2.053Gi ± 1%   +79.28% (p=0.000 n=20)    3.177Gi ± 0%  +177.36% (p=0.000 n=20)
TsBcmpMid       4.000Gi ± 0%   13.840Gi ± 0%  +246.02% (p=0.000 n=20)   16.565Gi ± 0%  +314.14% (p=0.000 n=20)
TsBcmpLong      8.331Gi ± 0%   22.845Gi ± 0%  +174.23% (p=0.000 n=20)   33.443Gi ± 0%  +301.44% (p=0.000 n=20)
geomean         3.367Gi         8.659Gi       +157.18%                   12.07Gi       +258.60%

Sponsored by: The FreeBSD Foundation

Test Plan

Passes the extended memcmp() tests of D41528. Constant-time
properties are to be verified by manual code review.
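
As a rough idea of what a functional test for such a routine might cover, here is a sketch that sweeps buffer offsets, lengths, and mismatch positions. It is not the D41528 test suite. All names (sweep, bcmp_under_test, bcmp_off_by_one) and the bounds MAXLEN and MAXOFF are assumptions made for illustration. The stand-in bcmp_under_test would be replaced by the real function on a system that provides it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXLEN	64	/* arbitrary bound on tested lengths */
#define MAXOFF	16	/* arbitrary bound on tested start offsets */

typedef int (*bcmp_fn)(const void *, const void *, size_t);

/* Stand-in for the function under test (hypothetical). */
static int
bcmp_under_test(const void *a, const void *b, size_t n)
{
	const unsigned char *p = a, *q = b;
	unsigned char acc = 0;

	for (size_t i = 0; i < n; i++)
		acc |= (unsigned char)(p[i] ^ q[i]);
	return (acc != 0);
}

/* Deliberately broken variant: ignores the final byte. */
static int
bcmp_off_by_one(const void *a, const void *b, size_t n)
{
	return (bcmp_under_test(a, b, n > 0 ? n - 1 : 0));
}

/* Returns the number of failing cases for fn. */
static unsigned
sweep(bcmp_fn fn)
{
	unsigned char a[MAXOFF + MAXLEN + 1], b[MAXOFF + MAXLEN + 1];
	unsigned failures = 0;

	for (size_t off = 0; off < MAXOFF; off++) {
		for (size_t len = 0; len <= MAXLEN; len++) {
			memset(a, 0x55, sizeof(a));
			memset(b, 0x55, sizeof(b));

			/* Identical buffers must compare equal. */
			if (fn(a + off, b + off, len) != 0)
				failures++;

			/* A difference just past the end must be ignored. */
			b[off + len] ^= 0xff;
			if (fn(a + off, b + off, len) != 0)
				failures++;
			b[off + len] ^= 0xff;

			/* A difference at any position must be detected. */
			for (size_t pos = 0; pos < len; pos++) {
				b[off + pos] ^= 0x01;
				if (fn(a + off, b + off, len) == 0)
					failures++;
				b[off + pos] ^= 0x01;
			}
		}
	}
	return (failures);
}
```

Calling sweep(bcmp_under_test) should report no failures, while sweep(bcmp_off_by_one) reports some, since a mismatch in the last byte goes unnoticed. This sketch uses the same offset for both buffers; a fuller test would vary them independently to exercise mixed alignments.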

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

fuz requested review of this revision. Aug 31 2023, 3:44 PM
markj added inline comments.
lib/libc/amd64/string/timingsafe_bcmp.S
34

Is there a reason not to pad with int3 instead?

fuz marked an inline comment as done. Aug 31 2023, 10:06 PM
fuz added inline comments.
lib/libc/amd64/string/timingsafe_bcmp.S
34

The padding must be executable as it is traversed to get to the loop entrance.

fuz marked an inline comment as done. Aug 31 2023, 10:35 PM
  • lib/libc/amd64/string/timingsafe_bcmp.S: fix off-by-one error
  • lib/libc/amd64/string/timingsafe_bcmp.S: fix jump to wrong label

These fix two bugs I found in the code that the test suite
unfortunately did not catch. I hope it's all correct now.

This revision is now accepted and ready to land. Oct 11 2023, 7:48 PM