string: add timingsafe_memcmp() assembly implementation.
ClosedPublic
Actions

Authored by fuz on Sep 2 2023, 12:08 PM.

Details

Reviewers

cperciva
kib
mjg

Group Reviewers

secteam
security

Commits

rG9a6a587e672b: lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
rG2e64c927d26e: share/man/man7/simd.7: document new amd64 timingsafe_memcmp()
rG5048c1b85506: lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
rGa8cc4671c06a: share/man/man7/simd.7: document new amd64 timingsafe_memcmp()

Summary

Conceptually very similar to timingsafe_bcmp(), but with comparison
logic inspired by Elijah Stone's
fancy memcmp. A baseline (SSE) implementation
was omitted this time as I was not able to get it to perform adequately.
Best I got was 8% over the scalar version for long inputs, but slower for
short inputs.

Performance is solid, at about 10x of the generic C
implementation overall:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
              │ memcmp.pre.out │          memcmp.amd64.out           │
              │     sec/op     │   sec/op     vs base                │
TsMemcmpShort     189.11µ ± 0%   55.85µ ± 0%  -70.47% (p=0.000 n=20)
TsMemcmpMid       146.47µ ± 0%   10.14µ ± 0%  -93.08% (p=0.000 n=20)
TsMemcmpLong     130.642µ ± 0%   6.608µ ± 0%  -94.94% (p=0.000 n=20)
geomean            153.5µ        15.52µ       -89.89%

              │ memcmp.pre.out │             memcmp.amd64.out             │
              │      B/s       │      B/s        vs base                  │
TsMemcmpShort     630.4Mi ± 0%    2134.4Mi ± 0%   +238.60% (p=0.000 n=20)
TsMemcmpMid       813.9Mi ± 0%   11761.9Mi ± 0%  +1345.11% (p=0.000 n=20)
TsMemcmpLong      912.5Mi ± 0%   18039.2Mi ± 0%  +1876.92% (p=0.000 n=20)
geomean           776.5Mi          7.499Gi        +888.99%

As with the timingsafe_bcmp implementation from D41673, care has been
taken to ensure that only instructions with data operand independent
timing from Intel's list have been used.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

fuz created this revision.Sep 2 2023, 12:08 PM

Herald added a subscriber: imp. · View Herald TranscriptSep 2 2023, 12:08 PM

fuz requested review of this revision.Sep 2 2023, 12:08 PM

Harbormaster completed remote builds in B53410: Diff 126815.Sep 2 2023, 12:08 PM

fuz added reviewers: secteam, security.Sep 2 2023, 12:13 PM

Does anybody know what timing-independence mean? For instance, should the different alignment of the compared memory be ignored, or not? Even more interesting is perhaps the cache and TLB state.

Does anybody know what timing-independence mean?

Timing reveals no information about the value of the data.

It's ok (and unavoidable) to reveal information about the size of the buffer, its position in memory, the state of the CPU caches, etc.

Looks correct to me.

lib/libc/amd64/string/timingsafe_memcmp.S
33 ↗	(On Diff #126815)	It might be helpful to include a comment here saying which values are in which registers, in case anyone can't remember the calling convention.
117 ↗	(On Diff #126815)	s/fist/first/