Page MenuHomeFreeBSD

lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation.
ClosedPublic

Authored by fuz on Sep 2 2023, 12:08 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 24, 5:57 PM
Unknown Object (File)
Fri, Jan 24, 5:47 PM
Unknown Object (File)
Sat, Jan 18, 3:11 AM
Unknown Object (File)
Tue, Jan 14, 9:48 AM
Unknown Object (File)
Thu, Jan 9, 5:28 PM
Unknown Object (File)
Dec 21 2024, 4:10 PM
Unknown Object (File)
Dec 21 2024, 4:05 PM
Unknown Object (File)
Dec 7 2024, 11:00 PM

Details

Summary

Conceptually very similar to timingsafe_bcmp(), but with comparison
logic inspired by Elijah Stone's
fancy memcmp. A baseline (SSE) implementation
was omitted this time as I was not able to get it to perform adequately.
Best I got was 8% over the scalar version for long inputs, but slower for
short inputs.

Performance is solid, at about 10x of the generic C
implementation overall:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
              │ memcmp.pre.out │          memcmp.amd64.out           │
              │     sec/op     │   sec/op     vs base                │
TsMemcmpShort     189.11µ ± 0%   55.85µ ± 0%  -70.47% (p=0.000 n=20)
TsMemcmpMid       146.47µ ± 0%   10.14µ ± 0%  -93.08% (p=0.000 n=20)
TsMemcmpLong     130.642µ ± 0%   6.608µ ± 0%  -94.94% (p=0.000 n=20)
geomean            153.5µ        15.52µ       -89.89%

              │ memcmp.pre.out │             memcmp.amd64.out             │
              │      B/s       │      B/s        vs base                  │
TsMemcmpShort     630.4Mi ± 0%    2134.4Mi ± 0%   +238.60% (p=0.000 n=20)
TsMemcmpMid       813.9Mi ± 0%   11761.9Mi ± 0%  +1345.11% (p=0.000 n=20)
TsMemcmpLong      912.5Mi ± 0%   18039.2Mi ± 0%  +1876.92% (p=0.000 n=20)
geomean           776.5Mi          7.499Gi        +888.99%

As with the timingsafe_bcmp implementation from D41673, care has been
taken to ensure that only instructions with data operand independent
timing from Intel's list have been used.

Sponsored by: The FreeBSD Foundation

Test Plan

passes extended unit tests from D41528.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

fuz requested review of this revision.Sep 2 2023, 12:08 PM

Does anybody know what timing-independence mean? For instance, should the different alignment of the compared memory be ignored, or not? Even more interesting is perhaps the cache and TLB state.

Does anybody know what timing-independence mean?

Timing reveals no information about the value of the data.

It's ok (and unavoidable) to reveal information about the size of the buffer, its position in memory, the state of the CPU caches, etc.

Looks correct to me.

lib/libc/amd64/string/timingsafe_memcmp.S
33 ↗(On Diff #126815)

It might be helpful to include a comment here saying which values are in which registers, in case anyone can't remember the calling convention.

117 ↗(On Diff #126815)

s/fist/first/

  • lib/libc/amd64/string/timingsafe_memcmp.S: respond to code review

I've fixed the typo and added the comment as requested by @cperciva.

This revision is now accepted and ready to land.Oct 12 2023, 3:08 PM