calculate_crc32c: Add SSE4.2 implementation on x86

Description

Derived from an implementation by Mark Adler.

The fast loop computes three CRCs simultaneously over separate subsets
of the data, then combines them. This exploits a property of the CRC32
instruction in Intel hardware: one instruction can issue per cycle, but
each has 2-3 cycles of latency, so interleaving independent CRCs hides
that latency.
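
As a rough illustration of why three partial CRCs can be combined, here
is a minimal portable sketch. It is not the committed code: it uses a
bitwise software CRC-32C in place of the hardware instruction, and the
helper names (crc32c_raw, crc32c_shift_naive, crc32c_three_way, crc32c)
are hypothetical. The combining step is done naively by feeding zero
bytes, assuming only that the raw CRC update is linear over GF(2); a
fast implementation would use precomputed tables for that step.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY 0x82F63B78u

/*
 * Raw register update with no pre- or post-inversion. This is a slow
 * bitwise stand-in for the hardware instruction (assumption: same
 * polynomial and bit order).
 */
static uint32_t
crc32c_raw(uint32_t crc, const unsigned char *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1)));
	}
	return (crc);
}

/* Advance a raw CRC as if len zero bytes followed (naive, O(len)). */
static uint32_t
crc32c_shift_naive(uint32_t crc, size_t len)
{
	while (len--) {
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1)));
	}
	return (crc);
}

/*
 * Split buf into three equal blocks plus a tail, run three independent
 * CRCs in one interleaved loop, then compose them. Only the first
 * stream carries the incoming CRC; the other two start from zero.
 */
static uint32_t
crc32c_three_way(uint32_t crc, const unsigned char *buf, size_t len)
{
	size_t n = len / 3;
	const unsigned char *a = buf, *b = buf + n, *c = buf + 2 * n;
	uint32_t c0 = crc, c1 = 0, c2 = 0;

	for (size_t i = 0; i < n; i++) {
		c0 = crc32c_raw(c0, a + i, 1);
		c1 = crc32c_raw(c1, b + i, 1);
		c2 = crc32c_raw(c2, c + i, 1);
	}
	/* update(s, B) == shift(s, |B|) ^ update(0, B), by linearity. */
	crc = crc32c_shift_naive(c0, n) ^ c1;
	crc = crc32c_shift_naive(crc, n) ^ c2;
	/* Process the leftover 0-2 bytes serially. */
	return (crc32c_raw(crc, buf + 3 * n, len - 3 * n));
}

/* Standard CRC-32C: initial value and final XOR are both all-ones. */
static uint32_t
crc32c(const unsigned char *buf, size_t len)
{
	return (crc32c_three_way(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu);
}
```

In this sketch the three updates inside the loop have no data
dependency on one another, which is what lets hardware with a
multi-cycle CRC latency keep several of them in flight at once.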

The CRC32 instruction operates on general-purpose registers and does not
manipulate FPU state.

i386 does not have the crc32q instruction, so avoid it there. Otherwise
the implementation is identical to amd64.

Add basic userland tests to verify correctness on a variety of inputs.

PR: 216467
Reported by: Ben RUBSON <ben.rubson at gmail.com>
Reviewed by: kib@, markj@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D9342

Details

Provenance
cem Authored on
Differential Revision
D9342: calculate_crc32c: Add SSE4.2 implementation on x86
Parents
rS313005: Update CFLAGS for clang compatibility