Page MenuHomeFreeBSD

aesni(4): Add support for x86 SHA intrinsics
ClosedPublic

Authored by cem on Sep 22 2017, 4:57 AM.
Tags
None
Referenced Files
F108171481: D12452.diff
Wed, Jan 22, 4:56 AM
F108099206: D12452.id33470.diff
Tue, Jan 21, 9:18 AM
Unknown Object (File)
Fri, Jan 17, 11:02 PM
Unknown Object (File)
Tue, Jan 14, 7:36 AM
Unknown Object (File)
Fri, Jan 10, 2:01 AM
Unknown Object (File)
Thu, Jan 2, 6:02 AM
Unknown Object (File)
Wed, Jan 1, 1:21 AM
Unknown Object (File)
Dec 22 2024, 12:54 PM
Subscribers

Details

Summary

Some x86 class CPUs have accelerated intrinsics for SHA1 and SHA256.
Provide this functionality on CPUs that support it.

This implements CRYPTO_SHA1, CRYPTO_SHA1_HMAC, and CRYPTO_SHA2_256_HMAC.

Correctness: The cryptotest.py suite in tests/sys/opencrypto has been
enhanced to verify SHA1 and SHA256 HMAC using standard NIST test
vectors. The test passes on this driver. Additionally, jhb's
cryptocheck tool has been used to compare various random inputs against
OpenSSL. This test also passes.

Rough performance averages on AMD Ryzen 1950X (4kB buffer):
aesni: SHA1: ~8300 Mb/s SHA256: ~8000 Mb/s
cryptosoft: ~1800 Mb/s SHA256: ~1800 Mb/s

So ~4.4-4.6x speedup depending on algorithm choice. This is consistent
with the results the Linux folks saw for 4kB buffers.

The driver borrows SHA update code from sys/crypto sha1 and sha256. The
intrinsic step function comes from Intel under a 3-clause BSDL.[0] The
intel_sha_extensions_sha<foo>_intrinsic.c files were renamed and lightly
modified (added const, resolved a warning or two; included the sha_sse
header to declare the functions).

[0]: https://software.intel.com/en-us/articles/intel-sha-extensions-implementations

Test Plan

cryptotest.py , cryptocheck for correctness;

cryptotest (tools/crypto) for performance numbers

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 11771
Build 12115: arc lint + arc unit

Event Timeline

sys/crypto/sha_sse/intel_sha1.c
2 ↗(On Diff #33299)

Is core fine with this license addition ?

sys/crypto/sha_sse/sha_sse.c
87 ↗(On Diff #33299)

This locking should be unified between aes and sha drivers. Best would be to move the stuff into the common header and probably some support subroutines.

sys/crypto/sha_sse/sha_sse.h
29 ↗(On Diff #33299)

Please use normal include braces.

sys/modules/sha_sse/Makefile
22 ↗(On Diff #33299)

This is very incomplete.

sys/crypto/sha_sse/intel_sha1.c
2 ↗(On Diff #33299)

Isn't this 3-clause BSD?

sys/crypto/sha_sse/sha_sse.h
29 ↗(On Diff #33299)

In what way is the include guard hack better?

sys/crypto/sha_sse/sha_sse.h
29 ↗(On Diff #33299)
  1. It does not depend on the compiler extensions.
  2. Sometimes it is useful to know that the header was included, where the guard provides the ready-to-use symbol.
sys/crypto/sha_sse/sha_sse.c
403–417 ↗(On Diff #33299)

Do we actually need a fpu_kern_enter in this procedure? It doesn't seem to be used. And there's no requirement from the API that the same thread that does setup does process, right?

cem marked 3 inline comments as done.
  • Remove unused FPU context during setup()
  • Regress to include guard hack for kib
  • Add full Intel intrinsics header dependencies in module Makefile
sys/crypto/sha_sse/sha_sse.c
87 ↗(On Diff #33299)

Seems like an orthogonal cleanup. Any suggestion where it should go?

403–417 ↗(On Diff #33299)

Guess not — presumably it was used in aesni for key scheduling. I will remove it.

I don't believe there's any requirement from the API that the same thread does process and setup.

Fix the build. Somehow forgot to compile it after making the latest round of changes. Sorry :(. Coffee hasn't kicked in yet.

On a general note, I would almost consider making this part of aesni.ko instead if the effective sets of CPUs that support both instruction sets is the same (or nearly so). In particular, the existing aesni.ko code does not accelerate things that combine AES with an HMAC (like an IPSec session using AES-CBC with a SHA HMAC). Having a single module that registers support for whatever algorithms the CPU supports (only SHA if AESNI isn't available for example) would permit the driver to handle these chained operations by combining AESNI with accelerated SHA hashing. (In this case aesni.ko would perhaps be better named "x86_crypto.ko" or some such to mean "accelerated software crypto for x86".)

In D12452#258449, @jhb wrote:

On a general note, I would almost consider making this part of aesni.ko instead if the effective sets of CPUs that support both instruction sets is the same (or nearly so).

The overlap is not huge yet. It will probably grow in the future, but I don't know. For now, the vast majority of aesni-supported CPUs do not support SHA extensions.

In particular, the existing aesni.ko code does not accelerate things that combine AES with an HMAC (like an IPSec session using AES-CBC with a SHA HMAC). Having a single module that registers support for whatever algorithms the CPU supports (only SHA if AESNI isn't available for example) would permit the driver to handle these chained operations by combining AESNI with accelerated SHA hashing. (In this case aesni.ko would perhaps be better named "x86_crypto.ko" or some such to mean "accelerated software crypto for x86".)

Yeah, that is an interesting point. It would increase the complexity somewhat. They could also always be merged in the future.

In D12452#258449, @jhb wrote:

On a general note, I would almost consider making this part of aesni.ko instead if the effective sets of CPUs that support both instruction sets is the same (or nearly so). In particular, the existing aesni.ko code does not accelerate things that combine AES with an HMAC (like an IPSec session using AES-CBC with a SHA HMAC). Having a single module that registers support for whatever algorithms the CPU supports (only SHA if AESNI isn't available for example) would permit the driver to handle these chained operations by combining AESNI with accelerated SHA hashing. (In this case aesni.ko would perhaps be better named "x86_crypto.ko" or some such to mean "accelerated software crypto for x86".)

Yes, this should be integrated directly into the aesni.ko module. I had thoughts of doing this myself, but never got the chance.

cem planned changes to this revision.Sep 23 2017, 8:25 PM
In D12452#258496, @jmg wrote:

Yes, this should be integrated directly into the aesni.ko module. I had thoughts of doing this myself, but never got the chance.

I guess that's consensus. I'll adapt this to fit into aesni.

Integrate into aesni(4)

cem retitled this revision from Add a new driver, sha_sse(4) to aesni(4): Add support for x86 SHA intrinsics.Sep 24 2017, 7:56 PM
cem edited the summary of this revision. (Show Details)

If you'd like an alternate testing tool (one that can test AES-CBC chained with SHA for example), you can try 'cryptocheck' from github/bsdjhb/freebsd.git on the 'cryptocheck' branch. You can also run IPSec across a session using the ciphers for testing but that requires a bit more setup.

sys/crypto/aesni/aesni.c
144

Perhaps s/SHA2/SHA256/?

In D12452#259005, @jhb wrote:

If you'd like an alternate testing tool (one that can test AES-CBC chained with SHA for example), you can try 'cryptocheck' from github/bsdjhb/freebsd.git on the 'cryptocheck' branch. You can also run IPSec across a session using the ciphers for testing but that requires a bit more setup.

Yeah, I'm not going to try IPsec but I should give your tool a go.

The parts I'm least sure about are the cryptops with encryption + authentication descriptors. Also, correct detection of whether an op is *encryption* or *decryption* (in which case we need to verify). I tried to follow the patterns GMAC was using, but that's also its own weird thing and I don't know if it is right for HMAC.

sys/crypto/aesni/aesni.c
144

Ah, good point.

Cryptocheck:

$ ./tools/tools/crypto/cryptocheck -v -a hmac -d aesni0
sha1 (16) matched (cryptodev device aesni0)
sha256 (16) matched (cryptodev device aesni0)
cryptocheck: cryptodev sha384 HMAC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev sha512 HMAC not supported for device aesni0: Operation not supported

(As expected, SHA384 and 512 are not supported.)

"authenc" (Authenticated AES modes) also seem to work; again, 384 and 512 are expected to be unsupported:

$ ./tools/tools/crypto/cryptocheck -v -a authenc -d aesni0
aes-cbc+sha1 (16) matched (cryptodev device aesni0)
aes-cbc+sha256 (16) matched (cryptodev device aesni0)
aes-cbc192+sha1 (16) matched (cryptodev device aesni0)
aes-cbc192+sha256 (16) matched (cryptodev device aesni0)
aes-cbc256+sha1 (16) matched (cryptodev device aesni0)
aes-cbc256+sha256 (16) matched (cryptodev device aesni0)
aes-ctr+sha1 (16) matched (cryptodev device aesni0)
aes-ctr+sha256 (16) matched (cryptodev device aesni0)
aes-ctr192+sha1 (16) matched (cryptodev device aesni0)
aes-ctr192+sha256 (16) matched (cryptodev device aesni0)
aes-ctr256+sha1 (16) matched (cryptodev device aesni0)
aes-ctr256+sha256 (16) matched (cryptodev device aesni0)
aes-xts+sha1 (16) matched (cryptodev device aesni0)
aes-xts+sha256 (16) matched (cryptodev device aesni0)
aes-xts256+sha1 (16) matched (cryptodev device aesni0)
aes-xts256+sha256 (16) matched (cryptodev device aesni0)
cryptocheck: cryptodev aes-cbc+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc192+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc192+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc256+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc256+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr192+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr192+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr256+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr256+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-xts+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-xts+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-xts256+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-xts256+sha512 AUTHENC not supported for device aesni0: Operation not supported

Additionally, I didn't seem to break "blkcipher" (unauthenticated AES modes) nor "aead" (auth. encryption w/ associated data).

Hm, some authenc tests fail with -z. The rest seem ok:

$ ( ./tools/tools/crypto/cryptocheck -vz -a all -d aesni0 >/dev/null ) 2>&1 | grep -v sha384 |grep -v sha512
cryptocheck: cryptodev aes-cbc+sha1 (65536) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha1 (131072) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha1 (245760) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha256 (65536) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha256 (131072) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha256 (245760) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha1 (65536) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha1 (131072) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha1 (245760) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha256 (65536) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha256 (131072) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha256 (245760) AUTHENC failed for device aesni0: Bad message
cem marked 2 inline comments as done.
cem edited the summary of this revision. (Show Details)

Replace SHA2 with SHA256. Other SHA2 lengths are unsupported.

sys/crypto/aesni/aesni.c
828–829

This block probably needs to be GCM-only

853

You could move it into the 'case' here.

Remove MAC validation for HMACs. It isn't expected of drivers at this time.

This fixes all of the cryptocheck tests.

cem edited the test plan for this revision. (Show Details)
cem marked 2 inline comments as done.

Move tag copydata into GCM-only section.

Just dropping some nitpicks, I haven't reached a deep enough understanding for a real review yet.

sys/crypto/aesni/aesni.c
139

It looks like this could be device_set_desc (no _copy) as these are string constants and it doesn't look like they are ever mutated. Not new though.

562–596

So as earlier, we don't need the fpu context for setup in the encini == NULL case. It may not be worth worrying about.

566

Spell 8 as CHAR_BIT and add if (authini->cri_klen % CHAR_BIT != 0) return (EINVAL), or too paranoid?

897

Maybe better spelled howmany(MAX(SHA1_HASH_LEN, SHA2_256_HASH_LEN), sizeof(uint32_t)).

cem planned changes to this revision.Sep 26 2017, 12:17 AM
cem added inline comments.
sys/crypto/aesni/aesni.c
139

True. Ideally there would be a device_set_descf() method that would take a format string and then some copy/paste here could be reduced. I'll remove _copy.

562–596

Oh, true. Will fix.

566

I'm not a fan of spelling 8 as CHAR_BIT, but we could add the modulus check. Some other drivers use roundup().

897

I mean, these numbers are written in stone. I am not worried about either HASH_LEN changing or becoming not an even multiple of 4 bytes. I chose this way of writing it just because it is slightly less magical than "8."

cem marked 4 inline comments as done.
  • Use device_set_desc (no copy) with constant strings
  • Check for non-bytesized keys
  • Skip grabbing FPU context for MAC-only newsession calls
jhb added inline comments.
sys/crypto/aesni/aesni.c
289

Should probably do something to ensure that GMAC auth is only used with GCM cipher and vice versa. For ccr(4) I think I used a bool gcm_hash and then after walking the cri loop ensured that if gcm_hash was set, encinit was valid and was GCM, maybe something like:

if (gcm_hash && (encini == NULL || encini->cipher != CRYPTO_AES_NIST_GCM_16))
    return (EINVAL);
if (encini != NULL && encini->cipher == CRYPTO_AES_NIST_GCM_16 && !gcm_hash)
    return (EINVAL);
This revision is now accepted and ready to land.Sep 26 2017, 10:07 PM
sys/crypto/aesni/aesni.c
289

Will do!

Add GCM + GMAC check to aesni while we're here.

This revision now requires review to proceed.Sep 26 2017, 11:05 PM
This revision was automatically updated to reflect the committed changes.
cem marked an inline comment as done.