Page MenuHomeFreeBSD

aesni(4): Add support for x86 SHA intrinsics
ClosedPublic

Authored by cem on Sep 22 2017, 4:57 AM.

Details

Summary

Some x86 class CPUs have accelerated intrinsics for SHA1 and SHA256.
Provide this functionality on CPUs that support it.

This implements CRYPTO_SHA1, CRYPTO_SHA1_HMAC, and CRYPTO_SHA2_256_HMAC.

Correctness: The cryptotest.py suite in tests/sys/opencrypto has been
enhanced to verify SHA1 and SHA256 HMAC using standard NIST test
vectors. The test passes on this driver. Additionally, jhb's
cryptocheck tool has been used to compare various random inputs against
OpenSSL. This test also passes.

Rough performance averages on AMD Ryzen 1950X (4kB buffer):
aesni: SHA1: ~8300 Mb/s SHA256: ~8000 Mb/s
cryptosoft: ~1800 Mb/s SHA256: ~1800 Mb/s

So ~4.4-4.6x speedup depending on algorithm choice. This is consistent
with the results the Linux folks saw for 4kB buffers.

The driver borrows SHA update code from sys/crypto sha1 and sha256. The
intrinsic step function comes from Intel under a 3-clause BSDL.[0] The
intel_sha_extensions_sha<foo>_intrinsic.c files were renamed and lightly
modified (added const, resolved a warning or two; included the sha_sse
header to declare the functions).

[0]: https://software.intel.com/en-us/articles/intel-sha-extensions-implementations

Test Plan

cryptotest.py , cryptocheck for correctness;

cryptotest (tools/crypto) for performance numbers

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem created this revision.Sep 22 2017, 4:57 AM
kib added inline comments.Sep 22 2017, 1:24 PM
sys/crypto/sha_sse/intel_sha1.c
2 ↗(On Diff #33299)

Is core fine with this license addition ?

sys/crypto/sha_sse/sha_sse.c
87 ↗(On Diff #33299)

This locking should be unified between aes and sha drivers. Best would be to move the stuff into the common header and probably some support subroutines.

sys/crypto/sha_sse/sha_sse.h
29 ↗(On Diff #33299)

Please use normal include braces.

sys/modules/sha_sse/Makefile
22 ↗(On Diff #33299)

This is very incomplete.

kib added a reviewer: rlibby.Sep 22 2017, 2:17 PM
cem added inline comments.Sep 22 2017, 2:23 PM
sys/crypto/sha_sse/intel_sha1.c
2 ↗(On Diff #33299)

Isn't this 3-clause BSD?

sys/crypto/sha_sse/sha_sse.h
29 ↗(On Diff #33299)

In what way is the include guard hack better?

kib added inline comments.Sep 22 2017, 3:00 PM
sys/crypto/sha_sse/sha_sse.h
29 ↗(On Diff #33299)
  1. It does not depend on the compiler extensions.
  2. Sometimes it is useful to know that the header was included, where the guard provides the ready-to-use symbol.
rlibby added inline comments.Sep 22 2017, 4:14 PM
sys/crypto/sha_sse/sha_sse.c
403–417 ↗(On Diff #33299)

Do we actually need a fpu_kern_enter in this procedure? It doesn't seem to be used. And there's no requirement from the API that the same thread that does setup does process, right?

cem updated this revision to Diff 33337.Sep 22 2017, 4:36 PM
cem marked 3 inline comments as done.
  • Remove unused FPU context during setup()
  • Regress to include guard hack for kib
  • Add full Intel intrinsics header dependencies in module Makefile
cem added inline comments.Sep 22 2017, 4:36 PM
sys/crypto/sha_sse/sha_sse.c
87 ↗(On Diff #33299)

Seems like an orthogonal cleanup. Any suggestion where it should go?

403–417 ↗(On Diff #33299)

Guess not — presumably it was used in aesni for key scheduling. I will remove it.

I don't believe there's any requirement from the API that the same thread does process and setup.

cem updated this revision to Diff 33338.Sep 22 2017, 4:39 PM

Fix the build. Somehow forgot to compile it after making the latest round of changes. Sorry :(. Coffee hasn't kicked in yet.

jhb added a comment.Sep 22 2017, 7:12 PM

On a general note, I would almost consider making this part of aesni.ko instead if the effective sets of CPUs that support both instruction sets is the same (or nearly so). In particular, the existing aesni.ko code does not accelerate things that combine AES with an HMAC (like an IPSec session using AES-CBC with a SHA HMAC). Having a single module that registers support for whatever algorithms the CPU supports (only SHA if AESNI isn't available for example) would permit the driver to handle these chained operations by combining AESNI with accelerated SHA hashing. (In this case aesni.ko would perhaps be better named "x86_crypto.ko" or some such to mean "accelerated software crypto for x86".)

cem added a comment.Sep 22 2017, 7:20 PM
In D12452#258449, @jhb wrote:

On a general note, I would almost consider making this part of aesni.ko instead if the effective sets of CPUs that support both instruction sets is the same (or nearly so).

The overlap is not huge yet. It will probably grow in the future, but I don't know. For now, the vast majority of aesni-supported CPUs do not support SHA extensions.

In particular, the existing aesni.ko code does not accelerate things that combine AES with an HMAC (like an IPSec session using AES-CBC with a SHA HMAC). Having a single module that registers support for whatever algorithms the CPU supports (only SHA if AESNI isn't available for example) would permit the driver to handle these chained operations by combining AESNI with accelerated SHA hashing. (In this case aesni.ko would perhaps be better named "x86_crypto.ko" or some such to mean "accelerated software crypto for x86".)

Yeah, that is an interesting point. It would increase the complexity somewhat. They could also always be merged in the future.

jmg added a comment.Sep 23 2017, 6:12 AM
In D12452#258449, @jhb wrote:

On a general note, I would almost consider making this part of aesni.ko instead if the effective sets of CPUs that support both instruction sets is the same (or nearly so). In particular, the existing aesni.ko code does not accelerate things that combine AES with an HMAC (like an IPSec session using AES-CBC with a SHA HMAC). Having a single module that registers support for whatever algorithms the CPU supports (only SHA if AESNI isn't available for example) would permit the driver to handle these chained operations by combining AESNI with accelerated SHA hashing. (In this case aesni.ko would perhaps be better named "x86_crypto.ko" or some such to mean "accelerated software crypto for x86".)

Yes, this should be integrated directly into the aesni.ko module. I had thoughts of doing this myself, but never got the chance.

cem planned changes to this revision.Sep 23 2017, 8:25 PM
In D12452#258496, @jmg wrote:

Yes, this should be integrated directly into the aesni.ko module. I had thoughts of doing this myself, but never got the chance.

I guess that's consensus. I'll adapt this to fit into aesni.

cem updated this revision to Diff 33386.Sep 24 2017, 7:56 PM

Integrate into aesni(4)

cem retitled this revision from Add a new driver, sha_sse(4) to aesni(4): Add support for x86 SHA intrinsics.Sep 24 2017, 7:56 PM
cem edited the summary of this revision. (Show Details)
jhb added a comment.Sep 25 2017, 8:08 PM

If you'd like an alternate testing tool (one that can test AES-CBC chained with SHA for example), you can try 'cryptocheck' from github/bsdjhb/freebsd.git on the 'cryptocheck' branch. You can also run IPSec across a session using the ciphers for testing but that requires a bit more setup.

sys/crypto/aesni/aesni.c
144 ↗(On Diff #33386)

Perhaps s/SHA2/SHA256/?

cem added a comment.Sep 25 2017, 8:26 PM
In D12452#259005, @jhb wrote:

If you'd like an alternate testing tool (one that can test AES-CBC chained with SHA for example), you can try 'cryptocheck' from github/bsdjhb/freebsd.git on the 'cryptocheck' branch. You can also run IPSec across a session using the ciphers for testing but that requires a bit more setup.

Yeah, I'm not going to try IPsec but I should give your tool a go.

The parts I'm least sure about are the cryptops with encryption + authentication descriptors. Also, correct detection of whether an op is *encryption* or *decryption* (in which case we need to verify). I tried to follow the patterns GMAC was using, but that's also its own weird thing and I don't know if it is right for HMAC.

sys/crypto/aesni/aesni.c
144 ↗(On Diff #33386)

Ah, good point.

cem added a comment.Sep 25 2017, 8:48 PM

Cryptocheck:

$ ./tools/tools/crypto/cryptocheck -v -a hmac -d aesni0
sha1 (16) matched (cryptodev device aesni0)
sha256 (16) matched (cryptodev device aesni0)
cryptocheck: cryptodev sha384 HMAC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev sha512 HMAC not supported for device aesni0: Operation not supported

(As expected, SHA384 and 512 are not supported.)

"authenc" (Authenticated AES modes) also seem to work; again, 384 and 512 are expected to be unsupported:

$ ./tools/tools/crypto/cryptocheck -v -a authenc -d aesni0
aes-cbc+sha1 (16) matched (cryptodev device aesni0)
aes-cbc+sha256 (16) matched (cryptodev device aesni0)
aes-cbc192+sha1 (16) matched (cryptodev device aesni0)
aes-cbc192+sha256 (16) matched (cryptodev device aesni0)
aes-cbc256+sha1 (16) matched (cryptodev device aesni0)
aes-cbc256+sha256 (16) matched (cryptodev device aesni0)
aes-ctr+sha1 (16) matched (cryptodev device aesni0)
aes-ctr+sha256 (16) matched (cryptodev device aesni0)
aes-ctr192+sha1 (16) matched (cryptodev device aesni0)
aes-ctr192+sha256 (16) matched (cryptodev device aesni0)
aes-ctr256+sha1 (16) matched (cryptodev device aesni0)
aes-ctr256+sha256 (16) matched (cryptodev device aesni0)
aes-xts+sha1 (16) matched (cryptodev device aesni0)
aes-xts+sha256 (16) matched (cryptodev device aesni0)
aes-xts256+sha1 (16) matched (cryptodev device aesni0)
aes-xts256+sha256 (16) matched (cryptodev device aesni0)
cryptocheck: cryptodev aes-cbc+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc192+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc192+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc256+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-cbc256+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr192+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr192+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr256+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-ctr256+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-xts+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-xts+sha512 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-xts256+sha384 AUTHENC not supported for device aesni0: Operation not supported
cryptocheck: cryptodev aes-xts256+sha512 AUTHENC not supported for device aesni0: Operation not supported

Additionally, I didn't seem to break "blkcipher" (unauthenticated AES modes) nor "aead" (auth. encryption w/ associated data).

Hm, some authenc tests fail with -z. The rest seem ok:

$ ( ./tools/tools/crypto/cryptocheck -vz -a all -d aesni0 >/dev/null ) 2>&1 | grep -v sha384 |grep -v sha512
cryptocheck: cryptodev aes-cbc+sha1 (65536) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha1 (131072) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha1 (245760) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha256 (65536) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha256 (131072) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc+sha256 (245760) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha1 (65536) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha1 (131072) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha1 (245760) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha256 (65536) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha256 (131072) AUTHENC failed for device aesni0: Bad message
cryptocheck: cryptodev aes-cbc192+sha256 (245760) AUTHENC failed for device aesni0: Bad message
cem updated this revision to Diff 33418.Sep 25 2017, 8:50 PM
cem marked 2 inline comments as done.
cem edited the summary of this revision. (Show Details)

Replace SHA2 with SHA256. Other SHA2 lengths are unsupported.

jhb added inline comments.Sep 25 2017, 8:54 PM
sys/crypto/aesni/aesni.c
817 ↗(On Diff #33418)

This block probably needs to be GCM-only

847 ↗(On Diff #33418)

You could move it into the 'case' here.

cem updated this revision to Diff 33420.Sep 25 2017, 9:06 PM

Remove MAC validation for HMACs. It isn't expected of drivers at this time.

This fixes all of the cryptocheck tests.

cem edited the summary of this revision. (Show Details)Sep 25 2017, 9:07 PM
cem edited the test plan for this revision. (Show Details)
cem updated this revision to Diff 33424.Sep 25 2017, 10:37 PM
cem marked 2 inline comments as done.

Move tag copydata into GCM-only section.

cem edited the summary of this revision. (Show Details)Sep 25 2017, 10:38 PM

Just dropping some nitpicks, I haven't reached a deep enough understanding for a real review yet.

sys/crypto/aesni/aesni.c
139 ↗(On Diff #33386)

It looks like this could be device_set_desc (no _copy) as these are string constants and it doesn't look like they are ever mutated. Not new though.

554 ↗(On Diff #33420)

Spell 8 as CHAR_BIT and add if (authini->cri_klen % CHAR_BIT != 0) return (EINVAL), or too paranoid?

563–582 ↗(On Diff #33420)

So as earlier, we don't need the fpu context for setup in the encini == NULL case. It may not be worth worrying about.

883 ↗(On Diff #33420)

Maybe better spelled howmany(MAX(SHA1_HASH_LEN, SHA2_256_HASH_LEN), sizeof(uint32_t)).

cem planned changes to this revision.Sep 26 2017, 12:17 AM
cem added inline comments.
sys/crypto/aesni/aesni.c
139 ↗(On Diff #33386)

True. Ideally there would be a device_set_descf() method that would take a format string and then some copy/paste here could be reduced. I'll remove _copy.

554 ↗(On Diff #33420)

I'm not a fan of spelling 8 as CHAR_BIT, but we could add the modulus check. Some other drivers use roundup().

563–582 ↗(On Diff #33420)

Oh, true. Will fix.

883 ↗(On Diff #33420)

I mean, these numbers are written in stone. I am not worried about either HASH_LEN changing or becoming not an even multiple of 4 bytes. I chose this way of writing it just because it is slightly less magical than "8."

cem updated this revision to Diff 33425.Sep 26 2017, 12:21 AM
cem marked 4 inline comments as done.
  • Use device_set_desc (no copy) with constant strings
  • Check for non-bytesized keys
  • Skip grabbing FPU context for MAC-only newsession calls
jhb accepted this revision.Sep 26 2017, 10:07 PM
jhb added inline comments.
sys/crypto/aesni/aesni.c
284 ↗(On Diff #33425)

Should probably do something to ensure that GMAC auth is only used with GCM cipher and vice versa. For ccr(4) I think I used a bool gcm_hash and then after walking the cri loop ensured that if gcm_hash was set, encinit was valid and was GCM, maybe something like:

if (gcm_hash && (encini == NULL || encini->cipher != CRYPTO_AES_NIST_GCM_16))
    return (EINVAL);
if (encini != NULL && encini->cipher == CRYPTO_AES_NIST_GCM_16 && !gcm_hash)
    return (EINVAL);
This revision is now accepted and ready to land.Sep 26 2017, 10:07 PM
cem added inline comments.Sep 26 2017, 10:51 PM
sys/crypto/aesni/aesni.c
284 ↗(On Diff #33425)

Will do!

cem updated this revision to Diff 33468.Sep 26 2017, 11:05 PM

Add GCM + GMAC check to aesni while we're here.

This revision now requires review to proceed.Sep 26 2017, 11:05 PM
This revision was automatically updated to reflect the committed changes.
cem marked an inline comment as done.