Details

Reviewers

cem
jhb
jmg

Commits

rS348268: Add an AESNI-optimized version of the CCM/CBC cryptographic and authentication

Summary

This adds the much faster AESNI version of the CCM and CBC encryption/authentication.

Test Plan

cryptocheck -A 0 -a aes-ccm -d aesni 1024
cryptocheck -A 10 -a aes-ccm -d aesni 1024

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

cem added inline comments.Apr 6 2019, 12:12 AM

sys/crypto/aesni/aesni_ccm.c
70 ↗	(On Diff #55817)	This file is more or less new code; it's free to follow style(9) from the beginning. So nitpicks here: Blank line between variable declarations and first statement; and `return (retval);`.
82 ↗	(On Diff #55817)	ditto the blank line thing
99 ↗	(On Diff #55817)	Seems like `retval` could be named something more descriptive, like `B0` or `MAC`. And `temp_block` is maybe `auth_block` or something (`staging_block` would match software CBC-MAC).
109 ↗	(On Diff #55817)	Does `__m128i foo = 0;` generate different code than `__m128i foo = _mm_setzero_si128();`? It isn't important, I'm just vaguely curious. Edit: Ahhh, __m128i is literally a vector, rather than integral, type, according to the compiler. So it cannot simply be initialized with an integer.
129–130 ↗	(On Diff #55817)	Could just use be16enc(&temp_block, auth_len);
134–136 ↗	(On Diff #55817)	What do you mean by calling convention? The immediate function takes a size_t, which can represent larger values. I understand that internally OCF deals only in `int` sizes, but I'd suggest either implementing the 3rd mode or asserting we're within range of the 2nd mode anyway.
140–143 ↗	(On Diff #55817)	Simplified: be16enc(&temp_block, 0xfffe); be32enc((char *)&temp_block + 2, auth_len);
162 ↗	(On Diff #55817)	style nit: `sizeof(temp_block)`, for consistency with the object we're acting on
185 ↗	(On Diff #55817)	typo: abytes
186 ↗	(On Diff #55817)	usually spelled "4 GB" or "4 gigabytes" but not "4gbytes"
188–190 ↗	(On Diff #55817)	I don't know what these pre-processor error directives (there are a few) are intended to be, but probably remove them before commit. (Figure out how to incorporate them into comments or code, if necessary.)
193–196 ↗	(On Diff #55817)	Ugh. I know you just inherited this mess from GCM, but what an obnoxious API :P.
205–206 ↗	(On Diff #55817)	Is it impossible to generate a tag for AAD without any ciphered input?
230–231 ↗	(On Diff #55817)	You don't need explicit_bzero for ordinary initializations. Here `_mm_setzero_si128` would be fine. explicit_bzero is for deleting private keys when you release memory. Also, `current_block` is never used?
258 ↗	(On Diff #55817)	So really `last_block` is the rolling MAC value. I'd suggest a name incorporating MAC, for clarity.
259–262 ↗	(On Diff #55817)	Right, ok, we're generating a keystream from an encrypted series of counter blocks.
262 ↗	(On Diff #55817)	style nit: Usually we at declare variables at the beginning of some scope, if not function scope.
272 ↗	(On Diff #55817)	This step doesn't have any dependency on the data, right? We could do it earlier, as soon as we have saved a copy in `s_x` for use in generating the keystream.
275 ↗	(On Diff #55817)	style: Empty return Here is where you would explicit_bzero() sensitive data, like maybe `last_block`, `s0`, `temp_block`, and `X`.
302 ↗	(On Diff #55817)	maybe this is just phabricator, but it looks like there are extra spaces between `__m128i` and `s0`
315 ↗	(On Diff #55817)	style(9) nit: pointers aren't booleans; compare with null in expressions. Ditto below.
334 ↗	(On Diff #55817)	It's not a hash. It's a MAC, or tag.
342–344 ↗	(On Diff #55817)	You already zero padded temp_block on lines 322-325.
346 ↗	(On Diff #55817)	Not that they're different types, but since this acts on `pad_block`, `sizeof(pad_block)` is slightly more appropriate
362 ↗	(On Diff #55817)	empty return probably also the last block of decrypted plaintext is sensitive (`temp_block`)? And perhaps the tag, `hash_block`.
382–383 ↗	(On Diff #55817)	No verifying of the tag over the AAD?
418 ↗	(On Diff #55817)	s_x is never used?
426 ↗	(On Diff #55817)	`timingsafe_bcmp`
427 ↗	(On Diff #55817)	style(9) nit: `return (0);`

In D19298#425490, @cem wrote:

In this implementation, I'm not sure the 2-pass decrypt buys us anything. We already pre-allocate space (if input is non-contiguous) — checking the tag first doesn't save us from a DoS in terms of memory use. If the (unexpected case) tag verification fails, we'd need to do extra work to clobber anything we output, maybe? I might be forgetting something about CCM.

The reason I still did it that was was because AESNI_CCM_decrypt doesn't allocate the memory -- the caller does, and if something else were to call it (or the aesni wrapping code ever changes how it gets the buffers), it'd expose data it shouldn't.

(I'll go through the rest of the comments this weekend.)

In D19298#425522, @sef wrote:

In D19298#425490, @cem wrote:

In this implementation, I'm not sure the 2-pass decrypt buys us anything. We already pre-allocate space (if input is non-contiguous) — checking the tag first doesn't save us from a DoS in terms of memory use. If the (unexpected case) tag verification fails, we'd need to do extra work to clobber anything we output, maybe? I might be forgetting something about CCM.

The reason I still did it that was was because AESNI_CCM_decrypt doesn't allocate the memory -- the caller does, and if something else were to call it (or the aesni wrapping code ever changes how it gets the buffers), it'd expose data it shouldn't.

That's totally reasonable.

sef added inline comments.Apr 8 2019, 6:51 PM

sys/crypto/aesni/aesni_ccm.c
205–206 ↗	(On Diff #55817)	Huh. I hadn't thought about that, but looks like it can. Thanks! (I'm working on the other feedback as well.)

Just a couple of thoughts from what I've seen flying by:

I think it's correct to decrypt twice (but I think everyone agrees on that now)
I have patches to make ccr(4) do CCM.
It certainly seems like you should be able to do a CCM request that only has AAD and no payload. GCM supports this and the NIST-KAT tests for GCM include examples of this.
As part of my ccr(4) work I'm working on adding the CCM NIST-KAT vectors to the security/nist-kat port and hooking those up to the python tester (I thought we had asked for that earlier as part of the CCM changes, but I don't see them in the tree and haven't seen them in a review, so I'm just going to work on it as part of testing ccr(4)).

New diff coming right after this.

sys/crypto/aesni/aesni_ccm.c
114 ↗	(On Diff #55817)	I find memcpy backwards, bcopy sane. But it's also kernel code, and bcopy/bzero is more prevalent than memcpy/memset.
129–130 ↗	(On Diff #55817)	ha. I did not know about those. Much nicer.
186 ↗	(On Diff #55817)	I deal with bits (networking stuff) so often that my habit is to specify Xbits or Xbytes as appropriate.
188–190 ↗	(On Diff #55817)	Old debugging code, I've gotten rid of all of them.
193–196 ↗	(On Diff #55817)	I hate any function with more than 6 arguments. That's from compiler work decades ago ;).
258 ↗	(On Diff #55817)	I am, always, horrible at naming things. I changed it to "rolling_mac".
272 ↗	(On Diff #55817)	I put it there because of how the spec was written.
302 ↗	(On Diff #55817)	There was in at least one place, fixed.
342–344 ↗	(On Diff #55817)	Done, I believe.
362 ↗	(On Diff #55817)	Is the tag sensitive? I've got it in the new changes, but since it's part of the message, I'm not sure it's sensitive?
382–383 ↗	(On Diff #55817)	I changed both of them to check for message and aad len being zero for early exit.
418 ↗	(On Diff #55817)	Good catch, holdover from the older code.

Review feedback incorporated.

Looking better, thanks

sys/crypto/aesni/aesni_ccm.c
135 ↗	(On Diff #56030)	I think we lost `0xfffe`?
114 ↗	(On Diff #55817)	bcopy/bzero is more prevalent than memcpy/memset. How did you arrive at that conclusion? Recursive grep suggests the opposite is true.
186 ↗	(On Diff #55817)	Being explicit about bytes is harmless, but we're missing a space and a full SI-prefix.
258 ↗	(On Diff #55817)	Thanks
362 ↗	(On Diff #55817)	This is the correct tag, as opposed to the one provided by the attacker in the message. It's sensitive, no?

I lost the length descriptor prefix in the last change. I've put it back, and run cryptocheck with -A lengths of 0, 13, 16, 32, 192102, and 127091. (User-space can't test more than 256k unfortunately.)

sef marked an inline comment as done.Apr 10 2019, 6:33 PM

sef added inline comments.

sys/crypto/aesni/aesni_ccm.c
135 ↗	(On Diff #56030)	Yeah, put back and explicitly tested the larger AAD lengths.
114 ↗	(On Diff #55817)	Older check. "_was_ more prevalent" it turns out.

Poke?

Only some small suggestions / nits from me. I think it looks good overall. Does this pass the limited subset of CCM vectors the latest python script test runs?

sys/crypto/aesni/aesni.c
134 ↗	(On Diff #56062)	FWIW, I've sorted these alphabetically in the manpage for ccr and think it wouldn't hurt to do the same here.
138 ↗	(On Diff #56062)	Unrelated, it seems like SHA224 is missing (that might be a cem@ bug and a separate commit to fix).
848 ↗	(On Diff #54233)	So we do have AES_CBC_MAC_HASH_LEN which is what I used in ccr(4) as the equivalent of GMAC_DIGEST_LEN. I prefer the explicit constants as right now this is hardcoding the assumption that CCM and GCM have the same tags. We could instead add a _Static_assert() that 'sizeof(tag) >= GMAC_DIGEST_LEN' for the GCM case and something similar for CCM.
sys/crypto/aesni/aesni_ccm.c
68 ↗	(On Diff #56062)	Nit: s/put/Put/
99 ↗	(On Diff #56062)	FWIW, this will be the 3rd copy of this code in the tree (plain software generates the B0 block, ccr has a 'generate_ccm_b0' function). Someday we should export a shared helper function, but we don't have to do that as part of this change.
211 ↗	(On Diff #56062)	For actual CCM I think the limits are 7 and 13.

I'm updating and build testing before uploading new diffs.

sys/crypto/aesni/aesni.c
138 ↗	(On Diff #56062)	I didn't touch that -- but it's possible I'm out of date. I'll do an update before I upload these modified diffs.
sys/crypto/aesni/aesni_ccm.c
68 ↗	(On Diff #56062)	Keyboard issues.
211 ↗	(On Diff #56062)	That may be right, the spec just said "15-L octets"; it clearly can't be 0, since it says it must also be unique. But the other constraints on L aren't as clear to me.

The promised updated diffs.

I think most feedback has been incorporated, but I think a few things weren't addressed yet:

sys/crypto/aesni/aesni.c
138 ↗	(On Diff #56062)	Right, I didn't enumerate SHA224 independent of SHA256 (nor the HMAC varieties) in the device string. It's really just SHA256 with a different initial vector and truncated result. Maybe it should be included (unrelated to CCM).
sys/crypto/aesni/aesni_ccm.c
309 ↗	(On Diff #57018)	I still find `hashp` and `hash_block` problematic here; these have specific cryptographic meaning that AES-CBC-MAC does not satisfy. `macp`/`mac_block` (or tagp/tag_block) would be totally fine.
211 ↗	(On Diff #56062)	NIST 800-38c §A.1 states `n ∈ [7, 13]`.
186 ↗	(On Diff #55817)	still present

I'll get to those tomorrow, thanks! (Various stuff going on today has thrown my schedule off.)

Doing testing with the changes, then I'll upload the diffs.

A bit late, but the updated diff promised. Thanks cem & jhb!

Thanks! I believe all my concerns are addressed. (And thanks again for your continued patience.)

This revision is now accepted and ready to land.May 24 2019, 8:27 PM

jhb accepted this revision.May 24 2019, 9:19 PM

Closed by commit rS348268: Add an AESNI-optimized version of the CCM/CBC cryptographic and authentication (authored by sef). · Explain WhyMay 25 2019, 7:26 AM

This revision was automatically updated to reflect the committed changes.

This broke the build with gcc:

11:40:34 --- all_subdir_aesni ---
11:40:34 /workspace/src/sys/crypto/aesni/aesni_ccm.c: In function 'xor_and_encrypt':
11:40:34 /workspace/src/sys/crypto/aesni/aesni_ccm.c:61:18: error: incompatible types when initializing type '__m128 {aka __vector(4) float}' using type '__m128i {aka __vector(2) long long int}'
11:40:34   __m128 retval = _mm_xor_si128(a, b);
11:40:34                   ^~~~~~~~~~~~~
11:40:34 /workspace/src/sys/crypto/aesni/aesni_ccm.c:63:21: error: incompatible type for argument 3 of 'aesni_enc'
11:40:34   retval = AESNI_ENC(retval, k, nr);
11:40:34                      ^
11:40:34 /workspace/src/sys/crypto/aesni/aesni_ccm.c:47:64: note: in definition of macro 'AESNI_ENC'
11:40:34  #define AESNI_ENC(d, k, nr) aesni_enc(nr-1, (const __m128i*)k, d)
11:40:34                                                                 ^
11:40:34 In file included from /workspace/src/sys/crypto/aesni/aesni_ccm.c:46:0:
11:40:34 /workspace/src/sys/crypto/aesni/aesencdec.h:115:1: note: expected '__m128i {aka const __vector(2) long long int}' but argument is of type '__m128 {aka __vector(4) float}'
11:40:34  aesni_enc(int rounds, const __m128i *keysched, const __m128i from)
11:40:34  ^~~~~~~~~
11:40:34 /workspace/src/sys/crypto/aesni/aesni_ccm.c:64:9: error: incompatible types when returning type '__m128 {aka __vector(4) float}' but '__m128i {aka __vector(2) long long int}' was expected
11:40:34   return (retval);
11:40:34          ^
11:40:34 /workspace/src/sys/crypto/aesni/aesni_ccm.c:65:1: error: control reaches end of non-void function [-Werror=return-type]
11:40:34  }
11:40:34  ^

Why was this committed without signoff from someone on secteam (CC: secteam)?

CC: @lwhsu

In D19298#440806, @ngie wrote:

This broke the build with gcc:

I just noticed the same thing :-). Should be fixed in r348293.

Why was this committed without signoff from someone on secteam?

Very rarely do we have secteam sign off on new cipher implementations. This cipher is intended to allow importing encrypted zpools from other/older OpenZFS implementations, but should be discouraged from use for literally any other purpose. This is part of a series; arguably the same question could be raised for rS346650, rS346649, rS344142, rS344141 (D19090), and rS344140.

In my experience, Secteam is way overloaded (it's just delphij and gordon). I would love it if someone on secteam would take a critical eye to this series of changes, but I don't know if it is the best use of their limited time.

In D19298#440809, @cem wrote:

In D19298#440806, @ngie wrote:

This broke the build with gcc:

I just noticed the same thing :-). Should be fixed in r348293.

Thanks for the fix! I was going to file a ticket for asking help. :-)

Why was this committed without signoff from someone on secteam?

Very rarely do we have secteam sign off on new cipher implementations. This cipher is intended to allow importing encrypted zpools from other/older OpenZFS implementations, but should be discouraged from use for literally any other purpose. This is part of a series; arguably the same question could be raised for rS346650, rS346649, rS344142, rS344141 (D19090), and rS344140.

In my experience, Secteam is way overloaded (it's just delphij and gordon). I would love it if someone on secteam would take a critical eye to this series of changes, but I don't know if it is the best use of their limited time.

I am not familiar with the process here, but I guess it would be nice to have a post-commit notification for them, in case we missed anything.

AESNI version of CCM+CBC-MAC
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 57884

head/sys/conf/files.amd64

head/sys/conf/files.i386

head/sys/crypto/aesni/aesni.h

head/sys/crypto/aesni/aesni.c

head/sys/crypto/aesni/aesni_ccm.c

head/sys/crypto/aesni/aesni_wrap.c

head/sys/modules/aesni/Makefile

AESNI version of CCM+CBC-MACClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 57884

head/sys/conf/files.amd64

head/sys/conf/files.i386

head/sys/crypto/aesni/aesni.h

head/sys/crypto/aesni/aesni.c

head/sys/crypto/aesni/aesni_ccm.c

head/sys/crypto/aesni/aesni_wrap.c

head/sys/modules/aesni/Makefile

AESNI version of CCM+CBC-MAC
ClosedPublic
Actions

Revision Contents
Changeset List