Page MenuHomeFreeBSD

AESNI version of CCM+CBC-MAC
Needs ReviewPublic

Authored by sef on Feb 22 2019, 8:39 PM.

Details

Reviewers
cem
jhb
jmg
Summary

This adds the much faster AESNI version of the CCM and CBC encryption/authentication.

Test Plan

cryptocheck -A 0 -a aes-ccm -d aesni 1024
cryptocheck -A 10 -a aes-ccm -d aesni 1024

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sef added inline comments.Mar 28 2019, 10:00 PM
sys/modules/aesni/Makefile
16–25

Well, I just copied it, but I've removed both now.

sef updated this revision to Diff 55565.Mar 28 2019, 10:01 PM
sef marked an inline comment as done.

Implement some (but not all, due to asking some questions) of the feedback from cem@.

cem added inline comments.Apr 4 2019, 4:43 PM
sys/crypto/aesni/aesni.c
853

Ah, sorry, I did not realize we hadn't defined a CMAC_DIGEST_LEN. Maybe we should. I don't feel strongly about it.

Sure, sizeof(tag) sounds appropriate here.

In general, when something is a local array x[YYY], I prefer using sizeof(x) (or nitems(x)) to the YYY constant (named or not).

862

Is copying out harmful if there was an error? I suppose it could be misused or accidentally leaked in some malicious way.

So: no objection to the approach you've taken here.

sys/modules/aesni/Makefile
16–25

Right, thanks.

sef marked an inline comment as done.Apr 4 2019, 7:24 PM

Responses to cem; a new patch incoming.

sys/crypto/aesni/aesni.c
862

If the tag is incorrect, but the key and data are, then it exposes the unencrypted data when it shouldn't. It's bothersome.

sef updated this revision to Diff 55815.Apr 4 2019, 7:27 PM

Changed the GMAC_DIGEST_LEN uses to be sizeof(tag).

cem added a comment.Apr 4 2019, 7:55 PM

A little more input on aesni_ccm itself for discussion.

sys/crypto/aesni/aesni_ccm.c
54–57

Use of this union is somewhat problematic.

It's use is very much in violation of the spirit of the C standard, around what unions are; and what behavior is defined, implementation-defined, or undefined.

I think I'd suggest just typedef __m128i aes_block_t (or removing use of the aes_block_t type entirely) instead of this union. Most of the code is already defined in terms of the __m128i type anyway.

(Style(9) nitpick on the name, anyway: unambiguously says not to define typedefs ending with "_t".)

302

For example, here it is not clear which member of the union is supposed to be initialized by s0.

320

And here we reference the union pointer directly for initialization.

sef updated this revision to Diff 55817.Apr 4 2019, 8:38 PM

Per feedback from cem, remove aes_block_t. Note that I did use a uint8_t pointer for some of the work, rather than having annoying typecasts in function calls.

cem added a comment.Apr 6 2019, 12:12 AM

Some feedback on the meat of the patch.

In this implementation, I'm not sure the 2-pass decrypt buys us anything. We already pre-allocate space (if input is non-contiguous) — checking the tag first doesn't save us from a DoS in terms of memory use. If the (unexpected case) tag verification fails, we'd need to do extra work to clobber anything we output, maybe? I might be forgetting something about CCM.

sys/crypto/aesni/aesni_ccm.c
71

This file is more or less new code; it's free to follow style(9) from the beginning.

So nitpicks here: Blank line between variable declarations and first statement; and return (retval);.

83

ditto the blank line thing

100

Seems like retval could be named something more descriptive, like B0 or MAC.

And temp_block is maybe auth_block or something (staging_block would match software CBC-MAC).

110

Does __m128i foo = 0; generate different code than __m128i foo = _mm_setzero_si128();? It isn't important, I'm just vaguely curious.

Edit: Ahhh, __m128i is literally a vector, rather than integral, type, according to the compiler. So it cannot simply be initialized with an integer.

112

style nit: auth_len > 0, as auth_len is not a boolean

115

I'm not asking you to change these, but in new code I'd encourage use of the C89 memcpy/memmove/memset macros over the pre-ANSI bcopy etc routines.

(For me as a reviewer, I have to think about destination/source argument order every time I look at one of these b* routines, whereas I don't have to think about memcpy.)

130–131

Could just use

be16enc(&temp_block, auth_len);
135–137

What do you mean by calling convention? The immediate function takes a size_t, which can represent larger values.

I understand that internally OCF deals only in int sizes, but I'd suggest either implementing the 3rd mode or asserting we're within range of the 2nd mode anyway.

141–144

Simplified:

be16enc(&temp_block, 0xfffe);
be32enc((char *)&temp_block + 2, auth_len);
163

style nit: sizeof(temp_block), for consistency with the object we're acting on

186

typo: abytes

187

usually spelled "4 GB" or "4 gigabytes" but not "4gbytes"

189–191

I don't know what these pre-processor error directives (there are a few) are intended to be, but probably remove them before commit. (Figure out how to incorporate them into comments or code, if necessary.)

194–197

Ugh. I know you just inherited this mess from GCM, but what an obnoxious API :P.

206–207

Is it impossible to generate a tag for AAD without any ciphered input?

231–232

You don't need explicit_bzero for ordinary initializations. Here _mm_setzero_si128 would be fine. explicit_bzero is for deleting private keys when you release memory.

Also, current_block is never used?

259

So really last_block is the rolling MAC value. I'd suggest a name incorporating MAC, for clarity.

260–263

Right, ok, we're generating a keystream from an encrypted series of counter blocks.

263

style nit: Usually we at declare variables at the beginning of some scope, if not function scope.

273

This step doesn't have any dependency on the data, right? We could do it earlier, as soon as we have saved a copy in s_x for use in generating the keystream.

276

style: Empty return

Here is where you would explicit_bzero() sensitive data, like maybe last_block, s0, temp_block, and X.

303

maybe this is just phabricator, but it looks like there are extra spaces between __m128i and s0

316

style(9) nit: pointers aren't booleans; compare with null in expressions. Ditto below.

335

It's not a hash. It's a MAC, or tag.

343–345

You already zero padded temp_block on lines 322-325.

347

Not that they're different types, but since this acts on pad_block, sizeof(pad_block) is slightly more appropriate

363

empty return

probably also the last block of decrypted plaintext is sensitive (temp_block)? And perhaps the tag, hash_block.

383–384

No verifying of the tag over the AAD?

419

s_x is never used?

427

timingsafe_bcmp

428

style(9) nit: return (0);

sef added a comment.Apr 6 2019, 12:26 AM
In D19298#425490, @cem wrote:

In this implementation, I'm not sure the 2-pass decrypt buys us anything. We already pre-allocate space (if input is non-contiguous) — checking the tag first doesn't save us from a DoS in terms of memory use. If the (unexpected case) tag verification fails, we'd need to do extra work to clobber anything we output, maybe? I might be forgetting something about CCM.

The reason I still did it that was was because AESNI_CCM_decrypt doesn't allocate the memory -- the caller does, and if something else were to call it (or the aesni wrapping code ever changes how it gets the buffers), it'd expose data it shouldn't.

sef added a comment.Apr 6 2019, 12:27 AM

(I'll go through the rest of the comments this weekend.)

cem added a comment.Apr 6 2019, 1:05 AM
In D19298#425522, @sef wrote:
In D19298#425490, @cem wrote:

In this implementation, I'm not sure the 2-pass decrypt buys us anything. We already pre-allocate space (if input is non-contiguous) — checking the tag first doesn't save us from a DoS in terms of memory use. If the (unexpected case) tag verification fails, we'd need to do extra work to clobber anything we output, maybe? I might be forgetting something about CCM.

The reason I still did it that was was because AESNI_CCM_decrypt doesn't allocate the memory -- the caller does, and if something else were to call it (or the aesni wrapping code ever changes how it gets the buffers), it'd expose data it shouldn't.

That's totally reasonable.

sef added inline comments.Apr 8 2019, 6:51 PM
sys/crypto/aesni/aesni_ccm.c
206–207

Huh. I hadn't thought about that, but looks like it can. Thanks! (I'm working on the other feedback as well.)

jhb added a comment.Apr 8 2019, 9:02 PM

Just a couple of thoughts from what I've seen flying by:

  1. I think it's correct to decrypt twice (but I think everyone agrees on that now)
  2. I have patches to make ccr(4) do CCM.
  3. It certainly seems like you should be able to do a CCM request that only has AAD and no payload. GCM supports this and the NIST-KAT tests for GCM include examples of this.
  4. As part of my ccr(4) work I'm working on adding the CCM NIST-KAT vectors to the security/nist-kat port and hooking those up to the python tester (I thought we had asked for that earlier as part of the CCM changes, but I don't see them in the tree and haven't seen them in a review, so I'm just going to work on it as part of testing ccr(4)).
sef marked 4 inline comments as done.Apr 10 2019, 2:05 AM

New diff coming right after this.

sys/crypto/aesni/aesni_ccm.c
115

I find memcpy backwards, bcopy sane. But it's also kernel code, and bcopy/bzero is more prevalent than memcpy/memset.

130–131

ha. I did not know about those. Much nicer.

187

I deal with bits (networking stuff) so often that my habit is to specify Xbits or Xbytes as appropriate.

189–191

Old debugging code, I've gotten rid of all of them.

194–197

I hate any function with more than 6 arguments. That's from compiler work decades ago ;).

259

I am, *always*, horrible at naming things. I changed it to "rolling_mac".

273

I put it there because of how the spec was written.

303

There was in at least one place, fixed.

343–345

Done, I believe.

363

Is the tag sensitive? I've got it in the new changes, but since it's part of the message, I'm not sure it's sensitive?

383–384

I changed both of them to check for message and aad len being zero for early exit.

419

Good catch, holdover from the older code.

sef updated this revision to Diff 56030.Apr 10 2019, 2:06 AM

Review feedback incorporated.

cem added a comment.Apr 10 2019, 3:14 PM

Looking better, thanks

sys/crypto/aesni/aesni_ccm.c
115

bcopy/bzero is more prevalent than memcpy/memset.

How did you arrive at that conclusion? Recursive grep suggests the opposite is true.

136

I think we lost 0xfffe?

187

Being explicit about bytes is harmless, but we're missing a space and a full SI-prefix.

259

Thanks

363

This is the correct tag, as opposed to the one provided by the attacker in the message. It's sensitive, no?

sef updated this revision to Diff 56062.Apr 10 2019, 6:30 PM

I lost the length descriptor prefix in the last change. I've put it back, and run cryptocheck with -A lengths of 0, 13, 16, 32, 192102, and 127091. (User-space can't test more than 256k unfortunately.)

sef marked an inline comment as done.Apr 10 2019, 6:33 PM
sef added inline comments.
sys/crypto/aesni/aesni_ccm.c
115

Older check. "_was_ more prevalent" it turns out.

136

Yeah, put back and explicitly tested the larger AAD lengths.

sef marked an inline comment as done.Wed, May 1, 10:57 PM

Poke?

jhb added a comment.Thu, May 2, 4:48 PM

Only some small suggestions / nits from me. I think it looks good overall. Does this pass the limited subset of CCM vectors the latest python script test runs?

sys/crypto/aesni/aesni.c
134

FWIW, I've sorted these alphabetically in the manpage for ccr and think it wouldn't hurt to do the same here.

139

Unrelated, it seems like SHA224 is missing (that might be a cem@ bug and a separate commit to fix).

853

So we do have AES_CBC_MAC_HASH_LEN which is what I used in ccr(4) as the equivalent of GMAC_DIGEST_LEN. I prefer the explicit constants as right now this is hardcoding the assumption that CCM and GCM have the same tags. We could instead add a _Static_assert() that 'sizeof(tag) >= GMAC_DIGEST_LEN' for the GCM case and something similar for CCM.

sys/crypto/aesni/aesni_ccm.c
69

Nit: s/put/Put/

100

FWIW, this will be the 3rd copy of this code in the tree (plain software generates the B0 block, ccr has a 'generate_ccm_b0' function). Someday we should export a shared helper function, but we don't have to do that as part of this change.

212

For actual CCM I think the limits are 7 and 13.

sef marked 2 inline comments as done.Fri, May 3, 5:53 PM

I'm updating and build testing before uploading new diffs.

sys/crypto/aesni/aesni.c
139

I didn't touch that -- but it's possible I'm out of date. I'll do an update before I upload these modified diffs.

sys/crypto/aesni/aesni_ccm.c
69

Keyboard issues.

212

That may be right, the spec just said "15-L octets"; it clearly can't be 0, since it says it must also be unique. But the other constraints on L aren't as clear to me.

sef updated this revision to Diff 57018.Fri, May 3, 8:14 PM
sef marked an inline comment as done.

The promised updated diffs.