Page MenuHomeFreeBSD

Add ccp(4): experimental driver for AMD Crypto Co-Processor
ClosedPublic

Authored by cem on Oct 18 2017, 5:51 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Apr 16, 4:00 PM
Unknown Object (File)
Tue, Apr 16, 1:50 PM
Unknown Object (File)
Tue, Apr 16, 12:35 PM
Unknown Object (File)
Tue, Apr 16, 12:18 PM
Unknown Object (File)
Tue, Apr 16, 10:03 AM
Unknown Object (File)
Tue, Apr 16, 9:11 AM
Unknown Object (File)
Tue, Apr 16, 7:58 AM
Unknown Object (File)
Tue, Apr 16, 7:09 AM

Details

Summary
  • Registers TRNG source for random(4)
  • Finds available queues, LSBs; allocates static objects
  • Allocates a shared MSI-X for all queues. The hardware does not have separate interrupts per queue. Working interrupt mode driver.
  • Computes SHA hashes, HMAC. Passes cryptotest.py, cryptocheck tests.
  • Does AES-CBC, CTR mode, and XTS. cryptotest.py and cryptocheck pass.
  • Support for "authenc" (AES + HMAC). (SHA1 seems to result in "unaligned" cleartext inputs from cryptocheck -- which the engine cannot handle. SHA2 seems to work fine.)
  • GCM passes for block-multiple AAD, input lengths

Largely based on ccr(4), part of cxgbe(4).

Rough performance averages on AMD Ryzen 1950X (4kB buffer):
aesni: SHA1: ~8300 Mb/s SHA256: ~8000 Mb/s
ccp: ~630 Mb/s SHA256: ~660 Mb/s SHA512: ~700 Mb/s
cryptosoft: ~1800 Mb/s SHA256: ~1800 Mb/s SHA512: ~2700 Mb/s

As you can see, performance is poor in comparison to aesni(4) and even
cryptosoft (due to high setup cost). At a larger buffer size (128kB),
throughput is a little better (but still worse than aesni(4)):

aesni: SHA1:~10400 Mb/s SHA256: ~9950 Mb/s
ccp: ~2200 Mb/s SHA256: ~2600 Mb/s SHA512: ~3800 Mb/s
cryptosoft: ~1750 Mb/s SHA256: ~1800 Mb/s SHA512: ~2700 Mb/s

AES performance has a similar story:

aesni: 4kB: ~11250 Mb/s 128kB: ~11250 Mb/s
ccp: ~350 Mb/s 128kB: ~4600 Mb/s
cryptosoft: ~1750 Mb/s 128kB: ~1700 Mb/s

This driver is EXPERIMENTAL. You should verify cryptographic results
on typical and corner case inputs from your application against a known-
good implementation.

Test Plan

Passing cryptocheck and cryptotest.py in supported modes. Unaligned inputs are rejected.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 12428
Build 12703: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/crypto/ccp/ccp.c
196

s/SGLs/SGEs/

239

Since ccp_initialize_queue() is only used once, it be a bit neater to have a ccp_initialize_queues() instead and move this loop there.

814

"error" is write-only.

833

This function and the one below appear to be unused.

sys/crypto/ccp/ccp.h
25

Missing $FreeBSD$ tags in the headers.

sys/crypto/ccp/ccp_hardware.c
93

This line is misindented.

sys/crypto/ccp/ccp_hardware.h
27

Missing include guard.

sys/crypto/ccp/ccp_lsb.c
44

Some of these headers aren't needed: dev/pci/*, mutex.h, lock.h, rman.h, module.h, ...

sys/modules/ccp/Makefile
14

Need to include opt_ddb.h to SRCS to be able to build it as a standalone KLD.

cem marked 8 inline comments as done.

Fix review issues from markj

sys/crypto/ccp/ccp.c
239

Ok

833

Yeah, I was thinking it would be nice to have a kernel API like ioat that doesn't go through opencrypto, but don't have that very concrete yet.

  • Discard now useless test code
  • Use acquire/release routines from OCF process()
  • Clear out any descriptors written for a transaction that experiences an error before submission

Fix backwards "active" queue entries calculation (only used in debug asserts for now).

Use MSI-X interrupts. It turns out we were neglecting to set up the 2nd
vector, which is where the relevant interrupts were delivered.

Improvements I'd like to make before commit:

  • Ensure there is sufficient room in the submission rings, or return EAGAIN/block depending on wait flags. Currently there are only KASSERTs.
  • Make debug spam printfs sysctlable and possibly CTR/KTR records.
  • Track whole transactions and clear out the whole thing if any segment errors. This is probably identical to advancing the head past the next segment with associated callback.

Reviewers, is there anything else you'd especially like to see before accepting this for commit?

cem retitled this revision from Add ccp(4): driver for AMD Crypto Co-Processor to Add ccp(4): experimental driver for AMD Crypto Co-Processor.Nov 17 2017, 7:34 PM
cem edited the summary of this revision. (Show Details)
cem edited the test plan for this revision. (Show Details)
sys/crypto/ccp/ccp.c
235

There's a lot of printf("XXX...") in the driver, which ought to be cleaned up at some point.

237

Not something to be addressed here, but crypto_get_driverid() should probably just use unr(9). :/

460

Seems like an odd initialization default, especially since we don't set cipher_mode in the AES-XTS case. We don't register support for AES-XTS, so this isn't a major problem at the moment, but it looks weird.

sys/crypto/ccp/ccp_hardware.c
460

You could just set ec before breaking from the loop?

661

This looks like it should be assigning to error instead. There's also an extra newline.

680

It looks like we don't do proper cleanup in the caller if this loop aborts. e.g., we won't call pci_release_msi().

682

"rid" may be modified by bus_alloc_resource_any(). In practice I guess it's not an issue for SYS_RES_IRQ, but the man page explicitly warns about this.

746

Is it worth checking the value of g_ccp_ring_order first to make sure it's sane?

796

It'd be nice to come up with a better name for "i" given its use in this cleanup block.

1030

mflags is unused.

1044

Should this be addressed? :)

1049

It would be a good idea to verify this too. Perhaps also cache the paddrs, but the lookup is pretty cheap, so it may not be worth the effort.

1068

Note that min() will truncate to 32 bits.

1235

Write-only var, but I'm guessing that addressing the XXX might change that.

1403

keydata and keydata_len are left uninitialized if we go through the default case.

1409

It'd be nice to clean this up.

1445

This doesn't seem to be used either?

1484

Does C actually guarantee that this cast will give you the low 32 bits? It would be a bit cleaner to explicitly mask the bits you want.

sys/x86/include/bus.h
121

This should be committed separately.

sys/crypto/ccp/ccp.c
235

Definitely. At one point I was keeping this stuff separate in a git stash but I guess it got merged in sometime. Some of it even leaks stuff like key information.

460

I suppose. Mostly it just needs to be initialized to something other than CCP_AES_MODE_GCTR.

Ideally AES-XTS will be added at some point, and that will make it more clear. We could #if 0 the the XTS case of the outer switch, making it return EINVAL? (Though as you point out, it should never be invoked as we don't register support.)

sys/crypto/ccp/ccp_hardware.c
680

Yes, that's true. Handling cleanup fully correctly if some step in initialization fails is hard.

682

Hm. We could use a temporary copy for the call to bus_alloc_resource_any.

746

Might as well.

1030

At this time, yes, but the intent is to eventually pass M_WAITOK / M_NOWAIT.

1044

Probably :(

1049

Sure. Note that the H_vectors here are local to this driver (since they're in reverse order). Can you think of an annotation short of __align(PAGE_SIZE) to make sure they don't cross a page boundary at compile time? I suppose some clever CTASSERTs might work.

1068

The other consideration is that crd_len is int, so we're dealing with at most 31 bit lengths.

1235

It's currently vestigial from ccr(4) based on the following commented out line, but yeah, the XXX probably needs addressing.

1403

The default case here is impossible to reach. This routine is invoked from ccp_process with a session mode of BLKCIPHER. That requires ccp_newsession with one of CBC, ICM, or XTS. And newsession shouldn't be called with XTS at this time as it isn't registered.

1409

As in, actually implement XTS? :-) Yes.

1445

Vestigial from the ccr driver.

1484

I believe so. Not sure how else it could be interpreted. Note that vm_paddr_t is an integer rather than pointer type.

https://stackoverflow.com/questions/6752567/casting-a-large-number-type-to-a-smaller-type claims that the standard says:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation).]

sys/crypto/ccp/ccp_hardware.c
1049

I can't think of another annotation that might help, but I think this CTASSERT might do it:

CTASSERT(PAGE_SIZE - ((uintptr_t)&vec[0] % PAGE_SIZE) >= sizeof(vec))

I think all of the vectors would together fit in less of a page, so you could maybe put them in a single array and align that.

1403

The compiler doesn't know that. :)

1484

I see, thanks.

sys/crypto/ccp/ccp.c
460

The other reason to use ECB is that it is so clearly a bogus value. The hardware will do it, but OCF and this driver don't support it (for good reason).

sys/crypto/ccp/ccp_hardware.c
1049

Thanks, I'll do something like that.

1403

It doesn't produce an error. I don't recall it producing a warning, either, thought it may be. Do you think the compiler is assuming UB and generating bad code? The CBC and ICM tests pass, so I think that is unlikely.

sys/crypto/ccp/ccp_hardware.c
1403

At least gcc 6 emits an error for this.

sys/crypto/ccp/ccp_hardware.c
1403

Ah, sure. I have been building with Clang. Is keydata / keydata_len all I need to initialize to silence the GCC warning?

sys/crypto/ccp/ccp_hardware.c
1403

Yep, looks like it. gcc is also complaining about sgl_nsegs being write-only. With those two things fixed, ccp builds.

cem marked 29 inline comments as done.
cem edited the summary of this revision. (Show Details)
  • Return EAGAIN if we're out of ring space (squash to ENOMEM at top level for OCF consumers). (I decided to punt on a sleepable API.) Ring space is evaluated lazily and partial submissions are rolled back if we discover ourselves to be out of space.
  • Fix up cleanup on descriptor error; clear out and error the entire operation/transaction (through next callback).
  • Be sure to zero sensitive key data in session descriptor.
  • Fix up most of the review feedback from Mark

cryptotest.py: Add similar skipping for unsupported AES-GCM inputs

cem marked an inline comment as done.Jan 13 2018, 5:55 PM

Committed MAXADDR 48bit separately.

cem edited the summary of this revision. (Show Details)

Hide debugging (and especially sensitive content printing) behind special
macros. DPRINTF is debug spam and can be enabled at runtime. INSECURE_DEBUG()
prints sensitive material and is compiled out by default.

Accidentally posted two patches; remove accidentally added patch.

cem marked 4 inline comments as done.Jan 15 2018, 11:29 PM

I think the driver is more or less ready to go in now.

Put trivial random changes in independent review for secteam: https://reviews.freebsd.org/D13925

CCing (maybe :-)) interested parties for the random source portion of this driver.

Rebase past smaller changes pulled out and committed already.

If there are no objections in the next day or two, I'll go ahead and commit
this to the tree.

This revision was not accepted when it landed; it landed in state Needs Review.Jan 18 2018, 10:01 PM
This revision was automatically updated to reflect the committed changes.