HomeFreeBSD

Add ccp(4): experimental driver for AMD Crypto Co-Processor

Description

Add ccp(4): experimental driver for AMD Crypto Co-Processor

  • Registers TRNG source for random(4)
  • Finds available queues, LSBs; allocates static objects
  • Allocates a shared MSI-X for all queues. The hardware does not have separate interrupts per queue. Working interrupt mode driver.
  • Computes SHA hashes, HMAC. Passes cryptotest.py, cryptocheck tests.
  • Does AES-CBC, CTR mode, and XTS. cryptotest.py and cryptocheck pass.
  • Support for "authenc" (AES + HMAC). (SHA1 seems to result in "unaligned" cleartext inputs from cryptocheck -- which the engine cannot handle. SHA2 seems to work fine.)
  • GCM passes for block-multiple AAD, input lengths

Largely based on ccr(4), part of cxgbe(4).

Rough performance averages on AMD Ryzen 1950X (4kB buffer):
aesni: SHA1: ~8300 Mb/s SHA256: ~8000 Mb/s
ccp: ~630 Mb/s SHA256: ~660 Mb/s SHA512: ~700 Mb/s
cryptosoft: ~1800 Mb/s SHA256: ~1800 Mb/s SHA512: ~2700 Mb/s

As you can see, performance is poor in comparison to aesni(4) and even
cryptosoft (due to high setup cost). At a larger buffer size (128kB),
throughput is a little better (but still worse than aesni(4)):

aesni: SHA1:~10400 Mb/s SHA256: ~9950 Mb/s
ccp: ~2200 Mb/s SHA256: ~2600 Mb/s SHA512: ~3800 Mb/s
cryptosoft: ~1750 Mb/s SHA256: ~1800 Mb/s SHA512: ~2700 Mb/s

AES performance has a similar story:

aesni: 4kB: ~11250 Mb/s 128kB: ~11250 Mb/s
ccp: ~350 Mb/s 128kB: ~4600 Mb/s
cryptosoft: ~1750 Mb/s 128kB: ~1700 Mb/s

This driver is EXPERIMENTAL. You should verify cryptographic results on
typical and corner case inputs from your application against a known- good
implementation.

Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12723

Details