- Registers TRNG source for random(4)
- Finds available queues, LSBs; allocates static objects
- Allocates one MSI-X interrupt shared by all queues; the hardware does not have separate interrupts per queue. The driver works in interrupt mode.
- Computes SHA hashes and HMAC. Passes cryptotest.py and cryptocheck tests.
- Supports AES in CBC, CTR, and XTS modes. Passes cryptotest.py and cryptocheck tests.
- Supports "authenc" (AES + HMAC). With SHA1, cryptocheck seems to produce "unaligned" cleartext inputs, which the engine cannot handle; SHA2 seems to work fine.
- GCM passes when the AAD and input lengths are multiples of the block size
Largely based on ccr(4), part of cxgbe(4).
Rough performance averages on AMD Ryzen 1950X (4kB buffer):
aesni:      SHA1:  ~8300 Mb/s    SHA256:  ~8000 Mb/s
ccp:        SHA1:   ~630 Mb/s    SHA256:   ~660 Mb/s    SHA512:   ~700 Mb/s
cryptosoft: SHA1:  ~1800 Mb/s    SHA256:  ~1800 Mb/s    SHA512:  ~2700 Mb/s
At 4kB, ccp(4) is slower than both aesni(4) and cryptosoft, due to its
high setup cost.  At a larger buffer size (128kB), ccp(4) throughput
improves and overtakes cryptosoft, but remains well below aesni(4):
aesni:      SHA1: ~10400 Mb/s    SHA256:  ~9950 Mb/s
ccp:        SHA1:  ~2200 Mb/s    SHA256:  ~2600 Mb/s    SHA512:  ~3800 Mb/s
cryptosoft: SHA1:  ~1750 Mb/s    SHA256:  ~1800 Mb/s    SHA512:  ~2700 Mb/s
AES performance tells a similar story:
aesni:      4kB: ~11250 Mb/s    128kB: ~11250 Mb/s
ccp:        4kB:   ~350 Mb/s    128kB:  ~4600 Mb/s
cryptosoft: 4kB:  ~1750 Mb/s    128kB:  ~1700 Mb/s
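To illustrate the "high setup cost" explanation, the following is a rough
sketch, not a measurement.  It assumes each request costs a fixed setup
time plus a term linear in the buffer size, time(n) = t0 + n / peak, and
fits t0 and peak to the two ccp data points quoted above.  It also assumes
"Mb/s" means 10^6 bits per second and "kB" means 1024 bytes.  The function
names are made up for this example.

```python
def fit_setup_cost(small_bytes, small_mbps, large_bytes, large_mbps):
    """Fit time(n) = t0 + n / peak to two (buffer size, throughput) points.

    Returns (t0 in seconds, peak in bits per second).
    """
    small_bits, large_bits = small_bytes * 8, large_bytes * 8
    t_small = small_bits / (small_mbps * 1e6)  # seconds per small request
    t_large = large_bits / (large_mbps * 1e6)  # seconds per large request
    # The extra bits of the large request take the extra time, at rate `peak`.
    peak = (large_bits - small_bits) / (t_large - t_small)
    # Whatever the small request's time is not explained by `peak` is setup.
    t0 = t_small - small_bits / peak
    return t0, peak


def predicted_mbps(n_bytes, t0, peak):
    """Throughput in Mb/s that the fitted model predicts for an n-byte buffer."""
    bits = n_bytes * 8
    return bits / (t0 + bits / peak) / 1e6


# ccp figures quoted above: (4kB Mb/s, 128kB Mb/s).
CCP_MEASUREMENTS = {"SHA256": (660, 2600), "AES": (350, 4600)}

for name, (mbps_4k, mbps_128k) in CCP_MEASUREMENTS.items():
    t0, peak = fit_setup_cost(4096, mbps_4k, 128 * 1024, mbps_128k)
    print(f"{name}: setup ~{t0 * 1e6:.0f} us, asymptotic ~{peak / 1e6:.0f} Mb/s")
```

Under this model, a fixed t0 dominates small requests, which would explain
why ccp throughput rises with buffer size while cryptosoft stays flat.
With only two data points per algorithm, the fit is an estimate at best.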
This driver is EXPERIMENTAL.  You should verify cryptographic results
on typical and corner-case inputs from your application against a
known-good implementation.
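
One possible shape for such a check is sketched below, assuming Python's
hashlib serves as the known-good SHA-256 implementation.  The
digest_under_test callable is hypothetical: it stands for whatever path
your application uses to obtain a digest from the driver, and this sketch
does not show how to reach the device.  The list of lengths is only an
example of sizes around block and buffer boundaries.

```python
import hashlib
import os

# Example lengths around block and buffer boundaries (an assumption, not a
# list taken from the driver or its test suites).
CORNER_LENGTHS = (0, 1, 15, 16, 17, 63, 64, 65, 4095, 4096, 4097, 128 * 1024)


def reference_sha256(data: bytes) -> bytes:
    """Known-good digest computed in software by hashlib."""
    return hashlib.sha256(data).digest()


def find_mismatches(digest_under_test, lengths=CORNER_LENGTHS):
    """Return the input lengths for which the two implementations disagree."""
    mismatches = []
    for n in lengths:
        data = os.urandom(n)  # random input of exactly n bytes
        if digest_under_test(data) != reference_sha256(data):
            mismatches.append(n)
    return mismatches


# Placeholder: substitute a function that routes the request through the
# driver.  Here the reference is checked against itself so the sketch runs.
print(find_mismatches(reference_sha256))
```

An empty list means every tested length matched.  The same pattern applies
to HMAC and the AES modes, given a suitable reference for each.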