armv8crypto: Use cursors to access crypto buffer data
ClosedPublic

Authored by markj on Feb 26 2021, 6:31 PM.

Details

Summary

Currently armv8crypto copies the scheme used in aesni(4), where payload
data and output buffers are allocated on the fly if the crypto buffer is
not virtually contiguous. This scheme is simple but incurs a lot of
overhead: for an encryption request with a separate output buffer we
have to

  • allocate a temporary buffer to hold the payload
  • copy input data into the buffer
  • copy the encrypted payload to the output buffer
  • zero the temporary buffer before freeing it
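
The four steps above can be sketched in userspace C as follows. This is only an illustration of the copy overhead, not the driver's code: `struct seg`, `toy_encrypt()` and `encrypt_bounce()` are hypothetical names, the XOR "cipher" is a placeholder for the real AES routines, and the output is assumed to be one contiguous buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one segment of a non-contiguous crypto buffer. */
struct seg {
	uint8_t	*base;
	size_t	 len;
};

/* Placeholder cipher: XOR with a constant. Not real cryptography. */
static void
toy_encrypt(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0x5a;
}

/*
 * Bounce-buffer scheme: every payload byte is copied twice and the
 * temporary buffer is written a third time when it is zeroed.
 * Returns 0 on success, -1 if the allocation fails or out is too small.
 */
static int
encrypt_bounce(const struct seg *in, size_t nin, uint8_t *out, size_t outlen)
{
	size_t total = 0, off = 0;
	uint8_t *tmp;

	for (size_t i = 0; i < nin; i++)
		total += in[i].len;
	if (total > outlen)
		return (-1);
	if (total == 0)
		return (0);

	/* 1. Allocate a temporary buffer to hold the payload. */
	tmp = malloc(total);
	if (tmp == NULL)
		return (-1);

	/* 2. Copy input data into the buffer. */
	for (size_t i = 0; i < nin; i++) {
		memcpy(tmp + off, in[i].base, in[i].len);
		off += in[i].len;
	}
	assert(off == total);

	toy_encrypt(tmp, total);

	/* 3. Copy the encrypted payload to the output buffer. */
	memcpy(out, tmp, total);

	/*
	 * 4. Zero the temporary buffer before freeing it. A compiler may
	 * elide a plain memset() here; kernel code would use something
	 * like explicit_bzero() instead.
	 */
	memset(tmp, 0, total);
	free(tmp);
	return (0);
}
```
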

We have a handy crypto buffer cursor abstraction now, so reimplement the
armv8crypto routines using that instead of temporary buffers. This
introduces some extra complexity, but not a lot. The driver still
allocates an AAD buffer for AES-GCM if necessary.
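
As a rough illustration of the cursor idea, here is a minimal userspace sketch. It is loosely modeled on the crypto_cursor_init()/crypto_cursor_advance() interfaces, but everything in it (`struct seg`, `struct cursor`, `cursor_segment()`, `encrypt_cursors()`) is a hypothetical simplification, not the kernel API or the code in this diff. The per-byte XOR placeholder also sidesteps a complication the real driver has to handle: an AES block that straddles two segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment of a non-contiguous crypto buffer. */
struct seg {
	uint8_t	*base;
	size_t	 len;
};

/* Hypothetical cursor: a position within a list of segments. */
struct cursor {
	const struct seg *segs;
	size_t	nsegs;
	size_t	idx;	/* current segment */
	size_t	off;	/* offset into the current segment */
};

static void
cursor_init(struct cursor *cc, const struct seg *segs, size_t nsegs)
{
	cc->segs = segs;
	cc->nsegs = nsegs;
	cc->idx = 0;
	cc->off = 0;
}

/* Return the contiguous run at the cursor, or NULL at the end. */
static uint8_t *
cursor_segment(struct cursor *cc, size_t *len)
{
	/* Skip segments that are empty or fully consumed. */
	while (cc->idx < cc->nsegs && cc->off == cc->segs[cc->idx].len) {
		cc->idx++;
		cc->off = 0;
	}
	if (cc->idx == cc->nsegs) {
		*len = 0;
		return (NULL);
	}
	*len = cc->segs[cc->idx].len - cc->off;
	return (cc->segs[cc->idx].base + cc->off);
}

static void
cursor_advance(struct cursor *cc, size_t n)
{
	while (n > 0 && cc->idx < cc->nsegs) {
		size_t left = cc->segs[cc->idx].len - cc->off;

		if (n < left) {
			cc->off += n;
			return;
		}
		n -= left;
		cc->idx++;
		cc->off = 0;
	}
	assert(n == 0);	/* the caller must not advance past the end */
}

/*
 * Process up to resid bytes directly from the input segments into the
 * output segments, with no temporary buffer. Returns the byte count done.
 */
static size_t
encrypt_cursors(struct cursor *in, struct cursor *out, size_t resid)
{
	size_t done = 0;

	while (resid > 0) {
		size_t inlen, outlen, n;
		const uint8_t *src = cursor_segment(in, &inlen);
		uint8_t *dst = cursor_segment(out, &outlen);

		if (src == NULL || dst == NULL)
			break;
		/* Largest run that is contiguous on both sides. */
		n = inlen < outlen ? inlen : outlen;
		if (n > resid)
			n = resid;
		for (size_t i = 0; i < n; i++)
			dst[i] = src[i] ^ 0x5a;	/* placeholder cipher */
		cursor_advance(in, n);
		cursor_advance(out, n);
		resid -= n;
		done += n;
	}
	return (done);
}
```

Each payload byte is read once and written once, and nothing needs to be zeroed afterwards because no copy of the payload is left behind.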

Some profiling of a sendfile+KTLS workload on an Altra indicates that we
spend almost as much CPU time copying and zeroing as we do encrypting.
I am doing some profiling of ipsec on an espressobin now to see if we
get any improvements or degradations with smaller payloads.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

markj requested review of this revision. Feb 26 2021, 6:31 PM
markj added inline comments.
sys/crypto/armv8/armv8_crypto.c
483–484

I am not sure why we bother zeroing an AAD buffer.

FWIW, I have a patch to use cursors for aesni, but it actually made things a bit slower for KTLS when I tried it. isal(4) does use cursors though.

https://github.com/freebsd/freebsd-src/compare/master...bsdjhb:aesni_cursor

@gallatin reports a 10% increase in throughput with this change, without any increase in CPU usage. Not sure how it compares to ossl yet.

Note that ossl(4) doesn't have AES-GCM bindings yet. Not sure how much AES-CBC gallatin@ is testing with? That said, I think in general using cursors to avoid copies when possible, and using FPU_KERN_NOCTX to avoid more expensive save/restores are directions we want to be moving in. ossl(4) and isal(4) use both of those.

The implementation looks fine to me.

sys/crypto/armv8/armv8_crypto.c
483–484

That is possibly not warranted. Note that in some cases AAD might not be on the wire (the ESN in IPsec comes to mind, or the TLS sequence number for TLS). It might be simpler to zero it than to worry about it.

This revision is now accepted and ready to land. Feb 14 2022, 6:11 PM