ktls: Support asynchronous dispatch of AEAD ciphers.
ClosedPublic

Authored by jhb on Aug 24 2021, 5:53 PM.
Details

Summary

KTLS OCF support was originally targeted at software backends that
used host CPU cycles to encrypt TLS records. As a result, each KTLS
worker thread queued a single TLS record at a time and waited for it
to be encrypted before processing another TLS record. This works well
for software backends but limits throughput on OCF drivers for
coprocessors that support asynchronous operation such as qat(4) or
ccr(4). This change uses an alternate function (ktls_encrypt_async)
when encrypting TLS records via a coprocessor. This function queues TLS
records for encryption and returns. It defers the work done after a
TLS record has been encrypted (such as marking the mbufs ready) to a
callback invoked asynchronously by the coprocessor driver when a
record has been encrypted.

  • Add a struct ktls_ocf_state that holds the per-request state that was previously stored on the stack. Asynchronous requests malloc this structure, while synchronous requests continue to allocate it on the stack.
  • Add a ktls_encrypt_async() variant of ktls_encrypt() which does not perform request completion after dispatching a request to OCF. Instead, the ktls_ocf backends invoke ktls_encrypt_cb() when a TLS record request completes for an asynchronous request.
  • Flag software TLS sessions as async if the backend driver selected by OCF is an async driver.
  • Pull code to create and dispatch an OCF request out of ktls_encrypt() into a new ktls_encrypt_one() function used by both ktls_encrypt() and ktls_encrypt_async().
  • Pull code to "finish" the VM page shuffling for a file-backed TLS record into a helper function ktls_finish_noanon() used by both ktls_encrypt() and ktls_encrypt_cb().
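
The split between the synchronous and asynchronous paths described above can be sketched in userland C. This is only an illustration of the pattern, not the kernel code: struct req_state, struct fake_driver, and every function name below are hypothetical stand-ins, and the "driver" is a simple array that completes requests on demand.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for the per-request state (struct ktls_ocf_state). */
struct req_state {
	int record_id;
};

/* Hypothetical coprocessor queue: holds requests until they "complete". */
struct fake_driver {
	struct req_state *pending[MAX_PENDING];
	int npending;
};

int completed_count;	/* records whose post-encryption work has run */
int last_completed;	/* id of the most recently finished record */

/* Post-encryption work shared by both paths (e.g. marking mbufs ready). */
static void
finish_record(int record_id)
{
	completed_count++;
	last_completed = record_id;
}

/* Synchronous path: state lives on the stack, completion runs inline. */
int
encrypt_sync(int record_id)
{
	struct req_state st = { .record_id = record_id };

	/* ... a software backend would encrypt the record here ... */
	finish_record(st.record_id);
	return (0);
}

/* Completion callback: runs later, then frees the heap-allocated state. */
static void
encrypt_cb(struct req_state *st, int error)
{
	assert(st != NULL);
	if (error == 0)
		finish_record(st->record_id);
	free(st);
}

/* Asynchronous path: allocate state, queue the request, and return. */
int
encrypt_async(struct fake_driver *drv, int record_id)
{
	struct req_state *st;

	if (drv->npending == MAX_PENDING)
		return (-1);
	st = malloc(sizeof(*st));
	if (st == NULL)
		return (-1);
	st->record_id = record_id;
	drv->pending[drv->npending++] = st;
	return (0);
}

/* Simulates the driver finishing every queued request. */
void
driver_complete_all(struct fake_driver *drv)
{
	for (int i = 0; i < drv->npending; i++)
		encrypt_cb(drv->pending[i], 0);
	drv->npending = 0;
}
```

In this sketch the asynchronous state has to be heap-allocated because encrypt_async() returns before the callback runs, so a stack-allocated struct would no longer exist when the callback uses it.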

Sponsored by: Netflix
Tested on: ccr(4) (jhb), qat(4) (markj)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jhb requested review of this revision. Aug 24 2021, 5:53 PM

I have tried additional changes to support direct dispatch of async requests as well as async support for AES-CBC but those did not improve performance. For ccr(4), I found that the CPU usage remained the same but the throughput increased (e.g. from 45 Gbps to 50 Gbps for AES-GCM128 for TLS 1.2). When Mark tested this over local loopback, he saw a 3x increase in throughput (though I believe that was still less throughput than using aesni(4)).

sys/kern/uipc_ktls.c
2205

I think ktls_encrypt_mbuf or ktls_encrypt_one_mbuf would be a more descriptive name.

2277

Why not so->so_error = error?

2333

The error handling here is duplicated below, so there's a double unlock and double unref.

2344

Again, why not preserve the returned errno?

sys/opencrypto/ktls.h
48

I wonder if it would be profitable to use UMA instead of malloc() for these. This structure is 528 bytes on amd64, so with malloc(9) we'll round up to 1024 bytes(!) per allocation. I guess in general there would be relatively few of these structures allocated at any given time, so maybe it's not worth it.
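
The rounding mentioned here can be checked with a small calculation. This is a simplified model that assumes allocations of this size are served from power-of-two buckets, as the comment implies; the actual bucket sizes used by malloc(9) may differ, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round n up to the next power of two (simplified bucket model).
 * Returns 0 if the result would not fit in a size_t.
 */
size_t
round_up_pow2(size_t n)
{
	size_t p = 1;

	if (n > (SIZE_MAX >> 1) + 1)
		return (0);
	while (p < n)
		p <<= 1;
	return (p);
}

/* Bytes lost to rounding when an n-byte object occupies one bucket. */
size_t
wasted_bytes(size_t n)
{
	size_t bucket = round_up_pow2(n);

	assert(bucket >= n);	/* fails only if round_up_pow2() overflowed */
	return (bucket - n);
}
```

Under this model a 528-byte structure occupies a 1024-byte bucket, leaving 496 bytes unused per allocation.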

sys/opencrypto/ktls_ocf.c
767

It would be useful to have some comment explaining why we don't perform async dispatch for CBC.

jhb marked 5 inline comments as done.
  • Address some of Mark's feedback.
sys/kern/uipc_ktls.c
2205

I went with ktls_encrypt_record

2277

This is copied from the old code, which used EIO for encryption errors. The error from a crypto driver could be various things like EFBIG, ENOMEM, etc., but from the userland application's perspective I do think EIO might be a more accurate error than the crypto error.

2333

So this sorele() is paired with the per-record soref() a few lines earlier (2320), which on success is dropped in ktls_encrypt_cb(). The sorele() at the end of the function releases the socket reference owned by top->m_epg_so, which corresponds to the soref() in ktls_enqueue().
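
The pairing of references can be modelled in a few lines of userland C. This is a sketch under stated assumptions, not the code under review: struct sock, sock_ref(), sock_rele(), and the dispatch_ok array are hypothetical stand-ins for the socket, soref(), sorele(), and the result of dispatching each record.

```c
#include <assert.h>

/* Hypothetical stand-in for a socket with a reference count. */
struct sock {
	int refs;
};

void
sock_ref(struct sock *so)
{
	so->refs++;
}

void
sock_rele(struct sock *so)
{
	assert(so->refs > 0);
	so->refs--;
}

/*
 * Dispatch up to nrecords records. The caller holds one reference taken at
 * enqueue time. Each record takes its own reference before dispatch; a failed
 * dispatch drops that reference right away and stops the loop. The enqueue
 * reference is dropped once at the end. Returns the number dispatched.
 */
int
encrypt_async_records(struct sock *so, const int *dispatch_ok, int nrecords,
    int *inflight)
{
	int i;

	for (i = 0; i < nrecords; i++) {
		sock_ref(so);			/* per-record reference */
		if (!dispatch_ok[i]) {
			sock_rele(so);		/* error path: drop it now */
			break;
		}
		(*inflight)++;			/* callback will drop it later */
	}
	sock_rele(so);				/* reference from enqueue */
	return (i);
}

/* Completion callback: drops the per-record reference on success. */
void
record_done(struct sock *so, int *inflight)
{
	(*inflight)--;
	sock_rele(so);
}
```

In this model the two releases in the dispatch function never touch the same reference: one belongs to the record that failed to dispatch, the other to the enqueue step.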

sys/opencrypto/ktls.h
48

Humm, I had been going on the recent advice of mjg@ to avoid new UMA zones, but the size is a bit odd, yes. I do think you are not likely to have more than N * CPUs (where N is probably < 100, possibly even < 10).

sys/opencrypto/ktls.h
48

I have a patch to use a UMA zone. It didn't make any performance difference as far as I could tell. If you think it's cleaner it's easy for me to merge it into this however.

markj added inline comments.
sys/kern/uipc_ktls.c
2333

Sorry, I see now.

sys/opencrypto/ktls.h
48

It's probably fine to keep it as is.

This revision is now accepted and ready to land. Aug 27 2021, 12:34 PM