Page MenuHomeFreeBSD

Initial support for kernel TLS receive offload.
Needs ReviewPublic

Authored by jhb on Dec 24 2019, 8:24 PM.

Details

Reviewers
gallatin
np
hselasky
kib
Group Reviewers
transport
Summary

This only supports TOE mode, but can be extended to support other
offload backends in the future.

  • Add new TCP_RXTLS_ENABLE and TCP_RXTLS_MODE socket options. These function similarly to their TX counterparts.
  • TLS records are stored in the socket receive buffer as records with a new TLS_GET_RECORD control message containing fields from the TLS header in a struct tls_get_record. The decrypted TLS record payload is appended after the control message mbuf in the socket buffer. soreceive_stream() falls back to soreceive_generic() for sockets with an active KTLS RX session. OpenSSL uses recvmsg() to fetch decrypted TLS records. If a TLS record is received that fails to authenticate, the socket error is set to EBADMSG to fail the next call to recvmsg().
  • A new SO_WANT_KTLS socket option can be set by applications to hint to the kernel that KTLS is requested. If the socket ends up using TLS, the application must attempt to enable KTLS via TCP_RXTLS_ENABLE, however, setting the socket option is not required. Some KTLS backends may not be able to offload RX if the option is not set.

    The Chelsio driver uses this hint to enable TLS mode on TOE sockets. The TOE layer has to know when a connection is established if it will use TLS for RX or not.
  • The TOE interface for allocating TLS sessions now accepts an additional 'bool transmit' parameter to differentiate transmit vs receive sessions.
Test Plan
  • tested with TOE TLS using KTLS for both RX and TX with nginx and openssl s_time

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 30114
Build 27921: arc lint + arc unit

Event Timeline

jhb created this revision.Dec 24 2019, 8:24 PM
jhb added a comment.Dec 24 2019, 8:32 PM

I've included the Chelsio changes for context in the review, but would split that out into a separate commit.

NIC TLS RX will require additional changes to deal with marking not-yet-decrypted data in the socket buffer as NOTREADY and teaching soreceive() to not return NOTREADY mbufs. I have a sketch in my head for how to extend this further to permit "software" offload, though the intended use case would not be pure software (that works fine in userland) but things like ccr(4) as it is almost certainly more efficient to let hardware crypto engines do the decrypt in the kernel before handing it off to userland than to have userland try to use /dev/crypto to copy the data back into the kernel and out again with extra syscalls, etc.

SO_WANT_KTLS is the part of the design I'm less sure of. It is convenient for TOE TLS, but a software-based KTLS RX would not need it. I'm not sure if it would be beneficial for NIC TLS since the T6 doesn't support NIC TLS for RX. Hans, would it be helpful for you to know "early" that a connection might want to use KTLS RX, or would you allocate a queue, etc. when the RX key is programmed via TCP_TLSRX_ENABLE?

The OpenSSL changes for this are available at https://github.com/openssl/openssl/compare/master...bsdjhb:ktls_rx Note that this also includes two changes to support BIO_conn at the start of the branch so that openssl s_time supports KTLS. I've kept SO_WANT_KTLS separate in that branch so its easy to revert if we don't keep the option.

jhb added inline comments.Dec 24 2019, 8:33 PM
sys/kern/uipc_socket.c
3130

I'll add these two missing options to SOPT_GET as a separate commit as well.

np accepted this revision.Dec 26 2019, 6:09 PM
hselasky accepted this revision.Dec 27 2019, 2:25 PM

Check my questions first.

sys/kern/uipc_ktls.c
944

Is this ifdef needed?

sys/netinet/tcp_usrreq.c
2083

How is TCP_RXTLS_ENABLE specified?

If there is already encrypted data in the socket buffer what will happen with that?

Should part of enabling RXTLS be to empty the RX socket buffer?

jhb added inline comments.Dec 30 2019, 6:15 PM
sys/kern/uipc_ktls.c
944

Yep. See how it works below for the TX side. Eventually there will be more code in the #else case (e.g. if I add support for RX offload via software which initially I wasn't going to do, but am now leaning towards doing as it will be the best way to make use of crypto co-processors like ccr(4)).

sys/netinet/tcp_usrreq.c
2083

For T6, the NIC "recognizes" the Change Cipher Suite TLS handshake message and queues up data until it gets an RX key (or until the driver says "not going to offload TLS RX"). For other TOE implementations, the driver-specific callback to create a TOE TLS RX session would have to handle data in the socket buffer. This only works when the TOE socket in the NIC is using a special "TLS" mode that looks for CCS messages. T6 can't do TLS RX offload of arbitrary TOE sockets, only TOE sockets placed in "TLS" mode at the time the TOE socket is created either by connect() or the end of the 3-way handshake.

For software RX for TLS < 1.3, then the software RX path will indeed have to look at pending data in the socket buffer and check for CCS directly. Once it finds that, it will have to mark any later mbufs as M_NOTREADY and start the process of decrypting frames. NIC TLS RX will also need to do something similar, but that will depend on how the NIC TLS implementation works. If NIC TLS requires knowing that a socket is using TLS at the start of its lifetime, then the NIC might be able to check for CCS similar to the T6 TOE implementation, otherwise it might need to do something similar to what I've sketched out above for software. This is part of why I asked about SO_WANT_KTLS and if that is relevant to how TLS RX works for Mellanox. If NIC TLS RX for Mellanox works by checking for CCS and blocking the queue, then SO_WANT_KTLS will be useful to know which sockets need a dedicated queue early on.

jhb added a comment.Jan 7 2020, 4:48 PM

It would still be useful to know if SO_WANT_KTLS would be useful for Mellanox NIC TLS, but I think I want to at least do an initial software TLS before defining the interface. One thing I noticed while reading the OpenSSL code again is that for the RX case, the Linux KTLS support checks for TLS records already read in the userland openssl buffer and only enables KTLS if only full records are stored, and passes along the number of stored records via an initial sequence number. Software KTLS will need the same thing, so I'm going to need to add a uint64_t sequence number field to the tls_enable structure for that. I can add a temporary combat shim for head for the TXENABLE sockopt to deal with old vs new fields, but I'd like the RX case to work correctly from the get-go. For TOE KTLS, that sequence number should always be zero.

In D22917#505555, @jhb wrote:

It would still be useful to know if SO_WANT_KTLS would be useful for Mellanox NIC TLS, but I think I want to at least do an initial software TLS before defining the interface. One thing I noticed while reading the OpenSSL code again is that for the RX case, the Linux KTLS support checks for TLS records already read in the userland openssl buffer and only enables KTLS if only full records are stored, and passes along the number of stored records via an initial sequence number. Software KTLS will need the same thing, so I'm going to need to add a uint64_t sequence number field to the tls_enable structure for that. I can add a temporary combat shim for head for the TXENABLE sockopt to deal with old vs new fields, but I'd like the RX case to work correctly from the get-go. For TOE KTLS, that sequence number should always be zero.

Eventually the Mellanox NIC TLS will support TLS RX offloading, but I'm a little bit uncertain about the best implementation approach, especially when TLS records are receive out of sequence.

Can't you add some reserved fields to the TLS enable structure perhaps?

jhb updated this revision to Diff 69885.Thu, Mar 26, 3:33 AM
  • Rebase.
  • Checkpoint WIP before I rip it up again some more.
  • Finish first pass at KTLS RX software framework.
  • Use a better name for the iovec param.
  • Don't pass the TLS header in the iovec.
  • Add support for KTLS RX for TLS 1.2 GCM.
  • Shorten to initial_seqno.
  • Use an array of bytes in big-endian for the sequence number.
  • Actually use copyin_tls_enable.
  • Fix inverted CRD_F_ENCRYPT flags for TLS 1.2.
  • Mark KTLS RX mbufs as not ready.
  • Fix some inverted assertions.
  • Handle KTLS RX in sbcut().
  • Various fixes to get to the point of receiving records in userland.
  • Don't return EBADMSG for the EOF case.
  • Check for mismatched TLS versions and invalid record lengths.
  • Fix sense of NOTREADY check for TLS mbuf chain mbufs.
  • Fix some issues with 'n' and splitting.
  • Disable assertion in sbfree_ktls_rx().
  • Handle corrupted TLS headers sooner, and abort the connection.
  • When splitting, set 'n's m_data at the correct offset.
  • Fix incorrect check for max valid length.
  • Skip over mbufs that are the exact size of the header for the iovec.
  • Use EMSGSIZE for truncated TLS records instead of EBADMSG.
  • Document additional errors returned by software KTLS RX.
  • Trim trailing whitespace.
jhb added a comment.Thu, Mar 26, 3:48 AM

So this now supports software KTLS RX as well as TOE. For software KTLS RX, the socket buffer now effectively holds two chains of mbufs. New mbufs from TCP are appending to a new "TLS" chain. When a full TLS record's worth of data is queued, the socket is scheduled to the KTLS worker thread pool. The worker thread walks the TLS chain decrypting individual TLS records. Decrypted TLS records are then appended to the "normal" mbuf chain that is available for reading as records with a control message holding the TLS header. When KTLS RX is enabled, any pending data in the socket buffer is moved to the TLS chain and scheduled for decryption. In addition, the TLS enable structure for the socket option now takes an initial sequence number to handle the case that userland might have done a read-ahead and have pending encrypted TLS records in userland already. OpenSSL already handles this for Linux and doesn't enable KTLS if it has any "partial" records in userland.

This doesn't include support for NIC / ifnet TLS, but it shouldn't be a lot of work to add. For NIC TLS, we could add a new type of send tag for TLSRX (except it would never be stored as a "send tag" in an mbuf). When a NIC decrypts a TLS mbuf, it would pass a M_NOMAP mbuf with the populated TLS header and a length corresponding to the header + trailer up to TCP. In sbappend_ktls we would treat M_NOMAP mbufs with a valid header as an already encrypted mbuf. When they are at the head of the queue we would just allocate the control mbuf for the TLS header and strip the header and trailer from the M_NOMAP mbuf and move them to the regular mbuf chain. If the NIC needs to pass up "plain" data due to a retransmit, it can make the sw_decrypt function pointer point to a routine that decrypts the reassembled TLS frame using the NIC or software, etc. and then resumes decrypting TLS frames in the NIC. Note that this routine would also be invoked for any data already in the socket buffer at the time of the setsockopt call allowing the NIC to start its decryption with the sequence number from the socket option. I think this will work for what Hans has described to me previously for NIC TLS on mlx.

I'm still not sure about whether the SO_WANT_KTLS will be useful for anyone but TOE TLS. We may also want some additional controls (sysctls or otherwise) to control software KTLS RX. In my testing it wasn't really a win compared to AESNI in userland, though it may be useful for lower powered CPUs where you can use a co-processor engine, and NFS over TLS in the kernel will want to use it.

jhb added a comment.Mon, Apr 6, 11:59 PM

Anyone have any further thoughts? I will probably try to split this up a bit into an initial commit to add the API changes (adding 'rec_seq' and socket options, etc.) along with TOE infrastructure (but perhaps without SO_WANT_KTLS), and then a commit for T6 TOE TLS, then maybe the changes for SW RX separately as they are more invasive (and probably deserve being reviewed on their own). Mostly I wanted to get the SW fleshed out to validate the API changes like adding the rec_seq and additional errors.