Paths

Table of Contentst

Initial support for kernel TLS receive offload.
AbandonedPublic
Actions

Authored by jhb on Dec 24 2019, 8:24 PM.

Details

Reviewers

gallatin
np
• hselasky
kib

Group Reviewers

transport

Summary

This only supports TOE mode, but can be extended to support other
offload backends in the future.

Add new TCP_RXTLS_ENABLE and TCP_RXTLS_MODE socket options. These function similarly to their TX counterparts.
TLS records are stored in the socket receive buffer as records with a new TLS_GET_RECORD control message containing fields from the TLS header in a struct tls_get_record. The decrypted TLS record payload is appended after the control message mbuf in the socket buffer. soreceive_stream() falls back to soreceive_generic() for sockets with an active KTLS RX session. OpenSSL uses recvmsg() to fetch decrypted TLS records. If a TLS record is received that fails to authenticate, the socket error is set to EBADMSG to fail the next call to recvmsg().
A new SO_WANT_KTLS socket option can be set by applications to hint to the kernel that KTLS is requested. If the socket ends up using TLS, the application must attempt to enable KTLS via TCP_RXTLS_ENABLE, however, setting the socket option is not required. Some KTLS backends may not be able to offload RX if the option is not set.

The Chelsio driver uses this hint to enable TLS mode on TOE sockets. The TOE layer has to know when a connection is established if it will use TLS for RX or not.
The TOE interface for allocating TLS sessions now accepts an additional 'bool transmit' parameter to differentiate transmit vs receive sessions.

Test Plan

tested with TOE TLS using KTLS for both RX and TX with nginx and openssl s_time

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 30114
Build 27921: arc lint + arc unit

Event Timeline

jhb created this revision.Dec 24 2019, 8:24 PM

Herald added a reviewer: transport. · View Herald TranscriptDec 24 2019, 8:24 PM

Herald added a subscriber: ae. · View Herald Transcript

Harbormaster completed remote builds in B28307: Diff 65965.Dec 24 2019, 8:24 PM

I've included the Chelsio changes for context in the review, but would split that out into a separate commit.

NIC TLS RX will require additional changes to deal with marking not-yet-decrypted data in the socket buffer as NOTREADY and teaching soreceive() to not return NOTREADY mbufs. I have a sketch in my head for how to extend this further to permit "software" offload, though the intended use case would not be pure software (that works fine in userland) but things like ccr(4) as it is almost certainly more efficient to let hardware crypto engines do the decrypt in the kernel before handing it off to userland than to have userland try to use /dev/crypto to copy the data back into the kernel and out again with extra syscalls, etc.

SO_WANT_KTLS is the part of the design I'm less sure of. It is convenient for TOE TLS, but a software-based KTLS RX would not need it. I'm not sure if it would be beneficial for NIC TLS since the T6 doesn't support NIC TLS for RX. Hans, would it be helpful for you to know "early" that a connection might want to use KTLS RX, or would you allocate a queue, etc. when the RX key is programmed via TCP_TLSRX_ENABLE?

The OpenSSL changes for this are available at https://github.com/openssl/openssl/compare/master...bsdjhb:ktls_rx Note that this also includes two changes to support BIO_conn at the start of the branch so that openssl s_time supports KTLS. I've kept SO_WANT_KTLS separate in that branch so its easy to revert if we don't keep the option.

jhb added inline comments.Dec 24 2019, 8:33 PM

sys/kern/uipc_socket.c
3130	I'll add these two missing options to `SOPT_GET` as a separate commit as well.

np accepted this revision.Dec 26 2019, 6:09 PM

Check my questions first.

sys/kern/uipc_ktls.c
944	Is this ifdef needed?
sys/netinet/tcp_usrreq.c
2083	How is TCP_RXTLS_ENABLE specified? If there is already encrypted data in the socket buffer what will happen with that? Should part of enabling RXTLS be to empty the RX socket buffer?

jhb added inline comments.Dec 30 2019, 6:15 PM

sys/kern/uipc_ktls.c
944	Yep. See how it works below for the TX side. Eventually there will be more code in the #else case (e.g. if I add support for RX offload via software which initially I wasn't going to do, but am now leaning towards doing as it will be the best way to make use of crypto co-processors like ccr(4)).
sys/netinet/tcp_usrreq.c
2083	For T6, the NIC "recognizes" the Change Cipher Suite TLS handshake message and queues up data until it gets an RX key (or until the driver says "not going to offload TLS RX"). For other TOE implementations, the driver-specific callback to create a TOE TLS RX session would have to handle data in the socket buffer. This only works when the TOE socket in the NIC is using a special "TLS" mode that looks for CCS messages. T6 can't do TLS RX offload of arbitrary TOE sockets, only TOE sockets placed in "TLS" mode at the time the TOE socket is created either by connect() or the end of the 3-way handshake. For software RX for TLS < 1.3, then the software RX path will indeed have to look at pending data in the socket buffer and check for CCS directly. Once it finds that, it will have to mark any later mbufs as `M_NOTREADY` and start the process of decrypting frames. NIC TLS RX will also need to do something similar, but that will depend on how the NIC TLS implementation works. If NIC TLS requires knowing that a socket is using TLS at the start of its lifetime, then the NIC might be able to check for CCS similar to the T6 TOE implementation, otherwise it might need to do something similar to what I've sketched out above for software. This is part of why I asked about `SO_WANT_KTLS` and if that is relevant to how TLS RX works for Mellanox. If NIC TLS RX for Mellanox works by checking for CCS and blocking the queue, then `SO_WANT_KTLS` will be useful to know which sockets need a dedicated queue early on.

It would still be useful to know if SO_WANT_KTLS would be useful for Mellanox NIC TLS, but I think I want to at least do an initial software TLS before defining the interface. One thing I noticed while reading the OpenSSL code again is that for the RX case, the Linux KTLS support checks for TLS records already read in the userland openssl buffer and only enables KTLS if only full records are stored, and passes along the number of stored records via an initial sequence number. Software KTLS will need the same thing, so I'm going to need to add a uint64_t sequence number field to the tls_enable structure for that. I can add a temporary combat shim for head for the TXENABLE sockopt to deal with old vs new fields, but I'd like the RX case to work correctly from the get-go. For TOE KTLS, that sequence number should always be zero.

In D22917#505555, @jhb wrote:

It would still be useful to know if SO_WANT_KTLS would be useful for Mellanox NIC TLS, but I think I want to at least do an initial software TLS before defining the interface. One thing I noticed while reading the OpenSSL code again is that for the RX case, the Linux KTLS support checks for TLS records already read in the userland openssl buffer and only enables KTLS if only full records are stored, and passes along the number of stored records via an initial sequence number. Software KTLS will need the same thing, so I'm going to need to add a uint64_t sequence number field to the tls_enable structure for that. I can add a temporary combat shim for head for the TXENABLE sockopt to deal with old vs new fields, but I'd like the RX case to work correctly from the get-go. For TOE KTLS, that sequence number should always be zero.

Eventually the Mellanox NIC TLS will support TLS RX offloading, but I'm a little bit uncertain about the best implementation approach, especially when TLS records are receive out of sequence.

Can't you add some reserved fields to the TLS enable structure perhaps?

Rebase.
Checkpoint WIP before I rip it up again some more.
Finish first pass at KTLS RX software framework.
Use a better name for the iovec param.
Don't pass the TLS header in the iovec.
Add support for KTLS RX for TLS 1.2 GCM.
Shorten to initial_seqno.
Use an array of bytes in big-endian for the sequence number.
Actually use copyin_tls_enable.
Fix inverted CRD_F_ENCRYPT flags for TLS 1.2.
Mark KTLS RX mbufs as not ready.
Fix some inverted assertions.
Handle KTLS RX in sbcut().
Various fixes to get to the point of receiving records in userland.
Don't return EBADMSG for the EOF case.
Check for mismatched TLS versions and invalid record lengths.
Fix sense of NOTREADY check for TLS mbuf chain mbufs.
Fix some issues with 'n' and splitting.
Disable assertion in sbfree_ktls_rx().
Handle corrupted TLS headers sooner, and abort the connection.
When splitting, set 'n's m_data at the correct offset.
Fix incorrect check for max valid length.
Skip over mbufs that are the exact size of the header for the iovec.
Use EMSGSIZE for truncated TLS records instead of EBADMSG.
Document additional errors returned by software KTLS RX.
Trim trailing whitespace.

Herald added a subscriber: melifaro. · View Herald TranscriptMar 26 2020, 3:33 AM

Harbormaster completed remote builds in B30114: Diff 69885.Mar 26 2020, 3:33 AM

So this now supports software KTLS RX as well as TOE. For software KTLS RX, the socket buffer now effectively holds two chains of mbufs. New mbufs from TCP are appending to a new "TLS" chain. When a full TLS record's worth of data is queued, the socket is scheduled to the KTLS worker thread pool. The worker thread walks the TLS chain decrypting individual TLS records. Decrypted TLS records are then appended to the "normal" mbuf chain that is available for reading as records with a control message holding the TLS header. When KTLS RX is enabled, any pending data in the socket buffer is moved to the TLS chain and scheduled for decryption. In addition, the TLS enable structure for the socket option now takes an initial sequence number to handle the case that userland might have done a read-ahead and have pending encrypted TLS records in userland already. OpenSSL already handles this for Linux and doesn't enable KTLS if it has any "partial" records in userland.

This doesn't include support for NIC / ifnet TLS, but it shouldn't be a lot of work to add. For NIC TLS, we could add a new type of send tag for TLSRX (except it would never be stored as a "send tag" in an mbuf). When a NIC decrypts a TLS mbuf, it would pass a M_NOMAP mbuf with the populated TLS header and a length corresponding to the header + trailer up to TCP. In sbappend_ktls we would treat M_NOMAP mbufs with a valid header as an already encrypted mbuf. When they are at the head of the queue we would just allocate the control mbuf for the TLS header and strip the header and trailer from the M_NOMAP mbuf and move them to the regular mbuf chain. If the NIC needs to pass up "plain" data due to a retransmit, it can make the sw_decrypt function pointer point to a routine that decrypts the reassembled TLS frame using the NIC or software, etc. and then resumes decrypting TLS frames in the NIC. Note that this routine would also be invoked for any data already in the socket buffer at the time of the setsockopt call allowing the NIC to start its decryption with the sequence number from the socket option. I think this will work for what Hans has described to me previously for NIC TLS on mlx.

I'm still not sure about whether the SO_WANT_KTLS will be useful for anyone but TOE TLS. We may also want some additional controls (sysctls or otherwise) to control software KTLS RX. In my testing it wasn't really a win compared to AESNI in userland, though it may be useful for lower powered CPUs where you can use a co-processor engine, and NFS over TLS in the kernel will want to use it.

• hselasky added a reviewer: kib.Mar 26 2020, 8:29 AM

Anyone have any further thoughts? I will probably try to split this up a bit into an initial commit to add the API changes (adding 'rec_seq' and socket options, etc.) along with TOE infrastructure (but perhaps without SO_WANT_KTLS), and then a commit for T6 TOE TLS, then maybe the changes for SW RX separately as they are more invasive (and probably deserve being reviewed on their own). Mostly I wanted to get the SW fleshed out to validate the API changes like adding the rec_seq and additional errors.

I've added the first batch of commit candidates adding the user interface and TOE support. Once those are settled I will rebase this branch and put up a review for the software path. My plan is to use this review as a kind of meta review to track the smaller commits going forward.

jhb abandoned this revision.Jul 23 2020, 11:51 PM

Revision Contents
Changeset List

Path

Size

share/

man/

man4/

tcp.4

48 lines

sys/

dev/

cxgbe/

tom/

t4_tls.c

266 lines

t4_tom.h

2 lines

t4_tom.c

8 lines

kern/

uipc_ktls.c

574 lines

uipc_sockbuf.c

252 lines

uipc_socket.c

39 lines

netinet/

3 lines

2 lines

5 lines

47 lines

2 lines

2 lines

opencrypto/

ktls_ocf.c

122 lines

sys/

52 lines

11 lines

3 lines

4 lines

Commit	Tree	Parents	Author	Summary	Date
235d45e6a411	843aae0606ab	72abe8357bd9	John Baldwin	Trim trailing whitespace.	Mar 25 2020, 10:54 PM
72abe8357bd9	c46af4330c0b	d6bd97c007a8	John Baldwin	Document additional errors returned by software KTLS RX.	Mar 25 2020, 10:42 PM
d6bd97c007a8	799b9641e292	f779b43184ff	John Baldwin	Use EMSGSIZE for truncated TLS records instead of EBADMSG.	Mar 25 2020, 10:40 PM
f779b43184ff	b2a0ce16bf74	6bca8303e726	John Baldwin	Skip over mbufs that are the exact size of the header for the iovec.	Mar 10 2020, 6:49 PM
6bca8303e726	2fc93deca008	bce3298d2e80	John Baldwin	Fix incorrect check for max valid length. (Show More…)	Mar 10 2020, 6:30 PM
bce3298d2e80	29af60330fda	f63f64a1e085	John Baldwin	When splitting, set 'n's m_data at the correct offset.	Mar 6 2020, 11:41 PM
f63f64a1e085	c9c634c062c8	293f4749f383	John Baldwin	Handle corrupted TLS headers sooner, and abort the connection.	Mar 6 2020, 11:25 PM
293f4749f383	cca8f2ebc336	778eca42f49e	John Baldwin	Disable assertion in sbfree_ktls_rx(). (Show More…)	Mar 6 2020, 10:47 PM
778eca42f49e	095d9d5fae7e	8f43c0c704dd	John Baldwin	Fix some issues with 'n' and splitting. (Show More…)	Mar 6 2020, 8:22 PM
8f43c0c704dd	e44eb79b22b5	5b2bd2bacb88	John Baldwin	Fix sense of NOTREADY check for TLS mbuf chain mbufs.	Mar 6 2020, 7:44 PM
5b2bd2bacb88	8e8201afead5	8b6d273fc094	John Baldwin	Check for mismatched TLS versions and invalid record lengths. (Show More…)	Mar 6 2020, 7:09 PM
8b6d273fc094	b19757cb8341	ee876c90d73e	John Baldwin	Don't return EBADMSG for the EOF case.	Mar 5 2020, 6:38 PM
ee876c90d73e	92b367a3ea1f	a25283420fdf	John Baldwin	Various fixes to get to the point of receiving records in userland. (Show More…)	Mar 5 2020, 1:49 AM
a25283420fdf	af542d0a772e	b08c58170048	John Baldwin	Handle KTLS RX in sbcut().	Mar 3 2020, 10:31 PM
b08c58170048	5ab3d1a1ba68	a4282dbf5f1e	John Baldwin	Fix some inverted assertions.	Mar 3 2020, 9:13 PM
a4282dbf5f1e	785a4634eb3c	270b194fbcb4	John Baldwin	Mark KTLS RX mbufs as not ready. (Show More…)	Mar 3 2020, 12:18 AM
270b194fbcb4	f4f87b2e3875	81c2da68a5fb	John Baldwin	Fix inverted CRD_F_ENCRYPT flags for TLS 1.2.	Feb 29 2020, 1:50 AM
81c2da68a5fb	0fdf2ece58e3	5a5a0b357217	John Baldwin	Actually use copyin_tls_enable.	Feb 29 2020, 1:13 AM
5a5a0b357217	b12741c6ca87	0f9ed577dffc	John Baldwin	Use an array of bytes in big-endian for the sequence number. (Show More…)	Feb 29 2020, 12:01 AM
0f9ed577dffc	d92fce8af20b	c392768ead93	John Baldwin	Shorten to initial_seqno.	Feb 28 2020, 1:15 AM
c392768ead93	7a6df5260f0c	1a173bb47846	John Baldwin	Add support for KTLS RX for TLS 1.2 GCM.	Feb 28 2020, 12:29 AM
1a173bb47846	13abd3a441d5	6f31ec62f090	John Baldwin	Don't pass the TLS header in the iovec. (Show More…)	Feb 28 2020, 12:29 AM
6f31ec62f090	d637488292fb	f7de4135f1a0	John Baldwin	Use a better name for the iovec param.	Feb 28 2020, 12:27 AM
f7de4135f1a0	f603a59b210a	88e9e5b7e2bd	John Baldwin	Finish first pass at KTLS RX software framework. (Show More…)	Feb 27 2020, 11:47 PM
88e9e5b7e2bd	980b5762bd89	5591e502f031	John Baldwin	Checkpoint WIP before I rip it up again some more. (Show More…)	Feb 27 2020, 12:22 AM
5591e502f031	15869a3456c6	1e8c4053eafb	John Baldwin	Document new socket options and control messages.	Dec 24 2019, 8:21 PM
1e8c4053eafb	1a2cf286eaff	0fa49016088c	John Baldwin	Don't require ext pgs for RX.	Dec 24 2019, 7:55 PM
0fa49016088c	f903dc5ea214	2d680279f091	John Baldwin	Honor SO_WANT_KTLS in TOE.	Nov 21 2019, 7:20 PM
2d680279f091	b506b3d33655	04e24dea3b4c	John Baldwin	Add a SO_WANT_KTLS socket option. (Show More…)	Nov 21 2019, 7:18 PM
04e24dea3b4c	9d87835d7e9c	9c081e05c5b8	John Baldwin	Cleanup some #if 0'd code, and some code I meant to axe earlier.	Nov 13 2019, 11:33 PM
9c081e05c5b8	10162300b71f	b4183934a64a	John Baldwin	Fix issues with key context programming. (Show More…)	Nov 12 2019, 12:18 AM
b4183934a64a	f4b9c573e430	250bc1fbb226	John Baldwin	Set M_EOR for the last mbuf in a TLS record payload. (Show More…)	Nov 8 2019, 12:50 AM
250bc1fbb226	986e2ea09401	709628f628b9	John Baldwin	Actually pass KTLS RX keys onto tls_alloc_ktls.	Nov 7 2019, 10:52 PM
709628f628b9	a5d7a863d6d6	21ab2aa9ff8f	John Baldwin	Drop t4_soreceive_tom now that soreceive_stream DTRT.	Nov 7 2019, 10:42 PM
21ab2aa9ff8f	f0ed25aa6cff	11c27f547910	John Baldwin	When KTLS is enabled, redirect soreceive_stream to soreceieve_generic.	Nov 7 2019, 10:32 PM
11c27f547910	2566982aeae7	cb237d0ca14d	John Baldwin	Fixup to use so_rcv instead of so_snd.	Nov 7 2019, 10:27 PM
cb237d0ca14d	bde7d91c2f13	4266af86c2a8	John Baldwin	Add the receive path for KTLS RX on TOE.	Nov 7 2019, 7:17 PM
4266af86c2a8	37947bbb4453	81ba8ad53118	John Baldwin	Start on TOE support for KTLS RX. (Show More…)	Nov 6 2019, 10:15 PM
81ba8ad53118	0aee12f95469	185d435bd586	John Baldwin	Add socket options, control message type, and TOE interface for KTLS RX.	Nov 1 2019, 12:08 AM