Update the TCP LRO code to handle both encrypted and un-encrypted traffic.
Closed, Public

Authored by hselasky on Aug 2 2021, 9:57 AM.

Details

Summary

Encrypted and un-encrypted traffic need to be coalesced separately.
Split the 16-bit lro_type field in the address information into two
8-bit fields, and use the second 8-bit field for flags, which among
other things indicate whether the received mbuf is encrypted or un-encrypted.

MFC after: 1 week
Sponsored by: NVIDIA Networking

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Do you have more details about your use cases for this? For TLS, received frames should probably use a TLS mbuf for each frame, and there is ample room in the TLS mbuf to mark decrypted vs. encrypted mbufs (e.g. m_epg_flags). If this is for IPsec, there is already an M_DECRYPTED mbuf flag defined for the TCP/IP layers. tcp_lro() can assume that the M_PROTO* flags are TCP/IP flags, since it only operates on connections using netinet as the protocol layer.

@jhb: I wasn't aware of the M_DECRYPTED flag. I'll update my patch.

The flag will basically be used to distinguish encrypted and unencrypted traffic.

What is the current state of M_DECRYPTED support? Do we support receiving mixed traffic, that is, encrypted and un-encrypted packets interleaved?

And where does the software decryption happen? Is this implemented yet?

hselasky retitled this revision from "Implement a decrypted flag for mbufs and TCP LRO." to "Update the TCP LRO code to handle both encrypted and un-encrypted traffic."
hselasky edited the summary of this revision.

Update code as per suggestion from @jhb.

Where is the new flag LRO_FLAG_DECRYPTED used? It seems to be write-only in this patch.

Hi,

The fields are packed into the lro_address union. The u_long raw[x] array is then used to read all the fields quickly, so the new flag is read together with the rest of the address.

union lro_address {
	u_long raw[1];

--HPPS
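To illustrate the answer above, here is a minimal sketch of why a flag stored inside the address union is not write-only once the union is compared word by word. The layout is a simplified, hypothetical stand-in and not the actual definition of union lro_address; the names lro_address_sketch, lro_fields_sketch, lro_flags, lro_address_init() and lro_address_equal(), as well as the flag value, are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value; the review only names LRO_FLAG_DECRYPTED. */
#define LRO_FLAG_DECRYPTED 0x01

/* Simplified stand-in for the address fields; not the real layout. */
struct lro_fields_sketch {
	uint8_t  lro_type;	/* first half of the former 16-bit field */
	uint8_t  lro_flags;	/* second half, now used for flags */
	uint16_t s_port;
	uint16_t d_port;
	uint16_t pad;
	uint32_t s_addr;
	uint32_t d_addr;
};

#define LRO_RAW_WORDS							\
	((sizeof(struct lro_fields_sketch) + sizeof(unsigned long) - 1) / \
	sizeof(unsigned long))

union lro_address_sketch {
	unsigned long raw[LRO_RAW_WORDS];	/* word view for fast reads */
	struct lro_fields_sketch f;
};

static void
lro_address_init(union lro_address_sketch *a, uint8_t type, bool decrypted,
    uint16_t s_port, uint16_t d_port, uint32_t s_addr, uint32_t d_addr)
{
	assert(a != NULL);
	memset(a, 0, sizeof(*a));	/* unused bytes must compare equal */
	a->f.lro_type = type;
	a->f.lro_flags = decrypted ? LRO_FLAG_DECRYPTED : 0;
	a->f.s_port = s_port;
	a->f.d_port = d_port;
	a->f.s_addr = s_addr;
	a->f.d_addr = d_addr;
}

/* Compare every field at once through the raw words. */
static bool
lro_address_equal(const union lro_address_sketch *a,
    const union lro_address_sketch *b)
{
	for (size_t i = 0; i < LRO_RAW_WORDS; i++) {
		if (a->raw[i] != b->raw[i])
			return (false);
	}
	return (true);
}
```

Under these assumptions, two packets of the same flow that differ only in the decrypted flag produce different raw words, so they would not be coalesced into the same LRO entry.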

@jhb: I wasn't aware of the M_DECRYPTED flag. I'll update my patch.

The flag will basically be used to distinguish encrypted and unencrypted traffic.

Well, the specific cases do matter. M_DECRYPTED is currently used by IPsec. IPsec's input path sets it after it has decrypted a packet. There are then a few places that avoid actions on decrypted packets (like ICMP replies). I think you'd have to adjust IPsec's input path to avoid trying normal OCF decryption if an incoming packet already has M_DECRYPTED set. For TLS, I probably wouldn't suggest doing this as I said earlier (I'm kind of assuming that for TLS the NIC gives you the entire TLS record as one message in effect in which case it can be stored as a TLS mbuf).

What is the current state of M_DECRYPTED support? Do we support receiving mixed traffic, that is, encrypted and un-encrypted packets interleaved?

Right now IPsec assumes all its input is still encrypted, but it shouldn't be hard to add a check that skips decryption and passes the packet up if M_DECRYPTED is already set.
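The check described here could look roughly like the sketch below. It is not code from the patch or from sys/netipsec: struct mbuf_sketch, the flag value M_DECRYPTED_SKETCH, and ipsec_rx_classify() are hypothetical stand-ins used only to show the decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel's struct mbuf and flag value differ. */
#define M_DECRYPTED_SKETCH 0x00001000u

struct mbuf_sketch {
	uint32_t m_flags;
};

enum ipsec_rx_action {
	IPSEC_RX_DECRYPT_IN_SOFTWARE,	/* normal OCF path */
	IPSEC_RX_PASS_UP		/* already decrypted, skip OCF */
};

/* Decide whether the input path still has to decrypt this packet. */
static enum ipsec_rx_action
ipsec_rx_classify(const struct mbuf_sketch *m)
{
	assert(m != NULL);
	if ((m->m_flags & M_DECRYPTED_SKETCH) != 0)
		return (IPSEC_RX_PASS_UP);
	return (IPSEC_RX_DECRYPT_IN_SOFTWARE);
}
```

Because the decision is made per packet, this shape would also cover mixed traffic, where some packets arrive decrypted by the NIC and others do not.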

And where does the software decryption happen? Is this implemented yet?

You mean the existing IPsec code? Yes, it exists in sys/netipsec. :) It assumes that it always has to encrypt/decrypt each packet via OCF, though. It does not have any plumbing yet to support offloading either of those to the NIC. I looked at it a bit for cxgbe(4) for the TX side. The control plane side is that you want to be able to propagate SA entries down to interfaces (for TX I had imagined allocating send tags for this as a new send tag type). Then on the data plane side you need to mark mbufs. For TX the send tags would have worked for that (though it might have been stored a bit differently perhaps; not sure if we want to permit packet pacing over IPsec, etc.). For RX, M_DECRYPTED can serve to mark packets, but I don't know if we need to pass up some kind of SA identifier. I guess if you are just decrypting the packet but not stripping the ESP header, IPsec can re-lookup the SA based on the 4-tuple and the SPI in the ESP header. (For some applications of "inline IPsec" you want the NIC to do ESP header insertion/extraction, in which case you need a way to pass the SPI up, similar to how we handle VLAN tags.)

I am trying to get a consistent opinion on which driver hooks are required to implement inline IPSEC. My impression is that tags would not be useful for this. Instead, we (as Nvidia Connect-X) need the full SADB offloaded to the hardware. In fact, I am quite confused here, because the host side has both the SADB and the SPD, and you need both to determine how to handle a packet.

On the card side, we can, and in fact must, match a specific flow, i.e. an SPD entry, in either the send or the receive direction, and then the hardware can apply a specific IPSEC encrypt or decrypt action and optional encapsulation or decapsulation. So the SPD seems to partially match the definition of a flow, and the SADB would map to the IPSEC encrypt/decrypt parameters, except that SADB and SPD entries carry some redundant information, and the host must distill this into a plain flow + key + set of actions.

Also, the card's inline engine has some limitations. For instance, we must not feed fragments into it on TX, so the driver must have a hook to request fallback to host encryption for a specific packet. Similarly, the engine might refuse to decrypt a packet for several reasons, so we must push up the information on whether the packet was authenticated and decrypted by hardware or should be handled by the host. Then there is the problem of replay protection: when a packet falls back to the software decrypt path, we either need to get the replay cache from the card and feed it into the host ipsec state, or do something else, for instance, not decapsulate decrypted packets in hardware but leave that to the host, so that a fresh replay cache is always there.

To summarize, it seems that we need:

  • hooks on SADB element addition and removal (somewhere in key_newsav() and key_cleansav() ?)
  • hooks on SPD changes (key_insertsp() ?)
  • hooks on pre-send on Tx
  • indication of decryption and authentication state on Rx
  • interface to get replay cache
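The list above could be sketched as a driver operations table. Everything in this sketch is hypothetical: no such KPI is defined in this review, and the struct, field, and function names are assumptions made only to show how the five items might fit together. The Rx indication is modelled as per-packet state rather than as a callback, and the mock Tx hook declines fragments as described earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical illustrations. */
struct sa_sketch { uint32_t spi; };
struct sp_sketch { uint32_t id; };

struct pkt_sketch {
	bool is_fragment;	/* Tx: fragments must not reach the engine */
	bool hw_decrypted;	/* Rx: hardware decrypted this packet */
	bool hw_authenticated;	/* Rx: hardware verified the ICV */
};

struct ipsec_offload_ops_sketch {
	int  (*sa_add)(void *drv, const struct sa_sketch *sa);
	void (*sa_remove)(void *drv, const struct sa_sketch *sa);
	int  (*sp_change)(void *drv, const struct sp_sketch *sp);
	/* Returns false when the host must encrypt this packet itself. */
	bool (*tx_pre_send)(void *drv, const struct pkt_sketch *p);
	int  (*replay_get)(void *drv, const struct sa_sketch *sa,
	    uint64_t *last_seq);
};

/* Mock driver hook: decline fragments, accept everything else. */
static bool
mock_tx_pre_send(void *drv, const struct pkt_sketch *p)
{
	(void)drv;
	assert(p != NULL);
	return (!p->is_fragment);
}

/* Stack-side helper: fall back to host crypto unless the driver accepts. */
static bool
tx_needs_host_crypto(const struct ipsec_offload_ops_sketch *ops, void *drv,
    const struct pkt_sketch *p)
{
	if (ops == NULL || ops->tx_pre_send == NULL)
		return (true);
	return (!ops->tx_pre_send(drv, p));
}
```

None of these callbacks may sleep if they are invoked from the places named in the list, which is one reason a real interface would probably need deferred work behind sa_add and sp_change.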

All the natural places where the listed hooks could be added seem to be non-sleepable, which makes things quite complicated.

I would appreciate any comments fixing my (mis-)understanding of the stuff.

I think the only difference here for Chelsio (at least for T6) is that for the send side (T6 can't do any RX offload for IPsec), it would be sufficient to have the SA state in the driver (rather than in the NIC per se). What I had imagined there is that trying to transmit an IPsec packet would allocate a send tag for the desired SADB entry for the first packet sent over a given interface (though perhaps you could pre-create them) and set m_snd_tag to the allocated tag. In the driver, the tag would be used to steer the packet to a different transmit path (much like how TLS works today on both NICs, I believe), where you use a command separate from "send a plain packet". The send tag approach means that the stack is still the one mapping a given packet to an SA, and the send tag, when allocated, would already be tied to a specific set of keys and algorithms, so the driver doesn't have to parse the 4-tuple or SPI to figure out the SA. However, if you are putting the SADB on the NIC itself for TX, then that approach doesn't work for you.

For Chelsio T6, the work request sent to the NIC describes how to encrypt the packet, including which algorithms and which keys to use (either inline or as pointers to keys stored on the NIC). However, I'm not sure if that same design will be a requirement for future Chelsio NICs on the transmit side. I think for any type of RX offload for IPsec you have to store SADB entries on the NIC so it can look up keys and decrypt on its own. That said, if we directly add SADB entries to the NIC in general, Chelsio T6 could make use of that as well; it would just need to examine the 4-tuple + SPI to map outgoing packets to the SADB entry.

I think for pre-TX we may not need hooks so much as a way to flag that a given interface can accept unencrypted IPsec packets. Part of this in the TLS case falls out naturally from the use of send tags, as one of the things you want to be careful about is that if you get a route change from one NIC to another and the second NIC doesn't support IPsec offload, you don't want to leak unencrypted packets on the wire. A new if_capenable flag might be sufficient for that, though. Probably we would like to have separate flags for RX and TX. However, it does mean you will need a way to "flag" a packet as encrypted or not, so the NIC knows whether it should encrypt it or whether it has already been encrypted by SW. For TLS we depend on the TLS session pointer in m_epg_tls being NULL to mark "already encrypted" packets. We might need something else for IPsec. Perhaps we can reuse M_DECRYPTED to mark "not yet encrypted". The only annoyance there is that in theory a driver might get non-IP packets (though not very likely in practice).
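The route-change concern can be made concrete with a small sketch. It assumes a per-interface TX capability bit and a per-packet "not yet encrypted" mark, neither of which exists in this review; IFCAP_IPSEC_TX_SKETCH, struct ifnet_sketch, struct tx_pkt_sketch, and ipsec_tx_decide() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; not an existing if_capenable flag. */
#define IFCAP_IPSEC_TX_SKETCH 0x1u

struct ifnet_sketch {
	uint32_t if_capenable;
};

struct tx_pkt_sketch {
	bool needs_encrypt;	/* stack marked it "not yet encrypted" */
};

enum tx_decision {
	TX_SEND_AS_IS,		/* nothing left to encrypt */
	TX_OFFLOAD_TO_NIC,	/* interface advertises IPsec TX offload */
	TX_ENCRYPT_IN_SOFTWARE	/* never put cleartext on the wire */
};

/* Re-evaluated per packet, so a route change to another NIC is safe. */
static enum tx_decision
ipsec_tx_decide(const struct ifnet_sketch *ifp, const struct tx_pkt_sketch *p)
{
	assert(ifp != NULL && p != NULL);
	if (!p->needs_encrypt)
		return (TX_SEND_AS_IS);
	if ((ifp->if_capenable & IFCAP_IPSEC_TX_SKETCH) != 0)
		return (TX_OFFLOAD_TO_NIC);
	return (TX_ENCRYPT_IN_SOFTWARE);
}
```

With this shape, the same marked packet is offloaded on a capable interface and encrypted in software on one that lacks the capability, rather than being transmitted unencrypted.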

This revision was not accepted when it landed; it landed in state Needs Review. (Aug 6 2021, 9:31 AM)
This revision was automatically updated to reflect the committed changes.