Page MenuHomeFreeBSD

ipsec_accel: kernel infrastructure
Needs ReviewPublic

Authored by kib on Mar 5 2024, 7:27 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, May 18, 10:47 AM
Unknown Object (File)
Sat, May 18, 10:46 AM
Unknown Object (File)
Sat, May 18, 10:46 AM
Unknown Object (File)
Wed, May 15, 11:30 AM
Unknown Object (File)
Tue, May 14, 10:14 AM
Unknown Object (File)
Thu, May 9, 4:22 PM
Unknown Object (File)
Tue, May 7, 6:18 PM
Unknown Object (File)
Tue, May 7, 4:20 AM

Details

Reviewers
rscheff
Group Reviewers
transport
Summary
Inline IPSEC offload moves almost whole IPSEC processing from the
CPU/MCU and possibly crypto accelerator, to the network card.

The transmitted packet content is not touched by CPU during TX
operations, kernel only does the required policy and security
association lookups to find out that given flow is offloaded, and then
packet is transmitted as plain text to the card. For driver convenience,
a metadata is attached to the packet identifying SA which must process
the packet. Card does encryption of the payload, padding, calculates
authentication, and does the reformat according to the policy.

Similarly, on receive, card does the decapsulation, decryption, and
authentification.  Kernel receives the identifier of SA that was
used to process the packet, together with the plain-text packet.

Overall, payload octets are only read or written by card DMA engine,
removing a lot of memory subsystem overhead, and saving CPU time because
IPSEC algos calculations are avoided.

If driver declares support for inline IPSEC offload (with the
IFCAP2_IPSEC_ACCEL capability set and registering method table struct
if_ipsec_accel_methods), kernel offers the SPD and SAD to driver.

Driver decides which policies and SAs can be offloaded based on
hardware capacity, and acks/nacks each SA for given interface to
kernel.  Kernel needs to keep this information to make a decision to
skip software processing on TX, and to assume processing already done
on RX.  This shadow SPD/SAD database of offloads is rooted from
policies (struct secpolicy accel_ifps, struct ifp_handle_sp) and SAs
(struct secasvar accel_ipfs, struct ifp_handle_sav).

Some extensions to the PF_KEY socket allow to limit interfaces for
which given SP/SA could be offloaded (proposed for offload).  Also,
additional statistics extensions allow to observe allocation/octet/use
counters for specific SA.

Since SPs and SAs are typically instantiated in non-sleepable context,
while offloading them into card is expected to require costly async
manipulations of the card state, calls to the driver for offload and
termination are executed in the threaded taskqueue.  It also solves
the issue of allocating resources needed for the offload database.
Neither ipf_handle_sp nor ipf_handle_sav do not add reference to the
owning SP/SA, the offload must be terminated before last reference is
dropped.  ipsec_accel only adds transient references to ensure safe
pointer ownership by taskqueue.

Maintaining the SA counters for hardware-accelerated packets is the
duty of the driver.  The helper ipsec_accel_drv_sa_lifetime_update()
is provided to hide accel infrastructure from drivers which would use
expected callout to query hardware periodically for updates.

Sponsored by:   NVIDIA networking

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Mar 5 2024, 7:27 AM

Since you are touching the TCP code of the base stack, what is with the RACK stack? Doesn't it need the same modifications?

Since you are touching the TCP code of the base stack, what is with the RACK stack? Doesn't it need the same modifications?

I never looked. This jumbo patch is of course split in the repository, and the part that touches TCP stack is tiny, with the only purpose to allow the TSO indicator to pass down from driver up to tcp_output() and then back to driver over ipsec_output(). Perhaps something similar can be done for RACK.

I will extract several 'interesting' patches which tweak code around, so that it is easier to discuss.

Update to integrate more bug fixes.

rscheff added a subscriber: rscheff.

i've read through the TCP related changes, which look good to me. But that's only a minute part of this change. Others should comment on the remaining aspects of this major diff.

sys/netinet/tcp_output.c
913 ↗(On Diff #135624)

Maybe update the assert text depending on the HW IPSEC offload state?

This revision is now accepted and ready to land.Mar 12 2024, 7:18 AM
In D44219#1008810, @kib wrote:

I never looked. This jumbo patch is of course split in the repository,

where can I find the individual bits if there's a split version? It would probably help review a lot if I could go through in smaller junks a time.

In D44219#1010688, @bz wrote:
In D44219#1008810, @kib wrote:

I never looked. This jumbo patch is of course split in the repository,

where can I find the individual bits if there's a split version? It would probably help review a lot if I could go through in smaller junks a time.

D44221 D44222 D44223 D44224 D44225 D44314 D44316

The rest is the monolithic netipsec/ipsec_accel.{h,c} together with hooks to call into it from the strategic places.

sys/netinet/tcp_var.h
43 ↗(On Diff #135624)

Maybe just hide struct ifcap under _KERNEL?

sys/netipsec/ipsec_accel.c
64

What is the pros on disabling 100% of a file with preprocessor instead of just not compiling it at all with help of make glue?

kib marked 2 inline comments as done.Mar 12 2024, 9:55 PM
kib added inline comments.
sys/netinet/tcp_output.c
913 ↗(On Diff #135624)

But what would be a condition and corresponding error message? It is always ipoptlen - ipsec_optlen != 0, which means that there are options.

sys/netinet/tcp_var.h
43 ↗(On Diff #135624)

I am not sure, and do not see point in investigating if it is used by userspace (header is definitely used).

sys/netipsec/ipsec_accel.c
64

The big issue is that we have ipsec.ko and it is very ugly to define conditional source for a module Makefile depending on the kernel config.

Also, the file contained some stubs for !IPSEC_ACCEL initially, then it happen to clean.

kib marked 2 inline comments as done.

Rebase; use t_flags2 since the last bit in t_flags was consumed.

This revision now requires review to proceed.Mar 12 2024, 9:56 PM
sys/netinet/tcp_var.h
43 ↗(On Diff #135624)

Well, the header shall not be used. There are some known violators, they also usually define _WANT_INPCB or _WANT_TCPCB. We are aiming towards eliminating its use rather than providing accommodations for its use. So I'd rather remove these lines and I'm pretty sure that buildworld will pass. And that would be enough.

sys/netinet/tcp_var.h
43 ↗(On Diff #135624)

Lets not mix the changes. Moving a struct out of userspace compilation environment is not related to the subject. We can do it later, but I do not want to do it in this scope.

sys/netinet/tcp_var.h
43 ↗(On Diff #135624)

I agree - let's don't mix the changes. But lets make them in order that would not add <stdbool.h> just to be deleted on the next step. Let's do this first:

https://reviews.freebsd.org/D44340

Then your bigger change.

Latest version after the rounds of bugfixes