Inline IPSEC offload moves almost whole IPSEC processing from the CPU/MCU and possibly crypto accelerator, to the network card. The transmitted packet content is not touched by CPU during TX operations, kernel only does the required policy and security association lookups to find out that given flow is offloaded, and then packet is transmitted as plain text to the card. For driver convenience, a metadata is attached to the packet identifying SA which must process the packet. Card does encryption of the payload, padding, calculates authentication, and does the reformat according to the policy. Similarly, on receive, card does the decapsulation, decryption, and authentification. Kernel receives the identifier of SA that was used to process the packet, together with the plain-text packet. Overall, payload octets are only read or written by card DMA engine, removing a lot of memory subsystem overhead, and saving CPU time because IPSEC algos calculations are avoided. If driver declares support for inline IPSEC offload (with the IFCAP2_IPSEC_ACCEL capability set and registering method table struct if_ipsec_accel_methods), kernel offers the SPD and SAD to driver. Driver decides which policies and SAs can be offloaded based on hardware capacity, and acks/nacks each SA for given interface to kernel. Kernel needs to keep this information to make a decision to skip software processing on TX, and to assume processing already done on RX. This shadow SPD/SAD database of offloads is rooted from policies (struct secpolicy accel_ifps, struct ifp_handle_sp) and SAs (struct secasvar accel_ipfs, struct ifp_handle_sav). Some extensions to the PF_KEY socket allow to limit interfaces for which given SP/SA could be offloaded (proposed for offload). Also, additional statistics extensions allow to observe allocation/octet/use counters for specific SA. Since SPs and SAs are typically instantiated in non-sleepable context, while offloading them into card is expected to require costly async manipulations of the card state, calls to the driver for offload and termination are executed in the threaded taskqueue. It also solves the issue of allocating resources needed for the offload database. Neither ipf_handle_sp nor ipf_handle_sav do not add reference to the owning SP/SA, the offload must be terminated before last reference is dropped. ipsec_accel only adds transient references to ensure safe pointer ownership by taskqueue. Maintaining the SA counters for hardware-accelerated packets is the duty of the driver. The helper ipsec_accel_drv_sa_lifetime_update() is provided to hide accel infrastructure from drivers which would use expected callout to query hardware periodically for updates. Sponsored by: NVIDIA networking
Details
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Since you are touching the TCP code of the base stack, what is with the RACK stack? Doesn't it need the same modifications?
I never looked. This jumbo patch is of course split in the repository, and the part that touches TCP stack is tiny, with the only purpose to allow the TSO indicator to pass down from driver up to tcp_output() and then back to driver over ipsec_output(). Perhaps something similar can be done for RACK.
I will extract several 'interesting' patches which tweak code around, so that it is easier to discuss.
i've read through the TCP related changes, which look good to me. But that's only a minute part of this change. Others should comment on the remaining aspects of this major diff.
sys/netinet/tcp_output.c | ||
---|---|---|
913 ↗ | (On Diff #135624) | Maybe update the assert text depending on the HW IPSEC offload state? |
I never looked. This jumbo patch is of course split in the repository,
where can I find the individual bits if there's a split version? It would probably help review a lot if I could go through in smaller junks a time.
sys/netinet/tcp_output.c | ||
---|---|---|
913 ↗ | (On Diff #135624) | But what would be a condition and corresponding error message? It is always ipoptlen - ipsec_optlen != 0, which means that there are options. |
sys/netinet/tcp_var.h | ||
43 ↗ | (On Diff #135624) | I am not sure, and do not see point in investigating if it is used by userspace (header is definitely used). |
sys/netipsec/ipsec_accel.c | ||
64 | The big issue is that we have ipsec.ko and it is very ugly to define conditional source for a module Makefile depending on the kernel config. Also, the file contained some stubs for !IPSEC_ACCEL initially, then it happen to clean. |
sys/netinet/tcp_var.h | ||
---|---|---|
43 ↗ | (On Diff #135624) | Well, the header shall not be used. There are some known violators, they also usually define _WANT_INPCB or _WANT_TCPCB. We are aiming towards eliminating its use rather than providing accommodations for its use. So I'd rather remove these lines and I'm pretty sure that buildworld will pass. And that would be enough. |
sys/netinet/tcp_var.h | ||
---|---|---|
43 ↗ | (On Diff #135624) | Lets not mix the changes. Moving a struct out of userspace compilation environment is not related to the subject. We can do it later, but I do not want to do it in this scope. |
sys/netinet/tcp_var.h | ||
---|---|---|
43 ↗ | (On Diff #135624) | I agree - let's don't mix the changes. But lets make them in order that would not add <stdbool.h> just to be deleted on the next step. Let's do this first: https://reviews.freebsd.org/D44340 Then your bigger change. |