Details

Reviewers

jhb
markj

Commits

rG0ff2d00d2aa3: ipsec: allow it to work with unmapped mbufs
rGb0e020764aae: ipsec + ktls: cannot coexists
rGcf322978d73a: mb_unmapped_to_ext(): return error code on error

Summary

IPv4 only ATM, want to get some feedback before handling IPv6.

Only map mbuf when a policy is looked up and indicates that IPSEC needs to transform the packet.
If IPSEC is inline offloaded, it is up to the interface driver to request remap if needed.

Fetch the IP header using m_copydata() instead of using mtod() to select policy/SA.

Change interface of mb_unmapped_to_ext() to distinguish failures between OOM and incompatible packets. Note that right now, TLS packets passed to IPSEC mean insta-panic for debug kernels, and UB for non-debug. This should be fixed by the patch.

Ideas for better diagnostic/action on EINVAL from mb_unmapped_to_ext() are solicited.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

kib created this revision.Dec 31 2024, 7:42 AM

Herald added subscribers: olce, glebius, melifaro and 2 others. · View Herald TranscriptDec 31 2024, 7:42 AM

kib requested review of this revision.Dec 31 2024, 7:42 AM

I wonder why exactly software ipsec cannot process unmapped mbufs. In an unmapped ktls packet, the protocol headers are still mapped. netipsec needs to access the payload in order to encrypt, but it uses opencrypto's cursor abstraction, and crypto_cursor_* handles unmapped mbufs. What is missing?

Basically, I wonder if we can side-step the problem by making netipsec compatible with unmapped mbufs.

sys/netinet/ip_output.c
685 ↗	(On Diff #148591)	Why not do this within the existing `if (IPSEC_ENABLED())` block above?
sys/netipsec/ipsec.c
492 ↗	(On Diff #148591)	`ip1` can be a pointer to const.
sys/netipsec/ipsec_output.c
920 ↗	(On Diff #148591)	This makes me nervous: it's very easy to accidentally invalidate the `ip` pointer at some point, especially since neither `m` nor `ip` is pointing to const here or in callees. Can we avoid fetching the pointer here and defer that to ipsec4_perform_request().

In D48265#1100743, @markj wrote:

I wonder why exactly software ipsec cannot process unmapped mbufs. In an unmapped ktls packet, the protocol headers are still mapped. netipsec needs to access the payload in order to encrypt, but it uses opencrypto's cursor abstraction, and crypto_cursor_* handles unmapped mbufs. What is missing?

Basically, I wonder if we can side-step the problem by making netipsec compatible with unmapped mbufs.

Well, it started with PR 272616. There m_unshare uses memcpy() not being adopted for the unmapped buffers.

I think my patch moves the code in the right direction, and is the useful first step. For now, it avoids the cost of re-mapping incurred for all packets just if ipsec.ko is loaded. Also, it:

handles incompatibility/panic/UB between ipsec and ktls
should improve offloaded ipsec because packets are kept unmapped there.

sys/netipsec/ipsec_output.c
920 ↗	(On Diff #148591)	I do not share this opinion. The code in the function is straightforward from there to the perform_request() point. There only calculations that are done in between, is assigning the mtag. I can add an assert that ip is same if recalculated right before the perform_request() call.

Mark one place *ip as const.
Collapse two if()s.

In D48265#1100756, @kib wrote:

In D48265#1100743, @markj wrote:

I wonder why exactly software ipsec cannot process unmapped mbufs. In an unmapped ktls packet, the protocol headers are still mapped. netipsec needs to access the payload in order to encrypt, but it uses opencrypto's cursor abstraction, and crypto_cursor_* handles unmapped mbufs. What is missing?

Basically, I wonder if we can side-step the problem by making netipsec compatible with unmapped mbufs.

Well, it started with PR 272616. There m_unshare uses memcpy() not being adopted for the unmapped buffers.

I think my patch moves the code in the right direction, and is the useful first step. For now, it avoids the cost of re-mapping incurred for all packets just if ipsec.ko is loaded. Also, it:

handles incompatibility/panic/UB between ipsec and ktls

should improve offloaded ipsec because packets are kept unmapped there.

Sorry, I didn't quite understand the patch the first time. I think the approach is ok. Soon I might need to make sw KTLS and IPSec compatible with each other for an upcoming project, but I'm not sure yet. I do not see how hw KTLS can be made to work with IPsec at all.

sys/kern/kern_mbuf.c
953	The failure semantics are inconsistent, since in this case we do not free `m`. I believe mb_unmapped_to_ext() will leak an mbuf in this case.
1099–1102	Suppose _mb_unmapped_to_ext() fails with ENOBUFS on the first mbuf in the chain. It will free `m`, and `top == m`, so here we'll free the mbuf twice. In practice I suspect this never happens because the first mbuf will be mapped.
sys/netipsec/ipsec_output.c
920 ↗	(On Diff #148591)	The problem is not only in this function. ipsec4_perform_request() passes `&m` to HHOOK_TYPE_IPSEC_OUT hooks, which may replace the mbuf. This could invalidate the IP header pointer, which is dereferenced later in the sctp_delayed_checksum() call.

kib marked 4 inline comments as done.Jan 1 2025, 11:46 AM

kib added inline comments.

sys/kern/kern_mbuf.c
1099–1102	Isn't the m_freem(next) call also problematic? I replaced m_freem(top) with conditional m_free(top), to keep next valid always.
sys/netipsec/ipsec_output.c
920 ↗	(On Diff #148591)	Then this is the place to re-evaluate ip.

Bug fixes:

try to handle mbuf leaks and double-free on map failure
re-calculate *ip after hooks

In D48265#1100880, @markj wrote:

I do not see how hw KTLS can be made to work with IPsec at all.

Forgot to answer this. FWIK, nvidia hw does not support more then one crypto op over the packet. In other words, netcard KTLS + inline IPSEC offload cannot coexist.

This is the place where my patch stops, but still a better diagnostic could be useful.

Fetch the IP header using m_copydata() instead of using mtod() to select policy/SA.

This is just to handle the possibility that the IP header is unmapped, right? If so, I am still somewhat skeptical about it: IPSEC_OUTPUT/IPSEC_FORWARD is called by functions which access the protocol headers directly with mtod(). In particular, m_copydata() is more expensive than an inline memcpy with constant size.

In D48265#1101142, @kib wrote:

In D48265#1100880, @markj wrote:

I do not see how hw KTLS can be made to work with IPsec at all.

Forgot to answer this. FWIK, nvidia hw does not support more then one crypto op over the packet. In other words, netcard KTLS + inline IPSEC offload cannot coexist.

This is the place where my patch stops, but still a better diagnostic could be useful.

If we can support sw KTLS with IPSec enabled, so there is no need to map mbufs, then perhaps we can simply disallow NIC KTLS session creation if the NIC currently has IPSec offload enabled. That is, rather than checking each packet in the data path, we just do not permit enabling both types of offload in a single ifnet.

sys/netinet6/ip6_output.c
1111 ↗	(On Diff #148639)	Certainly we have to at least return an error here, as `m` is now a pointer to freed memory.
sys/netipsec/ipsec_output.c
114 ↗	(On Diff #148639)	`ip` can be a pointer to const.
sys/netipsec/subr_ipsec.c
64 ↗	(On Diff #148639)	`ip1` can be a pointer to const.

More const ip1 *.
More love to ipv6 case.

markj accepted this revision.Jan 3 2025, 2:43 PM

markj added inline comments.

sys/netinet/ip_output.c
737 ↗	(On Diff #148675)	You might need to rate-limit this print. Or perhaps not: the first error will put the socket into an error state and prevent further transmits, so we would get one print per connection.