In two places in ip6_output we are doing (delayed) checksum calculations. The initial logic came from SCTP in r205075,205104 and was later copied and adjusted for the TCP|UDP case in r235958. The problem was that the original SCTP offsets were already wrong for any case with extension headers present given IPv6 extension headers are not part of the pseudo checksum calculations. The later changes do not help in case there is checksum offloading as for extension headers (incl. fragments) we do currrently never offload as we have no infrastructure to know whether the NIC can handle these cases. Correct the offsets for delayed checksum calculations and properly handle mbuf flags. In addition harmonize the almost identical duplicate code. Submitted by: Francis Dupont (fdupont isc.org) Reported by: Francis Dupont (fdupont isc.org) PR: 243675
Details
TODO, write tests cases; we cannot easily use vnet/epair or other local
network tests as these "circumvent" the checksum calculations and set
correct checksums on the mbufs; we'd need at least a bhyve or multiple
real host scenario.
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
sys/netinet6/ip6_output.c | ||
---|---|---|
221 ↗ | (On Diff #68552) | This check is weird and I do not fully understand why the original changes from I dropped @jhb an email a few minutes ago asking. I hope we can remove the "!frag &&" here. |
I'm 1/2 joking, but what would you think about not supporting extension headers at all? They are the worst part of IPv6 and make everything complicated and add lots of hairy cases. What benefit are they?
(I'm legitimately curious)
sys/netinet6/ip6_output.c | ||
---|---|---|
221 ↗ | (On Diff #68552) | I believe it is because any packtets that were originally unmapped would have already been previously expanded by mb_unmapped_to_ext(), when fragmented so there is no reason to check |
I'd love to. Sadly that train left more than 20 years ago and we are seeing more and more protocol(s | extensions) making use of them. I think as a general purpose OS we cannot do much about it.
I do have a big hammer prototype check in a branch at the beginning of ip6_output() to do all essentially mandatory checks for the default case of just sending out a normal packet (ignoring firewalls, etc).
Let's keep cleaning bits by bits up there making the code simpler and factoring parts (such as ext hdrs) out more and more and we'll see if we can have a "fast path"; but also see if that "fast path" is actually any faster.
https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#extension-header
Active usage here are IPSec (50/51) (used by various VPNs), Fragment (44) (used in general, i.e. DNS(sec)) and Mobility (135) (used by Microsoft for DirectAccess) .
Leave comments and questions (and another possible bugfix comment for later).
sys/netinet6/ip6_output.c | ||
---|---|---|
221 ↗ | (On Diff #68552) | I am not sure I can entirely follow. Fragmentation happens afterwards. |
1097 ↗ | (On Diff #68552) | Sounds right. I'll leave that for a follow-up commit if you don't mind. |
1132 ↗ | (On Diff #68552) | There is and was for a long time a problem here in that if we called mb_unmapped_to_ext(m); and it was successful we'd get a new chain but never updated the ip6 pointer here doing ip6 = mtod(m, struct ip6_hdr *); if I am not wrong? |
Add KASSERT as suggested by @markj .
Remove hlen variable which always matches unfragpartlen as suggested by @markj.
Fix a case where we possibly do not update the ip6 pointer.
Leave a comment about an unclear header clearing line for future contemplation (which was removed in the previous revision).
sys/netinet6/ip6_output.c | ||
---|---|---|
1026 ↗ | (On Diff #68552) | I am still contemplating this one. I keep wondering if the original I added years ago should have been: m->m_pkthdr.csum_flags &= ~ifp->if_hwassist; clearing all further hwassist flags given we've done the delayed checksum calculations? |
1097 ↗ | (On Diff #68552) | Actually done it now given I updated the revision anyway. |
sys/netinet6/ip6_output.c | ||
---|---|---|
221 ↗ | (On Diff #68552) | Yeah, i see what you mean. Its just a bug. This code came from me originally (netflx tree ~2017) and the bug has been there since the initial commit. We just never noticed because we don't use SCTP. Adding the check here is correct. |
sys/netinet6/ip6_output.c | ||
---|---|---|
221 ↗ | (On Diff #68552) | @gallatin: Now you confuse me even more given this is unrelated to SCTP. It's the (ifp->if_capenable & IFCAP_NOMAP) == 0 check which is the one in question if it is needed for the second call (now around line 1129 in this review). The cases "/* Try to fragment the packet. Cases 1-b and 3. */". We'd calculate the checksum if not already done before doing fragmentation. If we do not do a delayed checksum for TCP|UDP|SCTP (for whatever reason) and did not run into the "ifp->if_capenable & IFCAP_NOMAP == 0" case the first time (for whatever reason), then the mbuf was never run through mb_unmapped_to_ext(). I assume there are two questions: (1) can this happen at all or would we always run through the case the first time anyway and thus it is not needed at all the 2nd time? (1b) in that case would we actually need the 2nd case? (2) if it can happen, then do we need to add it to the 2nd time? In other words: do you suggest I do remove the "!frag &&" from the change? |
sys/netinet6/ip6_output.c | ||
---|---|---|
221 ↗ | (On Diff #68552) | I'm utterly confused at what you're asking. Can you point me to the line in the original code where you think there should be a check? |
sys/netinet6/ip6_output.c | ||
---|---|---|
221 ↗ | (On Diff #68552) | Should there be a "} else if ((ifp->if_capenable & IFCAP_NOMAP) == 0) {" check here https://svnweb.freebsd.org/base/head/sys/netinet6/ip6_output.c?annotate=358167#l1127 as well as it is further up here: https://svnweb.freebsd.org/base/head/sys/netinet6/ip6_output.c?annotate=358167#l1019 ? |
sys/netinet6/ip6_output.c | ||
---|---|---|
221 ↗ | (On Diff #68552) | By the time we've gotten past the code around line 1029, we have examined the ifp, and converted from unmapped to mapped if the ifp is not capable of handling unmapped. How can we get to line 1127 without having done the check at 1029? Or is there something that I'm missing, like we are using a different ifp? The other checks there are needed because we're doing software checksums because of fragmentation, so we must convert unmapped to mapped in that case. But I do not see a need to re-check the interface caps. |
sys/netinet6/ip6_output.c | ||
---|---|---|
235 ↗ | (On Diff #68607) | I'm sure I"m asking dumb questions again, but why is performing the checksum tied to an interface not having IFCAP_NOMAP? |
@gallatin I hope this makes sense ..
sys/netinet6/ip6_output.c | ||
---|---|---|
235 ↗ | (On Diff #68607) | I think this time you got the logic wrong. We could always do the double work of calling mb_unmapped_to_ext() twice (and in fact for SCTP we might have done this in the old code because there was a missing else inside the #ifdef SCTP block). Now that is at least folded into one call only. There is still the case which was there before in that for the (2nd case, the before fragmentation) we'd call mb_unmapped_to_ext() again if in the first case we did not compute the checksum but only did the mb_unmapped_to_ext() because IFCAP_NOMAP was not set. So at least we are not off worse. We could optimise that out as well if we want. The main purpose of this change was to fix IPv6 delayed checksum computations though and I only stumbled over the IFCAP_NOMAP thing not fully realising why the code always was as-is. If we'd want to further optimise the code we'd put the entire: m = mb_unmapped_to_ext(m); if (m == NULL) { ... } block up there under an extra if (frag && (ifp->if_capenable & IFCAP_NOMAP) == 0) { ... } block as in that case we'd done the mb_unmapped_to_ext() already in the !frag case. That wasn't my logic before but was exactly the problem of my confusion initially. |
sys/netinet6/ip6_output.c | ||
---|---|---|
235 ↗ | (On Diff #68607) | Ah, sorry, yes. I confess it is because there are so many comments on the review that the first hunk scrolled off my screen.. |