Page MenuHomeFreeBSD

ip6_output: Reduce cache misses on pktopts
ClosedPublic

Authored by gallatin on Mar 4 2024, 10:19 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Nov 20, 10:09 AM
Unknown Object (File)
Wed, Nov 6, 5:21 PM
Unknown Object (File)
Wed, Nov 6, 5:21 PM
Unknown Object (File)
Oct 16 2024, 8:26 PM
Unknown Object (File)
Oct 16 2024, 8:26 PM
Unknown Object (File)
Oct 16 2024, 8:26 PM
Unknown Object (File)
Oct 16 2024, 8:26 PM
Unknown Object (File)
Oct 16 2024, 8:26 PM

Details

Summary

When profiling an IP6 heavy workload, I noticed that we were getting a lot of cache misses in ip6_output() around ip6_pktopts. This was happening because the TCP stack passes inp->in6p_outputopts even if all options are unused. So in the common case of no options present, pkt_opts is not null, and is checked repeatedly for different options. Since ip6_pktopts is large (4 cachelines), and every field is checked, we take 4 cache misses (2 of which tend to be hidden by the adjacent line prefetcher).

To fix this common case, I introduced a new flag in ip6_pktopts ( ip6po_valid) which tracks which options have been set. In the common case where nothing is set, this causes just a single cache miss to load. It also eliminates a test for some options (if (opt != NULL && opt->val >= const) vs if ((optvalid & flag) !=0 )

To keep the struct the same size in 64-bit kernels, and to keep the integer values (like ip6po_hlim, ip6po_tclass, etc) on the same cacheline, I moved them to the top.

For our web server workload (with the ip6po_tclass option set), this drops the CPI from 2.9 to 2.4 for ip6_output

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Initially I thought we should name some better but the original structs have the same names so all good.
I have not checked if you got all the places but it looks good.

FreeBSD/sys/netinet6/ip6_var.h
165 ↗(On Diff #135368)

I think a blank line in front of this would be ok given this is all vertical spaced anyway.

This revision is now accepted and ready to land.Mar 4 2024, 10:49 PM
  • added a blank like before ip6po_m, as requested by @bz
This revision now requires review to proceed.Mar 5 2024, 12:58 AM
In D44204#1008527, @bz wrote:

Initially I thought we should name some better but the original structs have the same names so all good.
I have not checked if you got all the places but it looks good.

If you could take a pass over it and make sure I got all the places with no typos, I'd appreciate it. I'm afraid I'll screw something up that may not be noticed for a while..

Probably you can simplify some similar checks in in6_src.c too, e.g. IP6PO_VALID_PKTINFO and IP6PO_VALID_NHINFO. Not sure how it impacts your cache misses measurements.

LGTM, q - would it be possible to introduce ‘ip6po_<set|clear>_<field>’ inline functions and use them so we don’t accidentally miss setting/clearing up the relevant bit?

FreeBSD/sys/netinet6/ip6_var.h
133 ↗(On Diff #135380)

Can you please place a comment above the struct that basically is a reworded commit message of this review. E.g.:

/*
  * We store validity flags in the first word of this large struct because...

Added a comment explaining why the flags exist, as suggested by @glebius

In D44204#1008766, @ae wrote:

Probably you can simplify some similar checks in in6_src.c too, e.g. IP6PO_VALID_PKTINFO and IP6PO_VALID_NHINFO. Not sure how it impacts your cache misses measurements.

None of these paths showed up as hot in my profiling, and I'm currently a bit terrified that I'll make a typo and screw up some obscure feature, so I'd rather not touch them if at all possible.

Do you have a workload where they are in the critical path? I'm afraid I don't know ip6 as well as I should.

LGTM, q - would it be possible to introduce ‘ip6po_<set|clear>_<field>’ inline functions and use them so we don’t accidentally miss setting/clearing up the relevant bit?

I tend to dislike functions like that which obfuscate setting & clearing bits, so I would prefer not to. Unless you insist..

LGTM, q - would it be possible to introduce ‘ip6po_<set|clear>_<field>’ inline functions and use them so we don’t accidentally miss setting/clearing up the relevant bit?

I tend to dislike functions like that which obfuscate setting & clearing bits, so I would prefer not to. Unless you insist..

I'm with Drew on this.

Got it. From my pov we're now requiring explicit mental effort (and knowledge) that each field now comes with a bit, and field mutations may need flag updates. Often after some time it ends up with flags not being consistent, leading to hard-to-debug errors..
I don't have a _strong_ opinion on this, so won't object to the current version - and hope there won't be 'I told you so' situation in a year :-)

Generally looks good to me.

FreeBSD/sys/netinet6/ip6_output.c
502 ↗(On Diff #135437)

Then we can remove redundant NULL check in the macro MAKE_EXTHDR

@@ -159,14 +159,12 @@ static int copypktopts(struct ip6_pktopts *, struct ip6_pktopts *, int);
  */
 #define        MAKE_EXTHDR(hp, mp, _ol)                                        \
     do {                                                               \
-       if (hp) {                                                       \
                struct ip6_ext *eh = (struct ip6_ext *)(hp);            \
                error = ip6_copyexthdr((mp), (caddr_t)(hp),             \
                    ((eh)->ip6e_len + 1) << 3);                         \
                if (error)                                              \
                        goto freehdrs;                                  \
                (_ol) += (*(mp))->m_len;                                \
-       }                                                               \
     } while (/*CONSTCOND*/ 0)
 
 /*

I'm wondering if the option checking optvalid & IP6PO_VALID_XXX can be embedded into the macro MAKE_EXTHDR so that the code is easier to read.

Generally looks good to me.

Good point about removing the redundant NULL checks. I am about to push this, and I added that to the code I'll push.

I like the idea of making it easier to read, but the macro would get a lot more complex, so I don't plan to do that.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 20 2024, 7:51 PM
This revision was automatically updated to reflect the committed changes.