Page MenuHomeFreeBSD

Optimize hash6_insert() in sys/netgraph/netflow/netflow.c on 64-bit platforms
ClosedPublic

Authored by jkim on May 24 2016, 8:21 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Dec 6, 8:06 PM
Unknown Object (File)
Mon, Dec 2, 2:58 AM
Unknown Object (File)
Mon, Dec 2, 2:58 AM
Unknown Object (File)
Mon, Dec 2, 2:58 AM
Unknown Object (File)
Mon, Dec 2, 2:58 AM
Unknown Object (File)
Mon, Dec 2, 2:58 AM
Unknown Object (File)
Mon, Dec 2, 2:38 AM
Unknown Object (File)
Sat, Nov 30, 12:37 PM
Subscribers

Details

Summary

A macro ipv6_masklen() in sys/netgraph/netflow/netflow.c is using four bitcount32() calls. It is inefficient on 64-bit platforms. This patch re-implements it with bit_count(). On amd64, for example, both Clang and GCC generate two popcntq instructions instead of four popcntl instructions when POPCNT is available.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jkim retitled this revision from to Optimize hash6_insert() in sys/netgraph/netflow/netflow.c on 64-bit platforms.
jkim updated this object.
jkim edited the test plan for this revision. (Show Details)
jkim added reviewers: asomers, gibbs, ngie.
jkim set the repository for this revision to rS FreeBSD src repository - subversion.
sys/netgraph/netflow/netflow.c
153 ↗(On Diff #16817)

mask will always be of the form /^1*0*$/, right? That is, a bunch of ones followed by a bunch of zeros? If so, you don't need bit_count. You could use bit_ffc. bit_ffc is probably faster, especially on pre-Nehalem platforms that don't have POPCNT. However, you'd have to be careful to make sure that it works on all endiannesses and word sizes.

155 ↗(On Diff #16817)

I really don't like overloading len here. It has two very different meanings. Could you use separate variables instead?

Calculate the number of bits and do not reuse a variable.

sys/netgraph/netflow/netflow.c
153 ↗(On Diff #16817)

I just wanted to convert bitcount32() to bit_count() in this patch. However, I see your point. Alexander, what do you think?

155 ↗(On Diff #16817)

Fixed.

sys/netgraph/netflow/netflow.c
157 ↗(On Diff #16829)

Can this be a constant instead of a hardcoded value?

sys/netgraph/netflow/netflow.c
17 ↗(On Diff #16829)

mlen should be initialized to the computed value too, not to a hard-coded 128.

sys/netgraph/netflow/netflow.c
157 ↗(On Diff #16829)

Sure, I can go back to the original code, i.e., reusing mlen, if that's what you mean. :-)

mlen = sizeof(mask->sin6_addr) * NBBY;
if (info->rti_addrs & RTA_NETMASK)
        bit_count((bitstr_t *)&mask->sin6_addr, 0, mlen, &mlen);
sys/netgraph/netflow/netflow.c
157 ↗(On Diff #16829)

No, please don't assign two different meanings to mlen. I think that the best solution is something like the following, because it avoids magic numbers and also assigns only a single meaning for each variable:

if (info->rti_addrs & RTA_NETMASK)
        bit_count((bitstr_t *)&mask->sin6_addr, 0, sizeof(mask->sin6_addr) * NBBY, &mlen);
else
        mlen = sizeof(mask->sin6_addr) * NBBY;
sys/netgraph/netflow/netflow.c
157 ↗(On Diff #16829)

In fact, these two numbers are essentially the same thing, i.e., when netmask is not set use full 128-bit. I really don't want to duplicate sizeof(mask->sin6_addr) * NBBY. :-(

sys/netgraph/netflow/netflow.c
157 ↗(On Diff #16829)

It's fine if you want to assign sizeof(mask->sin6_addr) * NBBY to some other const variable. I just don't like assigning two separate meanings to mlen (the size of the bitstr, and also the size of the network). I find it hard to reason about code when variables' meanings change. It's also not how most compilers work. I try to think in the single static assignment paradigm, because it's what my compiler will do. How about this?

const int sin6_addr_size = sizeof(mask->sin6_addr) * NBBY;
if (info->rti_addrs & RTA_NETMASK)
        bit_count((bitstr_t *)&mask->sin6_addr, 0, sin6_addr_size, &mlen);
else
        mlen = sin6_addr_size;
153 ↗(On Diff #16817)

Looking at the problem in more detail, I see that using bit_ffc as I suggested would require some ntohl operations on LE architectures. I don't know whether that's faster than using bit_count, but I do know that it's uglier. I would approve of using bit_count.

sys/netgraph/netflow/netflow.c
157 ↗(On Diff #16829)

I think making mask->sin6_addr into a #define would result in the biggest net win, since it's used three times.

Also, sin6_addr_size should be num_bits_in_sin6_addr.

jkim marked 2 inline comments as done.

I hope this patch satisfy both asomers and ngie now. This is my final version. ;-)

ngie edited edge metadata.
This revision is now accepted and ready to land.May 26 2016, 5:55 PM
asomers edited edge metadata.
This revision was automatically updated to reflect the committed changes.