Page MenuHomeFreeBSD

Avoid finding every bit transition in bit_ffs_area_at
ClosedPublic

Authored by dougm on Nov 23 2019, 8:00 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Mar 4, 2:05 PM
Unknown Object (File)
Mon, Mar 4, 2:05 PM
Unknown Object (File)
Mon, Mar 4, 2:05 PM
Unknown Object (File)
Mon, Mar 4, 2:05 PM
Unknown Object (File)
Mon, Mar 4, 2:05 PM
Unknown Object (File)
Feb 20 2024, 2:53 AM
Unknown Object (File)
Feb 9 2024, 10:15 AM
Unknown Object (File)
Jan 19 2024, 12:03 AM

Details

Summary

The current implementation of bit_ffs_area_at finds the start of every block of 1 bits until it finds a block of sufficient size. If a 32-bit word contains 0x55555555, that means finding the start of each of 16 blocks of 1 bits in looking for a pair of consecutive 1-bits.

The method proposed here preprocesses each 32-bit word to make blocks of 1-bits of insufficient size disappear; the time for that preprocessing is proportional to lg(size). It does not rely on calling other bit_ffs functions to examine data.

The same technique is applied to bit_ffc_area_at.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Address a problem of finding a match at the start when start!=0.

Revert half of the previous change. Only _value needs initialization based on _start, and _offset does not.

Reorder operations to avoid reading the word past the end of the bitstring, and ignoring it.

Add a missing underscore.

Hmm. I think I understand the change.. but it seems less intuitive.

Do you have any example of how much this improves? I.e. a C program with a large enough test case to show measurable improvement?

Hmm. I think I understand the change.. but it seems less intuitive.

Do you have any example of how much this improves? I.e. a C program with a large enough test case to show measurable improvement?

If it helps @dougm, the maximum size we'd typically use for the bitstrings would be 2048 bits, with us maybe using 16384-bit ones in the future.

In D22523#495149, @erj wrote:

If it helps @dougm, the maximum size we'd typically use for the bitstrings would be 2048 bits, with us maybe using 16384-bit ones in the future.

Even if we use a much larger example that is fine to me, as long as we can show a measurable improvement.

We're likely going to switch to using vmem in the future anyways, so for our use case this isn't a concern to me.

With the attached simple test program, compiled with -O3, the results without and with DM defined are:

$ time ./test
location = 71998

real 0m4.278s
user 0m4.276s
sys 0m0.000s

$ time ./test
location = 71998

real 0m0.091s
user 0m0.093s
sys 0m0.000s

Of course, I've picked the test case to make my version look best.

With the attached simple test program, compiled with -O3, the results without and with DM defined are:

Ok

$ time ./test
location = 71998

real 0m4.278s
user 0m4.276s
sys 0m0.000s

$ time ./test
location = 71998

real 0m0.091s
user 0m0.093s
sys 0m0.000s

Of course, I've picked the test case to make my version look best.

Nice!

That's a pretty significant improvement for what is essentially the worst case of the current algorithm. I just wanted to be able to confirm how much this can improve things.

This revision is now accepted and ready to land.Dec 3 2019, 6:20 PM