Page MenuHomeFreeBSD

Avoid finding every bit transition in bit_ffs_area_at
ClosedPublic

Authored by dougm on Sat, Nov 23, 8:00 PM.

Details

Summary

The current implementation of bit_ffs_area_at finds the start of every block of 1 bits until it finds a block of sufficient size. If a 32-bit word contains 0x55555555, that means finding the start of each of 16 blocks of 1 bits in looking for a pair of consecutive 1-bits.

The method proposed here preprocesses each 32-bit word to make blocks of 1-bits of insufficient size disappear; the time for that preprocessing is proportional to lg(size). It does not rely on calling other bit_ffs functions to examine data.

The same technique is applied to bit_ffc_area_at.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

dougm created this revision.Sat, Nov 23, 8:00 PM
dougm updated this revision to Diff 64778.Sat, Nov 23, 8:22 PM

Address a problem of finding a match at the start when start!=0.

dougm updated this revision to Diff 64780.Sat, Nov 23, 9:18 PM

Revert half of the previous change. Only _value needs initialization based on _start, and _offset does not.

dougm updated this revision to Diff 64784.Sat, Nov 23, 10:25 PM

Reorder operations to avoid reading the word past the end of the bitstring, and ignoring it.

dougm updated this revision to Diff 64802.Sun, Nov 24, 9:41 AM

Add a missing underscore.

dougm updated this revision to Diff 64877.Tue, Nov 26, 5:06 AM

Just tidying up.

Hmm. I think I understand the change.. but it seems less intuitive.

Do you have any example of how much this improves? I.e. a C program with a large enough test case to show measurable improvement?

erj added a comment.Mon, Dec 2, 9:26 PM

Hmm. I think I understand the change.. but it seems less intuitive.
Do you have any example of how much this improves? I.e. a C program with a large enough test case to show measurable improvement?

If it helps @dougm, the maximum size we'd typically use for the bitstrings would be 2048 bits, with us maybe using 16384-bit ones in the future.

In D22523#495149, @erj wrote:

If it helps @dougm, the maximum size we'd typically use for the bitstrings would be 2048 bits, with us maybe using 16384-bit ones in the future.

Even if we use a much larger example that is fine to me, as long as we can show a measurable improvement.

We're likely going to switch to using vmem in the future anyways, so for our use case this isn't a concern to me.

dougm added a comment.Tue, Dec 3, 3:26 AM

With the attached simple test program, compiled with -O3, the results without and with DM defined are:

$ time ./test
location = 71998

real 0m4.278s
user 0m4.276s
sys 0m0.000s

$ time ./test
location = 71998

real 0m0.091s
user 0m0.093s
sys 0m0.000s

Of course, I've picked the test case to make my version look best.

With the attached simple test program, compiled with -O3, the results without and with DM defined are:

Ok

$ time ./test
location = 71998
real 0m4.278s
user 0m4.276s
sys 0m0.000s
$ time ./test
location = 71998
real 0m0.091s
user 0m0.093s
sys 0m0.000s
Of course, I've picked the test case to make my version look best.

Nice!

That's a pretty significant improvement for what is essentially the worst case of the current algorithm. I just wanted to be able to confirm how much this can improve things.

This revision is now accepted and ready to land.Tue, Dec 3, 6:20 PM
erj accepted this revision.Tue, Dec 3, 8:41 PM
This revision was automatically updated to reflect the committed changes.