Page MenuHomeFreeBSD

rb_tree: optimize bit twiddle
ClosedPublic

Authored by dougm on Jun 19 2022, 8:05 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 11, 11:02 PM
Unknown Object (File)
Sat, Jan 11, 10:58 PM
Unknown Object (File)
Tue, Dec 31, 7:00 PM
Unknown Object (File)
Mon, Dec 30, 10:48 PM
Unknown Object (File)
Fri, Dec 27, 2:15 AM
Unknown Object (File)
Dec 26 2024, 9:00 PM
Unknown Object (File)
Dec 26 2024, 1:57 PM
Unknown Object (File)
Dec 14 2024, 4:51 AM
Subscribers

Details

Summary

Change parts of RB_INSERT_COLOR and RB_REMOVE_COLOR to reduce the number of operations, by, in some cases, flipping two color bits at a time instead of flipping each individually, in separate operations, and by using a switch statement to replace a sequence of if-elses. With llvm-14.0.3, this reduces the code size by 96 bytes on amd64.

Also, allow RB users to define a preprocessor symbol to generate RB_REMOVE_COLOR code that matches the implementation described in the original weak-AVL paper in one particular case, instead of the still correct, but slightly more efficient implementation of that case currently implemented.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

dougm requested review of this revision.Jun 19 2022, 8:05 PM
dougm created this revision.

Shrink the size a little more with REMOVE_COLOR adjustments.

Undo last revision. Can't be assigning to RB_BITS. Still, a 16 byte shrinkage.

Apply the same trick to INSERT_COLOR. This assumes that the two children of elm cannot both be red. I assert that than cannot be.

Reduces the lip3 size to 8798 from 8830.

Use switch in RB_REMOVE_COLOR instead of several if-elses. Only modify each color-bit-pair once instead of modifying left and then right. Reduces size to 8766 from 8830 on lip3.

Two test binaries have been compiled on lip3 and executed on lip2. The test binaries come from -O2 compilation of this source:

. The results of a little testing are here:
dougm@108-254-203-202:dougm $ time megatree.orig; time megatree; time megatree.orig; time megatree;

real 0m42.878s
user 0m42.991s
sys 0m0.000s

real 0m42.563s
user 0m42.524s
sys 0m0.000s

real 0m43.198s
user 0m43.121s
sys 0m0.000s

real 0m41.990s
user 0m42.160s
sys 0m0.000s
dougm@108-254-203-202:dougm $ time megatree.orig; time megatree; time megatree.orig; time megatree;

real 0m42.614s
user 0m42.810s
sys 0m0.000s

real 0m41.890s
user 0m42.079s
sys 0m0.000s

real 0m42.813s
user 0m42.773s
sys 0m0.000s

real 0m41.881s
user 0m42.036s
sys 0m0.000s

The modified code seems a little faster. Running the same binaries on lip5 produces less clear results:
dougm@108-254-203-203:dougm $ time megatree.orig;time megatree;time megatree.orig;time megatree;

real 0m26.041s
user 0m26.017s
sys 0m0.023s

real 0m25.965s
user 0m25.948s
sys 0m0.016s

real 0m26.061s
user 0m26.021s
sys 0m0.039s

real 0m26.105s
user 0m26.097s
sys 0m0.008s
dougm@108-254-203-203:dougm $ time megatree.orig;time megatree;time megatree.orig;time megatree;

real 0m25.965s
user 0m25.941s
sys 0m0.024s

real 0m26.406s
user 0m26.382s
sys 0m0.024s

real 0m26.229s
user 0m26.197s
sys 0m0.032s

real 0m25.962s
user 0m25.930s
sys 0m0.032s

Allow an RB tree user to "#define STRICT_HST 1" to get the behavior described by the authors for the deletion, single rotation case. Leave it turned off by default, since it grows the code and hasn't so far been shown to improve performance.

"What if you combine this patch with the one at reviews.freebsd.org/D35520, I am asked.

In terms of size:
original: 8846 bytes
with this patch: 8782 bytes
with both patches: 8654 bytes

In terms of time:
on lip2
dougm@108-254-203-202:dougm $ time megatree.orig; time megatree; time megatree.orig; time megatree;

real 0m42.777s
user 0m42.853s
sys 0m0.000s

real 0m42.260s
user 0m42.303s
sys 0m0.000s

real 0m42.649s
user 0m42.730s
sys 0m0.000s

real 0m42.232s
user 0m42.287s
sys 0m0.000s

improvment! But on lip3:

dougm@108-254-203-203:dougm $ time megatree.orig;time megatree;time megatree.orig;time megatree

real 0m26.126s
user 0m26.095s
sys 0m0.030s

real 0m26.929s
user 0m26.920s
sys 0m0.007s

real 0m26.028s
user 0m25.997s
sys 0m0.031s

real 0m27.184s
user 0m27.160s
sys 0m0.023s

not so much.

Update for overlapping patch just checked in.

dougm added a reviewer: markj.

"What if you combine this patch with the one at reviews.freebsd.org/D35520, I am asked.

In terms of size:
original: 8846 bytes
with this patch: 8782 bytes
with both patches: 8654 bytes

In terms of time:

It seems a bit strange that one test run has 0 system time, while the other does not.

sys/sys/tree.h
525

This could be prefixed with RB_ to reduce the likelihood of a collision.

Rename STRICT_HST to RB_STRICT_HST.

sys/sys/tree.h
485–486

Why is this flip no longer a part of the prior "else" clause?

dougm added inline comments.
sys/sys/tree.h
485–486

The only possibilities for the state of 'child' are red-left, red-right, and red-neither (when child is a leaf). So the 'else' doesn't matter. Removing it, though, might prevent a branch instruction.

sys/sys/tree.h
485–486

This change is more likely to be a pessimization. When RB_RED_LEFT() is true, the code now has to perform another conditional branch on RB_RED_RIGHT().

dougm added inline comments.
sys/sys/tree.h
485–486

Making the RED_LEFT test conditional on the RED_RIGHT test failing leads to an object code of the same size, but with two more 'jmp' statements in the compiled and disassembled code. Let me know if you want to see the diff.

Swap the order of child color tests in the color insert double rotate code, following a verbal reviewer suggestion. It reduces the object size a bit more. Now, from 8686 to 8574, for iommu_gas.o.

Update after placating coverity.

Give me a patch that only updates "insert color". I fully understand that part, and I'm ready to approve it after a little testing.

sys/sys/tree.h
352–353

This being done as two separate assignment statements is pessimizing the generated code.

In D35524#809486, @alc wrote:

Give me a patch that only updates "insert color". I fully understand that part, and I'm ready to approve it after a little testing.

https://reviews.freebsd.org/D35696

Rewrite RB_SET_PARENT. Upcase the /* fallthrough */ comments.

sys/sys/tree.h
530

It seems odd not to have a blank line separating this comment from insert.

Normalize with a blank line.

The effects of these changes extend to arm64 as well. vm_phys.o's vm_phys_fictitious_reg_range() and vm_phys_fictitious_unreg_range() are together 140 bytes smaller.

This revision is now accepted and ready to land.Jul 4 2022, 5:21 PM
This revision was automatically updated to reflect the committed changes.