rb_tree: fine-tune rebalancing code
Change parts of RB_INSERT_COLOR and RB_REMOVE_COLOR to reduce the
number of operations, by, in some cases, flipping two color bits at a
time instead of flipping each individually, in separate operations,
and by using a switch statement to replace a sequence of if-elses.
Rewrite RB_SET_PARENT to generate fewer instructions. These changes
reduce the code size by over 100 bytes on some architectures.
Also, allow RB users to define a preprocessor symbol to generate
RB_REMOVE_COLOR code that matches the implementation described in the
original weak-AVL paper in one particular case, instead of the still
correct, but slightly more efficient implementation of that case
currently implemented.
Reviewed by: alc
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D35524
(cherry picked from commit 2120d7f57aa0ee48d0be7a4309072bb332d568dd)