Change parts of RB_INSERT_COLOR and RB_REMOVE_COLOR to reduce the number of operations, by, in some cases, flipping two color bits at a time instead of flipping each individually, in separate operations, and by using a switch statement to replace a sequence of if-elses. With llvm-14.0.3, this reduces the code size by 96 bytes on amd64.
Also, allow RB users to define a preprocessor symbol to generate RB_REMOVE_COLOR code that matches the implementation described in the original weak-AVL paper in one particular case, instead of the still correct, but slightly more efficient implementation of that case currently implemented.