The existing implementations of POSIX tsearch() and tdelete() don't
attempt to perform any balancing at all. Testing reveals that inserting
100k nodes into a tree sequentially takes approximately one minute on my
system.
Though most other BSDs also don't use any balanced tree internally, C
libraries like glibc and musl do provide better implementations. glibc
uses a red-black tree and musl uses an AVL tree.
Red-black trees have the advantage over AVL trees that they only require
O(1) rotations after insertion and deletion, but have the disadvantage
that the tree has a maximum depth of 2*log2(n) instead of 1.44*log2(n).
Still, my take is that it's better to focus on having a lower maximum
depth, for the reason that in the case of tsearch(), the invocation of
the comparator likely dominates the running time.
This change replaces the tsearch() and tdelete() functions by versions
that create an AVL tree. Compared to musl's implementation, this
version is optimized in two different ways:
- We don't keep track of heights; just balances. This is sufficient. This has the advantage that it reduces the number of nodes that are being accessed.
- Don't use any recursion at all. We know that the tree cannot 2^64 elements in size, so the height of the tree can never be larger than 96. Use a 128-bit bitmask to keep track of the path that is computed. This allows us to iterate over the same path twice, meaning we can apply rotations from top to bottom.
Inserting 100k nodes into a tree now only takes 0.015 seconds. This implementation is about 1/3 faster than glibc, and unlike glibc, it uses a fixed amount of memory.