This adds a missing swap to uma_zfree to preserve LIFO behavior: the alloc bucket now always holds the most recently freed data. This was worth +8% in my test.
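A simplified model of why the swap matters follows. It is a hypothetical sketch, not the UMA code. It assumes a per-CPU cache with one alloc bucket and one free bucket. Frees go into the alloc bucket first, and allocations pop from it. When the alloc bucket is full, the buckets are swapped before the push. That way the item just freed is the next one handed out. Without the swap it would land in the free bucket, and the next allocation would return an older, likely colder item. `struct bucket`, `struct cache`, `cache_free`, `cache_alloc` and `BUCKET_ENTRIES` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed bucket size; real UMA buckets are sized per zone. */
#define BUCKET_ENTRIES 4

struct bucket {
	void	*items[BUCKET_ENTRIES];
	int	 cnt;			/* number of cached items */
};

struct cache {
	struct bucket	*allocbucket;	/* allocations pop from here */
	struct bucket	*freebucket;	/* holds older items after a swap */
};

static void
bucket_push(struct bucket *b, void *item)
{
	assert(b->cnt < BUCKET_ENTRIES);
	b->items[b->cnt++] = item;
}

static void *
bucket_pop(struct bucket *b)
{
	assert(b->cnt > 0);
	return (b->items[--b->cnt]);
}

static void
cache_swap(struct cache *c)
{
	struct bucket *tmp = c->allocbucket;

	c->allocbucket = c->freebucket;
	c->freebucket = tmp;
}

/* Returns false when both buckets are full and the caller must take a slow path. */
static bool
cache_free(struct cache *c, void *item)
{
	if (c->allocbucket->cnt == BUCKET_ENTRIES) {
		if (c->freebucket->cnt == BUCKET_ENTRIES)
			return (false);
		/*
		 * The swap: make the non-full bucket the alloc bucket so the
		 * item pushed below is the next one returned by cache_alloc().
		 */
		cache_swap(c);
	}
	bucket_push(c->allocbucket, item);
	return (true);
}

/* Returns NULL when both buckets are empty. */
static void *
cache_alloc(struct cache *c)
{
	if (c->allocbucket->cnt == 0) {
		if (c->freebucket->cnt == 0)
			return (NULL);
		cache_swap(c);
	}
	return (bucket_pop(c->allocbucket));
}
```

In this model, freeing five items into an empty cache with 4-entry buckets swaps on the fifth free. The next allocation returns the fifth item, not the fourth.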
It also changes the load-balancing algorithm for ROUNDROBIN to more strongly prefer the current domain unless there is a severe imbalance. I don't recall the exact gain, but it was not insignificant.
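A sketch of the policy described above follows. It is hypothetical and is not the actual ROUNDROBIN code. The function stays on the current domain unless that domain's free page count falls below a fixed fraction of the best domain's. Only then does it fall back to plain round-robin over all domains. The threshold `IMBALANCE_DIVISOR`, the function `rr_select_domain`, and its per-domain free-page input are all assumptions made for illustration.

```c
#include <assert.h>

/*
 * Hypothetical threshold: the current domain counts as severely imbalanced
 * when it has less than 1/IMBALANCE_DIVISOR of the best domain's free pages.
 */
#define IMBALANCE_DIVISOR 4

/*
 * Pick a domain. free_pages[i] is domain i's free page count, curdom is the
 * caller's domain and *cursor is the round-robin iterator state.
 */
static int
rr_select_domain(const unsigned long *free_pages, int ndomains, int curdom,
    int *cursor)
{
	unsigned long max = 0;
	int i, d;

	assert(ndomains > 0 && curdom >= 0 && curdom < ndomains);
	assert(*cursor >= 0);
	for (i = 0; i < ndomains; i++)
		if (free_pages[i] > max)
			max = free_pages[i];

	/* Prefer the local domain unless it is far behind the best one. */
	if (free_pages[curdom] * IMBALANCE_DIVISOR >= max)
		return (curdom);

	/* Severe imbalance: fall back to plain round-robin. */
	d = *cursor % ndomains;
	*cursor = (d + 1) % ndomains;
	return (d);
}
```

With two domains holding 100 and 300 free pages, a caller on domain 0 stays local. If domain 0 drops to 10 free pages, successive calls alternate between domains 0 and 1.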