This is part of a series of patches intended to enable first-touch NUMA policies for UMA by default. In my tests it also reduces the cost of each of uma_zalloc() and uma_zfree() by approximately 30%.
The idea is to embed the entries and cnt fields of each bucket loaded into a per-CPU cache directly in the per-CPU cache structure. This lets us check how much space is available without touching the bucket at all. For large buckets, the bucket header is often in a different cache line from the item being allocated or freed, so this saves a cache miss.
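To make the idea concrete, here is a minimal userspace C sketch under simplified assumptions: a bucket is a header (ub_cnt, ub_entries) followed by a flexible array of item pointers, and each per-CPU slot caches the count and capacity next to the bucket pointer. All names are hypothetical and only loosely modeled on UMA, and the sketch omits the critical-section handling a real per-CPU cache needs. The point it shows is that the empty/full checks read only the slot; bucket memory is touched only to fetch or store the item itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bucket: header followed by a flexible array of cached items. */
struct bucket {
	int16_t ub_cnt;      /* item count while the bucket is not loaded in a cache */
	int16_t ub_entries;  /* capacity of ub_bucket[] */
	void *ub_bucket[];
};

/* Hypothetical per-CPU slot: count and capacity live beside the pointer. */
struct cache_bucket {
	struct bucket *ucb_bucket;
	int16_t ucb_cnt;
	int16_t ucb_entries;
};

/* Allocate an empty bucket with room for 'entries' items. */
static struct bucket *
bucket_alloc(int16_t entries)
{
	struct bucket *b = malloc(sizeof(*b) + (size_t)entries * sizeof(void *));

	if (b != NULL) {
		b->ub_cnt = 0;
		b->ub_entries = entries;
	}
	return (b);
}

/* Allocation fast path: the emptiness check reads only the per-CPU slot. */
static void *
cache_bucket_pop(struct cache_bucket *cb)
{
	if (cb->ucb_cnt == 0)
		return (NULL);	/* empty or no bucket: caller takes the slow path */
	assert(cb->ucb_bucket != NULL);
	cb->ucb_cnt--;
	/* First access to bucket memory, and only to the item's slot. */
	return (cb->ucb_bucket->ub_bucket[cb->ucb_cnt]);
}

/* Free fast path: the fullness check also reads only the per-CPU slot. */
static bool
cache_bucket_push(struct cache_bucket *cb, void *item)
{
	if (cb->ucb_cnt == cb->ucb_entries)
		return (false);	/* full or no bucket: caller takes the slow path */
	cb->ucb_bucket->ub_bucket[cb->ucb_cnt] = item;
	cb->ucb_cnt++;
	return (true);
}
```

In this sketch an empty slot is a NULL bucket with zero count and capacity, so both checks fail naturally and the caller falls back to the slow path without a separate NULL test.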
I like the resulting simplification in branches, loop logic, and summary routines. The overhead of managing the separate bucket type is minimal, and it provides an opportunity for extra asserts.
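The bookkeeping for the separate per-cache bucket type can be sketched as follows, assuming a hypothetical bucket layout with a count, a capacity and a flexible item array: loading a bucket copies its count and capacity into the slot, unloading writes the count back, and both steps are natural places for assertions. The hypothetical summary routine totals cached items across slots without dereferencing any bucket. None of these names are taken from the actual UMA sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bucket and per-CPU slot layouts. */
struct bucket {
	int16_t ub_cnt;
	int16_t ub_entries;
	void *ub_bucket[];
};

struct cache_bucket {
	struct bucket *ucb_bucket;
	int16_t ucb_cnt;
	int16_t ucb_entries;
};

/* Install a bucket in an empty slot, copying its count and capacity. */
static void
cache_bucket_load(struct cache_bucket *cb, struct bucket *b)
{
	assert(cb->ucb_bucket == NULL);	/* slot must be empty */
	assert(b->ub_cnt >= 0 && b->ub_cnt <= b->ub_entries);
	cb->ucb_bucket = b;
	cb->ucb_cnt = b->ub_cnt;
	cb->ucb_entries = b->ub_entries;
}

/* Detach the bucket, writing the cached count back into its header. */
static struct bucket *
cache_bucket_unload(struct cache_bucket *cb)
{
	struct bucket *b = cb->ucb_bucket;

	if (b != NULL) {
		assert(cb->ucb_entries == b->ub_entries);	/* capacity never changes */
		b->ub_cnt = cb->ucb_cnt;
	}
	cb->ucb_bucket = NULL;
	cb->ucb_cnt = 0;
	cb->ucb_entries = 0;
	return (b);
}

/* Summary: total cached items, reading only the per-CPU slots. */
static long
cache_count_items(const struct cache_bucket *slots, size_t nslots)
{
	long total = 0;

	for (size_t i = 0; i < nslots; i++)
		total += slots[i].ucb_cnt;
	return (total);
}
```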