This is part of a series of patches intended to enable first-touch NUMA policies for UMA by default. It also reduces the cost of uma_zalloc and uma_zfree by approximately 30% each in my tests.
This patch caches the zone size and flags in the per-CPU caches and adds a few flags so that features can be tested without touching uma_zone. As a result, a normal fast-path allocation now touches only the cache lines for the per-CPU area and the bucket we're popping from.
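To illustrate the idea, here is a minimal userspace sketch of a fast path that consults only a per-CPU cache and its bucket. Every name in it (`sk_cache`, `sk_bucket`, `sk_cache_alloc`, the `SK_CACHE_FLAG_*` bits) is hypothetical and not taken from the patch; the real UMA structures, flag names, and locking are different and are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; the real UMA flag names and values may differ. */
#define SK_CACHE_FLAG_CTORDTOR	0x0001u	/* zone has a ctor/dtor to run */
#define SK_CACHE_FLAG_LIMIT	0x0002u	/* zone has an item limit to check */

/* Hypothetical bucket: a small stack of free items. */
struct sk_bucket {
	uint16_t cnt;		/* items currently in the bucket */
	uint16_t entries;	/* capacity of items[] */
	void	*items[8];
};

/* Hypothetical per-CPU cache holding copies of the zone's size and flags. */
struct sk_cache {
	struct sk_bucket *allocbucket;
	uint32_t size;		/* cached copy of the zone item size */
	uint32_t flags;		/* cached copy of the fast-path-relevant flags */
};

/*
 * Fast-path allocation. Reads only *cache and *cache->allocbucket; the
 * zone itself is never dereferenced. Returns NULL when the caller would
 * have to fall back to a slow path (feature flags set, or bucket empty).
 */
static void *
sk_cache_alloc(struct sk_cache *cache, int zero)
{
	struct sk_bucket *b = cache->allocbucket;
	void *item;

	if (cache->flags != 0 || b == NULL || b->cnt == 0)
		return (NULL);
	assert(b->cnt <= b->entries);
	item = b->items[--b->cnt];
	if (zero)
		memset(item, 0, cache->size);	/* size comes from the cache */
	return (item);
}
```

In this sketch any set flag diverts to the slow path, which keeps the common case to one flag test and one bucket pop. Whether the patch tests flags individually or as a mask is not stated above, so treat that as an assumption.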
The per-CPU cache buckets have 32 bits of padding in them. It is slightly gross to use this, but if I don't, the per-CPU cache size will exceed 64 bytes, which I would very much like to avoid. The limit code could be improved by setting the flag when we reach the limit and clearing it when we drop below, probably with some timeout on the clear to limit hysteresis.
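The padding trick can be sketched as follows. This is an illustrative layout only: the struct and macro names (`pad_cache`, `pad_cache_bucket`, `PAD_CACHE_SIZE`, `PAD_CACHE_FLAGS`), the number of buckets, and the counter fields are assumptions, not the patch's actual definitions. It shows how a 32-bit slot that would otherwise be compiler padding on an LP64 platform can carry the cached size and flags while a compile-time check holds the structure to 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pad_bucket;			/* opaque; contents are irrelevant here */

/* Hypothetical per-CPU bucket reference: 8 + 2 + 2 + 4 bytes on LP64. */
struct pad_cache_bucket {
	struct pad_bucket *bucket;
	int16_t	 cnt;
	int16_t	 entries;
	uint32_t spare;			/* would be padding if left undeclared */
};

/* Hypothetical per-CPU cache; the bucket count and counters are assumed. */
struct pad_cache {
	struct pad_cache_bucket freebucket;
	struct pad_cache_bucket allocbucket;
	struct pad_cache_bucket crossbucket;
	uint64_t allocs;
	uint64_t frees;
};

/* Reuse two of the spare slots instead of adding new fields. */
#define PAD_CACHE_SIZE(c)	((c)->freebucket.spare)
#define PAD_CACHE_FLAGS(c)	((c)->allocbucket.spare)

/* Fail the build if the cache grows past 64 bytes. */
_Static_assert(sizeof(struct pad_cache) <= 64,
    "per-CPU cache must not exceed 64 bytes");
```

Adding separate `size` and `flags` members to `struct pad_cache` in this layout would push it to 72 bytes on LP64, which is the growth the reuse of the spare slots avoids.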