- Change pcpu zone consumers to use a stride size of PAGE_SIZE (defined as UMA_PCPU_ZONE_SIZE to make future identification easier)
- Allocate the page from the correct domain for a given CPU
The former slab size of `sizeof(struct pcpu)` was somewhat arbitrary. The new value is `PAGE_SIZE` because that is the smallest granularity at which the VM can allocate a slab from a given domain. If a system has fewer than PAGE_SIZE/8 counters (the divisor being the size of one 8-byte counter), part of each per-CPU page goes unused. That waste is an acceptable cost, because each CPU's cache lines should come from memory in that CPU's own domain.
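
To make the stride and the wasted-memory arithmetic concrete, here is a minimal userland sketch. It is not the kernel code from this change: the `sketch_*` names are hypothetical, the 4096-byte page size is an assumption (the real value comes from `PAGE_SIZE` on the target platform), and the 8-byte counter size is inferred from the PAGE_SIZE/8 figure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel headers define the real ones. */
#define SKETCH_PAGE_SIZE      4096u
#define SKETCH_PCPU_ZONE_SIZE SKETCH_PAGE_SIZE   /* stands in for UMA_PCPU_ZONE_SIZE */
#define SKETCH_COUNTER_SIZE   sizeof(uint64_t)   /* one 8-byte counter (assumed) */

/* Byte offset of a CPU's copy of an item, relative to CPU 0's copy. */
static size_t
sketch_pcpu_offset(unsigned int cpu)
{
	return ((size_t)cpu * SKETCH_PCPU_ZONE_SIZE);
}

/* Address of a CPU's copy, given the address of CPU 0's copy. */
static void *
sketch_pcpu_ptr(void *base, unsigned int cpu)
{
	assert(base != NULL);
	return ((char *)base + sketch_pcpu_offset(cpu));
}

/*
 * Bytes left unused in the last per-CPU page when `ncounters` counters
 * are packed back to back. Zero counters need no page, so waste is 0.
 */
static size_t
sketch_wasted_bytes(size_t ncounters)
{
	size_t used = ncounters * SKETCH_COUNTER_SIZE;
	size_t rem = used % SKETCH_PCPU_ZONE_SIZE;

	return (rem == 0 ? 0 : SKETCH_PCPU_ZONE_SIZE - rem);
}
```

Under these assumptions, 512 counters fill a page exactly, while 100 counters leave 3296 bytes of each CPU's page unused. Because every CPU's copy starts on its own page boundary, each page can be backed by memory from a different domain.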