Optimize fast path allocations by storing bucket headers in the per-cpu
cache area. This lets the fast path check the available space in all
per-cpu buckets with a single cacheline access and fewer branches.
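
As a rough illustration of the idea (not the code from this change), the
sketch below keeps a small header for each bucket inline in the per-cpu
cache, so the count and capacity can be read without dereferencing the
bucket itself. All names here (bucket_store, cache_bucket, percpu_cache,
cache_alloc_fast, cache_free_fast) and the four-item capacity are
hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical out-of-line storage for cached items. */
struct bucket_store {
	void *items[4];
};

/*
 * Hypothetical bucket header kept inline in the per-cpu cache.
 * An absent bucket is represented by store == NULL with cnt == 0
 * and entries == 0, so no separate NULL check is needed.
 */
struct cache_bucket {
	struct bucket_store *store;
	int16_t cnt;		/* items currently held */
	int16_t entries;	/* capacity of store */
};

/* Both headers sit side by side, small enough to share a cacheline. */
struct percpu_cache {
	struct cache_bucket allocb;	/* drained by the alloc fast path */
	struct cache_bucket freeb;	/* filled by the free fast path */
};

/* Returns an item, or NULL if the caller must take the slow path. */
static void *
cache_alloc_fast(struct percpu_cache *c)
{
	struct cache_bucket *b = &c->allocb;

	/* One test covers both "bucket empty" and "no bucket". */
	if (b->cnt == 0)
		return (NULL);
	assert(b->store != NULL && b->cnt <= b->entries);
	return (b->store->items[--b->cnt]);
}

/* Returns 1 if the item was cached, 0 if the slow path is needed. */
static int
cache_free_fast(struct percpu_cache *c, void *item)
{
	struct cache_bucket *b = &c->freeb;

	/* One test covers both "bucket full" and "no bucket". */
	if (b->cnt >= b->entries)
		return (0);
	b->store->items[b->cnt++] = item;
	return (1);
}
```

In this sketch the fast paths touch only the headers until they know the
operation will succeed, which is where the saved cacheline access and
the merged branch come from.
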
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D22825