Right now we allocate 2*PFR_OP_ADDR_MAX*OP_DIR_MAX counters per table
entry, so each entry has to store 8 counter pointers. We can instead
define a zone which returns arrays of 8 per-CPU counters, so the table
entry structure only needs to store one pointer.
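
To illustrate the layout change, here is a minimal userspace sketch. It
is not the kernel code: the sk_* names are hypothetical, the constant
values (2 directions, 2 address ops, packets plus bytes) are assumed
from the count of 8 above, and plain uint64_t pointers stand in for
per-CPU counters. The index order is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the pf constants; the values are assumed. */
enum { SK_DIR_MAX = 2, SK_OP_ADDR_MAX = 2, SK_NTYPES = 2 };
#define SK_NCOUNTERS (SK_NTYPES * SK_DIR_MAX * SK_OP_ADDR_MAX)

enum sk_type { SK_PACKETS = 0, SK_BYTES = 1 };

/* Old layout: the entry stores one pointer per counter (8 pointers). */
struct sk_entry_old {
	uint64_t *packets[SK_DIR_MAX][SK_OP_ADDR_MAX];
	uint64_t *bytes[SK_DIR_MAX][SK_OP_ADDR_MAX];
};

/* New layout: the entry stores one pointer to an array of 8 counters. */
struct sk_entry_new {
	uint64_t *counters;	/* points at SK_NCOUNTERS slots */
};

/* Map (direction, op, type) to a slot in the flat counter array. */
static size_t
sk_counter_index(int dir, int op, enum sk_type t)
{
	assert(dir >= 0 && dir < SK_DIR_MAX);
	assert(op >= 0 && op < SK_OP_ADDR_MAX);
	return (((size_t)dir * SK_OP_ADDR_MAX + op) * SK_NTYPES + t);
}
```

With this mapping, a lookup becomes
entry->counters[sk_counter_index(dir, op, SK_BYTES)] instead of
following a dedicated pointer for each counter.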
On amd64 this reduces sizeof(struct pfr_kentry) from 216 to 160 bytes.
The smaller size also gets us better slab efficiency (i.e., there is
less internal fragmentation): we can pack 25 structures in a page
instead of 18. vm.uma.pf_table_entries.keg.efficiency goes from 94% to
97%.
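
The packing numbers can be reproduced with simple integer arithmetic.
The sketch below assumes a 4096-byte page and ignores any per-slab
header; the helper names are hypothetical, and this is not how UMA
itself computes the sysctl value.

```c
#include <assert.h>
#include <stddef.h>

/* Whole objects of objsize bytes that fit in slabsize bytes. */
static size_t
sk_items_per_slab(size_t slabsize, size_t objsize)
{
	assert(objsize > 0 && objsize <= slabsize);
	return (slabsize / objsize);
}

/* Percentage of the slab occupied by objects, rounded down. */
static unsigned
sk_efficiency_pct(size_t slabsize, size_t objsize)
{
	size_t used = sk_items_per_slab(slabsize, objsize) * objsize;

	return ((unsigned)(100 * used / slabsize));
}
```

For a 4096-byte page, 216-byte entries give 18 items and 94%, while
160-byte entries give 25 items and 97%. The 56-byte difference matches
replacing 8 pointers with 1 on a platform with 8-byte pointers.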