By default, pf does not perform table address accounting; it does so only
when the "counters" keyword is specified in the corresponding pf.conf
table definition. (There does not appear to be any way to request
counters when adding a table with pfctl directly.) Nevertheless, we
always allocate counters. For large tables, the memory overhead is
significant: we allocate 12 counters per table entry, plus 12 pointers
to them in the table entry itself. Moreover, reloading a table
definition from pf.conf zeroes every counter, which costs 12 SMP
rendezvous operations per entry, all performed while holding the rules
lock.

A further refinement might add a UMA zone for pfr_kentry counter arrays,
so that each entry's counters are allocated as one contiguous array,
reducing the pointer overhead. We should also look for a way to reduce
the cost of zeroing counters.

Mitigate the problem by checking for PFR_TFLAG_COUNTERS before
allocating any counters for a table entry.