uma_zalloc_domain() allocates from the requested domain instead of
following the first-touch policy used for most zones. Currently it is
only used by malloc_domainset(), which returns memory that is freed
with free().
uma_zalloc_domain() works by always going to the keg for an item. As a
result, use of the UMA zone caches is unbalanced: we free items to the
caches, but always allocate from the keg, skipping the caches.
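To illustrate the imbalance, here is a toy model rather than UMA code.
struct toy_zone and the toy_zone_*() functions are hypothetical names
invented for this sketch; the only behaviour modelled is that every
allocation is charged to the keg while every free lands in the cache.

```c
#include <assert.h>

/* Hypothetical counters; these are not the real UMA structures. */
struct toy_zone {
	int	cached;		/* items sitting in the zone cache */
	int	keg_allocs;	/* allocations satisfied by the keg */
};

/* Models the old behaviour: always go to the keg, never look at the cache. */
static void
toy_zone_alloc_keg_only(struct toy_zone *z)
{
	z->keg_allocs++;
}

/* Models free(): the item goes back to the cache. */
static void
toy_zone_free(struct toy_zone *z)
{
	z->cached++;
}

/* Perform n transient allocations, each freed immediately. */
static void
toy_zone_transient(struct toy_zone *z, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		toy_zone_alloc_keg_only(z);
		toy_zone_free(z);
	}
}
```

In this model the cache gains one item per iteration, because nothing
ever takes an item back out of it.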

Make some effort to allocate from the UMA caches when performing a
cross-domain allocation. This keeps the caches from growing excessively
when something performs many transient allocations with
malloc_domainset(). We could go further and dip into the per-CPU
caches, but I don't think the extra complexity really buys us anything,
so I propose simply popping the first item in the bucket cache.
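A minimal sketch of the cache-first idea, again as a toy model and not
the actual change. struct toy_domain, toy_keg_alloc(),
toy_alloc_domain() and toy_domain_free() are hypothetical, and the
cache here is a small LIFO array standing in for the bucket cache; the
per-CPU caches are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_NITEMS	4

/* Hypothetical per-domain state; these are not the real UMA structures. */
struct toy_domain {
	int	items[TOY_NITEMS];	/* storage handed out by the "keg" */
	int	keg_next;		/* next unused slot in items[] */
	int	*cache[TOY_NITEMS];	/* stand-in for the bucket cache */
	int	ncached;		/* valid entries in cache[] */
	int	keg_allocs;		/* times we had to go to the keg */
};

/* Slow path: take a fresh item from the keg, or NULL if it is exhausted. */
static int *
toy_keg_alloc(struct toy_domain *d)
{
	if (d->keg_next == TOY_NITEMS)
		return (NULL);
	d->keg_allocs++;
	return (&d->items[d->keg_next++]);
}

/* Try the cache first; fall back to the keg only when the cache is empty. */
static int *
toy_alloc_domain(struct toy_domain *d)
{
	if (d->ncached > 0)
		return (d->cache[--d->ncached]);
	return (toy_keg_alloc(d));
}

/* Frees go to the cache, as before. */
static void
toy_domain_free(struct toy_domain *d, int *item)
{
	assert(d->ncached < TOY_NITEMS);
	d->cache[d->ncached++] = item;
}
```

With the cache consulted first, a loop of allocate-then-free keeps
reusing the same cached item and goes to the keg only once.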

Reported and tested by: dhw, glebius