This is part of a series of patches intended to enable first-touch NUMA policies for UMA by default. It also reduces the cost of uma_zalloc and uma_zfree by approximately 30% each in my tests.
This patch allocates a per-domain cross-domain free bucket for each zone. Prior to this patch, when a cross-domain free bucket filled, it had to be drained all the way back to the keg layer. This is a very expensive operation, and it means that if any significant fraction of your memory is freed on the wrong domain, frees serialize on the keg lock. To alleviate this, I sort per-CPU cross-domain buckets into per-domain buckets under a new zone lock. When a per-domain bucket is full, it goes on the normal free list. Nothing goes through the slab layer.
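As a rough userspace illustration of the sorting step described above (not the code in this patch), the sketch below moves items from a per-CPU cross-domain bucket into per-domain buckets under a single lock. All names here (struct zone, zone_free_cross, NDOMAINS, BUCKET_MAX, full_buckets) are hypothetical, a pthread mutex stands in for the new zone lock, and a simple counter stands in for the normal free list.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NDOMAINS   2    /* hypothetical number of NUMA domains */
#define BUCKET_MAX 4    /* hypothetical bucket capacity */

struct item {
    int domain;                      /* domain the item's memory belongs to */
};

struct bucket {
    int count;
    struct item *items[BUCKET_MAX];
};

struct zone {
    pthread_mutex_t cross_lock;      /* stands in for the new zone lock */
    struct bucket cross[NDOMAINS];   /* one cross-domain bucket per domain */
    int full_buckets[NDOMAINS];      /* stands in for the normal free list */
};

static void
zone_init(struct zone *z)
{
    memset(z, 0, sizeof(*z));
    pthread_mutex_init(&z->cross_lock, NULL);
}

/*
 * Sort every item in a per-CPU cross-domain bucket into the bucket of
 * its home domain. A per-domain bucket that fills up is handed to the
 * free list stand-in; nothing is pushed down to a slab layer.
 */
static void
zone_free_cross(struct zone *z, struct bucket *pcpu)
{
    pthread_mutex_lock(&z->cross_lock);
    while (pcpu->count > 0) {
        struct item *it = pcpu->items[--pcpu->count];
        assert(it->domain >= 0 && it->domain < NDOMAINS);

        struct bucket *b = &z->cross[it->domain];
        b->items[b->count++] = it;
        if (b->count == BUCKET_MAX) {
            z->full_buckets[it->domain]++;
            b->count = 0;            /* start a fresh bucket for this domain */
        }
    }
    pthread_mutex_unlock(&z->cross_lock);
}
```

The lock is taken once per per-CPU bucket rather than once per item, which is the property that keeps the cost of a wrong-domain free low in this model.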
It would be possible to use a per-domain cross-domain lock instead if we allocated a full domain-by-domain matrix of cross-domain buckets. With this patch, I was able to saturate memory bandwidth with mbufs freed on the wrong domain half of the time, at only a 30% CPU penalty. Before this patch, it was several hundred times slower. I believe the current performance is adequate, but if it is not, we can easily spend a little more memory and a small amount of effort to make it so.
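For reference, one possible reading of the domain-by-domain matrix alternative is sketched below. This is not implemented by this patch, and every name (struct xzone, struct xdomain_row, xzone_free, XNDOMAINS, XBUCKET_MAX) is hypothetical. The assumption is that each freeing domain owns a row of buckets, one per home domain, protected by that row's own lock, so CPUs in different domains never contend with each other.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define XNDOMAINS   2   /* hypothetical number of NUMA domains */
#define XBUCKET_MAX 4   /* hypothetical bucket capacity */

struct xbucket {
    int count;
    void *items[XBUCKET_MAX];
};

/* One row per freeing domain: its own lock and one bucket per home domain. */
struct xdomain_row {
    pthread_mutex_t lock;
    struct xbucket to[XNDOMAINS];
    int full[XNDOMAINS];             /* stands in for the normal free list */
};

struct xzone {
    struct xdomain_row row[XNDOMAINS];   /* XNDOMAINS * XNDOMAINS buckets */
};

static void
xzone_init(struct xzone *z)
{
    memset(z, 0, sizeof(*z));
    for (int i = 0; i < XNDOMAINS; i++)
        pthread_mutex_init(&z->row[i].lock, NULL);
}

/* Free an item whose memory lives in domain 'home' from a CPU in 'cur'. */
static void
xzone_free(struct xzone *z, int cur, int home, void *item)
{
    assert(cur >= 0 && cur < XNDOMAINS);
    assert(home >= 0 && home < XNDOMAINS);

    struct xdomain_row *r = &z->row[cur];
    pthread_mutex_lock(&r->lock);    /* only this freeing domain's lock */
    struct xbucket *b = &r->to[home];
    b->items[b->count++] = item;
    if (b->count == XBUCKET_MAX) {
        r->full[home]++;
        b->count = 0;
    }
    pthread_mutex_unlock(&r->lock);
}
```

The memory cost in this model grows with the square of the domain count, which is the trade of "a little more memory" for less lock contention.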