This review teaches two example drivers to pass NUMA domain information up the stack in the mbuf NUMA domain field. TCP's syncache_socket() then copies that domain into the connection's inpcb, and ip_output() and ip6_output() copy it from the inpcb back into transmitted mbufs. This mechanism is nearly identical to the way RSS hash values are tracked in inp_flowid.
Since all of this code is in the critical path, I have been careful to put everything in this patch inside #ifdef NUMA, so kernels built without NUMA are unaffected.
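The flow described above can be sketched in userland C. This is only an illustration of the idea, not the code in the patch: the `*_sketch` structures, the `rx_tag()`, `conn_inherit()` and `tx_tag()` helpers, and the local `#define NUMA` are all hypothetical stand-ins for the real mbuf, inpcb, driver receive path, syncache_socket() and ip{6}_output().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA	/* stand-in for the kernel option; remove to compile the tagging out */

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct pkthdr_sketch {
	uint32_t flowid;	/* RSS hash, tracked the same way */
	uint8_t  numa_domain;	/* domain the packet arrived on */
};

struct mbuf_sketch {
	struct pkthdr_sketch m_pkthdr;
};

struct inpcb_sketch {
	uint32_t inp_flowid;
	uint8_t  inp_numa_domain;
};

/* Driver receive path: tag the packet with the domain of its RX queue. */
static void
rx_tag(struct mbuf_sketch *m, uint8_t queue_domain)
{
	assert(m != NULL);
#ifdef NUMA
	m->m_pkthdr.numa_domain = queue_domain;
#else
	(void)queue_domain;
#endif
}

/* Connection setup: the role syncache_socket() plays in the description. */
static void
conn_inherit(struct inpcb_sketch *inp, const struct mbuf_sketch *m)
{
	inp->inp_flowid = m->m_pkthdr.flowid;
#ifdef NUMA
	inp->inp_numa_domain = m->m_pkthdr.numa_domain;
#endif
}

/* Output path: the role ip{6}_output() plays in the description. */
static void
tx_tag(struct mbuf_sketch *m, const struct inpcb_sketch *inp)
{
	m->m_pkthdr.flowid = inp->inp_flowid;
#ifdef NUMA
	m->m_pkthdr.numa_domain = inp->inp_numa_domain;
#endif
}
```

Calling the three helpers in order on one received packet and one outgoing packet should leave the outgoing packet carrying the same domain and flow ID as the received one. With the `#define NUMA` line removed, only the flow ID is propagated.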
Once the inpcb has NUMA information, we can do several more interesting things: teach LACP which egress port to use based on NUMA domain, bind the TCP pacers to NUMA domains and pace connections on the local domain, teach sendfile(2) where to allocate backing pages, and filter inpcblb_group (SO_REUSEPORT_LB) listen sockets by NUMA domain. I have all of this working in the Netflix tree and will feed those patches in after this one. In combination, they reduce cross-domain QPI traffic by roughly 50% for a web workload running on a two-socket Xeon and increase throughput from 140Gb/s to almost 200Gb/s.
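As one illustration of the follow-on work, here is a rough sketch of what filtering a load-balanced listen group by NUMA domain could look like. It is an assumption about the general shape, not the Netflix implementation: `struct listener_sketch`, `pick_listener()`, and the policy of falling back to the whole group when no listener is local are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one listen socket in a load-balancing group. */
struct listener_sketch {
	int     id;
	uint8_t numa_domain;	/* domain the listener is bound to */
};

/*
 * Pick a listener for a new connection.  Hash only over listeners in the
 * packet's NUMA domain; if none are local, hash over the whole group.
 * Returns NULL only when the group is empty.
 */
static const struct listener_sketch *
pick_listener(const struct listener_sketch *grp, size_t n, uint32_t hash,
    uint8_t domain)
{
	size_t i, local = 0, want;

	if (n == 0)
		return (NULL);

	/* Count the listeners that live on the packet's domain. */
	for (i = 0; i < n; i++)
		if (grp[i].numa_domain == domain)
			local++;

	/* No local listener: fall back to the unfiltered choice. */
	if (local == 0)
		return (&grp[hash % n]);

	/* Select the (hash % local)-th listener on the local domain. */
	want = hash % local;
	for (i = 0; i < n; i++) {
		if (grp[i].numa_domain != domain)
			continue;
		if (want-- == 0)
			return (&grp[i]);
	}
	return (NULL);	/* not reached */
}
```

The fallback keeps connections flowing when a domain has no listener, at the cost of a cross-domain hop for those connections. A real implementation might prefer a different policy.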