This patch replaces all NUMA policies and iterators with a new version that is integrated with cpuset. I'm still testing, so at this point I'm mostly looking for early design feedback. The intent of integrating with cpuset is to let administrators partition resources while still allowing programs to control their own behavior. This created some complexity in cpuset because allocation and thread switching have different synchronization requirements. The iterators were rewritten so that flag handling for nowait/waitok happens in one centralized place. For now I have implemented only the two policies that libnuma supports, but this is easy to extend.
Domainset synchronization is accomplished by making the structures immutable. This means that potentially every thread and object in the system may hold a read-only reference to the same domainset. Immutability lets me cache a popcnt and max in the structure to optimize searches. It also means we don't have to worry about torn writes or use other synchronization such as seq(9), which was used before. The domainset simply holds a bitmap of allowed domains and an integer policy value. A separate domainset_ref, embedded in either a thread or an object, holds an iterator integer so that iteration stays coherent from one call to the next.
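To make the immutable layout concrete, here is a minimal C sketch. It assumes at most 64 domains, so the bitmap fits in a uint64_t. The field names, domainset_create, domainset_nth and domainset_rr_next are hypothetical and are not the patch's actual API. The popcnt and max are computed once at creation, and only the per-thread or per-object iterator in the ref ever changes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, not the patch's API. Assumes <= 64 domains so
 * the allowed-domain bitmap fits in one uint64_t.
 */
struct domainset {
	uint64_t ds_mask;	/* bitmap of allowed domains */
	int	 ds_policy;	/* integer policy value */
	int	 ds_cnt;	/* popcnt of ds_mask, cached at creation */
	int	 ds_max;	/* highest allowed domain + 1, bounds searches */
};

/* Per-thread or per-object state; only the iterator is mutable. */
struct domainset_ref {
	const struct domainset *dr_policy;	/* shared, read-only */
	unsigned int		dr_iter;	/* keeps iteration coherent across calls */
};

/* Build a domainset; it is never modified after this returns. */
static const struct domainset *
domainset_create(uint64_t mask, int policy)
{
	struct domainset *ds;
	int d;

	assert(mask != 0);
	ds = malloc(sizeof(*ds));
	if (ds == NULL)
		return (NULL);
	ds->ds_mask = mask;
	ds->ds_policy = policy;
	ds->ds_cnt = 0;
	ds->ds_max = 0;
	for (d = 0; d < 64; d++) {
		if ((mask & (UINT64_C(1) << d)) != 0) {
			ds->ds_cnt++;
			ds->ds_max = d + 1;
		}
	}
	return (ds);
}

/* Return the n-th allowed domain (0-based), scanning only up to ds_max. */
static int
domainset_nth(const struct domainset *ds, int n)
{
	int d;

	assert(n >= 0 && n < ds->ds_cnt);
	for (d = 0; d < ds->ds_max; d++)
		if ((ds->ds_mask & (UINT64_C(1) << d)) != 0 && n-- == 0)
			return (d);
	return (-1);	/* not reached for a valid n */
}

/* Round-robin step: advance the ref's private iterator, not the set. */
static int
domainset_rr_next(struct domainset_ref *dr)
{
	const struct domainset *ds = dr->dr_policy;

	return (domainset_nth(ds, (int)(dr->dr_iter++ % (unsigned int)ds->ds_cnt)));
}
```

Because the shared domainset is never written after creation, any number of refs can point at it concurrently. Each ref's dr_iter advances on its own.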
The iterators are written to streamline the first allocation attempt, since it is assumed that it will succeed. If that attempt fails, the iterators try every allowed domain before blocking on the first one, so if the policy expresses a preference we will end up using the preferred domain. This partially addresses kib's concerns from my other two reviews. Now that flag handling is centralized, other behaviors should be more straightforward to implement, and the iterating functions have shrunk considerably.
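The allocation flow described above might look roughly like the following C sketch. It is not the patch's code. The flags SK_NOWAIT/SK_WAITOK, the two example policies (round-robin and prefer), the prefer field and the sk_toy_alloc stand-in for a per-domain allocator are all hypothetical. The first domain is tried without sleeping on the assumption that it usually succeeds. The remaining allowed domains are tried next, and a caller that may sleep finally blocks on the first (for example, preferred) domain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag and policy values, for illustration only. */
#define SK_NOWAIT		0x1
#define SK_WAITOK		0x2
#define SK_POLICY_ROUNDROBIN	1
#define SK_POLICY_PREFER	2

struct sk_domainset {
	uint64_t mask;		/* allowed domains; assumed non-empty */
	int	 policy;
	int	 prefer;	/* preferred domain, used only by PREFER */
	int	 cnt;		/* cached popcnt of mask */
	int	 max;		/* highest allowed domain + 1 */
};

struct sk_domainset_ref {
	const struct sk_domainset *ds;
	unsigned int iter;	/* persists across calls */
};

/* Per-domain allocator callback; returns NULL on failure. */
typedef void *(*sk_alloc_fn)(int domain, int flags, void *arg);

/* Next allowed domain after d, wrapping around. */
static int
sk_next_domain(const struct sk_domainset *ds, int d)
{
	do {
		d = (d + 1) % ds->max;
	} while ((ds->mask & (UINT64_C(1) << d)) == 0);
	return (d);
}

/* Policy-specific choice of the domain to try first. */
static int
sk_first_domain(struct sk_domainset_ref *dr)
{
	const struct sk_domainset *ds = dr->ds;
	int d, n;

	assert(ds->cnt > 0);
	if (ds->policy == SK_POLICY_PREFER)
		return (ds->prefer);
	/* Round-robin: the (iter % cnt)-th allowed domain. */
	n = (int)(dr->iter++ % (unsigned int)ds->cnt);
	for (d = 0; d < ds->max; d++)
		if ((ds->mask & (UINT64_C(1) << d)) != 0 && n-- == 0)
			return (d);
	return (-1);	/* not reached for a non-empty mask */
}

static void *
sk_domainset_alloc(struct sk_domainset_ref *dr, int flags, sk_alloc_fn fn,
    void *arg)
{
	const struct sk_domainset *ds = dr->ds;
	void *p;
	int d, first, i;

	/* Fast path: the first domain is expected to succeed. */
	first = sk_first_domain(dr);
	if ((p = fn(first, SK_NOWAIT, arg)) != NULL)
		return (p);
	/* Try every other allowed domain without sleeping. */
	for (d = first, i = 1; i < ds->cnt; i++) {
		d = sk_next_domain(ds, d);
		if ((p = fn(d, SK_NOWAIT, arg)) != NULL)
			return (p);
	}
	/* All domains failed; a caller that cannot sleep gives up here. */
	if ((flags & SK_NOWAIT) != 0)
		return (NULL);
	/* Otherwise block on the first (e.g. preferred) domain. */
	return (fn(first, SK_WAITOK, arg));
}

/*
 * Toy stand-in for a per-domain allocator: *(int *)arg names the only
 * domain with free memory (-1 for none); sleeping requests always succeed.
 */
static int sk_toy_last_domain = -1;
static int sk_toy_last_flags;
static char sk_toy_page;

static void *
sk_toy_alloc(int domain, int flags, void *arg)
{
	sk_toy_last_domain = domain;
	sk_toy_last_flags = flags;
	if (domain == *(int *)arg || (flags & SK_WAITOK) != 0)
		return (&sk_toy_page);
	return (NULL);
}
```

All of the nowait/waitok decisions live in sk_domainset_alloc. A new policy in this sketch would only need to change sk_first_domain.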
The policy precedence is object first. This will allow us to specify a kernel_object policy that overrides the calling thread's policy, except where kernel code explicitly requests a specific domain for the calling thread. This precedence is also consistent with what libnuma provides.
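Here is a hypothetical C sketch of the precedence rule. A domain explicitly requested by kernel code wins. Next comes an object policy, such as one set on kernel_object, and last the calling thread's policy. The types, SK_NODOMAIN and sk_resolve_policy are illustrative names only, not the patch's interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: no explicit domain was requested. */
#define SK_NODOMAIN	(-1)

enum sk_policy_source { SK_SRC_EXPLICIT, SK_SRC_OBJECT, SK_SRC_THREAD };

struct sk_policy { int id; };				/* stand-in for a domainset */
struct sk_object { const struct sk_policy *policy; };	/* NULL: no object policy */
struct sk_thread { const struct sk_policy *policy; };	/* always set */

/*
 * Decide which policy governs an allocation: an explicit domain first,
 * then the object's policy, then the calling thread's policy.
 */
static enum sk_policy_source
sk_resolve_policy(const struct sk_object *obj, const struct sk_thread *td,
    int domain, const struct sk_policy **policyp)
{
	assert(td != NULL && td->policy != NULL);
	if (domain != SK_NODOMAIN) {
		*policyp = NULL;	/* allocate from 'domain' directly */
		return (SK_SRC_EXPLICIT);
	}
	if (obj != NULL && obj->policy != NULL) {
		*policyp = obj->policy;	/* e.g. a kernel_object policy */
		return (SK_SRC_OBJECT);
	}
	*policyp = td->policy;
	return (SK_SRC_THREAD);
}
```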
The cpuset integration was somewhat cumbersome because cpusets are not immutable: masks can be modified on the fly with locks held. As a result, there is more resource preallocation and tracking for domains. Integrating domains with cpusets means that things like jails work with specific domains out of the box. I have some more work to do to finish this off. For example, domain lookup should be hash based rather than a linked list, and I haven't yet put in any code to validate domain policies.
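As a rough illustration of the planned hash-based lookup, the C sketch below interns domainsets in a small chained hash table keyed on (mask, policy). Identical sets are then found and shared without walking a single linked list. The table size, the multiplicative hash and all names are assumptions. The locking a kernel implementation would need is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table geometry: 2^SK_HASHBITS buckets. */
#define SK_HASHBITS	6
#define SK_HASHSIZE	(1 << SK_HASHBITS)

struct sk_domainset {
	uint64_t mask;			/* allowed domains */
	int	 policy;
	struct sk_domainset *next;	/* hash chain */
};

static struct sk_domainset *sk_hash[SK_HASHSIZE];

/* Multiplicative hash; the top SK_HASHBITS bits select the bucket. */
static unsigned int
sk_hashval(uint64_t mask, int policy)
{
	uint64_t h;

	h = (mask ^ (uint64_t)(unsigned int)policy) * UINT64_C(0x9E3779B97F4A7C15);
	return ((unsigned int)(h >> (64 - SK_HASHBITS)));
}

/* Return the shared domainset for (mask, policy), creating it once. */
static const struct sk_domainset *
sk_domainset_intern(uint64_t mask, int policy)
{
	unsigned int b;
	struct sk_domainset *ds;

	assert(mask != 0);
	b = sk_hashval(mask, policy);
	for (ds = sk_hash[b]; ds != NULL; ds = ds->next)
		if (ds->mask == mask && ds->policy == policy)
			return (ds);
	ds = malloc(sizeof(*ds));
	if (ds == NULL)
		return (NULL);
	ds->mask = mask;
	ds->policy = policy;
	ds->next = sk_hash[b];
	sk_hash[b] = ds;
	return (ds);
}
```

Because domainsets are immutable, returning an existing entry is safe. Callers only ever read it.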
If you don't like the names or anything else, now is the time to speak up. I should be ready to post a final patch within a week.