Apologies if this is wrong, but in this case for rr we tried the domains above, so domain should be the last one we tried from vm_domainset_iter_policy(), right? Is that what we want to use as our attempt in keg_fetch_free_slab(), or should we cache the first domain from vm_domainset_iter_policy_ref_init() (the original RR "hand", as it were) and use that?
I suppose you can argue that if all domains are equally likely to be insufficient in these cases, the last one visited in the RR search will rotate (just "behind" the RR hand, as it were), so that's good enough. Still, I thought it was worth checking on intent.
keg_fetch_free_slab() will try all of the domains. So for RR we will start from the last domain from which we tried to allocate, but I think that's fairly arbitrary.
Though, we are inconsistent here about how M_NOWAIT is handled when the domain is specified. In the loop above we only try to allocate from the requested domain, but here we're looping over all of them unconditionally.
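To make the round-robin question above concrete, here is a minimal userland sketch of a search that starts at the RR hand and visits every domain once. It is a model only, not the kernel's vm_domainset_iter code: struct rr_iter, rr_try_all(), and the rule that the hand advances by one per search are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical model of a round-robin domain iterator (not kernel code). */
struct rr_iter {
	int hand;	/* domain the next search starts from */
	int ndomains;	/* number of NUMA domains */
};

/*
 * Visit each domain once, starting at the hand.  avail[d] is nonzero if
 * domain d can satisfy the request.  Returns the first domain that can,
 * or -1 if none can.  *first and *last record the first and last domains
 * that were tried.
 */
static int
rr_try_all(struct rr_iter *it, const int *avail, int *first, int *last)
{
	int d, i, start;

	assert(it->ndomains > 0);
	start = it->hand;
	/* Assumption: the hand advances by one per search, hit or miss. */
	it->hand = (start + 1) % it->ndomains;
	*first = *last = start;
	for (i = 0; i < it->ndomains; i++) {
		d = (start + i) % it->ndomains;
		*last = d;
		if (avail[d])
			return (d);
	}
	return (-1);
}
```

In this model, with four domains and the hand at 1, a search that fails everywhere records first = 1 and last = 0; the next failed search records first = 2 and last = 1. The last domain visited trails the original hand by one and rotates with it, which is the "good enough" argument made above.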
- Create a global first-touch policy. Use it to initialize the keg's domainset ref for non-round-robin kegs. This removes most of the asymmetry between rr and !rr in keg_fetch_slab().
- For first-touch zones, fall back to other domains before giving up even in the M_NOWAIT case.
- Add a comment describing a pre-existing bug in the way the round-robin domain iterator is updated. I'm not sure how best to fix it yet.
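The first-touch fallback in the second item can be sketched in userland C as follows. This is an illustrative model under stated assumptions, not the UMA implementation: ft_pick_domain() and the free_slabs array are hypothetical, and the visiting order after the preferred domain (ascending, with wraparound) is an arbitrary choice for the sketch.

```c
#include <assert.h>

/*
 * Hypothetical model of a non-sleeping (M_NOWAIT-style) first-touch
 * allocation: try the preferred (local) domain first, then every other
 * domain once, and only then give up.  free_slabs[d] is the number of
 * free slabs in domain d.  Returns the chosen domain, or -1 if every
 * domain is empty.  Nothing here sleeps or retries.
 */
static int
ft_pick_domain(int preferred, const int *free_slabs, int ndomains)
{
	int d, i;

	assert(ndomains > 0);
	assert(preferred >= 0 && preferred < ndomains);
	for (i = 0; i < ndomains; i++) {
		/* i == 0 is the preferred domain; the rest are fallbacks. */
		d = (preferred + i) % ndomains;
		if (free_slabs[d] > 0)
			return (d);
	}
	return (-1);
}
```

For example, with four domains, a preferred domain of 2, and free slabs only in domain 0, the model tries 2, 3, then 0 and returns 0 instead of failing immediately.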
Would it make sense to have a tunable to control the default policy for UMA? For example, what if I wanted zones that don't declare themselves UMA_ZONE_FIRSTTOUCH to be interleaved, or something to that effect?
Are there specific zones you have in mind? I'm not sure how useful a global override would be except for testing purposes, but maybe I'm not thinking hard enough. Really it would be worthwhile to audit all zones to make sure that a FIRSTTOUCH default makes sense. For vnodes, for example, it probably doesn't, since vnodes may persist for a long time. Though, when they are recycled, they do at least pass through UMA.
In the case I am thinking of, there are 4 NUMA domains, but memory access is equal cost to all four, so there may be better performance to be had by making the default policy INTERLEAVE instead of FIRST_TOUCH (to use the bandwidth of all 4 memory controllers). So I wasn't thinking of overriding the zones that request FIRST_TOUCH, just setting the default policy for zones that don't request a specific policy.
Maybe there are not enough zones that would fall into this case to bother.
I think it probably depends entirely on the way items from a given zone are used and cached. If you have a case where interleave/roundrobin makes more sense than first-touch, then we should probably just modify the zone definition to use a better policy.
With first-touch, UMA does some extra work to handle cross-domain allocations and frees. If the overhead of that work is too high for some particular workload (maybe it results in a lot of lock contention, say), then that would be a good reason to revisit the default policy and think about making it easier to adjust.
BTW, the interleave policy can't directly be used for UMA slabs; we use the page index (pindex) to compute a consistent domain index within a stride, but UMA doesn't provide one. Round-robin is probably the only other reasonable default policy.
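As a rough illustration of why interleave needs a pindex, here is a simplified model of mapping a page index to a domain by stride. It is a sketch only and does not reproduce the actual vm_domainset code: interleave_domain() and its stride parameter are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical interleave mapping: pages are grouped into runs of
 * `stride` consecutive page indices, and consecutive runs are assigned
 * to consecutive domains.  The same pindex always maps to the same
 * domain, which is what makes the placement consistent.
 */
static int
interleave_domain(uint64_t pindex, uint64_t stride, int ndomains)
{

	assert(stride > 0 && ndomains > 0);
	return ((int)((pindex / stride) % (uint64_t)ndomains));
}
```

With a stride of 4 pages and 4 domains, page indices 0 through 3 map to domain 0, index 4 maps to domain 1, and index 16 wraps back to domain 0. A slab allocation has no such index to feed in, so a scheme like this has nothing to key on.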