This implements per-CPU free page caching for FREEPOOL_DEFAULT only. I have not yet seen enough contention on FREEPOOL_DIRECT for it to be an issue. This also brings in vm_phys_alloc_npages(), which returns up to 'count' pages in a single call. This is an optimization to reduce round trips to the backend phys allocator. Unlike earlier versions of this patch, this one is now completely compatible with reservations.
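As a rough illustration of the batched refill idea, here is a minimal userspace sketch. It is not the kernel code: struct backend, struct pcpu_cache, backend_alloc_npages(), and cache_alloc() are hypothetical stand-ins, pages are modeled as plain integers, and the lock_trips counter only simulates taking the free lock. The point it tries to show is that one call returning up to 'count' pages lets a cache serve several allocations per trip to the backend.

```c
#include <assert.h>
#include <stddef.h>

#define BACKEND_PAGES 8   /* pages held by the toy backend (assumption) */
#define CACHE_SIZE    4   /* capacity of one per-CPU cache (assumption) */

/* Toy stand-in for the physical allocator's free list. */
struct backend {
	int free_pages[BACKEND_PAGES];
	int nfree;
	int lock_trips;	/* simulated free-lock acquisitions */
};

/* Toy stand-in for one CPU's cache of free pages. */
struct pcpu_cache {
	int pages[CACHE_SIZE];
	int cnt;
};

static void
backend_init(struct backend *b)
{
	for (int i = 0; i < BACKEND_PAGES; i++)
		b->free_pages[i] = i;
	b->nfree = BACKEND_PAGES;
	b->lock_trips = 0;
}

/*
 * Hand out up to 'count' pages in one trip.  Returns the number of
 * pages actually stored in out[], which may be fewer than requested.
 */
static int
backend_alloc_npages(struct backend *b, int count, int *out)
{
	int n = 0;

	b->lock_trips++;	/* one simulated lock round trip per call */
	while (n < count && b->nfree > 0)
		out[n++] = b->free_pages[--b->nfree];
	return (n);
}

/*
 * Allocate one page from the cache, refilling it in bulk when empty.
 * Returns -1 when neither the cache nor the backend has a page.
 */
static int
cache_alloc(struct pcpu_cache *c, struct backend *b)
{
	if (c->cnt == 0)
		c->cnt = backend_alloc_npages(b, CACHE_SIZE, c->pages);
	if (c->cnt == 0)
		return (-1);
	assert(c->cnt <= CACHE_SIZE);
	return (c->pages[--c->cnt]);
}
```

With these toy sizes, eight single-page allocations cost two backend trips instead of eight. A real implementation would also need a free path and per-CPU synchronization, both omitted here.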
It might be more elegant to push this into vm_phys.c along with all of the vm_domain_free_lock() calls. However, doing so would mean that the cached pages were still included in the free count, and I believe that would make us more prone to out-of-memory deadlocks.
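One reading of the free count concern above can be sketched with a toy model. Everything here is hypothetical (struct toy_domain, toy_free_count(), toy_counter_misleads()) and does not reflect the actual kernel accounting. It only illustrates the situation where pages parked in another CPU's cache are counted as free even though the current CPU cannot allocate them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one memory domain's accounting (all names hypothetical). */
struct toy_domain {
	int backend_free;	/* pages any CPU can allocate */
	int cached_other;	/* pages parked in another CPU's cache */
	bool count_cached;	/* do cached pages count toward the free count? */
};

/* Free count as reported to the rest of the system. */
static int
toy_free_count(const struct toy_domain *d)
{
	return (d->backend_free + (d->count_cached ? d->cached_other : 0));
}

/*
 * True when the counter claims pages are free but this CPU cannot
 * actually obtain one from the backend.
 */
static bool
toy_counter_misleads(const struct toy_domain *d)
{
	return (toy_free_count(d) > 0 && d->backend_free == 0);
}
```

In this model, when cached pages are counted, a caller can see a nonzero free count while the backend is empty, so anything that decides whether to wait or reclaim based on that count would be working from a misleading number. When cached pages are excluded, the count matches what the backend can actually provide.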