This closes a few races in low-memory conditions while avoiding locks, using the same threshold-based notification algorithm I employed in the buffer cache. The algorithm adjusts free_count with atomics, and only the threads that cross important thresholds check for waiters and issue wakeups. It can devolve into the same contention if the count constantly bounces around the thresholds, but this doesn't seem to be a problem in practice.
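To illustrate the threshold-crossing idea, here is a minimal userspace sketch in C11. It is not the code in this patch: the struct, the function names, the single min_thresh threshold, and the counters standing in for the real wakeup calls are all hypothetical. It assumes the counter is adjusted with one atomic read-modify-write, and that the value returned by that operation tells each thread whether its own update was the one that crossed the threshold.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical pool; names and fields are illustrative only. */
struct free_pool {
    atomic_int free_count;     /* pages currently free */
    int        min_thresh;     /* assumed "low memory" threshold */
    atomic_int wakeups;        /* stand-in for waking sleeping allocators */
    atomic_int pageout_kicks;  /* stand-in for waking the pageout daemon */
};

/* Return pages to the pool. Only the thread whose update moves the
 * count from below min_thresh to at or above it takes the slow path. */
static bool pool_free(struct free_pool *p, int n)
{
    int old = atomic_fetch_add(&p->free_count, n);

    if (!(old < p->min_thresh && old + n >= p->min_thresh))
        return false;          /* fast path: no threshold crossed */

    /* Slow path: a real implementation would take the sleep lock
     * here, check for waiters, and wake them. */
    atomic_fetch_add(&p->wakeups, 1);
    return true;
}

/* Take pages from the pool. Only the thread whose update moves the
 * count from at or above min_thresh to below it notifies pageout. */
static bool pool_alloc(struct free_pool *p, int n)
{
    int old = atomic_fetch_sub(&p->free_count, n);

    if (!(old >= p->min_thresh && old - n < p->min_thresh))
        return false;          /* fast path: no threshold crossed */

    atomic_fetch_add(&p->pageout_kicks, 1);
    return true;
}
```

Because atomic_fetch_add and atomic_fetch_sub return the previous value, exactly one thread observes each crossing, so the common case never touches a lock. The sketch also shows the failure mode mentioned above: if the count hovers right at min_thresh, nearly every call takes the slow path.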
I improved the annotation and description of the locking protocol, so hopefully this is fairly obvious from the code. I moved the pageout sleep/wakeup protection into its own locks to continue shrinking the scope of the free lock. I have a follow-on patch that does the same for the allocation side, but it is significantly larger.