The patch contains three loosely related fixes, all dealing either with OOM being unable to resolve a deadlock or with OOM not triggering when it should.
- In addition to the pagedaemon initiating OOM, also initiate it from vm_fault(). Namely, if a thread waits for a free page to satisfy a page fault for longer than a preconfigured amount of time, trigger OOM. These triggers are rate-limited, because commonly several threads of the same multi-threaded process enter the fault handler simultaneously. Faults from pagedaemon threads participate in the calculation of the OOM rate, but are not subject to the limit. This part works around an issue that I see on some machines.
- Larry McVoy reported a load where a large process was swapped out and then selected as the OOM victim. Since a process must be made runnable to exit, and since processes are not swapped in while memory is low, the OOM kill appears to be ineffective. Handle the case by making the swapper check for killed processes when memory is low and swap them in. Similarly to vm_fault() allocating the page to handle a fault of a killed process with SYSTEM priority, the kernel stacks of the killed process are swapped in with SYSTEM priority, to avoid blocking the swapper.
- Peter Jeremy reported a load where a single anonymous object consumed almost all memory on a large system. The swapout code iterated over the corresponding object page queue for a long time while owning the map and object locks. This blocked the pagedaemon, which tries to lock the object, and blocked other threads of the process in vm_fault() waiting for the map lock. Handle the issue by terminating the deactivation loop if it has executed for too long, and by yielding at the top level in vm_daemon. Also, change the map lock mode in vm_swapout_map_deactivate_pages() to read.
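
For illustration only, the rate limiting from the first item can be modelled in user space roughly as follows. This is a minimal sketch, not the code in the patch: the names `oom_limiter`, `oom_request`, `fault_should_oom`, `PFAULT_OOM_WAIT` and `OOM_MIN_INTERVAL` are hypothetical, time is counted in abstract ticks, and the exemption for the pagedaemon is one reading of "participate in the calculation of the OOM rate, but are not subject to the limit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunables in abstract ticks; not the names used by the patch. */
#define PFAULT_OOM_WAIT		10	/* how long a fault may wait for a page */
#define OOM_MIN_INTERVAL	100	/* minimum spacing of fault-side OOMs */

struct oom_limiter {
	uint64_t last_oom;	/* tick of the most recent OOM, any source */
	bool	 ever;		/* false until the first OOM has run */
};

/*
 * Decide whether an OOM request made at tick `now` should proceed.
 * Requests from the pagedaemon always proceed but still update the
 * timestamp, so they count toward the rate seen by the fault path.
 * Requests from the fault path are refused while the previous OOM is
 * less than OOM_MIN_INTERVAL ticks old.
 */
static bool
oom_request(struct oom_limiter *l, bool from_pagedaemon, uint64_t now)
{
	assert(l != NULL);
	if (!from_pagedaemon && l->ever &&
	    now - l->last_oom < OOM_MIN_INTERVAL)
		return (false);
	l->last_oom = now;
	l->ever = true;
	return (true);
}

/*
 * Fault-side check: only ask for OOM once the faulting thread has been
 * waiting for a free page for at least PFAULT_OOM_WAIT ticks.
 */
static bool
fault_should_oom(struct oom_limiter *l, uint64_t wait_start, uint64_t now)
{
	if (now - wait_start < PFAULT_OOM_WAIT)
		return (false);
	return (oom_request(l, false, now));
}
```

With a single shared limiter, several threads of one process that time out in the fault handler at nearly the same tick produce one OOM pass instead of one per thread, while an OOM started by the pagedaemon is never delayed.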
Patches 1 and 2 were discussed with Mark. Peter tested patches 2 and 3; I used a previous version of patch 1 on my machines.