I used two methods to generate load on the page daemon.
The first is a program that mmaps a region of anonymous memory
twice the size of RAM. In a loop, it reads a byte from each
page in the range. It periodically moves recently touched ranges
to the inactive queue with madvise(MADV_DONTNEED) to ensure
that the page daemon doesn't need to scan the active queue.
Three instances of this program are enough to bring the page daemon
to 100% CPU.
The second involves using truncate(1) to create a large sparse file,
and reading it back in a loop using dd(1).
Without this change, page daemon throughput in both tests drops
off a cliff when multiple instances of the test loop are running,
because the page daemon goes to sleep for 1s while the test loops
are already blocked in VM_WAIT.
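
For reference, the first test program could look roughly like the
sketch below. This is not the program that was actually used: the
function name touch_region(), its parameters, and the chunk size at
which madvise() is called are assumptions made for illustration. It
also relies on FreeBSD's handling of MADV_DONTNEED; on Linux the same
call discards anonymous pages outright, so the sketch would exercise
a different path there.

```c
#define _DEFAULT_SOURCE		/* exposes madvise()/MAP_ANON on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory and read one
 * byte from every page, `passes` times over.  After each `chunk` bytes
 * (which must be a multiple of the page size), advise the kernel that
 * the range just read is no longer needed.
 *
 * Returns the number of pages read, or -1 on failure.
 */
long
touch_region(size_t len, size_t chunk, int passes)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	volatile unsigned char sink = 0;
	unsigned char *base;
	long touched = 0;

	if (pagesz <= 0 || len == 0 || chunk == 0 ||
	    chunk % (size_t)pagesz != 0)
		return (-1);

	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		return (-1);

	for (int p = 0; p < passes; p++) {
		for (size_t start = 0; start < len; start += chunk) {
			size_t n = (len - start < chunk) ? len - start : chunk;

			/* Fault in each page of this chunk with a 1-byte read. */
			for (size_t off = 0; off < n; off += (size_t)pagesz) {
				sink = base[start + off];
				touched++;
			}

			/* Advisory only, so a failure here is ignored. */
			(void)madvise(base + start, n, MADV_DONTNEED);
		}
	}
	(void)sink;

	(void)munmap(base, len);
	return (touched);
}
```

To approximate the test described above, a small main() would call
touch_region() with len set to twice the amount of physical memory
and a large pass count, and several copies would be run at once.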