
Fix two problems with the page daemon control loop.
ClosedPublic

Authored by markj on Dec 8 2017, 4:49 PM.

Details

Summary

These both arise when multiple applications are consuming free pages at
a high rate (e.g., > 1GB/s), and the inactive queue contains plenty of
clean pages.

  1. After completing an inactive queue scan, we might still be below the v_free_min threshold, in which case there are threads still sleeping in VM_WAIT.
  2. vm_pageout_wanted may be set without the free queues lock held. This can lead to a race:
    • the page daemon checks vm_pageout_wanted in vm_pageout_worker() and sees that it's false. vm_pages_needed is false.
    • a thread sets vm_pageout_wanted without holding the queue lock
    • the page daemon goes to sleep thinking that it has no work to do
    • a thread enters VM_WAIT, sees that vm_pageout_wanted is true and thus does not wake up the page daemon
    • the page daemon sleeps for a full second before starting another scan, despite the fact that there are threads in VM_WAIT

Fix 1) by ensuring that we immediately start a new inactive queue scan
if v_free_count < v_free_min and threads are blocked in VM_WAIT.

Fix 2) by adding a new subroutine, pagedaemon_wait(), called from
vm_wait() and vm_waitpfault(). It wakes up the page daemon if either
vm_pageout_wanted or vm_pages_needed is false, and atomically sleeps
on v_free_count.
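The check-and-sleep pattern described for Fix 2 can be illustrated with a minimal user-space sketch. It is not the committed kernel code: a pthread mutex stands in for the free queues lock, condition variables stand in for the sleep channels, and the names pd_state, pd_init, pd_notify_locked, pd_wait_sketch and free_gen are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel state. */
struct pd_state {
	pthread_mutex_t free_mtx;   /* stands in for the free queues lock */
	pthread_cond_t daemon_cv;   /* channel the page daemon sleeps on */
	pthread_cond_t free_cv;     /* channel allocators sleep on */
	bool pageout_wanted;
	bool pages_needed;
	unsigned long free_gen;     /* assumed: bumped whenever pages are freed */
};

void
pd_init(struct pd_state *s)
{
	assert(s != NULL);
	pthread_mutex_init(&s->free_mtx, NULL);
	pthread_cond_init(&s->daemon_cv, NULL);
	pthread_cond_init(&s->free_cv, NULL);
	s->pageout_wanted = false;
	s->pages_needed = false;
	s->free_gen = 0;
}

/*
 * Caller holds free_mtx.  Wake the daemon unless both flags are already
 * set, then record that a thread is waiting.  Returns true if a wakeup
 * was sent.
 */
bool
pd_notify_locked(struct pd_state *s)
{
	bool woke = false;

	if (!s->pageout_wanted || !s->pages_needed) {
		s->pageout_wanted = true;
		pthread_cond_signal(&s->daemon_cv);
		woke = true;
	}
	s->pages_needed = true;
	return (woke);
}

/*
 * Caller holds free_mtx; returns with it released.  The notification and
 * the start of the sleep happen under one lock, so the daemon cannot
 * miss the waiter.
 */
void
pd_wait_sketch(struct pd_state *s)
{
	unsigned long gen = s->free_gen;

	(void)pd_notify_locked(s);
	while (s->free_gen == gen)
		pthread_cond_wait(&s->free_cv, &s->free_mtx);
	pthread_mutex_unlock(&s->free_mtx);
}
```

In this model, a flag that was set without the lock (the lost-wakeup case from the race above) still produces a wakeup, because pages_needed is false at that point.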

Test Plan

I used two methods to generate load on the page daemon.

The first is a program that mmaps a region of anonymous memory
two times the size of RAM. In a loop, it reads a byte from each
page in the range. It periodically moves recently touched ranges
to the inactive queue with madvise(MADV_DONTNEED) to ensure
that the page daemon doesn't need to scan the active queue.
Three instances of this program are enough to bring the page daemon
to 100% CPU.
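A minimal sketch of one pass of such a test program follows; it is not the program actually used. The function name touch_region and its parameters are hypothetical, len is assumed to be a multiple of the page size, and the caller is expected to pick len (for example, twice the size of RAM) and to call the function in a loop. Note that MADV_DONTNEED moves pages to the inactive queue on FreeBSD, while other systems may treat it differently.

```c
#define _DEFAULT_SOURCE	/* exposes MAP_ANON and madvise() on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map len bytes of anonymous memory, read one byte from each page, and
 * call madvise(MADV_DONTNEED) on each chunk-sized range once it has been
 * touched.  Returns the number of pages read, or 0 if mmap() fails.
 */
size_t
touch_region(size_t len, size_t chunk)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	volatile unsigned char sink = 0;
	unsigned char *base;
	size_t off, chunk_start = 0, touched = 0;

	assert(len > 0 && chunk > 0);
	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		return (0);

	for (off = 0; off < len; off += pagesz) {
		sink = base[off];	/* fault the page in */
		touched++;
		if (off + pagesz - chunk_start >= chunk) {
			/* Advisory only, so the return value is ignored. */
			(void)madvise(base + chunk_start,
			    off + pagesz - chunk_start, MADV_DONTNEED);
			chunk_start = off + pagesz;
		}
	}
	(void)sink;
	(void)munmap(base, len);
	return (touched);
}
```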

The second involves using truncate(1) to create a large sparse file,
and reading it back in a loop using dd(1).

Without this change, with both tests, page daemon throughput drops
off a cliff when multiple instances of the test loop are running, as a
result of the page daemon going to sleep for 1s while the test loops
are already blocked in VM_WAIT.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

markj edited the test plan for this revision.
markj edited the summary of this revision.
sys/vm/vm_pageout.c
1801 ↗(On Diff #36389)

We probably only want to follow the goto if target_met from the last scan is true. Otherwise, if the inactive queue is depleted, we'll scan too frequently and trigger the OOM killer.

sys/vm/vm_pageout.c
1801 ↗(On Diff #36389)

Yes.

  • Always pause() if the previous scan didn't meet its target.
  • Update comments.
  • Don't increment "pass" if the previous scan met its target. This has no functional effect, but lets us distinguish an inactive queue shortfall from a regular free page shortage.
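The decision the control loop makes after each scan, as described in this update and in the summary, might be sketched as a pure function. This is an illustration under stated assumptions, not the code in vm_pageout_worker(): the enum, the function name pd_next_action and its parameters are hypothetical, and the length of the pause is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pd_action {
	PD_SCAN_NOW,		/* start another inactive queue scan at once */
	PD_PAUSE_THEN_SCAN,	/* pause() briefly, then scan again */
	PD_SLEEP		/* sleep until a wakeup or the 1 s timeout */
};

/*
 * target_met:     did the previous scan free its target number of pages?
 * below_free_min: is v_free_count < v_free_min?
 * waiters:        are threads blocked in VM_WAIT?
 * pass:           incremented only when the previous scan fell short
 */
enum pd_action
pd_next_action(bool target_met, bool below_free_min, bool waiters, int *pass)
{
	assert(pass != NULL);

	if (!target_met) {
		/* Inactive queue shortfall: back off before rescanning. */
		(*pass)++;
		return (PD_PAUSE_THEN_SCAN);
	}
	if (below_free_min && waiters) {
		/* Target met, but allocators already consumed the pages. */
		return (PD_SCAN_NOW);
	}
	return (PD_SLEEP);
}
```

Keeping "pass" unchanged in the second branch is what lets a later reader of the counter tell a queue shortfall apart from an ordinary free page shortage.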
markj marked an inline comment as done. Dec 9 2017, 4:23 PM
sys/vm/vm_page.c
2664 ↗(On Diff #36413)

I presume that you made this change because the "noreturn" attribute on panic() has the same effect. (I also believe that it does.)

sys/vm/vm_pageout.h
99 ↗(On Diff #36413)

I would actually suggest removing all of these redundant uses of "extern", rather than adding a new one.

  • Remove superfluous uses of "extern".
markj added inline comments.
sys/vm/vm_page.c
2664 ↗(On Diff #36413)

Right. I verified in the disassembly of _vm_wait() that the hint has no effect.

sys/vm/vm_pageout.c
1829–1830 ↗(On Diff #36693)

More precisely, the previous scan freed the targeted number of pages, but concurrent allocations have already consumed those pages and left us with waiting threads.

As an aside, I really don't see the point of the "pass" parameter to vm_pageout_scan() anymore. Once upon a time, it controlled laundering, but now it only serves to tell us that we entered vm_pageout_scan() on the 1 second timeout. We could just as well calculate the page shortage unconditionally and let that determine whether to invoke the lowmem handlers, uma_reclaim(), and vm_swapout_run_idle().

Hypothetically, we might use it to determine that vm_pageout_scan() is on a "treadmill" with concurrent allocators consuming pages as fast as it frees them. Does this seem useful?

This revision is now accepted and ready to land. Dec 24 2017, 5:28 PM
  • Simplify a comment to avoid an inaccuracy.
  • Remove the curproc != pagedaemon assertion from pagedaemon_wait(). I think it's possible for the laundry thread to end up in VM_WAIT.
This revision now requires review to proceed. Dec 24 2017, 6:12 PM
In D13424#284415, @alc wrote:

As an aside, I really don't see the point of the "pass" parameter to vm_pageout_scan() anymore. Once upon a time, it controlled laundering, but now it only serves to tell us that we entered vm_pageout_scan() on the 1 second timeout. We could just as well calculate the page shortage unconditionally and let that determine whether to invoke the lowmem handlers, uma_reclaim(), and vm_swapout_run_idle().

Hypothetically, we might use it to determine that vm_pageout_scan() is on a "treadmill" with concurrent allocators consuming pages as fast as it frees them. Does this seem useful?

I would consider going a bit further and lifting the computation of the target out of vm_pageout_scan(), but that's probably best left to a separate change.

I do think it'd be useful to be able to recognize a "treadmilling" scenario and dynamically increase the target in response. In OneFS, the page daemon target is given by the output of a PID controller with the free page target as the setpoint, so we already have something like this. (And I think Jeff is planning to look at integrating it later?)
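For readers unfamiliar with the idea, a generic textbook PID step with the free page target as the setpoint could look like the sketch below. It says nothing about how OneFS implements its controller: the struct, the function name pid_step, the gains and the clamping to zero are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical controller state; gains are chosen by the caller. */
struct pid_ctl {
	double kp, ki, kd;	/* proportional, integral, derivative gains */
	double integral;	/* accumulated error */
	double prev_err;	/* error seen on the previous step */
};

/*
 * One controller step.  setpoint is the free page target, measured is the
 * current free page count, dt is the time since the last step.  Returns a
 * page daemon target, clamped so that it is never negative.
 */
double
pid_step(struct pid_ctl *c, double setpoint, double measured, double dt)
{
	double err, deriv, out;

	assert(c != NULL && dt > 0.0);
	err = setpoint - measured;
	c->integral += err * dt;
	deriv = (err - c->prev_err) / dt;
	c->prev_err = err;

	out = c->kp * err + c->ki * c->integral + c->kd * deriv;
	return (out < 0.0 ? 0.0 : out);
}
```

In a treadmill scenario the error stays positive across steps, so the integral term keeps growing and raises the target, which is the dynamic increase discussed above.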

alc added inline comments.
sys/vm/vm_pageout.c
1974 ↗(On Diff #36970)

Is there any reason not to perform the assignment to vm_pageout_wanted here, i.e., conditionally?

This revision is now accepted and ready to land. Dec 24 2017, 6:20 PM
sys/vm/vm_pageout.c
1974 ↗(On Diff #36970)

Nope. I'll change that before committing.

This revision was automatically updated to reflect the committed changes.