FreeBSD

Fix two problems with the page daemon control loop.
Closed, Public

Authored by markj on Dec 8 2017, 4:49 PM.

Details

Summary

Both problems arise when multiple applications are consuming free pages at
a high rate (e.g., more than 1 GB/s) and the inactive queue contains plenty
of clean pages.

  1. After completing an inactive queue scan, we might still be below the v_free_min threshold, in which case there are threads still sleeping in VM_WAIT.
  2. vm_pageout_wanted may be set without the free queues lock held. This can lead to a race:
    • the page daemon checks vm_pageout_wanted in vm_pageout_worker() and sees that it is false; vm_pages_needed is also false
    • a thread sets vm_pageout_wanted without holding the free queues lock
    • the page daemon goes to sleep thinking that it has no work to do
    • a thread enters VM_WAIT, sees that vm_pageout_wanted is true, and thus does not wake up the page daemon
    • the page daemon sleeps for a full second before starting another scan, despite the fact that there are threads in VM_WAIT
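The interleaving in 2) can be replayed deterministically in a toy model. This is a user-space sketch, not kernel code: struct pd_state, deliver_wakeup(), vm_wait_old(), vm_wait_new() and replay() are hypothetical names, and the "old" waiter logic is assumed from the description above (skip the wakeup whenever vm_pageout_wanted is already set).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the shared page daemon state. */
struct pd_state {
	bool pageout_wanted;	/* models vm_pageout_wanted */
	bool pages_needed;	/* models vm_pages_needed */
	bool daemon_asleep;	/* page daemon is in its 1 s timed sleep */
	int wakeups;		/* wakeups that actually reached a sleeping daemon */
};

/* A wakeup only has an effect if the daemon is already asleep. */
static void
deliver_wakeup(struct pd_state *s)
{
	if (s->daemon_asleep) {
		s->wakeups++;
		s->daemon_asleep = false;
	}
}

/* Assumed old VM_WAIT behaviour: no wakeup if vm_pageout_wanted is set. */
static void
vm_wait_old(struct pd_state *s)
{
	if (!s->pageout_wanted) {
		s->pageout_wanted = true;
		deliver_wakeup(s);
	}
	s->pages_needed = true;
}

/* Behaviour described for pagedaemon_wait(): wake if either flag is clear. */
static void
vm_wait_new(struct pd_state *s)
{
	if (!s->pageout_wanted || !s->pages_needed) {
		s->pageout_wanted = true;
		deliver_wakeup(s);
	}
	s->pages_needed = true;
}

/* Replays the race; returns true if a VM_WAIT thread is left with the daemon asleep. */
static bool
replay(void (*enter_vm_wait)(struct pd_state *))
{
	struct pd_state s = { false, false, false, 0 };
	bool has_work;

	/* Page daemon: both flags are false, so it concludes there is no work. */
	has_work = s.pageout_wanted || s.pages_needed;
	/* Another thread sets vm_pageout_wanted unlocked; its wakeup finds no sleeper. */
	s.pageout_wanted = true;
	deliver_wakeup(&s);
	/* Page daemon goes to sleep based on its stale check. */
	if (!has_work)
		s.daemon_asleep = true;
	/* A thread enters VM_WAIT. */
	enter_vm_wait(&s);
	return s.daemon_asleep;
}
```

With the old logic the daemon stays asleep for the full timeout; with the new check the vm_pages_needed == false case forces a wakeup.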

Fix 1) by ensuring that we immediately start a new inactive queue scan
if v_free_count < v_free_min and threads are blocked in VM_WAIT.
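As a minimal sketch, fix 1) reduces to a predicate evaluated after each scan. pd_rescan_now() is a hypothetical name; the real check lives in the page daemon's control loop and operates on the kernel counters, not on plain arguments.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate for fix 1): start another inactive queue scan
 * right away, instead of sleeping, if the free page count is still below
 * v_free_min and threads are blocked in VM_WAIT (vm_pages_needed).
 */
static bool
pd_rescan_now(unsigned free_count, unsigned free_min, bool pages_needed)
{
	return free_count < free_min && pages_needed;
}
```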

Fix 2) by adding a new subroutine, pagedaemon_wait(), called from
vm_wait() and vm_waitpfault(). It wakes up the page daemon if either
vm_pageout_wanted or vm_pages_needed is false, and atomically sleeps
on v_free_count.
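A user-space analogue of pagedaemon_wait() can be sketched with POSIX threads, assuming free_mtx stands in for the free queues lock, daemon_cv for the vm_pageout_wanted wait channel and free_cv for the v_free_count wait channel (all hypothetical names). pthread_cond_wait() releases the mutex and sleeps atomically, which is the property that closes the race. One difference: the kernel sleep uses PDROP and returns unlocked, while pthread_cond_wait() reacquires the mutex, so the caller simply rechecks in a loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t free_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t daemon_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t free_cv = PTHREAD_COND_INITIALIZER;
static bool pageout_wanted, pages_needed;
static unsigned free_count, free_min = 4;

/* Called and returns with free_mtx held. */
static void
pagedaemon_wait_model(void)
{
	/* Possibly spurious, but never lost: wake the daemon if either flag is clear. */
	if (!pageout_wanted || !pages_needed) {
		pageout_wanted = true;
		pthread_cond_signal(&daemon_cv);
	}
	pages_needed = true;
	/* Atomically drop free_mtx and sleep on the "free count" channel. */
	pthread_cond_wait(&free_cv, &free_mtx);
}

/* Toy page daemon: waits for a request, "frees" 8 pages, wakes waiters. */
static void *
daemon_main(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&free_mtx);
	while (!pageout_wanted)
		pthread_cond_wait(&daemon_cv, &free_mtx);
	free_count += 8;
	pageout_wanted = pages_needed = false;
	pthread_cond_broadcast(&free_cv);
	pthread_mutex_unlock(&free_mtx);
	return NULL;
}

/* Allocates one page, sleeping while below free_min; returns pages left. */
static unsigned
alloc_page_model(void)
{
	unsigned left;

	pthread_mutex_lock(&free_mtx);
	while (free_count < free_min)
		pagedaemon_wait_model();
	assert(free_count > 0);
	left = --free_count;
	pthread_mutex_unlock(&free_mtx);
	return left;
}
```

Whichever thread takes free_mtx first, the daemon either sees pageout_wanted already set or is sleeping when the signal is sent, so the allocation completes without a timeout.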

Test Plan

I used two methods to generate load on the page daemon.

The first is a program that mmaps a region of anonymous memory
two times the size of RAM. In a loop, it reads a byte from each
page in the range. It periodically moves recently touched ranges
to the inactive queue with madvise(MADV_DONTNEED) to ensure
that the page daemon doesn't need to scan the active queue.
Three instances of this program are enough to bring the page daemon
to 100% CPU.
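The first load generator is not included in the revision; the following is a hypothetical reconstruction of its core loop. touch_loop() and its parameters are assumptions, and the size is a parameter so it can be run small (the test described above used twice the size of RAM). Note that MADV_DONTNEED is only a deactivation hint on FreeBSD, whereas on Linux it discards private anonymous pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical load loop: map `len` bytes of anonymous memory, read one
 * byte per page for `rounds` rounds, and after every `chunk_pages` pages
 * hand the just-touched range back with madvise(MADV_DONTNEED).
 * Returns the sum of bytes read (0 for untouched anonymous memory), or -1.
 */
static long
touch_loop(size_t len, int rounds, size_t chunk_pages)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	volatile unsigned char *base;
	size_t pg, npages;
	long sum = 0;
	void *p;

	assert(pgsz > 0 && chunk_pages > 0);
	pg = (size_t)pgsz;
	npages = len / pg;
	p = mmap(NULL, len, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	base = p;
	for (int r = 0; r < rounds; r++) {
		for (size_t i = 0; i < npages; i++) {
			sum += base[i * pg];	/* fault in / reference one page */
			if ((i + 1) % chunk_pages == 0) {
				size_t start = (i + 1 - chunk_pages) * pg;
				/* On FreeBSD: move the range to the inactive queue. */
				(void)madvise((char *)p + start, chunk_pages * pg,
				    MADV_DONTNEED);
			}
		}
	}
	(void)munmap(p, len);
	return sum;
}
```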

The second uses truncate(1) to create a large sparse file and dd(1) to
read it back in a loop.
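For reference, the same access pattern can be expressed in C: ftruncate(2) stands in for truncate(1) and a pread(2) loop for dd(1). sparse_read_loop() and the path and sizes in the usage are hypothetical; to load the page daemon the file would need to be much larger than RAM so that every pass fills memory with clean page cache pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Creates a sparse file of `size` bytes and reads it `loops` times in
 * `bs`-byte blocks. Returns total bytes read, or -1 on error. */
static long long
sparse_read_loop(const char *path, off_t size, size_t bs, int loops)
{
	long long total = 0;
	char *buf;
	int fd;

	assert(bs > 0 && size >= 0);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	/* Like truncate(1): set the length without writing data (a hole). */
	if (ftruncate(fd, size) != 0) {
		close(fd);
		return -1;
	}
	buf = malloc(bs);
	if (buf == NULL) {
		close(fd);
		return -1;
	}
	for (int i = 0; i < loops && total >= 0; i++) {
		off_t off = 0;
		ssize_t n;

		/* Like dd(1) with a fixed block size: read sequentially to EOF. */
		while ((n = pread(fd, buf, bs, off)) > 0) {
			total += n;
			off += n;
		}
		if (n < 0)
			total = -1;
	}
	free(buf);
	close(fd);
	return total;
}
```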

Without this change, page daemon throughput in both tests drops off a
cliff when multiple instances of the test loop are running, because the
page daemon goes to sleep for 1s while the test loops are already
blocked in VM_WAIT.

Diff Detail

Lint: Passed
Unit: No Test Coverage
Build Status: Buildable 13438 (Build 13668: arc lint + arc unit)

Event Timeline

markj edited the test plan for this revision.
markj edited the summary of this revision.
sys/vm/vm_pageout.c
1801

We probably only want to follow the goto if target_met from the last scan is true. Otherwise, if the inactive queue is depleted, we'll scan too frequently and trigger the OOM killer.

sys/vm/vm_pageout.c
1801

Yes.

  • Always pause() if the previous scan didn't meet its target.
  • Update comments.
  • Don't increment "pass" if the previous scan met its target. This has no functional effect, but lets us distinguish an inactive queue shortfall from a regular free page shortage.
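The control-loop decision after the review feedback can be sketched as a small state function. This is a hypothetical simplification (pd_after_scan() and enum pd_next are invented names, and resetting pass to 0 when going back to sleep is an assumption): rescan immediately only if the previous scan met its target but threads are still waiting, always pause otherwise, and leave pass untouched in the target-met case.

```c
#include <assert.h>
#include <stdbool.h>

enum pd_next { PD_SCAN_NOW, PD_PAUSE, PD_SLEEP };

static enum pd_next
pd_after_scan(bool target_met, unsigned free_count, unsigned free_min,
    bool pages_needed, int *pass)
{
	bool shortage = free_count < free_min && pages_needed;

	if (!shortage) {
		*pass = 0;		/* assumed: back to the idle sleep */
		return PD_SLEEP;
	}
	if (target_met)
		return PD_SCAN_NOW;	/* allocators consumed what we freed */
	/* Inactive queue shortfall: avoid rescanning in a tight loop. */
	(*pass)++;
	return PD_PAUSE;
}
```

Gating the immediate rescan on target_met is what keeps a depleted inactive queue from being scanned back to back, which could otherwise trigger the OOM killer.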
markj marked an inline comment as done. Dec 9 2017, 4:23 PM
sys/vm/vm_page.c
2664

I presume that you made this change because the "noreturn" attribute on panic() has the same effect. (I also believe that it does.)

sys/vm/vm_pageout.h
99

I would actually suggest removing all of these redundant uses of "extern", rather than adding a new one.

  • Remove superfluous uses of "extern".
markj added inline comments.
sys/vm/vm_page.c
2664

Right. I verified in the disassembly of _vm_wait() that the hint has no effect.

sys/vm/vm_pageout.c
1836–1837

More precisely, the previous scan freed the targeted number of pages, but concurrent allocations have already consumed those pages and left us with waiting threads.

As an aside, I really don't see the point of the "pass" parameter to vm_pageout_scan() anymore. Once upon a time, it controlled laundering, but now it only serves to tell us that we entered vm_pageout_scan() on the 1 second timeout. We could just as well calculate the page shortage unconditionally and let that determine whether to invoke the lowmem handlers, uma_reclaim(), and vm_swapout_run_idle().

Hypothetically, we might use it to determine that vm_pageout_scan() is on a "treadmill" with concurrent allocators consuming pages as fast as it frees them. Does this seem useful?
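One hypothetical way to recognize the "treadmill": count consecutive scans that met their target yet still left threads in VM_WAIT. struct treadmill, treadmill_update() and the threshold are invented for illustration; nothing like this exists in the patch.

```c
#include <assert.h>
#include <stdbool.h>

struct treadmill {
	unsigned streak;	/* consecutive target-met-but-still-short scans */
	unsigned threshold;	/* streak length treated as a treadmill */
};

/* Returns true once allocators appear to consume pages as fast as we free them. */
static bool
treadmill_update(struct treadmill *t, bool target_met, bool pages_needed)
{
	if (target_met && pages_needed)
		t->streak++;
	else
		t->streak = 0;
	return t->streak >= t->threshold;
}
```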

This revision is now accepted and ready to land. Dec 24 2017, 5:28 PM
  • Simplify a comment to avoid an inaccuracy.
  • Remove the curproc != pagedaemon assertion from pagedaemon_wait(). I think it's possible for the laundry thread to end up in VM_WAIT.
This revision now requires review to proceed. Dec 24 2017, 6:12 PM
Replying to @alc's comment in D13424#284415 about the "pass" parameter:

I would consider going a bit further and lifting the computation of the target out of vm_pageout_scan(), but that's probably best left to a separate change.

I do think it'd be useful to be able to recognize a "treadmilling" scenario and dynamically increase the target in response. In OneFS, the page daemon target is given by the output of a PID controller with the free page target as the setpoint, so we already have something like this. (And I think Jeff is planning to look at integrating it later?)
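A PID controller of the kind described, with the free page target as the setpoint, might look like the sketch below. The gains and the names struct pd_pid and pd_pid_target() are placeholders, not the OneFS implementation; it assumes a fixed sampling interval (folded into the gains) and has no anti-windup.

```c
#include <assert.h>

struct pd_pid {
	double kp, ki, kd;	/* hypothetical gains */
	double integral;	/* accumulated error */
	double prev_error;	/* error at the previous sample */
};

/* Returns the number of pages the next scan should try to free. */
static long
pd_pid_target(struct pd_pid *c, long setpoint, long free_count)
{
	double error = (double)(setpoint - free_count);
	double derivative = error - c->prev_error;
	double out;

	c->integral += error;
	c->prev_error = error;
	out = c->kp * error + c->ki * c->integral + c->kd * derivative;
	return out > 0 ? (long)out : 0;	/* never request a negative target */
}
```

On a treadmill the error persists from scan to scan, so the integral term grows and the per-scan target rises; e.g., with kp = 1, ki = 0.5, kd = 0, a steady shortfall of 400 pages yields targets of 600 and then 800.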

alc added inline comments.
sys/vm/vm_pageout.c
1978

Is there any reason not to perform the assignment to vm_pageout_wanted here, i.e., conditionally?

This revision is now accepted and ready to land. Dec 24 2017, 6:20 PM
sys/vm/vm_pageout.c
1978

Nope. I'll change that before committing.

This revision was automatically updated to reflect the committed changes.