Eliminate "pass" from vm_pageout_scan() and vm_pageout_worker()
Needs Review · Public

Authored by alc on Dec 26 2017, 7:10 PM.

Details

Reviewers
kib
markj
Summary

Before the introduction of PQ_LAUNDRY, "pass" controlled how aggressively we laundered pages. Now, it really serves little purpose: If the page daemon wakes up because of a timeout, then we do not even look at the number of free pages. We only scan the active queue.

With the elimination of "pass", I want to point out that the page daemon will attempt to bring the number of free pages up to the target no matter how it was awakened. I do not think that that is a bad idea.

Event Timeline

With the elimination of "pass", I want to point out that the page daemon will attempt to bring the number of free pages up to the target no matter how it was awakened. I do not think that that is a bad idea.

The difference vm_cnt.v_free_target - vm_pageout_wakeup_thresh scales linearly with the total amount of memory and becomes large on systems with lots of memory. With 16GB of RAM it's 240MB. With this change, we effectively lose the ability to cache that much more memory since the page daemon will be freeing pages more aggressively, so this would impact workloads with a working set in that region. It's arguably not a major problem, but I don't really see a corresponding benefit?
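As a quick way to see these numbers on a particular machine, the relevant counters can be read from userland with sysctlbyname(). The following is only an illustrative sketch; it assumes the vm.stats.vm.v_free_target and vm.pageout_wakeup_thresh sysctls (both counted in pages).

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        u_int free_target;
        int wakeup_thresh;
        size_t len;
        long pgsz;

        len = sizeof(free_target);
        if (sysctlbyname("vm.stats.vm.v_free_target", &free_target, &len,
            NULL, 0) == -1)
                err(1, "vm.stats.vm.v_free_target");
        len = sizeof(wakeup_thresh);
        if (sysctlbyname("vm.pageout_wakeup_thresh", &wakeup_thresh, &len,
            NULL, 0) == -1)
                err(1, "vm.pageout_wakeup_thresh");
        pgsz = sysconf(_SC_PAGESIZE);

        /* Convert page counts to megabytes to make the gap easy to read. */
        printf("free target:      %ju MB\n",
            (uintmax_t)free_target * pgsz / (1024 * 1024));
        printf("wakeup threshold: %ju MB\n",
            (uintmax_t)wakeup_thresh * pgsz / (1024 * 1024));
        printf("gap:              %ju MB\n",
            ((uintmax_t)free_target - (uintmax_t)wakeup_thresh) * pgsz /
            (1024 * 1024));
        return (0);
}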

Perhaps this is a more generic question: on a machine with 128 GB of RAM, I have a free_target of 2.7 GB. Is that reasonable at all?

The difference vm_cnt.v_free_target - vm_pageout_wakeup_thresh scales linearly with the total amount of memory and becomes large on systems with lots of memory. With 16GB of RAM it's 240MB. With this change, we effectively lose the ability to cache that much more memory since the page daemon will be freeing pages more aggressively, so this would impact workloads with a working set in that region. It's arguably not a major problem, but I don't really see a corresponding benefit?

I'm happy to change this.

In D13644#285200, @kib wrote:

Perhaps this is a more generic question: on a machine with 128 GB of RAM, I have a free_target of 2.7 GB. Is that reasonable at all?

No. I would like to see the target vary, perhaps based on the rate at which we are allocating pages.

Don't free any pages unless the number of free pages is below the wakeup threshold for the page daemon.

Regardless of whether we eliminate "pass", I believe that there is another issue, albeit a minor one, that ought to be addressed. When we perform a "pass == 0" scan, I believe that we always return true from vm_pageout_scan(). (Please check me on this claim!) Suppose that in the midst of that scan pagedaemon_wakeup() was called. Here is the issue. Consider the following snippet.

/*                                                               
 * Do not clear vm_pageout_wanted until we reach our free page   
 * target.  Otherwise, we may be awakened over and over again,   
 * wasting CPU time.                                             
 */
if (vm_pageout_wanted && target_met)
        vm_pageout_wanted = false;

This is going to reset vm_pageout_wanted to false, mtx_sleep() is going to be called, and the page daemon won't perform a "pass == 1" scan until another page allocation occurs and wakes it up. If we are extremely unlucky and another pagedaemon_wakeup() call occurs after vm_pageout_wanted is reset but before mtx_sleep(), then the page daemon might sleep until pagedaemon_wait() is called.

The following version would address the issue that I raised yesterday without eliminating "pass". That said, I am no longer incrementing it. Its value is either 0 or 1.

while (TRUE) {
        mtx_lock(&vm_page_queue_free_mtx);
        /*
         * Wake threads waiting for free pages once the free count is
         * no longer at its minimum.
         */
        if (vm_pages_needed && !vm_page_count_min()) {
                vm_pages_needed = false;
                wakeup(&vm_cnt.v_free_count);
        }
        if (!target_met) {
                /*
                 * The previous targeted scan fell short of the free
                 * page target; pause briefly and scan again.
                 */
                mtx_unlock(&vm_page_queue_free_mtx);
                MPASS(pass == 1);
                pause("pwait", hz / VM_INACT_SCAN_RATE);
        } else if ((pass == 0 && vm_pageout_wanted) ||
            vm_pages_needed) {
                /*
                 * A wakeup arrived during or after the last
                 * timeout-driven scan, or threads are still waiting
                 * for free pages; start a targeted scan immediately.
                 */
                mtx_unlock(&vm_page_queue_free_mtx);
                pass = 1;
        } else {
                /*
                 * Sleep until awakened or the one-second timeout
                 * expires.  Being awakened calls for a targeted scan;
                 * a timeout calls for a scan of the active queue only.
                 */
                vm_pageout_wanted = false;
                if (mtx_sleep(&vm_pageout_wanted,
                    &vm_page_queue_free_mtx, PDROP | PVM, "psleep",
                    hz) == 0) {
                        VM_CNT_INC(v_pdwakeups);
                        pass = 1;
                } else
                        pass = 0;
        }
        target_met = vm_pageout_scan(domain, pass);
}
In D13644#285437, @alc wrote:

The following version would address the issue that I raised yesterday without eliminating "pass". That said, I am no longer incrementing it. Its value is either 0 or 1.

Hm, if we lift the computation of the target into vm_pageout_worker(), I think we could address this problem without keeping "pass"?

That said, even with the solution you suggested, we still have a problem: a thread might set vm_pageout_wanted to true and call wakeup() after the page daemon clears vm_pageout_wanted, but before it has gone to sleep. Then we won't reclaim pages until the 1s timeout elapses (and we perform the target computation upon every wakeup as in the current version of the diff), or a thread calls pagedaemon_wait(). I'm not really sure how to close this race while keeping pagedaemon_wakeup() inexpensive in the common case. Perhaps it would be reasonable to rewrite pagedaemon_wakeup() along these lines?

if (!vm_pageout_wanted && curthread->td_proc != pageproc) {
    mtx_lock(&vm_page_queue_free_mtx);
    if (!vm_pageout_wanted) {
        vm_pageout_wanted = true;
        wakeup(&vm_pageout_wanted);
    }
    mtx_unlock(&vm_page_queue_free_mtx);
}
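
To see why doing the flag update and the wakeup() under the same lock closes this window, here is a minimal userland analogue using pthreads. It is only a sketch, and the names request_work() and worker() are placeholders, not anything in the tree; the point is that pthread_cond_wait(), like mtx_sleep() with its interlock, releases the lock and blocks in one atomic step, so a requester that takes the same lock can never set the flag and signal between the worker's last check of the flag and the worker actually going to sleep.

#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cv = PTHREAD_COND_INITIALIZER;
static bool work_wanted = false;

/* Requester side: set the flag and signal while holding the lock. */
static void
request_work(void)
{
        pthread_mutex_lock(&lock);
        if (!work_wanted) {
                work_wanted = true;
                pthread_cond_signal(&work_cv);
        }
        pthread_mutex_unlock(&lock);
}

/* Worker side: the flag is checked, and cleared, under the same lock. */
static void *
worker(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        for (;;) {
                /*
                 * pthread_cond_wait() drops the lock and blocks
                 * atomically, so a request cannot slip in between the
                 * check below and the worker going to sleep.
                 */
                while (!work_wanted)
                        pthread_cond_wait(&work_cv, &lock);
                work_wanted = false;
                pthread_mutex_unlock(&lock);
                /* ... perform one round of work without the lock ... */
                pthread_mutex_lock(&lock);
        }
}

int
main(void)
{
        pthread_t td;

        pthread_create(&td, NULL, worker, NULL);
        request_work();         /* The request cannot be lost. */
        sleep(1);
        return (0);
}

The same reasoning applies to the proposed pagedaemon_wakeup(): the unlocked pre-check keeps the common case cheap, and the locked re-check plus wakeup() makes the hand-off race-free.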
In D13644#285437, @alc wrote:

The following version would address the issue that I raised yesterday without eliminating "pass". That said, I am no longer incrementing it. Its value is either 0 or 1.

Hm, if we lift the computation of the target into vm_pageout_worker(), I think we could address this problem without keeping "pass"?

I still need to think about this question, as I wanted to respond promptly to the following.

That said, even with the solution you suggested, we still have a problem: a thread might set vm_pageout_wanted to true and call wakeup() after the page daemon clears vm_pageout_wanted, but before it has gone to sleep. Then we won't reclaim pages until the 1s timeout elapses (and we perform the target computation upon every wakeup as in the current version of the diff), or a thread calls pagedaemon_wait(). I'm not really sure how to close this race while keeping pagedaemon_wakeup() inexpensive in the common case. Perhaps it would be reasonable to rewrite pagedaemon_wakeup() along these lines?

I don't see a better option for dealing with the remaining race than your proposed change to pagedaemon_wakeup().

Combined, won't these changes eliminate the need for the "|| !vm_pages_needed)"-triggered wakeup in pagedaemon_wait()?

In D13644#285475, @alc wrote:

I don't see a better option for dealing with the remaining race than your proposed change to pagedaemon_wakeup().

Combined, won't these changes eliminate the need for the "|| !vm_pages_needed)"-triggered wakeup in pagedaemon_wait()?

I think you're right.

I have updated this patch according to yesterday's discussion. However, I have not yet updated any of the comments to reflect any of the changes in the code.

I left this patch running overnight on a machine that was configured to perform a moderate amount of paging.

Is there any reason not to commit the part that makes vm_pageout_wanted static?

In D13644#285630, @alc wrote:

I left this patch running overnight on a machine that was configured to perform a moderate amount of paging.

Is there any reason not to commit the part that makes vm_pageout_wanted static?

Sorry for the delay. This version looks ok to me, modulo the comments.

I believe this revision can be closed.

What about FreeBSD 11.x?