Page MenuHomeFreeBSD

Eliminate weighted handling of pages in vm_page_advise().
ClosedPublic

Authored by markj on Aug 17 2015, 6:51 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Apr 29, 3:54 AM
Unknown Object (File)
Wed, Apr 24, 11:33 AM
Unknown Object (File)
Wed, Apr 24, 10:15 AM
Unknown Object (File)
Mar 16 2024, 12:04 PM
Unknown Object (File)
Mar 16 2024, 12:04 PM
Unknown Object (File)
Mar 2 2024, 4:40 PM
Unknown Object (File)
Feb 19 2024, 10:11 PM
Unknown Object (File)
Jan 9 2024, 7:42 PM
Subscribers

Details

Summary

This change also modifies _vm_page_deactivate() to requeue
already-inactive pages at the head of the inactive queue, if the caller
requests that it bypass regular LRU operation.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj retitled this revision from to Eliminate weighted handling of pages in vm_page_advise()..
markj edited the test plan for this revision. (Show Details)
markj updated this object.
sys/vm/vm_page.c
2545 ↗(On Diff #8013)

As we update code, I want to replace these old style asserts that expose the lock implementation details with:

vm_page_assert_locked(m);

that doesn't.

2564 ↗(On Diff #8013)

This flag should only be cleared when a page is moving to the inactive queue from elsewhere. If the page is already in the inactive queue and dirty, clearing this flag will actually cause the page to be moved to the tail of the queue rather than being written to swap.

I think that you should reinsert this statement between the else clause's dequeue and pagequeue_lock operations.

2769–2772 ↗(On Diff #8013)

I believe that the pages should be unconditionally deactivated, so remove this "if" statement. Think about it this way: madvise(..., MADV_DONTNEED) is making an assertion about the future usage of the page and so it should invalidate any past usage information.

2788 ↗(On Diff #8013)

The old code simply deactivated dirty pages, and for now I'm not sure that we should change that without experimental validation. With that in mind, let's replace "1" by "m->dirty == 0". When pages get kicked out of the buffer map, they are clean, so this won't impact fadvise("dont need").

sys/vm/vm_page.c
2769–2772 ↗(On Diff #8013)

That makes sense to me. That line of thinking would seem to imply that we should set act_count = 0 outright, as is done in the MADV_FREE case. Is there any reason not to do that here? I'm not sure that it would have much of an effect, since act_count is mostly relevant for active pages.

Address review comments:

  • use vm_page_assert_locked() instead of vm_page_lock_assert()
  • don't clear PG_WINATCFLS if the page is already inactive
  • only place clean pages at the head of the inactive queue; dirty pages should still go to the tail
  • DONTNEED should deactivate the page regardless of the current act_count
sys/vm/vm_page.c
2765 ↗(On Diff #8076)

To answer your question below, delete this line too. This line would only make sense if the page were to remain active.

2767–2768 ↗(On Diff #8076)

See my comment on line 2765.

Don't set act_count to 0 before deactivating.

markj added inline comments.
sys/vm/vm_page.c
2764–2765 ↗(On Diff #8094)

Thanks, that addresses my mild confusion. :)

Can you draft a commit message and post it here? I'm running a test involving madvise(2). If the results of the test are as I expect, then this patch is ready for HEAD.

alc edited edge metadata.

I configured a machine to have 2.5GB of RAM through /boot/loader.conf and ran the following program on a non-debug kernel, e.g., no INVARIANTS compiled in:

#include <sys/types.h>
#include <sys/mman.h>

#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        size_t i, len, supersize;
        char *ptr;
        int accum;

        len = (size_t)128 * 1024 * 1024 * 1024;
        ptr = mmap(0, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
        if (ptr == MAP_FAILED)
                err(1, "mmap");
        supersize = (size_t)2 * 1024 * 1024;
        accum = 0;
        for (i = 0; i < len; i++) {
                if ((i & (supersize - 1)) == 0 && i > supersize) {
                        if (madvise(&ptr[i - supersize], supersize, MADV_FREE) == -1)
                                err(1, "madvise");
                }
                accum += ptr[i] = i;
        }
        printf("accum: %d\nptr: %p\nsize: %ld\n", accum, ptr, len);
}

Without your patch applied, I see:

# time ./mmap_advise1
accum: 0
ptr: 0x801000000
size: 137438953472
162.500u 81.108s 4:03.61 99.9%  5+167k 0+0io 0pf+0w
# pstat -s
Device          1K-blocks     Used    Avail Capacity
/dev/ada0s3b     33554432      516 33553916     0%
# sysctl vm.phys_free
vm.phys_free: 
DOMAIN 0:

FREE LIST 0:

  ORDER (SIZE)  |  NUMBER
                |  POOL 0  |  POOL 1
--            -- --      -- --      --
  12 ( 16384K)  |       0  |       0
  11 (  8192K)  |       2  |       0
  10 (  4096K)  |       4  |       0
   9 (  2048K)  |     232  |       0
   8 (  1024K)  |     699  |       0
   7 (   512K)  |    1018  |       0
   6 (   256K)  |    1174  |       2
   5 (   128K)  |    1212  |       7
   4 (    64K)  |     913  |     158
   3 (    32K)  |    1048  |     562
   2 (    16K)  |     925  |    1311
   1 (     8K)  |     829  |    1432
   0 (     4K)  |     709  |    1757

FREE LIST 1:

  ORDER (SIZE)  |  NUMBER
                |  POOL 0  |  POOL 1
--            -- --      -- --      --
  12 ( 16384K)  |       0  |       0
  11 (  8192K)  |       0  |       0
  10 (  4096K)  |       0  |       0
   9 (  2048K)  |       0  |       0
   8 (  1024K)  |       0  |       0
   7 (   512K)  |       0  |       0
   6 (   256K)  |       2  |       1
   5 (   128K)  |       4  |       0
   4 (    64K)  |       5  |       0
   3 (    32K)  |       2  |       1
   2 (    16K)  |       2  |       1
   1 (     8K)  |       3  |       1
   0 (     4K)  |       0  |       0

With your patch applied, I see:

# time ./mmap_advise1
accum: 0
ptr: 0x801000000
size: 137438953472
159.558u 81.188s 4:00.75 99.9%  5+167k 0+0io 0pf+0w
# pstat -s
Device          1K-blocks     Used    Avail Capacity
/dev/ada0s3b     33554432        0 33554432     0%
# sysctl vm.phys_free
vm.phys_free: 
DOMAIN 0:

FREE LIST 0:

  ORDER (SIZE)  |  NUMBER
                |  POOL 0  |  POOL 1
--            -- --      -- --      --
  12 ( 16384K)  |     134  |       0
  11 (  8192K)  |       1  |       0
  10 (  4096K)  |       1  |       0
   9 (  2048K)  |       4  |       0
   8 (  1024K)  |      33  |       0
   7 (   512K)  |     103  |       0
   6 (   256K)  |     106  |       1
   5 (   128K)  |     112  |       0
   4 (    64K)  |     121  |       0
   3 (    32K)  |     145  |      26
   2 (    16K)  |     223  |     604
   1 (     8K)  |     263  |     646
   0 (     4K)  |     329  |     683

FREE LIST 1:

  ORDER (SIZE)  |  NUMBER
                |  POOL 0  |  POOL 1
--            -- --      -- --      --
  12 ( 16384K)  |       0  |       0
  11 (  8192K)  |       0  |       0
  10 (  4096K)  |       0  |       0
   9 (  2048K)  |       0  |       0
   8 (  1024K)  |       1  |       0
   7 (   512K)  |       0  |       0
   6 (   256K)  |       2  |       0
   5 (   128K)  |       1  |       0
   4 (    64K)  |       2  |       0
   3 (    32K)  |       0  |       0
   2 (    16K)  |       0  |       0
   1 (     8K)  |       1  |       0
   0 (     4K)  |       0  |       0

Notice three things: With your patch, (1) the program takes slightly less time to run, (2) no other pages are pushed out to swap, and (3) free memory is much less fragmented. (Because of (3), the old vm_page_advise() was making superpage usage less likely.)

This revision is now accepted and ready to land.Aug 21 2015, 8:17 PM

Thanks! Here's a proposed commit log message:

Remove weighted page handling from vm_page_advise().

This was added in r51337 as part of the implementation of
madvise(MADV_DONTNEED). It attempted to avoid offloading work onto the
pagedaemon by distributing pages among the various page queues according to
a fixed weighting.

Now that the pagedaemon performs steady scanning of the active page queue,
this weighted handling is unnecessary. Instead, always "cache" clean pages
by moving them to the head of the inactive page queue. This simplifies the
implementation of vm_page_advise() and eliminates the fragmentation that
resulted from the distribution of pages among multiple queues.

Suggested by: alc
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3401

Overall, it looks good. I've made one comment below.

In D3401#70933, @markj wrote:

Thanks! Here's a proposed commit log message:

Remove weighted page handling from vm_page_advise().

This was added in r51337 as part of the implementation of
madvise(MADV_DONTNEED). It attempted to avoid offloading work onto the
pagedaemon by distributing pages among the various page queues according to
a fixed weighting.

I would change the second sentence of the previous paragraph. In particular, I would replace "... offloading work ...." with something like: "Its objective was to ensure that the page daemon would still reclaim other unreferenced pages (i.e., unreferenced pages not touched by madvise()) from the active queue."

Now that the pagedaemon performs steady scanning of the active page queue,
this weighted handling is unnecessary. Instead, always "cache" clean pages
by moving them to the head of the inactive page queue. This simplifies the
implementation of vm_page_advise() and eliminates the fragmentation that
resulted from the distribution of pages among multiple queues.

Suggested by: alc
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3401

In D3401#71324, @alc wrote:

Overall, it looks good. I've made one comment below.

In D3401#70933, @markj wrote:

Thanks! Here's a proposed commit log message:

Remove weighted page handling from vm_page_advise().

This was added in r51337 as part of the implementation of
madvise(MADV_DONTNEED). It attempted to avoid offloading work onto the
pagedaemon by distributing pages among the various page queues according to
a fixed weighting.

I would change the second sentence of the previous paragraph. In particular, I would replace "... offloading work ...." with something like: "Its objective was to ensure that the page daemon would still reclaim other unreferenced pages (i.e., unreferenced pages not touched by madvise()) from the active queue."

Ah, I see, that makes more sense. How about substituting your suggested sentence, but changing "still reclaim" to "eventually reclaim"? As I understand it, the goal was to eventually trigger a scan of the active queue.

In D3401#71795, @markj wrote:
In D3401#71324, @alc wrote:

Overall, it looks good. I've made one comment below.

In D3401#70933, @markj wrote:

Thanks! Here's a proposed commit log message:

Remove weighted page handling from vm_page_advise().

This was added in r51337 as part of the implementation of
madvise(MADV_DONTNEED). It attempted to avoid offloading work onto the
pagedaemon by distributing pages among the various page queues according to
a fixed weighting.

I would change the second sentence of the previous paragraph. In particular, I would replace "... offloading work ...." with something like: "Its objective was to ensure that the page daemon would still reclaim other unreferenced pages (i.e., unreferenced pages not touched by madvise()) from the active queue."

Ah, I see, that makes more sense. How about substituting your suggested sentence, but changing "still reclaim" to "eventually reclaim"? As I understand it, the goal was to eventually trigger a scan of the active queue.

That's fine. Commit it!

This revision was automatically updated to reflect the committed changes.