This change also modifies _vm_page_deactivate() to requeue
already-inactive pages at the head of the inactive queue, if the caller
requests that it bypass regular LRU operation.
Details
- Reviewers
alc - Commits
- rS287235: Remove weighted page handling from vm_page_advise().
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 166 Build 166: arc lint + arc unit
Event Timeline
sys/vm/vm_page.c | ||
---|---|---|
2545 | As we update code, I want to replace these old style asserts that expose the lock implementation details with: vm_page_assert_locked(m); that doesn't. | |
2564 | This flag should only be cleared when a page is moving to the inactive queue from elsewhere. If the page is already in the inactive queue and dirty, clearing this flag will actually cause the page to be moved to the tail of the queue rather than being written to swap. I think that you should reinsert this statement between the else clause's dequeue and pagequeue_lock operations. | |
2767–2768 | I believe that the pages should be unconditionally deactivated, so remove this "if" statement. Think about it this way: madvise(..., MADV_DONTNEED) is making an assertion about the future usage of the page and so it should invalidate any past usage information. | |
2783 | The old code simply deactivated dirty pages, and for now I'm not sure that we should change that without experimental validation. With that in mind, let's replace "1" by "m->dirty == 0". When pages get kicked out of the buffer map, they are clean, so this won't impact fadvise("dont need"). |
sys/vm/vm_page.c | ||
---|---|---|
2767–2768 | That makes sense to me. That line of thinking would seem to imply that we should set act_count = 0 outright, as is done in the MADV_FREE case. Is there any reason not to do that here? I'm not sure that it would have much of an effect, since act_count is mostly relevant for active pages. |
Address review comments:
- use vm_page_assert_locked() instead of vm_page_lock_assert()
- don't clear PG_WINATCFLS if the page is already inactive
- only place clean pages at the head of the inactive queue; dirty pages should still go to the tail
- DONTNEED should deactivate the page regardless of the current act_count
sys/vm/vm_page.c | ||
---|---|---|
2764–2766 | Thanks, that addresses my mild confusion. :) |
Can you draft a commit message and post it here? I'm running a test involving madvise(2). If the results of the test are as I expect, then this patch is ready for HEAD.
I configured a machine to have 2.5GB of RAM through /boot/loader.conf and ran the following program on a non-debug kernel, e.g., no INVARIANTS compiled in:
#include <sys/types.h> #include <sys/mman.h> #include <err.h> #include <stdio.h> #include <unistd.h> int main(int argc, char **argv) { size_t i, len, supersize; char *ptr; int accum; len = (size_t)128 * 1024 * 1024 * 1024; ptr = mmap(0, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0); if (ptr == MAP_FAILED) err(1, "mmap"); supersize = (size_t)2 * 1024 * 1024; accum = 0; for (i = 0; i < len; i++) { if ((i & (supersize - 1)) == 0 && i > supersize) { if (madvise(&ptr[i - supersize], supersize, MADV_FREE) == -1) err(1, "madvise"); } accum += ptr[i] = i; } printf("accum: %d\nptr: %p\nsize: %ld\n", accum, ptr, len); }
Without your patch applied, I see:
# time ./mmap_advise1 accum: 0 ptr: 0x801000000 size: 137438953472 162.500u 81.108s 4:03.61 99.9% 5+167k 0+0io 0pf+0w # pstat -s Device 1K-blocks Used Avail Capacity /dev/ada0s3b 33554432 516 33553916 0% # sysctl vm.phys_free vm.phys_free: DOMAIN 0: FREE LIST 0: ORDER (SIZE) | NUMBER | POOL 0 | POOL 1 -- -- -- -- -- -- 12 ( 16384K) | 0 | 0 11 ( 8192K) | 2 | 0 10 ( 4096K) | 4 | 0 9 ( 2048K) | 232 | 0 8 ( 1024K) | 699 | 0 7 ( 512K) | 1018 | 0 6 ( 256K) | 1174 | 2 5 ( 128K) | 1212 | 7 4 ( 64K) | 913 | 158 3 ( 32K) | 1048 | 562 2 ( 16K) | 925 | 1311 1 ( 8K) | 829 | 1432 0 ( 4K) | 709 | 1757 FREE LIST 1: ORDER (SIZE) | NUMBER | POOL 0 | POOL 1 -- -- -- -- -- -- 12 ( 16384K) | 0 | 0 11 ( 8192K) | 0 | 0 10 ( 4096K) | 0 | 0 9 ( 2048K) | 0 | 0 8 ( 1024K) | 0 | 0 7 ( 512K) | 0 | 0 6 ( 256K) | 2 | 1 5 ( 128K) | 4 | 0 4 ( 64K) | 5 | 0 3 ( 32K) | 2 | 1 2 ( 16K) | 2 | 1 1 ( 8K) | 3 | 1 0 ( 4K) | 0 | 0
With your patch applied, I see:
# time ./mmap_advise1 accum: 0 ptr: 0x801000000 size: 137438953472 159.558u 81.188s 4:00.75 99.9% 5+167k 0+0io 0pf+0w # pstat -s Device 1K-blocks Used Avail Capacity /dev/ada0s3b 33554432 0 33554432 0% # sysctl vm.phys_free vm.phys_free: DOMAIN 0: FREE LIST 0: ORDER (SIZE) | NUMBER | POOL 0 | POOL 1 -- -- -- -- -- -- 12 ( 16384K) | 134 | 0 11 ( 8192K) | 1 | 0 10 ( 4096K) | 1 | 0 9 ( 2048K) | 4 | 0 8 ( 1024K) | 33 | 0 7 ( 512K) | 103 | 0 6 ( 256K) | 106 | 1 5 ( 128K) | 112 | 0 4 ( 64K) | 121 | 0 3 ( 32K) | 145 | 26 2 ( 16K) | 223 | 604 1 ( 8K) | 263 | 646 0 ( 4K) | 329 | 683 FREE LIST 1: ORDER (SIZE) | NUMBER | POOL 0 | POOL 1 -- -- -- -- -- -- 12 ( 16384K) | 0 | 0 11 ( 8192K) | 0 | 0 10 ( 4096K) | 0 | 0 9 ( 2048K) | 0 | 0 8 ( 1024K) | 1 | 0 7 ( 512K) | 0 | 0 6 ( 256K) | 2 | 0 5 ( 128K) | 1 | 0 4 ( 64K) | 2 | 0 3 ( 32K) | 0 | 0 2 ( 16K) | 0 | 0 1 ( 8K) | 1 | 0 0 ( 4K) | 0 | 0
Notice three things: With your patch, (1) the program takes slightly less time to run, (2) no other pages are pushed out to swap, and (3) free memory is much less fragmented. (Because of (3), the old vm_page_advise() was making superpage usage less likely.)
Thanks! Here's a proposed commit log message:
Remove weighted page handling from vm_page_advise().
This was added in r51337 as part of the implementation of
madvise(MADV_DONTNEED). It attempted to avoid offloading work onto the
pagedaemon by distributing pages among the various page queues according to
a fixed weighting.
Now that the pagedaemon performs steady scanning of the active page queue,
this weighted handling is unnecessary. Instead, always "cache" clean pages
by moving them to the head of the inactive page queue. This simplifies the
implementation of vm_page_advise() and eliminates the fragmentation that
resulted from the distribution of pages among multiple queues.
Suggested by: alc
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3401
Overall, it looks good. I've made one comment below.
I would change the second sentence of the previous paragraph. In particular, I would replace "... offloading work ...." with something like: "Its objective was to ensure that the page daemon would still reclaim other unreferenced pages (i.e., unreferenced pages not touched by madvise()) from the active queue."
Now that the pagedaemon performs steady scanning of the active page queue,
this weighted handling is unnecessary. Instead, always "cache" clean pages
by moving them to the head of the inactive page queue. This simplifies the
implementation of vm_page_advise() and eliminates the fragmentation that
resulted from the distribution of pages among multiple queues.Suggested by: alc
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3401
Ah, I see, that makes more sense. How about substituting your suggested sentence, but changing "still reclaim" to "eventually reclaim"? As I understand it, the goal was to eventually trigger a scan of the active queue.