As the next step in eliminating PG_CACHE pages, I would like to retire VM_FREEPOOL_CACHE. Setting aside the big-picture goal of eliminating PG_CACHE pages, this change has both positive and negative effects. On the positive side, there is a little less fragmentation of the free pages in the buddy allocator. In other words, there are larger contiguous chunks. On the negative side, there are fewer PG_CACHE page reactivations, because PG_CACHE pages don't tend to survive as long in the buddy allocator. Consequently, there are more page-ins.
Repository: rS FreeBSD src repository (Subversion)

Event Timeline
In general, I see extremely low reactivation rates for pages cached by the page daemon. What follows is some data from a specific experiment.
I have a machine configured with limited physical memory:
avail memory = 2578804736 (2459 MB)
This machine has six processor cores.
I run "make -j7 buildworld" in a loop. With the limited physical memory, this is sufficient to create memory pressure, leading to page daemon activity.
After 7 iterations of buildworld, I see the following results.
Without the change applied:
21628 pages reactivated
...
1488937 pages cached
With the change applied:
974 pages reactivated
...
1450744 pages cached
In other words, without the change applied, only 1.5% of cached pages are ever reactivated. With the change applied, the percentage of cached pages reactivated drops to almost nothing, specifically 0.07%.
This results in additional page-ins.
Without the change applied:
103 swap pager pageins
216 swap pager pages paged in
722 swap pager pageouts
3660 swap pager pages paged out
71410 vnode pager pageins
327242 vnode pager pages paged in
0 vnode pager pageouts
0 vnode pager pages paged out
With the change applied:
82 swap pager pageins
214 swap pager pages paged in
714 swap pager pageouts
3755 swap pager pages paged out
80427 vnode pager pageins
344011 vnode pager pages paged in
0 vnode pager pageouts
0 vnode pager pages paged out
With a non-debugging kernel, the wall clock times are sometimes better and other times worse.
Without the change applied:
8107.638u 870.706s 28:47.65 519.6% 36257+456k 39230+22947io 30159pf+0w
8249.229u 1132.328s 29:16.96 533.9% 35997+452k 21033+23833io 24275pf+0w
8261.015u 1142.917s 29:21.08 533.9% 36001+452k 21547+23528io 26599pf+0w
8275.331u 1148.902s 29:21.31 535.0% 35982+452k 20888+23592io 26012pf+0w
8250.499u 1141.594s 29:15.69 534.9% 35993+452k 19731+24045io 25485pf+0w
8231.311u 1125.284s 29:14.69 533.2% 36014+452k 21158+23738io 27111pf+0w
8265.856u 1155.000s 29:29.14 532.5% 35967+452k 21865+23606io 30462pf+0w
With the change applied:
8106.276u 888.365s 28:43.09 522.0% 36235+455k 39169+23504io 28154pf+0w
8230.056u 1136.302s 29:13.42 534.1% 36018+452k 21342+23686io 26026pf+0w
8267.191u 1155.911s 29:29.01 532.6% 35988+452k 21061+24048io 26825pf+0w
8269.759u 1154.176s 29:25.30 533.8% 35987+452k 21532+23582io 27906pf+0w
8274.054u 1156.547s 29:34.23 531.5% 35999+452k 21595+23725io 30417pf+0w
8255.385u 1142.794s 29:20.60 533.8% 36006+452k 21319+24095io 28770pf+0w
8217.288u 1127.367s 29:17.43 531.7% 36036+453k 21333+23527io 30406pf+0w
Did you try the test that previously demonstrated the cache's usefulness? I remember it was a serial read of mmapped memory backed by a large file that still fit in the machine's memory, like a 6G file on an 8G machine. AFAIR it was very important for the cache to work; due to the serial access pattern, vm_fault() cached the pages behind the access point, and reuse of the cached pages caused disk accesses for every iteration.
r281079 might avoid the issue in another way, but something similar might be relevant still.
I verified that this change doesn't adversely affect vm_fault()'s sequential access heuristic. As of r281079, that heuristic doesn't (directly) cache pages. Instead, it performs the equivalent of an automatic madvise(..., MADV_DONTNEED). In other words, the sequentially accessed file's pages are placed at the front of the inactive queue, where the page daemon starts working. Moreover, because the pages are in the inactive queue, rather than the buddy allocator, they are not immediately at risk for reclamation by vm_page_alloc().
contigmalloc(M_WAITOK), or more precisely, vm_pageout_grow_cache(), is the only use case for PG_CACHE pages that might see a high rate of reactivations and be adversely affected by this change. In essence, vm_pageout_grow_cache() indiscriminately caches a large number of pages, hoping to create a chunk of contiguous physical memory in the buddy queues by serendipity. It expects that the unused cache pages will be reactivated, avoiding page-ins.
That's why we're looking to replace vm_pageout_grow_cache().
I've run some tests with Postgres 9.3.7. I configured the shared buffers to occupy 1/4 of physical memory, used a database 3x larger than physical memory, and ran pgbench's SELECT-only workload for a few hours.
With an unmodified kernel, 0.46% of PG_CACHE pages were reactivated.
5311258 pages reactivated
...
1147411011 pages cached
With the patched kernel, the percentage didn't change; it was still 0.46%.