Retire VM_FREEPOOL_CACHE.
Closed · Public

Authored by alc on Jun 2 2015, 5:34 PM.

Details

Reviewers

kmacy
kib
jhb

Commits

rS284147: Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages.

Summary

As the next step in eliminating PG_CACHE pages, I would like to retire VM_FREEPOOL_CACHE. Setting aside the big-picture goal of eliminating PG_CACHE pages, this change has both positive and negative effects. On the positive side, there is a little less fragmentation of the free pages in the buddy allocator. In other words, there are larger contiguous chunks. On the negative side, there are fewer PG_CACHE page reactivations, because PG_CACHE pages don't tend to survive as long in the buddy allocator. Consequently, there are more page-ins.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

alc updated this revision to Diff 5885. Jun 2 2015, 5:34 PM

alc retitled this revision to "Retire VM_FREEPOOL_CACHE."

alc updated this object.

alc edited the test plan for this revision.

alc added reviewers: jhb, kib, kmacy.

Herald added a subscriber: emaste. Jun 2 2015, 5:34 PM

In general, I see extremely low reactivation rates for pages cached by the page daemon. What follows is some data from a specific experiment.

I have a machine configured with limited physical memory:

avail memory = 2578804736 (2459 MB)

This machine has six processor cores.

I run "make -j7 buildworld" in a loop. With the limited physical memory, this is sufficient to create memory pressure, leading to page daemon activity.

After 7 iterations of buildworld, I see the following results.

Without the change applied:

21628 pages reactivated

...

1488937 pages cached

With the change applied:

974 pages reactivated

...

1450744 pages cached

In other words, without the change applied, only 1.5% of cached pages are ever reactivated. With the change applied, however, the percentage of cached pages reactivated drops to almost nothing, specifically 0.07%.
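For readers who want to check the percentages, here is a minimal sketch of the arithmetic. The counter values are copied from the figures above; the dictionary layout and the function name `reactivation_pct` are illustrative choices, not part of the change under review.

```python
# Counter values copied from the comment (after 7 buildworld iterations).
runs = {
    "without change": {"reactivated": 21628, "cached": 1488937},
    "with change": {"reactivated": 974, "cached": 1450744},
}


def reactivation_pct(reactivated, cached):
    """Percentage of cached pages that were later reactivated."""
    return 100.0 * reactivated / cached


rates = {name: reactivation_pct(**counts) for name, counts in runs.items()}
for name, pct in rates.items():
    print(f"{name}: {pct:.2f}% of cached pages reactivated")
```

Rounded to the precision used in the comment, this gives roughly 1.5% and 0.07%.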

This results in additional page-ins.

Without the change applied:

   103 swap pager pageins
   216 swap pager pages paged in
   722 swap pager pageouts
  3660 swap pager pages paged out
 71410 vnode pager pageins
327242 vnode pager pages paged in
     0 vnode pager pageouts
     0 vnode pager pages paged out

With the change applied:

    82 swap pager pageins
   214 swap pager pages paged in
   714 swap pager pageouts
  3755 swap pager pages paged out
 80427 vnode pager pageins
344011 vnode pager pages paged in
     0 vnode pager pageouts
     0 vnode pager pages paged out
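To put a number on "additional page-ins", the two listings can be compared counter by counter. This is only a sketch: the values are copied from the listings above, and the helper name `deltas` is hypothetical.

```python
# Page-in counters copied from the two listings (pageout rows omitted).
without_change = {
    "swap pager pageins": 103,
    "swap pager pages paged in": 216,
    "vnode pager pageins": 71410,
    "vnode pager pages paged in": 327242,
}
with_change = {
    "swap pager pageins": 82,
    "swap pager pages paged in": 214,
    "vnode pager pageins": 80427,
    "vnode pager pages paged in": 344011,
}


def deltas(before, after):
    """Return {counter: (absolute change, percent change)}."""
    return {
        key: (after[key] - before[key],
              100.0 * (after[key] - before[key]) / before[key])
        for key in before
    }


page_in_deltas = deltas(without_change, with_change)
for key, (diff, pct) in page_in_deltas.items():
    print(f"{key:28s} {diff:+7d} ({pct:+.1f}%)")
```

The increase shows up in the vnode pager counters; the swap pager page-in counters are slightly lower with the change applied.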

With a non-debugging kernel, the wall clock times are sometimes better and other times worse.

Without the change applied:

8107.638u 870.706s 28:47.65 519.6% 36257+456k 39230+22947io 30159pf+0w
8249.229u 1132.328s 29:16.96 533.9% 35997+452k 21033+23833io 24275pf+0w
8261.015u 1142.917s 29:21.08 533.9% 36001+452k 21547+23528io 26599pf+0w
8275.331u 1148.902s 29:21.31 535.0% 35982+452k 20888+23592io 26012pf+0w
8250.499u 1141.594s 29:15.69 534.9% 35993+452k 19731+24045io 25485pf+0w
8231.311u 1125.284s 29:14.69 533.2% 36014+452k 21158+23738io 27111pf+0w
8265.856u 1155.000s 29:29.14 532.5% 35967+452k 21865+23606io 30462pf+0w

With the change applied:

8106.276u 888.365s 28:43.09 522.0% 36235+455k 39169+23504io 28154pf+0w
8230.056u 1136.302s 29:13.42 534.1% 36018+452k 21342+23686io 26026pf+0w
8267.191u 1155.911s 29:29.01 532.6% 35988+452k 21061+24048io 26825pf+0w
8269.759u 1154.176s 29:25.30 533.8% 35987+452k 21532+23582io 27906pf+0w
8274.054u 1156.547s 29:34.23 531.5% 35999+452k 21595+23725io 30417pf+0w
8255.385u 1142.794s 29:20.60 533.8% 36006+452k 21319+24095io 28770pf+0w
8217.288u 1127.367s 29:17.43 531.7% 36036+453k 21333+23527io 30406pf+0w
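One way to summarize the timing lines is to convert the elapsed (wall clock) field of each line to seconds and compare the two runs. The sketch below does that; the pairing of line N in one run with line N in the other assumes both lists are in iteration order, which the comment does not state explicitly, and the helper name `to_seconds` is illustrative.

```python
from statistics import mean

# Elapsed (wall clock) fields copied from the seven timing lines of each run.
without_change = ["28:47.65", "29:16.96", "29:21.08", "29:21.31",
                  "29:15.69", "29:14.69", "29:29.14"]
with_change = ["28:43.09", "29:13.42", "29:29.01", "29:25.30",
               "29:34.23", "29:20.60", "29:17.43"]


def to_seconds(elapsed):
    """Convert an 'MM:SS.ss' elapsed-time field to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


mean_without = mean(to_seconds(t) for t in without_change)
mean_with = mean(to_seconds(t) for t in with_change)

# Assumption: line N of each list corresponds to buildworld iteration N.
faster_with_change = sum(
    to_seconds(w) < to_seconds(wo)
    for wo, w in zip(without_change, with_change)
)

print(f"mean without change: {mean_without:.2f} s")
print(f"mean with change:    {mean_with:.2f} s")
print(f"iterations faster with change: {faster_with_change} of {len(with_change)}")
```

The means differ by only a couple of seconds out of roughly 29 minutes, which is consistent with the observation that the wall clock times are sometimes better and sometimes worse.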

Did you try the test that previously demonstrated the cache's usefulness? As I remember, it was a serial read of mmapped memory backed by a large file that still fit in the machine's memory, like a 6G file on an 8G machine. AFAIR it was very important for the cache to work; due to the serial access pattern, vm_fault() cached the pages behind, and reuse of the cached pages caused disk accesses on every iteration.

r281079 might avoid the issue in another way, but something similar might still be relevant.

I verified that this change doesn't adversely affect vm_fault()'s sequential access heuristic. As of r281079, that heuristic doesn't (directly) cache pages. Instead, it performs the equivalent of an automatic madvise(..., MADV_DONTNEED). In other words, the sequentially accessed file's pages are placed at the front of the inactive queue, where the page daemon starts working. Moreover, because the pages are in the inactive queue, rather than the buddy allocator, they are not immediately at risk for reclamation by vm_page_alloc().
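As a rough user-space illustration of the access pattern under discussion, the sketch below reads a file sequentially through a read-only mapping and issues an explicit MADV_DONTNEED on each chunk after touching it. It is not the test program referred to in this thread; the function name `sequential_read` and the `chunk_pages` parameter are hypothetical, and the precise effect of MADV_DONTNEED differs between operating systems.

```python
import mmap
import os


def sequential_read(path, chunk_pages=256):
    """Read a file once through a read-only mapping, hinting pages behind.

    Returns the number of bytes touched.
    """
    chunk = chunk_pages * mmap.PAGESIZE
    # Not every platform's Python exposes MADV_DONTNEED; skip the hint if absent.
    advice = getattr(mmap, "MADV_DONTNEED", None)
    touched = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, size, chunk):
                length = min(chunk, size - start)
                # Slicing copies the bytes out, which faults the pages in.
                touched += len(mm[start:start + length])
                if advice is not None:
                    # Explicit analogue of the hint that the comment says
                    # vm_fault() now applies automatically.
                    mm.madvise(advice, start, length)
    return touched
```

Running this in a loop over a file somewhat smaller than physical memory, while watching the pager counters, would approximate the scenario described: if the pages behind the read position are reclaimed too eagerly, every pass goes back to disk.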

kib accepted this revision. Jun 3 2015, 7:02 AM

kib edited edge metadata.

This revision is now accepted and ready to land. Jun 3 2015, 7:02 AM

contigmalloc(M_WAITOK), or more precisely, vm_pageout_grow_cache(), is the only use case for PG_CACHE pages that might see a high rate of reactivations and be adversely affected by this change. In essence, vm_pageout_grow_cache() indiscriminately caches a large number of pages, hoping to create a chunk of contiguous physical memory in the buddy queues by serendipity. It expects that the unused cache pages will be reactivated, avoiding page-ins.

That's why we're looking to replace vm_pageout_grow_cache().

I've run some tests with Postgres 9.3.7. I configured the shared buffers to occupy 1/4 of physical memory, used a database that was 3x larger than physical memory, and ran pgbench's SELECT-only workload for a few hours.

With an unmodified kernel, 0.46% of PG_CACHE pages were reactivated.