As the next step in eliminating PG_CACHE pages, I am proposing to free rather than cache pages in vm_pageout_scan().
Does it make sense to cut the tail into many small chunks?
Why not replace all calls to vm_page_cache/vm_page_try_to_cache with a free in a single step? Or make these two functions free the page? Removing the cache code is obviously a different question.
Here is some data from tests with PostgreSQL 9.3.7.
I've configured PostgreSQL to use 1/4 of DRAM as shared buffers. My database is 3 times larger than DRAM, and I'm running the select-only workload for 30 minutes at a time.
There is a slight drop in TPS, about 0.2% on average across the 8 runs.
Before:
tps = 7856.441823 (including connections establishing)
tps = 7932.149083 (including connections establishing)
tps = 7933.584812 (including connections establishing)
tps = 7932.831139 (including connections establishing)
tps = 7935.023572 (including connections establishing)
tps = 7942.407592 (including connections establishing)
tps = 7949.623617 (including connections establishing)
tps = 7942.221710 (including connections establishing)
1147293397 pages were cached over the course of these 8 runs, but only 0.46% of them were reactivated.
After:
tps = 7847.534150 (including connections establishing)
tps = 7910.332619 (including connections establishing)
tps = 7925.766889 (including connections establishing)
tps = 7929.057218 (including connections establishing)
tps = 7916.481657 (including connections establishing)
tps = 7916.523477 (including connections establishing)
tps = 7914.272227 (including connections establishing)
tps = 7915.479444 (including connections establishing)
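For anyone who wants to check the size of the drop, here is a minimal sketch that averages the two sets of runs. The tps values are copied from the lists above; the variable names and the helper relative_change() are only illustrative and not part of the benchmark setup.

```python
from statistics import mean

# tps values copied from the "Before" and "After" runs above.
before = [
    7856.441823, 7932.149083, 7933.584812, 7932.831139,
    7935.023572, 7942.407592, 7949.623617, 7942.221710,
]
after = [
    7847.534150, 7910.332619, 7925.766889, 7929.057218,
    7916.481657, 7916.523477, 7914.272227, 7915.479444,
]


def relative_change(old, new):
    """Fractional change from old to new (negative means a drop)."""
    return (new - old) / old


mean_before = mean(before)
mean_after = mean(after)
drop = relative_change(mean_before, mean_after)

print(f"mean before: {mean_before:.2f} tps")
print(f"mean after:  {mean_after:.2f} tps")
print(f"change:      {drop:.2%}")
```

The first run in each set is noticeably lower than the rest, so comparing medians instead of means would be a reasonable alternative.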
Virtually all of the pages examined by the page daemon are cached or freed, depending on the test, since this is a select-only workload. For example, in the before test, there are:
1182791179 pages examined by the page daemon
...
1147293397 pages cached
So, I also measured the time spent in the inactive queue scan using rdtsc().
Before: 2027760128901
After: 1698644621449
In other words, freeing rather than caching the pages reduces the time spent in the inactive scan by almost 1/6 (about 16%).
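The ratios quoted above can be reproduced from the raw counters with a few lines. This is only a sketch: the numbers are copied from this comment, the variable names are made up for illustration, and I'm assuming the two scan figures are raw TSC tick counts as returned by rdtsc().

```python
# Counter values copied from the measurements above.
pages_examined = 1182791179  # pages examined by the page daemon (before test)
pages_cached = 1147293397    # pages cached (before test)

# Inactive queue scan time; assumed to be raw TSC ticks from rdtsc().
scan_ticks_before = 2027760128901
scan_ticks_after = 1698644621449

# Share of examined pages that ended up cached in the "before" test.
cached_fraction = pages_cached / pages_examined

# Fractional reduction in inactive scan time when freeing instead of caching.
scan_reduction = (scan_ticks_before - scan_ticks_after) / scan_ticks_before

print(f"cached / examined: {cached_fraction:.1%}")
print(f"scan time reduction: {scan_reduction:.1%}")
```

Because this is a ratio of two tick counts taken on the same machine, the TSC frequency cancels out and no conversion to seconds is needed.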
Different use cases of PG_CACHE pages have different characteristics, in particular, reactivation rates. I'm going case-by-case to isolate the effects that each one has.
Kostik, could you possibly look at a couple of semi-related items? First, we are reacquiring the inactive queue lock in vm_page_free(). I wonder if this reacquire could be safely eliminated, by dequeueing the page when the marker is inserted. This will, of course, require some changes to the other cases in the inactive queue scan. Second, it seems to me that the "(m->valid == 0)" case should come sooner. That's why this patch doesn't merge the "(m->dirty == 0)" case with the "(m->valid == 0)" case.
If you have time to create a patch, I'll benchmark it.