Details

Reviewers

alc
kib
dougm
markj

Summary

This is to reduce PCTRIE lookups from swap_pager_haspage().
Given swap_pager_haspage() is called frequently during heavy swap in and out activities, avoiding each individual lookup helps a lot.
For example, on i386, SWB_NPAGES is 32 and each swap_pager_haspage may call
PCTRIE_LOOKUP up to 65 times. Instead, new code will call 9 lookups for
the same size giving speed up of 700%.
On amd64, it is 64 and even twice faster as SWB_NPAGES is 64.

Test Plan

This changeset contains self-validations as a test driver.
I haven't seen any behavior changes between old and new implementations.
I will upload a version without self-test later.

make buildworld -j <hight number> causes lots of page in/out and is a
good test case

I analyzed the difference with the dtrace code below:

fbt::swap_pager_haspage:entry
{

self->ts = timestamp;

}
fbt::swap_pager_haspage:return
/self->ts/
{

@count["swap_pager_haspage count"] = count();
@avg["swap_pager_haspage average time"] = avg(timestamp - self->ts);
@sum["swap_pager_haspage total time"] = sum(timestamp - self->ts);
self->ts = 0;

}

I wanted to measure swp_meta_lookup but somehow dtrace -l did not list swp_meta_
lookup probe. So, I measured swap_pager_haspage() which calls the
swp_meta_lookup.

The results are from running buildworld from the 2 kernels with the same
base with and without the patch.
The system was setup in Parallels VM with 1.5 GB physical memory and 10GB swap s
pace on SSD.

r357119M: /usr/src/usr/obj/i386.i386/sys/GENERIC-NODEBUG i386

swap_pager_haspage count                                   33807456
swap_pager_haspage average time                               12867
swap_pager_haspage total time                          435000565713

r357119M: /usr/obj/mnt/sys/swp_meta_lookup/i386.i386/sys/GENERIC-NODEBUG i386

swap_pager_haspage count                                      88970
swap_pager_haspage average time                                8397
swap_pager_haspage total time                             747098759

average time is about 50% faster.
number of calls are reduced by about 380 times lesser.
total time is reduced by about 60 times lesser.

The speed up per each call is 50% faster and it sounds somewhat reasonable.
However, I am not sure why the total number of calls are reduced so much.

The challange of running dtrace for swap-in/swap-out is that
OOM killer frequently killed dtrace during low memory.
I had my dtrace killed nearly a half a hundred times in the past couple
of months of trying to measure the performance gain.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 27674
Build 25875: arc lint + arc unit

Event Timeline

ota_j.email.ne.jp created this revision.Nov 16 2019, 2:39 PM

Herald added a subscriber: imp. · View Herald TranscriptNov 16 2019, 2:39 PM

Harbormaster completed remote builds in B27576: Diff 64444.Nov 16 2019, 2:39 PM

Dropped self-test.

Updating D22409: swp_pager_meta_lookup() to find swap address and additionally search for

the number of continuous blocks.

Harbormaster completed remote builds in B27581: Diff 64455.Nov 16 2019, 6:22 PM

ota_j.email.ne.jp mentioned this in D22324: try to free blocks by upgrading object locks.Nov 18 2019, 4:34 AM

Update and merge D22437.

Updating D22409: swp_pager_meta_lookup() to find swap address and additionally search for

the number of continuous blocks.

Harbormaster completed remote builds in B27653: Diff 64601.Nov 20 2019, 1:41 AM

ota_j.email.ne.jp retitled this revision from swp_pager_meta_lookup() to find swap address and additionally search for the number of continuous blocks. to Optimize swp_pager_meta_lookup() to find additionally the numbers of continuous blocks..Nov 20 2019, 1:45 AM

ota_j.email.ne.jp edited the summary of this revision. (Show Details)

ota_j.email.ne.jp added inline comments.Nov 20 2019, 5:31 AM

sys/vm/swap_pager.c
2163	I need to restore this KASSERT.

Restore KASSERT.

Harbormaster completed remote builds in B27674: Diff 64662.Nov 21 2019, 3:27 AM

Where is this optimization actually used ? Almost all calls to vm_pager_has_page() pass NULLs for behind/ahead.

sys/vm/swap_pager.c
2162	before != NULL, there and in all other places
2163	How could this condition (left part before \|\|) be true. And even if it is, why do we care ?

ota_j.email.ne.jp added inline comments.Nov 22 2019, 12:34 AM

sys/vm/swap_pager.c
1076	These swp_pager_meta_lookup function calls here and below are meld into a single swp_pager_meta_lookup. For up to the size of SWAP_META_PAGES, swp_pager_meta_lookup can access adjacent elements via array access avoiding most of PCTRIE lookup.
2163	The argument is an integer and vm_pindex_t is unsigned long long; I wanted to check negative values earlier to avoid comparing these 2 different types.

Check NULL pointers explicitly.

Harbormaster completed remote builds in B27705: Diff 64709.Nov 22 2019, 12:36 AM

ota_j.email.ne.jp edited the test plan for this revision. (Show Details)Feb 25 2020, 6:13 AM

ota_j.email.ne.jp added a reviewer: markj.

markj added inline comments.Feb 27 2020, 1:44 PM

sys/vm/swap_pager.c
1057	This is changing the values of maxahead and maxbehind passed in swap_pager_getpages(), so the caller's hints may not be respected.
2170	This comment seems a little misleading? vm_pindex_t is unsigned.
2219	I find this function quite hard to read: there are many local variables and adjustments by one. Is it possible to simplify this at all?

Thank you for quick response, Mark.

sys/vm/swap_pager.c
2170	I will change this to something like /* be sure to avoid under-flow */
2219	I will see how I can simplify.

Split backward and forward search into separate functions for ease of reading.

Harbormaster completed remote builds in B29949: Diff 69535.Mar 15 2020, 12:24 AM

ota_j.email.ne.jp added inline comments.Mar 15 2020, 12:28 AM

sys/vm/swap_pager.c
1057	Existing code has limit with look like: for (i = 1; i < SWB_NPAGES; i++)

ota_j.email.ne.jp added inline comments.Mar 15 2020, 1:05 PM

sys/vm/swap_pager.c
2170	Problem cases were when pindex was very small like 1 and *before was larger, like 31, the subtraction resulted a huge number. I think the comment is now accurate.

markj added inline comments.Mar 18 2020, 6:17 PM

sys/vm/swap_pager.c
1068	I think these initializations should be done in swp_pager_meta_lookup() now.
2165	Style, `} else {` should be one line.
2182	I think it is clearer to write `pindex + *after < pindex`.

Fixed style, moved 0 assignment to *before and *after when SWWAPBLK_NONE,
and adjusted if/else statement for *after case.

Harbormaster completed remote builds in B30019: Diff 69709.Mar 20 2020, 2:04 AM

Address other review comments.

Harbormaster completed remote builds in B31073: Diff 71745.May 14 2020, 2:10 AM

ota_j.email.ne.jp added inline comments.May 14 2020, 2:11 AM

sys/vm/swap_pager.c
1057	I'm interested in removing SWB_NPAGES ceiling and see if it improves performance. However, for now, I want to keep the same behavior.
2170	I'm removing, actually.

I think I addressed all of feed backs so far.
I'm wondering if someone can take a look.

Optimize swp_pager_meta_lookup() to find additionally the numbers of continuous blocks.
Needs ReviewPublic
Actions

Details

Diff Detail

Event Timeline

Updating D22409: swp_pager_meta_lookup() to find swap address and additionally search for

Updating D22409: swp_pager_meta_lookup() to find swap address and additionally search for

Revision Contents
Changeset List

Diff 64662

sys/vm/swap_pager.c

Optimize swp_pager_meta_lookup() to find additionally the numbers of continuous blocks.Needs ReviewPublicActions

Details

Diff Detail

Event Timeline

Updating D22409: swp_pager_meta_lookup() to find swap address and additionally search for

Updating D22409: swp_pager_meta_lookup() to find swap address and additionally search for

Revision ContentsChangeset List

Diff 64662

sys/vm/swap_pager.c

Optimize swp_pager_meta_lookup() to find additionally the numbers of continuous blocks.
Needs ReviewPublic
Actions

Revision Contents
Changeset List