Page MenuHomeFreeBSD

Optimize swp_pager_meta_lookup() to find additionally the numbers of continuous blocks.
Needs ReviewPublic

Authored by ota_j.email.ne.jp on Nov 16 2019, 2:39 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Mar 20, 7:21 AM
Unknown Object (File)
Jan 30 2024, 8:55 AM
Unknown Object (File)
Jan 7 2024, 10:51 AM
Unknown Object (File)
Dec 20 2023, 5:17 AM
Unknown Object (File)
Dec 9 2023, 12:30 PM
Unknown Object (File)
Nov 22 2023, 4:45 PM
Unknown Object (File)
Nov 5 2023, 8:33 AM
Unknown Object (File)
Nov 2 2023, 9:58 AM
Subscribers

Details

Reviewers
alc
kib
dougm
markj
Summary

This is to reduce PCTRIE lookups from swap_pager_haspage().
Given swap_pager_haspage() is called frequently during heavy swap in and out activities, avoiding each individual lookup helps a lot.
For example, on i386, SWB_NPAGES is 32 and each swap_pager_haspage may call
PCTRIE_LOOKUP up to 65 times. Instead, new code will call 9 lookups for
the same size giving speed up of 700%.
On amd64, it is 64 and even twice faster as SWB_NPAGES is 64.

Test Plan

This changeset contains self-validations as a test driver.
I haven't seen any behavior changes between old and new implementations.
I will upload a version without self-test later.

make buildworld -j <hight number> causes lots of page in/out and is a
good test case

I analyzed the difference with the dtrace code below:

fbt::swap_pager_haspage:entry
{

self->ts = timestamp;

}
fbt::swap_pager_haspage:return
/self->ts/
{

@count["swap_pager_haspage count"] = count();
@avg["swap_pager_haspage average time"] = avg(timestamp - self->ts);
@sum["swap_pager_haspage total time"] = sum(timestamp - self->ts);
self->ts = 0;

}

I wanted to measure swp_meta_lookup but somehow dtrace -l did not list swp_meta_
lookup probe. So, I measured swap_pager_haspage() which calls the
swp_meta_lookup.

The results are from running buildworld from the 2 kernels with the same
base with and without the patch.
The system was setup in Parallels VM with 1.5 GB physical memory and 10GB swap s
pace on SSD.

r357119M: /usr/src/usr/obj/i386.i386/sys/GENERIC-NODEBUG i386

swap_pager_haspage count                                   33807456
swap_pager_haspage average time                               12867
swap_pager_haspage total time                          435000565713

r357119M: /usr/obj/mnt/sys/swp_meta_lookup/i386.i386/sys/GENERIC-NODEBUG i386

swap_pager_haspage count                                      88970
swap_pager_haspage average time                                8397
swap_pager_haspage total time                             747098759

average time is about 50% faster.
number of calls are reduced by about 380 times lesser.
total time is reduced by about 60 times lesser.

The speed up per each call is 50% faster and it sounds somewhat reasonable.
However, I am not sure why the total number of calls are reduced so much.

The challange of running dtrace for swap-in/swap-out is that
OOM killer frequently killed dtrace during low memory.
I had my dtrace killed nearly a half a hundred times in the past couple
of months of trying to measure the performance gain.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 27674
Build 25875: arc lint + arc unit

Event Timeline

Dropped self-test.

Updating D22409: swp_pager_meta_lookup() to find swap address and additionally search for

the number of continuous blocks.

Update and merge D22437.

Updating D22409: swp_pager_meta_lookup() to find swap address and additionally search for

the number of continuous blocks.

ota_j.email.ne.jp retitled this revision from swp_pager_meta_lookup() to find swap address and additionally search for the number of continuous blocks. to Optimize swp_pager_meta_lookup() to find additionally the numbers of continuous blocks..Nov 20 2019, 1:45 AM
ota_j.email.ne.jp edited the summary of this revision. (Show Details)
sys/vm/swap_pager.c
2163

I need to restore this KASSERT.

Where is this optimization actually used ? Almost all calls to vm_pager_has_page() pass NULLs for behind/ahead.

sys/vm/swap_pager.c
2162

before != NULL, there and in all other places

2163

How could this condition (left part before ||) be true. And even if it is, why do we care ?

sys/vm/swap_pager.c
1076

These swp_pager_meta_lookup function calls here and below are meld into a single swp_pager_meta_lookup. For up to the size of SWAP_META_PAGES, swp_pager_meta_lookup can access adjacent elements via array access avoiding most of PCTRIE lookup.

2163

The argument is an integer and vm_pindex_t is unsigned long long; I wanted to check negative values earlier to avoid comparing these 2 different types.

sys/vm/swap_pager.c
1057

This is changing the values of maxahead and maxbehind passed in swap_pager_getpages(), so the caller's hints may not be respected.

2170

This comment seems a little misleading? vm_pindex_t is unsigned.

2219

I find this function quite hard to read: there are many local variables and adjustments by one. Is it possible to simplify this at all?

Thank you for quick response, Mark.

sys/vm/swap_pager.c
2170

I will change this to something like /* be sure to avoid under-flow */

2219

I will see how I can simplify.

Split backward and forward search into separate functions for ease of reading.

sys/vm/swap_pager.c
1057

Existing code has limit with look like:
for (i = 1; i < SWB_NPAGES; i++)

sys/vm/swap_pager.c
2170

Problem cases were when pindex was very small like 1 and *before was larger, like 31, the subtraction resulted a huge number.

I think the comment is now accurate.

sys/vm/swap_pager.c
1068

I think these initializations should be done in swp_pager_meta_lookup() now.

2165

Style, } else { should be one line.

2182

I think it is clearer to write pindex + *after < pindex.

Fixed style, moved 0 assignment to *before and *after when SWWAPBLK_NONE,
and adjusted if/else statement for *after case.

ota_j.email.ne.jp marked 8 inline comments as done.

Address other review comments.

sys/vm/swap_pager.c
1057

I'm interested in removing SWB_NPAGES ceiling and see if it improves performance. However, for now, I want to keep the same behavior.

2170

I'm removing, actually.

I think I addressed all of feed backs so far.
I'm wondering if someone can take a look.