Optimize swp_pager_meta_lookup() to additionally find the number of contiguous blocks.
This is to reduce PCTRIE lookups from swap_pager_haspage().
Given that swap_pager_haspage() is called frequently during heavy swap-in and swap-out activity, avoiding the individual lookups helps a lot.
For example, on i386, SWB_NPAGES is 32 and each swap_pager_haspage() call may call
PCTRIE_LOOKUP up to 65 times. Instead, the new code does 9 lookups for
the same size, a speedup of roughly 7x (about 700%).
On amd64, SWB_NPAGES is 64, so the speedup is about twice as large.

This changeset contains self-validations as a test driver.
I haven't seen any behavior changes between old and new implementations.
I will upload a version without self-test later.

make buildworld -j <high number> causes lots of page-in/page-out activity and is a
good test case.

I analyzed the difference with the dtrace code below:

fbt::swap_pager_haspage:entry
{
	self->ts = timestamp;
}

fbt::swap_pager_haspage:return
/self->ts/
{
	@count["swap_pager_haspage count"] = count();
	@avg["swap_pager_haspage average time"] = avg(timestamp - self->ts);
	@sum["swap_pager_haspage total time"] = sum(timestamp - self->ts);
	self->ts = 0;
}


I wanted to measure swp_meta_lookup, but somehow dtrace -l did not list a
swp_meta_lookup probe. So I measured swap_pager_haspage(), which calls it.

The results are from running buildworld on two kernels built from the same
base, with and without the patch.
The system was set up in a Parallels VM with 1.5 GB of physical memory and 10 GB of
swap space on SSD.

r357119M: /usr/src/usr/obj/i386.i386/sys/GENERIC-NODEBUG i386

swap_pager_haspage count                                   33807456
swap_pager_haspage average time                               12867
swap_pager_haspage total time                          435000565713

r357119M: /usr/obj/mnt/sys/swp_meta_lookup/i386.i386/sys/GENERIC-NODEBUG i386

swap_pager_haspage count                                      88970
swap_pager_haspage average time                                8397
swap_pager_haspage total time                             747098759

The average time per call is about 50% faster (times are in nanoseconds).
The number of calls is reduced by a factor of about 380.
The total time is reduced by a factor of about 580.
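
The ratios can be rechecked directly from the two result tables above. The helper below is only an illustrative sketch; the name ratio() is made up for this comment and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: ratio of a measurement on the unpatched kernel
 * to the same measurement on the patched kernel.
 */
double
ratio(uint64_t unpatched, uint64_t patched)
{
	assert(patched != 0);	/* avoid dividing by zero */
	return ((double)unpatched / (double)patched);
}
```

With the numbers above, ratio(33807456, 88970) is about 380 for the call count, ratio(12867, 8397) is about 1.53 for the average time, and ratio(435000565713, 747098759) is about 582 for the total time.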

The speedup of about 50% per call sounds somewhat reasonable.
However, I am not sure why the total number of calls is reduced so much.

The challenge of running dtrace for swap-in/swap-out is that
the OOM killer frequently killed dtrace when memory was low.
I had my dtrace killed nearly fifty times in the past couple
of months of trying to measure the performance gain.

I need to restore this KASSERT.

Where is this optimization actually used? Almost all calls to vm_pager_has_page() pass NULLs for behind/ahead.


before != NULL, there and in all other places


How could this condition (the left part, before ||) be true? And even if it is, why do we care?


The swp_pager_meta_lookup() calls here and below are merged into a single swp_pager_meta_lookup() call. Within a range of up to SWAP_META_PAGES pages, swp_pager_meta_lookup() can reach adjacent elements by array access, avoiding most of the PCTRIE lookups.
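
To illustrate the idea only (this is not the patch itself), here is a minimal sketch. META_PAGES, BLK_NONE, struct blk_leaf and run_after() are hypothetical stand-ins for SWAP_META_PAGES, SWAPBLK_NONE and the real swap block structure; the sketch assumes that one trie lookup returns a leaf whose slots can then be scanned as a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define META_PAGES	8		/* hypothetical stand-in for SWAP_META_PAGES */
#define BLK_NONE	((int64_t)-1)	/* hypothetical stand-in for SWAPBLK_NONE */

/* Hypothetical, simplified leaf: one trie lookup yields META_PAGES slots. */
struct blk_leaf {
	int64_t d[META_PAGES];
};

/*
 * Count how many slots after idx hold swap blocks that directly follow
 * d[idx], using only array accesses on the leaf that was already found.
 */
int
run_after(const struct blk_leaf *leaf, int idx)
{
	int n;

	assert(idx >= 0 && idx < META_PAGES);
	if (leaf->d[idx] == BLK_NONE)
		return (0);
	for (n = 1; idx + n < META_PAGES; n++) {
		if (leaf->d[idx + n] != leaf->d[idx] + n)
			break;
	}
	return (n - 1);
}
```

A run that reaches the end of the leaf would still need one more lookup for the next leaf, which is why some lookups remain.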


The argument is an integer and vm_pindex_t is unsigned long long; I wanted to check for negative values early to avoid comparing these two different types.


This is changing the values of maxahead and maxbehind passed in swap_pager_getpages(), so the caller's hints may not be respected.


This comment seems a little misleading? vm_pindex_t is unsigned.


I find this function quite hard to read: there are many local variables and adjustments by one. Is it possible to simplify this at all?

Thank you for the quick response, Mark.


I will change this to something like /* be sure to avoid underflow */


I will see how I can simplify.

Split backward and forward search into separate functions for ease of reading.


The existing code has a limit, with a loop like:
for (i = 1; i < SWB_NPAGES; i++)


The problem cases were when pindex was very small, like 1, and *before was larger, like 31: the subtraction resulted in a huge number.
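
For illustration, with a 64-bit unsigned index, 1 - 31 wraps around to 2^64 - 30. A minimal sketch of one way to guard against that follows; pindex_t and clamp_before() are hypothetical names for this comment, not the code in the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pindex_t;	/* stand-in for the unsigned vm_pindex_t */

/*
 * Clamp a look-behind count so that pindex - before cannot wrap around.
 */
int
clamp_before(pindex_t pindex, int before)
{
	assert(before >= 0);
	if ((pindex_t)before > pindex)
		before = (int)pindex;	/* safe: pindex < before here */
	return (before);
}
```

With pindex 1 and a requested look-behind of 31, the clamped value is 1, so the subtraction yields 0 instead of wrapping.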

I think the comment is now accurate.


I think these initializations should be done in swp_pager_meta_lookup() now.


Style: } else { should be on one line.


I think it is clearer to write pindex + *after < pindex.
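
As a sketch of why that form works: unsigned addition wraps modulo 2^64, so the sum is smaller than pindex exactly when it overflowed. pindex_t and after_overflows() below are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pindex_t;	/* stand-in for the unsigned vm_pindex_t */

/*
 * True when pindex + after wraps past the largest representable index.
 */
bool
after_overflows(pindex_t pindex, int after)
{
	assert(after >= 0);
	return (pindex + (pindex_t)after < pindex);
}
```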

Fixed style, moved the 0 assignment to *before and *after for the SWAPBLK_NONE case,
and adjusted the if/else statement for the *after case. Marked 8 inline comments as done.

Address other review comments.


I'm interested in removing the SWB_NPAGES ceiling to see whether it improves performance. However, for now, I want to keep the same behavior.


I'm removing, actually.

I think I have addressed all of the feedback so far.
I'm wondering if someone can take a look.