Paths

Table of Contents

- head/sys/sys/vmmeter.h
- head/sys/vm/swap_pager.c
- head/sys/vm/vm_fault.c
- head/sys/vm/vm_meter.c
- head/sys/vm/vm_object.c
- head/sys/vm/vm_page.h
- head/sys/vm/vm_page.c
- head/sys/vm/vm_pageout.c

PQ_LAUNDRY
Closed, Public

Authored by alc on Oct 20 2016, 5:44 PM.

Details

Reviewers

kib
markj

Commits

rS308474: Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty

Summary

Introduce a new page queue, PQ_LAUNDRY, for unreferenced, i.e., inactive, dirty pages and a new thread for laundering the pages on this queue. In essence, this change decouples page laundering and reclamation. For example, one effect of this decoupling is that the legacy page daemon thread(s) will no longer block because laundering anonymous pages consumes all of the available pbufs for writing to swap. Instead, they are able to continue with page reclamation. This eliminates the need for dubious low-memory deadlock avoidance hacks, specifically, the vm_page_try_to_cache() calls in I/O completion handlers.

The laundry thread sleeps while waiting for a request from the pagedaemon(s). A request is raised by setting vm_laundry_request and waking the laundry thread. We request launderings for two reasons: to try to balance the inactive and laundry queue sizes (background laundering), and to quickly make up for a shortage of free and clean inactive pages (shortfall). When a background laundering is requested, the laundry thread computes the number of pagedaemon wakeups that have taken place since the last laundering. If this number is large enough relative to the ratio of the laundry and (global) inactive queue sizes, we will launder vm_background_launder_target pages at vm_background_launder_rate KB/s. Otherwise, the laundry thread goes back to sleep without doing any work. When scanning the laundry queue during background laundering, reactivated pages are counted towards the laundry thread's target.
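
The background trigger described above can be modelled roughly as follows. This is only an illustrative sketch, not the committed code: the function name and the exact comparison are assumptions. The summary says only that the wakeup count must be "large enough relative to the ratio" of the queue sizes, so the rule used here (wakeups >= inactive / laundry) is one plausible reading.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the background-laundering trigger.
 *
 * wakeups:  pagedaemon wakeups since the last laundering run
 * laundry:  current laundry queue length, in pages
 * inactive: current inactive queue length, in pages
 *
 * Assumed rule: launder once wakeups >= inactive / laundry.  It is
 * written as a multiplication so that an empty laundry queue does not
 * cause a division by zero.
 */
static bool
should_launder_in_background(uint64_t wakeups, uint64_t laundry,
    uint64_t inactive)
{

	if (wakeups == 0 || laundry == 0)
		return (false);
	return (wakeups * laundry >= inactive);
}
```

Under this reading, a laundry queue that is large relative to the inactive queue triggers laundering after very few wakeups, while a small laundry queue needs many wakeups before any work is done.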

A shortfall laundering is requested when an inactive queue scan fails to meet its target. In this case, the laundry thread attempts to launder enough pages to meet v_free_target within 0.5s, the inactive scan period.
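
As a rough illustration of pacing a shortfall over the 0.5s scan period, the sketch below splits the target evenly across the laundering passes that fit into one period. The rate of ten passes per second, the helper names, and the assumption that every pass launders its full quota are all hypothetical.

```c
#include <stdint.h>

/* Assumed rates, in runs per second. */
#define	MODEL_LAUNDER_RATE	10	/* laundering passes */
#define	MODEL_INACT_SCAN_RATE	2	/* inactive queue scans (0.5s period) */

/*
 * Return the number of pages to launder in the current pass, given the
 * remaining target and the number of passes left (including this one).
 */
static uint32_t
shortfall_pass_quota(uint32_t target, uint32_t cycles_left)
{

	if (cycles_left == 0)
		return (0);
	return (target / cycles_left);
}

/*
 * Simulate one inactive scan period, assuming each pass launders its
 * full quota.  Returns the number of pages still outstanding.
 */
static uint32_t
shortfall_simulate(uint32_t target)
{
	uint32_t cycles;

	cycles = MODEL_LAUNDER_RATE / MODEL_INACT_SCAN_RATE;	/* 5 passes */
	while (cycles > 0 && target > 0) {
		target -= shortfall_pass_quota(target, cycles);
		cycles--;
	}
	return (target);
}
```

Dividing by the number of passes left, rather than by a fixed count, means any rounding remainder is absorbed by the final pass.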

A laundry request can be latched while another is currently being serviced. A shortfall request will immediately preempt a background laundering.
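
One way to picture latching and preemption is as a small priority scheme, sketched below. The enum values and helper names are hypothetical and are not taken from the patch. The only behaviour drawn from the summary is that a request may be recorded while another is in progress and that shortfall takes priority over background laundering.

```c
#include <stdbool.h>

/* Hypothetical request levels, ordered by urgency. */
enum laundry_request {
	LAUNDRY_IDLE = 0,
	LAUNDRY_BACKGROUND = 1,
	LAUNDRY_SHORTFALL = 2
};

/* Latch an incoming request, keeping the more urgent of the two. */
static enum laundry_request
laundry_latch(enum laundry_request pending, enum laundry_request incoming)
{

	return (incoming > pending ? incoming : pending);
}

/* Report whether the latched request should interrupt the current run. */
static bool
laundry_preempts(enum laundry_request running, enum laundry_request pending)
{

	return (running == LAUNDRY_BACKGROUND && pending == LAUNDRY_SHORTFALL);
}
```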

The change also redefines the meaning of vm_cnt.v_reactivated and removes the functions vm_page_cache() and vm_page_try_to_cache(). vm_cnt.v_reactivated now represents the number of inactive or laundry pages that are returned to the active queue on account of a reference.

Test Plan

To test shortfall, we ran a modified sysbench that uses NCPU threads to write large NOSYNC-mapped files. This effectively forces all of system memory through the laundry queue; in testing on a system with 16GB of RAM, the laundry thread is able to write pages at 200-250MB/s to an SSD backing a UFS filesystem.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Event Timeline

alc updated this revision to Diff 21549.Oct 20 2016, 5:44 PM

alc retitled this revision from to PQ_LAUNDRY.

alc updated this object.

alc edited the test plan for this revision.

alc added reviewers: kib, markj.

alc updated this object.Oct 20 2016, 6:43 PM

markj updated this object.Oct 20 2016, 8:35 PM

markj added a subscriber: pho.

markj updated this object.Oct 20 2016, 8:41 PM

alc added inline comments.Oct 20 2016, 10:39 PM

sys/vm/vm_pageout.c
238–240 ↗	(On Diff #21549)	Kostik, what does Bruce say about proper SYSCTL style? Is the CTLFLAG_RW supposed to be on the first line and the indentation of the continuation lines four spaces? (Currently, the new SYSCTL's are following the style of the existing ones in this file.)

kib added inline comments.Oct 21 2016, 11:09 AM

sys/vm/swap_pager.c
1641 ↗	(On Diff #21549)	This comment is confusing IMO. Because the backing store is freed, the page is marked dirty and is not queued. But since the page is not worth keeping in memory (it was in swap, after all), it should go into the laundry.
sys/vm/vm_pageout.c
238–240 ↗	(On Diff #21549)	I am only sure about the 4-space indent and the new line before the description. I used a line break after the CTLFLAG* flags, but indeed Bruce might say that the line break should come before them.
477 ↗	(On Diff #21549)	What code is supposed to re-queue the page after the flush? I see that the patch added a call to vm_page_deactivate_noreuse() for write completion in the swap pager, but I failed to find anything that would re-queue pages for the local vnode pager.

alc updated this revision to Diff 21595.Oct 21 2016, 4:47 PM

alc marked an inline comment as done.

alc added inline comments.

sys/vm/swap_pager.c
1641 ↗	(On Diff #21549)	Done.

alc updated this object.Oct 21 2016, 4:51 PM

alc marked an inline comment as done.

It occurred to me that the behaviour of madvise(MADV_DONTNEED) for dirty pages is less than ideal on this branch. Right now, we always requeue such pages at the tail of the inactive queue, meaning that we may move them out of the laundry queue. That seems incorrect.

I propose modifying vm_page_advise() so that we call vm_page_launder() on dirty pages, rather than _vm_page_deactivate().

In D8302#173090, @markj wrote:

It occurred to me that the behaviour of madvise(MADV_DONTNEED) for dirty pages is less than ideal on this branch. Right now, we always requeue such pages at the tail of the inactive queue, meaning that we may move them out of the laundry queue. That seems incorrect.

I propose modifying vm_page_advise() so that we call vm_page_launder() on dirty pages, rather than _vm_page_deactivate().

Yes, do it.
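
The proposed policy for madvise(MADV_DONTNEED) can be summarised with a small userspace model. The types and names below are hypothetical stand-ins, not the kernel's vm_page structures. The model captures only the rule under discussion: dirty pages go to the laundry queue and clean pages go to the inactive queue.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's page queues. */
enum model_queue { MODEL_ACTIVE, MODEL_INACTIVE, MODEL_LAUNDRY };

struct model_page {
	enum model_queue queue;
	bool dirty;
};

/*
 * Model of the proposed MADV_DONTNEED handling: a dirty page must be
 * laundered before it can be reclaimed, so it is placed in (or left
 * in) the laundry queue rather than moved to the inactive queue.
 */
static void
model_advise_dontneed(struct model_page *m)
{

	m->queue = m->dirty ? MODEL_LAUNDRY : MODEL_INACTIVE;
}
```

With this rule, a dirty page that is already in the laundry queue stays there, which addresses the concern about moving such pages back to the inactive queue.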

markj added inline comments.Oct 21 2016, 9:01 PM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	I think this is a valid problem in that there is no general mechanism to requeue the pages. The buffer cache takes care of this for some filesystems it seems, but I don't see how it would work for ZFS.

kib added inline comments.Oct 22 2016, 9:23 AM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	Thank you for pointing out the buffer cache involvement there; indeed, for filesystems using the buffer cache it happens automatically. But IMO relying on the pager behaviour is wrong, and ZFS should not need to know such peculiarities of the VM. I would expect e.g. the completion code for vm_pageout_flush() to take care of the re-queue, esp. because it already handles it for some error cases.

markj added inline comments.Oct 22 2016, 6:37 PM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	Even for filesystems which use the buffer cache, I don't see anything that automatically unwires the buf's pages once the write completes. vnode_pager_generic_putpages() specifies IO_VMIO, which is translated to B_RELBUF by ext2 and UFS, but it seems that this should really be enforced by generic code. Unfortunately, it doesn't seem possible for vm_pageout_flush() or even the vnode pager to specify a completion handler - VOP_PUTPAGES provides no mechanism to do so. Am I missing something?

kib added inline comments.Oct 23 2016, 1:23 AM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	We do not need to force an unwire. It is enough for pages to be queued later, when the buffer is recycled. The buffer cache size is limited to a fixed amount, so the count of pages participating in VMIO buffers and not visible to the page daemon is limited. OTOH, pages that are not queued because they were missed are effectively unswappable until the owning object is destroyed. VOP_PUTPAGES() is synchronous; moreover, the typical operation of the vnode pager marks the page clean before the buffer write is initiated. It, so to speak, migrates the dirtiness from the pages to the buffer. I mean that vm_pageout_flush() could re-queue the pages after the pager returns.

markj added inline comments.Oct 23 2016, 2:52 AM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	Forcing an unwire is not strictly necessary, but in the case of laundering, the pages have gone through LRU and are eligible for reclamation. It seems strange to let them exert pressure on the bufspace and go through the buffer cache's own LRU. I think UFS' current behaviour of specifying B_RELBUF is correct. VOP_PUTPAGES is not synchronous by default - one needs to specify VM_PAGER_PUT_SYNC, and this is only done when v_free_count < v_pageout_free_min. Even if VOP_PUTPAGES allowed one to specify an iodone handler like VOP_GETPAGES_ASYNC does, I don't see a way to implement it as a generic vnode method. vop_stdputpages() currently just calls VOP_WRITE, which also doesn't provide notifications for async writes.

kib added inline comments.Oct 23 2016, 11:52 AM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	If you want to modify the bufcache behaviour WRT unwiring of the laundered pages, then vfs_vmio_unwire() looks like the proper place. It already tries to free pages or affect their LRU position on unwire in several cases, so one more case is not too outstanding. I tried to express that VOP_PUTPAGES() is synchronous from the VM PoV: the page is marked clean outright, even before the write is scheduled in the I/O subsystem. It is the I/O level that records the need to perform the write, and e.g. when clustering is allowed (async putpages in VM terms), the dirty buffer may sit on the dirty queue until the buffer or syncer daemons care about it. But from the VM's point of view, the page is clean after the successful return from vm_pager_putpages(), and sometimes even earlier. So vm_pageout_flush() can do whatever re-queuing attempts it finds suitable after the pager call.

I have run the stress2 tests on i386 and amd64.
I ran a buildkernel on both i386 and amd64 with 256MB RAM / UP.
Buildworld was run with various small RAM configurations.
No problems seen.

alc added inline comments.Oct 23 2016, 6:45 PM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	Do we use VM_PAGER_PEND anywhere besides the swap pager? To Kostik's point, I think that I agree. We should remove the vm_page_dequeue() calls from vm_pageout_cluster() and instead call vm_page_deactivate_noreuse() in vm_pageout_flush() when the pager returns VM_PAGER_OK. However, there is one catch. We shouldn't automatically call vm_page_deactivate_noreuse() when vm_pageout_flush() is called by msync(), or in general any caller besides the laundry thread. I think it would suffice to test whether the page is in the laundry queue, and only call vm_page_deactivate_noreuse() if it is. That way, we would also handle the case where msync() is performed on a page in the laundry queue. Turning to Mark's point, we ought to tell the buffer cache to immediately release the buffer and perform vm_page_deactivate_noreuse() on the pages. However, I don't think that any of the existing flags that vm_pageout_flush() can pass to vm_pager_put_pages() accomplishes the latter. Am I wrong?

markj added inline comments.Oct 23 2016, 8:38 PM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	Ok, I understand the suggestion now. I think that queuing the page using vm_page_deactivate_noreuse() if it was on the laundry queue is a reasonable policy, and we can use the B_NOREUSE flag to effect this in the buffer cache. It does indeed seem like we need to add a new VM_PAGER_PUT_* flag to signal our intent to VOP_PUTPAGES, and it also needs to be plumbed through VOP_WRITE somehow for the generic PUTPAGES implementation. If we add a new VM_PAGER_PUT_* flag, then we actually don't need to test whether the page is in the laundry queue: vm_pageout_flush() takes the pager flags as a parameter, so we can just set the flag in vm_pageout_cluster() and use that to determine where to queue. That way, msync and so on will be unaffected. VM_PAGER_PEND only appears to be set in the swap pager.

alc added inline comments.Oct 23 2016, 9:04 PM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	"That way, msync and so on will be unaffected." We might have a page in the laundry queue that is actually laundered by the msync(2) call. In that case, we would want the page to be moved to the inactive queue. If the page has been referenced while in the laundry queue, there shouldn't be a problem with having used vm_page_deactivate_noreuse() on the page because vm_pageout_scan() will see the reference and not reclaim the page.

alc added inline comments.Oct 23 2016, 10:22 PM

sys/vm/vm_pageout.c

477 ↗	(On Diff #21549)	Here is the proposed patch:

Index: vm/vm_pageout.c
===================================================================
--- vm/vm_pageout.c     (revision 307753)
+++ vm/vm_pageout.c     (working copy)
@@ -405,7 +405,6 @@ vm_pageout_cluster(vm_page_t m)
         */
        vm_page_assert_unbusied(m);
        KASSERT(m->hold_count == 0, ("page %p is held", m));
-       vm_page_dequeue(m);
        vm_page_unlock(m);
 
        mc[vm_pageout_page_count] = pb = ps = m;
@@ -448,7 +447,6 @@ more:
                        ib = 0;
                        break;
                }
-               vm_page_dequeue(p);
                vm_page_unlock(p);
                mc[--page_base] = pb = p;
                ++pageout_count;
@@ -474,7 +472,6 @@ more:
                        vm_page_unlock(p);
                        break;
                }
-               vm_page_dequeue(p);
                vm_page_unlock(p);
                mc[page_base + pageout_count] = ps = p;
                ++pageout_count;
@@ -550,6 +547,10 @@ vm_pageout_flush(vm_page_t *mc, int count, int fla
                    ("vm_pageout_flush: page %p is not write protected", mt));
                switch (pageout_status[i]) {
                case VM_PAGER_OK:
+                       vm_page_lock(mt);
+                       if (vm_page_in_laundry(mt))
+                               vm_page_deactivate_noreuse(mt);
+                       vm_page_unlock(mt);
                case VM_PAGER_PEND:
                        numpagedout++;
                        break;
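
The effect of the vm_pageout_flush() hunk can be illustrated with a small userspace model. Everything named here is a hypothetical stand-in for the kernel types. The model mirrors only the control flow of the patch: on VM_PAGER_OK a page that is still in the laundry queue is moved to the inactive queue, and both OK and PEND count the page as paged out.

```c
/* Hypothetical stand-ins for the kernel's queues and pager statuses. */
enum flush_queue { FLUSH_ACTIVE, FLUSH_INACTIVE, FLUSH_LAUNDRY };
enum flush_status { FLUSH_PAGER_OK, FLUSH_PAGER_PEND };

struct flush_page {
	enum flush_queue queue;
};

/*
 * Model of the per-page handling after the pager returns.  Returns 1
 * if the page counts towards the number of pages paged out.
 */
static int
model_flush_done(struct flush_page *m, enum flush_status status)
{

	switch (status) {
	case FLUSH_PAGER_OK:
		/* Only pages being laundered are requeued. */
		if (m->queue == FLUSH_LAUNDRY)
			m->queue = FLUSH_INACTIVE;
		/* FALLTHROUGH */
	case FLUSH_PAGER_PEND:
		return (1);
	}
	return (0);
}
```

Because the test is on the page's current queue rather than on the caller, a page flushed by msync() keeps its queue unless it happened to be in the laundry, which is the behaviour argued for in the thread.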

kib added inline comments.Oct 23 2016, 11:57 PM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	Looks fine.

markj added inline comments.Oct 24 2016, 12:52 AM

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	Seems right to me. I can work on the corresponding buffer cache change, but that's probably not a prerequisite to merging PQ_LAUNDRY?

alc updated this revision to Diff 21657.Oct 24 2016, 5:26 PM

alc edited edge metadata.

alc marked an inline comment as done.Oct 24 2016, 5:32 PM

alc added inline comments.

sys/vm/vm_pageout.c
477 ↗	(On Diff #21549)	No, I don't think it's a prerequisite.

alc added inline comments.Oct 24 2016, 9:30 PM

sys/sys/vmmeter.h
100 ↗	(On Diff #21657)	This ought to have a more accurate description. Suggestions?

markj edited the test plan for this revision.Oct 24 2016, 10:55 PM

markj added inline comments.

sys/sys/vmmeter.h
100 ↗	(On Diff #21657)	"pages eligible for laundering"?

Revise the description of v_laundry_count.

alc marked an inline comment as done.Oct 25 2016, 4:18 AM

emaste added a subscriber: emaste.Oct 25 2016, 2:25 PM

alc added inline comments.Oct 25 2016, 7:03 PM

sys/vm/vm_pageout.c
1978 ↗	(On Diff #21661)	Should the "2" here be "VM_INACT_SCAN_INTERVAL"?

markj added inline comments.Oct 26 2016, 8:47 PM

sys/vm/vm_pageout.c

1978 ↗	(On Diff #21661)	Yes. That name doesn't really make sense though - it's a rate.

How about:

diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index d09dccb..c996797 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -155,11 +155,9 @@ static struct kproc_desc vm_kp = {
 SYSINIT(vmdaemon, SI_SUB_KTHREAD_VM, SI_ORDER_FIRST, kproc_start, &vm_kp);
 #endif

-/* Sleep intervals for pagedaemon threads, in subdivisions of one second. */
-#define        VM_LAUNDER_INTERVAL     10
-#define        VM_INACT_SCAN_INTERVAL  2
-
-#define        VM_LAUNDER_RATE         (VM_LAUNDER_INTERVAL / VM_INACT_SCAN_INTERVAL)
+/* Pagedaemon activity rates, in subdivisions of one second. */
+#define        VM_LAUNDER_RATE         10
+#define        VM_INACT_SCAN_RATE      2

 int vm_pageout_deficit;                /* Estimated number of pages deficit */
 u_int vm_pageout_wakeup_thresh;
@@ -1149,7 +1147,7 @@ vm_pageout_laundry_worker(void *arg)
                 */
                if (shortfall > 0) {
                        in_shortfall = true;
-                       shortfall_cycle = VM_LAUNDER_RATE;
+                       shortfall_cycle = VM_LAUNDER_RATE / VM_INACT_SCAN_RATE;
                        target = shortfall;
                } else if (!in_shortfall)
                        goto trybackground;
@@ -1211,7 +1209,7 @@ trybackground:
                                target = 0;
                        }
                        launder = vm_background_launder_rate * PAGE_SIZE / 1024;
-                       launder /= VM_LAUNDER_INTERVAL;
+                       launder /= VM_LAUNDER_RATE;
                        if (launder > target)
                                launder = target;
                }
@@ -1225,7 +1223,7 @@ dolaundry:
                         */
                        target -= min(vm_pageout_launder(domain, launder,
                            in_shortfall), target);
-                       pause("laundp", hz / VM_LAUNDER_INTERVAL);
+                       pause("laundp", hz / VM_LAUNDER_RATE);
                }

                /*
@@ -2001,7 +1999,7 @@ vm_pageout_worker(void *arg)
                         */
                        mtx_unlock(&vm_page_queue_free_mtx);
                        if (pass >= 1)
-                               pause("psleep", hz / 2);
+                               pause("psleep", hz / VM_INACT_SCAN_RATE);
                        pass++;
                } else {
                        /*

alc added inline comments.Oct 26 2016, 10:49 PM

sys/vm/vm_pageout.c
1978 ↗	(On Diff #21661)	I agree. Commit your proposed change.
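
For reference, the arithmetic behind a per-pass background quota can be sketched as below. This is a dimensional sketch under stated assumptions, not a copy of the patch: it takes the rate to be in KB/s, as the summary states, and assumes a 4096-byte page and ten passes per second. Note that the context line in the patch computes vm_background_launder_rate * PAGE_SIZE / 1024, which is a different expression; this sketch does not attempt to reproduce it. All names are hypothetical.

```c
#include <stdint.h>

#define	MODEL_PAGE_SIZE		4096u	/* assumed page size, in bytes */
#define	MODEL_PASSES_PER_SEC	10u	/* assumed laundering passes per second */

/*
 * Convert a laundering rate in KB/s into a page quota for one pass,
 * clamped to the number of pages still outstanding.
 *
 *   KB/s * 1024 bytes/KB / (bytes/page) = pages/s
 *   pages/s / (passes/s)                = pages/pass
 */
static uint32_t
background_pass_quota(uint32_t rate_kbps, uint32_t target)
{
	uint64_t pages_per_sec;
	uint32_t quota;

	pages_per_sec = (uint64_t)rate_kbps * 1024 / MODEL_PAGE_SIZE;
	quota = (uint32_t)(pages_per_sec / MODEL_PASSES_PER_SEC);
	return (quota > target ? target : quota);
}
```

For example, a rate of 4096 KB/s corresponds to 1024 pages per second, or 102 pages per pass after integer division.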

alc updated this revision to Diff 21728.Oct 27 2016, 7:49 AM

Style only

alc added inline comments.Oct 27 2016, 4:05 PM

sys/vm/vm_pageout.c
563 ↗	(On Diff #21736)	I don't think that we ever consciously chose between vm_page_deactivate() and vm_page_deactivate_noreuse() here. Using vm_page_deactivate() will preserve the contents of this "failed page" from reclamation for a little longer. Is there actually a reason to prefer that?

markj added inline comments.Oct 27 2016, 7:16 PM

sys/vm/vm_pageout.c
563 ↗	(On Diff #21736)	I can't see any good reason either way. It looks like the failure modes that lead to VM_PAGER_BAD are transient (e.g. vnode is being reclaimed) and will lead to the page being freed by another mechanism.

alc added inline comments.Oct 28 2016, 4:12 PM

sys/vm/vm_pageout.c
563 ↗	(On Diff #21736)	After sleeping on the question and your response, I have a slight preference for using the _noreuse option. I'm also going to condition the _noreuse call on whether the page is in the laundry, like we did for the OK and PEND cases. This way, msync() pages will remain in their current queue, unless they were in the laundry.

Tweak vm_pageout_flush()'s handling of the VM_PAGER_BAD case.

alc marked 2 inline comments as done.Oct 28 2016, 4:42 PM

I went through the entire diff again yesterday. I don't have any other questions or planned changes.

In D8302#174196, @alc wrote:

I went through the entire diff again yesterday. I don't have any other questions or planned changes.

Cool! I don't have anything further to add at the moment either.

In D8302#174211, @markj wrote:

In D8302#174196, @alc wrote:

I went through the entire diff again yesterday. I don't have any other questions or planned changes.

Cool! I don't have anything further to add at the moment either.

Could you please rerun your stress test from the test plan so that the caveat about "this test result was from a few months ago" can be removed? (I'm viewing the summary as a draft of the commit message.)

Peter,

Could you please boot a PQ_LAUNDRY kernel with VM_NUMA_ALLOC enabled and put enough memory stress on it to trigger page laundering? I just want to double check that we haven't introduced any regressions in the current NUMA support.

In D8302#174630, @alc wrote:

In D8302#174211, @markj wrote:

In D8302#174196, @alc wrote:

I went through the entire diff again yesterday. I don't have any other questions or planned changes.

Cool! I don't have anything further to add at the moment either.

Could you please rerun your stress test from the test plan so that the caveat about "this test result was from a few months ago" can be removed? (I'm viewing the summary as a draft of the commit message.)

Sure, will do tonight.

In D8302#174631, @alc wrote:

Peter,

Could you please boot a PQ_LAUNDRY kernel with VM_NUMA_ALLOC enabled and put enough memory stress on it to trigger page laundering? I just want to double check that we haven't introduced any regressions in the current NUMA support.

Sure.
18:29:26 vm.stats.vm.v_laundry_count: 0
18:30:22 vm.stats.vm.v_laundry_count: 2437688
18:30:38 vm.stats.vm.v_laundry_count: 2448018
^C
$ uname -a
FreeBSD t2.osted.lan 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r308137: Mon Oct 31 18:19:24 CET 2016 pho@t2.osted.lan:/var/tmp/PQ_LAUNDRY/sys/amd64/compile/MAXMEMDOM amd64
$ sysctl vm.ndomains
vm.ndomains: 2

In D8302#174638, @markj wrote:

In D8302#174630, @alc wrote:

In D8302#174211, @markj wrote:

In D8302#174196, @alc wrote:

I went through the entire diff again yesterday. I don't have any other questions or planned changes.

Cool! I don't have anything further to add at the moment either.

Could you please rerun your stress test from the test plan so that the caveat about "this test result was from a few months ago" can be removed? (I'm viewing the summary as a draft of the commit message.)

Sure, will do tonight.

I ran the test overnight on HEAD and haven't hit any problems, so I just removed that reference.

markj accepted this revision.Nov 2 2016, 8:14 PM

markj edited edge metadata.

This revision is now accepted and ready to land.Nov 2 2016, 8:14 PM

kib accepted this revision.Nov 2 2016, 8:23 PM

kib edited edge metadata.

markj edited edge metadata.Nov 2 2016, 8:24 PM

markj added a subscriber: jhb.

kbowling added a subscriber: kbowling.Nov 3 2016, 9:17 PM

Closed by commit rS308474: Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty (authored by alc).Nov 9 2016, 6:48 PM

This revision was automatically updated to reflect the committed changes.

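Revision Contents
Changeset List

Path	Size
head/sys/sys/vmmeter.h	24 lines
head/sys/vm/swap_pager.c	28 lines
head/sys/vm/vm_fault.c	9 lines
head/sys/vm/vm_meter.c	38 lines
head/sys/vm/vm_object.c	4 lines
head/sys/vm/vm_page.h	30 lines
head/sys/vm/vm_page.c	180 lines
head/sys/vm/vm_pageout.c	693 lines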

Diff 22110

head/sys/sys/vmmeter.h

@@ 69 lines hidden @@ struct vmmeter {
 	u_int v_swapout; /* (p) swap pager pageouts */
 	u_int v_swappgsin; /* (p) swap pager pages paged in */
 	u_int v_swappgsout; /* (p) swap pager pages paged out */
 	u_int v_vnodein; /* (p) vnode pager pageins */
 	u_int v_vnodeout; /* (p) vnode pager pageouts */
 	u_int v_vnodepgsin; /* (p) vnode_pager pages paged in */
 	u_int v_vnodepgsout; /* (p) vnode pager pages paged out */
 	u_int v_intrans; /* (p) intransit blocking page faults */
-	u_int v_reactivated; /* (f) pages reactivated from free list */
+	u_int v_reactivated; /* (p) pages reactivated by the pagedaemon */
 	u_int v_pdwakeups; /* (p) times daemon has awaken from sleep */
 	u_int v_pdpages; /* (p) pages analyzed by daemon */
+	u_int v_pdshortfalls; /* (p) page reclamation shortfalls */
 
 	u_int v_tcached; /* (p) total pages cached */
 	u_int v_dfree; /* (p) pages freed by daemon */
 	u_int v_pfree; /* (p) pages freed by exiting processes */
 	u_int v_tfree; /* (p) total pages freed */
 	/*
 	 * Distribution of page usages.
 	 */
 	u_int v_page_size; /* (c) page size in bytes */
 	u_int v_page_count; /* (c) total number of pages in system */
 	u_int v_free_reserved; /* (c) pages reserved for deadlock */
 	u_int v_free_target; /* (c) pages desired free */
 	u_int v_free_min; /* (c) pages desired free */
 	u_int v_free_count; /* (f) pages free */
 	u_int v_wire_count; /* (a) pages wired down */
 	u_int v_active_count; /* (q) pages active */
 	u_int v_inactive_target; /* (c) pages desired inactive */
 	u_int v_inactive_count; /* (q) pages inactive */
+	u_int v_laundry_count; /* (q) pages eligible for laundering */
 	u_int v_cache_count; /* (f) pages on cache queue */
 	u_int v_pageout_free_min; /* (c) min pages reserved for kernel */
 	u_int v_interrupt_free_min; /* (c) reserved pages for int code */
 	u_int v_free_severe; /* (c) severe page depletion point */
 	/*
 	 * Fork/vfork/rfork activity.
 	 */
 	u_int v_forks; /* (p) fork() calls */
 	u_int v_vforks; /* (p) vfork() calls */
 	u_int v_rforks; /* (p) rfork() calls */
 	u_int v_kthreads; /* (p) fork() calls by kernel */
 	u_int v_forkpages; /* (p) VM pages affected by fork() */
 	u_int v_vforkpages; /* (p) VM pages affected by vfork() */
 	u_int v_rforkpages; /* (p) VM pages affected by rfork() */
 	u_int v_kthreadpages; /* (p) VM pages affected by fork() by kernel */
-	u_int v_spare[2];
 };
 #ifdef _KERNEL
 
 extern struct vmmeter vm_cnt;
 
 extern u_int vm_pageout_wakeup_thresh;
 
 /*
@@ 55 lines hidden @@
  */
 static inline int
 vm_paging_needed(void)
 {
 
 	return (vm_cnt.v_free_count + vm_cnt.v_cache_count <
 	    vm_pageout_wakeup_thresh);
 }
 
+/*
+ * Return the number of pages we need to launder.
+ * A positive number indicates that we have a shortfall of clean pages.
+ */
+static inline int
+vm_laundry_target(void)
+{
+
+	return (vm_paging_target());
+}
+
+/*
+ * Obtain the value of a per-CPU counter.
+ */
+#define VM_METER_PCPU_CNT(member) \
+	vm_meter_cnt(__offsetof(struct vmmeter, member))
+
+u_int vm_meter_cnt(size_t);
+
 #endif
 
 /* systemwide totals computed every five seconds */
 struct vmtotal {
 	int16_t t_rq; /* length of the run queue */
 	int16_t t_dw; /* jobs in ``disk wait'' (neg priority) */
 	int16_t t_pw; /* jobs in page wait */
@@ 14 lines hidden @@
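
The VM_METER_PCPU_CNT() macro passes the byte offset of a struct member to vm_meter_cnt(), whose body is not shown in this part of the diff. As a hedged illustration of the offset-based pattern, the userspace sketch below sums one member across an array of per-CPU structures. The summation, the fixed CPU count, and all names prefixed with model_ or MODEL_ are assumptions made for the example, not the kernel implementation.

```c
#include <stddef.h>

#define	MODEL_NCPU	4	/* assumed number of CPUs */

/* Hypothetical, reduced stand-in for struct vmmeter. */
struct model_meter {
	unsigned int v_swapout;
	unsigned int v_pdwakeups;
};

/* One counter block per CPU; zero-initialised. */
static struct model_meter model_pcpu[MODEL_NCPU];

/*
 * Sum the unsigned int located at the given byte offset within each
 * CPU's counter block.
 */
static unsigned int
model_meter_cnt(size_t offset)
{
	unsigned int total;
	int i;

	total = 0;
	for (i = 0; i < MODEL_NCPU; i++)
		total += *(unsigned int *)((char *)&model_pcpu[i] + offset);
	return (total);
}

/* Resolve a member name to its offset at compile time. */
#define	MODEL_METER_PCPU_CNT(member)					\
	model_meter_cnt(offsetof(struct model_meter, member))
```

Passing an offset instead of a pointer lets a single function serve every counter in the structure, since the same offset is valid in each CPU's copy.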

head/sys/vm/swap_pager.c

Show First 20 Lines • Show All 1,543 Lines • ▼ Show 20 Lines	if (bp->b_ioflags & BIO_ERROR) {
if (i < bp->b_pgbefore \|\|		if (i < bp->b_pgbefore \|\|
i >= bp->b_npages - bp->b_pgafter)		i >= bp->b_npages - bp->b_pgafter)
vm_page_readahead_finish(m);		vm_page_readahead_finish(m);
} else {		} else {
/*		/*
* For write success, clear the dirty		* For write success, clear the dirty
* status, then finish the I/O ( which decrements the		* status, then finish the I/O ( which decrements the
* busy count and possibly wakes waiter's up ).		* busy count and possibly wakes waiter's up ).
		* A page is only written to swap after a period of
		* inactivity. Therefore, we do not expect it to be
		* reused.
*/		*/
KASSERT(!pmap_page_is_write_mapped(m),		KASSERT(!pmap_page_is_write_mapped(m),
("swp_pager_async_iodone: page %p is not write"		("swp_pager_async_iodone: page %p is not write"
" protected", m));		" protected", m));
vm_page_undirty(m);		vm_page_undirty(m);
vm_page_sunbusy(m);
if (vm_page_count_severe()) {
vm_page_lock(m);		vm_page_lock(m);
vm_page_try_to_cache(m);		vm_page_deactivate_noreuse(m);
vm_page_unlock(m);		vm_page_unlock(m);
		vm_page_sunbusy(m);
}		}
}		}
}

/*		/*
* adjust pip. NOTE: the original parent may still have its own		* adjust pip. NOTE: the original parent may still have its own
* pip refs on the object.		* pip refs on the object.
*/		*/
if (object != NULL) {		if (object != NULL) {
vm_object_pip_wakeupn(object, bp->b_npages);		vm_object_pip_wakeupn(object, bp->b_npages);
VM_OBJECT_WUNLOCK(object);		VM_OBJECT_WUNLOCK(object);
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	swap_pager_isswapped(vm_object_t object, struct swdevt *sp)
}		}
mtx_unlock(&swhash_mtx);		mtx_unlock(&swhash_mtx);
return (0);		return (0);
}		}

/*		/*
* SWP_PAGER_FORCE_PAGEIN() - force a swap block to be paged in		* SWP_PAGER_FORCE_PAGEIN() - force a swap block to be paged in
*		*
* This routine dissociates the page at the given index within a		* This routine dissociates the page at the given index within an object
* swap block from its backing store, paging it in if necessary.		* from its backing store, paging it in if it does not reside in memory.
* If the page is paged in, it is placed in the inactive queue,		* If the page is paged in, it is marked dirty and placed in the laundry
* since it had its backing store ripped out from under it.		* queue. The page is marked dirty because it no longer has backing
* We also attempt to swap in all other pages in the swap block,		* store. It is placed in the laundry queue because it has not been
* we only guarantee that the one at the specified index is		* accessed recently. Otherwise, it would already reside in memory.
		*
		* We also attempt to swap in all other pages in the swap block.
		* However, we only guarantee that the one at the specified index is
* paged in.		* paged in.
*		*
* XXX - The code to page the whole block in doesn't work, so we		* XXX - The code to page the whole block in doesn't work, so we
* revert to the one-by-one behavior for now. Sigh.		* revert to the one-by-one behavior for now. Sigh.
*/		*/
static inline void		static inline void
swp_pager_force_pagein(vm_object_t object, vm_pindex_t pindex)		swp_pager_force_pagein(vm_object_t object, vm_pindex_t pindex)
{		{
Show All 12 Lines	if (m->valid == VM_PAGE_BITS_ALL) {
return;		return;
}		}

if (swap_pager_getpages(object, &m, 1, NULL, NULL) != VM_PAGER_OK)		if (swap_pager_getpages(object, &m, 1, NULL, NULL) != VM_PAGER_OK)
panic("swap_pager_force_pagein: read from swap failed");/XXX/		panic("swap_pager_force_pagein: read from swap failed");/XXX/
vm_object_pip_wakeup(object);		vm_object_pip_wakeup(object);
vm_page_dirty(m);		vm_page_dirty(m);
vm_page_lock(m);		vm_page_lock(m);
vm_page_deactivate(m);		vm_page_launder(m);
vm_page_unlock(m);		vm_page_unlock(m);
vm_page_xunbusy(m);		vm_page_xunbusy(m);
vm_pager_page_unswapped(m);		vm_pager_page_unswapped(m);
}		}

/*		/*
* swap_pager_swapoff:		* swap_pager_swapoff:
*		*
[... 1,097 lines not shown ...]


head/sys/vm/vm_fault.c

[... 284 lines not shown ...]
#endif		#endif
return (result);		return (result);
}		}

int		int
vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,		vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
int fault_flags, vm_page_t *m_hold)		int fault_flags, vm_page_t *m_hold)
{		{
vm_prot_t prot;
vm_object_t next_object;
struct faultstate fs;		struct faultstate fs;
struct vnode *vp;		struct vnode *vp;
		vm_object_t next_object, retry_object;
vm_offset_t e_end, e_start;		vm_offset_t e_end, e_start;
vm_page_t m;		vm_page_t m;
		vm_pindex_t retry_pindex;
		vm_prot_t prot, retry_prot;
int ahead, alloc_req, behind, cluster_offset, error, era, faultcount;		int ahead, alloc_req, behind, cluster_offset, error, era, faultcount;
int locked, map_generation, nera, result, rv;		int locked, map_generation, nera, result, rv;
u_char behavior;		u_char behavior;
boolean_t wired; /* Passed by reference. */		boolean_t wired; /* Passed by reference. */
bool dead, growstack, hardfault, is_first_object_locked;		bool dead, growstack, hardfault, is_first_object_locked;

PCPU_INC(cnt.v_vm_faults);		PCPU_INC(cnt.v_vm_faults);
fs.vp = NULL;		fs.vp = NULL;
[... 634 lines not shown ...]	#endif
}		}
}		}

/*		/*
* We must verify that the maps have not changed since our last		* We must verify that the maps have not changed since our last
* lookup.		* lookup.
*/		*/
if (!fs.lookup_still_valid) {		if (!fs.lookup_still_valid) {
vm_object_t retry_object;
vm_pindex_t retry_pindex;
vm_prot_t retry_prot;

if (!vm_map_trylock_read(fs.map)) {		if (!vm_map_trylock_read(fs.map)) {
release_page(&fs);		release_page(&fs);
unlock_and_deallocate(&fs);		unlock_and_deallocate(&fs);
goto RetryFault;		goto RetryFault;
}		}
fs.lookup_still_valid = true;		fs.lookup_still_valid = true;
if (fs.map->timestamp != map_generation) {		if (fs.map->timestamp != map_generation) {
result = vm_map_lookup_locked(&fs.map, vaddr, fault_type,		result = vm_map_lookup_locked(&fs.map, vaddr, fault_type,
[... 565 lines not shown ...]


head/sys/vm/vm_meter.c

[... 210 lines not shown ...]	TAILQ_FOREACH(object, &vm_object_list, object_list) {
}		}
}		}
mtx_unlock(&vm_object_list_mtx);		mtx_unlock(&vm_object_list_mtx);
total.t_free = vm_cnt.v_free_count + vm_cnt.v_cache_count;		total.t_free = vm_cnt.v_free_count + vm_cnt.v_cache_count;
return (sysctl_handle_opaque(oidp, &total, sizeof(total), req));		return (sysctl_handle_opaque(oidp, &total, sizeof(total), req));
}		}

/*		/*
* vcnt() - accumulate statistics from all cpus and the global cnt		* vm_meter_cnt() - accumulate statistics from all cpus and the global cnt
* structure.		* structure.
*		*
* The vmmeter structure is now per-cpu as well as global. Those		* The vmmeter structure is now per-cpu as well as global. Those
* statistics which can be kept on a per-cpu basis (to avoid cache		* statistics which can be kept on a per-cpu basis (to avoid cache
* stalls between cpus) can be moved to the per-cpu vmmeter. Remaining		* stalls between cpus) can be moved to the per-cpu vmmeter. Remaining
* statistics, such as v_free_reserved, are left in the global		* statistics, such as v_free_reserved, are left in the global
* structure.		* structure.
*
* (sysctl_oid *oidp, void *arg1, int arg2, struct sysctl_req *req)
*/		*/
static int		u_int
vcnt(SYSCTL_HANDLER_ARGS)		vm_meter_cnt(size_t offset)
{		{
int count = *(int *)arg1;		struct pcpu *pcpu;
int offset = (char *)arg1 - (char *)&vm_cnt;		u_int count;
int i;		int i;

		count = *(u_int *)((char *)&vm_cnt + offset);
CPU_FOREACH(i) {		CPU_FOREACH(i) {
struct pcpu *pcpu = pcpu_find(i);		pcpu = pcpu_find(i);
count += *(int *)((char *)&pcpu->pc_cnt + offset);		count += *(u_int *)((char *)&pcpu->pc_cnt + offset);
}		}
return (SYSCTL_OUT(req, &count, sizeof(int)));		return (count);
}		}

		static int
		cnt_sysctl(SYSCTL_HANDLER_ARGS)
		{
		u_int count;

		count = vm_meter_cnt((char *)arg1 - (char *)&vm_cnt);
		return (SYSCTL_OUT(req, &count, sizeof(count)));
		}
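
Note: the offset-based aggregation in vm_meter_cnt() above can be tried outside the kernel. The following is a minimal userland sketch, not the committed code. It assumes a hypothetical struct counters in place of struct vmmeter, a fixed NCPU, and plain arrays standing in for vm_cnt, pcpu_find() and CPU_FOREACH; all of those names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vmmeter; field names are illustrative. */
struct counters {
	unsigned int faults;
	unsigned int swapins;
};

#define NCPU 4	/* assumed fixed CPU count for this sketch */

static struct counters global_cnt;		/* plays the role of vm_cnt */
static struct counters percpu_cnt[NCPU];	/* plays the role of pc_cnt */

/*
 * Sum one unsigned counter, identified by its byte offset within
 * struct counters, over the global copy and every per-CPU copy.
 */
static unsigned int
meter_cnt(size_t offset)
{
	unsigned int count;
	int i;

	/* Reject offsets that would read past the end of the structure. */
	assert(offset + sizeof(unsigned int) <= sizeof(struct counters));

	count = *(unsigned int *)((char *)&global_cnt + offset);
	for (i = 0; i < NCPU; i++)
		count += *(unsigned int *)((char *)&percpu_cnt[i] + offset);
	return (count);
}
```

A caller would pass offsetof(struct counters, faults). This mirrors how cnt_sysctl() derives the offset by subtracting the address of vm_cnt from arg1, so that the same helper can serve both the sysctl handler and in-kernel callers.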

SYSCTL_PROC(_vm, VM_TOTAL, vmtotal, CTLTYPE_OPAQUE|CTLFLAG_RD|CTLFLAG_MPSAFE,		SYSCTL_PROC(_vm, VM_TOTAL, vmtotal, CTLTYPE_OPAQUE|CTLFLAG_RD|CTLFLAG_MPSAFE,
0, sizeof(struct vmtotal), vmtotal, "S,vmtotal",		0, sizeof(struct vmtotal), vmtotal, "S,vmtotal",
"System virtual memory statistics");		"System virtual memory statistics");
SYSCTL_NODE(_vm, OID_AUTO, stats, CTLFLAG_RW, 0, "VM meter stats");		SYSCTL_NODE(_vm, OID_AUTO, stats, CTLFLAG_RW, 0, "VM meter stats");
static SYSCTL_NODE(_vm_stats, OID_AUTO, sys, CTLFLAG_RW, 0,		static SYSCTL_NODE(_vm_stats, OID_AUTO, sys, CTLFLAG_RW, 0,
"VM meter sys stats");		"VM meter sys stats");
static SYSCTL_NODE(_vm_stats, OID_AUTO, vm, CTLFLAG_RW, 0,		static SYSCTL_NODE(_vm_stats, OID_AUTO, vm, CTLFLAG_RW, 0,
"VM meter vm stats");		"VM meter vm stats");
SYSCTL_NODE(_vm_stats, OID_AUTO, misc, CTLFLAG_RW, 0, "VM meter misc stats");		SYSCTL_NODE(_vm_stats, OID_AUTO, misc, CTLFLAG_RW, 0, "VM meter misc stats");

#define VM_STATS(parent, var, descr) \		#define VM_STATS(parent, var, descr) \
SYSCTL_PROC(parent, OID_AUTO, var, \		SYSCTL_PROC(parent, OID_AUTO, var, \
CTLTYPE_UINT | CTLFLAG_RD | CTLFLAG_MPSAFE, &vm_cnt.var, 0, vcnt, \		CTLTYPE_UINT | CTLFLAG_RD | CTLFLAG_MPSAFE, &vm_cnt.var, 0, \
"IU", descr)		cnt_sysctl, "IU", descr)
#define VM_STATS_VM(var, descr) VM_STATS(_vm_stats_vm, var, descr)		#define VM_STATS_VM(var, descr) VM_STATS(_vm_stats_vm, var, descr)
#define VM_STATS_SYS(var, descr) VM_STATS(_vm_stats_sys, var, descr)		#define VM_STATS_SYS(var, descr) VM_STATS(_vm_stats_sys, var, descr)

VM_STATS_SYS(v_swtch, "Context switches");		VM_STATS_SYS(v_swtch, "Context switches");
VM_STATS_SYS(v_trap, "Traps");		VM_STATS_SYS(v_trap, "Traps");
VM_STATS_SYS(v_syscall, "System calls");		VM_STATS_SYS(v_syscall, "System calls");
VM_STATS_SYS(v_intr, "Device interrupts");		VM_STATS_SYS(v_intr, "Device interrupts");
VM_STATS_SYS(v_soft, "Software interrupts");		VM_STATS_SYS(v_soft, "Software interrupts");
VM_STATS_VM(v_vm_faults, "Address memory faults");		VM_STATS_VM(v_vm_faults, "Address memory faults");
VM_STATS_VM(v_io_faults, "Page faults requiring I/O");		VM_STATS_VM(v_io_faults, "Page faults requiring I/O");
VM_STATS_VM(v_cow_faults, "Copy-on-write faults");		VM_STATS_VM(v_cow_faults, "Copy-on-write faults");
VM_STATS_VM(v_cow_optim, "Optimized COW faults");		VM_STATS_VM(v_cow_optim, "Optimized COW faults");
VM_STATS_VM(v_zfod, "Pages zero-filled on demand");		VM_STATS_VM(v_zfod, "Pages zero-filled on demand");
VM_STATS_VM(v_ozfod, "Optimized zero fill pages");		VM_STATS_VM(v_ozfod, "Optimized zero fill pages");
VM_STATS_VM(v_swapin, "Swap pager pageins");		VM_STATS_VM(v_swapin, "Swap pager pageins");
VM_STATS_VM(v_swapout, "Swap pager pageouts");		VM_STATS_VM(v_swapout, "Swap pager pageouts");
VM_STATS_VM(v_swappgsin, "Swap pages swapped in");		VM_STATS_VM(v_swappgsin, "Swap pages swapped in");
VM_STATS_VM(v_swappgsout, "Swap pages swapped out");		VM_STATS_VM(v_swappgsout, "Swap pages swapped out");
VM_STATS_VM(v_vnodein, "Vnode pager pageins");		VM_STATS_VM(v_vnodein, "Vnode pager pageins");
VM_STATS_VM(v_vnodeout, "Vnode pager pageouts");		VM_STATS_VM(v_vnodeout, "Vnode pager pageouts");
VM_STATS_VM(v_vnodepgsin, "Vnode pages paged in");		VM_STATS_VM(v_vnodepgsin, "Vnode pages paged in");
VM_STATS_VM(v_vnodepgsout, "Vnode pages paged out");		VM_STATS_VM(v_vnodepgsout, "Vnode pages paged out");
VM_STATS_VM(v_intrans, "In transit page faults");		VM_STATS_VM(v_intrans, "In transit page faults");
VM_STATS_VM(v_reactivated, "Pages reactivated from free list");		VM_STATS_VM(v_reactivated, "Pages reactivated by pagedaemon");
VM_STATS_VM(v_pdwakeups, "Pagedaemon wakeups");		VM_STATS_VM(v_pdwakeups, "Pagedaemon wakeups");
VM_STATS_VM(v_pdpages, "Pages analyzed by pagedaemon");		VM_STATS_VM(v_pdpages, "Pages analyzed by pagedaemon");
		VM_STATS_VM(v_pdshortfalls, "Page reclamation shortfalls");
VM_STATS_VM(v_tcached, "Total pages cached");		VM_STATS_VM(v_tcached, "Total pages cached");
VM_STATS_VM(v_dfree, "Pages freed by pagedaemon");		VM_STATS_VM(v_dfree, "Pages freed by pagedaemon");
VM_STATS_VM(v_pfree, "Pages freed by exiting processes");		VM_STATS_VM(v_pfree, "Pages freed by exiting processes");
VM_STATS_VM(v_tfree, "Total pages freed");		VM_STATS_VM(v_tfree, "Total pages freed");
VM_STATS_VM(v_page_size, "Page size in bytes");		VM_STATS_VM(v_page_size, "Page size in bytes");
VM_STATS_VM(v_page_count, "Total number of pages in system");		VM_STATS_VM(v_page_count, "Total number of pages in system");
VM_STATS_VM(v_free_reserved, "Pages reserved for deadlock");		VM_STATS_VM(v_free_reserved, "Pages reserved for deadlock");
VM_STATS_VM(v_free_target, "Pages desired free");		VM_STATS_VM(v_free_target, "Pages desired free");
VM_STATS_VM(v_free_min, "Minimum low-free-pages threshold");		VM_STATS_VM(v_free_min, "Minimum low-free-pages threshold");
VM_STATS_VM(v_free_count, "Free pages");		VM_STATS_VM(v_free_count, "Free pages");
VM_STATS_VM(v_wire_count, "Wired pages");		VM_STATS_VM(v_wire_count, "Wired pages");
VM_STATS_VM(v_active_count, "Active pages");		VM_STATS_VM(v_active_count, "Active pages");
VM_STATS_VM(v_inactive_target, "Desired inactive pages");		VM_STATS_VM(v_inactive_target, "Desired inactive pages");
VM_STATS_VM(v_inactive_count, "Inactive pages");		VM_STATS_VM(v_inactive_count, "Inactive pages");
		VM_STATS_VM(v_laundry_count, "Pages eligible for laundering");
VM_STATS_VM(v_cache_count, "Pages on cache queue");		VM_STATS_VM(v_cache_count, "Pages on cache queue");
VM_STATS_VM(v_pageout_free_min, "Min pages reserved for kernel");		VM_STATS_VM(v_pageout_free_min, "Min pages reserved for kernel");
VM_STATS_VM(v_interrupt_free_min, "Reserved pages for interrupt code");		VM_STATS_VM(v_interrupt_free_min, "Reserved pages for interrupt code");
VM_STATS_VM(v_forks, "Number of fork() calls");		VM_STATS_VM(v_forks, "Number of fork() calls");
VM_STATS_VM(v_vforks, "Number of vfork() calls");		VM_STATS_VM(v_vforks, "Number of vfork() calls");
VM_STATS_VM(v_rforks, "Number of rfork() calls");		VM_STATS_VM(v_rforks, "Number of rfork() calls");
VM_STATS_VM(v_kthreads, "Number of fork() calls by kernel");		VM_STATS_VM(v_kthreads, "Number of fork() calls by kernel");
VM_STATS_VM(v_forkpages, "VM pages affected by fork()");		VM_STATS_VM(v_forkpages, "VM pages affected by fork()");
VM_STATS_VM(v_vforkpages, "VM pages affected by vfork()");		VM_STATS_VM(v_vforkpages, "VM pages affected by vfork()");
VM_STATS_VM(v_rforkpages, "VM pages affected by rfork()");		VM_STATS_VM(v_rforkpages, "VM pages affected by rfork()");
VM_STATS_VM(v_kthreadpages, "VM pages affected by fork() by kernel");		VM_STATS_VM(v_kthreadpages, "VM pages affected by fork() by kernel");