Page MenuHomeFreeBSD

release multiple swap blocks at a time in swp_pager_force_pagein
ClosedPublic

Authored by dougm on Jun 14 2019, 8:05 AM.

Details

Summary

Starting with D13484, adapt that patch to identify page ranges for swap block release without extra PCTRIE lookups.

Improve swp_pager_force_pagein() such that "swapoff" reads multiple blocks at a time.
Ref: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223831

Tested by: pho

Test Plan

Peter, can you test this please?

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
dougm updated this revision to Diff 58646.Jun 15 2019, 4:18 AM
dougm retitled this revision from The patch after merging in small helper functions to release multiple swap blocks at a time.
dougm edited the summary of this revision. (Show Details)
dougm added a subscriber: ota_j.email.ne.jp.
dougm updated this revision to Diff 59149.Jun 28 2019, 7:43 AM
dougm added a subscriber: alc.
dougm edited the summary of this revision. (Show Details)Jun 29 2019, 2:42 AM
dougm edited the test plan for this revision. (Show Details)
dougm added a subscriber: pho.
dougm updated this revision to Diff 59201.Jun 29 2019, 5:27 PM

Undo some changes that may not have been necessary and may have done harm.

dougm updated this revision to Diff 59205.Jun 29 2019, 6:49 PM

Reinsert missing line.

pho added a comment.Jun 29 2019, 7:26 PM

I saw the same problem with Diff 59201. I building with Diff 59205 now.

dougm updated this revision to Diff 59216.Jun 30 2019, 8:26 AM

I'm guessing, with low confidence, that reducing the max block count to nsw_cluster_max would help.

pho added a comment.Jun 30 2019, 10:56 AM

With D20635.59216.diff I see:

20190630 11:39:48 all (5/10): swap5.sh
panic: swapoff: failed to locate 8036 swap blocks
cpuid = 2
time = 1561887597
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ad6c26a0
vpanic() at vpanic+0x19d/frame 0xfffffe00ad6c26f0
panic() at panic+0x43/frame 0xfffffe00ad6c2750
swap_pager_swapoff() at swap_pager_swapoff+0x194/frame 0xfffffe00ad6c27d0
swapoff_one() at swapoff_one+0x15e/frame 0xfffffe00ad6c2820
sys_swapoff() at sys_swapoff+0x1d6/frame 0xfffffe00ad6c2990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00ad6c2ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00ad6c2ab0
--- syscall (424, FreeBSD ELF64, sys_swapoff), rip = 0x8002f88ca, rsp = 0x7fffffffe4c8, rbp = 0x7fffffffe5f0 ---
KDB: enter: panic
[ thread pid 15529 tid 100290 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #1 r349552M: Sun Jun 30 10:48:06 CEST 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>

The swap5.sh test scenario is a "Test with out of swap space" test using a 4g memory disk as swap.

alc added inline comments.Jul 1 2019, 7:00 AM
swap_pager.c
1743 ↗(On Diff #59216)

n_blks must not exceed MAXPHYS / PAGE_SIZE.

dougm added inline comments.Jul 1 2019, 7:04 AM
swap_pager.c
1743 ↗(On Diff #59216)

The nsw_cluster_max is constrained by the bp->b_pages[] array (MAXPHYS/PAGE_SIZE) and our locally defined MAX_PAGEOUT_CLUSTER.

alc added a comment.Jul 1 2019, 7:21 AM

Why is vm_pager_page_unswapped() no longer called?

dougm updated this revision to Diff 59244.Jul 1 2019, 7:41 AM

Add vm_pager_page_unswapped() calls.

alc added a comment.Jul 1 2019, 2:51 PM

Why is vm_object_pip_wakeup() no longer called?

dougm updated this revision to Diff 59261.Jul 1 2019, 2:56 PM

vm_object_pip_wakeup() is now called.

dougm updated this revision to Diff 59264.Jul 1 2019, 4:01 PM

Add more pip_wakeups.

pho added a comment.Jul 1 2019, 4:03 PM

D20635.59261.diff did not seem to fix the problem.

20190701 17:39:16 all (5/10): swap5.sh

root@mercat1:~ # ps
load: 0.05  cmd: csh 44768 [objde2] 2.04r 0.00u 0.00s 0% 0k
~^B
KDB: enter: Break to debugger
[ thread pid 11 tid 100003 ]
Stopped at      kdb_alt_break_internal+0x106:   movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #6 r349552M: Mon Jul  1 17:10:58 CEST 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db> ps
  pid  ppid  pgrp   uid  state   wmesg   wchan               cmd
46898  2857  2825     0  S       nanslp  0xffffffff81e8f0b6  sleep
44768  2333 44768     0  D+      objde2  0xfffff800037fe400  csh
39418  2273 39418     0  Ds      objde2  0xfffff800037fed00  sshd
 6374  2727  2727     0  DE+     objde2  0xfffff802d7200000  sh
54890  2277  2277     0  DE      objde2  0xfffff802825df900  cron
 2860  2825  2825     0  S       piperd  0xfffff8001b1232f8  sh
 2858  2825  2825     0  S       piperd  0xfffff8000af3f000  awk
 2857  2825  2825     0  S       wait    0xfffff801b64a8000  sh
 2825     1  2825     0  SWs     wait    0xfffff8000aa59000  sh
 2727  2348  2727     0  S+      wait    0xfffff800036bf530  sh
 2348  2346  2348     0  SW+     wait    0xfffff8000a30a530  bash
 2346  2345  2346     0  SW+     pause   0xfffff80003fdd5d8  csh
 2345  2341  2345  2006  SW+     wait    0xfffff80003fde000  su
 2341  2340  2341  2006  SWs+    wait    0xfffff801b62b3a60  bash
 2340  2337  2337  2006  S       select  0xfffff8001b97aac0  sshd
 2337  2273  2337     0  Ss      select  0xfffff8000ad89c40  sshd
 2333  2332  2333     0  S+      pause   0xfffff80003fddb08  csh
 2332     1  2332     0  SWs+    wait    0xfffff801b64a8a60  login
 2331     1  2331     0  Ss+     ttyin   0xfffff800061544b0  getty
db>
pho added a comment.Jul 1 2019, 4:04 PM

I'll try Diff 59264.

pho added a comment.Jul 1 2019, 5:00 PM
20190701 18:36:41 all (5/10): swap5.sh
panic: swp_pager_force_pagein: Too many pages: 32
cpuid = 8
time = 1561999005
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00adc594c0
vpanic() at vpanic+0x19d/frame 0xfffffe00adc59510
panic() at panic+0x43/frame 0xfffffe00adc59570
swp_pager_force_pagein() at swp_pager_force_pagein+0x7f/frame 0xfffffe00adc596f0
swap_pager_swapoff_object() at swap_pager_swapoff_object+0xfe/frame 0xfffffe00adc59750
swap_pager_swapoff() at swap_pager_swapoff+0xf9/frame 0xfffffe00adc597d0
swapoff_one() at swapoff_one+0x15e/frame 0xfffffe00adc59820
sys_swapoff() at sys_swapoff+0x1d6/frame 0xfffffe00adc59990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00adc59ab0

https://people.freebsd.org/~pho/stress/log/dougm045.txt

dougm updated this revision to Diff 59273.Jul 1 2019, 5:17 PM

Change assertion from '<' to '<='.

pho added a comment.Jul 1 2019, 6:36 PM

This version got me past the test that kept triggering problems. I'll run some more tests ...

pho added a comment.Jul 2 2019, 7:29 AM

I have completed the selected swap tests and continued to run random tests for a total of 13H30.
No problems seen.

I've been away and haven't been able to follow up or read updates on any of "swapoff" related changes for the last one week.
I won't be able to get back to this for another week; I will find the situations and figure out what to do later.
It is currently looking like Doug is finalizing changes based at quick look of incoming email titles. I appreciate your passion.

dougm updated this revision to Diff 59351.Jul 3 2019, 4:25 PM

After swp_pager_force_pagein, dont' rely on the local swap block value 'sb'. It may have been freed, so get a new one.

This addresses a problem that predates this patch.

alc added inline comments.Jul 3 2019, 6:49 PM
sys/vm/swap_pager.c
1690 ↗(On Diff #59351)

I would also KASSERT that npages is greater than zero.

1695 ↗(On Diff #59351)

Style(9)

1711 ↗(On Diff #59351)

Given that we are within swap_pager.c, I would call swap_pager_unswapped() directly. Moreover, it would be reasonable to perform that call as the last statement in swp_pager_force_launder() and swp_pager_force_dirty(). (Marking the page dirty and calling "unswapped" on it are logically related.)

1721 ↗(On Diff #59351)

Go ahead and replace the one-by-one wakeups with this call.

1755 ↗(On Diff #59351)

Please add a comment here explaining why this test is necessary.

1759–1761 ↗(On Diff #59351)

Please add a comment here explaining why a "restart" is necessary.

alc added a reviewer: alc.Jul 3 2019, 6:50 PM
dougm updated this revision to Diff 59369.Jul 3 2019, 8:52 PM

Apply reviewer suggestions.

dougm updated this revision to Diff 59370.Jul 3 2019, 8:55 PM

Fix comment typo.

alc added a comment.Jul 3 2019, 10:57 PM

I would ask Peter to retest this latest version, and if all is well add Kostik and Mark as reviewers. I am ready to rubber-stamp this version unless Peter turns up a problem.

sys/vm/swap_pager.c
541 ↗(On Diff #59370)

Keep this and the above change separate. The parentheses around MAXPHYS / PAGE_SIZE are not required.

1685 ↗(On Diff #59370)

"Page that ..." -> "Pages that ..."

1727 ↗(On Diff #59370)

"from" -> "to"

1746 ↗(On Diff #59370)

Spelling: "accumlated"

1769 ↗(On Diff #59370)

Add a blank line before this comment.

1771 ↗(On Diff #59370)

"them all out" -> "them all in"

dougm updated this revision to Diff 59379.Jul 4 2019, 2:15 AM

Apply reviewer suggestions.

dougm marked 12 inline comments as done.Jul 4 2019, 2:18 AM

This needs a final round of testing before calling for more reviewers.

alc added a comment.Jul 4 2019, 6:41 PM

Peter, can you please retest this patch? I think that all of the open issues have been addressed.

pho added a comment.Jul 4 2019, 6:52 PM
In D20635#451659, @alc wrote:

Peter, can you please retest this patch? I think that all of the open issues have been addressed.

Oh, sorry for the missing feedback. I've been testing D20635.59379.diff for 14 hours. I'd like to let the tests run a bit longer.

pho added a comment.Jul 5 2019, 5:12 AM

I have run tests on D20635.59379.diff for 24 hours without seeing any problems.

dougm retitled this revision from release multiple swap blocks at a time to release multiple swap blocks at a time in swp_pager_force_pagein.Jul 5 2019, 5:15 AM
dougm edited the summary of this revision. (Show Details)
dougm added reviewers: kib, markj.
dougm added a subscriber: emaste.
alc accepted this revision.Jul 5 2019, 5:24 AM
This revision is now accepted and ready to land.Jul 5 2019, 5:24 AM
kib added inline comments.Jul 5 2019, 11:05 AM
sys/vm/swap_pager.c
1712 ↗(On Diff #59379)

I think we traditionally put {} around the body of do/while loops, always.

markj accepted this revision.Jul 5 2019, 3:10 PM
dougm updated this revision to Diff 59445.Jul 5 2019, 3:16 PM

Add braces around one-line do-while body.

This revision now requires review to proceed.Jul 5 2019, 3:16 PM
kib accepted this revision.Jul 5 2019, 4:18 PM
This revision is now accepted and ready to land.Jul 5 2019, 4:18 PM
alc accepted this revision.Jul 5 2019, 4:27 PM
alc added inline comments.
sys/vm/swap_pager.c
1761 ↗(On Diff #59445)

I would add "()" after "swp_pager_force_pagein" to make it unambiguous that you are talking about a function. (There is no need to repost a new version of this patch here. Just add the "()" before committing.)

This revision was automatically updated to reflect the committed changes.
dougm added a comment.Oct 16 2019, 7:50 AM

It seems that this change broke something:
Hi Doug,

I ran into this panic while running tests with debug.vmmap_check=1

20191016 05:38:12 all (408/677): swappedout.sh
panic: page 0xfffffe000f288068 is neither wired nor queued
cpuid = 2
time = 1571197220
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0349836610
vpanic() at vpanic+0x19d/frame 0xfffffe0349836660
panic() at panic+0x43/frame 0xfffffe03498366c0
swp_pager_force_pagein() at swp_pager_force_pagein+0x2df/frame 0xfffffe03498367f0
swapoff_one() at swapoff_one+0x2ee/frame 0xfffffe0349836870
sys_swapoff() at sys_swapoff+0x108/frame 0xfffffe0349836980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe0349836ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0349836ab0

  • syscall (424, FreeBSD ELF64, sys_swapoff), rip = 0x8002f18ea, rsp = 0x7fffffffe4b8, rbp = 0x7fffffffe5f0 ---

KDB: enter: panic
[ thread pid 26889 tid 104160 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db>

https://people.freebsd.org/~pho/stress/log/swappedout-2.txt

  • Peter
markj added a comment.Oct 16 2019, 3:43 PM

It seems that this change broke something:

Do you mind adding me to the email thread? I suspect this is related to work that I've done.

BTW, from Peter's report I see that VPO_UNMANAGED is set in the page. That seems unexpected.

dougm added a comment.Oct 17 2019, 6:17 AM

Here is a small patch that adds a few KASSERTS. There might be some value in seeing which of them is tripped first on the way to this failure.

dougm added a comment.Oct 17 2019, 3:15 PM

Fix a bug in the debug patch.

pho added a comment.Oct 18 2019, 8:26 AM
20191018 08:58:59 all (284/676): pipe2.sh

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0xf90
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f1177e
stack pointer	        = 0x0:0xfffffe00d3224530
frame pointer	        = 0x0:0xfffffe00d3224580
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 30 (dom0)
trap number		= 12
panic: page fault
cpuid = 1
time = 1571381969
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d32241e0
vpanic() at vpanic+0x19d/frame 0xfffffe00d3224230
panic() at panic+0x43/frame 0xfffffe00d3224290
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00d32242f0
trap_pfault() at trap_pfault+0x99/frame 0xfffffe00d3224350
trap() at trap+0x2bd/frame 0xfffffe00d3224460
calltrap() at calltrap+0x8/frame 0xfffffe00d3224460
--- trap 0xc, rip = 0xffffffff80f1177e, rsp = 0xfffffe00d3224530, rbp = 0xfffffe00d3224580 ---
zone_release() at zone_release+0x11e/frame 0xfffffe00d3224580
bucket_drain() at bucket_drain+0x80/frame 0xfffffe00d32245b0
bucket_cache_reclaim() at bucket_cache_reclaim+0x154/frame 0xfffffe00d3224600
zone_reclaim() at zone_reclaim+0x99/frame 0xfffffe00d3224640
uma_reclaim() at uma_reclaim+0xb2/frame 0xfffffe00d3224670
vm_pageout_worker() at vm_pageout_worker+0x404/frame 0xfffffe00d3224a30
vm_pageout() at vm_pageout+0x176/frame 0xfffffe00d3224a70
fork_exit() at fork_exit+0x84/frame 0xfffffe00d3224ab0

https://people.freebsd.org/~pho/stress/log/dougm058.txt

pho added a comment.Oct 18 2019, 12:46 PM

This too, what not what I was hoping for:

20191018 13:51:51 all (363/677): vmstat.sh


Fatal trap 12: page fault while in kernel mode
cpuid = 10; apic id = 0a
fault virtual address	= 0x10
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f134b3
stack pointer	        = 0x28:0xfffffe00e56c7450
frame pointer	        = 0x28:0xfffffe00e56c7490
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 13145 (vmstat)
trap number		= 12
panic: page fault
cpuid = 10
time = 1571399544
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e56c7100
vpanic() at vpanic+0x19d/frame 0xfffffe00e56c7150
panic() at panic+0x43/frame 0xfffffe00e56c71b0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00e56c7210
trap_pfault() at trap_pfault+0x99/frame 0xfffffe00e56c7270
trap() at trap+0x2bd/frame 0xfffffe00e56c7380
calltrap() at calltrap+0x8/frame 0xfffffe00e56c7380
--- trap 0xc, rip = 0xffffffff80f134b3, rsp = 0xfffffe00e56c7450, rbp = 0xfffffe00e56c7490 ---
uma_vm_zone_stats() at uma_vm_zone_stats+0x1c3/frame 0xfffffe00e56c7490
sysctl_vm_zone_stats() at sysctl_vm_zone_stats+0x77e/frame 0xfffffe00e56c7680
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x7b/frame 0xfffffe00e56c76c0
sysctl_root() at sysctl_root+0x20c/frame 0xfffffe00e56c7740
userland_sysctl() at userland_sysctl+0x17b/frame 0xfffffe00e56c77f0
kern___sysctlbyname() at kern___sysctlbyname+0x21b/frame 0xfffffe00e56c7940
sys___sysctlbyname() at sys___sysctlbyname+0x2d/frame 0xfffffe00e56c7980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe00e56c7ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00e56c7ab0

https://people.freebsd.org/~pho/stress/log/dougm059.txt

markj added a comment.Oct 18 2019, 4:54 PM
In D20635#482367, @pho wrote:

This too, what not what I was hoping for:

Are you compiling with -O0 or some other unusual compiler flags? I suspect the patch in D22081 is required, though it has no effect on generated code in my testing.

pho added a comment.Oct 18 2019, 5:08 PM
In D20635#482367, @pho wrote:

This too, what not what I was hoping for:

Are you compiling with -O0 or some other unusual compiler flags? I suspect the patch in D22081 is required, though it has no effect on generated code in my testing.

Yes uma_core.c and vm_pageout.c I'll stop doing that to see if it makes any difference.
This is a new test idea, where I after each test do a "swapoff -a; swapon -a"

pho added a comment.Oct 20 2019, 5:17 PM

Here's a

20191020 18:20:46 all (406/677): kevent4.sh
panic: pmap_clear_modify: page 0xfffffe000019a360 is not managed
cpuid = 8
time = 1571588452
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe037b686240
vpanic() at vpanic+0x19d/frame 0xfffffe037b686290
panic() at panic+0x43/frame 0xfffffe037b6862f0
pmap_clear_modify() at pmap_clear_modify+0x4c/frame 0xfffffe037b6863e0
vm_page_set_validclean() at vm_page_set_validclean+0xdc/frame 0xfffffe037b686410
bdwrite() at bdwrite+0x20c/frame 0xfffffe037b686480
ffs_update() at ffs_update+0x303/frame 0xfffffe037b6864e0
ufs_makeinode() at ufs_makeinode+0x349/frame 0xfffffe037b686670
ufs_create() at ufs_create+0x34/frame 0xfffffe037b686690
VOP_CREATE_APV() at VOP_CREATE_APV+0x86/frame 0xfffffe037b6866c0
vn_open_cred() at vn_open_cred+0x34d/frame 0xfffffe037b686810
kern_openat() at kern_openat+0x1f3/frame 0xfffffe037b686980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe037b686ab0

https://people.freebsd.org/~pho/stress/log/dougm060.txt