Page MenuHomeFreeBSD

release multiple swap blocks at a time in swp_pager_force_pagein
ClosedPublic

Authored by dougm on Jun 14 2019, 8:05 AM.

Details

Summary

Starting with D13484, adapt that patch to identify page ranges for swap block release without extra PCTRIE lookups.

Improve swp_pager_force_pagein() such that "swapoff" reads multiple blocks at a time.
Ref: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223831

Tested by: pho

Test Plan

Peter, can you test this please?

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
dougm retitled this revision from The patch after merging in small helper functions to release multiple swap blocks at a time.
dougm edited the summary of this revision. (Show Details)
dougm added a subscriber: ota_j.email.ne.jp.
dougm added a subscriber: alc.
dougm edited the test plan for this revision. (Show Details)
dougm added a subscriber: pho.

Undo some changes that may not have been necessary and may have done harm.

I saw the same problem with Diff 59201. I building with Diff 59205 now.

I'm guessing, with low confidence, that reducing the max block count to nsw_cluster_max would help.

With D20635.59216.diff I see:

20190630 11:39:48 all (5/10): swap5.sh
panic: swapoff: failed to locate 8036 swap blocks
cpuid = 2
time = 1561887597
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ad6c26a0
vpanic() at vpanic+0x19d/frame 0xfffffe00ad6c26f0
panic() at panic+0x43/frame 0xfffffe00ad6c2750
swap_pager_swapoff() at swap_pager_swapoff+0x194/frame 0xfffffe00ad6c27d0
swapoff_one() at swapoff_one+0x15e/frame 0xfffffe00ad6c2820
sys_swapoff() at sys_swapoff+0x1d6/frame 0xfffffe00ad6c2990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00ad6c2ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00ad6c2ab0
--- syscall (424, FreeBSD ELF64, sys_swapoff), rip = 0x8002f88ca, rsp = 0x7fffffffe4c8, rbp = 0x7fffffffe5f0 ---
KDB: enter: panic
[ thread pid 15529 tid 100290 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #1 r349552M: Sun Jun 30 10:48:06 CEST 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>

The swap5.sh test scenario is a "Test with out of swap space" test using a 4g memory disk as swap.

swap_pager.c
1743 ↗(On Diff #59216)

n_blks must not exceed MAXPHYS / PAGE_SIZE.

swap_pager.c
1743 ↗(On Diff #59216)

The nsw_cluster_max is constrained by the bp->b_pages[] array (MAXPHYS/PAGE_SIZE) and our locally defined MAX_PAGEOUT_CLUSTER.

Why is vm_pager_page_unswapped() no longer called?

Add vm_pager_page_unswapped() calls.

Why is vm_object_pip_wakeup() no longer called?

vm_object_pip_wakeup() is now called.

D20635.59261.diff did not seem to fix the problem.

20190701 17:39:16 all (5/10): swap5.sh

root@mercat1:~ # ps
load: 0.05  cmd: csh 44768 [objde2] 2.04r 0.00u 0.00s 0% 0k
~^B
KDB: enter: Break to debugger
[ thread pid 11 tid 100003 ]
Stopped at      kdb_alt_break_internal+0x106:   movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #6 r349552M: Mon Jul  1 17:10:58 CEST 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db> ps
  pid  ppid  pgrp   uid  state   wmesg   wchan               cmd
46898  2857  2825     0  S       nanslp  0xffffffff81e8f0b6  sleep
44768  2333 44768     0  D+      objde2  0xfffff800037fe400  csh
39418  2273 39418     0  Ds      objde2  0xfffff800037fed00  sshd
 6374  2727  2727     0  DE+     objde2  0xfffff802d7200000  sh
54890  2277  2277     0  DE      objde2  0xfffff802825df900  cron
 2860  2825  2825     0  S       piperd  0xfffff8001b1232f8  sh
 2858  2825  2825     0  S       piperd  0xfffff8000af3f000  awk
 2857  2825  2825     0  S       wait    0xfffff801b64a8000  sh
 2825     1  2825     0  SWs     wait    0xfffff8000aa59000  sh
 2727  2348  2727     0  S+      wait    0xfffff800036bf530  sh
 2348  2346  2348     0  SW+     wait    0xfffff8000a30a530  bash
 2346  2345  2346     0  SW+     pause   0xfffff80003fdd5d8  csh
 2345  2341  2345  2006  SW+     wait    0xfffff80003fde000  su
 2341  2340  2341  2006  SWs+    wait    0xfffff801b62b3a60  bash
 2340  2337  2337  2006  S       select  0xfffff8001b97aac0  sshd
 2337  2273  2337     0  Ss      select  0xfffff8000ad89c40  sshd
 2333  2332  2333     0  S+      pause   0xfffff80003fddb08  csh
 2332     1  2332     0  SWs+    wait    0xfffff801b64a8a60  login
 2331     1  2331     0  Ss+     ttyin   0xfffff800061544b0  getty
db>
20190701 18:36:41 all (5/10): swap5.sh
panic: swp_pager_force_pagein: Too many pages: 32
cpuid = 8
time = 1561999005
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00adc594c0
vpanic() at vpanic+0x19d/frame 0xfffffe00adc59510
panic() at panic+0x43/frame 0xfffffe00adc59570
swp_pager_force_pagein() at swp_pager_force_pagein+0x7f/frame 0xfffffe00adc596f0
swap_pager_swapoff_object() at swap_pager_swapoff_object+0xfe/frame 0xfffffe00adc59750
swap_pager_swapoff() at swap_pager_swapoff+0xf9/frame 0xfffffe00adc597d0
swapoff_one() at swapoff_one+0x15e/frame 0xfffffe00adc59820
sys_swapoff() at sys_swapoff+0x1d6/frame 0xfffffe00adc59990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00adc59ab0

https://people.freebsd.org/~pho/stress/log/dougm045.txt

Change assertion from '<' to '<='.

This version got me past the test that kept triggering problems. I'll run some more tests ...

I have completed the selected swap tests and continued to run random tests for a total of 13H30.
No problems seen.

I've been away and haven't been able to follow up or read updates on any of "swapoff" related changes for the last one week.
I won't be able to get back to this for another week; I will find the situations and figure out what to do later.
It is currently looking like Doug is finalizing changes based at quick look of incoming email titles. I appreciate your passion.

After swp_pager_force_pagein, dont' rely on the local swap block value 'sb'. It may have been freed, so get a new one.

This addresses a problem that predates this patch.

sys/vm/swap_pager.c
1690 ↗(On Diff #59351)

I would also KASSERT that npages is greater than zero.

1695 ↗(On Diff #59351)

Style(9)

1711 ↗(On Diff #59351)

Given that we are within swap_pager.c, I would call swap_pager_unswapped() directly. Moreover, it would be reasonable to perform that call as the last statement in swp_pager_force_launder() and swp_pager_force_dirty(). (Marking the page dirty and calling "unswapped" on it are logically related.)

1721 ↗(On Diff #59351)

Go ahead and replace the one-by-one wakeups with this call.

1755 ↗(On Diff #59351)

Please add a comment here explaining why this test is necessary.

1759–1761 ↗(On Diff #59351)

Please add a comment here explaining why a "restart" is necessary.

Apply reviewer suggestions.

I would ask Peter to retest this latest version, and if all is well add Kostik and Mark as reviewers. I am ready to rubber-stamp this version unless Peter turns up a problem.

sys/vm/swap_pager.c
541 ↗(On Diff #59370)

Keep this and the above change separate. The parentheses around MAXPHYS / PAGE_SIZE are not required.

1685 ↗(On Diff #59370)

"Page that ..." -> "Pages that ..."

1727 ↗(On Diff #59370)

"from" -> "to"

1746 ↗(On Diff #59370)

Spelling: "accumlated"

1769 ↗(On Diff #59370)

Add a blank line before this comment.

1771 ↗(On Diff #59370)

"them all out" -> "them all in"

Apply reviewer suggestions.

dougm marked 12 inline comments as done.Jul 4 2019, 2:18 AM

This needs a final round of testing before calling for more reviewers.

Peter, can you please retest this patch? I think that all of the open issues have been addressed.

In D20635#451659, @alc wrote:

Peter, can you please retest this patch? I think that all of the open issues have been addressed.

Oh, sorry for the missing feedback. I've been testing D20635.59379.diff for 14 hours. I'd like to let the tests run a bit longer.

I have run tests on D20635.59379.diff for 24 hours without seeing any problems.

dougm retitled this revision from release multiple swap blocks at a time to release multiple swap blocks at a time in swp_pager_force_pagein.Jul 5 2019, 5:15 AM
dougm edited the summary of this revision. (Show Details)
dougm added reviewers: kib, markj.
dougm added a subscriber: emaste.
This revision is now accepted and ready to land.Jul 5 2019, 5:24 AM
sys/vm/swap_pager.c
1712 ↗(On Diff #59379)

I think we traditionally put {} around the body of do/while loops, always.

Add braces around one-line do-while body.

This revision now requires review to proceed.Jul 5 2019, 3:16 PM
This revision is now accepted and ready to land.Jul 5 2019, 4:18 PM
alc added inline comments.
sys/vm/swap_pager.c
1761 ↗(On Diff #59445)

I would add "()" after "swp_pager_force_pagein" to make it unambiguous that you are talking about a function. (There is no need to repost a new version of this patch here. Just add the "()" before committing.)

It seems that this change broke something:
Hi Doug,

I ran into this panic while running tests with debug.vmmap_check=1

20191016 05:38:12 all (408/677): swappedout.sh
panic: page 0xfffffe000f288068 is neither wired nor queued
cpuid = 2
time = 1571197220
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0349836610
vpanic() at vpanic+0x19d/frame 0xfffffe0349836660
panic() at panic+0x43/frame 0xfffffe03498366c0
swp_pager_force_pagein() at swp_pager_force_pagein+0x2df/frame 0xfffffe03498367f0
swapoff_one() at swapoff_one+0x2ee/frame 0xfffffe0349836870
sys_swapoff() at sys_swapoff+0x108/frame 0xfffffe0349836980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe0349836ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0349836ab0

  • syscall (424, FreeBSD ELF64, sys_swapoff), rip = 0x8002f18ea, rsp = 0x7fffffffe4b8, rbp = 0x7fffffffe5f0 ---

KDB: enter: panic
[ thread pid 26889 tid 104160 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db>

https://people.freebsd.org/~pho/stress/log/swappedout-2.txt

  • Peter

It seems that this change broke something:

Do you mind adding me to the email thread? I suspect this is related to work that I've done.

BTW, from Peter's report I see that VPO_UNMANAGED is set in the page. That seems unexpected.

Here is a small patch that adds a few KASSERTS. There might be some value in seeing which of them is tripped first on the way to this failure.

Fix a bug in the debug patch.

20191018 08:58:59 all (284/676): pipe2.sh

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0xf90
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f1177e
stack pointer	        = 0x0:0xfffffe00d3224530
frame pointer	        = 0x0:0xfffffe00d3224580
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 30 (dom0)
trap number		= 12
panic: page fault
cpuid = 1
time = 1571381969
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d32241e0
vpanic() at vpanic+0x19d/frame 0xfffffe00d3224230
panic() at panic+0x43/frame 0xfffffe00d3224290
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00d32242f0
trap_pfault() at trap_pfault+0x99/frame 0xfffffe00d3224350
trap() at trap+0x2bd/frame 0xfffffe00d3224460
calltrap() at calltrap+0x8/frame 0xfffffe00d3224460
--- trap 0xc, rip = 0xffffffff80f1177e, rsp = 0xfffffe00d3224530, rbp = 0xfffffe00d3224580 ---
zone_release() at zone_release+0x11e/frame 0xfffffe00d3224580
bucket_drain() at bucket_drain+0x80/frame 0xfffffe00d32245b0
bucket_cache_reclaim() at bucket_cache_reclaim+0x154/frame 0xfffffe00d3224600
zone_reclaim() at zone_reclaim+0x99/frame 0xfffffe00d3224640
uma_reclaim() at uma_reclaim+0xb2/frame 0xfffffe00d3224670
vm_pageout_worker() at vm_pageout_worker+0x404/frame 0xfffffe00d3224a30
vm_pageout() at vm_pageout+0x176/frame 0xfffffe00d3224a70
fork_exit() at fork_exit+0x84/frame 0xfffffe00d3224ab0

https://people.freebsd.org/~pho/stress/log/dougm058.txt

This too, what not what I was hoping for:

20191018 13:51:51 all (363/677): vmstat.sh


Fatal trap 12: page fault while in kernel mode
cpuid = 10; apic id = 0a
fault virtual address	= 0x10
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f134b3
stack pointer	        = 0x28:0xfffffe00e56c7450
frame pointer	        = 0x28:0xfffffe00e56c7490
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 13145 (vmstat)
trap number		= 12
panic: page fault
cpuid = 10
time = 1571399544
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e56c7100
vpanic() at vpanic+0x19d/frame 0xfffffe00e56c7150
panic() at panic+0x43/frame 0xfffffe00e56c71b0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00e56c7210
trap_pfault() at trap_pfault+0x99/frame 0xfffffe00e56c7270
trap() at trap+0x2bd/frame 0xfffffe00e56c7380
calltrap() at calltrap+0x8/frame 0xfffffe00e56c7380
--- trap 0xc, rip = 0xffffffff80f134b3, rsp = 0xfffffe00e56c7450, rbp = 0xfffffe00e56c7490 ---
uma_vm_zone_stats() at uma_vm_zone_stats+0x1c3/frame 0xfffffe00e56c7490
sysctl_vm_zone_stats() at sysctl_vm_zone_stats+0x77e/frame 0xfffffe00e56c7680
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x7b/frame 0xfffffe00e56c76c0
sysctl_root() at sysctl_root+0x20c/frame 0xfffffe00e56c7740
userland_sysctl() at userland_sysctl+0x17b/frame 0xfffffe00e56c77f0
kern___sysctlbyname() at kern___sysctlbyname+0x21b/frame 0xfffffe00e56c7940
sys___sysctlbyname() at sys___sysctlbyname+0x2d/frame 0xfffffe00e56c7980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe00e56c7ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00e56c7ab0

https://people.freebsd.org/~pho/stress/log/dougm059.txt

In D20635#482367, @pho wrote:

This too, what not what I was hoping for:

Are you compiling with -O0 or some other unusual compiler flags? I suspect the patch in D22081 is required, though it has no effect on generated code in my testing.

In D20635#482367, @pho wrote:

This too, what not what I was hoping for:

Are you compiling with -O0 or some other unusual compiler flags? I suspect the patch in D22081 is required, though it has no effect on generated code in my testing.

Yes uma_core.c and vm_pageout.c I'll stop doing that to see if it makes any difference.
This is a new test idea, where I after each test do a "swapoff -a; swapon -a"

Here's a

20191020 18:20:46 all (406/677): kevent4.sh
panic: pmap_clear_modify: page 0xfffffe000019a360 is not managed
cpuid = 8
time = 1571588452
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe037b686240
vpanic() at vpanic+0x19d/frame 0xfffffe037b686290
panic() at panic+0x43/frame 0xfffffe037b6862f0
pmap_clear_modify() at pmap_clear_modify+0x4c/frame 0xfffffe037b6863e0
vm_page_set_validclean() at vm_page_set_validclean+0xdc/frame 0xfffffe037b686410
bdwrite() at bdwrite+0x20c/frame 0xfffffe037b686480
ffs_update() at ffs_update+0x303/frame 0xfffffe037b6864e0
ufs_makeinode() at ufs_makeinode+0x349/frame 0xfffffe037b686670
ufs_create() at ufs_create+0x34/frame 0xfffffe037b686690
VOP_CREATE_APV() at VOP_CREATE_APV+0x86/frame 0xfffffe037b6866c0
vn_open_cred() at vn_open_cred+0x34d/frame 0xfffffe037b686810
kern_openat() at kern_openat+0x1f3/frame 0xfffffe037b686980
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe037b686ab0

https://people.freebsd.org/~pho/stress/log/dougm060.txt