
swap_pager: speedup meta_transfer
Closed, Public

Authored by dougm on Jun 29 2024, 11:25 PM.

Details

Summary

Add a parameter to swp_pager_meta_build, for the benefit of swp_pager_meta_transfer.

swp_pager_meta_transfer calls swp_pager_xfer_source, which may look up the same trie entry twice: first by calling swp_pager_meta_lookup, and then as the first step in swp_pager_meta_build. A boolean parameter to swp_pager_meta_build tells that function not to replace a previously assigned swapblk with a new one. Setting it in this call makes the first meta_lookup call unnecessary.
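To illustrate the "do not replace" behaviour described above, here is a minimal userspace sketch. It is not the kernel code: the array-backed struct toy_meta, the TOY_BLK_NONE marker, and the toy_meta_build function with its noreplace parameter are all hypothetical stand-ins for the swap metadata trie and swp_pager_meta_build. The point is only that one call can both report a previously assigned block and install a new one when none exists, so no separate lookup is needed first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLK_NONE	(-1L)	/* hypothetical "no swap block" marker */
#define TOY_SLOTS	8

/* Toy stand-in for per-object swap metadata; not the kernel's trie. */
struct toy_meta {
	long blk[TOY_SLOTS];
};

static void
toy_meta_init(struct toy_meta *m)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		m->blk[i] = TOY_BLK_NONE;
}

/*
 * Store newblk at idx and return the block that was there before.
 * With noreplace set, an existing assignment is left alone and returned,
 * so the caller learns about it without a separate lookup.
 */
static long
toy_meta_build(struct toy_meta *m, size_t idx, long newblk, bool noreplace)
{
	long prev;

	assert(idx < TOY_SLOTS);
	prev = m->blk[idx];
	if (prev != TOY_BLK_NONE && noreplace)
		return (prev);		/* keep the existing block */
	m->blk[idx] = newblk;
	return (prev);
}
```

In this sketch, a caller that gets back something other than TOY_BLK_NONE from a noreplace call knows the slot was already populated and its new block was not stored.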

swp_pager_meta_transfer calls swp_pager_xfer_source, which may release and reacquire the source object write lock, because the call to swp_pager_meta_build may acquire and then release the destination object write lock. But it probably doesn't, so fiddling with the source object write lock was probably unnecessary. This boolean parameter to swp_pager_meta_build tells it to return immediately if memory allocation problems are about to require a lock acquisition, so that the caller can release and reacquire the source object write lock only if truly necessary, around a second call to swp_pager_meta_build with that boolean parameter not set. This should make manipulation of the source object write lock rarer.
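The "try without waiting, then fall back" pattern described above can be sketched in userspace as follows. This is an assumption-laden model, not the patch: struct sketch_state, sketch_build, and sketch_xfer_one are hypothetical names, the lock is a plain flag rather than a real object lock, and a counter stands in for memory allocation failures. It only shows that the source lock is dropped on the rare failure path and left alone otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from swap_pager.c. */
struct sketch_state {
	bool src_locked;	/* is the source object's write lock held? */
	int src_lock_drops;	/* how often the source lock was released */
	int alloc_failures;	/* pending allocation failures */
};

/*
 * Model of a build step that needs memory.  With nowait set, it reports
 * failure instead of waiting.  Without nowait, it "waits" until memory is
 * available; this toy requires the caller to have dropped the source lock
 * before taking that path.  Returns true on success.
 */
static bool
sketch_build(struct sketch_state *s, bool nowait)
{
	if (s->alloc_failures > 0) {
		if (nowait)
			return (false);
		assert(!s->src_locked);
		s->alloc_failures = 0;	/* pretend the wait succeeded */
	}
	return (true);
}

/* Transfer one entry, touching the source lock only when necessary. */
static void
sketch_xfer_one(struct sketch_state *s)
{
	assert(s->src_locked);
	if (sketch_build(s, true))
		return;			/* common case: lock untouched */
	s->src_locked = false;		/* rare case: drop the source lock */
	s->src_lock_drops++;
	(void)sketch_build(s, false);	/* second call, allowed to wait */
	s->src_locked = true;		/* reacquire before returning */
}
```

With no pending failures, sketch_xfer_one never increments src_lock_drops; with one pending failure, it drops and reacquires the lock exactly once.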

Test Plan

I'm running a swapoff.sh test, but always welcome Peter's help.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

dougm created this revision.

Reporting an error:
swap_pager: out of swap space
swp_pager_getswapspace(13): failed
panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffff80024826740 (mkdir), blocked for 180003 ticks

cpuid = 1
time = 1719695387
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008e9b7d20
vpanic() at vpanic+0x13f/frame 0xfffffe008e9b7e50
panic() at panic+0x43/frame 0xfffffe008e9b7eb0
deadlkres() at deadlkres+0x32a/frame 0xfffffe008e9b7ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe008e9b7f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe008e9b7f30

--- trap 0x1da1d, rip = 0xcc0000000a, rsp = 0xcc0000000a, rbp = 0x13b85 ---

KDB: enter: panic
[ thread pid 0 tid 100069 ]
Stopped at kdb_enter+0x33: movq $0,0x105f3a2(%rip)
db>

dougm edited the summary of this revision.

Remove changes to swp_pager_meta_transfer. Restore swp_pager_xfer_source, with changes.

200 minutes of swapout.sh testing with no problems so far.

I'm starting tests with D45781.140368.patch

Return to the original changes, but with the function swp_pager_alloc_wait dropped and those lines restored to swp_pager_meta_build.

After dropping that function, testing has not shown a kernel abort on account of deadlock.

I'm starting tests with D45781.140421.patch

Now I have had this version deadlock too.

Hmm. I'm not seeing any problems. Are you running any of the stress2 tests mentioned in the all.exclude file?

In D45781#1044916, @pho wrote:

Hmm. I'm not seeing any problems. Are you running any of the stress2 tests mentioned in the all.exclude file?

I'm running './all.sh swapoff.sh' (or is it swapout.sh? One of those) in /usr/src/tools/tests/stress2/misc.

OK, I'll let swapoff.sh run through the night.

By capping RAM to 8 GB I was able to get a "hang". Unfortunately this is, AFAIK, a known issue when using memory disks:

100151                   D       -       0xfffff80003604e00  [mca taskq]
100152                   D       vmwait  0xffffffff818076c0  [CAM taskq]  <------ Known issue
100156                   D       -       0xfffff800081e3400  [mlx5_core0-rec]

Not sure if this is what you see?

https://people.freebsd.org/~pho/stress/log/log0532.txt

I posted something Saturday about what I was seeing.

I've been testing with an unmodified kernel for a day or two and I've had it hang during this test, but not fail in the same way.

I'm testing on a bhyve virtual machine. I don't know where to find the logs that you've posted, so I can't compare your results to mine.

Restore the patch to what it was when I started, before my own unreliable testing led me to discard parts that probably weren't broken. I'll just rely on Peter to tell me if things are broken.

This is the setup I use for stress testing on both real HW and bhyve (4 CPUs and 6GB RAM):

$ sed '/^#/d;/^$/d' < /usr/src/sys/amd64/conf/PHO
include GENERIC
ident           PHO-GENERIC
options         ALT_BREAK_TO_DEBUGGER
options         SW_WATCHDOG
options         WITNESS
options         INVARIANTS              # Enable calls of extra sanity checking
options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
options         DIAGNOSTIC
nooptions         DEADLKRES             # watchdogd handles this
options         UFS_EXTATTR
options         UFS_EXTATTR_AUTOSTART
$ grep -E 'ddb|watchdogd' /etc/rc.conf
ddb_enable="YES"
watchdogd_enable="YES"
watchdogd_flags="-t 3600 -e 'ls -l /tmp /dev /mnt* /media > /dev/null; true' -s 60"
$ tail -2 /etc/ddb.conf
script pho1=dump; bt; show allpcpu; show alllocks; show lockedvnods; show mount; show bufqueues; show page; show pageq
script pho=set $lines 20000; run pho1; show freepages; show uma; show umacache; acttrace; ps; allt
$ 

With D45781.140474.patch added I got a deadlock after 6h30:
https://people.freebsd.org/~pho/stress/log/log0534.txt
It's not clear to me if this is related to your patch, so I'll repeat the test with a pristine kernel.

Here is the problem history of the test scenario:

[pho@freefall ~/public_html/stress/log]$ ls -alsrt `grep -lE 'Test sce.*swapoff.sh' *.txt`
 524 -rw-r--r--  1 pho pho  510077 Aug 28  2016 kostik934.txt
1035 -rw-r--r--  1 pho pho  993733 Apr 22  2018 mark035.txt
 652 -rw-r--r--  1 pho pho  583181 Apr 22  2019 dougm024.txt
 181 -r--r--r--  1 pho pho 1357814 Jul  2 15:18 log0532.txt
[pho@freefall ~/public_html/stress/log]$

Remove the parts that make swp_pager_meta_build less ugly, to reduce the burden on reviewers. It can stay ugly.

I find that a swapoff.sh stress test crashes my test box whether or not this patch is installed.

These are my observations with running the stress2 swapoff.sh test in a loop on real hardware:

  1. Uptime with a pristine kernel is 27 hours and no issues seen
  2. Uptime with a kernel patched with D45781.140474.patch was 6 1/2 hours before log0534.txt was seen

I do not have enough information to draw any conclusions about this.

I ran a mix of 48 tests with D45781.140542.patch for 13 hours. I saw no problems with this.

This revision is now accepted and ready to land. Jul 7 2024, 4:16 AM
sys/vm/swap_pager.c
2031–2039

This sentence should still be included: "If the swapblk is not valid, it is freed instead."

2042

"nowait" doesn't fully convey the effect. I would include "xfer" or "transfer" in the parameter name.

dougm marked 2 inline comments as done.

Rewrite comment, rename variable. No functional changes.

This revision now requires review to proceed. Jul 7 2024, 8:18 AM
dougm removed a subscriber: alc.
alc added inline comments.
sys/vm/swap_pager.c
2037

There is an extra period.

This revision is now accepted and ready to land. Jul 7 2024, 11:13 PM
This revision was automatically updated to reflect the committed changes.