Page MenuHomeFreeBSD

swap_pager: move scan_all_shadowed, use pctrie iterators
ClosedPublic

Authored by dougm on Oct 16 2024, 6:38 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Nov 21, 11:03 AM
Unknown Object (File)
Tue, Nov 19, 5:30 AM
Unknown Object (File)
Sun, Nov 17, 12:10 AM
Unknown Object (File)
Thu, Nov 14, 6:41 PM
Unknown Object (File)
Wed, Nov 13, 10:44 AM
Unknown Object (File)
Tue, Nov 12, 6:08 PM
Unknown Object (File)
Fri, Nov 8, 7:30 PM
Unknown Object (File)
Thu, Nov 7, 5:40 AM
Subscribers

Details

Summary

Move vm_object_scan_all_shadowed from vm_object.c to swap_pager.c, and rename it. In the moved function, use vm_page and swblk iterators to advance through the objects. Avoid checking a backing page for busyness or validity more than once, or when it is beyond the upper boundof the scan.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

dougm requested review of this revision.Oct 16 2024, 6:38 AM
dougm created this revision.
dougm added a reviewer: kib.
sys/vm/swap_pager.c
2505

Since this is visible function, we must assert that object is swapping object, i.e. OBJ_SWAP. It is also probably reasonable to assert that it is OBJ_ANON, and that object->backing_object exists.

2527
2568

Since you unbusied p right after tryxbusy(), this statement is no longer true.

dougm added inline comments.
sys/vm/swap_pager.c
2505

The call to swblk_iter_init_only in line 2517 leads to an OBJ_SWAP assertion.

The code is written to handle non-ANON objects by immediately returning false; I don't understand why I should change that behavior.

If backing_object is NULL, the kernel will crash immediately. I didn't think that we commonly added assertions that replace one immediate crash with another.

2568

I think you're telling me to fix a comment, and not to rewrite the code. But I'm not sure of that.

Also, if ps < pv, then p probably wasn't doing anything to affect pp.

I'll rewrite the comment, but probably not very well.

I ran tests with D47150.144979.patch for 17 hours before getting this seemingly unrelated ext2fs panic:
https://people.freebsd.org/~pho/stress/log/log0555.txt

sys/vm/swap_pager.c
2505

I mean that OBJ_ANON should be asserted on object, not backing object. The object must be anon because it shadows the below object.

2568

Then I do not understand why ever do the tryxbusy/unbusy dance. Can you just check that the page is not xbusy above?

But what prevents unlocked lookup from busying the page and then the caller can do something incompatible with the collapse?

sys/vm/vm_object.c
1760

Can you explain "busy of p disallows fault handler to validate pp" in the case when pi < p->pindex?

Add an assertion.

Don't do any unbusying of a backing page until the parent page has been checked out, in case that unbusying has some bad effect that I don't really understand. I assume that parent pages without backing pages (but backed by swap blocks) work okay now, and won't be affected by this.

sys/vm/swap_pager.c
2568

I tried to take the tryxbusy/unbusy code that was already there and just avoid having the unbusy appear in so many places. I don't know why its there, which is why I'm not intentionally changing it.

sys/vm/vm_object.c
1760

Immediate case is the failed CoW. For it to happen, p must be valid, and pp was allocated by the fault handler, initially invalid. Then both p and pp are xbusy, the objects are not locked, and the fault handler copies content of p to pp.

See vm_fault_cow(), the 'oh well lets copy' part.

Rewrote some parts with perhaps a better understanding of what the busy/unbusy parts are intended to do.

I cannot boot with D47150.145065.patch:

:
pci8: <ACPI PCI bus> on pcib8
igb0: <Intel(R) I350 (Copper)> port 0xd020-0xd03f mem 0xfb320000-0xfb33ffff,0xfb344000-0xfb347fff irq 16 at device 0.0 on pci8
igb0: EEPROM V1.63-0 eTrack 0x800009fa
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: queue equality override not set, capping rx_queues at 6 and tx_queues at 6
igb0: Usinollapse+0x607/frame 0xfffffe0108438cb0
vmspace_fork() at vmspace_fork+0xb67/frame 0xfffffe0108438d30
fork1() at fork1+0x4f8/frame 0xfffffe0108438da0
sys_fork() at sys_fork+0x54/frame 0xfffffe0108438e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe0108438f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0108438f30
--- syscall (2, FreeBSD ELF64, fork), rip = 0x10a3cebd165a, rsp = 0x10a3cad3ba08, rbp = 0x10a3cad3ba40 ---
KDB: enter: panic
[ thread pid 38 tid 100213 ]
Stopped at      kdb_enter+0x33: movq    $0,0x1052702(%rip)
db> show registers
cs                        0x20
ds                        0x3b
es                        0x3b
fs                        0x13
gs                        0x1b
ss                        0x28
rax                       0x12
rcx         0x4c0cb7c143e92607
rdx         0xffffffff811ece61
rbx                      0x100
rsp         0xfffffe0108438868
rbp         0xfffffe0108438990
rsi         0xfffffe0108438720
rdi         0xffffffff81b9d3c0  cnputs_mtx
r8                        0x10
r9                        0x10
r10                        0xf
r11                       0x10
r12                          0
r13         0xfffff8000afc3e88
r14         0xffffffff811b0c1c
r15         0xfffff8000a21a740
rip         0xffffffff80ba6503  kdb_enter+0x33
rflags                    0x86
kdb_enter+0x33: movq    $0,0x1052702(%rip)
db> x/s version
version:        FreeBSD 15.0-CURRENT #1 main-n273016-fa573868f1879-dirty: Fri Oct 18 08:40:46 CEST 2024\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db> bt
Tracing pid 38 tid 100213 td 0xfffff8000a21a740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0108438990
panic() at panic+0x43/frame 0xfffffe01084389f0
swap_pager_scan_all_shadowed() at swap_pager_scan_all_shadowed+0x27f/frame 0xfffffe0108438c30
vm_object_collapse() at vm_object_collapse+0x607/frame 0xfffffe0108438cb0
vmspace_fork() at vmspace_fork+0xb67/frame 0xfffffe0108438d30
fork1() at fork1+0x4f8/frame 0xfffffe0108438da0
sys_fork() at sys_fork+0x54/frame 0xfffffe0108438e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe0108438f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0108438f30
--- syscall (2, FreeBSD ELF64, fork), rip = 0x10a3cebd165a, rsp = 0x10a3cad3ba08, rbp = 0x10a3cad3ba40 ---
db>

Oops. Forgot the panic string :)

db> show panic
panic: Shadow object is not anonymous
db> bt
Tracing pid 38 tid 100213 td 0xfffff80014e12740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0108438990
panic() at panic+0x43/frame 0xfffffe01084389f0
swap_pager_scan_all_shadowed() at swap_pager_scan_all_shadowed+0x27f/frame 0xfffffe0108438c30
vm_object_collapse() at vm_object_collapse+0x607/frame 0xfffffe0108438cb0
vmspace_fork() at vmspace_fork+0xb67/frame 0xfffffe0108438d30
fork1() at fork1+0x4f8/frame 0xfffffe0108438da0
sys_fork() at sys_fork+0x54/frame 0xfffffe0108438e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe0108438f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0108438f30
--- syscall (2, FreeBSD ELF64, fork), rip = 0x6fb515f865a, rsp = 0x6fb4e3982d8, rbp = 0x6fb4e398310 ---
db>

Remove suspicious assertion.

Remove unrelated change included by accident.

Remove suspicious assertion.

What was the assert? I can see only one case where it might be not true, when vm_map_entry_delete() calls collapse for kernel object. But this is definitely not the case for the Peter' backtrace.

In D47150#1076221, @kib wrote:

Remove suspicious assertion.

What was the assert? I can see only one case where it might be not true, when vm_map_entry_delete() calls collapse for kernel object. But this is definitely not the case for the Peter' backtrace.

KASSERT((object->flags & OBJ_ANON) == 0,
    ("Shadow object is not anonymous"));

I guess it was supposed to be != instead.

Restore requested KASSERT.

This revision is now accepted and ready to land.Oct 19 2024, 11:22 PM

I'm running tests with both D47150.145188.patch and D47200.145189.patch. I'll be running a full set of stress2 tests, which takes about 60 hours.
No problems seen in the first 8 hours.

markj added inline comments.
sys/vm/swap_pager.c
2577

Here, we know that this turns into a call to swap_pager_haspage(), so the loop might benefit from maintaining an iterator for the swap blocks of the shadow object as well.

sys/vm/swap_pager.c
2577

I'll rename blks to backing_blks, and replace vm_pager_has_page with swap_pager_haspage, in anticipation of a subsequent change based on this suggestion, but I think it should be a separate change from this one to modify swap_pager_haspage and swp_pager_meta_lookup.

sys/vm/swap_pager.c
2577

Certainly, I'd also prefer not to make this patch more complex than it already is.

In D47150#1076296, @pho wrote:

I'm running tests with both D47150.145188.patch and D47200.145189.patch. I'll be running a full set of stress2 tests, which takes about 60 hours.
No problems seen in the first 8 hours.

I completed a full stress2 test without seeing any issues.