D20256: Simplify vm_reserv_break
Authored by dougm on May 13 2019, 9:36 PM.

Reduce the code size and the number of ffsl calls in vm_reserv_break. Use xor to find where free ranges begin and end.

Tested by: pho

alc developed a performance test framework for this patch, measuring the cycle count for a vm_reserv_break operation. For each distribution of free blocks in a reservation, I took 10 samples with the current vm_reserv_break implementation and 10 samples with the implementation offered by this patch. The first column is from the unmodified implementation.

alternating 15 pages allocated, 15 pages free:
alternating 1 page allocated, 1 page free:
alternating 170 pages allocated, 170 pages free:
Pages 0 and 511 allocated, the rest free:

The more fragmented the pages in the reservation, the better the modified code seems to perform. With big ranges of freed or allocated pages, there's no clear benefit in cycle count.
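As a rough illustration of the xor trick (a minimal stand-alone sketch, not the patch itself; the popmap encoding, the function name, and NBPOPMAP are assumptions here), xor-ing a population-map word with a shifted copy of itself yields a mask whose set bits mark exactly the boundaries between runs of free and allocated pages, so ffsl is called once per run boundary rather than once per free page:

#include <stdio.h>
#include <strings.h>    /* ffsl() */

#define NBPOPMAP        (8 * (int)sizeof(unsigned long))   /* bits per map word */

/*
 * Bit j of "m" is 1 iff page j is allocated.  In m ^ ((m << 1) | 1),
 * bit j is set iff page j begins a new run of free or allocated
 * pages; or-ing in 1 treats the nonexistent bit -1 as allocated, so
 * a leading free run also shows up as a boundary.
 */
static void
print_free_ranges(unsigned long m)
{
        unsigned long changes;
        int start, end;

        changes = m ^ ((m << 1) | 1);
        while (changes != 0) {
                start = ffsl(changes) - 1;      /* next run boundary */
                changes &= changes - 1;         /* clear that boundary bit */
                if ((m >> start & 1) != 0)
                        continue;               /* an allocated run begins */
                /* The free run ends at the next boundary, if any. */
                end = changes != 0 ? ffsl(changes) - 1 : NBPOPMAP;
                printf("free: pages %d-%d\n", start, end - 1);
        }
}

int
main(void)
{
        /* Pages 0 and 2-4 allocated; pages 1 and 5-63 free. */
        print_free_ranges(0x1dUL);
        return (0);
}

On the example map this prints the two maximal free ranges, pages 1-1 and 5-63.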
Modify vm_reserv_reclaim_contig to also use ffsl less, and to use xor to find 0-1 transitions in the bitstream.

Sure, happy to. Will I get credit for doing so?
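Here is a similarly hedged sketch of the multi-word scan this exchange alludes to (NPOPMAP, NBPOPMAP, and scan_transitions are assumed names, and a 64-bit long is assumed): carrying the high bit of the previous popmap word into the transition mask of the next word lets the scan find 0-1 and 1-0 transitions across word boundaries without reporting spurious ones:

#include <stdio.h>
#include <strings.h>    /* ffsl() */

#define NBPOPMAP        (8 * (int)sizeof(unsigned long))
#define NPOPMAP         8       /* e.g. 512 pages, one 2MB reservation of 4KB pages */

/*
 * Report every boundary between runs of free and allocated pages in
 * a multi-word population map.  The high bit of word i - 1 supplies
 * the "previous bit" for bit 0 of word i, so a run that spans a word
 * boundary does not produce a spurious transition.
 */
static void
scan_transitions(const unsigned long popmap[NPOPMAP])
{
        unsigned long carry, changes;
        int i, pos;

        carry = 1;              /* treat bit -1 as allocated */
        for (i = 0; i < NPOPMAP; i++) {
                changes = popmap[i] ^ ((popmap[i] << 1) | carry);
                carry = popmap[i] >> (NBPOPMAP - 1);
                while (changes != 0) {
                        pos = ffsl(changes) - 1;
                        changes &= changes - 1;
                        printf("%s run begins at page %d\n",
                            (popmap[i] >> pos & 1) != 0 ?
                            "allocated" : "free",
                            i * NBPOPMAP + pos);
                }
        }
}

int
main(void)
{
        unsigned long popmap[NPOPMAP] = { 0 };

        /* Allocate pages 60-70, a run crossing the first word boundary. */
        popmap[0] = ~0UL << 60;
        popmap[1] = (1UL << 7) - 1;
        scan_transitions(popmap);
        return (0);
}

With pages 60-70 allocated, the run crossing from popmap[0] into popmap[1] yields exactly three boundaries: a free run at page 0, an allocated run at page 60, and a free run at page 71.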
I regret failing to acknowledge you in some, or all, recent commits. If you want me to make some sort of public statement about it, I will. I'll try to do better.

:) With D20256.57387.diff I got:

20190514 07:51:36 all (1/2): su.sh
witness_lock_list_get: witness exhausted
panic: vm_page_dequeue: queued unlocked page 0xfffff81019657208
cpuid = 15
time = 1557813179
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00dc6ae610
vpanic() at vpanic+0x19d/frame 0xfffffe00dc6ae660
panic() at panic+0x43/frame 0xfffffe00dc6ae6c0
vm_page_dequeue() at vm_page_dequeue+0x2e2/frame 0xfffffe00dc6ae700
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x29b/frame 0xfffffe00dc6ae770
vm_page_alloc() at vm_page_alloc+0x74/frame 0xfffffe00dc6ae7d0
vm_fault_hold() at vm_fault_hold+0x12d1/frame 0xfffffe00dc6ae910
vm_fault() at vm_fault+0x60/frame 0xfffffe00dc6ae950
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00dc6ae9a0
trap() at trap+0x46b/frame 0xfffffe00dc6aeab0
calltrap() at calltrap+0x8/frame 0xfffffe00dc6aeab0
--- trap 0xc, rip = 0x80020d3f6, rsp = 0x7fffffffc650, rbp = 0x7fffffffc6b0 ---

Just stick to the easy part of the patch, which I don't expect to change, to get something under review. The harder half can be done separately.
I'm a bit low on test H/W, so I ran D20256.57402.diff on an i386 test host and got this:

20190515 03:16:01 all (113/620): sendfile17.sh
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0xdeadc0ee
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x1089740
stack pointer = 0x28:0x21216a3c
frame pointer = 0x28:0x21216a50
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 65852 (sendfile17)
trap number = 12
panic: page fault
cpuid = 3
time = 1557883251
KDB: stack backtrace:
db_trace_self_wrapper(e8ac4d,1c11188,0,2121684c,be79c1,...) at db_trace_self_wrapper+0x2a/frame 0x21216820
kdb_backtrace(7,3,3,212169fc,212169fc,...) at kdb_backtrace+0x2e/frame 0x21216880
vpanic(15bbd75,212168c4,212168c4,212168f8,1558676,...) at vpanic+0x121/frame 0x212168a4
panic(15bbd75,1649719,2db5000,0,deadc0ee,...) at panic+0x14/frame 0x212168b8
trap_fatal(13029a5,0,4,16480b0,332,...) at trap_fatal+0x356/frame 0x212168f8
trap_pfault(deadc0ee) at trap_pfault+0x51/frame 0x21216934
trap(212169fc,8,28,28,24582d80,...) at trap+0x3c0/frame 0x212169f0
calltrap() at 0xffc0316d/frame 0x212169f0
--- trap 0xc, eip = 0x1089740, esp = 0x21216a3c, ebp = 0x21216a50 ---
unp_dispose(21995618) at unp_dispose+0x80/frame 0x21216a50
sofree(21995618) at sofree+0x293/frame 0x21216a7c
soclose(21995618) at soclose+0x2f7/frame 0x21216ab4
soo_close(189eddc8,99d8a80) at soo_close+0x1f/frame 0x21216ac0
_fdrop(189eddc8,99d8a80) at _fdrop+0x18/frame 0x21216ad8
closef(189eddc8,99d8a80,189eddc8,99d8a80,21cb56f0,...) at closef+0x1e0/frame 0x21216b2c
fdescfree_fds(1) at fdescfree_fds+0x98/frame 0x21216b54
fdescfree(99d8a80) at fdescfree+0x389/frame 0x21216bc4
exit1(99d8a80,0,0,21216cdc,1558d79,...) at exit1+0x47a/frame 0x21216bf4
sys_sys_exit(99d8a80,99d8d08) at sys_sys_exit+0x12/frame 0x21216c08
syscall(21216ce8,3b,3b,3b,ffbfe808,...) at syscall+0x2d9/frame 0x21216cdc

https://people.freebsd.org/~pho/stress/log/dougm029.txt

I have not been able to reproduce the page fault with or without your patch.
I ran into another problem, so now I'm switching from i386 to amd64.

20190515 21:24:35 all (167/620): snap5.sh
panic: Memory modified after free 0x2879d800(2048) val=0 @ 0x2879d9b4
cpuid = 3
time = 1557949826
KDB: stack backtrace:
db_trace_self_wrapper(e8ac4d,1c11188,0,93ff92c,be79c1,...) at db_trace_self_wrapper+0x2a/frame 0x93ff900
kdb_backtrace(7,3,3,94b05e0,2879d9b4,...) at kdb_backtrace+0x2e/frame 0x93ff960
vpanic(16a8223,93ff9a4,93ff9a4,93ff9bc,12e1444,...) at vpanic+0x121/frame 0x93ff984
panic(16a8223,2879d800,800,0,2879d9b4,...) at panic+0x14/frame 0x93ff998
trash_ctor(2879d800,800,93ffa48,2) at trash_ctor+0x44/frame 0x93ff9bc
mb_ctor_pack(2a70de00,100,93ffa48,2) at mb_ctor_pack+0x35/frame 0x93ff9e8
uma_zalloc_arg(99997e0,93ffa48,2) at uma_zalloc_arg+0xa0b/frame 0x93ffa34
m_getm2(0,21c,2,1,0) at m_getm2+0xf8/frame 0x93ffa74
m_uiotombuf(93ffbd8,2,8148,0,0) at m_uiotombuf+0x5a/frame 0x93ffaa4
sosend_generic(21905a28,0,93ffbd8,0,0,0,188b2700) at sosend_generic+0x2d7/frame 0x93ffafc
sosend(21905a28,0,93ffbd8,0,0,0,188b2700) at sosend+0x50/frame 0x93ffb2c
soo_write(21040700,93ffbd8,99a4500,0,188b2700) at soo_write+0x32/frame 0x93ffb5c
dofilewrite(21040700,93ffbd8,ffffffff,ffffffff,0) at dofilewrite+0x86/frame 0x93ffb90
kern_writev(188b2700,3,93ffbd8) at kern_writev+0x3b/frame 0x93ffbbc
sys_write(188b2700,188b2988) at sys_write+0x48/frame 0x93ffc08
syscall(93ffce8,3b,3b,3b,20bf7000,...) at syscall+0x2d9/frame 0x93ffcdc

Peter, is there any data on how many reservations get broken in these tests? Without breaking reservations, this code isn't tested.

I ran tests for 18 hours, including a buildworld.

I'm somewhat surprised that the value is that large. Peter, if you run a buildworld on a freshly rebooted amd64 machine, what are the reported values for vm.pmap.pde and vm.reserv before and after the buildworld?

Here's a buildworld from a single user mode boot:

# /usr/bin/time -h ./zzbuildworld.sh
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r347793: Thu May 16 19:17:57 CEST 2019 pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO amd64
vm.pmap.pde.promotions: 302
vm.pmap.pde.p_failures: 111
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 27
vm.reserv.reclaimed: 0
vm.reserv.partpopq:
DOMAIN LEVEL SIZE NUMBER
0, -1, 40944K, 22
1, -1, 4588K, 3
vm.reserv.fullpop: 135
vm.reserv.freed: 650
vm.reserv.broken: 0
make -j25 buildworld
vm.pmap.pde.promotions: 98586
vm.pmap.pde.p_failures: 5940
vm.pmap.pde.mappings: 146743
vm.pmap.pde.demotions: 18127
vm.reserv.reclaimed: 0
vm.reserv.partpopq:
DOMAIN LEVEL SIZE NUMBER
0, -1, 90936K, 74
1, -1, 17112K, 11
vm.reserv.fullpop: 143
vm.reserv.freed: 1125509
vm.reserv.broken: 0
Elapsed 01:00
1h20.91s real 17h14m11.64s user 1h15m2.16s sys
#

Given that the new version is shorter (144 bytes less machine code on amd64), the performance results are good enough. If you add comments explaining how this new version works, I'm happy to see it committed.
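As an aside to the counter discussion above, those sysctl values can also be sampled programmatically before and after a run; a minimal userland sketch (assuming, as the read below does, that vm.reserv.broken is exported as a 64-bit value):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
        uint64_t broken;
        size_t len;

        /* Read the count of reservations broken since boot. */
        len = sizeof(broken);
        if (sysctlbyname("vm.reserv.broken", &broken, &len, NULL, 0) != 0) {
                perror("sysctlbyname");
                return (1);
        }
        printf("vm.reserv.broken: %ju\n", (uintmax_t)broken);
        return (0);
}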