D20256: Simplify vm_reserv_break
Authored by dougm on May 13 2019, 9:36 PM.
Details

Reduce the code size and number of ffsl calls in vm_reserv_break. Use xor to find where free ranges begin and end.

Tested by: pho

alc developed a performance test framework for this patch, measuring the cycle count for a vm_reserv_break operation. For each distribution of free blocks in a reservation, I took 10 samples with the current vm_reserv_break implementation and 10 samples from the implementation offered by this patch. The first column is from the unmodified implementation.

alternating 15 pages allocated, 15 pages free:
alternating 1 page allocated, 1 page free:
alternating 170 pages allocated, 170 pages free:
Pages 0 and 511 allocated, the rest free:

The more fragmented the pages in the reservation, the better the modified code seems to perform. With big ranges of freed or allocated pages, there is no clear benefit in cycle count.
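The review does not capture alc's test framework itself. Purely as an illustration of the kind of measurement described, 10 cycle-count samples around a single operation, a minimal userland sketch on x86 might look like this; operation_under_test is a hypothetical placeholder, not anything from the patch:

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>		/* __rdtsc() */

/* Hypothetical stand-in for the code being timed. */
static void
operation_under_test(void)
{
}

int
main(void)
{
	uint64_t start;
	int i;

	/* Take 10 samples, as in the methodology described above. */
	for (i = 0; i < 10; i++) {
		start = __rdtsc();
		operation_under_test();
		printf("sample %d: %ju cycles\n", i,
		    (uintmax_t)(__rdtsc() - start));
	}
	return (0);
}

A real harness would pin the thread to one CPU and serialize around the timestamp reads; this only shows the shape of the measurement.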
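As for the technique itself: xor a popmap word with a copy of itself shifted left by one bit, carrying in the bit that preceded the word, and the result has a 1 exactly where a run of 0s or 1s begins. One ffsl call then consumes each run boundary, where the old code needed separate scans for the next 0 bit and the next 1 bit. The following is a standalone sketch of that idea, not the committed vm_reserv.c code; report_free_ranges and the printf action are illustrative stand-ins:

#include <stdio.h>

typedef unsigned long u_long;

#define	NBPOPMAP	((int)(8 * sizeof(u_long)))	/* bits per word */
#define	NPOPMAP		2	/* 128 demo pages with 64-bit longs */

/*
 * Print every maximal range of 0 (free) bits in popmap[] as a
 * half-open page interval [lo, hi).  A real vm_reserv_break would
 * hand each range to the physical allocator instead of printing.
 */
static void
report_free_ranges(u_long popmap[NPOPMAP])
{
	u_long changes;
	int bitpos, i, lo;

	lo = -1;	/* page where the open free range began, or -1 */
	for (i = 0; i <= NPOPMAP; i++) {
		if (i == NPOPMAP) {
			/* A trailing virtual word closes any open range. */
			changes = (lo != -1);
		} else {
			/*
			 * Set a bit wherever popmap[i] differs from the
			 * preceding bit.  The carried-in bit is 0 only
			 * while inside a free range; the region before
			 * page 0 counts as populated, so a free range
			 * starting at page 0 still yields a transition.
			 */
			changes = popmap[i];
			changes ^= (changes << 1) | (u_long)(lo == -1);
		}
		while (changes != 0) {
			/* One ffsl(3)-style scan per run boundary. */
			bitpos = __builtin_ffsl((long)changes) - 1;
			changes ^= 1UL << bitpos;
			if (lo == -1)	/* 1 -> 0: a free range opens. */
				lo = NBPOPMAP * i + bitpos;
			else {		/* 0 -> 1: the free range closes. */
				printf("free range [%d, %d)\n",
				    lo, NBPOPMAP * i + bitpos);
				lo = -1;
			}
		}
	}
}

int
main(void)
{
	/* Pages 0 and 127 populated, everything in between free. */
	u_long popmap[NPOPMAP] = { 1UL, 1UL << (NBPOPMAP - 1) };

	report_free_ranges(popmap);	/* prints "free range [1, 127)" */
	return (0);
}

This shape also matches the benchmark results above: a fragmented popmap produces many transitions, each costing one bit scan here instead of a pair, while a popmap with only a couple of long runs produces almost no transitions either way.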
Event Timeline

Modify vm_reserv_reclaim_contig to also use ffsl less, and to use xor to find 0-1 transitions in the bitstream.

Sure, happy to. Will I get credit for doing so?
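Regarding the first comment above: the 0-to-1 transitions it mentions can be isolated with the same shift-and-xor idea. A small hedged sketch, with a hypothetical name and not taken from the eventual vm_reserv_reclaim_contig change:

/*
 * Given one word of the population bitmap and the bit value that
 * preceded it (0 or 1), return a mask with a 1 exactly where a 0 bit
 * is immediately followed by a 1 bit, i.e. where a populated run
 * begins.
 */
static inline unsigned long
transitions_0_to_1(unsigned long x, unsigned long carry_in)
{
	/* Bit p of (x << 1) | carry_in is the bit preceding bit p. */
	return (x & ~((x << 1) | carry_in));
}

Each set bit in the result can then be consumed with a single ffsl call, the same pattern as in the vm_reserv_break rewrite.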
I regret failing to acknowledge you in some, or all, recent commits. If you want me to make some sort of public statement about it, I will. I'll try to do better.

:) With D20256.57387.diff I got:

20190514 07:51:36 all (1/2): su.sh
witness_lock_list_get: witness exhausted
panic: vm_page_dequeue: queued unlocked page 0xfffff81019657208
cpuid = 15
time = 1557813179
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00dc6ae610
vpanic() at vpanic+0x19d/frame 0xfffffe00dc6ae660
panic() at panic+0x43/frame 0xfffffe00dc6ae6c0
vm_page_dequeue() at vm_page_dequeue+0x2e2/frame 0xfffffe00dc6ae700
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x29b/frame 0xfffffe00dc6ae770
vm_page_alloc() at vm_page_alloc+0x74/frame 0xfffffe00dc6ae7d0
vm_fault_hold() at vm_fault_hold+0x12d1/frame 0xfffffe00dc6ae910
vm_fault() at vm_fault+0x60/frame 0xfffffe00dc6ae950
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00dc6ae9a0
trap() at trap+0x46b/frame 0xfffffe00dc6aeab0
calltrap() at calltrap+0x8/frame 0xfffffe00dc6aeab0
--- trap 0xc, rip = 0x80020d3f6, rsp = 0x7fffffffc650, rbp = 0x7fffffffc6b0 ---

Just stick to the easy part of the patch, which I don't expect to change, to get something under review. The harder half can be done separately.
I'm a bit low on test H/W, so I ran D20256.57402.diff on an i386 test host and got this:

20190515 03:16:01 all (113/620): sendfile17.sh
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0xdeadc0ee
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x1089740
stack pointer = 0x28:0x21216a3c
frame pointer = 0x28:0x21216a50
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 65852 (sendfile17)
trap number = 12
panic: page fault
cpuid = 3
time = 1557883251
KDB: stack backtrace:
db_trace_self_wrapper(e8ac4d,1c11188,0,2121684c,be79c1,...) at db_trace_self_wrapper+0x2a/frame 0x21216820
kdb_backtrace(7,3,3,212169fc,212169fc,...) at kdb_backtrace+0x2e/frame 0x21216880
vpanic(15bbd75,212168c4,212168c4,212168f8,1558676,...) at vpanic+0x121/frame 0x212168a4
panic(15bbd75,1649719,2db5000,0,deadc0ee,...) at panic+0x14/frame 0x212168b8
trap_fatal(13029a5,0,4,16480b0,332,...) at trap_fatal+0x356/frame 0x212168f8
trap_pfault(deadc0ee) at trap_pfault+0x51/frame 0x21216934
trap(212169fc,8,28,28,24582d80,...) at trap+0x3c0/frame 0x212169f0
calltrap() at 0xffc0316d/frame 0x212169f0
--- trap 0xc, eip = 0x1089740, esp = 0x21216a3c, ebp = 0x21216a50 ---
unp_dispose(21995618) at unp_dispose+0x80/frame 0x21216a50
sofree(21995618) at sofree+0x293/frame 0x21216a7c
soclose(21995618) at soclose+0x2f7/frame 0x21216ab4
soo_close(189eddc8,99d8a80) at soo_close+0x1f/frame 0x21216ac0
_fdrop(189eddc8,99d8a80) at _fdrop+0x18/frame 0x21216ad8
closef(189eddc8,99d8a80,189eddc8,99d8a80,21cb56f0,...) at closef+0x1e0/frame 0x21216b2c
fdescfree_fds(1) at fdescfree_fds+0x98/frame 0x21216b54
fdescfree(99d8a80) at fdescfree+0x389/frame 0x21216bc4
exit1(99d8a80,0,0,21216cdc,1558d79,...) at exit1+0x47a/frame 0x21216bf4
sys_sys_exit(99d8a80,99d8d08) at sys_sys_exit+0x12/frame 0x21216c08
syscall(21216ce8,3b,3b,3b,ffbfe808,...) at syscall+0x2d9/frame 0x21216cdc

https://people.freebsd.org/~pho/stress/log/dougm029.txt

I have not been able to reproduce the page fault with or without your patch.
I ran into another problem, so now I'm switching from i386 to amd64.

20190515 21:24:35 all (167/620): snap5.sh
panic: Memory modified after free 0x2879d800(2048) val=0 @ 0x2879d9b4
cpuid = 3
time = 1557949826
KDB: stack backtrace:
db_trace_self_wrapper(e8ac4d,1c11188,0,93ff92c,be79c1,...) at db_trace_self_wrapper+0x2a/frame 0x93ff900
kdb_backtrace(7,3,3,94b05e0,2879d9b4,...) at kdb_backtrace+0x2e/frame 0x93ff960
vpanic(16a8223,93ff9a4,93ff9a4,93ff9bc,12e1444,...) at vpanic+0x121/frame 0x93ff984
panic(16a8223,2879d800,800,0,2879d9b4,...) at panic+0x14/frame 0x93ff998
trash_ctor(2879d800,800,93ffa48,2) at trash_ctor+0x44/frame 0x93ff9bc
mb_ctor_pack(2a70de00,100,93ffa48,2) at mb_ctor_pack+0x35/frame 0x93ff9e8
uma_zalloc_arg(99997e0,93ffa48,2) at uma_zalloc_arg+0xa0b/frame 0x93ffa34
m_getm2(0,21c,2,1,0) at m_getm2+0xf8/frame 0x93ffa74
m_uiotombuf(93ffbd8,2,8148,0,0) at m_uiotombuf+0x5a/frame 0x93ffaa4
sosend_generic(21905a28,0,93ffbd8,0,0,0,188b2700) at sosend_generic+0x2d7/frame 0x93ffafc
sosend(21905a28,0,93ffbd8,0,0,0,188b2700) at sosend+0x50/frame 0x93ffb2c
soo_write(21040700,93ffbd8,99a4500,0,188b2700) at soo_write+0x32/frame 0x93ffb5c
dofilewrite(21040700,93ffbd8,ffffffff,ffffffff,0) at dofilewrite+0x86/frame 0x93ffb90
kern_writev(188b2700,3,93ffbd8) at kern_writev+0x3b/frame 0x93ffbbc
sys_write(188b2700,188b2988) at sys_write+0x48/frame 0x93ffc08
syscall(93ffce8,3b,3b,3b,20bf7000,...) at syscall+0x2d9/frame 0x93ffcdc

Peter, is there any data on how many reservations get broken in these tests? Without breaking reservations, this code isn't tested.

I ran tests for 18 hours, including a buildworld.

I'm somewhat surprised that the value is that large. Peter, if you run a buildworld on a freshly rebooted amd64 machine, what are the reported values for vm.pmap.pde and vm.reserv before and after the buildworld?

Here's a buildworld from a single-user mode boot:

# /usr/bin/time -h ./zzbuildworld.sh
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r347793: Thu May 16 19:17:57 CEST 2019 pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO amd64

vm.pmap.pde.promotions: 302
vm.pmap.pde.p_failures: 111
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 27
vm.reserv.reclaimed: 0
vm.reserv.partpopq:
DOMAIN    LEVEL     SIZE  NUMBER
     0,      -1,  40944K,     22
     1,      -1,   4588K,      3
vm.reserv.fullpop: 135
vm.reserv.freed: 650
vm.reserv.broken: 0

make -j25 buildworld

vm.pmap.pde.promotions: 98586
vm.pmap.pde.p_failures: 5940
vm.pmap.pde.mappings: 146743
vm.pmap.pde.demotions: 18127
vm.reserv.reclaimed: 0
vm.reserv.partpopq:
DOMAIN    LEVEL     SIZE  NUMBER
     0,      -1,  90936K,     74
     1,      -1,  17112K,     11
vm.reserv.fullpop: 143
vm.reserv.freed: 1125509
vm.reserv.broken: 0

Elapsed 01:00
1h20.91s real  17h14m11.64s user  1h15m2.16s sys
#

Given that the new version is shorter (144 bytes less machine code on amd64), the performance results are good enough. If you add comments explaining how this new version works, I'm happy to see it committed.