As far as I can tell, the "panic: freeing free block" was introduced by r349777, and this patch did not fix it.
I have not observed any other problems while testing with D20893.59572.diff.
Jul 10 2019
I have tested D20893.59572.diff on amd64 and briefly on i386.
I still see the "panic: freeing free block" on both a MEMGUARD and a GENERICish build.
Jul 9 2019
Looks like this patch is responsible for the panics I see.
20190709 11:56:55 all (1/1): sort.sh
swap_pager: out of swap space
swp_pager_getswapspace(32): failed
Jul 9 12:03:41 mercat1 kernel: pid 3581 (sort), jid 0, uid 0, was killed: out of swap space
Jul 9 12:03:42 mercat1 kernel: pid 3583 (sort), jid 0, uid 0, was killed: out of swap space
panic: freeing free block: ffffc0, size 16, mask 1
cpuid = 7
time = 1562666624
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe102d95b270
vpanic() at vpanic+0x19d/frame 0xfffffe102d95b2c0
panic() at panic+0x43/frame 0xfffffe102d95b320
blst_meta_free() at blst_meta_free+0x12b/frame 0xfffffe102d95b360
blst_meta_free() at blst_meta_free+0x102/frame 0xfffffe102d95b3a0
blst_meta_free() at blst_meta_free+0x102/frame 0xfffffe102d95b3e0
blst_meta_free() at blst_meta_free+0x102/frame 0xfffffe102d95b420
blst_meta_free() at blst_meta_free+0x102/frame 0xfffffe102d95b460
blist_free() at blist_free+0x2e/frame 0xfffffe102d95b480
swp_pager_freeswapspace() at swp_pager_freeswapspace+0x8a/frame 0xfffffe102d95b4a0
swp_pager_meta_free_all() at swp_pager_meta_free_all+0xbb/frame 0xfffffe102d95b4f0
swap_pager_dealloc() at swap_pager_dealloc+0x115/frame 0xfffffe102d95b510
vm_object_terminate() at vm_object_terminate+0x27b/frame 0xfffffe102d95b560
vm_object_deallocate() at vm_object_deallocate+0x412/frame 0xfffffe102d95b5c0
vm_map_process_deferred() at vm_map_process_deferred+0x7f/frame 0xfffffe102d95b5e0
vm_map_remove() at vm_map_remove+0xc6/frame 0xfffffe102d95b610
vmspace_exit() at vmspace_exit+0xd3/frame 0xfffffe102d95b650
exit1() at exit1+0x5ad/frame 0xfffffe102d95b6c0
sigexit() at sigexit+0xdaf/frame 0xfffffe102d95b9a0
postsig() at postsig+0x336/frame 0xfffffe102d95ba70
ast() at ast+0x4c7/frame 0xfffffe102d95bab0
doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffded0
KDB: enter: panic
[ thread pid 3581 tid 100238 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> x/s version
version: FreeBSD 13.0-CURRENT #5 r349777: Tue Jul 9 11:37:16 CEST 2019\012 pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/MEMGUARD\012
db>
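For readers unfamiliar with the message: "freeing free block" is a consistency check that fires when a range handed to the allocator's free routine is already marked free; in the trace above it is raised under blist_free(). The following is a minimal sketch of that kind of check on a toy one-word bitmap. It is not the code in subr_blist.c: toy_leaf, range_mask() and toy_free() are hypothetical names, the "set bit means free" convention is an assumption of the sketch, and the toy returns false where the kernel would panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy allocator: one 64-bit word, a set bit means "free". */
struct toy_leaf {
	uint64_t free_bits;
};

/* Build a mask covering `count` blocks starting at bit `start`. */
static uint64_t
range_mask(unsigned start, unsigned count)
{
	uint64_t ones;

	assert(count >= 1 && start < 64 && count <= 64 - start);
	ones = (count == 64) ? UINT64_MAX : ((UINT64_C(1) << count) - 1);
	return (ones << start);
}

/*
 * Mark a range free.  Returns false, leaving the bitmap untouched, if any
 * block in the range is already free; this is the condition a kernel
 * allocator would report as "freeing free block".
 */
static bool
toy_free(struct toy_leaf *leaf, unsigned start, unsigned count)
{
	uint64_t mask = range_mask(start, count);

	if ((leaf->free_bits & mask) != 0)
		return (false);
	leaf->free_bits |= mask;
	return (true);
}
```

Freeing blocks 0..15 and then freeing 8..23 trips the check in this toy, because blocks 8..15 are released twice. A double release of swap blocks during object teardown would look the same to the allocator, whatever its actual cause.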
The same test works fine on r349776.
In D20772#452692, @kib wrote: Thank you, Mark.
Peter, could you, please, run the tests one more time, hopefully this is the last.
I'll try again and avoid a panic, but in the meantime here's some trace: https://people.freebsd.org/~pho/stress/log/dougm047.txt
Jul 8 2019
So, this is unrelated to D20833.
In D20833#452429, @dougm wrote: The obvious questions are:
Does this happen without the patch in place?
Does this happen before r349777?
Came across this panic:
Jul 7 2019
I ran tests for 5 hours on i386 with D20833.59496.diff. No problems seen.
I have started a brief test run on i386 with D20833.59496.diff.
Jul 6 2019
I tested D20833.59460.diff for 5 hours without seeing any problems.
Jul 5 2019
I have run tests on D20579.59404.diff (with the typo fixed) for 8 hours. This included a buildworld / installworld.
cc -c -O2 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.subr_blist.o -MTsubr_blist.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -Wno-error-pointer-sign -Wno-error-shift-negative-value -Wno-address-of-packed-member -mno-aes -mno-avx -std=iso9899:1999 -Werror ../../../kern/subr_blist.c
../../../kern/subr_blist.c:659:17: error: use of undeclared identifier 'BLIST_MAP_RADIX'
    if (maxcount % BLIST_MAP_RADIX != 0)
                   ^
1 error generated.
*** Error code 1
Ran a brief test (5 hours) on D20833.59430.diff. No problems seen.
I have run tests on D20635.59379.diff for 24 hours without seeing any problems.
Jul 4 2019
In D20635#451659, @alc wrote: Peter, can you please retest this patch? I think that all of the open issues have been addressed.
Jul 3 2019
I ran tests on D20833.59317.diff for 14 hours. LGTM.
Jul 2 2019
Yes, fetch doesn't work well. Try using wget, which recovers from closed connections.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; current process = 4000 (tmlock)
interrupt enabled, fault virtual address = 0x20
stack pointer = 0x28:0xfffffe00ad822860
resume, IOPL = 0
current process = 4002 (tmlock)
code segment = base 0x0, limit 0xfffff, type 0x1b
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
trap number = 12
panic: page fault
cpuid = 11
time = 1562057287
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ac311520
vpanic() at vpanic+0x19d/frame 0xfffffe00ac311570
panic() at panic+0x43/frame 0xfffffe00ac3115d0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00ac311630
trap_pfault() at trap_pfault+0x62/frame 0xfffffe00ac311680
trap() at trap+0x2b4/frame 0xfffffe00ac311790
calltrap() at calltrap+0x8/frame 0xfffffe00ac311790
--- trap 0xc, rip = 0xffffffff80f0db15, rsp = 0xfffffe00ac311860, rbp = 0xfffffe00ac311910 ---
vm_map_wire_locked() at vm_map_wire_locked+0x155/frame 0xfffffe00ac311910
vm_map_wire() at vm_map_wire+0x40/frame 0xfffffe00ac311940
kern_mlock() at kern_mlock+0x179/frame 0xfffffe00ac311990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00ac311ab0
I have completed the selected swap tests and continued to run random tests for a total of 13H30.
No problems seen.
Jul 1 2019
This version got me past the test that kept triggering problems. I'll run some more tests ...
20190701 18:36:41 all (5/10): swap5.sh
panic: swp_pager_force_pagein: Too many pages: 32
cpuid = 8
time = 1561999005
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00adc594c0
vpanic() at vpanic+0x19d/frame 0xfffffe00adc59510
panic() at panic+0x43/frame 0xfffffe00adc59570
swp_pager_force_pagein() at swp_pager_force_pagein+0x7f/frame 0xfffffe00adc596f0
swap_pager_swapoff_object() at swap_pager_swapoff_object+0xfe/frame 0xfffffe00adc59750
swap_pager_swapoff() at swap_pager_swapoff+0xf9/frame 0xfffffe00adc597d0
swapoff_one() at swapoff_one+0x15e/frame 0xfffffe00adc59820
sys_swapoff() at sys_swapoff+0x1d6/frame 0xfffffe00adc59990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00adc59ab0
I'll try Diff 59264.
D20635.59261.diff did not seem to fix the problem.
20190701 17:39:16 all (5/10): swap5.sh
With D20635.59244.diff I now see a deadlock:
https://people.freebsd.org/~pho/stress/log/dougm044.txt
20190701 07:18:49 all (2/70): mmap10.sh
panic: _vm_map_clip_start: invalid clip of entry 0xfffff801496d34d0
cpuid = 9
time = 1561958339
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ad7307a0
vpanic() at vpanic+0x19d/frame 0xfffffe00ad7307f0
panic() at panic+0x43/frame 0xfffffe00ad730850
_vm_map_clip_start() at _vm_map_clip_start+0x81/frame 0xfffffe00ad730890
vm_map_unwire() at vm_map_unwire+0x350/frame 0xfffffe00ad730960
sys_munlockall() at sys_munlockall+0x71/frame 0xfffffe00ad730990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00ad730ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00ad730ab0
Jun 30 2019
With D20635.59216.diff I see:
20190630 11:39:48 all (5/10): swap5.sh
panic: swapoff: failed to locate 8036 swap blocks
cpuid = 2
time = 1561887597
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ad6c26a0
vpanic() at vpanic+0x19d/frame 0xfffffe00ad6c26f0
panic() at panic+0x43/frame 0xfffffe00ad6c2750
swap_pager_swapoff() at swap_pager_swapoff+0x194/frame 0xfffffe00ad6c27d0
swapoff_one() at swapoff_one+0x15e/frame 0xfffffe00ad6c2820
sys_swapoff() at sys_swapoff+0x1d6/frame 0xfffffe00ad6c2990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00ad6c2ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00ad6c2ab0
--- syscall (424, FreeBSD ELF64, sys_swapoff), rip = 0x8002f88ca, rsp = 0x7fffffffe4c8, rbp = 0x7fffffffe5f0 ---
KDB: enter: panic
[ thread pid 15529 tid 100290 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> x/s version
version: FreeBSD 13.0-CURRENT #1 r349552M: Sun Jun 30 10:48:06 CEST 2019\012 pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>
The swap5.sh test scenario is a "Test with out of swap space" test using a 4g memory disk as swap.
On i386 I ran a buildworld / installworld + random stress2 tests for a total of 10H30.
On amd64 I ran random stress2 tests for 10H30.
No problems seen.
Jun 29 2019
I saw the same problem with Diff 59201. I am building with Diff 59205 now.
I reproduced the problem and verified that this patch fixed the problems.
No other problems seen with this patch.
D20635.59149.diff seems to be looping in swapoff(8):
https://people.freebsd.org/~pho/stress/log/dougm041.txt
D20664.59090.diff completed tests on amd64 without any problems seen.
Jun 28 2019
I ran random tests for 3 hours with D20633.59145.diff.
No problems seen.
Jun 27 2019
I got this strange one while testing on i386. Not sure if it's related to your patch.
I ran all of the threaded tests I have, twice. I also did a buildworld / installworld. This was on amd64.
I'll run some i386 tests once my test box is available.
Jun 26 2019
This looks promising.
I ran tests on i386 for three hours and uptime for amd64 is 5 hours. I'll leave the amd64 tests running.
(kgdb) bt
#0 doadump (textdump=0x0) at include/pcpu.h:246
#1 0xffffffff8049c4fb in db_dump (dummy=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at ../../../ddb/db_command.c:575
#2 0xffffffff8049c2c9 in db_command (cmd_table=<value optimized out>, dopager=0x0) at ../../../ddb/db_command.c:482
#3 0xffffffff804a1248 in db_script_exec (scriptname=<value optimized out>, warnifnotfound=<value optimized out>) at ../../../ddb/db_script.c:304
#4 0xffffffff8049c2c9 in db_command (cmd_table=<value optimized out>, dopager=0x1) at ../../../ddb/db_command.c:482
#5 0xffffffff8049c044 in db_command_loop () at ../../../ddb/db_command.c:535
#6 0xffffffff8049f1ef in db_trap (type=<value optimized out>, code=<value optimized out>) at ../../../ddb/db_main.c:252
#7 0xffffffff80c1384c in kdb_trap (type=0x3, code=0x0, tf=<value optimized out>) at ../../../kern/subr_kdb.c:692
#8 0xffffffff8109ec31 in trap (frame=0xfffffe00ad617cd0) at ../../../amd64/amd64/trap.c:621
#9 0xffffffff810776d5 in calltrap () at ../../../amd64/amd64/exception.S:232
#10 0xffffffff80c12f5b in kdb_enter (why=0xffffffff81333b87 "panic", msg=<value optimized out>) at include/cpufunc.h:65
#11 0xffffffff80bca7ea in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at ../../../kern/kern_shutdown.c:894
#12 0xffffffff80bca563 in panic (fmt=<value optimized out>) at ../../../kern/kern_shutdown.c:832
#13 0xffffffff80f04b4a in vm_map_splay_split (map=0xfffff8000ae825a0, addr=0x3ad618460, length=0x20000000, out_llist=0x7fffffffffffffff, out_rlist=0xfffff8000ae825a0) at ../../../vm/vm_map.c:1085
#14 0x7fffffffffffffff in ?? ()
#15 0x0000000600000001 in ?? ()
#16 0xfffff8000ae825a0 in ?? ()
#17 0x7fffffffffffffff in ?? ()
#18 0x0000000000000000 in ?? ()
Current language: auto; currently minimal
(kgdb) f 13
#13 0xffffffff80f04b4a in vm_map_splay_split (map=0xfffff8000ae825a0, addr=0x3ad618460, length=0x20000000, out_llist=0x7fffffffffffffff, out_rlist=0xfffff8000ae825a0) at ../../../vm/vm_map.c:1085
1085 SPLAY_LEFT_STEP(root, y, rlist,
(kgdb) l
1080 root = map->root;
1081 while (root != NULL && root->max_free >= length) {
1082     KASSERT(llist->end <= root->start && root->end <= rlist->start,
1083         ("%s: root not within tree bounds", __func__));
1084     if (addr < root->start) {
1085         SPLAY_LEFT_STEP(root, y, rlist,
1086             y->max_free >= length && addr < y->start);
1087     } else if (addr >= root->end) {
1088         SPLAY_RIGHT_STEP(root, y, llist,
1089             y->max_free >= length && addr >= y->end);
(kgdb) info loc
max_free = 0xfffff8000aee1002
llist = 0xaad617f00
rlist = 0x7fffffffffffffff
root = 0xfffffe00ad618460
y = 0x0
(kgdb) p *map
$1 = {header = {prev = 0xffffffff81eaa140, next = 0xfffff8000aee1000, left = 0x0, right = 0xfffff8000aee1010, start = 0x0, end = 0xffffffff81eaa670, next_read = 0x0, max_free = 0x0, object = {vm_object = 0x0, sub_map = 0x0}, offset = 0x0, eflags = 0x0, protection = 0x0, max_protection = 0x0, inheritance = 0x0, read_ahead = 0x0, wired_count = 0x63fc98, cred = 0xfffff80003683f00, wiring_thread = 0xffffffff81e889a8}, lock = {lock_object = {lo_name = 0x0, lo_flags = 0x0, lo_data = 0x0, lo_witness = 0xfffff80006118880}, sx_lock = 0xfffff8000611b480}, system_mtx = {lock_object = {lo_name = 0xfffff8000a9ff758 "", lo_flags = 0x6073700, lo_data = 0xfffff800, lo_witness = 0x18793}, mtx_lock = 0x0}, nentries = 0x0, size = 0x0, timestamp = 0x0, needs_wakeup = 0x0, system_map = 0x0, flags = 0x0, root = 0x0, pmap = 0x0, anon_loc = 0x0, busy = 0xae82678}
(kgdb)
i386 booted with this patch.
On amd64 I got:
20190626 08:28:23 all (220/636): beneath.sh
Fatal double fault
rip 0xffffffff80ef9584 rsp 0xfffffe00a7c00f80 rbp 0xfffffe00a7c010b0
rax 0xfffff80397d07000 rdx 0x1 rbx 0
rcx 0 rsi 0xfffff803e0c70000 rdi 0
r8 0 r9 0xfffffe00a7c014f8 r10 0xfffffe00a7c014cc
r11 0xfffffe00a7c01527 r12 0x1 r13 0xfffff803979ae000
r14 0 r15 0xfffff803e0c70000 rflags 0x10282
cs 0x20 ss 0x28 ds 0x3b es 0x3b fs 0x13 gs 0x1b
fsbase 0x8002438d0 gsbase 0xffffffff820c7480 kgsbase 0
cpuid = 9; apic id = 09
panic: double fault
cpuid = 9
time = 1561530504
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0006bd1db0
vpanic() at vpanic+0x19d/frame 0xfffffe0006bd1e00
panic() at panic+0x43/frame 0xfffffe0006bd1e60
dblfault_handler() at dblfault_handler+0x1de/frame 0xfffffe0006bd1f30
Xdblfault() at Xdblfault+0xc3/frame 0xfffffe0006bd1f30
--- trap 0x17, rip = 0xffffffff80ef9584, rsp = 0xfffffe00a7c00f80, rbp = 0xfffffe00a7c010b0 ---
vm_fault_hold() at vm_fault_hold+0x14/frame 0xfffffe00a7c010b0
vm_fault() at vm_fault+0x60/frame 0xfffffe00a7c010f0
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00a7c01140
trap() at trap+0x2b4/frame 0xfffffe00a7c01250
calltrap() at calltrap+0x8/frame 0xfffffe00a7c01250
--- trap 0xc, rip = 0xffffffff80f06a14, rsp = 0xfffffe00a7c01320, rbp = 0xfffffe00a7c01400 ---
vm_map_lookup() at vm_map_lookup+0x294/frame 0xfffffe00a7c01400
vm_fault_hold() at vm_fault_hold+0x80/frame 0xfffffe00a7c01550
vm_fault() at vm_fault+0x60/frame 0xfffffe00a7c01590
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00a7c015e0
trap() at trap+0x2b4/frame 0xfffffe00a7c016f0
calltrap() at calltrap+0x8/frame 0xfffffe00a7c016f0
--- trap 0xc, rip = 0x7ffffffcb001, rsp = 0xfffffe00a7c017c0, rbp = 0xfffffe00a7c01988 ---
??() at 0x7ffffffcb001/frame 0xfffffe00a7c01988
??() at 0xfffff803e0c70000/frame 0xffffffff820be480
??() at 0xfffff800036bf000/frame 0xfffff800036bf000
??() at 0xfffff800036b7000/frame 0xffffffff81ea54c0
ll() at 0xb0000/frame 0xffffffff81ea6138
KDB: enter: panic
[ thread pid 19330 tid 100497 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db>
https://people.freebsd.org/~pho/stress/log/dougm039.txt
I'll retry with vm_map.c compiled with '-O0' for more debug info.
This is what I see on i386:
uhub3: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
panic: Bad entry start/end for new stack entry
cpuid = 3
time = 1561525410
KDB: stack backtrace:
db_trace_self_wrapper(e64bdd,1bd0e18,0,1251889c,bd29a1,...) at db_trace_self_wrapper+0x2a/frame 0x12518870
kdb_backtrace(7,3,3,ffbdf000,ffbff000,...) at kdb_backtrace+0x2e/frame 0x125188d0
vpanic(162b7f6,12518918,12518918,1251893c,12bd2ec,...) at vpanic+0x121/frame 0x125188f8
panic(162b7f6,17fc10c0,17fc10c0,fbbff000,2104fbac,...) at panic+0x14/frame 0x1251890c
vm_map_stack_locked(4000000,20000,3,7,1000) at vm_map_stack_locked+0x19c/frame 0x1251893c
vm_map_stack(2104fbac,fbbff000,4000000,3,7,1000) at vm_map_stack+0x9e/frame 0x12518968
exec_new_vmspace(12518a98,1c2c880) at exec_new_vmspace+0x2f6/frame 0x125189c0
exec_elf32_imgact(12518a98) at exec_elf32_imgact+0x7f6/frame 0x12518a4c
kern_execve(99d9a80,12518c70,0) at kern_execve+0x546/frame 0x12518c44
start_init(0,12518ce8) at start_init+0x190/frame 0x12518cb4
fork_exit(f6e2d0,0,12518ce8,0,0,...) at fork_exit+0x6c/frame 0x12518cd4
fork_trampoline() at 0xffc033ca/frame 0x12518cd4
--- trap 0, eip = 0, esp = 0x12518d20, ebp = 0 ---
(null)() at 0
KDB: enter: panic
[ thread pid 1 tid 100002 ]
Stopped at kdb_enter+0x35: movl $0,kdb_why
db> x/s version
version: FreeBSD 13.0-CURRENT #0 r349393: Wed Jun 26 06:46:30 CEST 2019\012 pho@x4.osted.lan:/usr/src/sys/i386/compile/PHO\012
db>
Jun 24 2019
With D20711.58934.diff I get:
20190624 05:49:07 all (2/15): mmap11.sh
panic: vm_map_protect: wrong amount reserved
cpuid = 1
time = 1561348157
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aca04800
vpanic() at vpanic+0x19d/frame 0xfffffe00aca04850
panic() at panic+0x43/frame 0xfffffe00aca048b0
vm_map_protect() at vm_map_protect+0x7b9/frame 0xfffffe00aca04960
kern_mprotect() at kern_mprotect+0xc0/frame 0xfffffe00aca04990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00aca04ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aca04ab0
Jun 23 2019
Ah!
https://people.freebsd.org/~pho/stress/log/dougm036.txt is patched with D20711.58892.diff
I updated dougm036.txt with some gdb output.
During an "init 6" with this kernel I got:
Jun 22 2019
I ran all of the mmap() tests I have on D20711.58892.diff for a total of 8 hours.
I also did a buildworld / installworld.
No problems seen with this partial test.
With D20711.58891.diff I got:
I'm almost done with the amd64 tests and only found one instance of the printf:
20190621 13:21:33 all (235/646): mmap14.sh
_vm_map_clip_start: simplifying entry start 7fffdfdfe000 end 7fffdffde000 next_read 7fffdfdfd000 max_free 7ff7de9fc000 eflags 30000 object-type -1
Jun 20 2019
No.
This is the first time I ran tests on i386, so I have no way of knowing if this was an issue before.
doug033.txt was on amd64 and doug034.txt on i386.
Today on amd64 I ran all of the mmap() tests I have (that is, not a full test) and observed no printfs.
I'll start a full test on amd64, just to be sure.
I forgot to mention that it is *only* on i386 I see this. No printfs on amd64.
I see lots of _vm_map_clip_start on i386 with D20633.58825.diff.
https://people.freebsd.org/~pho/stress/log/dougm034.txt
Jun 15 2019
No, sorry, other tests also produced the printout.
Here's the full console log: https://people.freebsd.org/~pho/stress/log/dougm033.txt
I ran all of the mmap(2) tests I have + a buildworld / installworld.
No problems seen.
Jun 14 2019
The mmap11.sh test triggered these:
_vm_map_clip_start: simplifying entry start 206000 end 217000 next_read 203000 max_free 7ff7ddff9000 eflags 24 object-type 0
_vm_map_clip_start: simplifying entry start 20d000 end 217000 next_read 203000 max_free 7ff7ddff9000 eflags 24 object-type 0
_vm_map_clip_start: simplifying entry start 800e00000 end 802014000 next_read 800e00000 max_free 7ff7d1df9000 eflags 0 object-type 0
_vm_map_clip_start: simplifying entry start 800e00000 end 80b0fb000 next_read 800e00000 max_free 7ff7d1df9000 eflags 0 object-type 0
[root@mercat1 /usr/src/sys/amd64/compile/PHO]# cc -c -O0 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.vm_map.o -MTvm_map.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -Wno-error-pointer-sign -Wno-error-shift-negative-value -Wno-address-of-packed-member -mno-aes -mno-avx -std=iso9899:1999 -Werror ../../../vm/vm_map.c
../../../vm/vm_map.c:2203:36: error: member reference type 'union vm_map_object' is not a pointer; did you mean to use '.'?
    entry->object ? entry->object->type : -1);
                    ~~~~~~~~~~~~~^~
                                 .
../../../vm/vm_map.c:2203:38: error: no member named 'type' in 'union vm_map_object'
    entry->object ? entry->object->type : -1);
                    ~~~~~~~~~~~~~  ^
2 errors generated.
[root@mercat1 /usr/src/sys/amd64/compile/PHO]#
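Both errors come from treating entry->object as a pointer when it is a union. The sketch below shows one way a debug printf could obtain the object type, assuming the union has the vm_object and sub_map members that appear in the kgdb dump elsewhere in this log. The struct definitions are simplified stand-ins, not the kernel's, and entry_object_type() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption: real ones differ). */
struct vm_object {
	int type;
};
struct vm_map;

union vm_map_object {
	struct vm_object *vm_object;	/* backing object */
	struct vm_map *sub_map;		/* or a submap */
};

struct vm_map_entry {
	union vm_map_object object;
};

/*
 * Return the backing object's type, or -1 if the entry has no object.
 * The union is a value, so its member is selected with '.'; only the
 * vm_object pointer inside it is dereferenced with '->'.
 */
static int
entry_object_type(const struct vm_map_entry *entry)
{
	assert(entry != NULL);
	return (entry->object.vm_object != NULL ?
	    entry->object.vm_object->type : -1);
}
```

An entry with no backing object yields -1 here, which matches the "object-type -1" value seen in one of the printouts in this log.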
panic: _vm_map_clip_begin: entry can be simplified
cpuid = 1
time = 1560522673
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02cf87a7b0
vpanic() at vpanic+0x19d/frame 0xfffffe02cf87a800
panic() at panic+0x43/frame 0xfffffe02cf87a860
_vm_map_clip_start() at _vm_map_clip_start+0x10a/frame 0xfffffe02cf87a8a0
vm_map_delete() at vm_map_delete+0x99/frame 0xfffffe02cf87a920
kern_munmap() at kern_munmap+0x115/frame 0xfffffe02cf87a990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe02cf87aab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe02cf87aab0
I'm building a kernel with this patch right now.
Jun 13 2019
Jun 12 2019
Jun 11 2019
I tested this patch on i386 with all of the mmap() tests I have. I also ran the same tests on amd64 plus a buildworld.
I can run a full test if you prefer that.
Yes, this fixes the boot issue for me.
Jun 10 2019
I do not seem to be able to boot successfully with this patch?
I ran a full stress2 test with debug.vmmap_check=1. This included a buildworld / installworld.
No problems seen.
Jun 8 2019
All testing I do is with:
Jun 7 2019
I ran tests on D19826.58346.diff for 3 1/2 hours. This included a buildworld / installworld.
No problems seen.
Jun 6 2019
With D19826.58291.diff I see:
May 28 2019
I ran tests for 5 hours on D20274.57979.diff.
The test included a buildworld.
No problems seen.
With D20274.57978.diff I get
cc -c -O2 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.vm_reserv.o -MTvm_reserv.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -Wno-error-pointer-sign -Wno-error-shift-negative-value -Wno-address-of-packed-member -mno-aes -mno-avx -std=iso9899:1999 -Werror ../../../vm/vm_reserv.c
../../../vm/vm_reserv.c:1288:59: error: operator '<<' has lower precedence than '-'; '-' will be evaluated first [-Werror,-Wshift-op-parentheses]
    changes = rv->popmap[i] | (1UL << (low_index % NBPOPMAP) - 1);
                                   ~~ ~~~~~~~~~~~~~~~~~~~~~~~^~~
../../../vm/vm_reserv.c:1288:59: note: place parentheses around the '-' expression to silence this warning
    changes = rv->popmap[i] | (1UL << (low_index % NBPOPMAP) - 1);
                                                             ^
                                      (                         )
1 error generated.
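The diagnostic is about C precedence: binary '-' binds tighter than '<<', so the flagged expression shifts by (low_index % NBPOPMAP) - 1 rather than subtracting 1 from the shifted value. The sketch below contrasts the two readings. Which one the patch intended is not stated in this log; NBPOPMAP is redefined here as a stand-in (bits in an unsigned long), and both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel macro: bits per popmap word (assumption). */
#define NBPOPMAP	(sizeof(unsigned long) * CHAR_BIT)

/* Mask of all bits below low_index: shift first, then subtract one. */
static unsigned long
low_bits_mask(unsigned int low_index)
{
	return ((1UL << (low_index % NBPOPMAP)) - 1);
}

/*
 * What the unparenthesized expression computes: subtract first, then
 * shift, giving the single bit at position low_index - 1.  An index of
 * zero within the word would make the shift count wrap, so reject it.
 */
static unsigned long
shift_after_subtract(unsigned int low_index)
{
	assert(low_index % NBPOPMAP != 0);
	return (1UL << ((low_index % NBPOPMAP) - 1));
}
```

For low_index 3 the first form yields 0x7 and the second 0x4, so the two readings are not interchangeable.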
May 26 2019
I ran all of the devfs tests I have.
I added a new parallel mkdir() and rmdir() test with VM pressure.
No problems seen.
I ran tests on D20266.57868.diff for 24 hours without seeing any problems.
May 22 2019
May 21 2019
I reproduced the problem and verified that the patch fixes it.
I ran all of the stress2 tests on both amd64 and i386.
No problems seen.
May 20 2019
May 18 2019
I ran tests on D20299.57522.diff for five hours without seeing any problems.
With D20299.57518.diff I see
May 17 2019
$ cc -c -O0 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.vm_reserv.o -MTvm_reserv.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -Wno-error-pointer-sign -Wno-error-shift-negative-value -Wno-address-of-packed-member -mno-aes -mno-avx -std=iso9899:1999 -Werror ../../../vm/vm_reserv.c
../../../vm/vm_reserv.c:181:34: error: & has lower precedence than ==; == will be evaluated first [-Werror,-Wparentheses]
    i < end / NBPOPMAP && mask & popmap[i] == 0;
                               ^~~~~~~~~~~~~~~
../../../vm/vm_reserv.c:181:34: note: place parentheses around the '==' expression to silence this warning
    i < end / NBPOPMAP && mask & popmap[i] == 0;
                               ^
                                 (             )
../../../vm/vm_reserv.c:181:34: note: place parentheses around the & expression to evaluate it first
    i < end / NBPOPMAP && mask & popmap[i] == 0;
                               ^
                          (               )
../../../vm/vm_reserv.c:188:15: error: & has lower precedence than ==; == will be evaluated first [-Werror,-Wparentheses]
    return (mask & popmap[i] == 0);
                 ^~~~~~~~~~~~~~~
../../../vm/vm_reserv.c:188:15: note: place parentheses around the '==' expression to silence this warning
    return (mask & popmap[i] == 0);
                 ^
                   (             )
../../../vm/vm_reserv.c:188:15: note: place parentheses around the & expression to evaluate it first
    return (mask & popmap[i] == 0);
                 ^
            (               )
2 errors generated.
$
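The two notes offer different fixes because the two parenthesizations mean different things. A small sketch of the difference, using hypothetical helper names and plain unsigned long words in place of the popmap array:

```c
#include <assert.h>
#include <stdbool.h>

/* "No masked bit is set": the & is evaluated first, then compared to 0. */
static bool
masked_bits_clear(unsigned long word, unsigned long mask)
{
	return ((mask & word) == 0);
}

/* How the unparenthesized form parses: mask & (word == 0). */
static unsigned long
as_parsed_without_parens(unsigned long word, unsigned long mask)
{
	return (mask & (unsigned long)(word == 0));
}

/* Self-check showing that the two forms disagree on the same inputs. */
static void
precedence_self_check(void)
{
	/* word 0x8 has none of the bits in mask 0x6 set. */
	assert(masked_bits_clear(0x8UL, 0x6UL));
	/* The other parse computes 0x6 & 0, which is 0, i.e. "false". */
	assert(as_parsed_without_parens(0x8UL, 0x6UL) == 0);
}
```

If the loop is meant to test whether the masked bits of a word are clear, the second note's suggestion (parentheses around the & expression) is the one that expresses it; the first only silences the warning.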
Here's a buildworld from a single user mode boot:
# /usr/bin/time -h ./zzbuildworld.sh
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r347793: Thu May 16 19:17:57 CEST 2019 pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO amd64
vm.pmap.pde.promotions: 302
vm.pmap.pde.p_failures: 111
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 27
vm.reserv.reclaimed: 0
vm.reserv.partpopq:
DOMAIN LEVEL SIZE NUMBER
May 16 2019
I ran tests for 18 hours, including a buildworld.
Last vm.reserv.broken value was 247091.
No problems seen.
OK.
vm.reserv.broken: 164537
Uptime is 12 hours on amd64
May 15 2019
I ran into another problem, so now I'm switching from i386 to amd64.
I have not been able to reproduce the page fault with or without your patch.
I have resumed testing with Diff 57406.
I'm a bit low on test H/W, so I ran D20256.57402.diff on an i386 test host and got this:
May 14 2019
In D20256#436596, @dougm wrote: In D20256#436594, @pho wrote: Sure happy to. Will I get credit for doing so?
I regret failing to acknowledge you in some, or all, recent commits. If you want me to make some sort of public statement about it, I will.
I'll try to do better.
Sure happy to. Will I get credit for doing so?
May 9 2019
I ran the full stress2 test using two hosts. No problems seen.
May 4 2019
With D20001.57040.diff I got:
May 3 2019
In D20001#433848, @dougm wrote: The patch that adds debug_counter statistics is written to be applied to the unmodified code. The patch under review here changes the type of the parameter 'count' from a value type to a pointer type, so the line
count_err += count - avg_count;
should be modified to
count_err += *count - avg_count;
to compute the statistics properly for the modified code. I apologize for not making this clear before.
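A minimal sketch of why the dereference is needed once 'count' becomes a pointer. The names count_err, count and avg_count come from the quoted lines; the long types and the two wrapper functions are assumptions made for illustration, not the signatures in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Running error total, as in the quoted statistics line. */
static long count_err;

/* Before the change: count is passed by value. */
static void
record_by_value(long count, long avg_count)
{
	count_err += count - avg_count;
}

/*
 * After the change: count is passed by pointer, so it must be
 * dereferenced before it is used in the arithmetic.
 */
static void
record_by_pointer(const long *count, long avg_count)
{
	assert(count != NULL);
	count_err += *count - avg_count;
}
```

With a pointer parameter, writing "count - avg_count" would be pointer arithmetic on the address instead of the counted value, and adding that result to an integer accumulator is rejected or warned about by the compiler, which under -Werror fails the build.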
Here's the i386 stats: