I saw the same problem with Diff 59201. I am building with Diff 59205 now.
Jun 29 2019
I reproduced the problem and verified that this patch fixed the problems.
No other problems seen with this patch.
D20635.59149.diff seems to be looping in swapoff(8):
https://people.freebsd.org/~pho/stress/log/dougm041.txt
D20664.59090.diff completed tests on amd64 without any problems seen.
Jun 28 2019
I ran random tests for 3 hours with D20633.59145.diff.
No problems seen.
Jun 27 2019
I got this strange one while testing on i386. Not sure if it's related to your patch.
I ran all of the threaded tests I have, twice. I also did a buildworld / installworld. This on amd64.
I'll run some i386 tests once my test box is available.
Jun 26 2019
This looks promising.
I ran tests on i386 for three hours and uptime for amd64 is 5 hours. I'll leave the amd64 tests running.
(kgdb) bt
#0  doadump (textdump=0x0) at include/pcpu.h:246
#1  0xffffffff8049c4fb in db_dump (dummy=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at ../../../ddb/db_command.c:575
#2  0xffffffff8049c2c9 in db_command (cmd_table=<value optimized out>, dopager=0x0) at ../../../ddb/db_command.c:482
#3  0xffffffff804a1248 in db_script_exec (scriptname=<value optimized out>, warnifnotfound=<value optimized out>) at ../../../ddb/db_script.c:304
#4  0xffffffff8049c2c9 in db_command (cmd_table=<value optimized out>, dopager=0x1) at ../../../ddb/db_command.c:482
#5  0xffffffff8049c044 in db_command_loop () at ../../../ddb/db_command.c:535
#6  0xffffffff8049f1ef in db_trap (type=<value optimized out>, code=<value optimized out>) at ../../../ddb/db_main.c:252
#7  0xffffffff80c1384c in kdb_trap (type=0x3, code=0x0, tf=<value optimized out>) at ../../../kern/subr_kdb.c:692
#8  0xffffffff8109ec31 in trap (frame=0xfffffe00ad617cd0) at ../../../amd64/amd64/trap.c:621
#9  0xffffffff810776d5 in calltrap () at ../../../amd64/amd64/exception.S:232
#10 0xffffffff80c12f5b in kdb_enter (why=0xffffffff81333b87 "panic", msg=<value optimized out>) at include/cpufunc.h:65
#11 0xffffffff80bca7ea in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at ../../../kern/kern_shutdown.c:894
#12 0xffffffff80bca563 in panic (fmt=<value optimized out>) at ../../../kern/kern_shutdown.c:832
#13 0xffffffff80f04b4a in vm_map_splay_split (map=0xfffff8000ae825a0, addr=0x3ad618460, length=0x20000000, out_llist=0x7fffffffffffffff, out_rlist=0xfffff8000ae825a0) at ../../../vm/vm_map.c:1085
#14 0x7fffffffffffffff in ?? ()
#15 0x0000000600000001 in ?? ()
#16 0xfffff8000ae825a0 in ?? ()
#17 0x7fffffffffffffff in ?? ()
#18 0x0000000000000000 in ?? ()
Current language: auto; currently minimal
(kgdb) f 13
#13 0xffffffff80f04b4a in vm_map_splay_split (map=0xfffff8000ae825a0, addr=0x3ad618460, length=0x20000000, out_llist=0x7fffffffffffffff, out_rlist=0xfffff8000ae825a0) at ../../../vm/vm_map.c:1085
1085                    SPLAY_LEFT_STEP(root, y, rlist,
(kgdb) l
1080            root = map->root;
1081            while (root != NULL && root->max_free >= length) {
1082                    KASSERT(llist->end <= root->start && root->end <= rlist->start,
1083                        ("%s: root not within tree bounds", __func__));
1084                    if (addr < root->start) {
1085                            SPLAY_LEFT_STEP(root, y, rlist,
1086                                y->max_free >= length && addr < y->start);
1087                    } else if (addr >= root->end) {
1088                            SPLAY_RIGHT_STEP(root, y, llist,
1089                                y->max_free >= length && addr >= y->end);
(kgdb) info loc
max_free = 0xfffff8000aee1002
llist = 0xaad617f00
rlist = 0x7fffffffffffffff
root = 0xfffffe00ad618460
y = 0x0
(kgdb) p *map
$1 = {header = {prev = 0xffffffff81eaa140, next = 0xfffff8000aee1000, left = 0x0, right = 0xfffff8000aee1010, start = 0x0, end = 0xffffffff81eaa670,
    next_read = 0x0, max_free = 0x0, object = {vm_object = 0x0, sub_map = 0x0}, offset = 0x0, eflags = 0x0, protection = 0x0, max_protection = 0x0,
    inheritance = 0x0, read_ahead = 0x0, wired_count = 0x63fc98, cred = 0xfffff80003683f00, wiring_thread = 0xffffffff81e889a8},
  lock = {lock_object = {lo_name = 0x0, lo_flags = 0x0, lo_data = 0x0, lo_witness = 0xfffff80006118880}, sx_lock = 0xfffff8000611b480},
  system_mtx = {lock_object = {lo_name = 0xfffff8000a9ff758 "", lo_flags = 0x6073700, lo_data = 0xfffff800, lo_witness = 0x18793}, mtx_lock = 0x0},
  nentries = 0x0, size = 0x0, timestamp = 0x0, needs_wakeup = 0x0, system_map = 0x0, flags = 0x0, root = 0x0, pmap = 0x0, anon_loc = 0x0, busy = 0xae82678}
(kgdb)
i386 booted with this patch.
On amd64 I got:
20190626 08:28:23 all (220/636): beneath.sh
Fatal double fault
rip 0xffffffff80ef9584 rsp 0xfffffe00a7c00f80 rbp 0xfffffe00a7c010b0
rax 0xfffff80397d07000 rdx 0x1 rbx 0
rcx 0 rsi 0xfffff803e0c70000 rdi 0
r8 0 r9 0xfffffe00a7c014f8 r10 0xfffffe00a7c014cc
r11 0xfffffe00a7c01527 r12 0x1 r13 0xfffff803979ae000
r14 0 r15 0xfffff803e0c70000 rflags 0x10282
cs 0x20 ss 0x28 ds 0x3b es 0x3b fs 0x13 gs 0x1b
fsbase 0x8002438d0 gsbase 0xffffffff820c7480 kgsbase 0
cpuid = 9; apic id = 09
panic: double fault
cpuid = 9
time = 1561530504
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0006bd1db0
vpanic() at vpanic+0x19d/frame 0xfffffe0006bd1e00
panic() at panic+0x43/frame 0xfffffe0006bd1e60
dblfault_handler() at dblfault_handler+0x1de/frame 0xfffffe0006bd1f30
Xdblfault() at Xdblfault+0xc3/frame 0xfffffe0006bd1f30
--- trap 0x17, rip = 0xffffffff80ef9584, rsp = 0xfffffe00a7c00f80, rbp = 0xfffffe00a7c010b0 ---
vm_fault_hold() at vm_fault_hold+0x14/frame 0xfffffe00a7c010b0
vm_fault() at vm_fault+0x60/frame 0xfffffe00a7c010f0
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00a7c01140
trap() at trap+0x2b4/frame 0xfffffe00a7c01250
calltrap() at calltrap+0x8/frame 0xfffffe00a7c01250
--- trap 0xc, rip = 0xffffffff80f06a14, rsp = 0xfffffe00a7c01320, rbp = 0xfffffe00a7c01400 ---
vm_map_lookup() at vm_map_lookup+0x294/frame 0xfffffe00a7c01400
vm_fault_hold() at vm_fault_hold+0x80/frame 0xfffffe00a7c01550
vm_fault() at vm_fault+0x60/frame 0xfffffe00a7c01590
trap_pfault() at trap_pfault+0x188/frame 0xfffffe00a7c015e0
trap() at trap+0x2b4/frame 0xfffffe00a7c016f0
calltrap() at calltrap+0x8/frame 0xfffffe00a7c016f0
--- trap 0xc, rip = 0x7ffffffcb001, rsp = 0xfffffe00a7c017c0, rbp = 0xfffffe00a7c01988 ---
??() at 0x7ffffffcb001/frame 0xfffffe00a7c01988
??() at 0xfffff803e0c70000/frame 0xffffffff820be480
??() at 0xfffff800036bf000/frame 0xfffff800036bf000
??() at 0xfffff800036b7000/frame 0xffffffff81ea54c0
ll() at 0xb0000/frame 0xffffffff81ea6138
KDB: enter: panic
[ thread pid 19330 tid 100497 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db>
https://people.freebsd.org/~pho/stress/log/dougm039.txt
I'll retry with vm_map.c compiled with '-O0' for more debug info.
This is what I see on i386:
uhub3: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
panic: Bad entry start/end for new stack entry
cpuid = 3
time = 1561525410
KDB: stack backtrace:
db_trace_self_wrapper(e64bdd,1bd0e18,0,1251889c,bd29a1,...) at db_trace_self_wrapper+0x2a/frame 0x12518870
kdb_backtrace(7,3,3,ffbdf000,ffbff000,...) at kdb_backtrace+0x2e/frame 0x125188d0
vpanic(162b7f6,12518918,12518918,1251893c,12bd2ec,...) at vpanic+0x121/frame 0x125188f8
panic(162b7f6,17fc10c0,17fc10c0,fbbff000,2104fbac,...) at panic+0x14/frame 0x1251890c
vm_map_stack_locked(4000000,20000,3,7,1000) at vm_map_stack_locked+0x19c/frame 0x1251893c
vm_map_stack(2104fbac,fbbff000,4000000,3,7,1000) at vm_map_stack+0x9e/frame 0x12518968
exec_new_vmspace(12518a98,1c2c880) at exec_new_vmspace+0x2f6/frame 0x125189c0
exec_elf32_imgact(12518a98) at exec_elf32_imgact+0x7f6/frame 0x12518a4c
kern_execve(99d9a80,12518c70,0) at kern_execve+0x546/frame 0x12518c44
start_init(0,12518ce8) at start_init+0x190/frame 0x12518cb4
fork_exit(f6e2d0,0,12518ce8,0,0,...) at fork_exit+0x6c/frame 0x12518cd4
fork_trampoline() at 0xffc033ca/frame 0x12518cd4
--- trap 0, eip = 0, esp = 0x12518d20, ebp = 0 ---
(null)() at 0
KDB: enter: panic
[ thread pid 1 tid 100002 ]
Stopped at kdb_enter+0x35: movl $0,kdb_why
db> x/s version
version: FreeBSD 13.0-CURRENT #0 r349393: Wed Jun 26 06:46:30 CEST 2019\012 pho@x4.osted.lan:/usr/src/sys/i386/compile/PHO\012
db>
Jun 24 2019
With D20711.58934.diff I get:
20190624 05:49:07 all (2/15): mmap11.sh
panic: vm_map_protect: wrong amount reserved
cpuid = 1
time = 1561348157
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aca04800
vpanic() at vpanic+0x19d/frame 0xfffffe00aca04850
panic() at panic+0x43/frame 0xfffffe00aca048b0
vm_map_protect() at vm_map_protect+0x7b9/frame 0xfffffe00aca04960
kern_mprotect() at kern_mprotect+0xc0/frame 0xfffffe00aca04990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00aca04ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aca04ab0
Jun 23 2019
Ah!
https://people.freebsd.org/~pho/stress/log/dougm036.txt is patched with D20711.58892.diff
I updated dougm036.txt with some gdb output.
During an "init 6" with this kernel I got:
Jun 22 2019
I ran all of the mmap() tests I have on D20711.58892.diff for a total of 8 hours.
I also did a buildworld / installworld.
No problems seen with this partial test.
With D20711.58891.diff I got:
I'm almost done with the amd64 tests and only found one instance of the printf:
20190621 13:21:33 all (235/646): mmap14.sh
_vm_map_clip_start: simplifying entry start 7fffdfdfe000 end 7fffdffde000 next_read 7fffdfdfd000 max_free 7ff7de9fc000 eflags 30000 object-type -1
Jun 20 2019
No.
This is the first time I ran tests on i386, so I have no way of knowing if this was an issue before.
doug033.txt was on amd64 and doug034.txt on i386.
Today on amd64 I ran all of the mmap() tests I have (that is, not a full test) and observed no printfs.
I'll start a full test on amd64, just to be sure.
I forgot to mention that it is *only* on i386 I see this. No printfs on amd64.
I see lots of _vm_map_clip_start on i386 with D20633.58825.diff.
https://people.freebsd.org/~pho/stress/log/dougm034.txt
Jun 15 2019
No, sorry; other tests also produced the printout.
Here's the full console log: https://people.freebsd.org/~pho/stress/log/dougm033.txt
I ran all of the mmap(2) tests I have + a buildworld / installworld.
No problems seen.
Jun 14 2019
The mmap11.sh test triggered these:
_vm_map_clip_start: simplifying entry start 206000 end 217000 next_read 203000 max_free 7ff7ddff9000 eflags 24 object-type 0
_vm_map_clip_start: simplifying entry start 20d000 end 217000 next_read 203000 max_free 7ff7ddff9000 eflags 24 object-type 0
_vm_map_clip_start: simplifying entry start 800e00000 end 802014000 next_read 800e00000 max_free 7ff7d1df9000 eflags 0 object-type 0
_vm_map_clip_start: simplifying entry start 800e00000 end 80b0fb000 next_read 800e00000 max_free 7ff7d1df9000 eflags 0 object-type 0
[root@mercat1 /usr/src/sys/amd64/compile/PHO]# cc -c -O0 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.vm_map.o -MTvm_map.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -Wno-error-pointer-sign -Wno-error-shift-negative-value -Wno-address-of-packed-member -mno-aes -mno-avx -std=iso9899:1999 -Werror ../../../vm/vm_map.c
../../../vm/vm_map.c:2203:36: error: member reference type 'union vm_map_object' is not a pointer; did you mean to use '.'?
    entry->object ? entry->object->type : -1);
                    ~~~~~~~~~~~~~^~
                                 .
../../../vm/vm_map.c:2203:38: error: no member named 'type' in 'union vm_map_object'
    entry->object ? entry->object->type : -1);
                    ~~~~~~~~~~~~~  ^
2 errors generated.
[root@mercat1 /usr/src/sys/amd64/compile/PHO]#
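The two errors above come from treating a union as if it were a pointer. A minimal sketch of the pattern follows, assuming simplified stand-in types: `obj_sketch`, `map_object_sketch`, `entry_sketch` and `entry_object_type` are hypothetical names, not the kernel's definitions. Only the member names `vm_object` and `sub_map` are taken from the kgdb output elsewhere in this log. Whether the debug printf was eventually fixed this way is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct obj_sketch {
	int type;
};

union map_object_sketch {
	struct obj_sketch *vm_object;	/* first member, as in the kgdb dump */
	void *sub_map;
};

struct entry_sketch {
	union map_object_sketch object;
};

/*
 * The union itself is neither a pointer nor a scalar, so it cannot be
 * tested or dereferenced with "->". Select a member with "." first,
 * then follow that pointer.
 */
static int
entry_object_type(const struct entry_sketch *entry)
{
	return (entry->object.vm_object != NULL ?
	    entry->object.vm_object->type : -1);
}
```

Reading `vm_object` is only meaningful when the entry does not hold a sub-map; the sketch ignores that case.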
panic: _vm_map_clip_begin: entry can be simplified
cpuid = 1
time = 1560522673
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02cf87a7b0
vpanic() at vpanic+0x19d/frame 0xfffffe02cf87a800
panic() at panic+0x43/frame 0xfffffe02cf87a860
_vm_map_clip_start() at _vm_map_clip_start+0x10a/frame 0xfffffe02cf87a8a0
vm_map_delete() at vm_map_delete+0x99/frame 0xfffffe02cf87a920
kern_munmap() at kern_munmap+0x115/frame 0xfffffe02cf87a990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe02cf87aab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe02cf87aab0
I'm building a kernel with this patch right now.
Jun 13 2019
Jun 12 2019
Jun 11 2019
I tested this patch on i386 with all of the mmap() tests I have. I also ran the same tests on amd64 plus a buildworld.
I can run a full test if you prefer that.
Yes, this fixes the boot issue for me.
Jun 10 2019
I do not seem to be able to boot successfully with this patch?
I ran a full stress2 test with debug.vmmap_check=1. This included a buildworld / installworld.
No problems seen.
Jun 8 2019
All testing I do is with:
Jun 7 2019
I ran tests on D19826.58346.diff for 3 1/2 hours. This included a buildworld / installworld.
No problems seen.
Jun 6 2019
With D19826.58291.diff I see:
May 28 2019
I ran tests for 5 hours on D20274.57979.diff.
The test included a buildworld.
No problems seen.
With D20274.57978.diff I get:
cc -c -O2 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.vm_reserv.o -MTvm_reserv.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -Wno-error-pointer-sign -Wno-error-shift-negative-value -Wno-address-of-packed-member -mno-aes -mno-avx -std=iso9899:1999 -Werror ../../../vm/vm_reserv.c
../../../vm/vm_reserv.c:1288:59: error: operator '<<' has lower precedence than '-'; '-' will be evaluated first [-Werror,-Wshift-op-parentheses]
    changes = rv->popmap[i] | (1UL << (low_index % NBPOPMAP) - 1);
                                   ~~ ~~~~~~~~~~~~~~~~~~~~~~~^~~
../../../vm/vm_reserv.c:1288:59: note: place parentheses around the '-' expression to silence this warning
    changes = rv->popmap[i] | (1UL << (low_index % NBPOPMAP) - 1);
                                                             ^
                                      (                         )
1 error generated.
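The diagnostic above points at a precedence trap: binary `-` binds tighter than `<<`. The sketch below contrasts the two readings. `NBPOPMAP_SKETCH`, `mask_as_parsed` and `mask_low_bits` are hypothetical names, and the assumption that the patch wanted a mask of the low bits is only a guess from the shape of the expression; the review itself does not say.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the kernel's NBPOPMAP (bits per popmap word). */
#define NBPOPMAP_SKETCH	(sizeof(unsigned long) * CHAR_BIT)

/*
 * What the compiler sees in the rejected line: 1UL << (n - 1).
 * When n is 0 the shift count wraps around, which is undefined behavior.
 */
static unsigned long
mask_as_parsed(unsigned int low_index)
{
	return (1UL << ((low_index % NBPOPMAP_SKETCH) - 1));
}

/* Assumed intent: a mask with the low n bits set, (1UL << n) - 1. */
static unsigned long
mask_low_bits(unsigned int low_index)
{
	return ((1UL << (low_index % NBPOPMAP_SKETCH)) - 1);
}
```

For `low_index` 3 the first form yields a single bit (binary 100) while the second yields three low bits (binary 111), so the two readings are not interchangeable.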
May 26 2019
I ran all of the devfs tests I have.
I added a new parallel mkdir() and rmdir() test with VM pressure.
No problems seen.
I ran tests on D20266.57868.diff for 24 hours without seeing any problems.
May 22 2019
May 21 2019
I reproduced the problem and verified that the patch fixes it.
I ran all of the stress2 tests on both amd64 and i386.
No problems seen.
May 20 2019
May 18 2019
I ran tests on D20299.57522.diff for five hours without seeing any problems.
With D20299.57518.diff I see
May 17 2019
$ cc -c -O0 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.vm_reserv.o -MTvm_reserv.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -Wno-error-pointer-sign -Wno-error-shift-negative-value -Wno-address-of-packed-member -mno-aes -mno-avx -std=iso9899:1999 -Werror ../../../vm/vm_reserv.c
../../../vm/vm_reserv.c:181:34: error: & has lower precedence than ==; == will be evaluated first [-Werror,-Wparentheses]
    i < end / NBPOPMAP && mask & popmap[i] == 0;
                               ^~~~~~~~~~~~~~~~
../../../vm/vm_reserv.c:181:34: note: place parentheses around the '==' expression to silence this warning
    i < end / NBPOPMAP && mask & popmap[i] == 0;
                               ^
                                 (             )
../../../vm/vm_reserv.c:181:34: note: place parentheses around the & expression to evaluate it first
    i < end / NBPOPMAP && mask & popmap[i] == 0;
                               ^
                          (               )
../../../vm/vm_reserv.c:188:15: error: & has lower precedence than ==; == will be evaluated first [-Werror,-Wparentheses]
    return (mask & popmap[i] == 0);
                 ^~~~~~~~~~~~~~~~
../../../vm/vm_reserv.c:188:15: note: place parentheses around the '==' expression to silence this warning
    return (mask & popmap[i] == 0);
                 ^
                   (             )
../../../vm/vm_reserv.c:188:15: note: place parentheses around the & expression to evaluate it first
    return (mask & popmap[i] == 0);
                 ^
            (               )
2 errors generated.
$
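The second note in each diagnostic ("evaluate it first") is the one that changes behavior. A minimal sketch of the difference, with hypothetical helper names `clear_as_parsed` and `clear_with_parens`; which grouping the patch author wanted is an assumption based on the usual meaning of a mask test:

```c
#include <assert.h>
#include <stdbool.h>

/* What the compiler sees in the rejected lines: mask & (word == 0). */
static bool
clear_as_parsed(unsigned long mask, unsigned long word)
{
	return (mask & (word == 0));
}

/* Explicit grouping: true when none of the bits in mask are set in word. */
static bool
clear_with_parens(unsigned long mask, unsigned long word)
{
	return ((mask & word) == 0);
}
```

With `mask` 0x2 and an all-zero word, the parsed form tests bit 1 of the comparison result (which is 1) and reports false, while the parenthesized form reports true.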
Here's a buildworld from a single user mode boot:
# /usr/bin/time -h ./zzbuildworld.sh
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #2 r347793: Thu May 16 19:17:57 CEST 2019 pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO amd64
vm.pmap.pde.promotions: 302
vm.pmap.pde.p_failures: 111
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 27
vm.reserv.reclaimed: 0
vm.reserv.partpopq:
DOMAIN LEVEL SIZE NUMBER
May 16 2019
I ran tests for 18 hours, including a buildworld.
Last vm.reserv.broken value was 247091.
No problems seen.
OK.
vm.reserv.broken: 164537
Uptime is 12 hours on amd64
May 15 2019
I ran into another problem, so now I'm switching from i386 to amd64.
I have not been able to reproduce the page fault with or without your patch.
I have resumed testing with Diff 57406.
I'm a bit low on test H/W, so I ran D20256.57402.diff on an i386 test host and got this:
May 14 2019
In D20256#436596, @dougm wrote:
In D20256#436594, @pho wrote:
Sure happy to. Will I get credit for doing so?
I regret failing to acknowledge you in some, or all, recent commits. If you want me to make some sort of public statement about it, I will.
I'll try to do better.
Sure happy to. Will I get credit for doing so?
May 9 2019
I ran the full stress2 test using two hosts. No problems seen.
May 4 2019
With D20001.57040.diff I got:
May 3 2019
In D20001#433848, @dougm wrote:
The patch that adds debug_counter statistics is written to be applied to the unmodified code. The patch under review here changes the type of the parameter 'count' from a value type to a pointer type, so the line
count_err += count - avg_count;
should be modified to
count_err += *count - avg_count;
to compute the statistics properly for the modified code. I apologize for not making this clear before.
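The quoted change can be illustrated with a small sketch. Everything except the two accumulator expressions is hypothetical: the function names, the parameter types and the `long` accumulator are assumptions for illustration, not the code under review.

```c
#include <assert.h>

static long count_err;	/* hypothetical statistics accumulator */

/* Unmodified code: count is passed by value. */
static void
record_by_value(int count, int avg_count)
{
	count_err += count - avg_count;
}

/* Modified code: count is a pointer, so it must be dereferenced. */
static void
record_by_pointer(const int *count, int avg_count)
{
	count_err += *count - avg_count;
}
```

Without the dereference, `count - avg_count` would be pointer arithmetic rather than a difference of counts, which is why the statistics patch needs the extra `*` on top of the reviewed change.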
Here's the i386 stats:
While waiting for free H/W I tried to run the tests on a box running i386:
I'm a bit low on free test H/W. I'll see what I can do tomorrow.
May 2 2019
I'll redo the test with 56965 once I free up my test box.
The summary:
This looks promising. I'll add the counters and get some stats.
May 1 2019
swp_pager_getswapspace(32): failed
panic: freeing free block: fffc0, size 22, mask 1
cpuid = 21
time = 1556698660
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe00cc3493b0
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe00cc349460
vpanic() at vpanic+0x19d/frame 0xfffffe00cc3494b0
panic() at panic+0x43/frame 0xfffffe00cc349510
blst_leaf_free() at blst_leaf_free+0x69/frame 0xfffffe00cc349540
blst_meta_free() at blst_meta_free+0x44/frame 0xfffffe00cc3495a0
blst_meta_free() at blst_meta_free+0x158/frame 0xfffffe00cc349600
blst_meta_free() at blst_meta_free+0x158/frame 0xfffffe00cc349660
blst_meta_free() at blst_meta_free+0x158/frame 0xfffffe00cc3496c0
blist_free() at blist_free+0x7e/frame 0xfffffe00cc3496f0
swp_pager_freeswapspace() at swp_pager_freeswapspace+0xb0/frame 0xfffffe00cc349720
swp_pager_meta_free_all() at swp_pager_meta_free_all+0x129/frame 0xfffffe00cc349760
swap_pager_dealloc() at swap_pager_dealloc+0x20d/frame 0xfffffe00cc349790
vm_object_terminate() at vm_object_terminate+0x27b/frame 0xfffffe00cc3497e0
vm_object_deallocate() at vm_object_deallocate+0x412/frame 0xfffffe00cc349840
vm_map_process_deferred() at vm_map_process_deferred+0x79/frame 0xfffffe00cc349860
vm_map_remove() at vm_map_remove+0xc6/frame 0xfffffe00cc349890
vmspace_exit() at vmspace_exit+0xd3/frame 0xfffffe00cc3498d0
exit1() at exit1+0x5ad/frame 0xfffffe00cc349940
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe00cc349950
syscallenter() at syscallenter+0x496/frame 0xfffffe00cc349a00
amd64_syscall() at amd64_syscall+0x4d/frame 0xfffffe00cc349ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00cc349ab0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8003ab8da, rsp = 0x7fffffffe4d8, rbp = 0x7fffffffe500 ---
KDB: enter: panic
[ thread pid 4575 tid 100639 ]
Stopped at breakpoint+0x5: popq %rbp
db> x/s version
version: FreeBSD 13.0-CURRENT #2 r346986M: Wed May 1 09:39:42 CEST 2019\012 pho@t1.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>
Apr 30 2019
While working on a new test scenario, which ran fine on HEAD:
Testing of r346804+db5a624a416(cpuctl) is complete and no problems seen.
Here's the difference:
I got this page fault using the test scenario from D19886:
Apr 29 2019
I'll see what I can do tomorrow. Could you possibly upload the diff?
I'm just using your test case:
service mdnsd onestart
ifconfig vtnet0 delete 2>/dev/null
ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy
service mdnsd onestop
You may need to add some VM pressure.
D20001.56794.diff survived a 6 hour test. Note that a full test will take about 48 hours.
Apr 28 2019
FreeBSD 13.0-CURRENT (PHO) #14 r346790M: Sun Apr 28 21:12:11 CEST 2019
root@t2:~ # sh
# /tmp/swp.sh
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       67108864        0 67108864     0%
/dev/da0p4       67108864  1292524 65816340     2%
/dev/da0p4       67108864  3709428 63399436     6%
/dev/da0p4       67108864  7375256 59733608    11%
/dev/da0p4       67108864  9751136 57357728    15%
:
I'll run some more tests ...
FreeBSD 13.0-CURRENT (PHO) #13 r346790M: Sun Apr 28 20:34:43 CEST 2019
root@t2:~ # sh
# /tmp/swp.sh
Apr 28 20:38:22 t2 su[1043]: pho to root on /dev/pts/0
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       67108864        0 67108864     0%
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
:

FreeBSD 13.0-CURRENT (PHO) #12 r346790M: Sun Apr 28 20:01:25 CEST 2019
root@t2:~ # sh
# /tmp/swp.sh
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       67108864        0 67108864     0%
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
:

FreeBSD 13.0-CURRENT (PHO) #11 r346790M: Sun Apr 28 19:16:35 CEST 2019
root@t2:~ # sh
# /tmp/swp.sh
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       67108864        0 67108864     0%
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(1): failed
:

FreeBSD 13.0-CURRENT (PHO) #10 r346790M: Sun Apr 28 18:51:09 CEST 2019
root@t2:~ # sh
# /tmp/swp.sh
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       67108864        0 67108864     0%
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
:

FreeBSD 13.0-CURRENT (PHO) #9 r346790M: Sun Apr 28 16:30:09 CEST 2019
root@t2:~ # sh
# /tmp/swp.sh
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       67108864        0 67108864     0%
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed

FreeBSD 13.0-CURRENT (PHO) #7 r346790M: Sun Apr 28 11:00:47 CEST 2019
root@t2:~ # sh
# /tmp/swp.sh
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       67108864        0 67108864     0%
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(1): failed
56762 fixed the problem seen by my one and only multicast test scenario.
FreeBSD 13.0-CURRENT (PHO) #5 r346790M: Sun Apr 28 10:08:35 CEST 2019
You have new mail.
root@t2:~ # sh
# /tmp/swp.sh
Device          1K-blocks     Used    Avail Capacity
/dev/da0p4       67108864        0 67108864     0%
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(3): failed
:
D20001.56768.diff also looks fine.
D20001.56757.diff looks good:
Apr 27 2019
D20001.56752.diff looks good:
With 56747 I got
With 56746 I see:
In D20001#431919, @dougm_rice.edu wrote:
Back up to a patch that I understood to have led to a successful 6-hour test, but my confidence about that is a little shaky at the moment.
In D20001#431916, @dougm_rice.edu wrote:
I see no change with Diff 56745.
No change from the bad test results produced by Diff 56733? Or no change from a working build without these changes?
I see no change with Diff 56745.
I have run the full stress2 test set on amd64 using r346698+3cb8bc20f88(cpuctl).
No problems seen.
In D20001#431837, @dougm_rice.edu wrote:
I see a lot of "swp_pager_getswapspace(1): failed" on the console. This with 56730.
I hope that 56727 worked, and 56733 fixed what 56730 broke. Thanks for the feedback.
I see a lot of "swp_pager_getswapspace(1): failed" on the console. This with 56730.
Apr 26 2019
In D19886#431585, @hselasky wrote:
Hi,
Can you add:
options DEBUG_MEMGUARD
To the kernel and then reproduce this test and you'll see the bug happens much earlier using this script:
ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy