20250423 09:20:24 all (1/1): syzkaller75.sh
panic: vm_pager_assert_in: page 0xfffffe0000b5fcc0 is mapped
cpuid = 11
time = 1745393009
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe016df1d7b0
vpanic() at vpanic+0x136/frame 0xfffffe016df1d8e0
panic() at panic+0x43/frame 0xfffffe016df1d940
vm_pager_assert_in() at vm_pager_assert_in+0x1fa/frame 0xfffffe016df1d980
vm_pager_get_pages() at vm_pager_get_pages+0x3d/frame 0xfffffe016df1d9d0
vm_fault() at vm_fault+0x745/frame 0xfffffe016df1db40
vm_map_wire_locked() at vm_map_wire_locked+0x385/frame 0xfffffe016df1dbf0
vm_mmap_object() at vm_mmap_object+0x2fd/frame 0xfffffe016df1dc50
vn_mmap() at vn_mmap+0x152/frame 0xfffffe016df1dce0
kern_mmap() at kern_mmap+0x621/frame 0xfffffe016df1ddc0
sys_mmap() at sys_mmap+0x42/frame 0xfffffe016df1de00
amd64_syscall() at amd64_syscall+0x15a/frame 0xfffffe016df1df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe016df1df30
--- syscall (0, FreeBSD ELF64, syscall), rip = 0x823efb7fa, rsp = 0x821f52f68, rbp = 0x821f52f90 ---
KDB: enter: panic
[ thread pid 37673 tid 108519 ]
Stopped at kdb_enter+0x33: movq $0,0x104d7a2(%rip)
db> x/s version
version: FreeBSD 15.0-CURRENT #0 main-n276680-d14036ea424d-dirty: Wed Apr 23 09:10:41 CEST 2025
pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO
db>
Today
Doug,
I have mailed you the test scenario. I will start a test on the latest version of HEAD.
In D49957#1139554, @dougm wrote:
> In D49957#1139502, @pho wrote:
>> The syzkaller test still triggers a panic:
>> 20250422 20:52:41 all (1/1): syzkaller75.sh
>
> I don't have access to syzkaller75.sh. Can you provide it?
Yesterday
The syzkaller test still triggers a panic:
Sat, Apr 19
I got this panic, which seems unrelated to me, after 8 hours of testing:
Thu, Apr 3
Feb 27 2025
I completed a full stress2 test run with D49103.151425.patch without seeing any issues.
Feb 24 2025
I got this with D49103.151389.patch:
20250224 11:38:23 all (254/953): mmap8.sh
panic: vm_reserv_from_object: msucc doesn't succeed pindex
cpuid = 4
time = 1740393505
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01087809b0
vpanic() at vpanic+0x136/frame 0xfffffe0108780ae0
panic() at panic+0x43/frame 0xfffffe0108780b40
vm_reserv_from_object() at vm_reserv_from_object+0xc9/frame 0xfffffe0108780b50
vm_reserv_alloc_page() at vm_reserv_alloc_page+0x72/frame 0xfffffe0108780bb0
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x140/frame 0xfffffe0108780c40
vm_page_alloc_after() at vm_page_alloc_after+0x54/frame 0xfffffe0108780cb0
vm_fault_copy_entry() at vm_fault_copy_entry+0x32f/frame 0xfffffe0108780d60
vm_map_protect() at vm_map_protect+0x72b/frame 0xfffffe0108780df0
sys_mprotect() at sys_mprotect+0x9f/frame 0xfffffe0108780e00
amd64_syscall() at amd64_syscall+0x15a/frame 0xfffffe0108780f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0108780f30
--- syscall (74, FreeBSD ELF64, mprotect), rip = 0x822c0ed7a, rsp = 0x820d0ee78, rbp = 0x820d0eea0 ---
KDB: enter: panic
[ thread pid 82131 tid 102074 ]
Stopped at kdb_enter+0x33: movq $0,0x104ed52(%rip)
db>
It's easy to reproduce on my test box.
Feb 10 2025
I ran tests with D48588.150722.patch for two days without observing any problems.
Feb 4 2025
I ran all of the stress2 tests using nullfs and also ronald@'s test scenario. LGTM.
Jan 29 2025
FYI: I came across this i386 problem after commit 0078df5f0258.
Trying to mount root from ufs:/dev/vtbd0p1 [rw]...
WARNING: WITNESS option enabled, expect reduced performance.
WARNING: DIAGNOSTIC option enabled, expect reduced performance.
WARNING: 32-bit kernels are deprecated and may be removed in FreeBSD 15.0.
panic: vm_page_t 0x5d49be0 phys_addr mismatch ffffffffb8779000 00000000b8779405
cpuid = 3
time = 1738177579
KDB: stack backtrace:
db_trace_self_wrapper(7,114beb40,5d49be0,ffffffff,ffffffff,...) at db_trace_self_wrapper+0x28/frame 0x113f6744
vpanic(148badd,113f6780,113f6780,113f67f0,13a54e9,...) at vpanic+0xf4/frame 0x113f6760
panic(148badd,5d49be0,b8779000,ffffffff,b8779405,...) at panic+0x14/frame 0x113f6774
pmap_pae_remove_pages(114128e8) at pmap_pae_remove_pages+0x599/frame 0x113f67f0
exec_new_vmspace(113f6994,188247c) at exec_new_vmspace+0x1cb/frame 0x113f6810
exec_elf32_imgact(113f6994,16d4801,113f6984,0,0,...) at exec_elf32_imgact+0x5e4/frame 0x113f6874
kern_execve(114beb40,113f6a8c,0,1141284c) at kern_execve+0x72c/frame 0x113f6a74
sys_execve(114beb40,114bedf0,114beb40,13ae913,114bede4,...) at sys_execve+0x4e/frame 0x113f6ac8
syscall(113f6ba8,3b,3b,3b,402aaf,...) at syscall+0x1e6/frame 0x113f6b9c
Xint0x80_syscall() at 0xffc03479/frame 0x113f6b9c
--- syscall (59, FreeBSD ELF32, execve), eip = 0x47e53f, esp = 0xffbfe848, ebp = 0xffbfe858 ---
KDB: enter: panic
[ thread pid 17 tid 100080 ]
Stopped at kdb_enter+0x34: movl $0,kdb_why
db> x/s version
version: FreeBSD 15.0-CURRENT #0 main-n275068-0078df5f0258-dirty: Wed Jan 29 18:44:45 CET 2025
pho@mercat1.netperf.freebsd.org:/mnt25/obj/usr/src/i386.i386/sys/PHO
db>
Jan 27 2025
With this patch I caught this:
I ran tests for a day with D45409.149981.patch. LGTM.
Dec 4 2024
In D47875#1092294, @des wrote:
> I only needed to run tmpfs24 which does not use $testuser but refuses to run if it is not defined. This patch does not fix every broken test, but it fixes some of them.
In D47876#1092295, @des wrote:
> Look again. There _is_ a hole at the end of the file.
Dec 3 2024
I'm puzzled by how your change could work for you.
In D47876#1091689, @des wrote:
> In D47876#1091682, @pho wrote:
>> I too have been thinking about removing the "Missing EOF hole" comment, but stalled because I still am not sure that the output is correct.
>> Isn't there supposed to be a virtual hole at the end of a file?
>
> Only if there isn't already a hole there.
$ /tmp/lsholes tmpfs24.sh
Min hole size is 512, file size is 2382.
data #1 @ 0, size=2382
hole #2 @ 2382, size=0
I too have been thinking about removing the "Missing EOF hole" comment, but stalled because I still am not sure that the output is correct.
Isn't there supposed to be a virtual hole at the end of a file? See for example https://docs.oracle.com/cd/E86824_01/html/E54765/lseek-2.html
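The lseek(2) semantics in question can be checked with a small sketch (Python here for brevity; `lsholes` above is pho's own tool, and exact SEEK_HOLE behavior depends on filesystem support). Per the spec, a file that is entirely data still reports a zero-length "virtual hole" starting at EOF:

```python
import os
import tempfile

# Demonstrate the implicit hole at end-of-file: SEEK_HOLE on a file
# containing only data returns the file size, i.e. a hole of length 0
# beginning at EOF, matching the lsholes output above.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 2382)            # all data, no explicit holes
    hole = os.lseek(fd, 0, os.SEEK_HOLE)  # offset of the first hole
    size = os.fstat(fd).st_size
    print(hole == size)                   # the only "hole" starts at EOF
finally:
    os.close(fd)
    os.remove(path)
```

On filesystems without real hole tracking, lseek falls back to treating the whole file as data, which yields the same result.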
That does not work for me. This is what I get on a pristine install:
Nov 28 2024
Nov 20 2024
I completed a full stress2 test without seeing any issues.
Nov 18 2024
Nov 14 2024
I do not see any panics with D47559.146396.patch.
Nov 13 2024
I ran all of the stress2 SU+J tests without seeing any issues with this patch.
Oct 22 2024
In D47200#1076295, @pho wrote:
In D47150#1076296, @pho wrote:
Oct 20 2024
Oct 18 2024
Oops. Forgot the panic string :)
I cannot boot with D47150.145065.patch:
Oct 17 2024
I ran tests with D47150.144979.patch for 17 hours before getting this seemingly unrelated ext2fs panic:
https://people.freebsd.org/~pho/stress/log/log0555.txt
Oct 14 2024
I ran tests with D46963.144796.patch added for 6 hours without seeing any issues.
Oct 12 2024
I ran tests with D46963.144614.patch added for 14 hours. I did not observe any issues.
Oct 11 2024
I ran the test swapoff4.sh for 3 hours with your patch and didn't see any issues.
I'm not sure whether the panic is related to this patch:
20241011 13:49:02 all (664/970): swapoff4.sh
Oct 11 13:50:22 mercat1 kernel: pid 90548 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:23 mercat1 kernel: pid 90555 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:25 mercat1 kernel: pid 90550 (swap), jid 0, uid 0, was killed: failed to reclaim memory
panic: Assertion (object->flags & OBJ_SWAP) != 0 failed at ../../../vm/swap_pager.c:564
cpuid = 5
time = 1728647425
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d8174990
vpanic() at vpanic+0x13f/frame 0xfffffe01d8174ac0
panic() at panic+0x43/frame 0xfffffe01d8174b20
swapoff_one() at swapoff_one+0x8a9/frame 0xfffffe01d8174d00
kern_swapoff() at kern_swapoff+0x1ab/frame 0xfffffe01d8174e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe01d8174f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d8174f30
--- syscall (582, FreeBSD ELF64, swapoff), rip = 0xdc6777123ba, rsp = 0xdc6748c87e8, rbp = 0xdc6748c8920 ---
Oct 3 2024
Sep 16 2024
Sep 13 2024
I ran a full stress2 test with D45627.143204.patch added and saw no (new) issues.
Aug 30 2024
Aug 29 2024
Aug 23 2024
I ran a 14 hour test with D45627.142138.patch without finding any issues.
Aug 17 2024
Aug 16 2024
Aug 15 2024
Ran an 8 hour test with D45627.142095.patch. No problems seen.
Aug 14 2024
I ran a brief test with D45987.142064.patch without seeing any problems.
Aug 8 2024
cc -c -O2 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.subr_rangeset.o -MTsubr_rangeset.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-4 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -Werror ../../../kern/subr_rangeset.c
../../../kern/subr_rangeset.c:329:25: error: use of undeclared identifier 'src_rs'
  329 |         pctrie_iter_init(&it, &src_rs->rs_trie);
      |                                ^
1 error generated.
*** Error code 1
I ran tests with D45987.141850.patch without seeing any issues.
Aug 5 2024
Here's a panic with D45627.141818.patch:
Jul 30 2024
I ran tests with D45627.141545.patch for 10 hours, without seeing any problems.
Jul 25 2024
I ran stress tests for 10 hours with D46099.141332.patch. I did not observe any issues.
Jul 20 2024
Jul 19 2024
Jul 18 2024
I ran tests with D45987.141024.patch. I ran all of the tmpfs test scenarios in a loop for 15 hours, without seeing any problems.
Jul 17 2024
I ran tests with D45627.141008.patch for a day, without seeing any problems.
Jul 16 2024
Here's a panic in subr_pctrie.c:100
20240716 09:00:01 all (1/958): pfl4.sh
Jul 16 09:02:51 mercat1 kernel: pid 12456 (swap), jid 0, uid 2007, was killed: a thread waited too long to allocate a page
Jul 16 09:04:09 mercat1 kernel: pid 16534 (rw), uid 2007 inumber 53830 on /mnt12: filesystem full
Kernel page fault with the following non-sleepable locks held:
exclusive rw vmobject (vmobject) r = 0 (0xfffff80013bce738) locked @ vm/vm_object.c:1333
stack backtrace:
#0 0xffffffff80bc82bc at witness_debugger+0x6c
#1 0xffffffff80bc94b3 at witness_warn+0x403
#2 0xffffffff81076ff0 at trap_pfault+0x80
#3 0xffffffff81049258 at calltrap+0x8
#4 0xffffffff80ba6655 at pctrie_remove+0x1e5
#5 0xffffffff80ba7715 at pctrie_iter_remove+0x145
#6 0xffffffff80eeca49 at SWAP_PCTRIE_ITER_REMOVE+0x19
#7 0xffffffff80ee8520 at swp_pager_meta_free+0x380
#8 0xffffffff80ee7e3a at swap_pager_freespace_pgo+0x7a
#9 0xffffffff80f10f19 at vm_object_madvise+0x149
#10 0xffffffff80f05e5a at vm_map_madvise+0x3ea
#11 0xffffffff81077918 at amd64_syscall+0x158
#12 0xffffffff81049b6b at fast_syscall_common+0xf8
I was able to reproduce a similar "out of pages" watchdog panic with a pristine kernel and a new test scenario.
So, it seems to me that the watchdog issue is unrelated to your D45627.140892.patch?
Here's the one with a pristine kernel: https://people.freebsd.org/~pho/stress/log/log0540.txt
Jul 15 2024
This is how I ran this test:
While running tests with D45627.140892.patch I got this: "The watchdog fired with a one hour timeout"
https://people.freebsd.org/~pho/stress/log/log0539.txt
Jul 9 2024
jemalloc(3) is failing again:
20240709 10:24:07 all: crossmp4.sh
<jemalloc>: jemalloc_base.c:190: Failed assertion: "extent_bsize_get(extent) >= *gap_size + size"
<jemalloc>: jemalloc_base.c:190: Failed assertion: "extent_bsize_get(extent) >= *gap_size + size"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tcache_inlines.h:52: Failed assertion: "tcache_success == (ret != NULL)"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tcache_inlines.h:52: Failed assertion: "tcache_success == (ret != NULL)"
The watchdog, with a timeout of one hour, fired: https://people.freebsd.org/~pho/stress/log/log0538.txt
I'll resume testing ...
Jul 6 2024
I ran tests with D45398.140613.patch for 9 hours without seeing any problems.
Jul 5 2024
In my experience, it is not uncommon for the test environment to have a big impact on finding errors.
I got this panic with unionfs19.sh:
20240705 20:25:38 all (1/1): unionfs19.sh
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
[the same VNASSERT repeated several times, interleaved from multiple threads]
0xfffffe016f1aa068: type VBAD state VSTATE_DEAD op 0xffffffff818ac760 usecount 2, writecount 0, refcount 1 seqc users 1
0xfffffe016ffba068: type VBAD state VSTATE_DEAD op 0xffffffff818ac760
I ran a mix of 48 tests with D45781.140542.patch for 13 hours. I saw no problems with this.
In D45398#1046015, @jah wrote:
> @pho You may want to run unionfs19.sh against this patchset. I believe the "forward VOP" guards added in unionfs_lookup() will address the panic seen there.
Jul 4 2024
These are my observations with running the stress2 swapoff.sh test in a loop on real hardware:
Jul 3 2024
With D45781.140474.patch added I got a deadlock after 6h30:
https://people.freebsd.org/~pho/stress/log/log0534.txt
It's not clear to me if this is related to your patch, so I'll repeat the test with a pristine kernel.
Jul 2 2024
This is the setup I use for stress testing on both real HW and bhyve (4 CPUs and 6GB RAM):
By capping RAM to 8 GB I was able to get a "hang". Unfortunately this is, AFAIK, a known issue when using memory disks:
Jul 1 2024
In D45781#1044917, @dougm wrote:
> In D45781#1044916, @pho wrote:
>> Hmm. I'm not seeing any problems.
>
> Are you running any of the stress2 tests mentioned in the all.exclude file?
I'm running './all.sh swapoff.sh' (or is it swapout.sh? One of those) in /usr/src/tools/tests/stress2/misc.
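For reference, the loop amounts to repeated invocations of the stress2 wrapper until something fails. A minimal sketch (CMD and MAX are placeholders for illustration; on the test box CMD would be "./all.sh swapoff.sh" run from /usr/src/tools/tests/stress2/misc):

```shell
#!/bin/sh
# Run one stress2 scenario repeatedly, stopping on the first failure.
# CMD/MAX are hypothetical knobs for this sketch, not stress2 options.
CMD=${CMD:-"true"}
MAX=${MAX:-5}
i=0
while [ "$i" -lt "$MAX" ]; do
    $CMD || break          # a non-zero exit stops the loop for inspection
    i=$((i + 1))
done
echo "completed $i iterations"
```

In practice a panic reboots the box, so the loop usually ends with the console backtrace rather than a clean exit.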
I'm starting tests with D45781.140421.patch
Jun 30 2024
I'm starting tests with D45781.140368.patch
Jun 28 2024
D45627.140331.patch looks good as well. No issues seen with a 10 hour test.
Jun 23 2024
I ran selected tests for 10 hours with D45627.140112.patch. I did not observe any issues.
Jun 22 2024
I ran a very short test (4 hours) with D45668.140098.patch, without observing any issues.
My shell still core dumps with D45627.140103.patch added:
Jun 21 2024
I have not observed any panics with D45627.140047.patch, but the jemalloc issues persist. I do not see this on a pristine HEAD.
/bin/sh keeps core dumping with various errors, like for example:
<jemalloc>: jemalloc_base.c:190: Failed assertion: "extent_bsize_get(extent) >= *gap_size + size"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tcache_inlines.h:52: Failed assertion: "tcache_success == (ret != NULL)"
Jun 20 2024
Much improved uptime!
I noticed different malloc errors like this one:
I got this after a few minutes:
Jun 19 2024
19:30 ~ $ sync
19:30 ~ $ sync
19:30 ~ $ sort /dev/zero &
[1] 4135
19:30 ~ $ sort /dev/zero &
[2] 4138
19:30 ~ $ sort /dev/zero &
[3] 4139
19:30 ~ $
panic: ASan: Invalid access, 8-byte read at 0xfffffe0182981630, UMAUseAfterFree(fd)
cpuid = 4
time = 1718818384
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0xa5/frame 0xfffffe014cc0af50
kdb_backtrace() at kdb_backtrace+0xc6/frame 0xfffffe014cc0b0b0
vpanic() at vpanic+0x226/frame 0xfffffe014cc0b250
panic() at panic+0xb5/frame 0xfffffe014cc0b320
kasan_report() at kasan_report+0xdf/frame 0xfffffe014cc0b3f0
pctrie_keybarr() at pctrie_keybarr+0x29/frame 0xfffffe014cc0b450
pctrie_iter_step() at pctrie_iter_step+0x15e/frame 0xfffffe014cc0b550
SWAP_PCTRIE_ITER_STEP() at SWAP_PCTRIE_ITER_STEP+0x1d/frame 0xfffffe014cc0b570
swp_pager_meta_transfer() at swp_pager_meta_transfer+0x6e9/frame 0xfffffe014cc0b840
swap_pager_copy() at swap_pager_copy+0x4b0/frame 0xfffffe014cc0b960
vm_object_collapse() at vm_object_collapse+0xad5/frame 0xfffffe014cc0ba30
vm_object_deallocate() at vm_object_deallocate+0x5ad/frame 0xfffffe014cc0bb10
vm_map_process_deferred() at vm_map_process_deferred+0x15f/frame 0xfffffe014cc0bb50
vmspace_dofree() at vmspace_dofree+0xdf/frame 0xfffffe014cc0bb90
vmspace_exit() at vmspace_exit+0x203/frame 0xfffffe014cc0bc50
exit1() at exit1+0x76e/frame 0xfffffe014cc0bcf0
sys_exit() at sys_exit+0x28/frame 0xfffffe014cc0bd10
amd64_syscall() at amd64_syscall+0x39e/frame 0xfffffe014cc0bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe014cc0bf30
--- syscall (1, FreeBSD ELF64, exit), rip = 0x236e0fe894ba, rsp = 0x236e0c6e78d8, rbp = 0x236e0c6e7900 ---
KDB: enter: panic
[ thread pid 4144 tid 100329 ]
Stopped at kdb_enter+0x34: movq $0,0x20d19b1(%rip)
db>
A KASAN build reported this:
I ran tests with D45398.139944.patch for a day without observing any issues.
Jun 15 2024
Jun 12 2024
I ran a longer test with this patch and found this:
Jun 11 2024
Jun 4 2024
May 26 2024
I do not have any arm hardware, but will take a look at bhyve. I already have a setup for building amd64/i386 images for bhyve.
stress2 runs for two days and some hours on mercat1 (Intel(R) Xeon(R) CPU E5-1650, 6 cores and 32 GB of RAM).
Nice catch. LGTM.
I have obviously never run tests on arm. Would it be useful if I looked into doing that?