Today
I'm puzzled by how your change could work for you.
I too have been thinking about removing the "Missing EOF hole" comment, but stalled because I am still not sure that the output is correct.
Isn't there supposed to be a virtual hole at the end of a file? See for example https://docs.oracle.com/cd/E86824_01/html/E54765/lseek-2.html
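For illustration, here is a minimal userland sketch of the behavior lseek(2) documents ("testfile" is a hypothetical name; error handling is trimmed). For a regular file with no real holes, SEEK_HOLE should land on the virtual hole at end of file rather than fail:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* "testfile" is a placeholder; use any regular file without holes. */
	int fd = open("testfile", O_RDONLY);
	if (fd == -1)
		return (1);

	/*
	 * lseek(2) presumes a virtual hole at the end of every regular
	 * file, so for a file with no real holes SEEK_HOLE from offset 0
	 * should return the file size, not fail with ENXIO.
	 */
	off_t hole = lseek(fd, 0, SEEK_HOLE);
	off_t size = lseek(fd, 0, SEEK_END);
	printf("first hole at %jd, file size %jd\n",
	    (intmax_t)hole, (intmax_t)size);

	close(fd);
	return (0);
}

On a correct implementation the two printed offsets should match for a hole-free file.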
That does not work for me. This is what I get on a pristine install:
Thu, Nov 28
Wed, Nov 20
I completed a full stress2 test without seeing any issues.
Mon, Nov 18
Thu, Nov 14
I do not see any panics with D47559.146396.patch
Wed, Nov 13
I ran all of the stress2 SU+J tests without seeing any issues with this patch.
Oct 22 2024
Oct 20 2024
Oct 18 2024
Oops. Forgot the panic string :)
I cannot boot with D47150.145065.patch:
Oct 17 2024
I ran tests with D47150.144979.patch for 17 hours before getting this seemingly unrelated ext2fs panic:
https://people.freebsd.org/~pho/stress/log/log0555.txt
Oct 14 2024
I ran tests with D46963.144796.patch added for 6 hours without seeing any issues.
Oct 12 2024
I ran tests with D46963.144614.patch added for 14 hours. I did not observe any issues.
Oct 11 2024
I ran the test swapoff4.sh for 3 hours with your patch and didn't see any issues.
I'm not sure whether the panic is related to this patch:
20241011 13:49:02 all (664/970): swapoff4.sh
Oct 11 13:50:22 mercat1 kernel: pid 90548 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:23 mercat1 kernel: pid 90555 (swap), jid 0, uid 0, was killed: failed to reclaim memory
Oct 11 13:50:25 mercat1 kernel: pid 90550 (swap), jid 0, uid 0, was killed: failed to reclaim memory
panic: Assertion (object->flags & OBJ_SWAP) != 0 failed at ../../../vm/swap_pager.c:564
cpuid = 5
time = 1728647425
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d8174990
vpanic() at vpanic+0x13f/frame 0xfffffe01d8174ac0
panic() at panic+0x43/frame 0xfffffe01d8174b20
swapoff_one() at swapoff_one+0x8a9/frame 0xfffffe01d8174d00
kern_swapoff() at kern_swapoff+0x1ab/frame 0xfffffe01d8174e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe01d8174f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d8174f30
--- syscall (582, FreeBSD ELF64, swapoff), rip = 0xdc6777123ba, rsp = 0xdc6748c87e8, rbp = 0xdc6748c8920 ---
Oct 3 2024
Sep 16 2024
Sep 13 2024
I ran a full stress2 test with D45627.143204.patch added and saw no (new) issues.
Aug 30 2024
Aug 29 2024
Aug 23 2024
I ran a 14 hour test with D45627.142138.patch without finding any issues.
Aug 17 2024
Aug 16 2024
Aug 15 2024
I ran an 8 hour test with D45627.142095.patch. No problems seen.
Aug 14 2024
I ran a brief test with D45987.142064.patch without seeing any problems.
Aug 8 2024
cc -c -O2 -pipe -fno-strict-aliasing -g -nostdinc -I. -I../../.. -I../../../contrib/ck/include -I../../../contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD -MF.depend.subr_rangeset.o -MTsubr_rangeset.o -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -gdwarf-4 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -Werror ../../../kern/subr_rangeset.c
../../../kern/subr_rangeset.c:329:25: error: use of undeclared identifier 'src_rs'
  329 |         pctrie_iter_init(&it, &src_rs->rs_trie);
      |                                ^
1 error generated.
*** Error code 1
I ran tests with D45987.141850.patch without seeing any issues.
Aug 5 2024
Here's a panic with D45627.141818.patch:
Jul 30 2024
I ran tests with D45627.141545.patch for 10 hours, without seeing any problems.
Jul 25 2024
I ran stress tests for 10 hours with D46099.141332.patch. I did not observe any issues.
Jul 20 2024
Jul 19 2024
Jul 18 2024
I ran tests with D45987.141024.patch. I ran all of the tmpfs test scenarios in a loop for 15 hours, without seeing any problems.
Jul 17 2024
I ran tests with D45627.141008.patch for a day, without seeing any problems.
Jul 16 2024
Here's a panic in subr_pctrie.c:100
20240716 09:00:01 all (1/958): pfl4.sh
Jul 16 09:02:51 mercat1 kernel: pid 12456 (swap), jid 0, uid 2007, was killed: a thread waited too long to allocate a page
Jul 16 09:04:09 mercat1 kernel: pid 16534 (rw), uid 2007 inumber 53830 on /mnt12: filesystem full
Kernel page fault with the following non-sleepable locks held:
exclusive rw vmobject (vmobject) r = 0 (0xfffff80013bce738) locked @ vm/vm_object.c:1333
stack backtrace:
#0 0xffffffff80bc82bc at witness_debugger+0x6c
#1 0xffffffff80bc94b3 at witness_warn+0x403
#2 0xffffffff81076ff0 at trap_pfault+0x80
#3 0xffffffff81049258 at calltrap+0x8
#4 0xffffffff80ba6655 at pctrie_remove+0x1e5
#5 0xffffffff80ba7715 at pctrie_iter_remove+0x145
#6 0xffffffff80eeca49 at SWAP_PCTRIE_ITER_REMOVE+0x19
#7 0xffffffff80ee8520 at swp_pager_meta_free+0x380
#8 0xffffffff80ee7e3a at swap_pager_freespace_pgo+0x7a
#9 0xffffffff80f10f19 at vm_object_madvise+0x149
#10 0xffffffff80f05e5a at vm_map_madvise+0x3ea
#11 0xffffffff81077918 at amd64_syscall+0x158
#12 0xffffffff81049b6b at fast_syscall_common+0xf8
I was able to reproduce a similar "out of pages" watchdog panic with a pristine kernel and a new test scenario.
So, it seems to me that the watchdog issue is unrelated to your D45627.140892.patch?
Here's the one with a pristine kernel: https://people.freebsd.org/~pho/stress/log/log0540.txt
Jul 15 2024
This is how I ran this test:
While running tests with D45627.140892.patch I got this: "The watchdog fired with a one hour timeout"
https://people.freebsd.org/~pho/stress/log/log0539.txt
Jul 9 2024
jemalloc(3) is failing again:
20240709 10:24:07 all: crossmp4.sh
<jemalloc>: jemalloc_base.c:190: Failed assertion: "extent_bsize_get(extent) >= *gap_size + size"
<jemalloc>: jemalloc_base.c:190: Failed assertion: "extent_bsize_get(extent) >= *gap_size + size"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tcache_inlines.h:52: Failed assertion: "tcache_success == (ret != NULL)"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tcache_inlines.h:52: Failed assertion: "tcache_success == (ret != NULL)"
The watchdog, with a timeout of one hour, fired: https://people.freebsd.org/~pho/stress/log/log0538.txt
I'll resume testing ...
Jul 6 2024
I ran tests with D45398.140613.patch for 9 hours without seeing any problems.
Jul 5 2024
In my experience, it is not uncommon for the test environment to have a big impact on finding errors.
I got this panic with unionfs19.sh:
20240705 20:25:38 all (1/1): unionfs19.sh
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short:
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short:
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
0xfffffe016f682bb8: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
(*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
0xfffffe016f6d04b0:
0xfffffe016f1aa068: type VBAD state VSTATE_DEAD op 0xffffffff818ac760 usecount 2, writecount 0, refcount 1 seqc users 1
0xfffffe016ffba068: type VBAD state VSTATE_DEAD op 0xffffffff818ac760
I ran a mix of 48 tests with D45781.140542.patch for 13 hours. I saw no problems with this.
Jul 4 2024
These are my observations from running the stress2 swapoff.sh test in a loop on real hardware:
Jul 3 2024
With D45781.140474.patch added I got a deadlock after 6.5 hours:
https://people.freebsd.org/~pho/stress/log/log0534.txt
It's not clear to me if this is related to your patch, so I'll repeat the test with a pristine kernel.
Jul 2 2024
This is the setup I use for stress testing on both real HW and bhyve (4 CPUs and 6GB RAM):
By capping RAM to 8 GB I was able to get a "hang". Unfortunately this is, AFAIK, a known issue when using memory disks:
Jul 1 2024
I'm starting tests with D45781.140421.patch
Jun 30 2024
I'm starting tests with D45781.140368.patch
Jun 28 2024
D45627.140331.patch looks good as well. No issues seen with a 10 hour test.
Jun 23 2024
I ran selected tests for 10 hours with D45627.140112.patch. I did not observe any issues.
Jun 22 2024
I ran a very short test (4 hours) with D45668.140098.patch, without observing any issues.
My shell still core dumps with D45627.140103.patch added:
Jun 21 2024
I have not observed any panics with D45627.140047.patch, but the jemalloc issues persist. I do not see this on a pristine HEAD.
/bin/sh keeps core dumping with various errors, for example:
<jemalloc>: jemalloc_base.c:190: Failed assertion: "extent_bsize_get(extent) >= *gap_size + size"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tcache_inlines.h:52: Failed assertion: "tcache_success == (ret != NULL)"
Jun 20 2024
Much improved uptime!
I noticed different malloc errors like this one:
I got this after a few minutes:
Jun 19 2024
19:30 ~ $ sync
19:30 ~ $ sync
19:30 ~ $ sort /dev/zero &
[1] 4135
19:30 ~ $ sort /dev/zero &
[2] 4138
19:30 ~ $ sort /dev/zero &
[3] 4139
19:30 ~ $ panic: ASan: Invalid access, 8-byte read at 0xfffffe0182981630, UMAUseAfterFree(fd)
cpuid = 4
time = 1718818384
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0xa5/frame 0xfffffe014cc0af50
kdb_backtrace() at kdb_backtrace+0xc6/frame 0xfffffe014cc0b0b0
vpanic() at vpanic+0x226/frame 0xfffffe014cc0b250
panic() at panic+0xb5/frame 0xfffffe014cc0b320
kasan_report() at kasan_report+0xdf/frame 0xfffffe014cc0b3f0
pctrie_keybarr() at pctrie_keybarr+0x29/frame 0xfffffe014cc0b450
pctrie_iter_step() at pctrie_iter_step+0x15e/frame 0xfffffe014cc0b550
SWAP_PCTRIE_ITER_STEP() at SWAP_PCTRIE_ITER_STEP+0x1d/frame 0xfffffe014cc0b570
swp_pager_meta_transfer() at swp_pager_meta_transfer+0x6e9/frame 0xfffffe014cc0b840
swap_pager_copy() at swap_pager_copy+0x4b0/frame 0xfffffe014cc0b960
vm_object_collapse() at vm_object_collapse+0xad5/frame 0xfffffe014cc0ba30
vm_object_deallocate() at vm_object_deallocate+0x5ad/frame 0xfffffe014cc0bb10
vm_map_process_deferred() at vm_map_process_deferred+0x15f/frame 0xfffffe014cc0bb50
vmspace_dofree() at vmspace_dofree+0xdf/frame 0xfffffe014cc0bb90
vmspace_exit() at vmspace_exit+0x203/frame 0xfffffe014cc0bc50
exit1() at exit1+0x76e/frame 0xfffffe014cc0bcf0
sys_exit() at sys_exit+0x28/frame 0xfffffe014cc0bd10
amd64_syscall() at amd64_syscall+0x39e/frame 0xfffffe014cc0bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe014cc0bf30
--- syscall (1, FreeBSD ELF64, exit), rip = 0x236e0fe894ba, rsp = 0x236e0c6e78d8, rbp = 0x236e0c6e7900 ---
KDB: enter: panic
[ thread pid 4144 tid 100329 ]
Stopped at kdb_enter+0x34: movq $0,0x20d19b1(%rip)
db>
A KASAN build reported this:
I ran tests with D45398.139944.patch for a day without observing any issues.
Jun 15 2024
Jun 12 2024
I ran a longer test with this patch and found this:
Jun 11 2024
Jun 4 2024
May 26 2024
I do not have any arm hardware, but will take a look at bhyve. I already have a setup for building amd64/i386 images for bhyve.
stress2 runs for two days and some hours on mercat1 (Intel(R) Xeon(R) CPU E5-1650, 6 cores and 32GB of RAM).
Nice catch. LGTM.
I have obviously never run tests on arm. Would it be useful if I looked into doing that?
May 13 2024
I ran tests with D45119.138429.patch: 5 hours with the problem test scenario, followed by 24 hours of all of the tmpfs tests.
No problems seen.
May 12 2024
Running tests with D45119.138427.patch I got this:
Apr 26 2024
Apr 21 2024
I ran a 5 hour test with D44788.137374.patch added. I did not observe any new issues.
Apr 16 2024
Apr 14 2024
I ran a 5 hour test with D44788.137005.patch added. I did not observe any issues.
Mar 18 2024
D44319.id135654.diff doesn't seem to work for me:
Mar 3 2024
Tests with D44076.135202.patch showed no issues.
Feb 27 2024
Feb 26 2024
I ran the unionfs specific tests with the D44076.134963.patch added.
I did not observe any problems during an 8 hour test.