blist_meta_alloc assumes that mask=scan->bm_bitmap is nonzero. But if the cursor lies in the middle of the
space that the meta node represents, then blanking the low bits of mask may make it zero, and break later code that expects a nonzero value. Add a test that returns failure if the mask has been cleared.
Details
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
D18058.id50652.diff ran the stress2/misc/stealer.sh test scenario for 11 hours before freezing up: https://people.freebsd.org/~pho/stress/log/dougm015.txt
Allow bighint to be updated for a meta-node where the cursor lies ahead of all nonempty children, or even when there are no nonempty children, since blist_alloc depends on a reduction of the root bighint to allow a failed allocation to finish.
Alternatively, we could change the test in blist_alloc to depend on something other than bighint. This is the only case where having a too-large bighint breaks things.
Umm, you just deleted the patch here that fixes the original panic, and the current patch has nothing to do with the summary. Please create a new review for the current patch.
My desktop panicked overnight with panic "freeing freed block" on line 828. I have a kernel dump if there's anything you'd like to see:
Edit: meant to add that this is an INVARIANTS kernel at r340969 I was using a pre-r340914 kernel by accident.
(kgdb) bt #0 __curthread () at ./machine/pcpu.h:230 #1 doadump (textdump=0) at /home/mark/src/freebsd-dev/sys/kern/kern_shutdown.c:371 #2 0xffffffff823961a8 in vt_kms_postswitch () from /boot/modules/drm.ko #3 0xffffffff806451f8 in vt_window_switch (vw=0xffffffff80ea4990 <vt_conswindow>) at /home/mark/src/freebsd-dev/sys/dev/vt/vt_core.c:580 #4 0xffffffff806428c0 in vtterm_cngrab (tm=<optimized out>) at /home/mark/src/freebsd-dev/sys/dev/vt/vt_core.c:1572 #5 0xffffffff80731042 in cngrab () at /home/mark/src/freebsd-dev/sys/kern/kern_cons.c:370 #6 0xffffffff8079216b in vpanic (fmt=0xffffffff80bcbaa2 "freeing free block", ap=0xfffffe085221d670) at /home/mark/src/freebsd-dev/sys/kern/kern_shutdown.c:851 #7 0xffffffff80791fd3 in panic (fmt=<unavailable>) at /home/mark/src/freebsd-dev/sys/kern/kern_shutdown.c:804 #8 0xffffffff807c5c6b in blst_leaf_free (scan=0xfffffe0850b7efe0, blk=<optimized out>, count=<unavailable>) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:828 #9 blst_meta_free (scan=0xfffffe0850b7efe0, freeBlk=<optimized out>, count=<unavailable>, radix=64) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:857 #10 0xffffffff807c5c42 in blst_meta_free (scan=0xfffffe0850b7ebe0, freeBlk=499712, count=<unavailable>, radix=64) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:869 #11 0xffffffff807c5c42 in blst_meta_free (scan=0xfffffe0850b70440, freeBlk=499712, count=<unavailable>, radix=4096) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:869 #12 0xffffffff807c5c42 in blst_meta_free (scan=0xfffffe0850b60020, freeBlk=524288, count=<unavailable>, radix=262144) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:869 #13 0xffffffff807c5b1b in blist_free (bl=0xfffffe0850b60000, blkno=<unavailable>, count=1) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:336 #14 0xffffffff80ac628a in swp_pager_freeswapspace (blk=<optimized out>, npages=1) at /home/mark/src/freebsd-dev/sys/vm/swap_pager.c:815 #15 0xffffffff80ad3611 in vm_fault_hold (map=0xfffff80094cb6000, vaddr=<optimized out>, fault_type=2 '\002', fault_flags=<optimized out>, m_hold=0x0) at /home/mark/src/freebsd-dev/sys/vm/vm_fault.c:1279 #16 0xffffffff80ad1c30 in vm_fault (map=0xfffff80094cb6000, vaddr=<optimized out>, fault_type=2 '\002', fault_flags=0) at /home/mark/src/freebsd-dev/sys/vm/vm_fault.c:536 #17 0xffffffff80b29e98 in trap_pfault (frame=0xfffffe085221dac0, usermode=<optimized out>) at /home/mark/src/freebsd-dev/sys/amd64/amd64/trap.c:830 #18 0xffffffff80b2953a in trap (frame=0xfffffe085221dac0) at /home/mark/src/freebsd-dev/sys/amd64/amd64/trap.c:351
The stack trace shows a panic call from subr_blist.c:828. After this change was accepted with revision 340914, that panic line moved from line 828 to line 830. Can you double-check to make sure that you're testing a version that includes this change?
You're right. I had installed a new kernel yesterday but didn't boot into it, so I got confused. Sorry for the noise.
@markj Can you update this comment with the pre-340914 version you were actually using, if known? As-is, this is a little confusing :-). Thanks!
FWIW, I also hit the panic this revision fixes on a 340866+ (as in, SVN 340866 + local git modifications) non-INVARIANTS kernel and reopened the very similar-looking (but much older) PR 83761 for it. MarkJ pointed me at this review. @dougm_rice.edu Thanks for fixing the bug. I'm running 342026+ now.
If you've got the old one around, the what(1) command on the kernel file should show it. If not, that's ok; I'd still appreciate using markdown syntax to strike out the incorrect text (two tildes on either side of the text).