Page MenuHomeFreeBSD

Handle the possiblity that the cursor has already passed the free blocks of a meta-node.
ClosedPublic

Authored by dougm on Nov 20 2018, 9:12 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Mar 31, 8:22 AM
Unknown Object (File)
Dec 22 2023, 10:33 PM
Unknown Object (File)
Sep 4 2023, 12:49 AM
Unknown Object (File)
Aug 26 2023, 12:45 PM
Unknown Object (File)
Aug 23 2023, 7:01 PM
Unknown Object (File)
Aug 22 2023, 10:38 PM
Unknown Object (File)
Jul 15 2023, 10:09 PM
Unknown Object (File)
Jul 5 2023, 8:22 PM
Subscribers

Details

Summary

blist_meta_alloc assumes that mask=scan->bm_bitmap is nonzero. But if the cursor lies in the middle of the
space that the meta node represents, then blanking the low bits of mask may make it zero, and break later code that expects a nonzero value. Add a test that returns failure if the mask has been cleared.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

dougm edited the summary of this revision. (Show Details)
dougm retitled this revision from Handle the possiblity that the cursor has already passed the free nodes of a meta-node. to Handle the possiblity that the cursor has already passed the free blocks of a meta-node..Nov 21 2018, 4:18 AM
dougm added reviewers: alc, markj.
dougm added a subscriber: pho.

Yes, this is clearly a bug. Hopefully, it is the one that Peter is tripping over.

This revision is now accepted and ready to land.Nov 21 2018, 5:11 PM

D18058.id50652.diff ran the stress2/misc/stealer.sh test scenario for 11 hours before freezing up: https://people.freebsd.org/~pho/stress/log/dougm015.txt

Allow bighint to be updated for a meta-node where the cursor lies ahead of all nonempty children, or even when there are no nonempty children, since blist_alloc depends on a reduction of the root bighint to allow a failed allocation to finish.

Alternatively, we could change the test in blist_alloc to depend on something other than bighint. This is the only case where having a too-large bighint breaks things.

This revision now requires review to proceed.Nov 24 2018, 8:33 PM
In D18058#388822, @pho wrote:

D18058.id50652.diff ran the stress2/misc/stealer.sh test scenario for 11 hours before freezing up: https://people.freebsd.org/~pho/stress/log/dougm015.txt

This is a different problem that will require a different change.

In D18058#389007, @dougm_rice.edu wrote:

Allow bighint to be updated for a meta-node where the cursor lies ahead of all nonempty children, or even when there are no nonempty children, since blist_alloc depends on a reduction of the root bighint to allow a failed allocation to finish.

Alternatively, we could change the test in blist_alloc to depend on something other than bighint. This is the only case where having a too-large bighint breaks things.

Umm, you just deleted the patch here that fixes the original panic, and the current patch has nothing to do with the summary. Please create a new review for the current patch.

This revision was not accepted when it landed; it landed in state Needs Review.Nov 24 2018, 9:52 PM
This revision was automatically updated to reflect the committed changes.

My desktop panicked overnight with panic "freeing freed block" on line 828. I have a kernel dump if there's anything you'd like to see:

Edit: meant to add that this is an INVARIANTS kernel at r340969 I was using a pre-r340914 kernel by accident.

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=0) at /home/mark/src/freebsd-dev/sys/kern/kern_shutdown.c:371
#2  0xffffffff823961a8 in vt_kms_postswitch () from /boot/modules/drm.ko
#3  0xffffffff806451f8 in vt_window_switch (vw=0xffffffff80ea4990 <vt_conswindow>) at /home/mark/src/freebsd-dev/sys/dev/vt/vt_core.c:580
#4  0xffffffff806428c0 in vtterm_cngrab (tm=<optimized out>) at /home/mark/src/freebsd-dev/sys/dev/vt/vt_core.c:1572
#5  0xffffffff80731042 in cngrab () at /home/mark/src/freebsd-dev/sys/kern/kern_cons.c:370
#6  0xffffffff8079216b in vpanic (fmt=0xffffffff80bcbaa2 "freeing free block", ap=0xfffffe085221d670) at /home/mark/src/freebsd-dev/sys/kern/kern_shutdown.c:851
#7  0xffffffff80791fd3 in panic (fmt=<unavailable>) at /home/mark/src/freebsd-dev/sys/kern/kern_shutdown.c:804
#8  0xffffffff807c5c6b in blst_leaf_free (scan=0xfffffe0850b7efe0, blk=<optimized out>, count=<unavailable>) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:828
#9  blst_meta_free (scan=0xfffffe0850b7efe0, freeBlk=<optimized out>, count=<unavailable>, radix=64) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:857
#10 0xffffffff807c5c42 in blst_meta_free (scan=0xfffffe0850b7ebe0, freeBlk=499712, count=<unavailable>, radix=64) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:869
#11 0xffffffff807c5c42 in blst_meta_free (scan=0xfffffe0850b70440, freeBlk=499712, count=<unavailable>, radix=4096) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:869
#12 0xffffffff807c5c42 in blst_meta_free (scan=0xfffffe0850b60020, freeBlk=524288, count=<unavailable>, radix=262144) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:869
#13 0xffffffff807c5b1b in blist_free (bl=0xfffffe0850b60000, blkno=<unavailable>, count=1) at /home/mark/src/freebsd-dev/sys/kern/subr_blist.c:336
#14 0xffffffff80ac628a in swp_pager_freeswapspace (blk=<optimized out>, npages=1) at /home/mark/src/freebsd-dev/sys/vm/swap_pager.c:815
#15 0xffffffff80ad3611 in vm_fault_hold (map=0xfffff80094cb6000, vaddr=<optimized out>, fault_type=2 '\002', fault_flags=<optimized out>, m_hold=0x0) at /home/mark/src/freebsd-dev/sys/vm/vm_fault.c:1279
#16 0xffffffff80ad1c30 in vm_fault (map=0xfffff80094cb6000, vaddr=<optimized out>, fault_type=2 '\002', fault_flags=0) at /home/mark/src/freebsd-dev/sys/vm/vm_fault.c:536
#17 0xffffffff80b29e98 in trap_pfault (frame=0xfffffe085221dac0, usermode=<optimized out>) at /home/mark/src/freebsd-dev/sys/amd64/amd64/trap.c:830
#18 0xffffffff80b2953a in trap (frame=0xfffffe085221dac0) at /home/mark/src/freebsd-dev/sys/amd64/amd64/trap.c:351

The stack trace shows a panic call from subr_blist.c:828. After this change was accepted with revision 340914, that panic line moved from line 828 to line 830. Can you double-check to make sure that you're testing a version that includes this change?

In D18058#390117, @dougm_rice.edu wrote:

The stack trace shows a panic call from subr_blist.c:828. After this change was accepted with revision 340914, that panic line moved from line 828 to line 830. Can you double-check to make sure that you're testing a version that includes this change?

You're right. I had installed a new kernel yesterday but didn't boot into it, so I got confused. Sorry for the noise.

My desktop panicked overnight with panic "freeing freed block" on line 828. I have a kernel dump if there's anything you'd like to see:

Edit: meant to add that this is an INVARIANTS kernel at r340969

@markj Can you update this comment with the pre-340914 version you were actually using, if known? As-is, this is a little confusing :-). Thanks!

FWIW, I also hit the panic this revision fixes on a 340866+ (as in, SVN 340866 + local git modifications) non-INVARIANTS kernel and reopened the very similar-looking (but much older) PR 83761 for it. MarkJ pointed me at this review. @dougm_rice.edu Thanks for fixing the bug. I'm running 342026+ now.

In D18058#395282, @cem wrote:

My desktop panicked overnight with panic "freeing freed block" on line 828. I have a kernel dump if there's anything you'd like to see:

Edit: meant to add that this is an INVARIANTS kernel at r340969

@markj Can you update this comment with the pre-340914 version you were actually using, if known? As-is, this is a little confusing :-). Thanks!

I don't know what the revision was, just that it was before r340914.

In D18058#395282, @cem wrote:

@markj Can you update this comment with the pre-340914 version you were actually using, if known? As-is, this is a little confusing :-). Thanks!

I don't know what the revision was, just that it was before r340914.

If you've got the old one around, the what(1) command on the kernel file should show it. If not, that's ok; I'd still appreciate using markdown syntax to strike out the incorrect text (two tildes on either side of the text).