
Fix a next_leaf bug
ClosedPublic

Authored by dougm on Jul 9 2019, 3:53 PM.

Details

Summary

If blist_next_leaf_alloc is called with maxcount > BLIST_BMAP_RADIX, it is possible that the last available free blocks are at the end of a leaf node and are about to be allocated. In looking for the next leaf, to see whether it can provide more blocks, 'scan' is incremented at least once, and it can end up pointing at a now-empty leaf or an intervening meta-node. That is bad. This change walks 'scan' back to the last leaf node in that case.
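The failure mode can be sketched outside the kernel. This is a minimal toy model, not the FreeBSD code: each leaf is a 64-bit bitmap where a set bit means "free", and the hypothetical helper next_leaf_take stands in for the extension of an allocation into the next leaf. The point is that when the neighbor's first block is already allocated, the helper must report 0 and leave the neighbor untouched.

```c
#include <stdint.h>

#define BMAP_RADIX 64   /* bits per leaf bitmap, like BLIST_BMAP_RADIX */

/*
 * Take up to 'want' blocks from the low end of the next leaf's bitmap
 * (set bit = free block).  Returns how many blocks were actually taken,
 * which may be 0 if the neighbor's first block is already allocated.
 */
static int
next_leaf_take(uint64_t *leaf, int want)
{
	int got = 0;

	while (got < want && got < BMAP_RADIX &&
	    (*leaf & ((uint64_t)1 << got)) != 0)
		got++;
	if (got == BMAP_RADIX)
		*leaf = 0;		/* whole leaf taken */
	else if (got > 0)
		*leaf &= ~(((uint64_t)1 << got) - 1); /* clear only what we took */
	return (got);
}
```

With a neighbor whose block 0 is allocated (e.g. bitmap 0x6), next_leaf_take returns 0 and the bitmap is unchanged; with four leading free blocks (0xF) and want == 6, it takes exactly 4.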

Test Plan

Peter, please test this. I can't promise that this fixes the bug you have recently discovered, just that it fixes something.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

dougm created this revision.Jul 9 2019, 3:53 PM
kib accepted this revision.Jul 9 2019, 9:37 PM
This revision is now accepted and ready to land.Jul 9 2019, 9:37 PM
pho added a comment.Jul 10 2019, 11:59 AM

I have tested D20893.59572.diff on amd64 and briefly on i386.
I still see the "panic: freeing free block" on both a MEMGUARD and a GENERICish build.

dougm added a comment.Jul 10 2019, 2:56 PM

Do I understand correctly that the "panic: freeing free block" condition is happening only with the patch from D20863 in place? Or does it happen with an unmodified kernel?

dougm updated this revision to Diff 59604.Jul 10 2019, 4:05 PM

In the case that we have to back up 'scan', because we have found that there is no next leaf with free blocks, we also have to back up 'blk'.
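The paired walk-back can be illustrated with a toy cursor. struct cursor and probe_next_leaf are hypothetical names for this sketch, not the blist code; the sketch only shows why the two fields must be saved and restored together.

```c
#include <stdbool.h>

/* 'scan' indexes tree nodes; 'blk' is the matching block number. */
struct cursor {
	int  scan;
	long blk;
};

/*
 * Advance both fields while probing for a usable next leaf; if the
 * probe fails (found_free == false), restore BOTH from the snapshot.
 * Backing up 'scan' alone would leave 'blk' pointing past the blocks
 * actually allocated.
 */
static bool
probe_next_leaf(struct cursor *c, int skip, long leaf_radix, bool found_free)
{
	struct cursor saved = *c;	/* snapshot scan and blk together */

	c->scan += skip;
	c->blk += (long)skip * leaf_radix;
	if (!found_free) {
		*c = saved;		/* walk back scan AND blk */
		return (false);
	}
	return (true);
}
```

A failed probe leaves the cursor exactly where it started; a successful one leaves both fields advanced consistently.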

This revision now requires review to proceed.Jul 10 2019, 4:05 PM
pho added a comment.Jul 10 2019, 4:19 PM

As far as I can tell the "panic: freeing free block" was introduced by r349777 and this patch didn't fix that.
I have not observed any other problems while testing with D20893.59572.diff

Just to be clear: This panic is unrelated to your patch:

panic: freeing free block: ffff40, size 19, mask 1
cpuid = 7
time = 1562775366
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ac04a270
vpanic() at vpanic+0x19d/frame 0xfffffe00ac04a2c0
panic() at panic+0x43/frame 0xfffffe00ac04a320
blst_meta_free() at blst_meta_free+0x12b/frame 0xfffffe00ac04a360
blst_meta_free() at blst_meta_free+0x102/frame 0xfffffe00ac04a3a0
blst_meta_free() at blst_meta_free+0x102/frame 0xfffffe00ac04a3e0
blst_meta_free() at blst_meta_free+0x102/frame 0xfffffe00ac04a420
blst_meta_free() at blst_meta_free+0x102/frame 0xfffffe00ac04a460
blist_free() at blist_free+0x2e/frame 0xfffffe00ac04a480
swp_pager_freeswapspace() at swp_pager_freeswapspace+0x8a/frame 0xfffffe00ac04a4a0
swp_pager_meta_free_all() at swp_pager_meta_free_all+0xbb/frame 0xfffffe00ac04a4f0
swap_pager_dealloc() at swap_pager_dealloc+0x115/frame 0xfffffe00ac04a510
vm_object_terminate() at vm_object_terminate+0x27b/frame 0xfffffe00ac04a560
vm_object_deallocate() at vm_object_deallocate+0x412/frame 0xfffffe00ac04a5c0
vm_map_process_deferred() at vm_map_process_deferred+0x7f/frame 0xfffffe00ac04a5e0
vm_map_remove() at vm_map_remove+0xc6/frame 0xfffffe00ac04a610
vmspace_exit() at vmspace_exit+0xd3/frame 0xfffffe00ac04a650
exit1() at exit1+0x5ad/frame 0xfffffe00ac04a6c0
sigexit() at sigexit+0xdaf/frame 0xfffffe00ac04a9a0
postsig() at postsig+0x336/frame 0xfffffe00ac04aa70
ast() at ast+0x4c7/frame 0xfffffe00ac04aab0
doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffded0
KDB: enter: panic
[ thread pid 15632 tid 100166 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r349886: Wed Jul 10 16:26:16 CEST 2019\012    pho@mercat1.netperf.freebsd.org:/usr/src/sys/amd64/compile/PHO\012
db>
dougm updated this revision to Diff 59605.Jul 10 2019, 5:54 PM

If we can allocate nothing from the next leaf, be sure to return immediately.

pho added a comment.Jul 11 2019, 6:08 AM

The test has now run for 12 hours without any problems. I'll let the test run for the full ~48 hours.

markj accepted this revision.Jul 11 2019, 3:43 PM
This revision is now accepted and ready to land.Jul 11 2019, 3:43 PM
alc added inline comments.Jul 11 2019, 7:40 PM
subr_blist.c
642 (On Diff #59605)

Can you please explain why the avail == 0 test is needed, given the avail < count test? (I didn't think that count would be negative.)

dougm added inline comments.Jul 11 2019, 7:49 PM
subr_blist.c
642 (On Diff #59605)

If *count == 5 and maxcount == 10 in blst_leaf_alloc, and the first free group of 5 or more free blocks is 6 blocks at the end of leaf 0, then blst_next_leaf_alloc will be called with count == -1 and maxcount == 4 because, while we would like to have 4 more blocks for this allocation, we don't need them in order to complete the allocation successfully. If leaf 1 starts with an allocated block, we should return 0 from blst_next_leaf_alloc and NOT set about marking any blocks of leaf 1 as freed.
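The arithmetic behind this answer can be checked in isolation. should_extend below is a hypothetical reduction of the two guards to their comparison logic, not the code in subr_blist.c: with a negative count (the minimum already satisfied by the previous leaf), 0 < -1 is false, so 'avail < count' cannot catch the empty-neighbor case and only the explicit 'avail == 0' test does.

```c
#include <stdbool.h>

/*
 * Decide whether the next leaf should be touched at all.  When
 * 'count' is negative (minimum already satisfied), 'avail < count'
 * is false even for avail == 0, so the explicit first test is what
 * keeps an empty neighbor from being modified.
 */
static bool
should_extend(int avail, int count)
{
	if (avail == 0)
		return (false);	/* neighbor's first block is allocated */
	if (avail < count)
		return (false);	/* cannot meet the required minimum */
	return (true);
}
```

In the example from the comment above (count == -1, leaf 1's first block allocated, so avail == 0), only the first test stops the extension.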

alc accepted this revision.Jul 11 2019, 8:00 PM
pho added a comment.Jul 12 2019, 6:43 PM

I finished the full stress2 test of D20893.59605.diff without finding any problems.