Page MenuHomeFreeBSD

Allow blist allocations to span leaf boundaries, but not meta boundaries
ClosedPublic

Authored by dougm on Aug 2 2017, 4:35 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 23, 10:58 PM
Unknown Object (File)
Sat, Nov 23, 10:24 AM
Unknown Object (File)
Fri, Nov 22, 2:56 PM
Unknown Object (File)
Thu, Nov 21, 6:09 AM
Unknown Object (File)
Tue, Nov 19, 2:54 AM
Unknown Object (File)
Sat, Nov 16, 1:20 PM
Unknown Object (File)
Sat, Nov 16, 3:00 AM
Unknown Object (File)
Mon, Nov 11, 12:32 AM
Subscribers
None

Details

Summary

Modify blst_leaf_alloc to take only the cursor argument.

Modify blst_leaf_alloc to find allocations that cross the boundary between one leaf node and the next when those two leaves descend from the same meta node.

Update the hint field for leaves so that it represents a bound on how large an allocation can begin in that leaf, where it currently represents a bound on how large an allocation can be found within the boundaries of the leaf.

The first phase of blst_leaf_alloc currently shrinks sequences of consecutive 1-bits in mask until each has been shrunken by count-1 bits, so that any bits remaining show where an allocation can begin, or until all the bits have disappeared, in which case the allocation fails. This change amends that so that the high-order bit is copied, as if, when the last block was free, it was followed by an endless stream of free blocks. It also amends the early stopping condition, so that the shrinking of 1-sequences stops early when there are none, or there is only one unbounded one remaining.

The search for the first set bit is unchanged, and the code path thereafter is mostly unchanged unless the first set bit is in a position that makes some of those copied sign bits matter. In that case, we look for a next leaf, and at what blocks it can provide, to see if a cross-boundary allocation is possible.

The hint is updated on a successful allocation that clears the last bit, but it not updated on a failed allocation that leaves the last bit set. So, as long as the last block is free, the hint value for the leaf is large. As long as the last block is free, and there's a next leaf, a large allocation can begin here, perhaps. A stricter rule than this would mean that allocations and frees in one leaf could require hint updates to the preceding leaf, and this change seeks to leave the freeing code unmodified.

Define BLIST_BMAP_MASK, and use it for bit masking in blst_leaf_free and blist_leaf_fill, as well as in blst_leaf_alloc.

Correct a panic message in blst_leaf_free.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Resolve newly introduced conflict.

sys/kern/subr_blist.c
375

mask is signed and -mask can overflow. This code depends on the presence of the -fwrapv option on the compiler invocation (which we do for kernel).

I do not propose to change the fragment, only note that compiler authors do not aim to make a useful tool.

Fix another stale comment.

Restore mask to u_daddr_t, and use a cast to daddr_t in the shift to get sign bits copied, to avoid concerns about signed overflow elsewhere.

markj added inline comments.
sys/kern/subr_blist.c
145

Looks like radix_to_skip() could use this too. It might also be reasonable to have a BLIST_BMAP_MASK.

This revision is now accepted and ready to land.Aug 5 2017, 10:18 PM
dougm edited edge metadata.

Apply reviewer suggestion regarding BLIST_BMAP_MASK, and BLIST_META_MASK.

This revision now requires review to proceed.Aug 5 2017, 10:33 PM
sys/kern/subr_blist.c
145

Done. BLIST_META_MASK will come up later in future patches, I hope.

tabs around #defined symbol

I've asked Doug to create some new statistics code (see D11906) so that we can quantify the effects of this patch.

Resolve conflict arising from a recent change.

Update patch to resolve conflicts introduced recently.

Limit the use of 'lo' in leaf_alloc to identifying the position of the potential allocation.

Restore handling of the fact that 'blk' in blst_leaf_alloc is not the beginning of the leaf, and so must be truncated to a leaf boundary in some way before it is used to compute the allocation address. Add a bit of consistency in the typecast used to produce an all-ones bit mask. Correct a panic message.

Update comments. Address style issues. Avoid initialization of 'n' in blst_leaf_free an blst_leaf_fill.

I've performed some testing with the new sysctl that calls blist_stats(). Specifically, I've configured a test machine with 2GB of RAM and 64GB of swap space, and run a "make -j7 buildworld" in a loop. After 6 consecutive builds, the before and after results are as follows.

Before:

vm.swap_fragmentation:
Free space on device ada0s3b:
number of maximal free ranges: 278
largest free range: 1627786
average maximal free range size: 60334
number of maximal free ranges of different sizes:
               count  |  size range
               -----  |  ----------
                  25  |  1
                   7  |  2
                  32  |  3 to 4
                  21  |  5 to 7
                  30  |  8 to 12
                  16  |  13 to 20
                  24  |  21 to 33
                   2  |  55 to 88
                   6  |  89 to 143
                   1  |  233 to 376
                   5  |  377 to 609
                   6  |  610 to 986
                   1  |  987 to 1596
                   2  |  1597 to 2583
                   3  |  2584 to 4180
                   6  |  4181 to 6764
                   5  |  6765 to 10945
                   7  |  10946 to 17710
                  11  |  17711 to 28656
                  17  |  28657 to 46367
                  10  |  46368 to 75024
                   8  |  75025 to 121392
                   9  |  121393 to 196417
                  11  |  196418 to 317810
                   5  |  317811 to 514228
                   3  |  514229 to 832039
                   3  |  832040 to 1346268
                   2  |  1346269 to 1627786

After:

vm.swap_fragmentation:
Free space on device ada0s3b:
number of maximal free ranges: 185
largest free range: 2087175
average maximal free range size: 90663
number of maximal free ranges of different sizes:
               count  |  size range
               -----  |  ----------
                  18  |  1
                   9  |  2
                  19  |  3 to 4
                   6  |  5 to 7
                  14  |  8 to 12
                   8  |  13 to 20
                   3  |  21 to 33
                   2  |  34 to 54
                   3  |  55 to 88
                   1  |  144 to 232
                   2  |  233 to 376
                   5  |  377 to 609
                   2  |  610 to 986
                   5  |  987 to 1596
                   2  |  1597 to 2583
                   4  |  2584 to 4180
                   5  |  4181 to 6764
                   8  |  6765 to 10945
                   4  |  10946 to 17710
                   5  |  17711 to 28656
                   3  |  28657 to 46367
                  10  |  46368 to 75024
                  11  |  75025 to 121392
                  14  |  121393 to 196417
                   8  |  196418 to 317810
                   6  |  317811 to 514228
                   3  |  514229 to 832039
                   4  |  832040 to 1346268
                   1  |  2087175

In summary, we're seeing both a reduction in the number of distinct free ranges, which is to be expected, but also an increase in the size of the largest free range.

This revision is now accepted and ready to land.Sep 13 2017, 4:27 PM

Initialize bitmap in termintors to avoid leaf allocations to go past last available block.

This revision now requires review to proceed.Sep 14 2017, 6:33 PM

If a leaf allocation uses all the free block of the leaf, make sure that the hint is zeroed.

The next to last change to this patch that zeroed the bitmap in terminator nodes addressed my last concern. I'm going to commit this patch shortly.

This revision is now accepted and ready to land.Sep 16 2017, 5:46 PM
This revision was automatically updated to reflect the committed changes.