Page MenuHomeFreeBSD

Allow allocations across meta boundaries
ClosedPublic

Authored by dougm_rice.edu on Oct 11 2017, 5:56 AM.

Details

Summary

Remove restrictions that prevent allocation requests to cross the boundary between two meta nodes.

Replace the bmu_avail field in meta nodes with a bitmap that identifies which subtrees have some free memory, and iterate over the nonempty subtrees only in blst_meta_alloc. If free memory is scarce, this should make searching for it faster.

Put the code for handling the next-leaf allocation in a separate function. When taking blocks from the next leaf empties the leaf, be sure to clear the appropriate bit in its parent, and so on, up to the least-common ancestor of this leaf and the next.

Eliminate special terminator nodes, and rely instead on the fact that there is a 0-bit at the end of the bitmask at the root of the tree that will stop a meta_alloc search, or a next-leaf search, before the search falls off the end of the tree. Make sure that the tree is big enough to have space for that 0-bit.

Eliminate special all-free indicators. Lazy initialization of subtrees stands in the way of having an allocation span a meta-node boundary, so a subtree of all free blocks is not treated specially. Subtrees of all-allocated blocks are still recognized by looking at the bitmask at the root and finding 0.

Don't print all-allocated subtrees. Do print the bitmasks for meta nodes, when tree-printing.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

dougm_rice.edu created this revision.Oct 11 2017, 5:56 AM
dougm_rice.edu added a reviewer: alc.

in blst_next_leaf_alloc, stop walking over meta-nodes as soon as one of them has an empty first child. That way, the extra empty meta node at the end of a 1024-block tree will stop a search for next leaf before we walk past allocated memory.

Correct calculation of number of nodes, to account for the extra zero node when blocks is a multiple of BLIST_BMAP_RADIX.

Drop KERN_{SUCCESS,FAILURE}. Make META_RADIX match BMAP_RADIX.

Fix a typo in updating meta-node hint. Change nodes allocted so that when an alloc fails and the last block is free, there is always some meta node that can have its hint updated to count-1.

Fix an avail update.

Use a more appropriate constant in blist_alloc.

Include the avail count in the blist printout.

Don't print the all-used tree. With fill-column set to 80, reformat some comments. Update others to reflect the changes made in the substance of the patch.

alc added inline comments.Oct 28 2017, 5:58 PM
sys/kern/subr_blist.c
845 ↗(On Diff #34013)

Because imin() is an inline function and not a polymorphic macro, it is going to truncate the 64-bit daddr_t values to 32 bits. Please check for similar typing errors elsewhere. This one just immediately jumped out at me.

dougm_rice.edu marked an inline comment as done.

Replace imin with ummin.

dougm_rice.edu added inline comments.Oct 28 2017, 6:28 PM
sys/kern/subr_blist.c
845 ↗(On Diff #34013)

I addressed this by changing to ummin.

markj added a comment.Oct 31 2017, 2:20 PM

Would it be possible to upload the diff with context (-U999999), so that it's possible to read in between the patch fragments? I find it quite a bit harder to digest the change otherwise.

With ample context.

It started with confusing blist_free() function comment.
I had thought the comment was wrong and asked to remove it and created https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223392

While I was at it, I removed few empty lines.
Alan suggested to proceed with empty line cleanup to follow coding style guideline.
I created another patch - https://bugs.freebsd.org/bugzilla/attachment.cgi?id=187951 - based on the head and D12635 change set.

Please bundle my change set into yours.

Alan Cox has told me that this blank line, before the first executable statement of a function, and regardless of whether or not there are any variable declarations preceding it, is a required element of FreeBSD coding style. So, by applying your patch, I would be ensuring his disapproval. Unless he directs me otherwise, I can't reasonably incorporate your patch into mine.

Would you please stress test this patch, Peter?

pho added a comment.Dec 1 2017, 9:12 PM

Would you please stress test this patch, Peter?

Sure, happy to.

pho added a comment.Dec 3 2017, 3:10 PM

Would you please stress test this patch, Peter?

Ran all stress2 tests on amd64. No problems seen.

Eliminate whitespace changes to comments that may have discouraged reviewers.

alc added inline comments.Dec 31 2017, 4:36 AM
sys/kern/subr_blist.c
185 ↗(On Diff #36432)

Insert a blank line here.

186 ↗(On Diff #36432)

This continuation line should be indented by 12 character positions.

612 ↗(On Diff #36432)

This continuation line should be indented by 12 character positions.

625 ↗(On Diff #36432)

Needs a period at the end of the sentence.

705 ↗(On Diff #36432)

The "/*" should not be on the same line as the first sentence.

768 ↗(On Diff #36432)

Needs a period at the end of the sentence.

Apply suggested whitespace and commenting changes.

dougm_rice.edu marked 6 inline comments as done.Dec 31 2017, 5:13 AM

Restore missing blist.h changes.

alc added inline comments.Jan 3 2018, 6:33 PM
sys/kern/subr_blist.c
797 ↗(On Diff #37395)

Elsewhere, you use the spelling BLIST_MAX_ALLOC for this constant. Please pick one spelling.

dougm_rice.edu added inline comments.Jan 3 2018, 6:59 PM
sys/kern/subr_blist.c
797 ↗(On Diff #37395)

BLIST_MAX_ALLOC and BLIST_META_RADIX have the same value after this change, but that does not mean they will always have the same value. A future change could seek to raise BLIST_MAX_ALLOC to 256, perhaps, while leaving BLIST_META_RADIX unchanged. So I disagree with this suggestion. Should I also eliminate the alternate spelling BLIST_BMAP_RADIX for this constant?

alc added inline comments.Jan 3 2018, 7:24 PM
sys/kern/subr_blist.c
797 ↗(On Diff #37395)

You never assign the value BLIST_BMAP_RADIX to bm_bighint, so testing for it is confusing.

Adopt reviewer suggestions. Reduce the size of the diff by reformatting modified comments to have fewer lines changed.

dougm_rice.edu marked 2 inline comments as done.Jan 3 2018, 9:08 PM

Rewrite a comment.

alc added a comment.Jan 4 2018, 12:19 AM

I've tested this patch under a parallel 'buildworld" workload on a machine with extremely limited memory. It appears to have the expected effect, specifically, a small but consistent reduction in swap fragmentation.

Before

Iteration 1:
number of maximal free ranges: 151
largest free range: 8042212
average maximal free range size: 111078

Iteration 2:
number of maximal free ranges: 169
largest free range: 2647365
average maximal free range size: 99247

Iteration 3:
number of maximal free ranges: 205
largest free range: 4969419
average maximal free range size: 81818

Iteration 4:
number of maximal free ranges: 202
largest free range: 2925051
average maximal free range size: 83033

Iteration 5:
number of maximal free ranges: 198
largest free range: 2233162
average maximal free range size: 84710

After

Iteration 1:
number of maximal free ranges: 141
largest free range: 8561931
average maximal free range size: 118956

Iteration 2:
number of maximal free ranges: 162
largest free range: 2674506
average maximal free range size: 103536

Iteration 3:
number of maximal free ranges: 187
largest free range: 5772547
average maximal free range size: 89693

Iteration 4:
number of maximal free ranges: 174
largest free range: 3091581
average maximal free range size: 96394

Iteration 5:
number of maximal free ranges: 196
largest free range: 3643414
average maximal free range size: 85574

If you compare corresponding iterations with and without the patch, the results are always better with the patch.

The only difference between this and what was posted in January is that that patch discarded code that has recently been found to be buggy, and this patch discards the fixed code.

pho added a comment.Sep 8 2018, 7:26 PM

The only difference between this and what was posted in January is that that patch discarded code that has recently been found to be buggy, and this patch discards the fixed code.

I ran a full test with D12635.id47723.diff. No problems seen.

alc added a comment.Sep 8 2018, 7:49 PM
In D12635#364226, @pho wrote:

The only difference between this and what was posted in January is that that patch discarded code that has recently been found to be buggy, and this patch discards the fixed code.

I ran a full test with D12635.id47723.diff. No problems seen.

Thanks, Peter. I'm inclined to commit this change after the freeze ends.

alc added inline comments.Sat, Nov 10, 5:39 PM
sys/kern/subr_blist.c
62 ↗(On Diff #47723)

It would be clearer for this sentence to say, "The non-blocking nature of allocations and frees is required by the swap code."

72 ↗(On Diff #47723)

I'm not sure that this line is clearly worded. At the very least, it seems like there is a missing article ("a") in the middle of it.

dougm_rice.edu marked 2 inline comments as done.

Prompted by reviewer comments, endeavor to clarify comments regarding data structure layout.

Add a missing "that" to a comment.

alc added inline comments.Sat, Nov 10, 7:05 PM
sys/kern/subr_blist.c
865 ↗(On Diff #50262)

There is an extra, unnecessary space after the ','.

868 ↗(On Diff #50262)

Aren't you using ummin() elsewhere for similar calculations?

977 ↗(On Diff #50262)

Ditto.

Address reviewer comments on spacing and ummin usage.

dougm_rice.edu marked 4 inline comments as done.Sat, Nov 10, 7:18 PM
alc accepted this revision.Sat, Nov 10, 8:22 PM
This revision is now accepted and ready to land.Sat, Nov 10, 8:22 PM
This revision was automatically updated to reflect the committed changes.