Page MenuHomeFreeBSD

Allow allocations across meta boundaries
ClosedPublic

Authored by dougm on Oct 11 2017, 5:56 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Mar 12, 11:52 PM
Unknown Object (File)
Sat, Mar 9, 9:11 PM
Unknown Object (File)
Feb 25 2024, 6:55 AM
Unknown Object (File)
Feb 20 2024, 7:43 PM
Unknown Object (File)
Feb 9 2024, 10:26 AM
Unknown Object (File)
Feb 7 2024, 6:51 PM
Unknown Object (File)
Jan 29 2024, 9:35 PM
Unknown Object (File)
Jan 29 2024, 9:35 PM

Details

Summary

Remove restrictions that prevent allocation requests to cross the boundary between two meta nodes.

Replace the bmu_avail field in meta nodes with a bitmap that identifies which subtrees have some free memory, and iterate over the nonempty subtrees only in blst_meta_alloc. If free memory is scarce, this should make searching for it faster.

Put the code for handling the next-leaf allocation in a separate function. When taking blocks from the next leaf empties the leaf, be sure to clear the appropriate bit in its parent, and so on, up to the least-common ancestor of this leaf and the next.

Eliminate special terminator nodes, and rely instead on the fact that there is a 0-bit at the end of the bitmask at the root of the tree that will stop a meta_alloc search, or a next-leaf search, before the search falls off the end of the tree. Make sure that the tree is big enough to have space for that 0-bit.

Eliminate special all-free indicators. Lazy initialization of subtrees stands in the way of having an allocation span a meta-node boundary, so a subtree of all free blocks is not treated specially. Subtrees of all-allocated blocks are still recognized by looking at the bitmask at the root and finding 0.

Don't print all-allocated subtrees. Do print the bitmasks for meta nodes, when tree-printing.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

dougm added a reviewer: alc.

in blst_next_leaf_alloc, stop walking over meta-nodes as soon as one of them has an empty first child. That way, the extra empty meta node at the end of a 1024-block tree will stop a search for next leaf before we walk past allocated memory.

Correct calculation of number of nodes, to account for the extra zero node when blocks is a multiple of BLIST_BMAP_RADIX.

Drop KERN_{SUCCESS,FAILURE}. Make META_RADIX match BMAP_RADIX.

Fix a typo in updating meta-node hint. Change nodes allocted so that when an alloc fails and the last block is free, there is always some meta node that can have its hint updated to count-1.

Use a more appropriate constant in blist_alloc.

Include the avail count in the blist printout.

Don't print the all-used tree. With fill-column set to 80, reformat some comments. Update others to reflect the changes made in the substance of the patch.

sys/kern/subr_blist.c
845

Because imin() is an inline function and not a polymorphic macro, it is going to truncate the 64-bit daddr_t values to 32 bits. Please check for similar typing errors elsewhere. This one just immediately jumped out at me.

dougm marked an inline comment as done.

Replace imin with ummin.

sys/kern/subr_blist.c
845

I addressed this by changing to ummin.

Would it be possible to upload the diff with context (-U999999), so that it's possible to read in between the patch fragments? I find it quite a bit harder to digest the change otherwise.

It started with confusing blist_free() function comment.
I had thought the comment was wrong and asked to remove it and created https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223392

While I was at it, I removed few empty lines.
Alan suggested to proceed with empty line cleanup to follow coding style guideline.
I created another patch - https://bugs.freebsd.org/bugzilla/attachment.cgi?id=187951 - based on the head and D12635 change set.

Please bundle my change set into yours.

Alan Cox has told me that this blank line, before the first executable statement of a function, and regardless of whether or not there are any variable declarations preceding it, is a required element of FreeBSD coding style. So, by applying your patch, I would be ensuring his disapproval. Unless he directs me otherwise, I can't reasonably incorporate your patch into mine.

Would you please stress test this patch, Peter?

In D12635#278132, @dougm_rice.edu wrote:

Would you please stress test this patch, Peter?

Sure, happy to.

In D12635#278132, @dougm_rice.edu wrote:

Would you please stress test this patch, Peter?

Ran all stress2 tests on amd64. No problems seen.

Eliminate whitespace changes to comments that may have discouraged reviewers.

sys/kern/subr_blist.c
185

Insert a blank line here.

186

This continuation line should be indented by 12 character positions.

612

This continuation line should be indented by 12 character positions.

625

Needs a period at the end of the sentence.

705

The "/*" should not be on the same line as the first sentence.

768

Needs a period at the end of the sentence.

Apply suggested whitespace and commenting changes.

Restore missing blist.h changes.

sys/kern/subr_blist.c
797

Elsewhere, you use the spelling BLIST_MAX_ALLOC for this constant. Please pick one spelling.

sys/kern/subr_blist.c
797

BLIST_MAX_ALLOC and BLIST_META_RADIX have the same value after this change, but that does not mean they will always have the same value. A future change could seek to raise BLIST_MAX_ALLOC to 256, perhaps, while leaving BLIST_META_RADIX unchanged. So I disagree with this suggestion. Should I also eliminate the alternate spelling BLIST_BMAP_RADIX for this constant?

sys/kern/subr_blist.c
797

You never assign the value BLIST_BMAP_RADIX to bm_bighint, so testing for it is confusing.

Adopt reviewer suggestions. Reduce the size of the diff by reformatting modified comments to have fewer lines changed.

dougm marked 2 inline comments as done.Jan 3 2018, 9:08 PM

I've tested this patch under a parallel 'buildworld" workload on a machine with extremely limited memory. It appears to have the expected effect, specifically, a small but consistent reduction in swap fragmentation.

Before

Iteration 1:
number of maximal free ranges: 151
largest free range: 8042212
average maximal free range size: 111078

Iteration 2:
number of maximal free ranges: 169
largest free range: 2647365
average maximal free range size: 99247

Iteration 3:
number of maximal free ranges: 205
largest free range: 4969419
average maximal free range size: 81818

Iteration 4:
number of maximal free ranges: 202
largest free range: 2925051
average maximal free range size: 83033

Iteration 5:
number of maximal free ranges: 198
largest free range: 2233162
average maximal free range size: 84710

After

Iteration 1:
number of maximal free ranges: 141
largest free range: 8561931
average maximal free range size: 118956

Iteration 2:
number of maximal free ranges: 162
largest free range: 2674506
average maximal free range size: 103536

Iteration 3:
number of maximal free ranges: 187
largest free range: 5772547
average maximal free range size: 89693

Iteration 4:
number of maximal free ranges: 174
largest free range: 3091581
average maximal free range size: 96394

Iteration 5:
number of maximal free ranges: 196
largest free range: 3643414
average maximal free range size: 85574

If you compare corresponding iterations with and without the patch, the results are always better with the patch.

The only difference between this and what was posted in January is that that patch discarded code that has recently been found to be buggy, and this patch discards the fixed code.

In D12635#363496, @dougm_rice.edu wrote:

The only difference between this and what was posted in January is that that patch discarded code that has recently been found to be buggy, and this patch discards the fixed code.

I ran a full test with D12635.id47723.diff. No problems seen.

In D12635#364226, @pho wrote:
In D12635#363496, @dougm_rice.edu wrote:

The only difference between this and what was posted in January is that that patch discarded code that has recently been found to be buggy, and this patch discards the fixed code.

I ran a full test with D12635.id47723.diff. No problems seen.

Thanks, Peter. I'm inclined to commit this change after the freeze ends.

sys/kern/subr_blist.c
62

It would be clearer for this sentence to say, "The non-blocking nature of allocations and frees is required by the swap code."

72

I'm not sure that this line is clearly worded. At the very least, it seems like there is a missing article ("a") in the middle of it.

dougm marked 2 inline comments as done.

Prompted by reviewer comments, endeavor to clarify comments regarding data structure layout.

Add a missing "that" to a comment.

sys/kern/subr_blist.c
865

There is an extra, unnecessary space after the ','.

868

Aren't you using ummin() elsewhere for similar calculations?

977

Ditto.

Address reviewer comments on spacing and ummin usage.

This revision is now accepted and ready to land.Nov 10 2018, 8:22 PM
This revision was automatically updated to reflect the committed changes.