In vm_phys_free_contig, many calls are made to vm_phys_free_pages, which attempts to merge free blocks with buddies to form larger free blocks. We can replace most of those calls with direct calls to vm_freelist_add that do not attempt merging, and call vm_phys_free_pages at most twice to get all the merging that is possible.
Details
The testing for D20256 included a table showing the number of cycles consumed in breaking a reservation with first and last pages, only, allocated, with and without that change in place. The first column was the original code, and the second showed the effects of that change. I'm adding a third column that shows the affect of this change.
Pages 0 and 511 allocated, the rest free:
cycles/break: 4233: 4226: 3629
cycles/break: 4236: 4251: 3777
cycles/break: 4218: 4204: 3781
cycles/break: 4248: 4219: 3775
cycles/break: 4241: 4233: 3813
cycles/break: 4195: 4232: 3785
cycles/break: 4251: 4280: 3794
cycles/break: 4193: 4222: 3778
cycles/break: 4281: 4221: 3762
cycles/break: 4240: 4202: 3761
Diff Detail
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
Please provide the diff context.
I am curious how large impact this change has, do you observe situations where it really makes the difference ? In other words, how often is vm_phys_free_pages() called for the run of >= VM_NFREEORDER ? Is it for free of superpages ?
sys/vm/vm_phys.c | ||
---|---|---|
1098 | I think this comment is mis-placed. It seems that it should be near the call to vm_phys_nomerge_free_contig(). | |
1112 | Continuation line should use +4 spaces for intent. I would put this min() calculation into a helper function and moved the long comment above to it. |
sys/vm/vm_phys.c | ||
---|---|---|
1095 | m_start would seem to be a more common name for the first page in a contiguous range. |
sys/vm/vm_phys.c | ||
---|---|---|
1049 | return (min(....)); |
sys/vm/vm_phys.c | ||
---|---|---|
1077 | There is an undocumented assumption here and in the original vm_phys_free_contig() code: That the contiguous pages don't cross a vm_phys_segs[] boundary. Right now, this function is only called in two places: vm_page_startup() and vm_reserv_break(). In both cases, we know that the contiguous pages won't cross a vm_phys_segs[] boundary. Moreover, we know that coalescing won't occur. In other words, the two calls to vm_phys_free() on the boundaries aren't needed to handle the possibility of coalescing. When I first talked to Doug about implementing this optimization, I was reluctant to provide a public function that requires the caller to know that coalescing can't occur. However, we already require the caller to know that the contiguous pages won't cross a vm_phys_segs[] boundary, so there is already one unusual requirement on the caller. Given that, should we just explicitly document the requirements, expose this non-coalescing version to callers, and use it in vm_page_startup() and vm_reserv_break()? |
sys/vm/vm_phys.c | ||
---|---|---|
1077 | I think it is not hard to assert that m->segind is not changing. |
This patch was ready to land, briefly, 6.5 months ago. This is just a refresh to account for unrelated changes to the modified file in the last 6.5 months.
Refresh this patch. Avoid potentially unneeded evaluations of flsl by recognizing when the max free block size is reached. Rewrite to update one variable per loop iteration instead of two.
I have noted Alan's comment and your followup. I speculate that, having noted them, you would have me
- explicitly document the requirements, expose this non-coalescing version to callers, and use it in vm_page_startup() and vm_reserv_break()
and
- assert that m->segind is not changing, somewhere in the code.
If you are not directing me to take these steps, please clarify.
If I use the non-coalescing version in the two places mentioned above, then we'll be left using the coalescing version exactly never, or so it seems. So ought we instead abandon all this cleverness and just make vm_phys_free_contig non-coalescing, and simpler?
sys/vm/vm_phys.c | ||
---|---|---|
1077 | In the cases of vm_page_startup() and vm_reserv_break(), some of us know that coalescing won't occur, but I am not one of them. So I'll leave this patch in perpetual limbo until I believe this thing that we all know. |
Rename the function that frees page ranges without merging across boundaries, since calling is "nomerge" suggests it doesn't merge adjacent freed pages in the range. Make that function public and use it in places where vm_phys_free_contig is used now. Add assertions to verify that the segment isn't changing as consecutive blocks of pages are freed.
vm_phys.c | ||
---|---|---|
1098 ↗ | (On Diff #57977) | "... of a set of pages starting ..." |
1102 ↗ | (On Diff #57977) | For the time being, style(9) still requires a blank line here. (There is a proposal to eliminate that rule that might apply here.) |
1119 ↗ | (On Diff #57977) | "... must be superpage boundary, ...". No. I think that you are trying to explain why vm_reserv_break() can call this function. Reservations are provided by the buddy allocator, and thus they are aligned on an address that is a multiple of their size. When we are breaking a reservation, there is at least one allocated page within the reservation. The pages being freed by vm_reserv_break() can't possibly coalesce with a page from outside the reservation until those pages are freed. |
1125 ↗ | (On Diff #57977) | I suggest vm_phys_enqueue_contig(). |
1145 ↗ | (On Diff #57977) | The page range that spans two segments is [original "m", current "m"), not [current "m", "m_end"). |
1147 ↗ | (On Diff #57977) | Unnecessary parentheses. |
1155 ↗ | (On Diff #57977) | Unnecessary parentheses. |
1164 ↗ | (On Diff #57977) | Unnecessary parentheses. |
Accept reviewer recommendations. Rebuild kernel and reinstall with no immediate problems.
Kostik, Mark,
While Kostik recently approved this change, I think that Doug is waiting for one of you to explicitly approve the latest version.
Doug,
When you write the commit message, you should include a sentence in that message quantifying the improvement.