Diff Detail
- Repository: rS FreeBSD src repository (subversion)
- Lint: Not Applicable
- Unit Tests: Not Applicable
Event Timeline
After two iterations of buildworld, the debug.foo output is:
foo: 39074413 14698168 5130471 1754002 592517 134527 38946 17422 0 0 0 0 0 calls avoided: 60091879
vm/vm_phys.c, lines 647–648 ↗ (On Diff #44337):
This is good, although I wonder whether there is any benefit to passing the page array down a layer. You could then avoid not only the splitting but also the repeated calls that each walk the free lists and order lists.
vm/vm_phys.c, lines 647–648 ↗ (On Diff #44337):
Yes, it's probably worthwhile, but not by as much as you might hope. When the state of the free lists is such that you're making more calls, because you're returning pages from order 0, you're spending less time climbing the order lists looking for a non-empty one. I'll give it a try. In the "believe it or not" category, clang completely unrolls the oind loop that searches for a non-empty free list, which would make sense if you expected the queues to be empty, but utilizes the I-cache poorly if you expect the first free list to be non-empty.
vm/vm_phys.c, lines 60–68 ↗ (On Diff #44359):
Could this block be an inline?

vm/vm_phys.c, lines 647–648 ↗ (On Diff #44337):
I could benchmark with and without if you prefer, but in my experience even eliminating small overheads from this path helps a lot, because the free page lock is still semi-hot. I don't feel too strongly about it, though, so use your discretion; either is good.
vm/vm_phys.c, lines 659–660 ↗ (On Diff #44359):
I am somewhat surprised by the frequency at which this code takes pages from the high-order queues under a buildworld workload. With that in mind, I've asked Doug Moore to look at further optimizing vm_phys_free_contig() to avoid unnecessary checks for coalescing that can't possibly happen. Alternatively, to avoid the pointless coalescing checks, I would need to compute fls(npages - i - 1) on every iteration and perform a vm_phys_split_pages() here.
Here is the final version that I propose to commit.
FYI, as compared to the initial version, having vm_phys_alloc_npages() fill the array of page pointers reduces the time spent in vm_page_import() over the course of five consecutive buildworlds by only 4%.
vm/vm_phys.c, line 640 ↗ (On Diff #44428):
I don't believe that an explicit prefetch on TAILQ_NEXT(m) would do any good here, because the first thing TAILQ_REMOVE(m) does is touch m's next pointer. Maybe a prefetch on TAILQ_NEXT(TAILQ_NEXT(m)) would do some good.