Page MenuHomeFreeBSD

Change vm_page_import() to avoid physical memory fragmentation
ClosedPublic

Authored by alc on Jun 23 2018, 6:36 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 1, 4:35 PM
Unknown Object (File)
Wed, Jan 1, 2:51 AM
Unknown Object (File)
Nov 29 2024, 12:51 AM
Unknown Object (File)
Nov 24 2024, 10:26 PM
Unknown Object (File)
Nov 21 2024, 4:06 PM
Unknown Object (File)
Nov 19 2024, 12:54 PM
Unknown Object (File)
Nov 19 2024, 11:51 AM
Unknown Object (File)
Nov 19 2024, 1:56 AM
Subscribers

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

To be clear, I don't intend to commit the sysctl with the patch.

After two iterations of buildworld the debug.foo output is

foo: 39074413 14698168 5130471 1754002 592517 134527 38946 17422 0 0 0 0 0
calls avoided: 60091879
vm/vm_phys.c
647–648 ↗(On Diff #44337)

This is good, although I wonder if there is any benefit to passing the page array down a layer. You could then not only avoid splitting but also avoid multiple calls to walk the freelists and order lists.

This revision is now accepted and ready to land.Jun 23 2018, 11:03 AM
vm/vm_phys.c
647–648 ↗(On Diff #44337)

Yes, it's probably worthwhile, but not by as much as you might hope. When the state of the free lists is such that you're doing more calls, because you're returning pages from order 0, then you're spending less time climbing the order lists looking for a non-empty list.

I'll give it a try.

In the "believe or not" category, clang is completely unrolling the oind loop that searches for a non-empty free list, which would make sense if you expected the queues to be empty, but poorly utilizes the I-cache if you expect the first free list to be non-empty.

Pass the array of vm_page_t pointers into vm_phys.

This revision now requires review to proceed.Jun 23 2018, 9:47 PM
vm/vm_phys.c
60–68 ↗(On Diff #44359)

Could this block be an inline?

647–648 ↗(On Diff #44337)

I could benchmark with and without if you prefer but in my experience even eliminating small overheads from this path helps a lot because the free page lock is still semi-hot. I don't feel too strongly about it though so really use your discretion and either is good.

vm/vm_phys.c
659–660 ↗(On Diff #44359)

I am somewhat surprised by the frequency at which this code is taking pages from the high-order queues under a buildworld workload. With that in mind, I've asked Doug Moore to look at further optimizing vm_phys_free_contig() to avoid unnecessary checks for coalescing that can't possibly happen. Alternatively, to avoid pointless coalescing checks, I would need to compute fls(npages - i - 1) on every iteration and perform a vm_phys_split_pages() here.

Here is the final version that I propose to commit.

FYI, as compared to the initial version, having vm_phys_alloc_npages() fill the array of page pointers reduces the time spent in vm_page_import() over the course of 5 consecutive buildworld's by only 4%.

vm/vm_phys.c
640 ↗(On Diff #44428)

I don't believe that an explicit prefetch on TAILQ_NEXT("m") would do any good here because the first thing that TAILQ_REMOVE("m") does is touch m's next. Maybe a prefetch on TAILQ_NEXT(TAILQ_NEXT("m")) would do some good.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 26 2018, 6:30 PM
This revision was automatically updated to reflect the committed changes.