Page MenuHomeFreeBSD

Add vm_page_alloc_pages_after().
AbandonedPublic

Authored by markj on Feb 14 2018, 8:14 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Nov 16, 12:26 PM
Unknown Object (File)
Oct 15 2024, 4:02 PM
Unknown Object (File)
Oct 9 2024, 3:15 PM
Unknown Object (File)
Oct 4 2024, 11:03 PM
Unknown Object (File)
Sep 26 2024, 5:24 PM
Unknown Object (File)
Sep 23 2024, 12:44 PM
Unknown Object (File)
Sep 15 2024, 6:46 AM
Unknown Object (File)
Sep 4 2024, 9:14 AM

Details

Reviewers
jeff
kib
alc
Summary

(I created this review mainly to facilitate discussion, I'm not
formally proposing it for inclusion yet. It's based on Jeff's numa
branch, not head/.)

This is another page allocation routine. It returns runs of pages
contiguous in the pindex space, with the intention of reducing the
frequency of lock acquisitions. When VM_ALLOC_{NOWAIT,WAITFAIL} are
specified, the returned run may be shorter than the one requested.
The new vm_phys_alloc_npages() is used to allocate pages from the
physical memory allocator when allocation from per-CPU caches or the
reservation system fail.

Some existing subroutines are changed. vm_reserv_reclaim_inactive() now
returns the number of pages freed by the break. vm_domain_available()
now returns the number of pages that may be allocated according to the
request type. vm_reserv_extend() and vm_reserv_alloc_page() now
optionally return a run of pages contiguous both in the pindex and the
physical address space.

vm_page_grab_pages() now uses vm_page_alloc_pages_after(), as do a
couple of routines which allocate readahead and readbehind pages before
performing a read from disk.

Some possible optimizations:

  • Batched insertion of the run into the object radix tree.
Test Plan

I've only done light testing of the change so far. Some testing
shows a ~10% improvement in throughput in a benchmark where
8 concurrent dd processes are reading sparse files from UFS with
a 32KB block size.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 15227
Build 15295: arc lint + arc unit

Event Timeline

markj edited the test plan for this revision. (Show Details)
markj added reviewers: jeff, kib, alc.
markj added subscribers: gallatin, scottl.
sys/vm/swap_pager.c
1096

Perhaps rename m to ma now ?

1143

I am curious, why ? If we allocated something, why not validate the pages despite we cannot validate the whole suggested run ? The caller accepts the cost of the read for all pages, why don't we read as much as we can get the memory ?

sys/vm/vm_page.c
1660

Purely stylistic note. I believe it would be easier to read if you used switch (req) there. The initial if() in vm_page_alloc() always caused the headache for me.

Since you copy the function body, you can also change this.

sys/vm/swap_pager.c
1096

I did that, though I forgot to update the comment above. Did you mean in vnode_pager_generic_getpages()?

1143

Are you suggesting moving the partially allocated pages to the correct pindexes, to avoid leaving a hole before ma[0]->pindex? If so, sure, I can add that.

sys/vm/vm_page.c
1660

I agree, I'll change it.

sys/vm/swap_pager.c
1096

I mean, commit the rename in advance.

1143

Yes, but this is a small consequence of the larger decision. Mostly, I do not see a reason to waste the done work to allocate.

sys/vm/vm_page.c
1660

I always imagined it would be better written by a switch that fetches the limit value and then a single comparison.

I have a change to make this an atomic which would be much better expressed that way if you don't mind doing so now it will make my patch simpler.

I'm not sure why it needed to move though?

1893

Since we do not need compatibility with some previous iteration of vm_page_alloc_pages, can we just drop the _after? There will never be some non _after version of this api? I realize it keeps it consistent with vm_page_alloc_after() but there are other inconsistencies between the two so I'm not sure it matters?

feel free to ignore my naming suggestions if you don't like them.

1945

Since there is nothing to do in an else clause can we shortern with reserv() && extend() as the other functions do.

2033–2037

Quite a lot of this code is duplicated among page alloc functions. Can we make some inlines to avoid this?

2053–2060

Can't we bump the wire count once? It's a minor optimization now that they're per-cpu but on some archs it still has to disable critical.

2068–2076

Is this right given that you haven't bumped the wire count for these pages yet? This looks like you're doing the full teardown for pages that haven't been initialized? Am I misreading?

sys/vm/vm_reserv.c
822

Rather than LELVEL_0_NPAGES this should be min(VM_LEVEL_0_NPAGES - index, *countp)

1178–1179

Nothing seems to be using the return value other than a zero check?

sys/vm/swap_pager.c
1096

Ok.

1143

Sure, but I think this is largely inconsequential - partial allocations will only happen during a severe memory shortage, and overall system performance is quite degraded by that point. So I don't think it's very important to try and optimize for that case.

However, it's easy enough loop over the partially allocated range and perform:

vm_page_remove(m);
vm_page_xbusy(m);
vm_page_insert(m, <new pindex>);

so I will do that instead.

markj added inline comments.
sys/vm/vm_page.c
1893

I was on the fence about this. I don't see why it's inconceivable that we might someday want a wrapper that looks up mpred?

2068–2076

Yeah, I messed this part up. It's clearly untested at this point. Will fix.

sys/vm/vm_reserv.c
822

Ok, but this doesn't affect correctness unless I'm missing something.

1178–1179

vm_page_alloc_pages_domain_after() stores the return value in "avail".

markj marked 3 inline comments as done.

Address review feedback.

sys/vm/swap_pager.c
1143

Never mind, this won't work if the pages were allocated from a reservation. I think the only choices here are to keep the existing diff, or avoid using alloc_pages for allocating readbehind ranges.

I don't think that it's necessarily harmful to throw away pages. Again, in this case the system's performance is degraded by design, and readbehind is not necessarily a good use of memory when free pages are scarce.

sys/vm/vm_page.c
1660

I didn't like that vm_domain _avail() was mixed in between several allocator functions and wrappers. I just moved it so that it preceded them all. If you feel strongly about keeping it where it was I'll move it back.

sys/vm/vm_page.c
2033–2037

I'm not sure how to do it cleanly. When initializing a run of pages, it's nice to pre-compute various page fields (oflags, etc.). So ideally we'd have a function which initializes a run of pages. But alloc_contig returns an array of struct vm_page, while alloc_pages returns an array of struct vm_page *.

Fix vm_radix insertion failure handling.

Avoid requesting a run of length 0.

Ensure that "shift" is initialized before use.