
vm: Allow kstack pages to come from different domains
Needs Review · Public

Authored by bnovkov on May 11 2024, 10:34 AM.
Tags
None
Details

Reviewers
jhb
markj
alc
kib
Summary

vm_thread_stack_back currently requires that all kstack pages come from the same domain as the kstack KVA.
However, this can lead to an infinite loop if we constantly attempt to allocate pages from a depleted domain.

This patch reworks kstack page allocation so that a kstack's pages can be allocated from, and freed back to, multiple NUMA domains.
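To make the intended policy concrete, here is a minimal userspace sketch of the approach described above. It is not the actual vm_glue.c code; NDOMAINS, KSTACK_PAGES, free_pages[], page_alloc_domain(), and page_free_domain() are hypothetical stand-ins for the kernel's per-domain page allocator and tunables.

```
/*
 * Minimal sketch, NOT the actual vm_glue.c code.  All names below are
 * hypothetical stand-ins for the kernel's per-domain page allocator.
 */
#include <stdbool.h>
#include <stddef.h>

#define	NDOMAINS	2
#define	KSTACK_PAGES	4

/* Toy per-domain free page counts; domain 0 starts out depleted. */
static int free_pages[NDOMAINS] = { 0, 8 };

static bool
page_alloc_domain(int domain, size_t idx)
{
	(void)idx;
	if (free_pages[domain] == 0)
		return (false);
	free_pages[domain]--;
	return (true);
}

static void
page_free_domain(int domain, size_t idx)
{
	(void)idx;
	free_pages[domain]++;
}

/*
 * Back a kstack with physical pages, preferring 'prefdom' (the domain the
 * kstack KVA came from) but falling back to other domains instead of
 * retrying a depleted one forever.  The domain of each page is recorded so
 * that it can be freed back to the right place.
 */
int
kstack_back_pages(int prefdom, int domains[KSTACK_PAGES])
{
	size_t i;
	int d;

	for (i = 0; i < KSTACK_PAGES; i++) {
		if (page_alloc_domain(prefdom, i)) {
			domains[i] = prefdom;
			continue;
		}
		/* Preferred domain is depleted; try the others. */
		for (d = 0; d < NDOMAINS; d++) {
			if (d != prefdom && page_alloc_domain(d, i))
				break;
		}
		if (d == NDOMAINS)
			goto unwind;
		domains[i] = d;
	}
	return (0);
unwind:
	/* Undo partial progress, freeing each page to its own domain. */
	while (i-- > 0)
		page_free_domain(domains[i], i);
	return (-1);
}
```

With these toy numbers, kstack_back_pages(0, dom) falls back to domain 1 for every page instead of spinning on the depleted domain 0.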

Test Plan

I've tested the fix using stress2 on a non-NUMA bhyve VM with no issues so far.
I'm in the middle of reworking https://reviews.freebsd.org/D44567, so unfortunately I do not currently have a NUMA environment to test this patch on.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

Simplify vm_thread_stack_back page allocation loop.

Remove redundant m == NULL check.

why even support something like this?

instead, if pages can't all be allocated from one domain, the code could be patched to free whatever it did allocate and try a different one

per-cpu caching for thread stacks is mostly a waste of memory and could be reworked into something with significantly smaller granularity, which would also make such allocations less likely to fail. but then supporting mixed backing only complicates things.

In D45163#1030231, @mjg wrote:

why even support something like this?

instead, if pages can't all be allocated from one domain, the code could be patched to free whatever it did allocate and try a different one

Well, if there is sufficient pressure in the original domain, this code will do the same thing.
It will, however, try to minimize the number of pages allocated from different domains, which makes more sense imo.

The main goal here is to fall back to other domains when there's memory pressure.
Using DOMAINSET_PREF (which is how kstack pages were allocated before D38852 landed) will provide that fallback and preserve existing behavior when the system is not under memory pressure.
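As an illustration of the preferred-with-fallback ordering that a DOMAINSET_PREF-style policy implies (reusing the hypothetical NDOMAINS stand-in from the sketch in the summary; this helper is not FreeBSD code):

```
/*
 * Hypothetical helper: the iteration order implied by a "prefer prefdom,
 * fall back to the rest" policy.  Under no memory pressure only order[0]
 * is ever used, so behavior matches the single-domain case.
 */
static int
preferred_order(int prefdom, int order[NDOMAINS])
{
	int d, n;

	order[0] = prefdom;
	n = 1;
	for (d = 0; d < NDOMAINS; d++)
		if (d != prefdom)
			order[n++] = d;
	return (n);
}
```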

What if you reverse the order of allocation: first find some domain that can provide enough physical pages to back the stack, and then try to allocate the KVA from the corresponding arena?
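A sketch of that ordering, again using the hypothetical stand-ins from the summary sketch: pick a domain that can supply the whole stack up front, then take the KVA from that domain's arena so the pages and the KVA always match.

```
/*
 * Hypothetical helper: choose a domain that can back the entire stack
 * before any KVA is allocated.  Returns -1 if no single domain has enough
 * free pages, in which case the caller could fall back to mixed backing
 * or block.
 */
static int
pick_stack_domain(void)
{
	int d;

	for (d = 0; d < NDOMAINS; d++)
		if (free_pages[d] >= KSTACK_PAGES)
			return (d);
	return (-1);
}
```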

In D45163#1032283, @kib wrote:

What if you reverse the order of allocation: first find some domain that can provide enough physical pages to back the stack, and then try to allocate the KVA from the corresponding arena?

Sorry, I completely missed your comment.
That could work as well, but there's another point that I'd like to discuss.

According to the 15.0 planning document, we want to get rid of swapping out kernel stacks entirely (listed by @markj).
If that's still the case, we can drop this patch entirely and just retire vm_thread_swap{out, in}.

According to the 15.0 planning document, we want to get rid of swapping out kernel stacks entirely (listed by @markj).
If that's still the case, we can drop this patch entirely and just retire vm_thread_swap{out, in}.

Yes, if we do that, this problem will go away. I think that's better than introducing even more complexity here to account for what should be a rare case.

D46112 is the first patch in a series which removes stack swapping. It is incomplete, as I didn't touch vm_glue.c or the kick_proc0() callers yet.