
Fix memguard when options NUMA is configured.
Abandoned · Public

Authored by markj on Sep 14 2018, 8:56 PM.
Details

Reviewers
kib
alc
jeff
cem
Summary

kmem_back() is now somewhat deficient: it doesn't know that the pages
backing the same large virtual page must all come from the same NUMA
domain in order to satisfy the constraints of vm_reserv_extend(). It
just selects a domain according to the configured policy, which may or
may not correspond to the "colour" of the caller-supplied KVA.

memguard is the last in-tree consumer of kmem_back(). This change
allows it to work with "options NUMA" by ensuring that we consistently
select the same domain for the pages backing a given large virtual
page. As part of this, ensure that the KVA range reserved for memguard
is aligned to a large page boundary.
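
For illustration, the domain selection described above can be keyed to the superpage index of the KVA, along the lines of the sketch below (hedged: KVA_QUANTUM_SHIFT and the function name are stand-ins; the actual diff may differ).

/*
 * Minimal sketch: derive the NUMA domain from the superpage
 * containing the KVA, so that every page backing one large virtual
 * page is allocated from the same domain.  KVA_QUANTUM_SHIFT is
 * assumed to be the superpage shift (e.g., 21 for 2MB pages on
 * amd64).
 */
static int
memguard_kva_domain(vm_offset_t addr)
{
    return ((addr >> KVA_QUANTUM_SHIFT) % vm_ndomains);
}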

Test Plan

Set vm.memguard.desc="mbuf" on a NUMA system and ran network
traffic until the memguard cursor wrapped around.


Event Timeline

markj added reviewers: kib, alc, jeff.
This revision is now accepted and ready to land. Sep 14 2018, 9:10 PM
cem added a subscriber: cem.

Looks like kmem_back() can now be entirely removed.

In D17175#366163, @cem wrote:

Looks like kmem_back() can now be entirely removed.

Yep, I'll propose that change separately.

sys/vm/memguard.c
374

I think we can fall back to kmem_back() if the initial allocation attempt fails.

sys/vm/memguard.c
374

... not quite, since we could race with a free. I considered somehow moving this logic into kmem_back() itself (i.e., use the KVA to select an initial domain and then fall back if necessary), but memguard still needs to ensure that its KVA arena is aligned and a multiple of 2MB in size.

One thing we could do is round up the cursor to the next 2MB page in the event of an allocation failure, so that the next allocation attempt will select a different domain. I'll attempt that in a separate change.
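
A minimal sketch of that fallback (assuming the superpage size is available as KVA_QUANTUM; not the actual diff):

/*
 * On allocation failure, advance the cursor past the current
 * superpage so that the next attempt selects a different domain.
 * The +1 guarantees progress even when the cursor is already
 * superpage-aligned.
 */
if (vm_ndomains > 1)
    memguard_cursor = roundup2(memguard_cursor + 1, KVA_QUANTUM);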

393

I think this should be origaddr + size_v.

Advance the cursor to the next superpage boundary if the page
allocation fails.

Advance the cursor by the correct amount in the case where page
allocation succeeds.

This revision now requires review to proceed. Sep 15 2018, 5:56 PM
markj added inline comments.
sys/vm/memguard.c
374

I ended up rolling that up into this change. Otherwise, if one domain is depleted, memguard allocation attempts won't advance the cursor at all and we'll just keep hitting the same domain over and over.

cem added inline comments.
sys/vm/memguard.c
370

Is this necessary? (Isn't any integer mod 1 going to be zero?) Or is it valid for vm_ndomains to be zero?

387–388

Why atomic_cmpset? No other update to memguard_cursor seems to be atomic. (And the cmpset seems meaningless, no?) For the cmpset behavior I think you want an extra local variable to show that the load is only performed once:

if (vm_ndomains...) {
    vm_offset_t cursor, next;

    cursor = memguard_cursor; // atomic_load() ?
    next = roundup2(cursor, ...);
    atomic_cmpset_long(&memguard_cursor, cursor, next);
}
393

Agree

This revision is now accepted and ready to land. Sep 15 2018, 6:18 PM
markj added inline comments.
sys/vm/memguard.c
370

It's not really necessary, but I wanted the condition to match the one in the failure case below. vm_ndomains is initialized to 1.

387–388

Oops, right, I need to save the value of memguard_cursor loaded at the beginning of the routine. I also realized that the domain selection doesn't work properly if we're allocating multiple pages and the KVA crosses a superpage boundary.
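
A sketch of per-page domain selection for that case (object, pindex, and req are assumed to come from the enclosing kmem_back()-style context; the names are hedged): recompute the domain from each page's own address rather than once per call.

/*
 * For a multi-page allocation whose KVA crosses a superpage
 * boundary, derive the domain from each page's own address rather
 * than from the first page only.
 */
for (off = 0; off < size; off += PAGE_SIZE) {
    domain = ((addr + off) >> KVA_QUANTUM_SHIFT) % vm_ndomains;
    m = vm_page_alloc_domain(object, pindex + atop(off), domain, req);
    if (m == NULL)
        break;    /* caller unwinds or retries */
}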

sys/vm/memguard.c
331–335

Setting aside the NUMA problem for a second, this actually doesn't work with a vmem arena behind the cursor. Suppose that we have wrapped around. vmem manages its free lists in a LIFO fashion. So, if you are allocating one page, vmem is going to return the most recently freed page that hasn't coalesced, and not the first free page near the start of the arena. The cursor will jump to that location, and I expect that the next allocation attempt will fail, resulting in another cursor reset.

Until wraparound, the most recently freed pages were rejected because their addresses were below the cursor. But in that case vmem_xalloc is iterating over all of the free list entries for these pages until it gets to the entry representing the region between the cursor and the end of the arena. Hopefully some coalescing of these recently freed pages occurs.
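
For reference, the cursor-based search under discussion looks roughly like this (paraphrased with hedged details; see sys/vm/memguard.c for the real code):

/*
 * Ask vmem for any free range at or above the cursor.  After a
 * wraparound resets the cursor, vmem's LIFO free lists can return a
 * recently freed page far from the arena start, so the cursor jumps
 * forward again almost immediately.
 */
if (vmem_xalloc(memguard_arena, size_v, 0, 0, 0, memguard_cursor,
    VMEM_ADDR_MAX, M_BESTFIT | M_NOWAIT, &addr) == 0)
    memguard_cursor = addr + size_v;    /* cf. the "393" comment */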

sys/vm/memguard.c
331–335

The vmem paper describes a "nextfit" allocation strategy, but our implementation doesn't have it yet. It could be used here since the memguard arena doesn't use a quantum cache. Assuming that you don't have an alternate strategy for handling the NUMA problem, what do you think of implementing that and using it here?

sys/vm/memguard.c
331–335

Please upload the next-fit implementation to phabricator. I think that it's worth having.

That said, iterating over an arena's segment list leads to a bad worst-case performance bound. Because allocated boundary tags do not coalesce (unlike free boundary tags), a next-fit allocation might iterate over every single allocation.

Essentially, one has to hope that the arena on which nextfit is applied is sparsely utilized.
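
To make that worst case concrete, here is an illustrative next-fit walk over a boundary-tag list (the types are hypothetical stand-ins, not vmem's actual data structures): because allocated tags never coalesce, the walk may visit one tag per live allocation before finding a free segment.

#include <stdbool.h>
#include <stddef.h>

struct seg {                /* stand-in for a vmem boundary tag */
    struct seg *next;       /* circular segment list */
    bool        free;       /* free tag or allocated tag */
    size_t      size;
};

/* Resume scanning after the previous allocation; wrap at most once. */
static struct seg *
nextfit(struct seg *cursor, size_t size)
{
    struct seg *s;

    for (s = cursor->next; s != cursor; s = s->next)
        if (s->free && s->size >= size)
            return (s);
    return (NULL);
}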

sys/vm/memguard.c
67–72

My suggestion would be to keep memguard NUMA-oblivious and fix kmem_back(). In other words, keep code like the above inside vm/vm_kern.c.

sys/vm/memguard.c
67–72

I mentioned this possibility in a different comment thread. memguard still needs to ensure that its arena doesn't share any large pages with its parent, so kmem_back() cannot completely hide the problem.

331–335

Right, I hesitated for a while because of this limitation of the approach. I took a look at illumos, and their implementation seems to have the same problem.

sys/vm/memguard.c
67–72

The kernel_object has only one use case outside of memguard.c and vm_kern.c, in subr_vmem.c, and that use case allocates an address from a per-domain arena. Also, the per-domain arenas should only be pulling address ranges that are a multiple of 2MB and start at a 2MB aligned address. So, memguard shouldn't be getting addresses under which there is a reservation that is shared with a per-domain arena. (memguard is getting its entire address from the the "global" kernel arena.)