vmem: Avoid allocating span tags when segments are never released.
Closed, Public

Authored by markj on Apr 23 2020, 4:33 PM.

Details

Reviewers

alc
kib
jeff

Commits

rS364819: vmem: Avoid allocating span tags when segments are never released.

Summary

vmem uses span tags to delimit imported segments, so that they can be
released if the segment becomes free in the future. However, the
per-domain kernel KVA arenas never release resources. Furthermore, the
span tags prevent coalescing free segments across KVA_QUANTUM
boundaries. As a minor optimization, avoid allocating span tags in this
case.

Test Plan

This was motivated by looking at vm.pmap.kernel_maps during poudriere
runs. I see many runs of 511 4KB mappings. Since UMA uses the direct
map for page-sized slabs, most allocations into kernel_object are larger
than 4KB, so a single free page left just below a KVA_QUANTUM boundary is
rarely reused. The resulting page-sized holes inhibit superpage promotion
and cause fragmentation, since kernel_object reservations remain partially
populated. For example:

0xfffffe0215200000-0xfffffe02157ff000 rw-s- WB 0 2 511
0xfffffe0215800000-0xfffffe0215dff000 rw-s- WB 0 2 511
0xfffffe0215e00000-0xfffffe02163ff000 rw-s- WB 0 2 511
0xfffffe0216400000-0xfffffe02165ff000 rw-s- WB 0 0 511
0xfffffe0216600000-0xfffffe02167ff000 rw-s- WB 0 0 511
0xfffffe0216800000-0xfffffe02169ff000 rw-s- WB 0 0 511
0xfffffe0216a00000-0xfffffe0216dff000 rw-s- WB 0 1 511
0xfffffe0216e00000-0xfffffe02175ff000 rw-s- WB 0 3 511

I tried measuring 2MB mapping usage within the kernel map during the
first few minutes of a poudriere run.

Before: https://reviews.freebsd.org/P378
After: https://reviews.freebsd.org/P379

There are some other approaches that would also help:

- Use a larger import quantum on platforms where KVA is cheap.
- Use the per-domain arenas to manage physical memory instead of KVA.

The second would avoid creation of holes, but we'd still have internal
fragmentation due to the rarity of 4KB allocations. Coalescing across 2MB
boundaries would also be less likely to occur, and we would want some
mechanism to reclaim memory from the arenas during a severe shortage.

I still see a number of holes even with the patch applied; I'm not yet
sure why. Something might be occasionally allocating and freeing 4KB
of memory using kmem_malloc().

Diff Detail

Lint: Lint Passed
Unit: No Test Coverage
Build Status: Buildable 32957, Build 30351: arc lint + arc unit

Event Timeline

markj created this revision. Apr 23 2020, 4:33 PM

Harbormaster completed remote builds in B30685: Diff 70913. Apr 23 2020, 4:33 PM

markj edited the test plan for this revision. Apr 23 2020, 4:55 PM

markj edited the test plan for this revision. Apr 23 2020, 4:57 PM

markj added reviewers: alc, kib, jeff.

markj added inline comments.

sys/kern/subr_vmem.c
Line 804: The segment list is supposed to be sorted, but here we are assuming that a newly imported range always sorts to the end of the list, which was surprising to me. The vmem implementation in illumos seems to do the same thing. I can't see a cheap way to ensure that the new segment is sorted.

> Before: https://reviews.freebsd.org/P378
> After: https://reviews.freebsd.org/P379

I meant to note that the columns are the number of 1GB, 2MB, and 4KB mappings in the kernel map, respectively.

The increase is larger than it appears: of the ~1100 2MB mappings that exist when the test is started, 858 are from the static mapping of vm_page_array, leaving only about 240 from other sources.

> I still see a number of holes even with the patch applied; I'm not yet sure why.

I spent some more time on this. It is simply a result of NUMA: adjacent 2MB virtual pages get allocated to different domains, so there is no possibility of coalescing KVA allocations. Since ZFS frequently allocates kmem buffers with a size not equal to a power of 2 (or even a sum of two consecutive powers of 2), we end up with many runs of 511 4KB pages in the kernel map.

With NUMA disabled and this patch applied, we get very good superpage utilization in the kernel map when poudriere is running. I think KVA_QUANTUM should be larger than 2MB on NUMA systems to help mitigate the problem. I can't really see a downside to having a larger KVA_QUANTUM, except maybe that we waste kernel page table pages if the imported KVA is underutilized.

Assert that the arena is empty when setting import/release functions.
This is true for all consumers in the tree. It could be relaxed to only
require that the arena be empty if a release function is set.

Harbormaster completed remote builds in B32957: Diff 75756. Aug 12 2020, 9:10 PM

markj mentioned this in D26050: Use a large kmem arena import size on NUMA systems. Aug 12 2020, 9:11 PM

I'd like to commit this in a couple of days if there are no objections.

This revision was not accepted when it landed; it landed in state Needs Review. Aug 26 2020, 2:31 PM

Closed by commit rS364819: vmem: Avoid allocating span tags when segments are never released. (authored by markj).

This revision was automatically updated to reflect the committed changes.

markj added a commit: rS364819: vmem: Avoid allocating span tags when segments are never released.

Herald added a subscriber: imp. Aug 26 2020, 2:31 PM