
Segregate _NOFREE allocations in physical memory and KVA.
Abandoned, Public

Authored by markj on Mar 5 2020, 7:12 PM.

Details

Reviewers
kib
jeff
alc
rlibby
Summary

Add a new per-domain vmem arena whose import function allocates
aligned 2MB chunks of kernel memory. Introduce a new malloc flag,
M_STABLE, which indicates that the caller will not free the memory back
to the allocator. Handle M_STABLE in kmem_malloc() and
kmem_alloc_contig() by returning ranges from this arena. This ensures
that M_STABLE allocations are grouped together.
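The grouping effect can be illustrated with a userspace toy model. This is only a sketch, not the patch and not the vmem(9) interface: the names nofree_arena, arena_import() and arena_alloc_nofree() are hypothetical, and aligned_alloc() stands in for the arena's import of an aligned 2MB chunk of kernel memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE	((size_t)2 << 20)	/* 2MB import granularity */
#define PAGE_SZ		((size_t)4096)		/* assumed base page size */

/* Hypothetical toy arena; the real change uses a per-domain vmem arena. */
struct nofree_arena {
	uintptr_t	cur;		/* next unused byte in the current chunk */
	uintptr_t	end;		/* end of the current chunk */
	unsigned	imports;	/* number of 2MB chunks imported so far */
};

/* Import one 2MB chunk aligned to 2MB. */
static int
arena_import(struct nofree_arena *a)
{
	void *p = aligned_alloc(CHUNK_SIZE, CHUNK_SIZE);

	if (p == NULL)
		return (-1);
	a->cur = (uintptr_t)p;
	a->end = a->cur + CHUNK_SIZE;
	a->imports++;
	return (0);
}

/*
 * Hand out page-rounded ranges that are never returned to the arena.
 * Any tail of the current chunk that is too small is simply abandoned,
 * a simplification the real allocator would not need.
 */
static void *
arena_alloc_nofree(struct nofree_arena *a, size_t size)
{
	void *p;

	size = (size + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
	assert(size != 0 && size <= CHUNK_SIZE);
	if (a->end - a->cur < size && arena_import(a) != 0)
		return (NULL);
	p = (void *)a->cur;
	a->cur += size;
	return (p);
}
```

Starting from a zeroed struct nofree_arena, consecutive calls return adjacent pages inside the same 2MB chunk until it is exhausted, which is the packing behaviour the summary describes.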

Pass M_STABLE when allocating slabs for a UMA_ZONE_NOFREE keg. Ensure
that we do not use the direct map for such slabs.

This helps minimize the fragmentation of physical memory caused by
_NOFREE objects, particularly VM object and thread structures. In the
past I have found that over time, slabs for these structures cover wide
ranges of memory, inhibiting superpage creation. For example, on my
desktop with ~2 days uptime, 25% of the 2MB chunks of RAM in the system
contain at least one VM object slab page. That's 4GB of RAM in which
every 2MB large page contains at least one _NOFREE slab, but the system
has less than 100MB worth of VM object slabs.
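To put those figures side by side: 4GB is 2048 chunks of 2MB, while 100MB of slabs (taking the stated upper bound) would fit in 50 chunks if packed perfectly, so roughly 40 times more chunks are touched than necessary. A minimal sketch of that arithmetic, with chunks_if_packed() being a hypothetical helper name:

```c
#include <stdint.h>

#define MB	(UINT64_C(1) << 20)
#define GB	(UINT64_C(1) << 30)
#define CHUNK	(2 * MB)	/* one 2MB large page */

/* Number of 2MB chunks needed to hold 'bytes' if packed with no gaps. */
static uint64_t
chunks_if_packed(uint64_t bytes)
{
	return ((bytes + CHUNK - 1) / CHUNK);
}
```

Here chunks_if_packed(4 * GB) is the number of chunks currently affected and chunks_if_packed(100 * MB) is the ideal number for the same slabs.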

I wanted to call the flag M_NOFREE, but mbuf.h has already claimed this
name. I think this mechanism could be extended to support long-lived
allocations (like typical per-CPU structures) rather than strict _NOFREE
allocations, so "stable" seems like a reasonable name. Internally the
functions are suffixed with _nofree since that is more specific. I am
open to suggestions for different names.

Test Plan

I wrote a program that uses libkvm to walk the list of UMA slabs of a given
zone and print the slab address. I ran poudriere with this change applied and
used the program to verify that most VM object slabs are packed into 2MB
pages. Without the patch there is a long tail of 2MB physical pages containing
exactly one VM object slab.
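The post-processing step can be sketched as follows. This is not the libkvm program itself, only an assumed way of summarizing its output: given the slab addresses it prints, count how many distinct 2MB-aligned chunks contain at least one slab. The function name count_chunks() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SUPERPAGE_SHIFT	21	/* log2(2MB) */

static int
cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return ((x > y) - (x < y));
}

/*
 * Return the number of distinct 2MB chunks that contain at least one of
 * the given slab addresses.  The array is sorted in place.
 */
static size_t
count_chunks(uint64_t *addrs, size_t n)
{
	size_t chunks = 0, i;

	qsort(addrs, n, sizeof(addrs[0]), cmp_u64);
	for (i = 0; i < n; i++) {
		if (i == 0 || (addrs[i] >> SUPERPAGE_SHIFT) !=
		    (addrs[i - 1] >> SUPERPAGE_SHIFT))
			chunks++;
	}
	return (chunks);
}
```

A well-packed zone yields a chunk count close to the total slab size divided by 2MB; a long tail of chunks holding a single slab pushes the count toward the number of slabs.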

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 29816
Build 27644: arc lint + arc unit

Event Timeline

markj added a reviewer: rlibby.
sys/vm/uma_core.c
1677–1678

I would prefer to keep some comment here.

"Allocates slab pages requiring KVA from the system."

2240

I feel so-so about this.

We won't be using the direct map. This makes NUMA much more expensive. For proc and thread this isn't a big deal, but object may or may not show a slowdown. Every free requires pmap_kextract(). We should audit other users.

I wonder if we can write something that allocates from the arena/object but returns the DMAP address if it exists.

sys/vm/vm_kern.c
808–809

What fraction of physical memory do we expect to be NOFREE? We should consider not even trying if we have too little memory.

sys/vm/uma_core.c
1677–1678

Ok.

2240

For most platforms it should be trivial to store direct map addresses in the arena. I'm not sure about powerpc yet. In the longer term it sounds like we want kmem_malloc() to allocate pages first, and then return the direct map address if they happen to be contiguous, else allocate KVA and create a mapping.
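The longer-term idea (allocate pages first, then prefer the direct map when they are contiguous) can be sketched in userspace. Everything here is an assumption made for illustration: TOY_DMAP_BASE, pages_contiguous() and toy_kva_for_pages() are hypothetical names, and a return value of 0 simply stands for "fall back to allocating KVA and creating a mapping".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ		UINT64_C(4096)			/* assumed page size */
#define TOY_DMAP_BASE	UINT64_C(0xfffff80000000000)	/* hypothetical base */

/* True if the physical addresses form one contiguous run of pages. */
static bool
pages_contiguous(const uint64_t *pa, size_t npages)
{
	for (size_t i = 1; i < npages; i++) {
		if (pa[i] != pa[i - 1] + PAGE_SZ)
			return (false);
	}
	return (true);
}

/*
 * Return a direct map address for the pages if they are physically
 * contiguous, or 0 if the caller must allocate KVA and map them.
 */
static uint64_t
toy_kva_for_pages(const uint64_t *pa, size_t npages)
{
	if (npages == 0 || !pages_contiguous(pa, npages))
		return (0);
	return (TOY_DMAP_BASE + pa[0]);
}
```

Under this model a single-page slab is always trivially contiguous, so it would always get a direct map address.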

sys/vm/vm_kern.c
808–809

I think it would generally be quite small, in the range of a few percent, but userspace could force allocation of a large number of VM objects. This should not tie up more memory than the amount allocated by a single import, though.

markj marked an inline comment as done.

I'm working on a different approach to this. Aside from Jeff's note that we should aim to use the direct map whenever possible to minimize pmap_kextract() overhead in uma_zfree(), I now think it is a bit silly to use a vmem arena as the _NOFREE page allocator: each allocation requires the allocation of at least one boundary tag, effectively making its slab a _NOFREE page.