
Add a UMA option to support cross domain frees while preserving locality for first-touch zones.
Closed, Public

Authored by jeff on Jul 12 2019, 12:55 AM.

Details

Summary

This adds an optional cross-domain free bucket for memory that is freed to a different domain than it was allocated from. This gives us support for precise first-touch domains that won't mix memory. If a zone mixes cross-domain allocations and frees at a significant rate, it can eventually bottleneck while freeing memory. I may address that in a follow-up patch if there is enough interest.

Combined with increased thread locality, this can offer significant performance improvements for targeted workloads. It is likely not generally faster, so it is hidden behind an option for now.
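
For context, here is a minimal sketch of a consumer creating a first-touch zone that the new option would affect. uma_zcreate() and UMA_ZONE_NUMA (the existing first-touch flag) are real UMA interfaces; the structure and zone names are hypothetical, and the sketch assumes the cross-domain bucket itself is enabled by the new build-time option rather than a per-zone flag.

/*
 * Illustrative sketch only: a consumer creating a first-touch UMA zone.
 * With the option from this patch enabled, an item freed on a domain
 * other than the one it was allocated from would be diverted to the
 * cross-domain free bucket instead of mixing into the local caches.
 */
#include <sys/param.h>
#include <vm/uma.h>

struct example_obj {                    /* hypothetical consumer object */
        uint64_t        eo_data[8];
};

static uma_zone_t example_zone;

static void
example_zone_startup(void)
{

        example_zone = uma_zcreate("example_obj", sizeof(struct example_obj),
            NULL, NULL, NULL, NULL, UMA_ALIGN_CACHE,
            UMA_ZONE_NUMA);             /* first-touch, per-domain caching */
}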

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

jeff retitled this revision. Jul 12 2019, 12:58 AM
jeff edited the summary of this revision. (Show Details)
jeff added reviewers: glebius, gallatin, markj, alc, kib.
This revision is now accepted and ready to land. Jul 12 2019, 12:28 PM
sys/vm/uma_core.c
3083–3084 (On Diff #59668)

I'm curious why you didn't choose to compute (or look up) the address of the slab header and use its us_domain field here.

sys/vm/uma_core.c
3083–3084 (On Diff #59668)

I believe this is cheaper. You can touch a very small amount of read-only global memory to calculate the domain, assuming that you're allocating from the direct-mapped region.
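
A minimal sketch of that lookup, assuming the item lives in the direct map; the helper names (pmap_kextract() and vm_phys_domain()) reflect a reading of the existing VM interfaces and are not quoted from the diff:

/*
 * Sketch: derive the item's domain from its physical address.  Only the
 * small, read-only physical-segment table is consulted; the slab header
 * is never touched.
 */
itemdomain = vm_phys_domain(pmap_kextract((vm_offset_t)item));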

I actually have a version of vm_phys_domain() which operates purely on arithmetic rather than a table, but I have found it isn't necessary so far. Roughly, you subtract the PCI-e hole and divide the resulting address by the size of each domain.

To get to the slab we need a couple of extra steps and a cache line that we don't otherwise need.
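
For illustration, a hypothetical sketch of the arithmetic-only vm_phys_domain() variant described above. PCIE_HOLE_START, PCIE_HOLE_SIZE, and DOMAIN_SPAN are placeholder constants assuming evenly sized domains and a single PCI-e hole; this is not code from the patch.

/*
 * Hypothetical table-free domain calculation: subtract the PCI-e hole
 * and divide by the per-domain span.  All constants here are assumed
 * for illustration only.
 */
static inline int
vm_phys_domain_calc(vm_paddr_t pa)
{

        if (pa >= PCIE_HOLE_START)
                pa -= PCIE_HOLE_SIZE;           /* skip the hole */
        return ((int)(pa / DOMAIN_SPAN));       /* evenly sized domains */
}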