
Add a UMA option to support cross domain frees while preserving locality for first-touch zones.
Closed, Public

Authored by jeff on Jul 12 2019, 12:55 AM.

Details

Summary

This adds an optional cross-domain free bucket for memory that is freed to a different domain than it was allocated from. This gives us support for precise first-touch domains that won't mix memory. If a zone mixes cross-domain allocations and frees at a significant rate, it can eventually bottleneck while freeing memory. I may address that in a follow-up patch if there is enough interest.

Combined with increased thread locality, this can offer significant performance improvements for targeted workloads. It is likely not generally faster, so it is hidden behind an option for now.
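
For context, here is a minimal sketch of a consumer creating a first-touch zone that the new option would affect. uma_zcreate() and UMA_ZONE_NUMA (the existing first-touch flag) are real UMA interfaces; the structure and zone names are hypothetical, and the sketch assumes the cross-domain bucket itself is enabled by the new build-time option rather than a per-zone flag.

/*
 * Illustrative sketch only: a consumer creating a first-touch UMA zone.
 * With the option from this patch enabled, an item freed on a domain
 * other than the one it was allocated from would be diverted to the
 * cross-domain free bucket instead of mixing into the local caches.
 */
#include <sys/param.h>
#include <vm/uma.h>

struct example_obj {                    /* hypothetical consumer object */
        uint64_t        eo_data[8];
};

static uma_zone_t example_zone;

static void
example_zone_startup(void)
{

        example_zone = uma_zcreate("example_obj", sizeof(struct example_obj),
            NULL, NULL, NULL, NULL, UMA_ALIGN_CACHE,
            UMA_ZONE_NUMA);             /* first-touch, per-domain caching */
}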

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

jeff retitled this revision. Jul 12 2019, 12:58 AM
jeff edited the summary of this revision. (Show Details)
jeff added reviewers: glebius, gallatin, markj, alc, kib.
This revision is now accepted and ready to land. Jul 12 2019, 12:28 PM
sys/vm/uma_core.c
3083–3084 (On Diff #59668)

I'm curious why you didn't choose to compute (or look up) the address of the slab header and use its us_domain field here.

sys/vm/uma_core.c
3083–3084 (On Diff #59668)

I believe this is cheaper. You can touch a very small amount of read-only global memory to calculate the domain, assuming that you're allocating from the direct-mapped region.
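
A minimal sketch of that lookup, assuming the item lives in the direct map; the helper names (pmap_kextract() and vm_phys_domain()) reflect a reading of the existing VM interfaces and are not quoted from the diff:

/*
 * Sketch: derive the item's domain from its physical address.  Only the
 * small, read-only physical-segment table is consulted; the slab header
 * is never touched.
 */
itemdomain = vm_phys_domain(pmap_kextract((vm_offset_t)item));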

I actually have a version of vm_phys_domain() which operates purely on arithmetic rather than a table, but I have found it isn't necessary so far. Roughly, you subtract the PCI-e hole and divide the resulting address by the size of each domain.

To get to the slab we need a couple of extra steps and a cache line that we don't otherwise need.
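
For illustration, a hypothetical sketch of the arithmetic-only vm_phys_domain() variant described above. PCIE_HOLE_START, PCIE_HOLE_SIZE, and DOMAIN_SPAN are placeholder constants assuming evenly sized domains and a single PCI-e hole; this is not code from the patch.

/*
 * Hypothetical table-free domain calculation: subtract the PCI-e hole
 * and divide by the per-domain span.  All constants here are assumed
 * for illustration only.
 */
static inline int
vm_phys_domain_calc(vm_paddr_t pa)
{

        if (pa >= PCIE_HOLE_START)
                pa -= PCIE_HOLE_SIZE;           /* skip the hole */
        return ((int)(pa / DOMAIN_SPAN));       /* evenly sized domains */
}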