
Add a UMA option to support cross domain frees while preserving locality for first-touch zones.
ClosedPublic

Authored by jeff on Jul 12 2019, 12:55 AM.

Details

Summary

This adds an optional cross-domain free bucket for memory that is freed in a different domain than the one it was allocated from. This gives us support for precise first-touch domains that won't mix memory. A zone that mixes allocs and frees across domains at a significant rate can eventually bottleneck while freeing memory. I may address that in a follow-up patch if there is enough interest.

Combined with increased thread locality this can offer significant performance improvements for targeted workloads. It is likely not generally faster so it is hidden behind an option for now.
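As a rough illustration of the idea in the summary, the free path can be sketched as below. This is a minimal userland model and not the code in the diff: `struct item`, `struct bucket`, `struct cpu_cache`, `zone_domain`, `bucket_drain()`, `cache_free()`, `NDOMAINS`, and `BUCKET_CAP` are all hypothetical names, and each item carries its domain in a field rather than deriving it from its address.

```c
#include <assert.h>
#include <stddef.h>

#define NDOMAINS   2	/* hypothetical number of NUMA domains */
#define BUCKET_CAP 4	/* hypothetical bucket capacity */

/* A freed object; the domain tag stands in for a real domain lookup. */
struct item {
	int		 domain;
	struct item	*next;
};

/* A fixed-size array of freed item pointers. */
struct bucket {
	struct item	*items[BUCKET_CAP];
	int		 cnt;
};

/* Per-CPU cache: a local free bucket plus a cross-domain free bucket. */
struct cpu_cache {
	int		domain;		/* domain this CPU belongs to */
	struct bucket	freeb;		/* frees of local-domain memory */
	struct bucket	crossb;		/* frees of remote-domain memory */
};

/* Per-domain free lists, standing in for the zone's per-domain caches. */
static struct item *zone_domain[NDOMAINS];

/*
 * Return every item in the bucket to the free list of its home domain.
 * In a real allocator this step would need the zone's lock, which is
 * where a zone with many cross-domain frees could bottleneck.
 */
static void
bucket_drain(struct bucket *b)
{
	while (b->cnt > 0) {
		struct item *it = b->items[--b->cnt];

		assert(it->domain >= 0 && it->domain < NDOMAINS);
		it->next = zone_domain[it->domain];
		zone_domain[it->domain] = it;
	}
}

/* Free an item: local memory and remote memory go to separate buckets. */
static void
cache_free(struct cpu_cache *cc, struct item *it)
{
	struct bucket *b;

	b = (it->domain == cc->domain) ? &cc->freeb : &cc->crossb;
	if (b->cnt == BUCKET_CAP)
		bucket_drain(b);
	b->items[b->cnt++] = it;
}
```

Because remote frees never land in the local bucket, memory handed back out from the local bucket stays in the CPU's own domain, which is the first-touch property the summary describes.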

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

jeff created this revision. Jul 12 2019, 12:55 AM
jeff retitled this revision from Add a UMA option to support cross domain frees while preserving locality for first-touch zones. to Add a UMA option to support cross domain frees while preserving locality for first-touch zones.. Jul 12 2019, 12:58 AM
jeff edited the summary of this revision.
jeff added reviewers: glebius, gallatin, markj, alc, kib.
gallatin accepted this revision. Jul 12 2019, 12:28 PM
This revision is now accepted and ready to land. Jul 12 2019, 12:28 PM
kib accepted this revision. Jul 22 2019, 12:03 PM
alc added inline comments. Jul 29 2019, 4:19 PM
sys/vm/uma_core.c
3083–3084 (On Diff #59668)

I'm curious why you didn't choose to compute (or look up) the address of the slab header and use its us_domain field here.

jeff added inline comments. Jul 29 2019, 5:34 PM
sys/vm/uma_core.c
3083–3084 (On Diff #59668)

I believe this is cheaper. You can touch a very small amount of read-only global memory to calculate the domain assuming that you're allocating from the direct mapped region.

I actually have a version of vm_phys_domain() which works purely from arithmetic, not a table, but I have found it isn't necessary so far. Roughly, you subtract the PCIe hole and divide the resulting address by the size of each domain.

To get to the slab we need a couple extra steps and a cache line that we don't otherwise need.
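To make the two lookup styles discussed above concrete, here is a small sketch comparing a table scan with the arithmetic variant. It is not the kernel's vm_phys_domain(): the memory layout (two 4 GiB domains with a 1 GiB hole at 3 GiB), `struct mem_range`, `domain_by_table()`, and `domain_by_math()` are all assumptions made up for illustration, and the arithmetic version only works when every domain has the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

#define GiB		(1ULL << 30)
#define HOLE_START	(3 * GiB)	/* hypothetical PCIe hole: [3 GiB, 4 GiB) */
#define HOLE_SIZE	(1 * GiB)
#define DOMAIN_SIZE	(4 * GiB)	/* hypothetical RAM per domain */

/* One contiguous physical range and the domain it belongs to. */
struct mem_range {
	paddr_t	start;		/* inclusive */
	paddr_t	end;		/* exclusive */
	int	domain;
};

/* Small read-only table describing the hypothetical layout. */
static const struct mem_range ranges[] = {
	{ 0 * GiB, 3 * GiB, 0 },	/* domain 0, below the hole */
	{ 4 * GiB, 5 * GiB, 0 },	/* domain 0, above the hole */
	{ 5 * GiB, 9 * GiB, 1 },	/* domain 1 */
};

/* Table variant: scan the ranges; returns -1 if pa is not RAM. */
static int
domain_by_table(paddr_t pa)
{
	for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
		if (pa >= ranges[i].start && pa < ranges[i].end)
			return (ranges[i].domain);
	}
	return (-1);
}

/* Arithmetic variant: remove the hole, then divide by the domain size. */
static int
domain_by_math(paddr_t pa)
{
	/* Addresses inside the hole are not RAM and are not handled here. */
	assert(pa < HOLE_START || pa >= HOLE_START + HOLE_SIZE);
	if (pa >= HOLE_START + HOLE_SIZE)
		pa -= HOLE_SIZE;
	return ((int)(pa / DOMAIN_SIZE));
}
```

Either variant touches only the address and a few constants or a small read-only table, whereas going through the slab header would pull in an additional cache line.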

alc accepted this revision. Jul 29 2019, 6:39 PM