
Improve UMA cache reclamation

Authored by mav on Apr 16 2021, 3:54 AM.



When estimating the working set size, measure only allocation batches, not free batches. Allocation and free patterns can be very different. For example, on a vm_lowmem event ZFS can free a few gigabytes of memory to UMA in a single call, but that does not mean it will request the same amount back as quickly; in fact, it won't.
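The asymmetry can be illustrated with a userland toy model (the field names loosely echo UMA's per-domain state, but the struct, smoothing constant, and arithmetic here are illustrative, not the committed code): the estimate is driven by the high watermark that allocations actually reach, so a multi-gigabyte free burst cannot inflate it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-domain state.  Only allocations can raise the peak, so a
 * large burst of frees (e.g. ZFS dropping ARC on vm_lowmem) does not
 * masquerade as demand. */
struct zdom {
	long outstanding;	/* items handed out to consumers */
	long peak;		/* high watermark of outstanding this period */
	long wss;		/* smoothed working-set estimate */
};

static void
zdom_alloc(struct zdom *z)
{
	z->outstanding++;
	if (z->outstanding > z->peak)
		z->peak = z->outstanding;
}

static void
zdom_free(struct zdom *z)
{
	z->outstanding--;	/* frees never raise the peak */
}

/* Periodic update with a simple decaying average (the real code's
 * smoothing differs); the watermark is then reset for the next period. */
static void
zdom_update_wss(struct zdom *z)
{
	z->wss = (z->peak + 3 * z->wss) / 4;
	z->peak = z->outstanding;
}
```

A burst of 8 allocations followed by 8 frees leaves the peak at 8 and the outstanding count at 0: the frees shrink the cache's view of demand rather than grow it.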

Update the working set size on every reclamation call, shrinking caches fast(er) under pressure. Without this, repeated vm_lowmem events squeezed more and more memory out of real consumers only for it to get stuck in UMA caches. I saw ZFS drop its ARC size in half before the previous algorithm's periodic WSS update decided to reclaim the UMA caches.

Introduce voluntary reclamation of UMA caches that have not been used for a long time. For each zdom, track a long-term minimum cache size watermark, freeing some unused items every UMA_TIMEOUT after the first 15 minutes without cache misses. The freed memory can get better use by other consumers. For example, ZFS won't grow its ARC unless it sees free memory, since it does not know the cached memory is not really used. And even if the memory is not needed elsewhere, periodically freeing it during inactivity periods should reduce fragmentation.
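The trimming policy above can be sketched as follows (the names limin and timeouts echo the patch's uzd_limin/uzd_timeouts, but the constants and the 1/8 trim fraction are hypothetical, chosen only to make the example concrete):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical idle threshold: e.g. 15 minutes of 20-second timeouts. */
#define ZDOM_TIMEOUTS_BEFORE_TRIM	45

struct zdom_trim {
	size_t nitems;	/* items cached in this domain */
	size_t limin;	/* long-term minimum cache size watermark */
	int timeouts;	/* consecutive timeouts without a cache miss */
};

/* A cache miss means the cache was too small: reset the idle counter
 * and the long-term minimum. */
static void
zdom_miss(struct zdom_trim *z)
{
	z->timeouts = 0;
	z->limin = z->nitems;
}

/* Called every UMA_TIMEOUT.  Returns how many items to voluntarily free
 * this round: nothing until the zone has been idle long enough, then a
 * small slice of the provably unused part of the cache per timeout. */
static size_t
zdom_timeout(struct zdom_trim *z)
{
	size_t trim;

	if (z->nitems < z->limin)
		z->limin = z->nitems;
	if (++z->timeouts < ZDOM_TIMEOUTS_BEFORE_TRIM)
		return (0);
	trim = z->limin / 8;
	z->nitems -= trim;
	z->limin -= trim;
	return (trim);
}
```

With 800 items cached and untouched, nothing is freed for the first 44 timeouts; from the 45th onward each timeout releases a slice, so an idle zone is drained gradually rather than in one burst.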

Diff Detail

rG FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mav requested review of this revision. Apr 16 2021, 3:54 AM
mav created this revision.

I'm still trying to understand the patch, I don't have many good comments yet. A block comment explaining the scheme would be helpful.

I like the idea behind the change to zone_domain_imax_set(), I can see why that fixes the interaction with the ARC's lowmem handler. I'm not sure that uzd_imin has much purpose anymore? Overall I wonder if this can be simplified somewhat.


I'm not sure about performing potentially large bursts of work in a callout thread. I suspect it would be better to reclaim smaller amounts more frequently, or have some absolute limit on the number of items reclaimed.


This loop is identical to the one in bucket_cache_reclaim_domain(), I think it should become a separate function.


To be more concrete, reclamation from the abd chunk zone takes hundreds of ms on a system running postgres with 64GB of RAM. Reclamation from zones with slab size > PAGE_SIZE can also be quite expensive since they must be unmapped (with a corresponding TLB invalidation) before being freed.

Rather than doing this work in one go, I think that waiting some time for the WSSE to stabilize, and then reclaiming a small portion every UMA_TIMEOUT ms, would be more effective at preventing latency spikes. Alternatively, we could extend the uma_reclaim() thread to handle this work, so that callouts are not deferred.
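The suggestion of bounding each pass can be sketched like this (RECLAIM_BATCH and the function name are hypothetical, not part of the patch): instead of draining everything in one callout, each UMA_TIMEOUT releases at most a fixed number of items, keeping lock hold times and callout latency bounded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical absolute per-pass cap on reclaimed items. */
#define RECLAIM_BATCH	64

/* Reclaim at most RECLAIM_BATCH items of the excess above the target
 * cache size; repeated timeouts converge on the target incrementally. */
static size_t
reclaim_incremental(size_t *cached, size_t target)
{
	size_t excess, n;

	if (*cached <= target)
		return (0);
	excess = *cached - target;
	n = excess < RECLAIM_BATCH ? excess : RECLAIM_BATCH;
	*cached -= n;
	return (n);
}
```

Draining 200 cached items to a target of 0 then takes four passes (64, 64, 64, 8) instead of one long burst, and the estimate can be re-checked between passes.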


We definitely want to do smaller units of work more frequently. It will reduce the effects of poor estimation as well as limit the latency caused by lock hold times and high-priority callout work. I have always thought we should be constantly pruning towards the WSS, with a variable buffer according to memory pressure.

We have enough zones now that it probably makes sense to have multiple callouts and stagger them as well.

mav edited the summary of this revision. (Show Details)

I've modified the patch considering comments:

  • I've made reclamation run in smaller chunks every 20 seconds after the first 15 minutes without cache overflows, instead of once every 30 minutes.
  • To reduce the chance of overflows, I've made uzd_limin include uzd_wss from then on. uzd_wss decays much faster than uzd_limin, so adding the present value to a uzd_limin from an hour ago makes less sense.
  • I've reused bucket_cache_reclaim_domain() for all 3 kinds of reclamation (drain, trim, and voluntary) to avoid code duplication.
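The second bullet amounts to trimming only down to the long-term minimum plus the current working-set estimate, so a zone is never trimmed below what recent allocation bursts needed. A minimal sketch, assuming that reading of the patch (the function names and formula are illustrative; the real arithmetic lives in bucket_cache_reclaim_domain()):

```c
#include <assert.h>
#include <stddef.h>

/* Voluntary trimming keeps the current WSS on top of the long-term
 * minimum watermark; only items above that sum are considered unused. */
static size_t
trim_target(size_t limin, size_t wss)
{
	return (limin + wss);
}

/* Items eligible for voluntary reclamation this round. */
static size_t
trim_excess(size_t nitems, size_t limin, size_t wss)
{
	size_t target = trim_target(limin, wss);

	return (nitems > target ? nitems - target : 0);
}
```

A recent allocation burst (large wss) thus shields the cache from over-trimming even when the long-term minimum is low.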
mav marked 4 inline comments as done. Apr 30 2021, 1:37 AM
This revision was not accepted when it landed; it landed in state Needs Review. May 2 2021, 11:45 PM
This revision was automatically updated to reflect the committed changes.