Improve UMA cache reclamation
Closed, Public

Authored by mav on Apr 16 2021, 3:54 AM.

Details

Summary

When estimating working set size, measure only allocation batches, not free batches. Allocation and free patterns can be very different. For example, on a vm_lowmem event ZFS can free a few gigabytes of memory to UMA in one call, but that does not mean it will request the same amount back just as quickly; in fact, it won't.
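As a rough illustration of measuring only allocation batches, here is a minimal sketch. It is not the code from this patch: the struct, its fields, and both functions are hypothetical names chosen for the example, and the cache is reduced to a single item counter.

```c
#include <assert.h>

/* Hypothetical per-cache counters; not the fields used by the actual patch. */
struct cache_stats {
	long nitems;	/* items currently cached */
	long peak;	/* highest nitems seen in this period */
	long max_batch;	/* largest allocation drawdown from the peak */
};

/* An allocation takes n items from the cache and may grow the estimate. */
static void
cache_alloc(struct cache_stats *cs, long n)
{
	assert(n >= 0 && n <= cs->nitems);
	cs->nitems -= n;
	if (cs->peak - cs->nitems > cs->max_batch)
		cs->max_batch = cs->peak - cs->nitems;
}

/* A free returns n items; it raises the peak but never the estimate. */
static void
cache_free(struct cache_stats *cs, long n)
{
	assert(n >= 0);
	cs->nitems += n;
	if (cs->nitems > cs->peak)
		cs->peak = cs->nitems;
}
```

In this sketch a single very large cache_free() call leaves max_batch untouched, so a burst of frees like the ZFS one described above would not inflate the working set estimate.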

Update the working set size on every reclamation call, shrinking caches faster under pressure. Without this, repeated vm_lowmem events squeezed more and more memory out of real consumers only to leave it stuck in UMA caches. I saw ZFS drop its ARC size by half before the previous algorithm, after a periodic WSS update, decided to reclaim UMA caches.
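The effect of refreshing the estimate on each reclamation call can be sketched as follows. This is only an illustration under stated assumptions: the names are hypothetical, and the 3/4 decay factor is an assumed value picked for the example, not something stated in this summary.

```c
#include <assert.h>

/* Hypothetical cache state; not the structure used by the actual patch. */
struct zcache {
	long nitems;	/* items currently cached */
	long wss;	/* working set size estimate, in items */
	long max_batch;	/* largest allocation batch since the last update */
};

/* Decay the old estimate (assumed factor 3/4), but never below recent demand. */
static void
wss_update(struct zcache *zc)
{
	long decayed = zc->wss * 3 / 4;

	zc->wss = decayed > zc->max_batch ? decayed : zc->max_batch;
	zc->max_batch = 0;
}

/* Refresh the estimate first, then free everything above it. */
static long
reclaim_trim(struct zcache *zc)
{
	long excess;

	assert(zc->nitems >= 0);
	wss_update(zc);
	excess = zc->nitems - zc->wss;
	if (excess <= 0)
		return (0);
	zc->nitems -= excess;
	return (excess);
}
```

Because the update happens inside reclaim_trim() rather than only on a periodic timer, back-to-back calls under memory pressure keep lowering the estimate and free more each time, as long as no new allocation batches arrive in between.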

Introduce voluntary reclamation of UMA caches that have not been used for a long time. For each zdom, track a long-term minimum cache size watermark, freeing some unused items every UMA_TIMEOUT after the first 15 minutes without cache misses. Freed memory can be put to better use by other consumers. For example, ZFS won't grow its ARC unless it sees free memory, since it does not know that the cached memory is not really in use. And even if the memory is not really needed, periodically freeing it during periods of inactivity should reduce its fragmentation.
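A minimal sketch of the watermark tracking, assuming a 20 second period in place of UMA_TIMEOUT and a plain running minimum. The struct, field names, and period_tick() are hypothetical and simplified for the example; the patch itself may compute the watermark differently.

```c
#include <assert.h>

#define PERIOD_SECONDS	20				/* stands in for UMA_TIMEOUT */
#define QUIET_PERIODS	(15 * 60 / PERIOD_SECONDS)	/* 15 minutes of periods */

/* Hypothetical per-domain state; not the fields used by the actual patch. */
struct zdom_sketch {
	long nitems;	/* items currently cached */
	long imin;	/* lowest nitems seen during the current period */
	long limin;	/* long-term minimum watermark */
	long quiet;	/* consecutive periods in which the cache never ran empty */
};

/*
 * Called once per period.  Returns 1 once the cache has gone 15 minutes
 * without running empty, meaning up to limin items were never needed.
 */
static int
period_tick(struct zdom_sketch *zd)
{
	assert(zd->imin >= 0 && zd->imin <= zd->nitems);
	if (zd->imin > 0) {
		/* Items were left over: lower the watermark if needed. */
		if (zd->quiet == 0 || zd->imin < zd->limin)
			zd->limin = zd->imin;
		zd->quiet++;
	} else {
		/* The cache ran empty: restart the quiet interval. */
		zd->limin = 0;
		zd->quiet = 0;
	}
	zd->imin = zd->nitems;	/* start a new period */
	return (zd->quiet >= QUIET_PERIODS);
}
```

Once period_tick() returns 1, a caller could free some portion of limin items each period, since the cache never dipped below that level during the whole quiet interval.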

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

mav requested review of this revision. Apr 16 2021, 3:54 AM
mav created this revision.

I'm still trying to understand the patch, so I don't have many good comments yet. A block comment explaining the scheme would be helpful.

I like the idea behind the change to zone_domain_imax_set(), I can see why that fixes the interaction with the ARC's lowmem handler. I'm not sure that uzd_imin has much purpose anymore? Overall I wonder if this can be simplified somewhat.

sys/vm/uma_core.c
1165

I'm not sure about performing potentially large bursts of work in a callout thread. I suspect it would be better to reclaim smaller amounts more frequently, or have some absolute limit on the number of items reclaimed.

1184

This loop is identical to the one in bucket_cache_reclaim_domain(), I think it should become a separate function.

sys/vm/uma_core.c
1165

To be more concrete, reclamation from the abd chunk zone takes hundreds of ms on a system running postgres with 64GB of RAM. Reclamation from zones with slab size > PAGE_SIZE can also be quite expensive since they must be unmapped (with a corresponding TLB invalidation) before being freed.

Rather than doing this work in one go, I think that waiting some time for the WSSE to stabilize, and then reclaiming a small portion every UMA_TIMEOUT, would be more effective at preventing latency spikes. Alternatively, we could extend the uma_reclaim() thread to handle this work, so that callouts are not deferred.
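The incremental approach suggested here could look roughly like the sketch below. It is purely illustrative: reclaim_step() and RECLAIM_CHUNK_MAX are hypothetical, and the cap of 256 items is an arbitrary value chosen for the example.

```c
#include <assert.h>

#define RECLAIM_CHUNK_MAX	256	/* hypothetical cap on items freed per tick */

/*
 * Free at most RECLAIM_CHUNK_MAX items toward the target on each call.
 * Returns the number of items freed this tick.
 */
static long
reclaim_step(long *nitems, long target)
{
	long excess, nfree;

	assert(target >= 0);
	excess = *nitems - target;
	if (excess <= 0)
		return (0);
	nfree = excess < RECLAIM_CHUNK_MAX ? excess : RECLAIM_CHUNK_MAX;
	*nitems -= nfree;
	return (nfree);
}
```

Calling this once per timer tick bounds the work done in any single callout, at the cost of taking several ticks to reach the target.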

sys/vm/uma_core.c
1165

We definitely want to do smaller units of work more frequently. It will reduce effects from poor estimation as well as limiting latency caused by lock hold times and high priority callout work. I have always thought we should be constantly pruning towards wss with a variable buffer according to memory pressure.

We have enough zones now that it probably makes sense to have multiple callouts and stagger them as well.

mav edited the summary of this revision.

I've modified the patch to address the comments:

  • I've made reclamation run in smaller chunks every 20 seconds after the first 15 minutes without cache overflows, instead of once every 30 minutes.
  • To reduce the chance of overflows, I've made uzd_limin include the uzd_wss value from the same time. uzd_wss decays much faster than uzd_limin, so adding its present value to a uzd_limin from an hour ago makes less sense.
  • I've reused bucket_cache_reclaim_domain() for all three kinds of reclamation (drain, trim, and voluntary) to avoid code duplication.
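
One way to picture a single routine serving all three kinds of reclamation is to let the mode select only the target cache size. The sketch below is an assumption-laden illustration, not the body of bucket_cache_reclaim_domain(): the enum, struct, and function names are hypothetical, and freeing a quarter of the long-term minimum in the voluntary case is an assumed fraction.

```c
#include <assert.h>

/* Hypothetical mode selector for the three kinds of reclamation. */
enum reclaim_mode { RECLAIM_DRAIN, RECLAIM_TRIM, RECLAIM_VOLUNTARY };

/* Hypothetical per-domain view; not the structure used by the patch. */
struct zdom_view {
	long nitems;	/* items currently cached */
	long wss;	/* working set size estimate */
	long limin;	/* long-term minimum watermark */
};

/* Number of items the cache should keep for the given mode. */
static long
reclaim_target(const struct zdom_view *zd, enum reclaim_mode mode)
{
	long target = zd->nitems;

	assert(zd->nitems >= 0 && zd->limin >= 0);
	switch (mode) {
	case RECLAIM_DRAIN:
		target = 0;			/* empty the cache */
		break;
	case RECLAIM_TRIM:
		target = zd->wss;		/* keep the working set */
		break;
	case RECLAIM_VOLUNTARY:
		/* Assumed fraction: give back a quarter of the unused items. */
		target = zd->nitems - zd->limin / 4;
		break;
	}
	return (target < 0 ? 0 : target);
}

/* Shared free path: everything above the target is released. */
static long
reclaim(struct zdom_view *zd, enum reclaim_mode mode)
{
	long target = reclaim_target(zd, mode);
	long nfree = zd->nitems > target ? zd->nitems - target : 0;

	zd->nitems -= nfree;
	return (nfree);
}
```

With this shape the loop that actually frees items exists once, and the three callers differ only in the mode they pass.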
mav marked 4 inline comments as done. Apr 30 2021, 1:37 AM
This revision was not accepted when it landed; it landed in state Needs Review. May 2 2021, 11:45 PM
This revision was automatically updated to reflect the committed changes.