Run lowmem handlers before the PID controller.
Closed, Public

Authored by markj on Aug 6 2018, 3:49 PM.

Details

Summary

Before the rework of the page daemon control loop, we would compute the
inactive queue scan target after running lowmem handlers, which will
generally free pages (i.e., shrink the difference
vmd_free_target - vmd_free_count). Now, we run the PID controller
before running lowmem handlers, so we may overshoot the
target. On systems where many pages are reclaimed by lowmem handlers
(e.g., the ARC), the current behaviour may lead to excessive swapping in
response to pressure to reclaim pages from the inactive queue. To fix
this, predicate execution of lowmem handlers on a positive error in the
PID controller.

Also fix a related problem: currently, lowmem handlers are only ever
executed by the domain 0 page daemon. This means that lowmem handlers
only ever get executed in response to a shortage of free pages in domain
0. Allow any page daemon thread to execute lowmem handlers, and use an
atomic to ensure that only one attempt is made per lowmem period. To do
this, I switched back to using ticks; lowmem_uptime is a time_t, and the
platform-dependent width of this type makes it difficult to use atomics.
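
A minimal userspace sketch of the "one attempt per lowmem period" idea, using C11 atomics rather than the kernel's atomic(9) primitives. The names lowmem_claim, lowmem_ticks_sketch and LOWMEM_PERIOD_TICKS are hypothetical and are not taken from the patch; the point is only that a fixed-width tick counter can be claimed with a single compare-and-swap.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: length of one lowmem period, expressed in ticks. */
#define LOWMEM_PERIOD_TICKS 1000u

/* Tick value at which the last lowmem attempt was claimed. */
static _Atomic unsigned int lowmem_ticks_sketch;

/*
 * Return true for at most one caller per period. Any page daemon
 * thread may call this; the compare-and-swap ensures that if several
 * threads race with the same stale value, only one of them wins.
 */
static bool
lowmem_claim(unsigned int now)
{
	unsigned int last = atomic_load(&lowmem_ticks_sketch);

	/* Unsigned subtraction keeps tick wraparound well defined. */
	if (now - last < LOWMEM_PERIOD_TICKS)
		return (false);
	return (atomic_compare_exchange_strong(&lowmem_ticks_sketch,
	    &last, now));
}
```

A thread that loses the race simply skips the handlers for this period. A time_t counter would need a width-specific atomic on each platform, which is the difficulty the summary mentions.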

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 18609
Build 18300: arc lint + arc unit

Event Timeline

This revision is now accepted and ready to land. Aug 6 2018, 9:53 PM

I would suggest a different approach. Estimate the number of pages freed by vm_pageout_lowmem() by computing the difference in vmd_free_count before and after calling vm_pageout_lowmem(), and reduce shortage by that estimate so that vm_pageout_scan_inactive() frees fewer pages.

Here is the intuition behind this suggestion. Suppose that the machine is reclaiming and allocating pages at a steady rate. The PID controller's integral term will dominate the output value, and the error term will be near zero on every invocation. Essentially, the PID controller's integral term divided by KID will be the average number of page allocations (and reclamations) per interval. Your approach is going to reduce that integral term even though the average rate of allocation hasn't really changed. So, on the next PID controller invocation, fewer pages will be freed.
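
The steady-state argument above can be illustrated with a deliberately simplified PID step. This is a sketch, not FreeBSD's pidctrl code: the struct, the function name, and the clamping rule are assumptions, and the gain divisor called Kid here is assumed to correspond to the KID mentioned in the comment.

```c
/* Simplified PID state; all names here are hypothetical. */
struct pid_sketch {
	int setpoint;		/* target free page count */
	int integral;		/* accumulated error, clamped at zero */
	int olderror;		/* error seen in the previous interval */
	int Kpd, Kid, Kdd;	/* gain divisors, assumed nonzero */
};

/*
 * One controller interval: return the number of pages to free given
 * the current free page count.
 */
static int
pid_step(struct pid_sketch *pc, int free_count)
{
	int error, derivative;

	error = pc->setpoint - free_count;
	derivative = error - pc->olderror;
	pc->integral += error;
	if (pc->integral < 0)
		pc->integral = 0;
	pc->olderror = error;
	return (error / pc->Kpd + pc->integral / pc->Kid +
	    derivative / pc->Kdd);
}
```

With the error at zero, the output is just integral / Kid, i.e., the steady per-interval reclamation rate. If one interval observes a free count above the setpoint because lowmem handlers ran first, the negative error is subtracted from the integral, so later intervals request fewer pages even though the allocation rate has not changed.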

  • Run the PID controller before lowmem handlers, but subtract any difference in the free page count from the scan target.
This revision now requires review to proceed. Aug 8 2018, 7:43 PM
  • Remove the now-unused pidctrl_error().
This revision is now accepted and ready to land. Aug 9 2018, 4:21 PM
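
The updated approach (run the PID controller first, then subtract whatever the lowmem handlers freed from the scan target) might look roughly like the following userspace sketch. The types and names domain_sketch, lowmem_fn, example_lowmem and adjust_shortage are hypothetical stand-ins, not the kernel's structures; only the before/after sampling of the free page count comes from the discussion.

```c
/* Hypothetical stand-in for per-domain state: only the free count. */
struct domain_sketch {
	unsigned int free_count;
};

/* Hypothetical handler type: frees pages and updates free_count. */
typedef void (*lowmem_fn)(struct domain_sketch *);

/* Example handler for illustration: pretends to free 100 pages. */
static void
example_lowmem(struct domain_sketch *d)
{
	d->free_count += 100;
}

/*
 * Given the shortage already computed by the controller, run the
 * lowmem handlers and reduce the shortage by the observed growth in
 * the free page count, never going below zero.
 */
static int
adjust_shortage(struct domain_sketch *d, int shortage, lowmem_fn lowmem)
{
	unsigned int before, after, freed;

	if (shortage <= 0)
		return (shortage);	/* no shortage: skip the handlers */
	before = d->free_count;
	lowmem(d);
	after = d->free_count;
	if (after > before) {
		freed = after - before;
		if (freed >= (unsigned int)shortage)
			shortage = 0;
		else
			shortage -= (int)freed;
	}
	return (shortage);
}
```

Because the controller has already consumed the pre-handler free count, its integral term is not disturbed; only the inactive queue scan target for this interval shrinks.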

At the moment, I think that this is the best approach. However, on a NUMA machine, the vm_pageout_lowmem() handlers have a system-wide effect, and so the non-invoking domains will be subject to the issue that I described earlier, i.e., their vmd_free_count's will be reduced by the system-wide effect of vm_pageout_lowmem(), and so it will appear to them that there has been a change in the rate of page allocation on their domains.

How hard will it be to make uma_reclaim() NUMA aware?

In D16606#353726, @alc wrote:

At the moment, I think that this is the best approach. However, on a NUMA machine, the vm_pageout_lowmem() handlers have a system-wide effect, and so the non-invoking domains will be subject to the issue that I described earlier, i.e., their vmd_free_count's will be reduced by the system-wide effect of vm_pageout_lowmem(), and so it will appear to them that there has been a change in the rate of page allocation on their domains.

How hard will it be to make uma_reclaim() NUMA aware?

UMA zones now have per-domain full bucket caches, but those caches are for allocations to a specific domain, rather than from a specific domain. Thus a given bucket will generally contain items from multiple domains, and if we were to free only a subset of the items we'd be caching partially full buckets. Then, to compute the domain of the slab for a given item we may require the keg lock, which is awkward since we currently need to drop the keg lock before releasing any item to the keg. I think these problems could be solved, but it's not as straightforward as it could be.

BTW, I'm finishing up a patch that Jeff and I discussed to compute a working set size (WSS) estimate for each (per-domain) bucket cache. When the page daemon invokes the lowmem handlers, it'll only reclaim cached items in excess of the estimate, rather than freeing the entire cache. (If there is a severe free page shortage, we will still free the entire cache.) I hope to put the patch up for review today or tomorrow.
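
The reclaim policy described above can be sketched as a single decision function. This is only an illustration: the function name and parameters are hypothetical, and how the WSS estimate itself is computed is not described here, so it is taken as an input.

```c
#include <stdbool.h>

/*
 * Number of cached items to release from one bucket cache.
 * Normally only the items beyond the working set size estimate are
 * released; under a severe free page shortage the whole cache goes.
 */
static unsigned long
bucket_cache_reclaim_target(unsigned long cached,
    unsigned long wss_estimate, bool severe_shortage)
{
	if (severe_shortage)
		return (cached);
	if (cached > wss_estimate)
		return (cached - wss_estimate);
	return (0);
}
```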

This revision was automatically updated to reflect the committed changes.