[RFC]: decrement needfree by the amount of evicted cache
ClosedPublic

Authored by nikita_elyzion.net on Aug 30 2017, 10:42 AM.

Details

Summary

In some rare cases, under a heavy and constant load lasting several minutes, my ARC is almost emptied.

After digging into arc.c, I found that once the variable needfree is set by arc_lowmem, it is cleared only when we reach the test "if (arc_size <= arc_c || evicted == 0)" in arc_reclaim_thread().
But in my case, the test seems to stay false for a long period: arc_size remains greater than arc_c (because I have a lot of I/O) and arc_adjust() always succeeds in freeing some bytes.
A screen capture of the ARC stats during the bug is available at https://elyzion.net/~nikita/arc_reclaim.png (in the interactive graph you can see that arc_size is always slightly larger than arc_c). Unfortunately, I did not have dtrace enabled the last time I encountered this bug.

I was also considering, instead of decrementing needfree, waiting until we have freed at least arc_c >> arc_no_grow_shift bytes before resetting needfree to 0.
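
To make the difference concrete, here is a toy model of the reclaim loop. It is only a sketch: struct arc_model, pass_original(), pass_proposed() and passes_until_clear() are made-up names for illustration, not code from arc.c, and the "proposed" variant is just one reading of "decrement needfree by the amount evicted". It assumes the cache stays above its target on every pass because new I/O refills what was evicted.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; these names are illustrative and do not exist in arc.c. */
struct arc_model {
	int64_t size;     /* stands in for arc_size */
	int64_t target;   /* stands in for arc_c */
	int64_t needfree; /* bytes requested by the low-memory handler */
};

typedef void (*pass_fn)(struct arc_model *, int64_t);

/* Behaviour described above: needfree is cleared only by this test. */
static void
pass_original(struct arc_model *m, int64_t evicted)
{
	if (m->size <= m->target || evicted == 0)
		m->needfree = 0;
}

/* Assumed variant: otherwise subtract what was evicted, clamping at zero. */
static void
pass_proposed(struct arc_model *m, int64_t evicted)
{
	if (m->size <= m->target || evicted == 0)
		m->needfree = 0;
	else
		m->needfree = (evicted >= m->needfree) ? 0 : m->needfree - evicted;
}

/*
 * Count reclaim passes until needfree reaches 0, giving up after max_passes.
 * The size is held just above the target to mimic constant I/O.
 */
static int
passes_until_clear(pass_fn pass, int64_t needfree, int64_t evicted,
    int max_passes)
{
	assert(needfree >= 0 && max_passes > 0);
	struct arc_model m = { .size = 0, .target = 1 << 20, .needfree = needfree };
	int n = 0;

	m.size = m.target + 1;
	while (m.needfree > 0 && n < max_passes) {
		pass(&m, evicted);
		n++;
	}
	return (n);
}
```

In this model, passes_until_clear(pass_original, 1000, 100, 50) runs into the 50-pass limit with needfree still set, while passes_until_clear(pass_proposed, 1000, 100, 50) clears it after 10 passes.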

Notes:

  • In illumos, needfree is a tunable set directly by their vm_page.c, so they don't have this issue.
  • I have not tested it extensively enough, and I'm pretty new to ZFS.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

On a quick look I do see a valid point in this patch: in illumos, needfree is indeed reduced again as soon as the allocation that increased it succeeds, but that doesn't happen in our case. However, I am too busy right now to look deeper from all possible sides.

Can you try to experiment with removing needfree altogether?
I mean both the code that sets it (in arc_lowmem and in arc_reclaim_thread) and the code that uses it (in arc_available_memory and in arc_reclaim_thread).
FreeBSD needfree does not appear to be very useful.

In D12163#252767, @avg wrote:

Can you try to experiment with removing needfree altogether?
I mean both the code that sets it (in arc_lowmem and in arc_reclaim_thread) and the code that uses it (in arc_available_memory and in arc_reclaim_thread).
FreeBSD needfree does not appear to be very useful.

I may try something like that, yes. But we will still need something to signal to arc_available_memory() that arc_lowmem() was called and that we need to free some memory for a reason unrelated to ZFS itself, correct?

For what it's worth, I took a look at the ZoL implementation too; they use the Linux shrinker API for this case.
In short, they also have a kind of arc_lowmem() callback, which is called when the kernel is under high memory pressure. The result is quite similar to the "needfree" we already have.

I think that just waking up the ARC reclaim thread is enough of a signal. If it sees a system resource shortage, it acts; if the shortage has already cleared, there is nothing to do.
In illumos itself, 'needfree' is a live variable managed by the VM subsystem, and its value can change significantly before the reclaim thread examines it.
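
As a rough sketch of that idea: once woken, the thread would derive the shortage from the current page counters instead of from a latched flag. shortage_bytes() and its parameters are hypothetical names used only for illustration; the real check lives in arc_available_memory().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of arc.c: compute how many bytes the system
 * is short of its free-page target right now. Returns 0 when the shortage
 * has already cleared, in which case the woken thread has nothing to do.
 */
static int64_t
shortage_bytes(int64_t free_pages, int64_t free_target_pages,
    int64_t page_size)
{
	assert(page_size > 0);
	int64_t available = (free_pages - free_target_pages) * page_size;

	return ((available < 0) ? -available : 0);
}
```

With 4096-byte pages, being 18 pages under the target gives shortage_bytes(1000, 1018, 4096) == 73728, and any free count at or above the target gives 0.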

I took some time to dig further into the code of arc_adjust()/arc_shrink()/arc_kmem_reap_now() and the pagedaemon/vmmeter, and avg's suggestion seems good.
freemem is actually #define freemem vm_cnt.v_free_count (maybe we can use another stat from vmmeter/pagedaemon to replace needfree with another define?)
I have also improved my test to reproduce the issue more easily, and removing the needfree variable seems good.

Some output from dtrace while I'm reproducing the vm_lowmem signal under a heavy I/O load:

 12  36374                none:arc-needfree arc needfree freemem:-73728  (what is actually displayed here is `(freemem - zfs_arc_free_target) * PAGESIZE`, as returned by arc_available_memory later in arc_reclaim_thread)
 25  36355                  none:arc-shrink arc shrink arc_c:246960619520, arc_p:240359950336, to_free:271495168  (to_free = (arc_c >> arc_shrink_shift) - (freemem - zfs_arc_free_target) * PAGESIZE)
 25  36356                  none:arc-shrunk arc shrunk arc_c:246345166592, arc_p:239421044280
 36  36355                  none:arc-shrink arc shrink arc_c:246345166592, arc_p:239421044280, to_free:850637039
 36  36356                  none:arc-shrunk arc shrunk arc_c:244792145312, arc_p:238485805826
  9  36355                  none:arc-shrink arc shrink arc_c:244792145312, arc_p:238485805826, to_free:1591812021
  9  36356                  none:arc-shrunk arc shrunk arc_c:243200333291, arc_p:237554220647
  9  36357           none:arc-shrink_adjust arc shrink adjust arc_size:245134144992, arc_c:243200333291
 10  36355                  none:arc-shrink arc shrink arc_c:243200333291, arc_p:237554220647, to_free:986590869
 10  36356                  none:arc-shrunk arc shrunk arc_c:242213742422, arc_p:236626274473
 10  36357           none:arc-shrink_adjust arc shrink adjust arc_size:243900233208, arc_c:242213742422
 10  36355                  none:arc-shrink arc shrink arc_c:242213742422, arc_p:236626274473, to_free:94314599
 10  36356                  none:arc-shrunk arc shrunk arc_c:241973429384, arc_p:235701953089
 13  36355                  none:arc-shrink arc shrink arc_c:241973429384, arc_p:235701953089, to_free:137424260
 13  36356                  none:arc-shrunk arc shrunk arc_c:239688397056, arc_p:234781242335
  1  36374                none:arc-needfree arc needfree freemem:-666877952
 18  36355                  none:arc-shrink arc shrink arc_c:239688397056, arc_p:234781242335, to_free:1578044081
 18  36356                  none:arc-shrunk arc shrunk arc_c:238110352975, arc_p:233864128108
 18  36357           none:arc-shrink_adjust arc shrink adjust arc_size:240624901016, arc_c:238110352975
 32  36355                  none:arc-shrink arc shrink arc_c:238110352975, arc_p:233864128108, to_free:1813457830
 32  36356                  none:arc-shrunk arc shrunk arc_c:236296895145, arc_p:232950596358
 32  36357           none:arc-shrink_adjust arc shrink adjust arc_size:238836091176, arc_c:236296895145
.....
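
For reference, the to_free value in the arc-shrink lines can be sketched as below. to_free_bytes() is a made-up helper, and the shift of 7 used in the example is an assumption rather than a value taken from the trace. The parentheses around the shift matter: in C, subtraction binds tighter than >>.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the formula noted in the trace:
 *   to_free = (arc_c >> arc_shrink_shift) - available
 * where available is (freemem - zfs_arc_free_target) * PAGESIZE and is
 * negative while the system is short on memory.
 */
static int64_t
to_free_bytes(int64_t arc_c, int shrink_shift, int64_t available)
{
	assert(arc_c >= 0 && shrink_shift >= 0 && shrink_shift < 63);
	return ((arc_c >> shrink_shift) - available);
}
```

For example, with arc_c = 1 GiB, an assumed shift of 7 and available = -73728, the result is 8388608 + 73728 = 8462336 bytes: a negative available value increases the amount to free.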

The dtrace script that generates it is trivial:

#!/usr/sbin/dtrace -s
arc-shrink { printf("arc shrink arc_c:%d, arc_p:%d, to_free:%d", args[0], args[2], args[3]) }
arc-shrunk { printf("arc shrunk arc_c:%d, arc_p:%d", args[0], args[1]) }
arc-shrink_adjust { printf("arc shrink adjust arc_size:%d, arc_c:%d", args[0], args[1]) }
arc-needfree { printf("arc needfree freemem:%d", args[0] ) }

The low-memory condition was triggered with a dd into a tmpfs during the load.
I'm not sure my test is sufficient; if you have suggestions for testing it further, I can take some time to do so.

This revision is now accepted and ready to land. Sep 15 2017, 12:58 PM
This revision was automatically updated to reflect the committed changes.

On head there seems to have been very little backpressure on the ARC lately, to the point of needing extreme amounts of swap. On my system with 78 GB of RAM and the ARC limited to 40 GB, the ARC can be pegged at 40 GB while swap use is in the 50 GB range for package builds in tmpfs. The ARC never seems to shrink at all. This commit was partially implicated in a thread on freebsd-current: https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068888.html. Could it be a problem?