
kern/nullfs: Implement the recycling on the nullfs nodes.
AbandonedPublic

Authored by seigo.tanimura_gmail.com on Mar 1 2024, 5:41 PM.
Tags
None

Details

Reviewers
None
Summary

A nullfs upper vnode holds a use count on its lower vnode. This means that the
lower vnode cannot be recycled until the upper one gets recycled. This
behaviour causes trouble for system-wide vnode recycling, e.g. by vnlru and
ZFS.

This commit adds recycling of the nullfs vnodes, so that the lower vnodes
get released from nullfs and become ready for recycling. As of now, the trigger
is the vm_lowmem kernel event.

In order to let some nullfs vnodes survive continuous recycling, this
implementation recycles up to a tunable ratio of the recyclable nodes. The
ratio is chosen by the vm_lowmem flag as follows (a sketch of the trigger
follows the list):

  • VM_LOW_PAGES: 20% (sysctl OID: vfs.nullfs.recycle.lowpages)
  • VM_LOW_KMEM: 80% (sysctl OID: vfs.nullfs.recycle.lowkmem)
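
A minimal sketch of how this trigger could be wired up, assuming a hypothetical nullfs_recycle() helper that walks the nullfs nodes and recycles up to the requested percentage, and assuming the VM_LOW_* flags come from <vm/vm_pageout.h>; the actual patch may differ in names and structure:

    /* Illustrative sketch only; nullfs_recycle() is a hypothetical helper. */
    #include <sys/param.h>
    #include <sys/eventhandler.h>
    #include <sys/kernel.h>
    #include <vm/vm.h>
    #include <vm/vm_pageout.h>      /* VM_LOW_PAGES, VM_LOW_KMEM */

    u_int null_recycle_lowpages = 20;   /* vfs.nullfs.recycle.lowpages */
    u_int null_recycle_lowkmem = 80;    /* vfs.nullfs.recycle.lowkmem */

    static void
    null_lowmem(void *arg __unused, int flags)
    {
        u_int ratio;

        /* Pick the recycling ratio from the vm_lowmem flag. */
        ratio = (flags & VM_LOW_KMEM) != 0 ?
            null_recycle_lowkmem : null_recycle_lowpages;

        /* Recycle up to 'ratio' percent of the recyclable nullfs vnodes. */
        nullfs_recycle(ratio);
    }

    static void
    null_lowmem_init(void *arg __unused)
    {
        EVENTHANDLER_REGISTER(vm_lowmem, null_lowmem, NULL,
            EVENTHANDLER_PRI_ANY);
    }
    SYSINIT(null_lowmem, SI_SUB_VFS, SI_ORDER_ANY, null_lowmem_init, NULL);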

The in-use nullfs vnodes are tracked in the same way as in ZFS.

  • Tested Case
    • poudriere-bulk(8) of 2128 ports by 8 builders.
  • Observed Improvements
    • The ARC dnode size no longer keeps increasing.
      • The vnode count has stopped increasing as well.
      • The ARC dnode size tends to decrease during the long-running builds.
    • The ARC evictable portion remains throughout the build.
      • The data is almost completely evictable.
  • New Sysctl(3) OIDs (sketched below)
    • vfs.nullfs.nodes: the total number of nullfs vnodes.
    • vfs.nullfs.inuse: the number of nullfs vnodes in use.
    • vfs.nullfs.recycle.calls: the number of nullfs recycle execution requests.
    • vfs.nullfs.recycle.lowpages: the ratio of nullfs vnodes recycled upon the VM_LOW_PAGES vm_lowmem kernel event.
    • vfs.nullfs.recycle.lowkmem: the ratio of nullfs vnodes recycled upon the VM_LOW_KMEM vm_lowmem kernel event.
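
These OIDs could be exposed roughly as follows; the variable names are assumptions for illustration, and the node/in-use counts would presumably be updated where nullfs creates and reclaims its vnodes (null_nodeget() and null_reclaim()), in the same counting style as the ZFS tracking mentioned above:

    /* Illustrative sketch of the sysctl tree; not necessarily the exact patch. */
    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    static SYSCTL_NODE(_vfs, OID_AUTO, nullfs, CTLFLAG_RW | CTLFLAG_MPSAFE, 0,
        "nullfs");
    static SYSCTL_NODE(_vfs_nullfs, OID_AUTO, recycle,
        CTLFLAG_RW | CTLFLAG_MPSAFE, 0, "nullfs vnode recycling");

    /* Updated with atomic_add_long(), e.g. in null_nodeget() and null_reclaim(). */
    static u_long null_nodes;
    static u_long null_inuse;
    static u_long null_recycle_calls;

    /* The percentage tunables from the vm_lowmem sketch above. */
    extern u_int null_recycle_lowpages, null_recycle_lowkmem;

    SYSCTL_ULONG(_vfs_nullfs, OID_AUTO, nodes, CTLFLAG_RD, &null_nodes, 0,
        "Total number of nullfs vnodes");
    SYSCTL_ULONG(_vfs_nullfs, OID_AUTO, inuse, CTLFLAG_RD, &null_inuse, 0,
        "Number of nullfs vnodes in use");
    SYSCTL_ULONG(_vfs_nullfs_recycle, OID_AUTO, calls, CTLFLAG_RD,
        &null_recycle_calls, 0, "Number of nullfs recycle execution requests");
    SYSCTL_UINT(_vfs_nullfs_recycle, OID_AUTO, lowpages, CTLFLAG_RWTUN,
        &null_recycle_lowpages, 0,
        "Percentage of nullfs vnodes recycled on VM_LOW_PAGES");
    SYSCTL_UINT(_vfs_nullfs_recycle, OID_AUTO, lowkmem, CTLFLAG_RWTUN,
        &null_recycle_lowkmem, 0,
        "Percentage of nullfs vnodes recycled on VM_LOW_KMEM");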

PR: 275594
Security: FreeBSD-EN-23:18.openzfs
Signed-off-by: Seigo Tanimura <seigo.tanimura@gmail.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 56376
Build 53264: arc lint + arc unit

Event Timeline

Target branches: main, stable/14, releng/14.0

If caching of nullfs vnodes creates a problem, it should be disabled on the problematic system, might be for specific mount, using -o nocache.
I do not see why lowmem handler should do the additional vnlru scans for nullfs.

Hello @kib,

Thanks for your input.

In D44177#1008037, @kib wrote:

If caching of nullfs vnodes creates a problem, it should be disabled on the problematic system, might be for specific mount, using -o nocache.
I do not see why lowmem handler should do the additional vnlru scans for nullfs.

The essential problem with nullfs vnode caching over ZFS is that the cached vnodes are not maintained at all until the nullfs filesystem gets unmounted. In general, cached data should be recyclable whenever free memory runs short. When there is sufficient free memory, however, it should be fine to keep the cache for the performance improvement.

An advantage of this fix design is that applications do not have to bother with the details of the filesystem design, and they enjoy the improved performance when they can. In my use case (poudriere), the nullfs filesystems are mounted over ZFS as part of the poudriere bulk command execution. Since poudriere has no configuration for additional nullfs mount options, the poudriere scripts would have to be updated to disable the nullfs vnode caching.

If there is still a concern, we can disable the nullfs vnode caching over ZFS by setting MNTK_NULL_NOCACHE on a ZFS filesystem upon mounting. This should be configurable via sysctl, however, because the nullfs vnode caching over ZFS does not cause any trouble on its own. The nfs client and fusefs use MNTK_NULL_NOCACHE, but that is because those filesystems may be modified outside of the nullfs filesystems stacked on them and hence make the cached nullfs vnodes incoherent or invalid. I believe such a limitation does not apply to ZFS.
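
For illustration, a knob along the lines discussed here could look roughly like this; the vfs.zfs.null_nocache name and the helper are hypothetical, and only MNTK_NULL_NOCACHE itself is an existing VFS flag:

    /* Hypothetical sketch; the sysctl name and helper are illustrative only. */
    #include <sys/param.h>
    #include <sys/mount.h>
    #include <sys/sysctl.h>

    SYSCTL_DECL(_vfs_zfs);

    static int zfs_null_nocache = 0;
    SYSCTL_INT(_vfs_zfs, OID_AUTO, null_nocache, CTLFLAG_RWTUN,
        &zfs_null_nocache, 0,
        "Ask nullfs not to cache vnodes over ZFS lower mounts");

    /*
     * Called from the ZFS mount path; nfs and fusefs set the flag
     * unconditionally, this variant makes it a sysctl-controlled choice.
     */
    static void
    zfs_set_null_nocache(struct mount *mp)
    {
        if (zfs_null_nocache != 0) {
            MNT_ILOCK(mp);
            mp->mnt_kern_flag |= MNTK_NULL_NOCACHE;
            MNT_IUNLOCK(mp);
        }
    }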

Putting fuse and nfs aside.

I think the best route is to add some zfs fs property (I do not know what is the proper way) to set MNTK_NULL_NOCACHE on that fs.

Hello @kib,

In D44177#1008050, @kib wrote:

Putting fuse and nfs aside.

I think the best route is to add some zfs fs property (I do not know what is the proper way) to set MNTK_NULL_NOCACHE on that fs.

If that is the best way, it should be implemented in VFS rather than in ZFS or any specific filesystem type. The issue can happen on any filesystem type in theory, e.g. ffs; it is just that ZFS is likely to hit the issue more easily.


From the user's point of view, I want to avoid solutions that make the configuration more complex for the user. Per-filesystem configuration of MNTK_NULL_NOCACHE carries that risk, and that is what my fix design avoids.

Although not implemented for now, the nullfs vnodes should also be recycled when they reach half of desiredvnodes, the theoretical upper bound.

nullfs should be served, and is served by normal vnlru thread. If you need something special, -o nocache is the solution.

In D44177#1008409, @kib wrote:

nullfs should be served, and is served by normal vnlru thread. If you need something special, -o nocache is the solution.

Then is it OK to make the nullfs lower vnode caching configurable system-wide via sysctl?

Applications that mount nullfs filesystems dynamically can then enjoy the solution without having to be modified.


I still have to say there is something special about nullfs and, in general, the vnode stacking feature that may require particular care. It introduces an inter-vnode dependency that is not seen in ordinary filesystems.
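
As a rough sketch of the system-wide knob proposed here (the vfs.nullfs.cache_lower name is purely illustrative): the nullfs mount routine already decides a per-mount NULLM_CACHE flag, so a global default consulted at mount time would let dynamically created nullfs mounts pick up the setting without any change to the applications.

    /* Hypothetical knob; nullfs keeps the per-mount NULLM_CACHE flag today. */
    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    SYSCTL_DECL(_vfs_nullfs);

    /*
     * System-wide default for caching lower vnodes.  The nullfs mount
     * routine would consult this when setting NULLM_CACHE, unless the
     * mount passes "cache" or "nocache" explicitly.
     */
    static int null_cache_lower = 1;
    SYSCTL_INT(_vfs_nullfs, OID_AUTO, cache_lower, CTLFLAG_RWTUN,
        &null_cache_lower, 0,
        "Cache lower vnodes held by nullfs (system-wide default)");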

In D44177#1008409, @kib wrote:

nullfs should be served, and is served by normal vnlru thread. If you need something special, -o nocache is the solution.

Then is it OK to make the nullfs lower vnode caching configurable system-wide via sysctl?

Applications that mount nullfs filesystems dynamically can then enjoy the solution without having to be modified.

I have implemented the new fix as above and submitted the new reviews.