Page MenuHomeFreeBSD

vfs: implement usecount implies holdcnt
ClosedPublic

Authored by mjg on Aug 30 2019, 1:10 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Jan 21, 5:40 PM
Unknown Object (File)
Fri, Jan 17, 6:57 AM
Unknown Object (File)
Dec 14 2024, 11:47 PM
Unknown Object (File)
Dec 12 2024, 6:15 PM
Unknown Object (File)
Dec 10 2024, 7:07 AM
Unknown Object (File)
Nov 25 2024, 4:18 AM
Unknown Object (File)
Nov 18 2024, 10:29 PM
Unknown Object (File)
Nov 2 2024, 12:54 AM
Subscribers

Details

Summary

Patch below implements avoiding modifying holdcnt if usecount is already > 0.

Note the current vget only bumps usecount after locking the vnode. Preserving this behavior means the caller may find itself with either usecount or only holdcnt and this information has to be conveyed, to that end I added another flag to lockmgr. Perhaps it would be nicer to just give a new argument to vget instead. Note the old idiom of passing calling vget with the interlock held is actually harmful for many consumers, e.g. the namecache can most of the time vhold directories without taking it (usecount bump which follows and may need the interlock is done much later after locking the vnode).

Another caveat is that in case of 2 or more threads doing vget it may be both of them will only bump holdcnt, which means one of them will have to backpedal on it.

Finally vputx has to unlock the vnode early in order to avoid unlocking it after dropping usecount. This would possibly race with someone else dropping holdcnt, which would violate the invariant that all VOPs execute with the vnode at least already held. I can add a comment.

Longer term it would probably be faster multithreaded to bump usecount prior to locking the vnode and provide some form of a barrier. This barrier would allow the vnode to settle (e.g. finish inactive processing). I think these considerations are for later.

I'm not strongly attached to names here -- this is roughly a wip in terms of committability, but works fine.

Sample benchmark results will be done later.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

mjg retitled this revision from \ to [wip] vfs: implement usecount implies holdcnt.Aug 30 2019, 1:21 AM
mjg edited the summary of this revision. (Show Details)
mjg added reviewers: kib, jeff.
  • vrefl: clear VI_OWEINACT before grabbing an active ref. found by pho
  • rebase on top of r351632
  • add a comment explaining unlock in vputx

Is it really required that usecount incremented under/after the vnode is locked ? Is it for case of no LK_RETRY ?

sys/kern/vfs_subr.c
2735–2736

Why do you need LK_VNUSED ? You can pass additional arg to vget_finish() easily, it seems.

2839

Why rel ? Even, why this barrier is needed ?

I don't think it's required (and in fact vref already bumps usecount without bothering with vnode lock).

However, I find bumping usecount early in vget to be a separate change in behavior and one which should be done in a different commit (but perhaps prior to implementing usecount implies holdcnt). With the change vget will have no way to ever honor VI_OWEINACT and will have much easier time clearing it since it no longer has to compete for the vnode lock for this to happen. This effectively obsoletes the flag, but also sounds quite worrisome since it can keep postponing inactive. I noted that for this to work a mechanism should be provided for threads which find the flag to wait for it to clear -- they would msleep e.g. on vp->v_iflag and once inactive is finished it would wakeup on that chan. vget_prep could get everything set and vget_finish would wait (if necessary).

imo for this patch it's not problematic to handle not always being able to bump usecount. The above is a much more invasive change, but which will be trivial to adapt to. For this reason I think this patch should go first with its current semantics.

sys/kern/vfs_subr.c
2735–2736

I noted it's an option (and in fact preferred by me), but I went with this because LK_VNHELD was already there. More than happy to change it and get rid of LK_VNHELD.

2839

There were assertions failing on powerpc where the cpu would reorder flags vs refcount manipulation.

  • remove LK_VNHELD and pass info as a separate argument
In D21471#467746, @mjg wrote:

I don't think it's required (and in fact vref already bumps usecount without bothering with vnode lock).

However, I find bumping usecount early in vget to be a separate change in behavior and one which should be done in a different commit (but perhaps prior to implementing usecount implies holdcnt). With the change vget will have no way to ever honor VI_OWEINACT and will have much easier time clearing it since it no longer has to compete for the vnode lock for this to happen. This effectively obsoletes the flag, but also sounds quite worrisome since it can keep postponing inactive. I noted that for this to work a mechanism should be provided for threads which find the flag to wait for it to clear -- they would msleep e.g. on vp->v_iflag and once inactive is finished it would wakeup on that chan. vget_prep could get everything set and vget_finish would wait (if necessary).

imo for this patch it's not problematic to handle not always being able to bump usecount. The above is a much more invasive change, but which will be trivial to adapt to. For this reason I think this patch should go first with its current semantics.

The inactivation should be not considered as some abstract operations that we provide for abstract filesystem and make some guarantee. We should adapt both interface and the guarantees to the requirements of in-tree filesystems, making out-of-tree VFS consumers to adapt. In other words, we should investigate how much critical the missed inactivation is.

E.g. we might call vm_object_page_clean() when we vget() and usecount goes from 0 to 1, if we consider possible double of delay for page syncing of this vnode to be not acceptable, otherwise we can silently ignore the issue. Same for other actions in VOP_INACTIVE of misc. filesystems. For UFS, I believe there is nothing to loose from missed inactivation.

I ran the full stress2 test suite with D21471.61489.diff on amd64. No problems seen.

I don't think the extra info will be avoidable in the current state if any work is to be done at the transition,

There are 2 basic kinds of vget users:

  1. ones which simply hold the vnode, release whatever other guarantees they had on liveness of the vnode, and call vget
  2. same as above but they lock the interlock instead

The latter case typically causes avoidable contention and typical consumers can and should be migrated to the first format.

With this in mind note that if there is anything non-trivial to do at 0->1 usecount transition it already can't be done at the level of "vget_prep", which means we are back to telling the later function whether we got usecount or only holdcnt. We can avoid conveying anything if usecount is allowed to grow no what matter but threads bumping it from places where it is possibly 0 are forced to wait for a flag to clear (if set).

Side note is that with some extra hackery this can be made into a state where interlock holders who want to act on 0 usecount simply cmpset a count of 0 + a flag into it. vget_prep would fetchadd and vget_finish seeing the flag in the count can wait. This basically avoids taking the interlock in the common case.

Regardless, I find earlier activation to be an unrelated and much more involved change. On top of that this patch can be trivially adapted later. For these reasons I don't think it should be considered a blocker, but if you are adamant early activation has to happen first I can create a separate WIP review for it.

This revision is now accepted and ready to land.Sep 3 2019, 8:16 AM
mjg retitled this revision from [wip] vfs: implement usecount implies holdcnt to vfs: implement usecount implies holdcnt.
  • keep the interlock locked if passed; avoids an unnecessary change

As per the irc discussion, I took a closer look at all consumers. There is a bug in msdosfs_sync where it fails abort the foreach macro before restarting. ffs code has several loops, some of which restart and others continue. The discrepancy perhaps was not intended. Finally I did find a LK_SLEEPFAIL vnode user after all -- ffs_sync. I think dropping the interlock early was irrelevant for it, but that's separate.

Since dropping the interlock early is an optional part of the change I decided to avoid further delay and remove it. I'll address found issues separately. I ran the change through suj11.sh, no issues.

This revision now requires review to proceed.Sep 3 2019, 12:08 PM
This revision is now accepted and ready to land.Sep 3 2019, 3:08 PM
This revision was automatically updated to reflect the committed changes.