Page MenuHomeFreeBSD

vfs: add mandatory inactive processing to vget
Needs ReviewPublic

Authored by mjg on Jan 21 2020, 4:41 AM.

Details

Reviewers
kib
jeff
Summary

All vget callers are guaranteed to serialize on doing inactive, unless someone vref'ed the vnode and the count is > 0.

The VI_OWEINACT flag is moved to the read mostly area to avoid cacheline bouncing.

Test Plan

stress2. i changed ufs_need_inactive to always return 1 to trigger the codepath more.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 28806

Event Timeline

mjg created this revision.Jan 21 2020, 4:41 AM
mjg edited the summary of this revision. (Show Details)Jan 21 2020, 4:42 AM
kib added inline comments.Sat, Jan 25, 2:02 PM
sys/kern/vfs_subr.c
2852

I think you should not unlock there, only downgrade, and even this is optional. I think it is less work to keep the vnode exclusively locked there.

2978

I think there is some missed word in the sentence.

3596

I would not call this mandatory inactive processing. With unlocked vrefing, this is as uncertain as before. Since we own the excl lock, why not inactivate ?

mjg added inline comments.Sat, Jan 25, 9:36 PM
sys/kern/vfs_subr.c
2852

Now that I look at it it's a bug - completely ignores LK_RETRY. Should LK_RETRY be specified, there is nothing to do, otherwise we do need to unlock to remain compatible.

2978

indeed

3596

well mandatory only appears in the title, i should change it to "best effort" like i state in comments. If you are arguing for renaming the flag VIRF_OWEINACT i have no opinion.

kib added inline comments.Sun, Jan 26, 6:44 PM
sys/kern/vfs_subr.c
2852

In fact I think you should not upgrade. If vget() sees the VIRF_OWEINACT flag, the locking mode should be changed to exclusive.

3596

No, I think this place should be changed to call vinactivef() if the thread sees the flag set.

jeff added a comment.Sun, Jan 26, 8:49 PM

I'm not sure why we are still pursuing this when the bugs relating to inactive processing were fixed when vget was changed to not drop the lock.

When I originally weakened the guarantee for inactive I discussed it at length with Kirk and he agreed that no such guarantee was intended. I regret not taking it all the way then. I would like to reduce the proliferation of interfaces to deal with locks and references. If we went forward with this patch, who would use this interface? How would a filesystem author know which is correct? People rightly think our current VFS is confusing. There are some specific cases that have been added over the years that I don't fully understand myself. I have several regrets from my time bringing in SMP support and lockless lookups that mjg has already resolved. I want us to continue to work towards a healthy and simple middle layer even if that means we have to fix a handful of bugs along the way. I don't think performance and correctness need be at odds if we continue to look at problems holistically.

kib added a comment.Sun, Jan 26, 9:23 PM
In D23291#512527, @jeff wrote:

I'm not sure why we are still pursuing this when the bugs relating to inactive processing were fixed when vget was changed to not drop the lock.
When I originally weakened the guarantee for inactive I discussed it at length with Kirk and he agreed that no such guarantee was intended. I regret not taking it all the way then. I would like to reduce the proliferation of interfaces to deal with locks and references. If we went forward with this patch, who would use this interface? How would a filesystem author know which is correct? People rightly think our current VFS is confusing. There are some specific cases that have been added over the years that I don't fully understand myself. I have several regrets from my time bringing in SMP support and lockless lookups that mjg has already resolved. I want us to continue to work towards a healthy and simple middle layer even if that means we have to fix a handful of bugs along the way. I don't think performance and correctness need be at odds if we continue to look at problems holistically.

The optional VOP_INACTIVE is mostly useless, we could drop the VOP if we do not provide the guarantees. Initial intent was that inactivation is only a hint for filesystems to do optional cleanup that would happen at reclamation time if not done at inactive. Unfortunately it does not quite work because filesystems do need a hook to do something aggressive when the last user of some files goes away. Inactivation provides the hook and it is used by filesystems. If VOP_INACTIVE is declared as advisory, we would get a request to provide the reliable 'on last close' VOP and start the cycle anew.

Two examples. For UFS on inactivation, we free all data blocks and recalculate quota summaries if the file was unlinked. If we skip inactivation, this would only happen when the vnode is freed by vlru scan, leaving user with dead space on volume and in quota limits. For NFS, for v4 we need to close opens, and clean sillyrenames. Again, not doing it timely would leave user with useless opens, and with .nfsXXXX garbage In both cases the effects are visible and subtle.

Your note that it was broken is true, but it was broken in minor ways, i.e. it happens rarely. I believe that with the 'mandatory inactivation on vget' we close the last hole there. And I do not see any odds with performance for doing it correctly.

jeff added a comment.Sun, Jan 26, 9:37 PM

The question is really whether the next user is allowed to see the state of the file while needs inactive is set. In the cases described it is not harmful. I assert that the next user can see this incomplete state and what we're concerned with is properly tearing it down in a timely fashion. I would like to focus on solutions that only implement the required invariant and not something stronger than necessary.

In a case like rm vput() is going to handle inactive nearly every time. If the vput races with another vget and skips inactive that new owner will also try to vput when it is done.

The only exception is a lock from a mountpoint scan or other reference that does not vget(). In this case the vnode should be handled by the syncer at most 30 seconds later.

kib added a comment.Sun, Jan 26, 10:05 PM

Are you sure that syncer would process such vnodes ? After recent changes, for UFS I think only vnodes with UFS i_flags are on the syncer list, no ?