Page MenuHomeFreeBSD

1/3 vfs: stop handling VI_OWEINACT in vget
ClosedPublic

Authored by mjg on Jan 15 2020, 2:58 AM.

Details

Summary

vget is almost always called with LK_SHARED, meaning the flag (if present) is almost guaranteed to get cleared. Stop handling it in the first place and instead let the thread which wanted to do inactive handle the bumepd usecount.

If this seems too much of a change this can be special cased so that LK_EXCLUSIVE requests ignore it (but without calling VOP_ISLOCKED to figure out what it is, just look at the flag).

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mjg created this revision.Jan 15 2020, 2:58 AM
mjg edited the summary of this revision. (Show Details)Jan 15 2020, 3:05 AM
jeff added a comment.Jan 17 2020, 11:50 PM

This is a nice simplification

sys/kern/vfs_subr.c
3112 ↗(On Diff #66763)

Why did this check go away?

3361 ↗(On Diff #66763)

what is the f? force? Can you write a comment on the responsibilities of vinactive() vs vinactivef().

I would possibly also rename vinactivef() . _vinactive() or vinactive_internal() or something.

mjg added inline comments.Jan 17 2020, 11:57 PM
sys/kern/vfs_subr.c
3112 ↗(On Diff #66763)

It's moved into the routine and done along with v_usecount. Basically just checking the flag is no longer sufficient and I wanted all checks in one place.

3361 ↗(On Diff #66763)

Force indeed, for vgone. Patched vinactive returns on v_usecount > 0 while this has to do the work no matter what to preserve old semantics.

Thus I don't think _internal is a good name. vinactive_force if going for long names? I don't have a strong opinion.

kib added a comment.EditedJan 18 2020, 2:51 PM

So basically this indefinitely postpones the inactivation after 1->0->1 transition, right ?

mjg added a comment.EditedJan 18 2020, 3:46 PM

Unclear what you mean.

inactive effectively not being done matches the current behavior (excluding the rare case where someone called vget with LK_EXCLUSIVE)

new behavior is that should the race happen, the syncer will keep setting VI_DEFINACT flag and retrying on every run. if this is considered a problem I can augment the code in vfs_deferred_inactive so that it clears VI_OWEINACT if v_usecount > 0, similarly to how vget does it now.

kib added a comment.Jan 18 2020, 4:13 PM
In D23184#509790, @mjg wrote:

Unclear what you mean.
inactive effectively not being done matches the current behavior (excluding the rare case where someone called vget with LK_EXCLUSIVE)
new behavior is that should the race happen, the syncer will keep setting VI_DEFINACT flag and retrying on every run. if this is considered a problem I can augment the code in vfs_deferred_inactive so that it clears VI_OWEINACT if v_usecount > 0, similarly to how vget does it now.

I mean that if not done at the vget that transition the usecount from 0 to 1, we miss inactivation for arbitrary long time because whoever sees VI_OWEINACT, cannot proceed since v_usecount > 0. I would even go as far as to suggest that if vget_finish() observes VI_OWEINACT and the locking mode is LK_SHARED, it is auto-promoted to LK_EXCLUSIVE, to be able to call VOP_INACTIVE().

mjg added a comment.EditedJan 19 2020, 4:07 AM

I claim this patch (apart from dropping LK_EXCLUSIVE check which can be restored) preserves the current semantics and does not alter how often inactive can be performed. Removal of inactive handling even with LK_EXCLUSIVE does not have a meaningful impact (see below for measurement).

The very behavior of not doing inactive is already present in head at least since 2008 (r177539):

commit 3ad75daf19094d458a872196403a2d5a77437da1
Author: jeff <jeff@FreeBSD.org>
Date: Mon Mar 24 04:22:58 2008 +0000

  • Greatly simplify vget() by removing the guarantee that any new references to a vnode with VI_OWEINACT set will force the vinactive() call. The kernel makes no guarantees about which reference was the last to close a file or when the actual inactive processing will happen. The previous code was designed to preserve existing semantics in the face of shared locks, however, this was unnecessary. Discussed with: mckusick

Perhaps more importantly currently the kernel can't guarantee it will do inactive because of existence of vref. vref always clears the flag and never takes the vnode lock.

Keeping the flag instead of checking usecount does not change anything in terms of when processing can be conducted. In stock code the flag would not be there to begin with, but usecount would be the same. There is perhaps a performace bug in the patch that in case of clearing VI_OWEINACT stock kernel would forget about it, while patched one will defer inactive and then keep checking for it. This bit can be easily fixed.

Also note the usecount transition from 0->1 in vget is still protected with vnode locks, meaning it is not more racy.

The dropped LK_EXCLUSIVE check can be restored, but I don't see any point as it is. I ran 2h of stress2 with a dtrace probe:

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 08b7649a2aa0..e6b650e03c44 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -101,6 +101,9 @@ __FBSDID("$FreeBSD$");
 #include <ddb/ddb.h>
 #endif
 
+SDT_PROVIDER_DECLARE(vfs);
+SDT_PROBE_DEFINE3(vfs, vget, , , "struct vnode *", "int", "int");
+
 static void    delmntque(struct vnode *vp);
 static int     flushbuflist(struct bufv *bufv, int flags, struct bufobj *bo,
                    int slpflag, int slptimeo);
@@ -2952,6 +2955,7 @@ vget_finish(struct vnode *vp, int flags, enum vgetstate vs)
                vp->v_iflag &= ~VI_OWEINACT;
                VNODE_REFCOUNT_FENCE_REL();
        }
+       SDT_PROBE3(vfs, vget, , , vp, oweinact, flags);
        v_incr_devcount(vp);
        refcount_acquire(&vp->v_usecount);
        if (oweinact && VOP_ISLOCKED(vp) == LK_EXCLUSIVE &&

0/1 indicates whether the flag was seen. other flag combinations had no hits with oweinact and got ommitted.

 LK_EXCLUSIVE | LK_NODDLKTREAT
 value  ------------- Distribution ------------- count    
    -1 |                                         0        
     0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 289639   
     1 |                                         1        
     2 |                                         0        
LK_SHARED | LK_NODDLKTREAT
 value  ------------- Distribution ------------- count    
    -1 |                                         0        
     0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 18334748 
     1 |                                         14       
     2 |                                         0        
LK_SHARED | LK_RETRY | LK_INTERLOCK
 value  ------------- Distribution ------------- count    
    -1 |                                         0        
     0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 39237767 
     1 |                                         81       
     2 |                                         0        
 LK_EXCLUSIVE | LK_NOWAIT
 value  ------------- Distribution ------------- count    
    -1 |                                         0        
     0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1013258  
     1 |@                                        26497    
     2 |                                         0

That is, there was literally one instance where the current code would act on the flag (mere LK_EXCLUSIVE, no LK_NOWAIT)

The LK_NOWAIT stuff comes from softdep during unmount.

Finally, I don't think inactive is really needed here anyway. Said code could have easily raced even with mandatory handling.

That said, should someone want to guarantee inactive they would need to sort out vref. Neither this patch nor switch to lockless 0->1 usecount transition prevent providing that guarantee for vget consumers, it would be just more work. As such, I think the issue of whether inactive should be mandatory is orthogonal to the patch.

mjg updated this revision to Diff 66991.Jan 19 2020, 6:23 AM
  • clear VI_OWEINACT instead of postponing to try again. when usecount drops to 0 someeone else will take care of it.
kib added a comment.Jan 19 2020, 8:26 PM

If you reliably do inactivation on vget() for the unreferenced vnodes, this should fix the i_mode == 0 issue. More, it should fix it for all filesystems, not only UFS.

mjg added a comment.EditedJan 19 2020, 9:48 PM

My understanding is that the i_mode == 0 case has 2 sides.

  1. when the vnode is not fully constructed yet. these vnodes don't have the VI_OWEINACT flag set anyway and subsequently inactive processing here would not be performed and the bug remains
  2. when the vnode got unlinked or similar. this condition exists at least since the commit mentioned in the previous comment and to my undesrtanding is harmless

Mandatory processing can be added here with more work. My idea was to modify filesystems to indicate they want inactive processing with setting a bit in v_usecount. This would taking the interlock in vdrop in the common case.

That said, I don't think mandatory processing is urgent to do and I don't think possibility of adding it is significantly complicated by this patch.

pho tested the previous iteration of this patch + the lockless switch.

jeff accepted this revision.Jan 20 2020, 9:40 PM

I agree with mjg. Inactive has been a half promise since shared locking went in. This is really my mess originally. Better to consistently not make the promise and clean up the interface.

This revision is now accepted and ready to land.Jan 20 2020, 9:40 PM
kib added a comment.Jan 20 2020, 9:49 PM
In D23184#510594, @jeff wrote:

I agree with mjg. Inactive has been a half promise since shared locking went in. This is really my mess originally. Better to consistently not make the promise and clean up the interface.

Would we control all users of the interface, then yes. I am very conservative there because this increases the crack. Even UFS is broken still, and arguably other filesystems are broken forever. I think it is better tactically to close the hole in the interface now, than to allow this subtle breakage.

As I said, I do not see why first vget(9) (in the sense 0->1 for usecount) cannot upgrade to exclusive if inactivation is needed.

jeff added a comment.Jan 20 2020, 9:53 PM

The mode == 0 issue is orthogonal. This guarantee was already broken for any filesystem that supported shared locks.

mjg added a comment.EditedJan 20 2020, 11:02 PM

I think this patch (and the next one) can wait a little bit. In general we should be moving towards explicitly expressed requirements but also without putting burden on common paths to handle corner cases if it can be avoided.

To that end, for the first problem type of the i_mode == 0 with not completely initialized vnode, we should add a func in the lines "vconstructed" or similar which on debug kernels would add a flag (say VIRF_CONSTRUCTED). Then vop_unlock_pre could assert that vnode is (VIRF_CONSTRUCTED | VIRF_DOOMED) != 0. With this in place weeding out problematic cases is a matter of coverage testing.

For mandatory inactive:
The entire 'inactive' situation should be rationalized a little bit as the operation is way overloaded in terms of its meaning. It denotes at least 2 different kinds of stuff to do:

  1. viable to do regardless of usecount, e.g. atime updates and which can be done by the syncer at any point if it can't be done now
  2. file removal. but if this happens doing inactive just kills the content and there is nothing to retun to the user in vget

That is to say, the invariant should be that if there is mandatory processing to do, the vnode should not be possible to find through the namecache. Thus the namecache can use vget variant which does not concern itself with inactive in the first place (and only asserts it is not needed).
Since the vnode may hang around in the vnode hash or other places without such guarantee, these places will do perform said processing.

That is, I think the first part (VIRF_CONSTRUCTED or so) addresses my bug and I will take care of it if the plan sounds fine. The latter part is a problem since at least 2008 which imo did not get meaningfully worsened by this patch and therefore can be fixed later.

This revision was automatically updated to reflect the committed changes.