Page MenuHomeFreeBSD

Rework v_object lifecycle for vnodes.
ClosedPublic

Authored by kib on Aug 25 2019, 1:47 PM.

Details

Summary

Current implementation of vnode_create_vobject()/vnode_destroy_vobject() is written so that it allows object destruction for live vnode. Practically, no filesystems use this, except for some remnants present in UFS. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction.

For UFS, in ffs_valloc(), force reclaim existing vnode on inode reuse, instead of trying to re-initialize the same vnode for new purposes.

[UFS change is really a blocker and must be handled before the rest, I am posting this for discussion and testing]

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib created this revision.Aug 25 2019, 1:47 PM
markj added inline comments.Aug 25 2019, 9:30 PM
sys/ufs/ffs/ffs_alloc.c
1142 ↗(On Diff #61261)

"force allocation of a new vnode..."

1148 ↗(On Diff #61261)

Don't you need to unlock *vpp?

sys/vm/vnode_pager.c
186 ↗(On Diff #61261)

Should we assert obj->type == OBJT_VNODE here?

kib updated this revision to Diff 61268.Aug 25 2019, 9:55 PM

vput ufs vnode after vgone.
add assert.

markj added inline comments.Aug 26 2019, 2:48 PM
sys/ufs/ffs/ffs_alloc.c
1152 ↗(On Diff #61268)

It looks to me like you could instead define FFSV_REPLACE to perform the vgone() in ffs_vgetf(), but it is just a suggestion.

sys/vm/vnode_pager.c
157 ↗(On Diff #61268)

I believe the OBJ_DISCONNECTWNT flag is now unused.

240 ↗(On Diff #61268)

Delete the "MPSAFE" while here?

276 ↗(On Diff #61268)

Can you explain this race? Does it arise from parallel open()s? I do not see why we want to destroy the object and retry in this case.

kib marked 7 inline comments as done.Aug 26 2019, 3:42 PM
kib added inline comments.
sys/vm/vnode_pager.c
276 ↗(On Diff #61268)

The vnode_pager_alloc() function may be called with vnode only share-locked. This can happen e.g. for UFS read-only opens indeed, or for ZFS any opens, or for UFS directories when lookup is performed.

Since these operations can occur in parallel, we might allocate two vnode objects for the vnode. In this case, allocators synchronize on the vnode interlock to assign to vp->v_object, and one thread wins. I do not see what else the loosing thread can do except to free now unneeded object. goto retry is just to ensure that winner succesfully set v_object, and we can bump ref count on it as normal flow.

kib updated this revision to Diff 61289.Aug 26 2019, 3:43 PM
kib marked an inline comment as done.

Rework FFS bits. Other minor tweaks.

markj added inline comments.Aug 26 2019, 4:27 PM
sys/ufs/ffs/ffs_vfsops.c
1673 ↗(On Diff #61289)

We should probably assert (flags & LK_EXCLUSIVE) != 0 if FFSV_REPLACE is set.

1697 ↗(On Diff #61289)

Hmm, so isn't it possible for two threads to race to allocate the same inode, and the losing thread calls vgone() on the newly created vnode?

sys/vm/vnode_pager.c
276 ↗(On Diff #61268)

I see, for some reason I read the code as destroying the pre-existing vp->v_object.

kib marked 3 inline comments as done.Aug 26 2019, 4:48 PM
kib added inline comments.
sys/ufs/ffs/ffs_vfsops.c
1697 ↗(On Diff #61289)

It is only possible for fully allocated inode, I believe. If inode is not yet allocated, only one thread can successfully allocate it, and then no other thread should instantiating the vnode. In other words, the described race is vald, but not for FFSV_REPLACE.

kib updated this revision to Diff 61297.Aug 26 2019, 4:50 PM
kib marked an inline comment as done.

Assert that FFSV_REPLACE requests exclusive vnode lock.

markj accepted this revision.Aug 26 2019, 4:56 PM
markj added inline comments.
sys/ufs/ffs/ffs_vfsops.c
1697 ↗(On Diff #61289)

Is it valid to add the following assertion below?

error = vfs_hash_insert(...);
if (error != 0)
    return (error);
if (*vpp != NULL) {
    MPASS((ffs_flags & FFSV_REPLACE) == 0);
    return (0);
}
This revision is now accepted and ready to land.Aug 26 2019, 4:56 PM
kib marked an inline comment as done.Aug 26 2019, 5:35 PM
kib added inline comments.
sys/ufs/ffs/ffs_vfsops.c
1697 ↗(On Diff #61289)

I think yes, it should be correct. I added it with some comment explaining the assumption about freshly allocated inode.

kib updated this revision to Diff 61303.Aug 26 2019, 5:35 PM
kib marked an inline comment as done.

Add assertions (LK_EXCLUSIVE now for real).

This revision now requires review to proceed.Aug 26 2019, 5:35 PM
markj accepted this revision.Aug 26 2019, 6:54 PM
This revision is now accepted and ready to land.Aug 26 2019, 6:54 PM
kib added a subscriber: pho.Aug 27 2019, 2:07 PM

Peter, could you, please, start testing this ?

pho added a comment.Aug 27 2019, 2:44 PM
In D21412#466484, @kib wrote:

Peter, could you, please, start testing this ?

Sure.

mckusick requested changes to this revision.Aug 27 2019, 6:45 PM
mckusick added a subscriber: mckusick.

This change looks good to me modulo minor inline comment.

This revision now requires changes to proceed.Aug 27 2019, 6:45 PM

My inline comment seems to have been lost. I requested that ufs_prepare_reclaim() be made static.

mckusick added inline comments.Aug 27 2019, 6:49 PM
sys/ufs/ufs/ufs_inode.c
183 ↗(On Diff #61303)

The ufs_prepare_reclaim() function should be made static.

kib marked an inline comment as done.Aug 28 2019, 10:16 AM
kib added inline comments.
sys/ufs/ufs/ufs_inode.c
183 ↗(On Diff #61303)

I removed the function at all. It no longer makes sense since it is single-use.

kib updated this revision to Diff 61396.Aug 28 2019, 10:17 AM
kib marked an inline comment as done.

Do not destroy v_object for vnodes which bypass.
Inline ufs_prepare_reclaim().

mckusick accepted this revision.Aug 28 2019, 8:00 PM

Removing the function is right approach. All looks good.

This revision is now accepted and ready to land.Aug 28 2019, 8:00 PM
pho added a comment.Aug 28 2019, 9:00 PM

I ran a full stress2 test on i386 with D21412.61303.diff.
I ran tests on amd64 with D21412.61396.diff for two hours.
No problems seen.

This revision was automatically updated to reflect the committed changes.