Page MenuHomeFreeBSD

nullfs: reduce areas protected by vnode interlock
ClosedPublic

Authored by mjg on Wed, Aug 21, 9:41 PM.

Details

Summary

Since hold count started being managed with atomics the vnode interlock is no longer mandatory. There are few places which only used it to bump the counter and patched it out.

Important note is that null_unlock now completely ignores the passed LK_INTERLOCK flag. The entire kernel apart from this particular func (and an equivalent bit in unionfs) passes 0 as flags to VOP_UNLOCK and LK_INTERLOCK is never seen in the first place. Part of motivation of the change is to remove the argument altogether, although I understand it may seem as a bug to start ignoring it with just this patch.

New routine is introduced to bump with an atomic while avoiding a fcmpset loop, which should be a little faster smp-wise in face of other threads doing the same. I don't have a good idea for the name so went with "vhold non zero". I understand it may sound a little misleading as there are other routines which use this to the work only under this condition and return an error otherwise. 'vhold_held' or similar is a mouthful imo.

Test Plan

Already tested by pho.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mjg created this revision.Wed, Aug 21, 9:41 PM
mjg retitled this revision from nullfs: reduce vnode-interlock protected areas to nullfs: reduce areas protected by vnode interlock.
kib added inline comments.Wed, Aug 21, 10:11 PM
sys/kern/vfs_subr.c
3034 ↗(On Diff #61093)

After markj' work on refcount overflow, the INVARIANTS part is no longer equivalent to normal part. Or rather, I think we should not use refcount(9) on v_holdcnt/v_usecnt anymore since they are updated using normal arithmetic in other places.

Actually this seems to be a global bug in VFS.

mjg added inline comments.Wed, Aug 21, 10:26 PM
sys/kern/vfs_subr.c
3034 ↗(On Diff #61093)

As you noted this is already a problem (e.g. vrefact) and thus I don't think it's a factor in review of this patch.

If sorting this out, I don't think free form atomic bumps and drops for something which is a reference counter should be allowed. Instead, either the current api should be extended to fit all the needs or a different one derived if checks which come from refcount are not welcome.

Now that you mention it, I do wonder if int here is sufficiently big for amd64 boxes.

kib added inline comments.Thu, Aug 22, 3:25 PM
sys/fs/nullfs/null_vnops.c
710 ↗(On Diff #61093)

I wonder if we need null_unlock at all after the LK_INTERLOCK handling removal, but we might. v_lock is only stable under the vnode interlock, see e.g. null_reclaim. If the argument is that we have the vnode locked and so reclaim cannot run in parallel, then why do we need special null_unlock ?

Also, how could nn be null ?

sys/kern/vfs_subr.c
3034 ↗(On Diff #61093)

I think that you should use atomic_fetchadd in both cases.

mjg added inline comments.Thu, Aug 22, 6:53 PM
sys/fs/nullfs/null_vnops.c
710 ↗(On Diff #61093)

I don't know the original reasoning. I do note that vfs consistenly avoids a case where the vnode can get unlocked and there is no hold on it. To an extent this is a correctness issue. Other locking primitives are careful not to touch the lock after releasing it, but lockmgr probably does not. Should there be no hold it is threatens use-after-free, which probably would be harmless most of the time.

That said, I think the temporary added hold on the lower vnode serves purpose of protecting against this very scenario.

nullfs reclaim is protected with the interlock of the vnode (which gets dropped) and the lockmgr lock of the lower vnode, which is about to get unlocked. Short of a hold on it we get the uaf scenario described above.

I don't know if nn can really be null here. If it can be, it's probably from failed insmntque1 in null_nodeget. I think this is a consideration unrelated to the patch which does not alter the check.

sys/kern/vfs_subr.c
3034 ↗(On Diff #61093)

In that case atomic_add_int(&vp->v_holdcnt, 1); for the latter would be slightly nicer but that's cosmetics. I'm fine either way and I don't see much point reuploading the patch with either change.

kib accepted this revision.Sat, Aug 24, 6:19 PM
This revision is now accepted and ready to land.Sat, Aug 24, 6:19 PM
This revision was automatically updated to reflect the committed changes.