Details

Reviewers

kib
markj
mjg
pho

Commits

rGf7833196bd6b: vfs_lookup(): Minor performance optimizations
rG4390622c8d16: vfs_busy(): fix wording in comment
rG706f15c5fa6b: Remove witness directives from crossmp locking VOPs
rG080ef8a41851: Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR

Summary

When a lookup operation crosses into a new mountpoint, the mountpoint
must first be busied before the root vnode can be locked. When a
filesystem is unmounted, the vnode covered by the mountpoint must
first be locked, and then the busy count for the mountpoint drained.
Ordinarily, these two operations work fine if executed concurrently,
but with a stacked filesystem the root vnode may in fact use the
same lock as the covered vnode. By design, this will always be
the case for unionfs (with either the upper or lower root vnode
depending on mount options), and can also be the case for nullfs
if the target and mount point are the same (which admittedly is
very unlikely in practice).

In this case, we have a lock order reversal (LOR). The lookup path
holds the mountpoint busy while waiting on what is effectively the
covered vnode lock, while a concurrent unmount holds the covered vnode
lock and waits for the mountpoint's busy count to drain.
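
The cycle can be illustrated with a small userland C model. This is only
a sketch, not kernel code: the enum, struct, and function names are made
up for illustration, and the root_shares_covered_lock parameter stands in
for the stacked-filesystem case described above.

```c
#include <stdbool.h>

/* Hypothetical resource names; they do not exist in the kernel. */
enum sk_res {
	SK_MOUNT_BUSY,		/* the mountpoint's busy count */
	SK_COVERED_LOCK,	/* lock of the vnode covered by the mount */
	SK_ROOT_LOCK		/* lock of the mounted filesystem's root vnode */
};

/* One "holds X while waiting for Y" relation. */
struct sk_edge {
	enum sk_res held;
	enum sk_res wanted;
};

/* Map a resource to the lock it actually uses. */
static enum sk_res
sk_resolve(enum sk_res r, bool root_shares_covered_lock)
{
	if (r == SK_ROOT_LOCK && root_shares_covered_lock)
		return (SK_COVERED_LOCK);
	return (r);
}

/*
 * Returns true when the lookup path and the unmount path each hold
 * what the other one is waiting for.
 */
bool
lookup_unmount_can_deadlock(bool root_shares_covered_lock)
{
	/* Lookup: busies the mount, then locks the root vnode. */
	const struct sk_edge lookup = { SK_MOUNT_BUSY, SK_ROOT_LOCK };
	/* Unmount: locks the covered vnode, then drains the busy count. */
	const struct sk_edge unmount = { SK_COVERED_LOCK, SK_MOUNT_BUSY };

	return (lookup.held == unmount.wanted &&
	    sk_resolve(lookup.wanted, root_shares_covered_lock) ==
	    sk_resolve(unmount.held, root_shares_covered_lock));
}
```

With independent locks the function returns false; once the root vnode
shares the covered vnode's lock, the two "held/wanted" pairs form a cycle
and it returns true.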

Attempt to resolve this LOR by allowing the stacked filesystem
to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary.
Upon observing this flag, vfs_lookup() will leave the covered
vnode lock held while crossing into the mountpoint. Employ this flag
for unionfs with the caveat that it can't be used for '-o below' mounts
until other unionfs locking issues are resolved.
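
As a rough illustration of why keeping the covered vnode locked helps,
here is a userland sketch. It assumes the root vnode shares the covered
vnode's lock and that the filesystem is able to re-acquire a lock the
thread already owns. The struct, the helper names, and the value given to
SK_VV_CROSSLOCK are placeholders, not the definitions from the patch.

```c
#include <stdbool.h>

#define	SK_VV_CROSSLOCK	0x1u	/* placeholder value, not the real flag bit */

#define	SK_HELD_BUSY	0x1u	/* lookup has busied the mountpoint */
#define	SK_HELD_COVERED	0x2u	/* lookup still owns the covered vnode lock */

/* Minimal stand-in for a vnode; only the flag word is modeled. */
struct sk_vnode {
	unsigned int vflag;
};

/*
 * What the lookup thread holds at the moment it asks for the root
 * vnode lock while crossing into the mountpoint.
 */
unsigned int
held_when_locking_root(const struct sk_vnode *covered)
{
	unsigned int held = SK_HELD_BUSY;

	/* With the flag set, the covered vnode lock is not dropped. */
	if ((covered->vflag & SK_VV_CROSSLOCK) != 0)
		held |= SK_HELD_COVERED;
	return (held);
}

/*
 * Under the shared-lock assumption, lookup can only sleep on the root
 * vnode lock if it does not already own the covered vnode lock.
 */
bool
lookup_blocks_on_root(const struct sk_vnode *covered)
{
	return ((held_when_locking_root(covered) & SK_HELD_COVERED) == 0);
}
```

In this model a flagged vnode means the lookup never waits for the shared
lock while holding only the busy count, so a concurrent unmount cannot be
holding that lock at the same time.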

Reported by: pho
Tested by: pho

Diff Detail

Repository

rG FreeBSD src repository

Event Timeline

jah created this revision. Apr 25 2022, 3:49 AM

Herald added a subscriber: imp. Apr 25 2022, 3:49 AM

jah requested review of this revision. Apr 25 2022, 3:49 AM

Harbormaster completed remote builds in B45355: Diff 105389. Apr 25 2022, 3:50 AM

I don't really like this change, it feels like a hack to me. At the same time, I don't have another idea that seems clearly better, so I thought I would post this to at least start a discussion.

I did consider some other options:

--Changing unionfs_root() to use a try-lock/re-lock approach similar to vn_lock_pair() but involving the busy count and the root vnode instead of two vnodes. I'm not convinced this would actually work, and in any case the idea seemed very ugly and likely to be detrimental to performance given how often VFS_ROOT() is used.

--Changing the order of dounmount() so that it drains the busy count before locking the covered vnode. I actually started implementing this one; it felt less hacky than VV_CROSSLOCK but also much more dangerous given the delicate ordering of dounmount(). I also very quickly realized that for this approach to have any chance of working, we would need to change the vfs_busy() call in vfs_lookup() to pass MBF_NOWAIT, otherwise we'll simply get a slightly different kind of LOR deadlock. I'm not sure MBF_NOWAIT would be acceptable there either.

markj added inline comments. Apr 25 2022, 2:47 PM

sys/kern/vfs_lookup.c
118	Should we perhaps make this a KASSERT?
1249	Would you please update the comment above vfs_busy()'s definition explaining this new special case?

jah added inline comments. Apr 26 2022, 12:00 AM

sys/kern/vfs_lookup.c
118	sounds good to me.
1249	i'll definitely do that before committing, if in fact everyone is ok with this approach.

mjg added inline comments. Apr 26 2022, 12:17 AM

sys/kern/vfs_lookup.c
1249	I don't think there is any need to pessimize the common case here. I would argue the thing to do here is to:
1) factor mount point traversal out into a dedicated routine
2) refactor the code, most notably to stop constantly testing for NOCROSSMOUNT
3) for the problem at hand, roll with NOWAIT vfs_busy, which will almost always succeed anyway
4) ... should it fail, you can fall back to a __noinline routine which will do the dance

mjg added inline comments. Apr 26 2022, 12:20 AM

sys/kern/vfs_lookup.c
1249	instead of checking dp->v_type == VDIR && (mp = dp->v_mountedhere) you can check (vn_irflag_read(vp) & VIRF_MOUNTPOINT) != 0

jah added inline comments. May 15 2022, 2:49 AM

sys/kern/vfs_lookup.c
1249	This point, as well as your points 1) and 2) above, are definitely good suggestions, but I think they should be done as part of a separate change.

As for points 3) and 4), in this case NOWAIT vfs_busy() wouldn't help. The deadlock happens after vfs_busy() has already successfully completed (almost certainly without blocking), while the following call to VFS_ROOT() blocks while trying to lock the root vnode. At the same time, dounmount() on another thread blocks waiting for the mp busy count to drain while holding the covered vnode lock (which in this case happens to be the same as the root vnode lock being waited on by the first thread).

On the other hand, I did mention this alternative approach in my earlier comment, which would require NOWAIT vfs_busy():

--Changing the order of dounmount() so that it drains the busy count before locking the covered vnode. I actually started implementing this one; it felt less hacky than VV_CROSSLOCK but also much more dangerous given the delicate ordering of dounmount(). I also very quickly realized that for this approach to have any chance of working, we would need to change the vfs_busy() call in vfs_lookup() to pass MBF_NOWAIT, otherwise we'll simply get a slightly different kind of LOR deadlock. I'm not sure MBF_NOWAIT would be acceptable there either.
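
The interleaving described in this comment (the busy attempt succeeding
first, then both threads blocking) can be replayed in a small userland
model. This is a simplified sketch, not kernel code: every sk_* name is
invented for illustration, the lookup thread's earlier hold on the covered
vnode is omitted, and a single lock owner field stands in for the lock
shared by the covered and root vnodes.

```c
#include <stdbool.h>

enum sk_thread { SK_NONE, SK_LOOKUP, SK_UNMOUNT };

struct sk_state {
	int busy_count;			/* references taken by busy calls */
	bool unmounting;		/* set once unmount starts draining */
	enum sk_thread lock_owner;	/* owner of the shared vnode lock */
};

/* Non-sleeping busy attempt: fails only if an unmount is draining. */
static bool
sk_busy_nowait(struct sk_state *s)
{
	if (s->unmounting)
		return (false);
	s->busy_count++;
	return (true);
}

/* Try to take the shared lock; false means the caller would sleep. */
static bool
sk_try_lock(struct sk_state *s, enum sk_thread t)
{
	if (s->lock_owner != SK_NONE)
		return (false);
	s->lock_owner = t;
	return (true);
}

/* Returns true if the replayed schedule leaves both threads blocked. */
bool
sk_nowait_still_deadlocks(void)
{
	struct sk_state s = { 0, false, SK_NONE };
	bool lookup_blocked, unmount_blocked;

	/* Lookup: busy succeeds at once, no unmount is in progress yet. */
	if (!sk_busy_nowait(&s))
		return (false);

	/* Unmount: locks the covered vnode, then starts draining. */
	if (!sk_try_lock(&s, SK_UNMOUNT))
		return (false);
	s.unmounting = true;
	unmount_blocked = s.busy_count > 0;

	/* Lookup: the root vnode lock is the same lock, so it must wait. */
	lookup_blocked = !sk_try_lock(&s, SK_LOOKUP);

	return (lookup_blocked && unmount_blocked);
}
```

In this schedule the non-blocking busy call never gets a chance to fail,
which is why switching to NOWAIT alone does not break the cycle.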

Replace runtime check with KASSERT, add comment on CROSSLOCK behavior

Harbormaster completed remote builds in B46482: Diff 108335. Jul 19 2022, 6:02 PM

kib added inline comments. Jul 20 2022, 1:28 PM

sys/kern/vfs_lookup.c
123	If you are dropping support for LK_INTERLOCK for crossmp vnode, then should it be asserted that LK_INTERLOCK is never specified for it? Also, I think that crossmp_vop_lock1() changes should be a separate commit.

jah added inline comments. Jul 20 2022, 9:22 PM

sys/kern/vfs_lookup.c
123	I need to re-test, because it's been so long since I wrote this patch that I no longer remember the exact set of issues I was having with this function. But I do recall that dropping the interlock here did not make sense to me, especially given that the corresponding vop_unlock doesn't re-acquire it, so it seemed as though that would cause problems for any caller holding the interlock. Is there some non-obvious (to me) reason why this would have been done?

jah added inline comments. Jul 20 2022, 9:27 PM

sys/kern/vfs_lookup.c
123	EDIT: never mind, I'd forgotten that dropping the interlock is the documented behavior of VOP_LOCK(). It may not matter in practice, but it wasn't my intent to effectively drop support for LK_INTERLOCK, so let me revise this function.

Fix typo, restore support for LK_INTERLOCK on vp_crossmp

Harbormaster completed remote builds in B46541: Diff 108486. Jul 25 2022, 3:34 AM

jah added inline comments. Jul 25 2022, 3:41 AM

sys/kern/vfs_lookup.c
123	I've restored support for LK_INTERLOCK; this does not cause any issues in my testing. The reason I removed the WITNESS_CHECKORDER directive is that, with VV_CROSSLOCK, the covered vnode lock may be upgraded to exclusive (depending on the requirements of the mounted FS) after locking vp_crossmp. This results in a crossmp->covered ordering instead of the usual covered->crossmp ordering and a corresponding witness LOR warning. But since crossmp_vop_lock1() doesn't actually take a lock and requires all consumers to pass LK_SHARED | LK_NOWAIT, the witness directive did not seem useful here. Is there some reason we should keep it around? If not, should the WITNESS_[UN]LOCK directives also be removed?

kib added inline comments. Jul 25 2022, 7:23 AM

sys/kern/vfs_lookup.c
123	The directive is useful because it records a 'logical' lock owner, visible for instance with the ddb 'show alllocks' command. I believe the actual lockmgr() call was removed for the sake of optimization. So I am not sure this is very useful now.

I have not observed any problems with D35054.108486.patch

jah added inline comments. Jul 27 2022, 3:56 AM

sys/kern/vfs_lookup.c
123	Would you prefer that I also remove the WITNESS_LOCK and WITNESS_UNLOCK calls here and in crossmp_vop_unlock (as part of a separate commit of course)?

kib added inline comments. Jul 27 2022, 9:46 AM

sys/kern/vfs_lookup.c
123	I suspect that yes, the calls have to be removed.

Remove remaining WITNESS directives and no-longer used variables

Harbormaster completed remote builds in B46752: Diff 108900. Aug 5 2022, 5:49 AM

kib accepted this revision. Aug 6 2022, 11:47 AM

kib added inline comments.

sys/kern/vfs_lookup.c
1237	Outer () are not needed

This revision is now accepted and ready to land. Aug 6 2022, 11:47 AM

D35054.108900.patch looks good to me.

Style

This revision now requires review to proceed. Aug 13 2022, 5:08 AM

Harbormaster completed remote builds in B46922: Diff 109294. Aug 13 2022, 5:08 AM

Fix wording in comment, incorporate some suggested performance optimizations

Harbormaster completed remote builds in B47938: Diff 112068. Oct 21 2022, 3:43 AM

Apologies for letting this review stall, $WORK got in the way in a big way for a while.
I've rebased, incorporated some of @mjg's optimization suggestions, and retested.