Page MenuHomeFreeBSD

Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR
Needs ReviewPublic

Authored by jah on Apr 25 2022, 3:49 AM.

Details

Reviewers
kib
markj
mjg
pho
Summary

When a lookup operation crosses into a new mountpoint, the mountpoint
must first be busied before the root vnode can be locked. When a
filesystem is unmounted, the vnode covered by the mountpoint must
first be locked, and then the busy count for the mountpoint drained.
Ordinarily, these two operations work fine if executed concurrently,
but with a stacked filesystem the root vnode may in fact use the
same lock as the covered vnode. By design, this will always be
the case for unionfs (with either the upper or lower root vnode
depending on mount options), and can also be the case for nullfs
if the target and mount point are the same (which admittedly is
very unlikely in practice).

In this case, we have LOR. The lookup path holds the mountpoint
busy while waiting on what is effectively the covered vnode lock,
while a concurrent unmount holds the covered vnode lock and waits
for the mountpoint's busy count to drain.

Attempt to resolve this LOR by allowing the stacked filesystem
to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary.
Upon observing this flag, the vfs_lookup() will leave the covered
vnode lock held while crossing into the mountpoint. Employ this flag
for unionfs with the caveat that it can't be used for '-o below' mounts
until other unionfs locking issues are resolved.

Reported by: pho
Tested by: pho

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 45355
Build 42243: arc lint + arc unit

Event Timeline

jah requested review of this revision.Apr 25 2022, 3:49 AM

I don't really like this change, it feels like a hack to me. At the same time, I don't have another idea that seems clearly better, so I thought I would post this to at least start a discussion.

I did consider some other options:

--Changing unionfs_root() to use a try-lock/re-lock approach similar to vn_lock_pair() but involving the busy count and the root vnode instead of two vnodes. I'm not convinced this would actually work, and in any case the idea seemed very ugly and likely to be detrimental to performance given how often VFS_ROOT() is used.

--Changing the order of dounmount() so that it drains the busy count before locking the covered vnode. I actually started implementing this one; it felt less hacky than VV_CROSSLOCK but also much more dangerous given the delicate ordering of dounmount(). I also very quickly realized that for this approach to have any chance of working, we would need to change the vfs_busy() call in vfs_lookup() to pass MBF_NOWAIT, otherwise we'll simply get a slightly different kind of LOR deadlock. I'm not sure MBF_NOWAIT would be acceptable there either.

sys/kern/vfs_lookup.c
111

Should we perhaps make this a KASSERT?

1256

Would you please update the comment above vfs_busy()'s definition explaining this new special case?

sys/kern/vfs_lookup.c
111

sounds good to me.

1256

i'll definitely do that before committing, if in fact everyone is ok with this approach.

sys/kern/vfs_lookup.c
1256

I don't think there is any need to pessimize the common case here.

I would argue the thing to do here is to:

  1. factor mount point traversal out into a dedicated routine
  2. refactor the code, most notably to stop constantly testing for NOCROSSMOUNT
  3. for the problem at hand roll with NOWAIT vfs_busy, which will almost always succeed anyway
  4. ... should it fail, you can fallback to a __noinline routine which will to the dance
sys/kern/vfs_lookup.c
1256

instead of checking dp->v_type == VDIR && (mp = dp->v_mountedhere) you can (vn_irflag_read(vp) & VIRF_MOUNTPOINT) != 0)

sys/kern/vfs_lookup.c
1256

This point, as well as your points 1) and 2) above are definitely good suggestions, but I think they should be done as part of a separate change.

As for points 3) and 4), in this case NOWAIT vfs_busy() wouldn't help. The deadlock happens after vfs_busy() has already successfully completed (almost certainly without blocking), while the following call to VFS_ROOT() blocks while trying to lock the root vnode. At the same time, dounmount() on another thread blocks waiting for the mp busy count to drain while holding the covered vnode lock (which in this case happens to be the same as the root vnode lock being waited on by the first thread).

On the other hand, I did mention this alternative approach in my earlier comment, which would reqiure NOWAIT vfs_busy():

--Changing the order of dounmount() so that it drains the busy count before locking the covered vnode. I actually started implementing this one; it felt less hacky than VV_CROSSLOCK but also much more dangerous given the delicate ordering of dounmount(). I also very quickly realized that for this approach to have any chance of working, we would need to change the vfs_busy() call in vfs_lookup() to pass MBF_NOWAIT, otherwise we'll simply get a slightly different kind of LOR deadlock. I'm not sure MBF_NOWAIT would be acceptable there either.