Page MenuHomeFreeBSD

unionfs: rework locking scheme to only lock a single vnode
ClosedPublic

Authored by jah on May 29 2024, 4:51 AM.
Tags
None
Referenced Files
F107435181: D45398.diff
Tue, Jan 14, 3:05 AM
Unknown Object (File)
Sat, Jan 11, 5:10 PM
Unknown Object (File)
Sun, Jan 5, 8:20 PM
Unknown Object (File)
Fri, Dec 27, 6:50 PM
Unknown Object (File)
Thu, Dec 26, 1:28 PM
Unknown Object (File)
Oct 23 2024, 11:34 AM
Unknown Object (File)
Oct 16 2024, 9:09 AM
Unknown Object (File)
Oct 8 2024, 11:55 PM

Details

Summary

Instead of locking both the lower and upper vnodes, which is both
complex and deadlock-prone, only lock the upper vnode, or the lower
vnode if no upper vnode is present.

In most cases this is all that is needed; for the cases in which
both vnodes do need to be locked, this change also employs deadlock-
avoiding techniques such as LK_NOWAIT and vn_lock_pair().

There are still some corner cases in which the current implementation
ends up taking multiple vnode locks across different filesystems
without taking special steps to avoid deadlock; those cases have
been noted in the comments.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sys/fs/unionfs/union_subr.c
392–414

I'm not sure there is any need for a fix in unionfs_ins_cached_vnode(). The current thread cannot have the containing directory's lower vnode locked (else, there is some violation of the cross-filesystem lock rule, basically not to establish any order, higher up in the call stack) because of systematic creation of shadow directories at lookup.

(I also have to go back to my project to globally simplify locking of multiple vnodes in the VFS and double-check whether it can really work.)

Removing the cache is primarily for semantics and lifecycle reasons, but indeed will have a huge simplification effect on several parts of unionfs. However, I don't think it makes much sense to do so without at the same time revising the semantics. I'll do both in the first phase of the plan.

418–432

Ah yes, dooming of the lower vnode would happen only on recursive unmount. This is also a paradigm that I will change. But that's true for the time being.

I'd like to check what could happen nonetheless after the lower vnode is doomed, because as you say, even if vflush() is eventually called, we don't really know what "shortly" means, and I'd rather not rely on that.

488–491

A small comment explaining why lvp needs to be locked would be great. What about the VV_ROOT case?

853

Yes, that's a problem, and likely the cause of @pho 's deadlock.

1580–1582

I wouldn't say that VOP_OPEN() is not truly needed. The current state in the VFS is sadly completely ad-hoc, some filesystem expecting it and some others not, with various degrees of consequences. I'm not sure at this point if we will require VOP_OPEN()/VOP_CLOSE() around VOP_READDIR() or not at all, only that we'll have to make a consistent choice.

1656

unionfs_check_rmdir() should return 0 if the directory is (determined) empty or ENOTEMPTY if it is not, but only if the determination didn't get any error. Any error in each of these calls should lead to bailing out immediately with that error, and callers adapted to cope.

This is not a new bug. However, the old code was quite close, but would in the end convert any error to ENOTEMPTY, while the new code steps farther away from that.

sys/fs/unionfs/union_vnops.c
222–224

lvp must be unlocked here.

262–264

Same as above for uvp.

627–629

Please keep this warning, as I don't think this case can occur (and should be fixed if it does).

826–827

(Too bad C is not Lispy enough...)

1718–1723

VOP_RENAME() seems to be a typo here. Did you mean VOP_READDIR()?

2232–2237

VV_ROOT case? (see unionfs_noderem())

Well, I'm almost done, but not completely (I need to finish reviewing unionfs_lock() and the lines below it). Here's a new round of comments/notes/requests.

Concerning unionfs_mkshadowdir(), I've added a series of suggested minor changes. These are mostly to clarify and simplify things (and, I suspect, marginally improve concurrency). You may as well forget about them if you think this is too much work for now (because, e.g., they would cause a new long round of testing).

The main problems I've spotted so far are the lack of some unlocks in unionfs_lookup() and the VV_ROOT case in unionfs_noderem() probably not handled correctly.

And, to correct something I've said:

The best approach, and perhaps the only sensible one, I have seen so far is that unionfs has to completely rely on lower layer filesystems for all operations, and for this must get rid of any cache it maintains from data on them.

Well, that's true, but only for information that can be modified by the underlying filesystems, which I had in mind when writing that. However, identity of the vnodes themselves is not included in these, so in fact a cache based on vnodes identities (and v_hash) is perfectly feasible, and will indeed allow to avoid the duplicate vnode problem in the lookup, and elsewhere. I'll also have to check for other foreseen uses developed in the full proposal.

Use of the vnode pointer as the identifier was indeed what I had in mind when I originally wrote the comment.

sys/fs/unionfs/union_subr.c
392–414

Technically, we don't proactively create shadow directories during lookup if the unionfs instance is mounted read-only (however unlikely it may be for a user to do that). But I'd also like to go back to the question I asked at the top of this review: how likely is it for this specific udvp->lvp locking sequence to present a practical risk of deadlock? In other words, how likely is it that some other agent will want to lock those two specific vnodes on different filesystems in the opposite order?

418–432

Actually I don't think this is a case we should encounter today. At this point, if the insmntque1() call hasn't already failed then the unionfs unmount operation's vflush() call should block on the lock we currently hold for the newly-allocated vnode. This should then prevent dooming of the underlying vnodes.

Perhaps these VN_IS_DOOMED() checks should instead be converted to VNASSERTs?

488–491

This is really to simplify unionfs_lock() (per the comment there) by synchronizing against any pending lock operation that may still be using the lower vnode lock. That should never be the case for the root vnode, as the root vnode should always have both an upper and lower vnode for its entire lifecycle, so unionfs_lock() should never attempt to use the lower root vnode. Moreover, if we are unmounting a non-"below" unionfs mount, this vn_lock_pair() call will assert because lvp will already be locked (as it is the covered vnode).

I'll add a comment to this effect.

927–929

These are error handling cases, so I wouldn't be very concerned about the extra locking cycles. But the other issue is that dvp may be reclaimed due to relocking of udvp in various places (unionfs_relookup, VOP_MKDIR, VOP_SETATTR). At that point, in locking terms udvp will no longer be an alias for dvp, so dvp must be explicitly relocked anyway.

I do prefer your label naming and placement of vn_finished_write() though, so I'll change those.

sys/fs/unionfs/union_vnops.c
222–224

oops, good catch

262–264

good catch, should also branch to unionfs_lookup_cleanup.

627–629

This warning actually does occur, and in fact it's extremely spammy during unionfs stress2 testing. IIRC this is because vgonel() calls VOP_CLOSE() on any active vnode, regardless of whether anything had it open.

1718–1723

Indeed, that's what I meant.

sys/fs/unionfs/union_vnops.c
627–629

Looking back over unionfs_get_node_status(), perhaps a better change would be to keep the warning but allow the close() path to use a NULL status object if one doesn't already exist, rather than allocating a new one through unionfs_get_node_status(). This would certainly be more efficient.

New round of review. I think I have looked at all the changes now, but may do an additional quick check pass tomorrow.

I'm proposing an alternate way to deal with cross-FS locking problems (in an inline response to your question; I'll elaborate later on the details, but they are pretty straightforward).

There is a remaining bug in unionfs_getextattr. The rest is answers to your latest responses, clarifications and requests for more comments explaining some code.

sys/fs/unionfs/union_subr.c
392–414

That's right, I forgot the read-only case (which I don't think is that unlikely), mostly because assuming that the containing directory has an upper vnode is so ingrained in the current code (and OK as an assumption as long as it is made as part of a write operation). This also is something that I'll likely change (time of creation for shadow directories should become configurable).

Concerning the different FS deadlock risk, I have a hard time trying to find any useful scenario where two FSes would be combined in opposite ways. That said, foot shooting is easy, and an administrator may try to mount unionfs with layers swapped (I'm pretty sure there is even a PR about such a scenario), which could lead to deadlocks. And that is an unacceptable system behavior, however infrequent it is. However, this doesn't mean this has to be prevented at the level of vnodes. A deadlock avoidance scheme that would directly prevent such erroneous mounts seems a much better solution to me, and will allow to simplify a lot of code.

418–432

Since dooming of the underlying vnodes cannot happen before unmount of the union above, if insmntque1() doesn't return failure after having locked the target underlying vnode, the latter simply can't be doomed before it is unlocked. That means that the two VN_IS_DOOMED() calls can't return true, and the blocks below the ifs are effectively dead code.

Given the unionfs vnode is itself locked (through the appropriate underlying vnode), I agree that lowervp can't be doomed regardless it was locked or not because the unionfs vnode can't be doomed yet, given that inmsntque1() checks for MNTK_UNMOUNT, and dounmount() sets this flag first thing, so before dooming the unionfs vnode, which requires its lock. I think we should add at least an assertion about lowervp.

To conclude, I'd just remove the ifs, whose blocks are dead code, and would add assertions that both underlying vnodes are not doomed. And your changes here indeed do not disturb what callers see.

488–491

And, for a below mount, calling vn_lock_pair() on the root vnode would assert as well since the upper vnode's lock is locked recursively.

Yes, a comment is warranted. Thanks.

986

Trailing whitespace.

988–1012

Trailing whitespace.

1661

Superfluous whitespace.

sys/fs/unionfs/union_vnops.c
627–629

Looking back over unionfs_get_node_status(), perhaps a better change would be to keep the warning but allow the close() path to use a NULL status object if one doesn't already exist, rather than allocating a new one through unionfs_get_node_status(). This would certainly be more efficient.

Yes, I spotted that more than a year ago and still remember it. Would be fine.

There is indeed the VOP_CLOSE() pairing problem in the VFS. I had some plan at some point, but mostly forgot it and will have to think about it again.

675–678

You have to reload unp here for the ENOENT case, else this may cause a crash or an access to the wrong structure from the code at unionfs_open_abort.

707

Trailing whitespace.

709–710

Trailing whitespace.

821–826

Same trailing whitespaces as in unionfs_open().

1947–1949

Trailing whitespace.

2154

Trailing whitespace.

2228–2235

All this block seems superfluous. There's no need to tweak flags on LK_UPGRADE or LK_DOWNGRADE. These don't make sense if the calling thread hasn't already locked the vnode, and in this case the underlying lock object can only be changed by this very thread (by an operation that requires holding the lock exclusively and gives back the lock as it is), which means the current thread can always know the state of the lock correctly. Therefore, we should just obey it.

2230

Please add a comment explaining why this is needed, along the lines of: "In-between unlock of the interlock and lock of the lower vnode, another process has a window where it could also acquire the lower vnode lock and then doom the unionfs vnode (changing its lock to v_lock) or perform an operation leading to the creation of an upper vnode, which then would be the one to lock. So we have to check for this once the lower vnode is locked, and if so restart the locking operation."

2232–2237

VV_ROOT case? (see unionfs_noderem())

Answered in unionfs_noderem(), thanks.

Please add a small comment saying that 'lvp_locked' is necessary false on VV_ROOT.

2232–2237

Mentioning unionfs_node_update() as well here and the additional guarantee we need below.

(Proposed text has been justified by hand.)

2664–2667

unp must be reloaded and could be NULL here (as you handled it in unionfs_setextattr). Alternatively, test for error being ENOENT.

sys/fs/unionfs/union_subr.c
392–414

You might be thinking of bug 172334, which landed on me fairly recently. If there is a general way to prevent such conflicting mounts at the time the mount is requested (and if that is in fact that only case that could lead to LOR in places like this), then I agree that would definitely be preferable to a vnode-centric solution.

sys/fs/unionfs/union_vnops.c
2228

'flags' should be passed here instead of ap->a_flags, otherwise we may miss LK_CANRECURSE set above. More generally, I'll clean up the flags handling in this function so that all updates are made against the 'flags' local, with ap->a_flags only modified in the one case in which we need to pass 'ap' along.

2228–2235

I think you're right that the LK_DOWNGRADE case is superfluous: that should instead be an assert, as already holding lvp's lock exclusive should guarantee that we don't first issue the downgrade request to lvp and then find that the unionfs vnode has been doomed or copied up.

The LK_UPGRADE case is needed though: LK_UPGRADE may cause the lock to be temporarily dropped (e.g. consider the case in which the lock has multiple shared holders), creating a window in which dooming or copy-up is possible.

sys/fs/unionfs/union_subr.c
392–414

Yes, exactly. I had listed that PR in the proposal document. I don't think we need to design/implement the mount dependencies deadlock avoidance scheme right now, as it isn't a prerequisite for the changes here. I'll address that point in the first phase as well.

927–929

But the other issue is that dvp may be reclaimed due to relocking of udvp in various places (unionfs_relookup, VOP_MKDIR, VOP_SETATTR).

Right, I have been too optimistic with this. Given that, I agree it's not worth the complexity of trying to distinguish the case where dvp and udvp locks are still aliased to try to optimize locking. Optimizing locking is not just for performance, it also gives hints of locking expectations at some points, so can help understanding (or, at least, help ask the right questions).

984–987

Avoid an unlock/relock for uvp in the common case.

sys/fs/unionfs/union_vnops.c
2228–2235

Yes, the LK_DOWNGRADE case is superfluous for the reason you stated, so an assert would be better.

I forgot about a possible lock drop in LK_UPGRADE. You're right then, the LK_UPGRADE case where LK_EXCLUSIVE is explicitly set is absolutely necessary (and the flag has to be explicit because we are changing the lock on which to operate, and we don't have this one locked).

sys/fs/unionfs/union_subr.c
984–987

I think for this we'll want VOP_VPUT_PAIR(udvp, &uvp, false), as we'll be unlocking udvp while a lock is still held on its child.

sys/fs/unionfs/union_subr.c
984–987

You're right, this is what, e.g., kern_mkdirat() does.

The need to call VOP_VPUT_PAIR() (a misnomer) complicates the VFS interface in non-obvious ways (and is not documented at all). For a while, I've been having the impression that introduction of that operation may have been ill-advised, but to be fair have not spent enough time to see if a better alternative is possible yet.

jah marked 48 inline comments as done.
  • unionfs: do not create a new status object during vop_close()
  • Review feedback from olce@, fix for hung process reported by pho@

I ran tests with D45398.139944.patch for a day without observing any issues.

Currently reviewing the update.

Looks good overall!

I have added 3 more comments suggesting more (or changes in) code comments.

Thanks, and sorry for the delay.

sys/fs/unionfs/union_subr.c
930–935

Rephrasing suggestion for more clarity.

980–981

Reference to the comment above.

988–1012

I would add a comment saying that uvp could become doomed in VOP_VPUT_PAIR() regardless of false being passed as unlock_vp, but here can't because the lower filesystem cannot be in the process of being forced unmounted (and VOP_MKDIR() returned it with an active reference).

This revision is now accepted and ready to land.Jul 2 2024, 9:10 AM
jah marked 2 inline comments as done.
  • Add and clarify some comments
This revision now requires review to proceed.Jul 2 2024, 10:10 PM

Nit in a new comment.

sys/fs/unionfs/union_subr.c
981–984
This revision is now accepted and ready to land.Jul 3 2024, 8:49 AM
jah marked an inline comment as done.
  • Avoid leaking a locked+referenced vnode in the dot-dot lookup case
  • Avoid returning a doomed vnode in the dot-dot lookup case
  • Simplify a few locking-related checks in unionfs_lookup()
This revision now requires review to proceed.Jul 5 2024, 4:12 PM

@pho You may want to run unionfs19.sh against this patchset. I believe the "forward VOP" guards added in unionfs_lookup() will address the panic seen there.

In D45398#1046015, @jah wrote:

@pho You may want to run unionfs19.sh against this patchset. I believe the "forward VOP" guards added in unionfs_lookup() will address the panic seen there.

Sure, I'll do that.

I got this panic with union19.sh:

20240705 20:25:38 all (1/1): unionfs19.sh
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
0xfffffe016f682bb8: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
(*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
0xfffffe016f6d04b0: 0xfffffe016f1aa068: type VBAD state VSTATE_DEAD op 0xffffffff818ac760
    usecount 2, writecount 0, refcount 1 seqc users 10xfffffe016ffba068: type VBAD state VSTATE_DEAD op 0xffffffff818ac760

    hold count flags ()
0xfffffe016fed0000:     flags (VIRF_DOOMED)type VBAD state VSTATE_DEAD op 0xffffffff818ac760
    usecount 2, writecount 0, refcount 1 seqc users 1type VBAD state VSTATE_DEAD op 0xffffffff818ac760
type VBAD state VSTATE_DEAD op 0xffffffff818ac760
    usecount 2, writecount 0, refcount 1 seqc users 1
    hold count flags ()

    lock type unionfs: EXCL by thread 0xfffff802f76c7740 (pid 92551, mmap, tid 100429)

    usecount 2, writecount 0, refcount 1 seqc users 1
    usecount 2, writecount 0, refcount 1 seqc users 1    hold count flags ()
    flags (VIRF_DOOMED)    hold count flags ()

    flags (VIRF_DOOMED)    flags (VIRF_DOOMED)#0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
    
    #1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
lock type unionfs: EXCL by thread 0xfffff802f72a0740 (pid 92547, mmap, tid 100442)
#2 0xffffffff80c5ae03 at _vn_lock+0x53

    hold count flags ()
    flags (VIRF_DOOMED)
    #3 0xffffffff80c60017 at vn_lock_pair+0x3d7

lock type unionfs: EXCL by thread 0xfffff806a0787740 (pid 92565, mmap, tid 100465)
#0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
#0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
#1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
lock type unionfs: EXCL by thread 0xfffff802132f4740 (pid 92567, mmap, tid 100436)
#1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
    #0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
#2 0xffffffff80c5ae03 at _vn_lock+0x53
#4 0xffffffff8284e086 at unionfs_forward_vop_finish_pair+0x176
#1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
#3 0xffffffff80c60017 at vn_lock_pair+0x3d7
#2 0xffffffff80c5ae03 at _vn_lock+0x53
#5 0xffffffff8285014d at unionfs_lookup+0x22d
#2 0xffffffff80c5ae03 at _vn_lock+0x53
#3 0xffffffff80c60017 at vn_lock_pair+0x3d7
#6 0xffffffff81130c0f at VOP_CACHEDLOOKUP_APV+0x5f
lock type unionfs: EXCL by thread 0xfffff8021314b740 (pid 92569, mmap, tid 100415)
#3 0xffffffff80c60017 at vn_lock_pair+0x3d7
#7 0xffffffff80c20c36 at vfs_cache_lookup+0xa6
#0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
#4 0xffffffff8284e086 at unionfs_forward_vop_finish_pair+0x176
#8 0xffffffff81130a4f at VOP_LOOKUP_APV+0x5f
#1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
#9 0xffffffff80c321c7 at vfs_lookup+0x487
#2 0xffffffff80c5ae03 at _vn_lock+0x53
#10 0xffffffff80c312e1 at namei+0x2d1
#4 0xffffffff8284e086 at unionfs_forward_vop_finish_pair+0x176
#11 0xffffffff80c50223 at kern_chdir+0xc3
#4 0xffffffff8284e086 at unionfs_forward_vop_finish_pair+0x176
#12 0xffffffff8106a8a8 at amd64_syscall+0x158
#5 0xffffffff8285014d at unionfs_lookup+0x22d
#13 0xffffffff8103bd0b at fast_syscall_common+0xf8
panic: condition !VN_IS_DOOMED(dvp) not met at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
cpuid = 2
time = 1720203945
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe010874e780
vpanic() at vpanic+0x13f/frame 0xfffffe010874e8b0
panic() at panic+0x43/frame 0xfffffe010874e910
cache_enter_time() at cache_enter_time+0x145b/frame 0xfffffe010874e9f0
unionfs_lookup() at unionfs_lookup+0x7fa/frame 0xfffffe010874eb50
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x5f/frame 0xfffffe010874eb80
vfs_cache_lookup() at vfs_cache_lookup+0xa6/frame 0xfffffe010874ebd0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x5f/frame 0xfffffe010874ec00
vfs_lookup() at vfs_lookup+0x487/frame 0xfffffe010874ec80
namei() at namei+0x2d1/frame 0xfffffe010874ece0
kern_chdir() at kern_chdir+0xc3/frame 0xfffffe010874ee00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe010874ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe010874ef30
--- syscall (12, FreeBSDdf55a, rsp = 0x3a114032c808, rbp = 0x3a114032c810 ---

https://people.freebsd.org/~pho/stress/log/log0535.txt

In D45398#1046132, @pho wrote:

I got this panic with union19.sh:

20240705 20:25:38 all (1/1): unionfs19.sh
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
VNASSERT failed: !__builtin_expect(((_Generic(*(&(dvp)->v_irflag), short: (*(volatile u_short *)(&(dvp)->v_irflag)), u_short: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
0xfffffe016f682bb8: (*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
(*(volatile u_short *)(&(dvp)->v_irflag))) & 0x0001) != 0), 0) not true at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
0xfffffe016f6d04b0: 0xfffffe016f1aa068: type VBAD state VSTATE_DEAD op 0xffffffff818ac760
    usecount 2, writecount 0, refcount 1 seqc users 10xfffffe016ffba068: type VBAD state VSTATE_DEAD op 0xffffffff818ac760

    hold count flags ()
0xfffffe016fed0000:     flags (VIRF_DOOMED)type VBAD state VSTATE_DEAD op 0xffffffff818ac760
    usecount 2, writecount 0, refcount 1 seqc users 1type VBAD state VSTATE_DEAD op 0xffffffff818ac760
type VBAD state VSTATE_DEAD op 0xffffffff818ac760
    usecount 2, writecount 0, refcount 1 seqc users 1
    hold count flags ()

    lock type unionfs: EXCL by thread 0xfffff802f76c7740 (pid 92551, mmap, tid 100429)

    usecount 2, writecount 0, refcount 1 seqc users 1
    usecount 2, writecount 0, refcount 1 seqc users 1    hold count flags ()
    flags (VIRF_DOOMED)    hold count flags ()

    flags (VIRF_DOOMED)    flags (VIRF_DOOMED)#0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
    
    #1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
lock type unionfs: EXCL by thread 0xfffff802f72a0740 (pid 92547, mmap, tid 100442)
#2 0xffffffff80c5ae03 at _vn_lock+0x53

    hold count flags ()
    flags (VIRF_DOOMED)
    #3 0xffffffff80c60017 at vn_lock_pair+0x3d7

lock type unionfs: EXCL by thread 0xfffff806a0787740 (pid 92565, mmap, tid 100465)
#0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
#0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
#1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
lock type unionfs: EXCL by thread 0xfffff802132f4740 (pid 92567, mmap, tid 100436)
#1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
    #0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
#2 0xffffffff80c5ae03 at _vn_lock+0x53
#4 0xffffffff8284e086 at unionfs_forward_vop_finish_pair+0x176
#1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
#3 0xffffffff80c60017 at vn_lock_pair+0x3d7
#2 0xffffffff80c5ae03 at _vn_lock+0x53
#5 0xffffffff8285014d at unionfs_lookup+0x22d
#2 0xffffffff80c5ae03 at _vn_lock+0x53
#3 0xffffffff80c60017 at vn_lock_pair+0x3d7
#6 0xffffffff81130c0f at VOP_CACHEDLOOKUP_APV+0x5f
lock type unionfs: EXCL by thread 0xfffff8021314b740 (pid 92569, mmap, tid 100415)
#3 0xffffffff80c60017 at vn_lock_pair+0x3d7
#7 0xffffffff80c20c36 at vfs_cache_lookup+0xa6
#0 0xffffffff80b14d98 at lockmgr_lock_flags+0x1b8
#4 0xffffffff8284e086 at unionfs_forward_vop_finish_pair+0x176
#8 0xffffffff81130a4f at VOP_LOOKUP_APV+0x5f
#1 0xffffffff8113416a at VOP_LOCK1_APV+0x3a
#9 0xffffffff80c321c7 at vfs_lookup+0x487
#2 0xffffffff80c5ae03 at _vn_lock+0x53
#10 0xffffffff80c312e1 at namei+0x2d1
#4 0xffffffff8284e086 at unionfs_forward_vop_finish_pair+0x176
#11 0xffffffff80c50223 at kern_chdir+0xc3
#4 0xffffffff8284e086 at unionfs_forward_vop_finish_pair+0x176
#12 0xffffffff8106a8a8 at amd64_syscall+0x158
#5 0xffffffff8285014d at unionfs_lookup+0x22d
#13 0xffffffff8103bd0b at fast_syscall_common+0xf8
panic: condition !VN_IS_DOOMED(dvp) not met at ../../../kern/vfs_cache.c:2481 (cache_enter_time)
cpuid = 2
time = 1720203945
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe010874e780
vpanic() at vpanic+0x13f/frame 0xfffffe010874e8b0
panic() at panic+0x43/frame 0xfffffe010874e910
cache_enter_time() at cache_enter_time+0x145b/frame 0xfffffe010874e9f0
unionfs_lookup() at unionfs_lookup+0x7fa/frame 0xfffffe010874eb50
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x5f/frame 0xfffffe010874eb80
vfs_cache_lookup() at vfs_cache_lookup+0xa6/frame 0xfffffe010874ebd0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x5f/frame 0xfffffe010874ec00
vfs_lookup() at vfs_lookup+0x487/frame 0xfffffe010874ec80
namei() at namei+0x2d1/frame 0xfffffe010874ece0
kern_chdir() at kern_chdir+0xc3/frame 0xfffffe010874ee00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe010874ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe010874ef30
--- syscall (12, FreeBSDdf55a, rsp = 0x3a114032c808, rbp = 0x3a114032c810 ---

https://people.freebsd.org/~pho/stress/log/log0535.txt

Huh, I ran unionfs19 for a few hours and didn't see that, but it is definitely a problem in the updated code. Should be a straightforward fix, I'll post an update shortly.

In my experience, it is not uncommon for the test environment to have a big impact on finding errors.

  • Use the common cleanup path upon completion of dot-dot lookup This simplifies the code and avoids attempted creation of a negative dot-dot VFS cache entry for a doomed vnode.
sys/fs/unionfs/union_vnops.c
171

I find it very odd that unionfs_lookup() went to the trouble of explicitly creating a negative cache entry in the ENOENT case for dot-dot lookup but does not create a positive entry for successful dot-dot lookup. I don't see any apparent reason for this behavior.

I ran tests with D45398.140613.patch for 9 hours without seeing any problems.

Just a note that the last two diffs were to fix an unrelated, pre-existing bug. I completely understand it's easier to add to this review, especially for a small change.

I think you should commit this diff as is now, and if further testing shows other (and unrelated) problems, we'll fix them separately.

I do not intend to work very hard on unionfs and related subjects until the second half of September (lots of things on my plate up to then and going to be off most of August), so feel free to continue working on it, I'll review what you come out with.

Thanks.

This revision is now accepted and ready to land.Jul 9 2024, 1:26 PM