Feb 23 2024
Feb 18 2024
Feb 13 2024
In D43815#1000687, @jah wrote:
In D43815#1000600, @olce wrote:
In D43815#1000340, @jah wrote:
In D43815#1000302, @olce wrote:
I don't think it can. Given the first point above, there can't be any unmount of some layer (even forced) until the unionfs mount on top is unmounted. As the layers' root vnodes are vrefed(), they can't become doomed (since unmount of their own FS is prevented), and consequently their v_mount is never modified (barring the ZFS rollback case). This is independent of holding (or not) any vnode lock.
Which doesn't say that there aren't any problems of the sort that you're reporting in unionfs; it's just a different matter.
That's not true; vref() does nothing to prevent a forced unmount from dooming the vnode, only holding its lock does this. As such, if the lock needs to be transiently dropped for some reason and the timing is sufficiently unfortunate, the concurrent recursive forced unmount can first unmount unionfs (dooming the unionfs vnode) and then the base FS (dooming the lower/upper vnode). The held references prevent the vnodes from being recycled (but not doomed), but even this isn't foolproof: for example, in the course of being doomed, the unionfs vnode will drop its references on the lower/upper vnodes, at which point they may become unreferenced unless additional action is taken. Whatever caller invoked the unionfs VOP will of course still hold a reference on the unionfs vnode, but this does not automatically guarantee that references will be held on the underlying vnodes for the duration of the call, due to the aforementioned scenario.
There is a misunderstanding. I'm very well aware of what you are saying, as you should know. But this is not my point, which concerns the sentence "Use of [vnode]->v_mount is unsafe in the presence of a concurrent forced unmount." in the context of the current change. The bulk of the latter is modifications to unionfs_vfsops.c, which contains VFS operations, not vnode ones. There are no vnodes involved there, except for accessing the layers' root ones. And what I'm saying, and what I proved above, is that v_mount on these, again in the context of a VFS operation, cannot be NULL because of a forced unmount (if you disagree, then please show where you think there is a flaw in the reasoning).
Actually the assertion about VFS operations isn't entirely true either (mostly, but not entirely); see the vfs_unbusy() dance we do in unionfs_quotactl().
But saying this makes me realize I actually need to bring back the atomic_load there (albeit the load should be of ump->um_uppermp now).

Otherwise your assertion should be correct, and indeed I doubt the two read-only VOPs in question would have these locking issues in practice.
I think the source of the misunderstanding here is that I just didn't word the commit message very well. Really what I meant there is what I said in a previous comment here: If we need to cache the mount objects anyway, it's better to use them everywhere to avoid the pitfalls of potentially accessing ->v_mount when it's unsafe to do so.
Restore volatile load from ump in quotactl()
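As a rough illustration of the pattern referred to above (the "vfs_unbusy() dance" and the restored atomic load of ump->um_uppermp), here is a minimal sketch. It is not the actual unionfs_quotactl() code; example_forward_to_upper() and its error handling are invented for illustration.

```
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mount.h>

#include <fs/unionfs/union.h>

/*
 * Illustrative sketch only: read the cached upper mount with an explicit
 * atomic load and take our own reference on it instead of reaching it
 * through a vnode's v_mount, because the unionfs mount itself is unbusied
 * below and may be torn down concurrently.
 */
static int
example_forward_to_upper(struct mount *mp)
{
	struct unionfs_mount *ump;
	struct mount *uppermp;
	int error;

	ump = MOUNTTOUNIONFSMOUNT(mp);
	/* Load um_uppermp while the unionfs mount is still busied. */
	uppermp = atomic_load_ptr(&ump->um_uppermp);
	vfs_ref(uppermp);
	vfs_unbusy(mp);

	error = vfs_busy(uppermp, 0);
	if (error == 0) {
		/* ...forward the actual operation to uppermp here... */
		vfs_unbusy(uppermp);
	}
	vfs_rel(uppermp);
	return (error);
}
```

Whether the forwarded operation needs vfs_busy() on the upper mount (as opposed to only vfs_ref()) depends on the operation itself; the point is simply that the mount pointer comes from the cached ump field rather than from [vnode]->v_mount.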
In D43815#1000600, @olce wrote:
In D43815#1000340, @jah wrote:
In D43815#1000302, @olce wrote:
I don't think it can. Given the first point above, there can't be any unmount of some layer (even forced) until the unionfs mount on top is unmounted. As the layers' root vnodes are vrefed(), they can't become doomed (since unmount of their own FS is prevented), and consequently their v_mount is never modified (barring the ZFS rollback case). This is independent of holding (or not) any vnode lock.
Which doesn't say that there aren't any problems of the sort that you're reporting in unionfs; it's just a different matter.
That's not true; vref() does nothing to prevent a forced unmount from dooming the vnode, only holding its lock does this. As such, if the lock needs to be transiently dropped for some reason and the timing is sufficiently unfortunate, the concurrent recursive forced unmount can first unmount unionfs (dooming the unionfs vnode) and then the base FS (dooming the lower/upper vnode). The held references prevent the vnodes from being recycled (but not doomed), but even this isn't foolproof: for example, in the course of being doomed, the unionfs vnode will drop its references on the lower/upper vnodes, at which point they may become unreferenced unless additional action is taken. Whatever caller invoked the unionfs VOP will of course still hold a reference on the unionfs vnode, but this does not automatically guarantee that references will be held on the underlying vnodes for the duration of the call, due to the aforementioned scenario.
There is a misunderstanding. I'm very well aware of what you are saying, as you should know. But this is not my point, which concerns the sentence "Use of [vnode]->v_mount is unsafe in the presence of a concurrent forced unmount." in the context of the current change. The bulk of the latter is modifications to unionfs_vfsops.c, which contains VFS operations, not vnode ones. There are no vnodes involved there, except for accessing the layers' root ones. And what I'm saying, and what I proved above, is that v_mount on these, again in the context of a VFS operation, cannot be NULL because of a forced unmount (if you disagree, then please show where you think there is a flaw in the reasoning).
Feb 12 2024
In D43815#1000302, @olce wrote:
In D43815#1000214, @jah wrote:
In D43815#1000171, @olce wrote:
If one of the layers is forcibly unmounted, there isn't much point in continuing operation. But, given the first point above, that cannot even happen. So really the only case in which v_mount can become NULL is the ZFS rollback one (the layers' root vnodes can't be recycled since they are vrefed). Thinking more about it, always testing whether these are alive and well is going to be inevitable going forward. But I'm fine with this change as it is for now.
This can indeed happen, despite the first point above. If a unionfs VOP ever temporarily drops its lock, another thread is free to stage a recursive forced unmount of both the unionfs and the base FS during this window. Moreover, it's easy for this to happen without unionfs even being aware of it: because unionfs shares its lock with the base FS, if a base FS VOP (forwarded by a unionfs VOP) needs to drop the lock temporarily (this is common e.g. for FFS operations that need to update metadata), the unionfs vnode may effectively be unlocked during that time. That last point is a particularly dangerous one; I have another pending set of changes to deal with the problems that can arise in that situation.
This is why I say it's easy to make a mistake in accessing [base vp]->v_mount at an unsafe time.
I don't think it can. Given the first point above, there can't be any unmount of some layer (even forced) until the unionfs mount on top is unmounted. As the layers' root vnodes are vrefed(), they can't become doomed (since unmount of their own FS is prevented), and consequently their v_mount is never modified (barring the ZFS rollback case). This is independent of holding (or not) any vnode lock.
Which doesn't say that there aren't any problems of the sort that you're reporting in unionfs; it's just a different matter.
In D43815#1000171, @olce wrote:
In D43815#999937, @jah wrote:
Well, as it is today unmounting of the base FS is either recursive or it doesn't happen at all (i.e. the unmount attempt is rejected immediately because of the unionfs stacked atop the mount in question). I don't think it can work any other way, although I could see the default settings around recursive unmounts changing (maybe vfs.recursive_forced_unmount being enabled by default, or recursive unmounts even being allowed for the non-forced case as well). I don't have plans to change any of those defaults though.
I was asking because I feared that the unmount could proceed in the non-recursive case, but indeed it's impossible (handled by the !TAILQ_EMPTY(&mp->mnt_uppers) test in dounmount()). As for the default value itself, for now I think it is fine as it is (it prevents unwanted foot-shooting).
For the changes here, you're right that the first reason isn't an issue as long as the unionfs vnode is locked when the [base_vp]->v_mount access happens, as the unionfs unmount can't complete while the lock is held which then prevents the base FS from being unmounted. But it's also easy to make a mistake there, e.g. in cases where the unionfs lock is temporarily dropped, so if the base mount objects need to be cached anyway because of the ZFS case then it makes sense to just use them everywhere.
If one of the layers is forcibly unmounted, there isn't much point in continuing operation. But, given the first point above, that cannot even happen. So really the only case in which v_mount can become NULL is the ZFS rollback one (the layers' root vnodes can't be recycled since they are vrefed). Thinking more about it, always testing whether these are alive and well is going to be inevitable going forward. But I'm fine with this change as it is for now.
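To make the lock-drop window discussed in the preceding comments concrete, here is a hypothetical caller-side sketch (not code from this patch; example_mount_after_relock() is an invented name): a held reference keeps the vnode from being recycled, but only the lock keeps a concurrent recursive forced unmount from dooming it, so v_mount has to be re-validated after any relock.

```
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/lockmgr.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/*
 * Hypothetical helper: return vp->v_mount only if vp survived an
 * unlock/relock cycle without being doomed.  The caller is assumed to hold
 * its own reference on vp.
 */
static struct mount *
example_mount_after_relock(struct vnode *vp)
{
	ASSERT_VOP_ELOCKED(vp, __func__);
	vref(vp);		/* prevents recycling, but not dooming */
	VOP_UNLOCK(vp);
	/*
	 * Window: a recursive forced unmount can run here and doom vp,
	 * clearing its v_mount, despite the reference taken above.
	 */
	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
	vrele(vp);
	if (VN_IS_DOOMED(vp))
		return (NULL);	/* v_mount can no longer be trusted */
	return (vp->v_mount);
}
```

This is the same motivation for caching the base mount objects in the change under review: anything derived from v_mount across a point where the lock may be dropped has to be re-validated.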
In D43818#999955, @olce wrote:
OK as a workaround. Hopefully, we'll get OpenZFS fixed soon. If you don't plan to, I may try to submit a patch upstream, since it seems no one has proposed any change in https://github.com/openzfs/zfs/issues/15705.
Feb 11 2024
In D43815#999912, @olce wrote:
I think this goes in the right direction long term also.
Longer term, do you have any thoughts on only supporting recursive unmounting, regardless of whether it is forced or not? This would eliminate the first reason mentioned in the commit message.
Update comment
Sadly my attempt at something less hacky didn't really improve things.
Putting this on hold, as I'm evaluating a less-hacky approach.
Feb 10 2024
Also filed https://github.com/openzfs/zfs/issues/15705, as I think that would benefit OpenZFS as well.
Jan 2 2024
Dec 24 2023
Dec 1 2023
Apply code review feedback from markj
Nov 30 2023
Nov 24 2023
Eliminate extraneous call to vm_phys_find_range()
Nov 23 2023
Avoid allocation in the ERANGE case, assert that return status is ENOMEM if not 0/ERANGE.
Nov 21 2023
Nov 16 2023
Nov 15 2023
Nov 13 2023
Nov 12 2023
Nov 4 2023
Oct 2 2023
From the original PR it also sounds as though this sort of refcounting issue is a common problem with drivers that use the clone facility? Could clone_create() be changed to automatically add the reference to an existing device, or perhaps a wrapper around clone_create() that does this automatically? Or would that merely create different complications elsewhere?
In D42008#958212, @kib wrote:
Devfs clones are a way to handle (reserve) unit numbers. It seems that phk decided that the least involved way to code it is to just keep the whole cdev with the unit number somewhere (on the clone list). These clones are not referenced; they exist by the mere fact of being on the clone list. When a device driver allocates a clone, it must make it fully correct, including the ref count.
References on a cdev protect against freeing the device memory; they do not determine the lifecycle of the device. A device is created with make_dev() and destroyed with destroy_dev(); the latter does not free the memory and does not even drop a reference. Devfs nodes are managed outside the driver context, by a combination of the dev_clone eventhandler and the devfs_populate_loop() top-level code. The eventhandler is supposed to return the device with an additional reference to protect against the parallel populate loop, and the loop is the code which usually drops the last reference on a destroyed (in the destroy_dev() sense) device.
So a typical driver does not need to manage dev_ref()/dev_rel() except for initial device creation, where clones and the dev_clone context add some complications.
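As a concrete example of the rule described above, here is a rough sketch of a dev_clone eventhandler for a hypothetical "foo" driver. The names foo_clone, foo_clones, and foo_cdevsw are invented, and the sketch assumes, per the question above, that clone_create() does not itself add a reference when it hands back an existing clone.

```
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/conf.h>
#include <sys/ucred.h>

static struct clonedevs *foo_clones;	/* set up with clone_setup(&foo_clones) */
static struct cdevsw foo_cdevsw;	/* hypothetical driver cdevsw */

/*
 * Hypothetical dev_clone eventhandler.  Per the explanation above, the
 * handler must hand back a *referenced* cdev so the parallel devfs populate
 * loop cannot drop the last reference underneath us.  Simplified: only
 * fully-specified "fooN" names are handled.
 */
static void
foo_clone(void *arg, struct ucred *cred, char *name, int namelen,
    struct cdev **dev)
{
	int unit;

	if (*dev != NULL)
		return;		/* another handler already resolved the name */
	if (dev_stdclone(name, NULL, "foo", &unit) != 1)
		return;		/* not one of our device names */
	/* A non-zero return means no existing clone: create a new cdev. */
	if (clone_create(&foo_clones, &foo_cdevsw, &unit, dev, 0))
		*dev = make_dev_credf(MAKEDEV_REF, &foo_cdevsw, unit, cred,
		    UID_ROOT, GID_WHEEL, 0600, "foo%d", unit);
	else
		dev_ref(*dev);	/* existing clone: add the reference ourselves */
}
```

Such a handler would typically be registered with EVENTHANDLER_REGISTER(dev_clone, ...) after initializing foo_clones with clone_setup(), and the clone list torn down with clone_cleanup() at unload.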
Sep 28 2023
I've never used the clone KPIs before, so please forgive my ignorance in asking a couple of basic questions:
Sep 21 2023
Jul 24 2023
In D40883#931131, @mjg wrote:
huh, you just made me realize the committed change is buggy in that it fails to unlock dvp. i'll fix it up soon.
Jul 7 2023
Looks like a similar cleanup can be done in the needs_exclusive_leaf case at the end of vfs_lookup().
Jul 3 2023
Jun 22 2023
- Remove extraneous vhold()
Jun 20 2023
- Return the write sequence on coveredvp to the correct place, replace
Jun 19 2023
Right now this change is just a proposal. I've successfully run the unionfs and nullfs stress2 tests against it, and have also been running it on my -current machine for the last couple of months.
Of course it's entirely possible I've missed something that would make this change unworkable. But if you guys think this approach has merit, then I'll finish this patch by doing the following:
May 7 2023
May 6 2023
May 2 2023
May 1 2023
Apr 24 2023
Apr 23 2023
Apr 18 2023
Apr 10 2023
Mar 28 2023
- vfs_lookup(): re-check v_mountedhere on lock upgrade
In D39272#894822, @jah wrote:
In D39272#894669, @pho wrote:
I ran into this problem:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x10
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff82753417
stack pointer = 0x0:0xfffffe01438a8a80
frame pointer = 0x0:0xfffffe01438a8aa0
code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2382 (find)
rdi: fffffe015b22d700 rsi: 82000 rdx: fffffe01438a8ad8
rcx: 1 r8: 246 r9: 40000
rax: 0 rbx: fffffe015b22d700 rbp: fffffe01438a8aa0
r10: 1 r11: 0 r12: 80000
r13: fffffe01599802c0 r14: fffffe01438a8ad8 r15: 82000
trap number = 12
panic: page fault
cpuid = 2
time = 1679981682
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01438a8840
vpanic() at vpanic+0x152/frame 0xfffffe01438a8890
panic() at panic+0x43/frame 0xfffffe01438a88f0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe01438a8950
trap_pfault() at trap_pfault+0xab/frame 0xfffffe01438a89b0
calltrap() at calltrap+0x8/frame 0xfffffe01438a89b0
--- trap 0xc, rip = 0xffffffff82753417, rsp = 0xfffffe01438a8a80, rbp = 0xfffffe01438a8aa0 ---
unionfs_root() at unionfs_root+0x17/frame 0xfffffe01438a8aa0
vfs_lookup() at vfs_lookup+0x92a/frame 0xfffffe01438a8b40
namei() at namei+0x340/frame 0xfffffe01438a8bc0
kern_statat() at kern_statat+0x12f/frame 0xfffffe01438a8d00
sys_fstatat() at sys_fstatat+0x2f/frame 0xfffffe01438a8e00
amd64_syscall() at amd64_syscall+0x15a/frame 0xfffffe01438a8f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01438a8f30
--- syscall (552, FreeBSD ELF64, fstatat), rip = 0x1e9fa9c98, rbp = 0x1e9fa1989db0 ---
https://people.freebsd.org/~pho/stress/log/log0429.txt
PS
The BIOS and IPMI firmware was just updated on this test box, but the page fault seems legit to me?

Hmm. I'm probably missing something, but on initial investigation the panic doesn't seem to make sense.
It really looks as though unionfs_root() is seeing a partially constructed mount object: mp has the unionfs ops vector, but mnt_data is NULL (thus the page fault) and the stat object's fsid is 0.
This is consistent with the mount object state that would exist partway through unionfs_domount(), or if unionfs_domount() failed due to failure of unionfs_nodeget() or vfs_register_upper_from_vp(). There is another thread that appears to be partway through unionfs_domount(), and mp's busy count (mnt_lockref) of 2 is consistent with these 2 threads.
What doesn't make sense is how vfs_lookup() could observe a mount object in this state in the first place; vfs_domount_first() doesn't set coveredvp->v_mountedhere until after successful completion of VFS_MOUNT(), and it does so with coveredvp locked exclusive to avoid racing vfs_lookup().

Some things to note:
- Besides a couple of added comments, this is the same patch you tested successfully at the end of January.
- Since then, I have found a couple of bugs in the cleanup logic in unionfs_domount() and unionfs_nodeget(), which I'll post in a separate review after this one, but I don't think they would explain this behavior.
- There does appear to be a third thread calling dounmount() and blocked on an FFS lock, but it's unclear if that has any impact on the crash.
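For reference, the fault site is consistent with something like the following simplified paraphrase of unionfs_root() (not the verbatim source; the um_rootvp field name is taken from the unionfs mount data and should be treated as an assumption here): with mnt_data still NULL on a partially constructed mount, the root-vnode load dereferences a small offset from NULL, matching the reported fault address of 0x10.

```
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/lockmgr.h>
#include <sys/mount.h>
#include <sys/vnode.h>

#include <fs/unionfs/union.h>

/*
 * Paraphrase of the shape of unionfs_root(), not the verbatim source.
 * With mp->mnt_data still NULL on a partially constructed mount, the
 * um_rootvp load below dereferences a small offset from NULL.
 */
static int
example_unionfs_root(struct mount *mp, int flags, struct vnode **vpp)
{
	struct unionfs_mount *ump;
	struct vnode *vp;

	ump = MOUNTTOUNIONFSMOUNT(mp);	/* (struct unionfs_mount *)mp->mnt_data */
	vp = ump->um_rootvp;		/* faults here when mnt_data is NULL */
	vref(vp);
	vn_lock(vp, flags | LK_RETRY);
	*vpp = vp;
	return (0);
}
```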
In D39272#894669, @pho wrote:
I ran into this problem:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x10
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff82753417
stack pointer = 0x0:0xfffffe01438a8a80
frame pointer = 0x0:0xfffffe01438a8aa0
code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2382 (find)
rdi: fffffe015b22d700 rsi: 82000 rdx: fffffe01438a8ad8
rcx: 1 r8: 246 r9: 40000
rax: 0 rbx: fffffe015b22d700 rbp: fffffe01438a8aa0
r10: 1 r11: 0 r12: 80000
r13: fffffe01599802c0 r14: fffffe01438a8ad8 r15: 82000
trap number = 12
panic: page fault
cpuid = 2
time = 1679981682
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01438a8840
vpanic() at vpanic+0x152/frame 0xfffffe01438a8890
panic() at panic+0x43/frame 0xfffffe01438a88f0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe01438a8950
trap_pfault() at trap_pfault+0xab/frame 0xfffffe01438a89b0
calltrap() at calltrap+0x8/frame 0xfffffe01438a89b0
--- trap 0xc, rip = 0xffffffff82753417, rsp = 0xfffffe01438a8a80, rbp = 0xfffffe01438a8aa0 ---
unionfs_root() at unionfs_root+0x17/frame 0xfffffe01438a8aa0
vfs_lookup() at vfs_lookup+0x92a/frame 0xfffffe01438a8b40
namei() at namei+0x340/frame 0xfffffe01438a8bc0
kern_statat() at kern_statat+0x12f/frame 0xfffffe01438a8d00
sys_fstatat() at sys_fstatat+0x2f/frame 0xfffffe01438a8e00
amd64_syscall() at amd64_syscall+0x15a/frame 0xfffffe01438a8f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01438a8f30
--- syscall (552, FreeBSD ELF64, fstatat), rip = 0x1e9fa9c98, rbp = 0x1e9fa1989db0 ---
https://people.freebsd.org/~pho/stress/log/log0429.txt
PS
The BIOS and IPMI firmware was just updated on this test box, but the page fault seems legit to me?
Mar 26 2023
In D39272#894131, @kib wrote:
Regarding the VOP_MKDIR() change: please note that almost any VOP modifying metadata could drop the vnode lock. The list includes VOP_CREAT(), VOP_LINK(), VOP_REMOVE(), VOP_WHITEOUT(), VOP_RMDIR(), VOP_MAKEINODE(), and VOP_RENAME().
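As an illustration of why that matters for a stacked filesystem, here is a hypothetical caller-side sketch (example_mkdir() is invented and is not code from this review): after any of the metadata VOPs above, the directory vnode must be re-checked, since the lock may have been transiently dropped and a concurrent forced unmount may have doomed it.

```
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/lock.h>
#include <sys/lockmgr.h>
#include <sys/mount.h>
#include <sys/namei.h>
#include <sys/vnode.h>

/*
 * Hypothetical caller of VOP_MKDIR().  The VOP may transiently drop the
 * directory vnode lock, so dvp can be doomed by a concurrent forced unmount
 * before the call returns, invalidating dvp->v_mount.
 */
static int
example_mkdir(struct vnode *dvp, struct componentname *cnp, struct vattr *vap,
    struct vnode **vpp)
{
	int error;

	ASSERT_VOP_ELOCKED(dvp, __func__);
	error = VOP_MKDIR(dvp, vpp, cnp, vap);
	if (error == 0 && VN_IS_DOOMED(dvp)) {
		/*
		 * The lock was dropped inside the VOP and dvp was doomed.
		 * The recovery policy here (drop the new vnode and fail the
		 * call) is purely illustrative.
		 */
		vput(*vpp);
		*vpp = NULL;
		error = ENOENT;
	}
	return (error);
}
```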
Feb 10 2023
Jan 19 2023
Jan 16 2023
In D38091#865308, @kib wrote:
Could you please show an example of the new output?