Page MenuHomeFreeBSD

corefile_open_last: don't keep a locked vnode while locking other ones
ClosedPublic

Authored by avg on May 27 2020, 12:35 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 20 2023, 1:06 AM
Unknown Object (File)
Sep 8 2023, 12:46 AM
Unknown Object (File)
Sep 8 2023, 12:14 AM
Unknown Object (File)
Sep 8 2023, 12:13 AM
Unknown Object (File)
Sep 1 2023, 11:56 PM
Unknown Object (File)
Aug 22 2023, 6:25 PM
Unknown Object (File)
Jun 25 2023, 4:39 AM
Unknown Object (File)
Jun 25 2023, 4:01 AM
Subscribers

Details

Summary

Consider this scenario:

  • kern.corefile=/var/coredumps/%N.%U.%I.core
  • multiple processes with the same name crash at the same time

It's possible that one process selects existing file N as oldvp while it
keeps looking for an unused file number. Another process scans through
files and stumbles upon N. That process would be blocked on the vnode lock
while holding the directory vnode exclusively locked. The first process
would, thus, get blocked on the director's vnode lock.

More generally, holding a file's vnode lock (oldvp) while trying to lock its
directory (for the next lookup) is a violation of the vnode locking order.

I have observed this deadlock in the wild.

So, the change to keep oldvp "opened" but unlocked and to lock it again only
if it's to be returned as the result.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

avg requested review of this revision.May 27 2020, 12:35 PM

Just in case:

(kgdb) tid 101533
(kgdb) bt
#0  sched_switch (td=0xfffffe00b4f85500, flags=<optimized out>) at /usr/devel/git/motil/sys/kern/sched_ule.c:2147
#1  0xffffffff807e4de2 in mi_switch (flags=260) at /usr/devel/git/motil/sys/kern/kern_synch.c:542
#2  0xffffffff80831916 in sleepq_switch (wchan=0xfffff80085627bd8, pri=<optimized out>) at /usr/devel/git/motil/sys/kern/subr_sleepqueue.c:625
#3  0xffffffff807acb61 in sleeplk (lk=0xfffff80085627bd8, flags=532480, ilk=<optimized out>, wmesg=<optimized out>, pri=<optimized out>, timo=51, queue=0) at /usr/devel/git/motil/sys/kern/kern_lock.c:295
#4  0xffffffff807ab19f in lockmgr_xlock_hard (lk=0xfffff80085627bd8, flags=<unavailable>, ilk=0x0, file=0xffffffff80c40f02 "/usr/devel/git/motil/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c", line=1438, 
    lwa=0xfffff80085627bd8) at /usr/devel/git/motil/sys/kern/kern_lock.c:841
#5  0xffffffff808c3c64 in VOP_LOCK1 (vp=0xfffff80085627b70, flags=532480, file=0xffffffff80c40f02 "/usr/devel/git/motil/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c", line=1438) at ./vnode_if.h:879
#6  _vn_lock (vp=0xfffff80085627b70, flags=532480, file=0xffffffff80c40f02 "/usr/devel/git/motil/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c", line=1438) at /usr/devel/git/motil/sys/kern/vfs_vnops.c:1613
#7  0xffffffff8045ef27 in zfs_lookup_lock (dvp=0xfffff8000bc1c5b8, vp=0xfffff80085627b70, name=0xfffffe00b07cb3b0 "kcminit.1001.00.core", lkflags=532480)
    at /usr/devel/git/motil/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1438
#8  zfs_lookup (dvp=<optimized out>, nm=0xfffffe00b07cb3b0 "kcminit.1001.00.core", vpp=<optimized out>, cnp=0xfffffe00b07cb8a0, nameiop=1, cr=<optimized out>, td=<optimized out>, flags=0, cached=1)
    at /usr/devel/git/motil/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1612
#9  0xffffffff8045f60d in zfs_freebsd_lookup (ap=0xfffffe00b07cb4e0, cached=<error reading variable: Cannot access memory at address 0x1>) at /usr/devel/git/motil/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4922
#10 zfs_freebsd_cachedlookup (ap=0xfffffe00b07cb4e0) at /usr/devel/git/motil/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4930
#11 0xffffffff80898248 in VOP_CACHEDLOOKUP (dvp=0xfffff8000bc1c5b8, vpp=0xfffffe00b07cb870, cnp=0xfffffe00b07cb8a0) at ./vnode_if.h:80
#12 vfs_cache_lookup (ap=<optimized out>) at /usr/devel/git/motil/sys/kern/vfs_cache.c:2149
#13 0xffffffff808a22c1 in VOP_LOOKUP (dvp=0xfffff8000bc1c5b8, vpp=0xfffffe00b07cb870, cnp=0xfffffe00b07cb8a0) at ./vnode_if.h:54
#14 lookup (ndp=0xfffffe00b07cb810) at /usr/devel/git/motil/sys/kern/vfs_lookup.c:951
#15 0xffffffff808a17f3 in namei (ndp=0xfffffe00b07cb810) at /usr/devel/git/motil/sys/kern/vfs_lookup.c:512
#16 0xffffffff808c314b in vn_open_cred (ndp=0xfffffe00b07cb810, flagp=0xfffffe00b07cba4c, cmode=384, vn_open_flags=5, cred=0xfffff801edb7a000, fp=0x0) at /usr/devel/git/motil/sys/kern/vfs_vnops.c:226
#17 0xffffffff807dcec4 in corefile_open_last (td=<optimized out>, name=<optimized out>, indexpos=<optimized out>, indexlen=<optimized out>, ncores=<optimized out>, vpp=<optimized out>)
    at /usr/devel/git/motil/sys/kern/kern_sig.c:3434
#18 corefile_open (comm=0xfffff80013c098f0 "kcminit", uid=<optimized out>, pid=<optimized out>, td=0xfffffe00b4f85500, compress=0, signum=5, vpp=<optimized out>, namep=<optimized out>)
    at /usr/devel/git/motil/sys/kern/kern_sig.c:3585
#19 coredump (td=0xfffffe00b4f85500) at /usr/devel/git/motil/sys/kern/kern_sig.c:3669
#20 sigexit (td=0xfffffe00b4f85500, sig=6) at /usr/devel/git/motil/sys/kern/kern_sig.c:3211
#21 0xffffffff807ddc3c in postsig (sig=6) at /usr/devel/git/motil/sys/kern/kern_sig.c:3109
#22 0xffffffff8083969b in ast (framep=0xfffffe00b07cbc00) at /usr/devel/git/motil/sys/kern/subr_trap.c:336

nd = {ni_dirp = 0xfffff8000be51c00 "/var/coredumps/kcminit.1001.00.core", ni_segflg = UIO_SYSSPACE, ni_rightsneeded = {cr_rights = {144115188075855872, 288230376151711744}}, ni_startdir = 0x0, ni_rootdir = 0xfffff8000b76a1e8, 
  ni_topdir = 0x0, ni_dirfd = -100, ni_lcf = 0, ni_filecaps = {fc_rights = {cr_rights = {0, 0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0}, ni_vp = 0xfffff80085627b70, ni_dvp = 0xfffff8000bc1c5b8, ni_resflags = 1, 
  ni_pathlen = 1, ni_next = 0xfffff8000b855c23 "", ni_loopcnt = 0, ni_cnd = {cn_nameiop = 1, cn_flags = 2129964, cn_thread = 0xfffffe00b4f85500, cn_cred = 0xfffff801edb7a000, cn_lkflags = 532480, 
    cn_pnbuf = 0xfffff8000b855c00 "/var/coredumps/kcminit.1001.00.core", cn_nameptr = 0xfffff8000b855c0f "kcminit.1001.00.core", cn_namelen = 20}, ni_cap_tracker = {tqh_first = 0x0, tqh_last = 0xfffffe00b07cb8e0}, 
  ni_beneath_latch = 0xfffffe00b07cb920}


(kgdb) fr 5
#5  0xffffffff808c3c64 in VOP_LOCK1 (vp=0xfffff80085627b70, flags=532480, file=0xffffffff80c40f02 "/usr/devel/git/motil/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c", line=1438) at ./vnode_if.h:879
(kgdb) p/x vp->v_lock.lk_lock 
$4 = 0xfffffe00b261b004

(kgdb) p $5.td_tid
$6 = 101707
(kgdb) tid 101707
(kgdb) bt
#0  sched_switch (td=0xfffffe00b261b000, flags=<optimized out>) at /usr/devel/git/motil/sys/kern/sched_ule.c:2147
#1  0xffffffff807e4de2 in mi_switch (flags=260) at /usr/devel/git/motil/sys/kern/kern_synch.c:542
#2  0xffffffff80831916 in sleepq_switch (wchan=0xfffff8000bc1c620, pri=<optimized out>) at /usr/devel/git/motil/sys/kern/subr_sleepqueue.c:625
#3  0xffffffff807acb61 in sleeplk (lk=0xfffff8000bc1c620, flags=2105344, ilk=<optimized out>, wmesg=<optimized out>, pri=<optimized out>, timo=51, queue=1) at /usr/devel/git/motil/sys/kern/kern_lock.c:295
#4  0xffffffff807aad3e in lockmgr_slock_hard (lk=0xfffff8000bc1c620, flags=2105344, ilk=<optimized out>, file=0xffffffff80c535c3 "/usr/devel/git/motil/sys/kern/vfs_subr.c", line=<optimized out>, lwa=<optimized out>)
    at /usr/devel/git/motil/sys/kern/kern_lock.c:649
#5  0xffffffff808c3c64 in VOP_LOCK1 (vp=0xfffff8000bc1c5b8, flags=2105344, file=0xffffffff80c535c3 "/usr/devel/git/motil/sys/kern/vfs_subr.c", line=6152) at ./vnode_if.h:879
#6  _vn_lock (vp=0xfffff8000bc1c5b8, flags=2105344, file=0xffffffff80c535c3 "/usr/devel/git/motil/sys/kern/vfs_subr.c", line=6152) at /usr/devel/git/motil/sys/kern/vfs_vnops.c:1613
#7  0xffffffff808b32a5 in vfs_cache_root (mp=0xfffffe00925b2b00, flags=2105344, vpp=0xfffffe00abccc558) at /usr/devel/git/motil/sys/kern/vfs_subr.c:6152
#8  0xffffffff808a24b3 in lookup (ndp=0xfffffe00abccc810) at /usr/devel/git/motil/sys/kern/vfs_lookup.c:1033
#9  0xffffffff808a17f3 in namei (ndp=0xfffffe00abccc810) at /usr/devel/git/motil/sys/kern/vfs_lookup.c:512
#10 0xffffffff808c314b in vn_open_cred (ndp=0xfffffe00abccc810, flagp=0xfffffe00abccca4c, cmode=384, vn_open_flags=5, cred=0xfffff801edb7a000, fp=0x0) at /usr/devel/git/motil/sys/kern/vfs_vnops.c:226
#11 0xffffffff807dcec4 in corefile_open_last (td=<optimized out>, name=<optimized out>, indexpos=<optimized out>, indexlen=<optimized out>, ncores=<optimized out>, vpp=<optimized out>)
    at /usr/devel/git/motil/sys/kern/kern_sig.c:3434
#12 corefile_open (comm=0xfffff8000be5d3d0 "kcminit", uid=<optimized out>, pid=<optimized out>, td=0xfffffe00b261b000, compress=-2057143440, signum=5, vpp=<optimized out>, namep=<optimized out>)
    at /usr/devel/git/motil/sys/kern/kern_sig.c:3585
#13 coredump (td=0xfffffe00b261b000) at /usr/devel/git/motil/sys/kern/kern_sig.c:3669
#14 sigexit (td=0xfffffe00b261b000, sig=6) at /usr/devel/git/motil/sys/kern/kern_sig.c:3211
#15 0xffffffff807ddc3c in postsig (sig=6) at /usr/devel/git/motil/sys/kern/kern_sig.c:3109
#16 0xffffffff8083969b in ast (framep=0xfffffe00abcccc00) at /usr/devel/git/motil/sys/kern/subr_trap.c:336

nd = {ni_dirp = 0xfffff80004fde800 "/var/coredumps/kcminit.1001.07.core", ni_segflg = UIO_SYSSPACE, ni_rightsneeded = {cr_rights = {144115188075855872, 288230376151711744}}, ni_startdir = 0x0, ni_rootdir = 0xfffff8000b76a1e8, 
  ni_topdir = 0x0, ni_dirfd = -100, ni_lcf = 0, ni_filecaps = {fc_rights = {cr_rights = {0, 0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0}, ni_vp = 0xfffff8000b76b3d0, ni_dvp = 0xfffff80004b60988, ni_resflags = 1, 
  ni_pathlen = 22, ni_next = 0xfffff800765ba80e "/kcminit.1001.07.core", ni_loopcnt = 0, ni_cnd = {cn_nameiop = 1, cn_flags = 2113580, cn_thread = 0xfffffe00b261b000, cn_cred = 0xfffff801edb7a000, cn_lkflags = 2097152, 
    cn_pnbuf = 0xfffff800765ba800 "/var/coredumps/kcminit.1001.07.core", cn_nameptr = 0xfffff800765ba805 "coredumps/kcminit.1001.07.core", cn_namelen = 9}, ni_cap_tracker = {tqh_first = 0x0, tqh_last = 0xfffffe00abccc8e0}, 
  ni_beneath_latch = 0xfffffe00abccc920}
i = 7
oldvp = 0xfffff80085627b70

(kgdb) fr 5
#5  0xffffffff808c3c64 in VOP_LOCK1 (vp=0xfffff8000bc1c5b8, flags=2105344, file=0xffffffff80c535c3 "/usr/devel/git/motil/sys/kern/vfs_subr.c", line=6152) at ./vnode_if.h:879
(kgdb) p/x vp->v_lock.lk_lock               
$8 = 0xfffffe00b4f85502

(kgdb) p $9.td_tid
$10 = 101533

Then the result of the check is invalidated after the vnode is unlocked.
But this iteration looks dubious enough so that I do not think it matters.

Probably the only way to fix it is to lock the parent for whole duration of the loop and use VOP_LOOKUP() instead of namei(), which is too much trouble.

This revision is now accepted and ready to land.May 27 2020, 7:31 PM