fifo: null-check fifoinfo on close
AbandonedPublic
Actions

Authored by mjg on Sep 2 2019, 12:14 PM.

Details

Reviewers

Summary

I added toy early usecount in vget (patch at the bottom) and kept running into panics when running suj11.sh from stress2.

fifo_open populates fifoinfo if necessary and fifo_close frees it. ffs_vgets gets rid of vnodes from the hash and it starts by activating them. Since the vnode is active, vgonel calls the VOP_CLOSE routine. For fifo vnodes this means fifo_close is called without corresponding open first, leading to a NULL pointer deference. Check for the condition.

Arguably ufs should not be activating these vnodes but that's perhaps for later. I don't know this is easy to run into with early activation, I suspect it's about things like vtryrecycle which now can get a sneaked in usecount (previously it would be stopped by the vnode lock). I don't know if there are other problems with this yet.

Fatal trap 12: page fault while in kernel mode
cpuid = 30; apic id = 30
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80491db9
stack pointer = 0x28:0xfffffe01263f82e0
frame pointer = 0x28:0xfffffe01263f8310
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1717 (creat)
trap number = 12
panic: page fault
cpuid = 30
time = 1567425074
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01263f7fa0
vpanic() at vpanic+0x19d/frame 0xfffffe01263f7ff0
panic() at panic+0x43/frame 0xfffffe01263f8050
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe01263f80b0
trap_pfault() at trap_pfault+0x62/frame 0xfffffe01263f8100
trap() at trap+0x2b2/frame 0xfffffe01263f8210
calltrap() at calltrap+0x8/frame 0xfffffe01263f8210

trap 0xc, rip = 0xffffffff80491db9, rsp = 0xfffffe01263f82e0, rbp = 0xfffffe01263f8310 ---

fifo_close() at fifo_close+0x19/frame 0xfffffe01263f8310
VOP_CLOSE_APV() at VOP_CLOSE_APV+0x82/frame 0xfffffe01263f8330
vgonel() at vgonel+0xc5/frame 0xfffffe01263f83a0
vgone() at vgone+0x2f/frame 0xfffffe01263f83c0
ffs_vgetf() at ffs_vgetf+0x82/frame 0xfffffe01263f8440
ffs_valloc() at ffs_valloc+0x593/frame 0xfffffe01263f84e0
ufs_makeinode() at ufs_makeinode+0xb1/frame 0xfffffe01263f8670
ufs_create() at ufs_create+0x34/frame 0xfffffe01263f8690
VOP_CREATE_APV() at VOP_CREATE_APV+0x86/frame 0xfffffe01263f86c0
vn_open_cred() at vn_open_cred+0x2eb/frame 0xfffffe01263f8810
kern_openat() at kern_openat+0x1f3/frame 0xfffffe01263f8980
amd64_syscall() at amd64_syscall+0x2b9/frame 0xfffffe01263f8ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe01263f8ab0

syscall (499, FreeBSD ELF64, sys_openat), rip = 0x80081247a, rsp = 0x7fffffffdf18, rbp = 0x7fffffffdfc0 ---

KDB: enter: panic
[ thread pid 1717 tid 100943 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index b18237c0b16c..a53029f9ab90 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -2717,7 +2717,7 @@ v_decr_devcount(struct vnode *vp)
 int
 vget(struct vnode *vp, int flags, struct thread *td)
 {
-       int error, oweinact;
+       int error;
 
        VNASSERT((flags & LK_TYPE_MASK) != 0, vp,
            ("vget: invalid lock operation"));
@@ -2732,42 +2732,24 @@ vget(struct vnode *vp, int flags, struct thread *td)
 
        CTR3(KTR_VFS, "%s: vp %p with flags %d", __func__, vp, flags);
 
+       if (!(flags & LK_INTERLOCK)) {
+               VI_LOCK(vp);
+               flags |= LK_INTERLOCK;
+       }
        if ((flags & LK_VNHELD) == 0)
-               _vhold(vp, (flags & LK_INTERLOCK) != 0);
+               vholdl(vp);
+       vp->v_iflag &= ~VI_OWEINACT;
+       refcount_acquire(&vp->v_usecount);
+       v_incr_devcount(vp);
 
        if ((error = vn_lock(vp, flags)) != 0) {
-               vdrop(vp);
+               vrele(vp);
                CTR2(KTR_VFS, "%s: impossible to lock vnode %p", __func__,
                    vp);
                return (error);
        }
        if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0)
                panic("vget: vn_lock failed to return ENOENT\n");
-       /*
-        * We don't guarantee that any particular close will
-        * trigger inactive processing so just make a best effort
-        * here at preventing a reference to a removed file.  If
-        * we don't succeed no harm is done.
-        *
-        * Upgrade our holdcnt to a usecount.
-        */
-       if (vp->v_type == VCHR ||
-           !refcount_acquire_if_not_zero(&vp->v_usecount)) {
-               VI_LOCK(vp);
-               if ((vp->v_iflag & VI_OWEINACT) == 0) {
-                       oweinact = 0;
-               } else {
-                       oweinact = 1;
-                       vp->v_iflag &= ~VI_OWEINACT;
-                       VNODE_REFCOUNT_FENCE_REL();
-               }
-               refcount_acquire(&vp->v_usecount);
-               v_incr_devcount(vp);
-               if (oweinact && VOP_ISLOCKED(vp) == LK_EXCLUSIVE &&
-                   (flags & LK_NOWAIT) == 0)
-                       vinactive(vp, td);
-               VI_UNLOCK(vp);
-       }
        return (0);
 }

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 26227

Event Timeline

mjg created this revision.Sep 2 2019, 12:14 PM

Herald added a subscriber: imp. · View Herald TranscriptSep 2 2019, 12:14 PM

Harbormaster completed remote builds in B26227: Diff 61548.Sep 2 2019, 12:15 PM

Yes, there are other problems. VOP_CLOSE() must be paired with VOP_OPEN(). This would break NFS badly, esp. NFS v4.

You are right, e.g. stat(2) avoided the issues by not dropping the lock between vget() in namei and vput().

For ufs and tmpfs close routine will be irrelevant. For zfs it looks like a complete no-op. I have trouble deciphering whether devfs_close wants to be called no matter what, it looks like it does not.

So we can probably just add a flag (VV_OPEN?) and call VOP_CLOSE based on that. Actually scratch that, there is weird discrepancy here. On one hand everyone can VOP_OPEN multiple times, yet there is apparently just one VOP_CLOSE call from vgone to take care of everything and at the same time just VOP_CLOSE for regular case should keep the vnode perfectly usable. Perhaps the close should be removed altogether from that codepath and all the handling moved to reclaim. the VOP_INACTIVE call shoud also be eliminated, just let the reclaim routine clean up however it sees fit.

this needs a rework where all the close/inactive processing from vgone is moved into reclaim

Revision Contents
Changeset List

Path

Size

sys/

fs/

fifofs/

fifo_vnops.c

7 lines

Diff 61548

View Options

sys/fs/fifofs/fifo_vnops.c

fifo: null-check fifoinfo on closeAbandonedPublicActions