vm: lockless fault handling for backing objects
AbandonedPublic
Actions

Authored by mjg on Aug 8 2022, 3:26 PM.

Details

Reviewers

kib
markj
alc
dougm

Summary

An alternative to D36038 -- experimental, works for me (tm).

I have to say I don't know the requirements to safely traverse the object chain list. However, vm_fault_object at some point can wunlock and wlock the object again, all while only having "paging in progress" on it. Assuming this is correct, the patch below should also be fine in that regard. That is the object used is still protected with PIP and after the page is busied and re-checked for identity + validity + the object not being dead, the guarantees should be the same as with the current code *after* the lock is dropped when FAULT_SOFT is returned. Finally, if the aforementioned wunlock/wlock is indeed fine, it should also be fine to just wlock as fallback.

I did not use vm_page_grab_unlocked(..., VM_ALLOC_NOCREAT) because it would mean the vm_page_all_valid check could only be performed after busying -- for pages which are not valid this only increases contention, all while it is avoidable. Additionally the sleeping mechanism is different so I would have to add VM_ALLOC_NOWAIT and handle it on my own anyway.

I also added vm_page_trybusy_unlocked for safe checks.

commit dc4bdbab0efd09ae608c2d54adaf48f49879a1b1
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Sun Aug 7 13:12:43 2022 +0000

    vm: unlocked lookup in fault handling against backing objects

commit bf1c404924185a097fdae38da6ada97faca2cb64
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Sun Aug 7 13:07:59 2022 +0000

    vm: include function name when checking vm_fault_object retval
    
    Reviewed by:
    Differential Revision:

commit 901fb82296bc540b956d4f99d72c65c3d1dfe7fb
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Sun Aug 7 13:05:47 2022 +0000

    vm: move up object lock asserts in fault functions
    
    No functional changes.
    
    Reviewed by:
    Differential Revision:

commit d65823d4d93b1c749dcd94d34763d94b73734e40
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Fri Aug 12 14:16:46 2022 +0000

    vm: add vm_page_trybusy_unlocked
    
    This allows consumers to find the page and safely busy without holding
    the object lock.
    
    Reviewed by:
    Differential Revision:

Test Plan

tested by pho

also built over 31k packages at -j 104.

Diff Detail

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

mjg requested review of this revision.Aug 8 2022, 3:26 PM

mjg created this revision.

mjg added inline comments.Aug 8 2022, 3:45 PM

sys/vm/vm_fault.c
1494	Perhaps page identity check should be implemented as a dedicated routine so that it stops being opencoded whenever vm_page_lookup_unlocked is used. It could even assert busy.

This passes all my testing so far (tinderbox etc.), any comments?

sys/vm/vm_fault.c
1498	wonder if this perhaps should be in trybusy_unlocked?

kib added inline comments.Aug 16 2022, 10:16 PM

sys/vm/vm_fault.c
1498	So what if we missed OBJ_DEAD? Without your patch, lookup and the check are consistent because we own the object lock.

mjg added inline comments.Aug 17 2022, 11:39 AM

sys/vm/vm_fault.c

1498

This is already used by VOP_READ_PGCACHE handling for ufs. In vn_read_from_obj you can find:

obj = atomic_load_ptr(&vp->v_object);
if (obj == NULL)
        return (EJUSTRETURN);

/*
 * Depends on type stability of vm_objects.
 */ 
vm_object_pip_add(obj, 1);
if ((obj->flags & OBJ_DEAD) != 0) {
        /*
         * Note that object might be already reused from the
         * vnode, and the OBJ_DEAD flag cleared.  This is fine,
         * we recheck for DOOMED vnode state after all pages
         * are busied, and retract then.
         *
         * But we check for OBJ_DEAD to ensure that we do not
         * busy pages while vm_object_terminate_pages()
         * processes the queue.
         */
        error = EJUSTRETURN;
        goto out_pip;
}

My code happens to check the flag later, but it should be inconsequential. I can make this match the above by moving the check to vm_fault_next.

Also note locked fault handling unlocks the object before inserting the page into pmap, thus it only has the page xbusied + pip on the object, so it can already end up with a dead obj mid-fault. The patched code ends up in the same spot, except avoids the lock.

mjg edited the test plan for this revision. (Show Details)Aug 19 2022, 5:54 PM

kib added inline comments.Aug 19 2022, 10:00 PM

sys/vm/vm_fault.c
1498	The difference is that vn_read_from_obj() only operates on the OBJT_VNODE objects. We recheck for doomed vnode after the pages are busied, which prevents us from operating on an object/pages that changed their identity. For your change, you also need to handle anonymous objects, which might be collapsed on the shadow chain, see vm_object_collapse(). If the shadowing object is not locked, until the shadow' paging_in_progress is bumped, it is possible that the shadow is removed from the chain and even reused. Then even the OBJ_DEAD check is not enough, perhaps you need to ensure that the object is still participating in the same chain after pip is incremented (which prevents collapsing). May be there is already a reason why object cannot be collapsed in this case, but I am not sure.

mjg abandoned this revision.Aug 20 2022, 12:34 PM

mjg added inline comments.

sys/vm/vm_fault.c
1498	freeing is definitely not possible thanks to pip taken with the previous object still locked, but indeed the shadow processing can mess things up here. I'm going to stick to the rlocked route for now as it is sufficient to fix the problem i'm having.

kib added inline comments.Aug 20 2022, 5:32 PM

sys/vm/vm_fault.c
1498	Freeing is not possible because objects are type-stable. What is possible is identity change if there is not some handle on the object before pip is incremented. Owning either the shadow object lock, or shadow object pip reference (for non-first object) should provide the inter-locking for descend over the chain. Or you may re-check that the object is still on chain after pip increment, at the same position. I think this lock-less approach can work, assuming the precaution described above is either already true (and then a comment explaining it needs to be added), or is implemented. Then this lock-less approach is preferred over using the read lock, IMO.

mjg added inline comments.Aug 20 2022, 5:42 PM

sys/vm/vm_fault.c
1498	In this context it meant reaching uma_zfree call. There is an explicit wait for pip to drain. Right now I don't see a perf difference between rlocked and unlocked variant, mostly because of frequent sleeps on xbusy. Anyhow after looking more at the code I think there are important changes needed for sensible lockless operation (and by lockless i mean preferably avoiding pip as well). Basic idea would be to add sequence counters to all operations which can shuffle pages around, kill an object and so on. On top of that a guarantee that sbusying page + validating obj, pindex touple keeps the page stable. Then fault handling should be able to avoid ping-ponging on vm object in the common case. I'll look into it in foreseeable future.