
Avoid a LOR between vnode locks and allproc_lock.
Closed, Public

Authored by kib on Jun 1 2016, 6:42 PM.
Details

Summary

There is a lock order between the covered vnode lock and allproc_lock, established by calling mountcheckdirs() while owning the covered vnode lock. mountcheckdirs() iterates over the processes, protected by allproc_lock. I do not see how this can be changed.

On the other hand, various VM daemons also need to iterate over all processes, and they lock and unlock user maps. Since unlocking a user map may trigger processing of the deferred map entries, it can cause vnodes to be locked. Likewise, when a vmspace is freed, dropping references on the vnode-backed objects also locks vnodes. This is the reverse of the mount/unmount order.

For the VM daemons, I do not see a need to own allproc_lock while we operate on vmspaces. If the process is held, the hold serves as a marker for the allproc list iterator. This is the main change in the patch.

Also, several PHOLD() calls in sys/vm were replaced by a direct increment of p_lock, since I think that calling faultin() on the OOM path is wrong.

Test Plan

Tested by Peter, with the following debugging change:

diff --git a/sys/kern/subr_witness.c b/sys/kern/subr_witness.c
index a678635..5070e67 100644
--- a/sys/kern/subr_witness.c
+++ b/sys/kern/subr_witness.c
@@ -481,6 +481,9 @@ static const char w_stillcold[] = "Witness is still cold\n";


 static struct witness_order_list_entry order_lists[] = {
+       { "ufs", &lock_class_lockmgr },
+       { "allproc", &lock_class_sx },
+       { NULL, NULL },
        /*
         * sx locks
         */

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib retitled this revision from to Avoid a LOR between vnode locks and allproc_lock..
kib updated this object.
kib edited the test plan for this revision. (Show Details)
kib added a reviewer: jhb.
kib set the repository for this revision to rS FreeBSD src repository - subversion.
kib added subscribers: alc, pho.

Update after r302063. This in fact removed one case.

Maybe have a _PHOLD_LITE() that just does the ++ but asserts that P_INMEM is set, or do you think there are cases where it won't be? I guess since other callers always check P_INMEM it's ok if it's not. It would be nice to keep the existing KASSERT against P_WEXIT in _PHOLD_LITE though.

The comment above PHOLD could use updating though (and in this case it could explain _PHOLD vs _PHOLD_LITE).

In D6679#144958, @jhb wrote:

Maybe have a _PHOLD_LITE() that just does the ++ but asserts that P_INMEM is set, or do you think there are cases where it won't be? I guess since other callers always check P_INMEM it's ok if it's not. It would be nice to keep the existing KASSERT against P_WEXIT in _PHOLD_LITE though.

The comment above PHOLD could use updating though (and in this case it could explain _PHOLD vs _PHOLD_LITE).

At least for the vm_pageout.c uses, I do not see why P_INMEM must already be set, and especially I do not see why, in the OOM or page-out conditions, we need to swap in stacks if they are not in memory. It only makes the memory deficit worse.

kib edited edge metadata.

Add _PHOLD_LITE() and use it. Refresh the comment explaining PHOLD() and other macros.

Remove mis-merged chunk from vm_meter.c.

jhb edited edge metadata.

Thanks.

This revision is now accepted and ready to land. Jun 22 2016, 4:10 PM
This revision was automatically updated to reflect the committed changes.

As a follow on, I would suggest reconsidering whether we still need (or want) to do "trylock" on the vm map.

In D6679#145284, @alc wrote:

As a follow on, I would suggest reconsidering whether we still need (or want) to do "trylock" on the vm map.

Do you mean a follow-up as work on the same piece of code, or is there a more intricate connection between avoiding ownership of allproc_lock in the vmspace_acquire/free blocks and a blocked vm_map_lock_read? I suppose it is the former, not the latter?

Indeed, we do drop the map lock when waiting for a page in vm_fault. But I think that there are several places where we sleep for a page under the map lock, e.g. calls to vm_fault_copy_entry() (and vm_object_split() from there). If we replaced the try version of vm_map_lock with the blocking version, the OOM and swap-out handlers would become dependent on the progress of threads that are blocked waiting for a free page.

For me, it does not look desirable.