Fix synchronization between vm_pqbatch_process_page() and pagedaemon.
Closed, Public

Authored by markj on Sep 4 2018, 8:04 PM.

Details

Reviewers

alc
jeff
kib
glebius

Commits

rS338536: Relax an assertion in vm_pqbatch_process_page().

Summary

The locking protocol for page queue operations says that m->queue may
only transition between PQ_NONE and a page queue index, and that the
queue lock for the old value of m->queue must be held when performing
the update.
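
As a rough illustration of that rule, here is a minimal userland sketch. It is not the kernel code: the MODEL_* constants, struct model_page, the model_pq_locked[] array and both functions are hypothetical names with placeholder values, and the sketch does not model any lock for a page whose queue is PQ_NONE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model constants; the values are placeholders. */
#define MODEL_PQ_INACTIVE 0
#define MODEL_PQ_ACTIVE   1
#define MODEL_PQ_COUNT    2
#define MODEL_PQ_NONE     255

struct model_page {
	uint8_t queue;	/* MODEL_PQ_NONE or a queue index */
};

/* One "lock is held by the caller" flag per modeled page queue. */
static bool model_pq_locked[MODEL_PQ_COUNT];

/* Check a proposed update of m->queue against the protocol. */
static bool
model_transition_allowed(const struct model_page *m, uint8_t to)
{
	uint8_t from = m->queue;

	/* Exactly one side of the transition must be MODEL_PQ_NONE. */
	if ((from == MODEL_PQ_NONE) == (to == MODEL_PQ_NONE))
		return (false);
	/* Assumption: no lock is modeled when leaving MODEL_PQ_NONE. */
	if (from == MODEL_PQ_NONE)
		return (to < MODEL_PQ_COUNT);
	/* Leaving a queue requires the lock of the old queue. */
	return (from < MODEL_PQ_COUNT && model_pq_locked[from]);
}

/* Apply the update only if the protocol allows it. */
static bool
model_set_queue(struct model_page *m, uint8_t to)
{
	if (!model_transition_allowed(m, to))
		return (false);
	m->queue = to;
	return (true);
}
```

In this model a direct move from one queue index to another is rejected, as is a move to MODEL_PQ_NONE without the old queue's lock flag set.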

The page daemon is allowed to break this rule in the PQ_INACTIVE scan,
as an optimization. For each page, it first physically removes the page
from the queue while the inactive queue lock is held. Then, it acquires
the page lock, and verifies that m->queue == PQ_INACTIVE and that none
of the queue state flags are set. (For example, if PGA_DEQUEUE is set,
the page is logically dequeued and may not be freed.) Immediately
before freeing the page, the page daemon sets m->queue = PQ_NONE.
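
The sequence described above can be sketched as a single-threaded model. Everything here is an assumption made for illustration: the SCAN_* constants and their bit values, struct scan_page, and scan_inactive_one() do not exist in the kernel, locks appear only as comments, and what the real scan does with a page that fails the checks is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are placeholders. */
#define SCAN_PQ_INACTIVE          0
#define SCAN_PQ_NONE              255
#define SCAN_PGA_DEQUEUE          0x01
#define SCAN_PGA_QUEUE_STATE_MASK 0x0f

struct scan_page {
	uint8_t queue;		/* logical queue index */
	uint8_t aflags;		/* queue state flags */
	bool    on_list;	/* physically linked into the queue */
	bool    freed;
};

/* Model of one iteration of the inactive scan; true if m was freed. */
static bool
scan_inactive_one(struct scan_page *m)
{
	/* Step 1, with the inactive queue lock held: physical removal. */
	m->on_list = false;

	/* Step 2, with the page lock held: verify the logical state. */
	if (m->queue != SCAN_PQ_INACTIVE)
		return (false);
	if ((m->aflags & SCAN_PGA_QUEUE_STATE_MASK) != 0)
		return (false);	/* e.g. logically dequeued */

	/* Step 3: update m->queue without the queue lock, then free. */
	m->queue = SCAN_PQ_NONE;
	m->freed = true;
	return (true);
}
```

Step 3 is the update that departs from the locking protocol: the inactive queue lock was held for step 1 only.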

Currently, when performing per-CPU per-pagequeue batch operations, the loop
looks like this:

foreach m in batch queue:
    if m->queue == queue:
        aflags = m->aflags;
        <process m based on aflags>

The problem is that the page daemon may update m->queue after the
initial check of m->queue. We thus need to be more careful about the
ordering of the accesses of m->queue and m->aflags.

In practice, this manifests as an assertion failure: the check
pq == vm_page_pagequeue(m) at the beginning of
vm_pqbatch_process_page() is invalid because the page daemon is allowed
to update m->queue. I think it would be very difficult to observe any
effects of this race in a non-INVARIANTS kernel.
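
The window can be made concrete with a deterministic model of the interleaving. This is only a sketch under stated assumptions: the RACE_* constants, struct race_page, and both functions are hypothetical, and the "window" callback stands in for whatever the page daemon happens to do on another CPU between the two reads of m->queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are placeholders. */
#define RACE_PQ_INACTIVE      0
#define RACE_PQ_NONE          255
#define RACE_QUEUE_STATE_MASK 0x0f

struct race_page {
	uint8_t queue;
	uint8_t aflags;
};

typedef void (*race_hook_t)(struct race_page *);

/* Model of the page daemon: it only frees pages with no state flags. */
static void
race_daemon_free(struct race_page *m)
{
	if (m->queue == RACE_PQ_INACTIVE &&
	    (m->aflags & RACE_QUEUE_STATE_MASK) == 0)
		m->queue = RACE_PQ_NONE;
}

/*
 * Model of one batch-loop step.  Returns whether a strict
 * "pq == queue of m" check made after the loop's own test would hold.
 */
static bool
race_strict_check_holds(struct race_page *m, uint8_t queue,
    race_hook_t window)
{
	if (m->queue != queue)
		return (true);	/* the loop skips this page */
	if (window != NULL)
		window(m);	/* concurrent activity lands here */
	return (m->queue == queue);
}
```

With no state flags set, running race_daemon_free in the window makes the strict check fail; with a state flag set, the modeled daemon leaves the page alone and the check holds.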

Test Plan

Peter reported the issue and has not been able to trigger any other
bugs so far with this patch applied.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

markj created this revision. (Sep 4 2018, 8:04 PM)

Harbormaster completed remote builds in B19391: Diff 47660. (Sep 4 2018, 8:04 PM)

markj edited the summary of this revision. (Sep 4 2018, 8:10 PM)

markj edited the test plan for this revision.

markj edited the summary of this revision.

markj added reviewers: alc, jeff, kib. (Sep 4 2018, 8:12 PM)

markj added a reviewer: glebius. (Sep 5 2018, 2:14 AM)

kib accepted this revision. (Sep 6 2018, 1:51 PM)

This revision is now accepted and ready to land. (Sep 6 2018, 1:51 PM)

alc added inline comments. (Sep 6 2018, 5:38 PM)

sys/vm/vm_pageout.c
1557 (On Diff #47660)

I'm a bit confused here. In the C11 model, and thus our derivative model, there must be a subsequent (atomic) store, and a synchronizes-with relationship is established between that store and an (atomic) load that precedes the acquire fence. Subsequent reads like the KASSERT read of aflags can still be performed before the fence, or more precisely the store on which the synchronizes-with relationship is established.
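
For reference, the generic C11 fence pairing behind this point can be sketched as follows. This is standard <stdatomic.h> code and not the code under review; struct fence_box and both function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fence_box {
	int        payload;	/* plain, non-atomic data */
	atomic_int ready;	/* flag accessed with relaxed atomics */
};

/* Writer: plain store, release fence, then the atomic store. */
static void
fence_publish(struct fence_box *b, int value)
{
	b->payload = value;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&b->ready, 1, memory_order_relaxed);
}

/* Reader: atomic load, acquire fence, then the plain load. */
static bool
fence_consume(struct fence_box *b, int *out)
{
	/* The acquire fence only helps if this load saw the writer's store. */
	if (atomic_load_explicit(&b->ready, memory_order_relaxed) == 0)
		return (false);
	atomic_thread_fence(memory_order_acquire);
	*out = b->payload;	/* ordered after the writer's payload store */
	return (true);
}
```

The ordering guarantee covers only writes made before the writer's release fence. An acquire fence with no matching atomic store that its preceding load can observe establishes no synchronizes-with relationship.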

markj added inline comments. (Sep 6 2018, 8:35 PM)

sys/vm/vm_pageout.c
1557 (On Diff #47660)

Right, this doesn't quite make sense. I believe I was overthinking the race. I want to be certain that vm_pqbatch_process_page() isn't processing a page while the page daemon is simultaneously freeing it.

vm_pqbatch_process_page() is a no-op if (m->aflags & PGA_QUEUE_STATE_MASK) == 0, and that condition must also be true if the page daemon is about to free m. The page daemon acquires the page lock before checking the state flags, which prevents them from being set after the checks and before the free.

Suppose that a thread executing vm_pqbatch_process_page() observes (m->aflags & PGA_QUEUE_STATE_MASK) != 0. Then the page daemon cannot have already performed the queue state flag checks, and vm_pqbatch_process_page() only clears those state flags as its last step before returning.

So I think the old code is already correct, and the only problem is that the pq == vm_page_pagequeue(m) assertion can be false. It thus should be sufficient to change it to:

    qflags = (atomic_read_8(&m->aflags) & PGA_QUEUE_STATE_MASK);
    KASSERT(pq == vm_page_pagequeue(m) || qflags == 0, ...);
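
A userland model of the relaxed condition, for illustration only: the RELAX_* constants, struct relax_page, and relax_assertion_holds() are hypothetical, and C11 relaxed loads stand in for the kernel's atomic read of m->aflags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are placeholders. */
#define RELAX_PQ_INACTIVE      0
#define RELAX_PQ_NONE          255
#define RELAX_QUEUE_STATE_MASK 0x0f

struct relax_page {
	_Atomic uint8_t queue;
	_Atomic uint8_t aflags;
};

/* Model of the relaxed assertion: queue matches, or no state flags. */
static bool
relax_assertion_holds(struct relax_page *m, uint8_t pq)
{
	uint8_t qflags;

	/* Read the flags first, as in the proposed change. */
	qflags = atomic_load_explicit(&m->aflags, memory_order_relaxed) &
	    RELAX_QUEUE_STATE_MASK;
	return (atomic_load_explicit(&m->queue, memory_order_relaxed) == pq ||
	    qflags == 0);
}
```

The predicate tolerates a page whose queue field has already become RELAX_PQ_NONE, provided no queue state flags are set, which is the case in which processing the page is a no-op.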