Page MenuHomeFreeBSD

Ensure that all queue state is cleared when vm_page_dequeue() returns.
ClosedPublic

Authored by markj on Jun 23 2018, 12:17 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jan 12, 6:39 PM
Unknown Object (File)
Dec 9 2024, 5:59 PM
Unknown Object (File)
Nov 5 2024, 10:36 AM
Unknown Object (File)
Nov 5 2024, 2:05 AM
Unknown Object (File)
Nov 1 2024, 5:26 PM
Unknown Object (File)
Nov 1 2024, 5:26 PM
Unknown Object (File)
Nov 1 2024, 5:25 PM
Unknown Object (File)
Nov 1 2024, 5:25 PM
Subscribers

Details

Summary

Right now, vm_page_dequeue() may be called with or without the page lock
held. If the lock is not held, nothing prevents a different thread from
enqueuing the page before vm_page_dequeue() returns. In practice, this
does not happen because vm_page_dequeue() is called either with the lock
held or from the page allocation functions (so other threads cannot be
manipulating the page).

Suppose m is wired and on a page queue. Suppose the page daemon visits
m and schedules a deferred dequeue. This sets PGA_DEQUEUE and creates a
batch queue entry for the page. A bit later, the batch queue is
processed with only the page queue lock held. This processing will set
m->queue = PQ_NONE, issue a release fence, and clear PGA_DEQUEUE.
Suppose that a thread concurrently unwires the page. It will call
vm_page_dequeue(), which returns if it observes m->queue == PQ_NONE.
Thus, vm_page_dequeue() may return while PGA_DEQUEUE is still set, and
in particular, the thread processing the batch queue will clear all
queue state flags at some point after vm_page_dequeue() returns.

To fix this, make vm_page_dequeue() return only once the concurrent
dequeue has completed. The concurrent dequeue will happen in a critical
section, so we should end up looping for only a short period.

While here, do some related cleanup: inline vm_page_dequeue_locked()
into its only caller and delete a prototype for the unimplemented
vm_page_requeue_locked(). I added a volatile qualifier to "queue" in
r333703; instead, use atomic_load_*() only when specifically needed.

Test Plan

Peter observed the scenario described above in a stress test. The unwiring
thread dequeued the thread; in the subsequent vm_page_enqueue() call,
the assertion (m->aflags & PGA_QUEUE_STATE_MASK) == 0 failed.

An earlier version of this patch seems to have fixed the problem, but I will
ask for a retest if the approach taken here is agreed upon.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 18539
Build 18241: arc lint + arc unit

Event Timeline

markj added reviewers: alc, kib, jeff.
sys/vm/vm_page.c
3329

Is there a reason why the pq assignment isn't inside the body of the below if statement?

markj marked an inline comment as done.Aug 2 2018, 9:04 PM
markj added inline comments.
sys/vm/vm_page.c
3329

No, just an oversight.

Alan, did you have any further thoughts on this patch?

sys/vm/vm_page.c
3287

There is no need for a comma here.

3298

There are unnecessary ()'s around m->order == VM_NFREEORDER.

3312–3315

I would suggest adding a block comment here explicitly noting that we are busy waiting and that the thread blocking progress is guaranteed to be in a critical section.

Also, perform a cpu_spinwait() to make the busy waiting SMT/hyperthreading friendly.

Functionally, this change looks good to me. Do as you see fit with my earlier comments, and commit the change. l don't feel the need to review it again.

This revision is now accepted and ready to land.Aug 23 2018, 5:56 PM

BTW, my longer-term plan is to combine queue state into a single atomically updated word (today it's in the queue and aflags field). I think that will make reasoning about queue state easier and might make it possible to elide the page lock for most queue operations. I don't have a concrete proposal for this yet though.

This revision was automatically updated to reflect the committed changes.