vm_page_dequeue_deferred() schedules a deferred removal of the page from
its paging queue. This is mostly used by the page daemon when it
encounters a wired page: we lazily dequeue such pages, and there's no
need to complete the operation before the function returns, so
vm_page_dequeue_deferred() uses batching and thus avoids the page queue
lock most of the time.
vm_page_dequeue_deferred() may be called multiple times on a page before
it has been physically removed from the queue. Suppose a page m is
wired and on PQ_LAUNDRY. During a scan, the laundry thread will schedule
a dequeue by setting PGA_DEQUEUE and adding a batch entry for the page.
Suppose m is unwired and freed by a different thread before the batch
entry is processed. vm_page_free_prep() will schedule a second dequeue
operation. Suppose further that vm_page_dequeue_deferred() races with the
vm_page_dequeue_complete() call from the first operation. We can get
the following serialization:
Thread 1 (vm_page_dequeue_deferred()):
- Observes m->queue != PQ_NONE
Thread 2 (vm_page_dequeue_complete()):
- Sets m->queue = PQ_NONE
- Clears PGA_DEQUEUE from m->aflags
Thread 1:
- Observes (m->aflags & PGA_DEQUEUE) == 0
- Sets PGA_DEQUEUE in m->aflags and adds a batch entry
- Free the page to vm_phys
Now we have m->queue = PQ_NONE and PGA_DEQUEUE set. When the page is
subsequently allocated, vm_page_alloc() will call vm_page_dequeue() to
ensure that the page is completely removed, and that call will spin
forever waiting for PGA_DEQUEUE to be cleared.
The bug is that vm_page_dequeue_deferred() checks for m->queue ==
PQ_NONE and (m->aflags & PGA_DEQUEUE) == 0 in the wrong order. It
should just use vm_page_queue(), which performs the loads in the
correct order and has the appropriate fence. In particular, if
PGA_DEQUEUE is already set, vm_page_dequeue_deferred() can simply return
since a dequeue is already scheduled. If PGA_DEQUEUE is not set and
m->queue != PQ_NONE, then the page lock provides mutual exclusion with
any other thread that may schedule a queue operation on m.
While here, strengthen a related assertion in vm_page_requeue(): callers
must ensure that a deferred dequeue has not been scheduled on the page,
and all existing callers correctly ensure this.