A description of the change is available here:
https://svnweb.freebsd.org/changeset/base/330296
The change is somewhat large, sorry. I will try to explain the gist of it,
and then provide a list of things which have changed.
The aim is to reduce contention on page queue locks. Right now, both
the page and page queue locks need to be held to enqueue, requeue or
dequeue a page. Of course, this is not very scalable, and it exacerbates
page lock contention because page locks are first in the lock order.
Consider that we hold the queue lock for the entirety of a `PQ_ACTIVE`
scan. If a thread attempts to enqueue a page there, it will block with
a page lock held until the scan is complete.
The approach here is to separate queue operations (enqueue, dequeue
and requeue) into two phases. The first phase requires only the page lock,
and schedules a deferred queue operation using per-CPU batch queues.
There is one batch queue per CPU per page queue. Operations are encoded
using atomic flags in the page. The second phase, implemented in
`vm_pqbatch_process()`, processes a batch queue with the page queue lock
held and carries out the requested queue operations. The second phase is
performed only when the batch is full, so operations on a given page
may be deferred indefinitely.
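
To make the two phases concrete, here is a minimal userland sketch. It is not the kernel code: `struct page`, `struct batch`, `batch_submit()`, `batch_process()`, the `OP_*` flag bits and `BATCH_SIZE` are all names invented for this illustration, and the rule that a dequeue request wins over other pending requests is an assumption of the sketch. A single `struct batch` stands in for one per-CPU batch queue, and the locks are indicated only in comments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BATCH_SIZE 4            /* hypothetical; kept small for illustration */

/* Hypothetical flag bits encoding the requested operation. */
#define OP_ENQUEUE 0x1u
#define OP_DEQUEUE 0x2u
#define OP_REQUEUE 0x4u

struct page {
    atomic_uint pending;        /* requested operations, set in phase one */
    int on_queue;               /* 1 if physically on the page queue */
    unsigned requeues;          /* counts moves to the queue tail */
};

struct batch {                  /* stands in for one per-CPU batch queue */
    struct page *pages[BATCH_SIZE];
    size_t n;
};

/* Phase two: in the kernel this would run with the page queue lock held. */
static void
batch_process(struct batch *b)
{
    for (size_t i = 0; i < b->n; i++) {
        struct page *p = b->pages[i];
        /* Claim and clear all requests recorded for this page. */
        unsigned ops = atomic_exchange(&p->pending, 0);

        if (ops & OP_DEQUEUE)
            p->on_queue = 0;    /* assumption: dequeue wins over the rest */
        else if (ops & OP_ENQUEUE)
            p->on_queue = 1;
        else if ((ops & OP_REQUEUE) && p->on_queue)
            p->requeues++;
    }
    b->n = 0;
}

/* Phase one: in the kernel this would run with only the page lock held. */
static void
batch_submit(struct batch *b, struct page *p, unsigned op)
{
    atomic_fetch_or(&p->pending, op);   /* record the request in the page */
    b->pages[b->n++] = p;
    if (b->n == BATCH_SIZE)             /* flush only when the batch is full */
        batch_process(b);
}
```

In this sketch, a page submitted to a batch that never fills keeps its flag set and is never touched by `batch_process()`, which is the "deferred indefinitely" case described above.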
`vm_page_enqueue()` and `vm_page_requeue()` always perform deferred
operations. Higher-level APIs (e.g., `vm_page_deactivate()`) thus perform
deferred queue operations as well. `vm_page_dequeue()` guarantees that
the page is dequeued before the function returns, and
`vm_page_dequeue_deferred()` performs a deferred dequeue. `vm_page_dequeue()`
requires both the page and page queue locks unless a deferred dequeue was
already requested for the page, in which case only the queue lock is required.
The locking protocol for the `queue` field of `struct vm_page` is changed.
The field is only allowed to transition between `PQ_NONE` and a queue index,
i.e., it cannot transition directly between queue indices. To update the field, the
lock for the from-value must be held. For `PQ_NONE` this is the page lock,
otherwise it is the corresponding page queue lock. There is one place where
we safely violate this rule for an optimization: in the inactive queue scan,
right before freeing the page. There, we set the field to `PQ_NONE` directly
with the page lock held. At that point, it is known that the page is physically
removed from the queue and that no queue operations are scheduled, so
the queue lock is not needed in order to complete removal of the page.
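
The transition rule can be written down as two small predicates. This is only a sketch of the rule as stated above: `queue_transition_ok()`, `queue_field_lock()` and `enum lock_kind` are hypothetical helpers, and the numeric value of `PQ_NONE` is a placeholder chosen for the example. The inactive-scan exception described above is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define PQ_NONE 255     /* placeholder value chosen for this sketch */

enum lock_kind { LOCK_PAGE, LOCK_PAGEQUEUE };

/* The lock for the from-value protects the update of the field. */
static enum lock_kind
queue_field_lock(int from)
{
    return (from == PQ_NONE ? LOCK_PAGE : LOCK_PAGEQUEUE);
}

/*
 * A transition is allowed only between PQ_NONE and a queue index,
 * i.e., exactly one of the two values must be PQ_NONE.
 */
static bool
queue_transition_ok(int from, int to)
{
    return ((from == PQ_NONE) != (to == PQ_NONE));
}
```

So moving a page from one queue to another takes two steps in this model: index to `PQ_NONE` under the page queue lock, then `PQ_NONE` to the new index under the page lock.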
Changes:
- vm_phys uses the `listq` field for freelists rather than
`plinks.q`. We now permit freed pages to reside on page
queues. Such pages must be scheduled for a deferred dequeue.
The page allocators complete the dequeue before returning
the page.
- The page daemon scan loops are substantially different.
The idea now is to quickly collect a batch of pages with only the
page queue lock held, and then process that batch without
touching the page queue lock. This lets us get rid of some
of the dancing that must occur to acquire the page and
object locks with the page queue lock held.
- When collecting a batch during the `PQ_INACTIVE` scan,
pages in the batch are dequeued, in the anticipation that
most of them will be freed. For `PQ_ACTIVE` and
`PQ_LAUNDRY` scans, we keep pages on the queue: during
a `PQ_ACTIVE` scan, we end up requeuing most pages, and
during a `PQ_LAUNDRY` scan, we keep pages queued until
laundering is done.
- The lock dancing in `vm_object_terminate_pages()` is gone.
`vm_page_free_prep()` schedules a deferred dequeue for the
page, so the dequeue operations are already batched. Similarly,
now that we use a UMA cache for `VM_FREEPOOL_DEFAULT`
pages, most calls to `vm_page_free()` do not acquire the free
queue lock.
- The `PQ_ACTIVE` scan is implemented using the CLOCK
algorithm. This is to avoid requeue operations during the scan.
- `vm_page_deactivate_noreuse()` uses a separate set of per-CPU
batch queues to implement insertions near the head of the queue.
I'm open to suggestions on other ways to implement this.
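
For the scan loops, the collect-then-process structure might look roughly like the following. This is a simplified userland sketch, not the page daemon code: `struct pagequeue`, `scan_collect()` and `SCAN_BATCH` are invented for the example, the queue is a plain array, and locking appears only in comments. The `dequeue` argument models the difference between a `PQ_INACTIVE` scan (pages leave the queue as they are collected) and `PQ_ACTIVE` or `PQ_LAUNDRY` scans (pages stay queued).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCAN_BATCH 3    /* hypothetical batch size */

struct page {
    int id;
    bool queued;        /* true while the page is on the queue */
};

struct pagequeue {
    struct page *pages[16];
    size_t head;        /* next slot to scan */
    size_t count;       /* number of valid slots */
};

/*
 * Collect up to SCAN_BATCH pages into out[] and return how many were
 * collected. In the kernel this step would run with only the page queue
 * lock held; the caller then processes out[] without that lock.
 */
static size_t
scan_collect(struct pagequeue *pq, struct page **out, bool dequeue)
{
    size_t n = 0;

    while (n < SCAN_BATCH && pq->head < pq->count) {
        struct page *p = pq->pages[pq->head++];

        if (!p->queued)
            continue;           /* skip pages already removed */
        if (dequeue)
            p->queued = false;  /* inactive scan: expect to free it */
        out[n++] = p;
    }
    return (n);
}
```

The caller would loop: collect a batch, drop the queue lock, then acquire page and object locks for each collected page in the normal order.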