My aim is to reduce the scope of the vm_page lock array. It serves
multiple unrelated purposes, and the locks show up as a bottleneck
in some workloads. For example, vm_object_terminate_pages() bounces
between page locks at a high rate during builds. The locks also show up
with sendfile: in vm_page_grab_pages() when wiring resident pages, and
when freeing SF_NOCACHE pages (i.e., Netflix's workload).
The VM page lock currently protects the wire_count field and the
reference on the page held by its containing object. The basic idea is to use per-page atomic reference counting
to remove the need for the page locks. This allows many code paths to
simply avoid the page lock and instead perform a single atomic operation
on a cache line within the vm_page. My approach imposes some overhead
on the page daemon, which now must acquire and drop a transient
reference to each page while scanning it, and handle some new race
conditions. However, I believe this is a reasonable tradeoff: in most
workloads, the page daemon is not the bottleneck, and profiling shows
that when it is, we would be best served by trying to reduce the
effect of cache misses incurred while traversing the queue itself.
Summary of changes:
The wire_count field is renamed to ref_count. wire_count is preserved
for now to avoid churn in the pmap code.
vm_page_wire() can now be used only for pages belonging to an object.
It asserts that the page is either busy, or the vm object lock is held.
vm_page_wire_mapped() is for use by pmap_extract_and_hold(). It may
fail if a thread is concurrently executing pmap_remove_all() or
pmap_remove_write().
vm_page_try_remove_all() and vm_page_try_remove_write() atomically check
for wirings of a page and block new wirings via vm_page_wire_mapped()
before calling pmap_remove_all() or pmap_remove_write(). They fail if a
wiring of the page exists.
vm_page_unwire() no longer returns a value and no longer accepts PQ_NONE
as the queue parameter. It frees the page if the wiring was the last
reference. vm_page_unwire_noq() returns true if it released the last
wiring (not necessarily the last reference) to the page.
sendfile_free_page() and vfs_vmio_unwire() are replaced with a generic
vm_page_release(), which releases the page to the page cache.
vm_page_try_to_free() is removed.
vm_page_remove() and vm_page_free_prep() no longer assert the page lock.
The object lock is required to remove a page from its object. We still
dequeue the page in vm_page_free_prep(), but we leverage the fact that
no other threads hold a reference in order to update queue state without
the page lock. vm_page_rename() still internally uses the page lock to
atomically move a page between objects. In other words, only the object
lock is required to transition m->object between an object and NULL, and
the page lock is required to transition m->object between two objects.
Page queue scans now require a transient reference to ensure that the
page daemon does not update the page's logical queue state (i.e.,
aflags) while a concurrent vm_page_free_prep() is dequeuing the page.
The transient reference is dropped once the object lock is acquired.