Page MenuHomeFreeBSD

Support atomic updates of the "aflags" and "queue" fields.

Authored by markj on Aug 14 2019, 12:03 AM.



These two fields encode the "queue state" of a page. The page lock is
required to set PGA_{DEQUEUE,REQUEUE,REQUEUE_HEAD}, and the page queue
lock corresponding to the "queue" field's value is required to clear
them. The page queue lock is also required when modifying the "queue"
field or setting and clearing PGA_ENQUEUED.

My goal is to support freeing pages without the page lock. However,
vm_page_free_prep() must set PGA_DEQUEUE in
vm_page_dequeue_deferred_free(). Since the page is being freed, the
scope of possible races is limited to the page daemon scanning the page.

Introduce a new function, vm_page_aflag_queue_cmpset(), which atomically
updates both fields containing queue state. It relies on "aflags" and
"queue" being in the same self-aligned 32-bit word. This primitive is
used by the page daemon in vm_page_swapqueue() and
vm_page_dequeue_deferred(). It allows the page daemon to avoid
modifying queue state after vm_page_dequeue_deferred_free() has already
set PGA_DEQUEUE. Once the page daemon has acquire the object lock, it
has a stable reference to the page and no longer needs to handle races, so the
only modifications to the page queue scans occur in places where the object
lock is not held. vm_pageout_reinsert_inactive_page() does not require
modification since it takes the page queue lock and is thus serialized with
the vm_page_dequeue() calls in the page allocator functions.

Test Plan

I set pageout_update_period=1, to force the page daemon to
frequently deactivate pages in PQ_ACTIVE, and used the TSC to
measure vm_page_swapqueue() relative to vm_page_deactivate().
I don't see a significant difference; both functions average out to
about 250 cycles on a Xeon E5-2630v3.

Diff Detail

rS FreeBSD src repository
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

markj created this revision.Aug 14 2019, 12:03 AM
markj edited the summary of this revision. (Show Details)Aug 14 2019, 12:08 AM
markj edited the test plan for this revision. (Show Details)
markj added reviewers: alc, dougm, jeff, kib.
markj added inline comments.Aug 14 2019, 12:12 AM
692 ↗(On Diff #60765)

I removed these assertions since they are effectively the same as the first CTASSERT above.

kib added inline comments.Aug 14 2019, 4:27 PM
3409 ↗(On Diff #60765)

I like this KPI.

674 ↗(On Diff #60765)

... atomically in same operation.

676 ↗(On Diff #60765)

Use _Static_assert() in new code.

679 ↗(On Diff #60765)

Instead of these two asserts, I would state that offsetof() of the field after the queue field is not larger than offsetof aflags + sizeof(uint32_t).

754 ↗(On Diff #60765)

I would pre-compute all shifts of the arguments outside the loop.

markj updated this revision to Diff 60812.Aug 15 2019, 2:23 AM
markj marked 3 inline comments as done.

Address feedback.

markj added inline comments.Aug 15 2019, 2:23 AM
676 ↗(On Diff #60765)

I converted the one above too.

679 ↗(On Diff #60765)

That doesn't provide the same guarantee, though. I want to assert that the shifts and mask below are correct.

kib accepted this revision.Aug 15 2019, 3:45 PM
kib added inline comments.
679 ↗(On Diff #60765)

This is not quite obvious and IMO asserts do not verify that simply because they do not use the _SHIFT symbols. I am not sure how to do this, in fact.

Might be, defining a union with one member uint32, and other a structure of bitfields for queue, oflags, and aflags would do it. Then you might remove the _SHIFT defines at all. But it might be not worth the efforts.

This revision is now accepted and ready to land.Aug 15 2019, 3:45 PM
markj added inline comments.Aug 15 2019, 5:19 PM
679 ↗(On Diff #60765)

It might be that we should dynamically compute the offsets like vm_page_clear_dirty_mask() does, since it should be possible to do this computation at compile time.

I like the idea of making a union. My plan is to take this patch further and entirely remove the need for the page lock for performing queue state updates. Instead, we would use atomic cmpset. To this end, I propose widening "aflags" and shrinking "flags": "aflags" has only one free bit left (and Jeff wants to use it for something), while most bits in "flags" are unused. Then I would write:

struct vm_page {
    union {
        uint32_t pqstate;
        struct pqstate {
            uint16_t aflags;
            uint8_t queue;
            uint8_t act_count;

and do as you suggest. However, I would like to do that in a separate diff, since this one is a minimal prerequisite for simplifying D20486.

kib added inline comments.Aug 15 2019, 7:37 PM
679 ↗(On Diff #60765)

Yes, this is what I though. Of course it requires some defines to keep aflags/queue usage in sources intact.

markj updated this revision to Diff 61333.Aug 27 2019, 4:03 AM

Rebase. Handle PGA_REQUEUE by simply calling vm_page_pqbatch_submit() to
create a batch queue entry. Remove code from vm_page_swapqueue() to handle
oldq == newq.

This revision now requires review to proceed.Aug 27 2019, 4:03 AM
kib accepted this revision.Aug 29 2019, 2:34 PM
This revision is now accepted and ready to land.Aug 29 2019, 2:34 PM