Page MenuHomeFreeBSD

Support atomic updates of the "aflags" and "queue" fields.
AcceptedPublic

Authored by markj on Wed, Aug 14, 12:03 AM.

Details

Reviewers
alc
dougm
jeff
kib
Summary

These two fields encode the "queue state" of a page. The page lock is
required to set PGA_{DEQUEUE,REQUEUE,REQUEUE_HEAD}, and the page queue
lock corresponding to the "queue" field's value is required to clear
them. The page queue lock is also required when modifying the "queue"
field or setting and clearing PGA_ENQUEUED.

My goal is to support freeing pages without the page lock. However,
vm_page_free_prep() must set PGA_DEQUEUE in
vm_page_dequeue_deferred_free(). Since the page is being freed, the
scope of possible races is limited to the page daemon scanning the page.

Introduce a new function, vm_page_aflag_queue_cmpset(), which atomically
updates both fields containing queue state. It relies on "aflags" and
"queue" being in the same self-aligned 32-bit word. This primitive is
used by the page daemon in vm_page_swapqueue() and
vm_page_dequeue_deferred(). It allows the page daemon to avoid
modifying queue state after vm_page_dequeue_deferred_free() has already
set PGA_DEQUEUE. Once the page daemon has acquire the object lock, it
has a stable reference to the page and no longer needs to handle races, so the
only modifications to the page queue scans occur in places where the object
lock is not held. vm_pageout_reinsert_inactive_page() does not require
modification since it takes the page queue lock and is thus serialized with
the vm_page_dequeue() calls in the page allocator functions.

Test Plan

I set pageout_update_period=1, to force the page daemon to
frequently deactivate pages in PQ_ACTIVE, and used the TSC to
measure vm_page_swapqueue() relative to vm_page_deactivate().
I don't see a significant difference; both functions average out to
about 250 cycles on a Xeon E5-2630v3.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 25877
Build 24446: arc lint + arc unit

Event Timeline

markj created this revision.Wed, Aug 14, 12:03 AM
markj edited the summary of this revision. (Show Details)Wed, Aug 14, 12:08 AM
markj edited the test plan for this revision. (Show Details)
markj added reviewers: alc, dougm, jeff, kib.
markj added inline comments.Wed, Aug 14, 12:12 AM
sys/vm/vm_page.h
692

I removed these assertions since they are effectively the same as the first CTASSERT above.

kib added inline comments.Wed, Aug 14, 4:27 PM
sys/vm/vm_page.c
3409

I like this KPI.

sys/vm/vm_page.h
702

... atomically in same operation.

704

Use _Static_assert() in new code.

707

Instead of these two asserts, I would state that offsetof() of the field after the queue field is not larger than offsetof aflags + sizeof(uint32_t).

765

I would pre-compute all shifts of the arguments outside the loop.

markj marked 3 inline comments as done.Thu, Aug 15, 2:23 AM
markj updated this revision to Diff 60812.

Address feedback.

markj added inline comments.Thu, Aug 15, 2:23 AM
sys/vm/vm_page.h
704

I converted the one above too.

707

That doesn't provide the same guarantee, though. I want to assert that the shifts and mask below are correct.

kib accepted this revision.Thu, Aug 15, 3:45 PM
kib added inline comments.
sys/vm/vm_page.h
707

This is not quite obvious and IMO asserts do not verify that simply because they do not use the _SHIFT symbols. I am not sure how to do this, in fact.

Might be, defining a union with one member uint32, and other a structure of bitfields for queue, oflags, and aflags would do it. Then you might remove the _SHIFT defines at all. But it might be not worth the efforts.

This revision is now accepted and ready to land.Thu, Aug 15, 3:45 PM
markj added inline comments.Thu, Aug 15, 5:19 PM
sys/vm/vm_page.h
707

It might be that we should dynamically compute the offsets like vm_page_clear_dirty_mask() does, since it should be possible to do this computation at compile time.

I like the idea of making a union. My plan is to take this patch further and entirely remove the need for the page lock for performing queue state updates. Instead, we would use atomic cmpset. To this end, I propose widening "aflags" and shrinking "flags": "aflags" has only one free bit left (and Jeff wants to use it for something), while most bits in "flags" are unused. Then I would write:

struct vm_page {
    ...
    union {
        uint32_t pqstate;
        struct pqstate {
            uint16_t aflags;
            uint8_t queue;
            uint8_t act_count;
        };
    };
    ...
};

and do as you suggest. However, I would like to do that in a separate diff, since this one is a minimal prerequisite for simplifying D20486.

kib added inline comments.Thu, Aug 15, 7:37 PM
sys/vm/vm_page.h
707

Yes, this is what I though. Of course it requires some defines to keep aflags/queue usage in sources intact.