This KASSERT is overzealous because of the following race condition:
Closed · Public

Authored by mlaier on Feb 10 2021, 1:02 AM.

Details

Reviewers

markj
bdrewery
rlibby

Commits

rG14b5a3c7d5c0: vm pqbatch: move unmanaged page assert under pagequeue lock

Summary

 1) A managed page which is currently in PQ_LAUNDRY is freed.
    vm_page_free_prep calls vm_page_dequeue_deferred()

    The page state is:
       PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED

 2) The laundry worker comes around, picks up the page, and calls
    vm_pageout_defer(m, PQ_LAUNDRY, true) to check if page is still in the
    queue.  We do a vm_page_astate_load and get
       PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED
    as per above.

 3) The laundry worker is pre-empted and another thread allocates our page
    from the free pool.  For example vm_page_alloc_domain_after calls
    vm_page_dequeue() and sets VPO_UNMANAGED because we are allocating for
    an OBJT_UNMANAGED object.

    The page state is:
       PQ_NONE, 0 - VPO_UNMANAGED

 4) The laundry worker resumes, and processes vm_pageout_defer based on the
    stale astate which leads to a call to vm_page_pqbatch_submit, which will
    trip on the KASSERT.
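
The four steps above can be replayed deterministically in a small userland model. This is only a sketch: the struct layout, the constant values, and the `model_*` helpers are assumptions made for illustration and are not the kernel's definitions. The decision in step 4 is a simplified reading of what the summary says `vm_pageout_defer` does with the stale astate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel constants; values are assumed. */
enum { PQ_LAUNDRY = 2, PQ_NONE = 255 };
enum { PGA_ENQUEUED = 0x1, PGA_DEQUEUE = 0x2 };
enum { VPO_UNMANAGED = 0x4 };

/* Hypothetical, simplified page: atomic-state fields plus object flags. */
struct astate { uint8_t queue; uint8_t flags; };
struct page { struct astate a; uint8_t oflags; };

/* Step 1: freeing a queued page only requests a deferred dequeue. */
static void model_free(struct page *m)
{
	m->a.flags |= PGA_DEQUEUE;
}

/* Step 3: reallocation dequeues the page and marks it unmanaged. */
static void model_alloc_unmanaged(struct page *m)
{
	m->a.queue = PQ_NONE;
	m->a.flags = 0;
	m->oflags |= VPO_UNMANAGED;
}

/*
 * Replays steps 1-4 in order and reports whether an unlocked
 * "page must be managed" check at submit time would fire.
 */
static bool unlocked_assert_would_fire(void)
{
	struct page m = { { PQ_LAUNDRY, PGA_ENQUEUED }, 0 };

	model_free(&m);                   /* step 1 */
	struct astate stale = m.a;        /* step 2: snapshot taken */
	model_alloc_unmanaged(&m);        /* step 3: page changes underneath */

	/* Step 4: the submit decision uses the stale snapshot only. */
	bool submits = stale.queue == PQ_LAUNDRY &&
	    (stale.flags & PGA_DEQUEUE) != 0;

	/* The check looks at the live page, which is now unmanaged. */
	return submits && (m.oflags & VPO_UNMANAGED) != 0;
}
```

In this model the submit decision and the managed check look at two different moments in the page's life, which is why the check cannot hold without a lock.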

Sponsored by:	Dell EMC Isilon

Diff Detail

Repository

rG FreeBSD src repository

Event Timeline

mlaier created this revision. · Feb 10 2021, 1:02 AM

Herald added a subscriber: imp. · Feb 10 2021, 1:02 AM

Harbormaster completed remote builds in B36844: Diff 83615. · Feb 10 2021, 1:02 AM

mlaier requested review of this revision. · Feb 10 2021, 1:02 AM

mlaier edited the summary of this revision. · Feb 10 2021, 1:04 AM

mlaier added reviewers: markj, bdrewery, rlibby.

Forgot to add:

There is no negative fallout from this. When we eventually get into vm_pqbatch_process_page, we re-establish the astate and atomically process any outstanding operations, if they still apply. In the case above, this ends up being a no-op.
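
To illustrate the "re-establish the astate and process atomically" pattern, here is a minimal userland sketch using C11 atomics. The packed 32-bit state, the `PACK`/`QUEUE_OF`/`FLAGS_OF` macros, the constant values, and `process_page` are all assumptions made for this example; the real `vm_pqbatch_process_page` handles more operations than a dequeue.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; values are assumed, not taken from the kernel. */
enum { PQ_LAUNDRY = 2, PQ_NONE = 255 };
enum { PGA_ENQUEUED = 0x1, PGA_DEQUEUE = 0x2 };

/* Hypothetical packing of queue index and flags into one atomic word. */
#define PACK(queue, flags) (((uint32_t)(queue) << 8) | (uint32_t)(flags))
#define QUEUE_OF(s) ((uint8_t)((s) >> 8))
#define FLAGS_OF(s) ((uint8_t)((s) & 0xff))

/*
 * Re-load the current state and apply a pending dequeue only if the
 * page is still on the expected queue. Returns true if work was done,
 * false if the request turned out to be stale (a no-op).
 */
static bool process_page(_Atomic uint32_t *astate, uint8_t queue)
{
	uint32_t old = atomic_load(astate);

	for (;;) {
		if (QUEUE_OF(old) != queue)
			return false;   /* page moved on: nothing to do */
		if ((FLAGS_OF(old) & PGA_DEQUEUE) == 0)
			return false;   /* no outstanding operation */

		uint32_t next = PACK(PQ_NONE, 0);
		if (atomic_compare_exchange_weak(astate, &old, next))
			return true;
		/* CAS failure refreshed 'old'; re-check and retry. */
	}
}
```

With the state from step 3 of the summary (`PQ_NONE`, no flags), the first check fails and the call returns without touching the page, which matches the no-op described above.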

Thanks. This assertion is left over from when there was tighter synchronization. I guess there are not so many scenarios where we frequently allocate unmanaged pages from VM_FREEPOOL_DEFAULT, so this went unnoticed in stress testing.

This revision is now accepted and ready to land. · Feb 10 2021, 1:36 AM

rlibby accepted this revision. · Feb 10 2021, 1:59 AM

rlibby added inline comments.

sys/vm/vm_page.c
3547–3550	I think the assert would be valid here in `vm_pqbatch_process_page()`? Since here we know that we are (or were just) on a page queue, and we hold the pagequeue lock, and at least in `vm_page_alloc_domain_after()` we do the `vm_page_dequeue()` first and the assignment of `oflags` after that. (Incidentally I think this assert here is also a little messed up, although harmless. The first condition is always true because we just established above that `old.queue == queue`, and we asserted at the top that `queue < PQ_COUNT` and we could `_Static_assert(PQ_COUNT < PQ_NONE, ...)`.)

markj added inline comments. · Feb 10 2021, 6:19 PM

sys/vm/vm_page.c
3547–3550	I agree with both of these statements. In other words, we should remove this assertion, because it is always true, and instead assert that the page is managed: at this point the page queue lock ensures that the page will not be allocated as an unmanaged page.
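
A rough sketch of the shape both inline comments point toward: a compile-time check that `PQ_NONE` cannot alias a real queue index, and a managed-page assertion made only once the page is known to be on the queue with the lock held. The enum values, the `locked` flag standing in for the pagequeue lock, and `process_locked` are assumptions for illustration, not the committed change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative queue indices; values are assumed for this sketch. */
enum { PQ_INACTIVE, PQ_ACTIVE, PQ_LAUNDRY, PQ_UNSWAPPABLE, PQ_COUNT };
enum { PQ_NONE = 255 };
enum { VPO_UNMANAGED = 0x4 };

/* With this in place, "queue < PQ_COUNT" already implies "!= PQ_NONE". */
_Static_assert(PQ_COUNT < PQ_NONE, "PQ_NONE must not alias a queue index");

/* Hypothetical stand-ins: a flag models the pagequeue lock. */
struct pagequeue { bool locked; };
struct page { uint8_t queue; uint8_t oflags; };

/* Returns true if the page was still on 'queue' and was dequeued. */
static bool process_locked(struct pagequeue *pq, struct page *m,
    uint8_t queue)
{
	bool processed = false;

	assert(queue < PQ_COUNT);
	pq->locked = true;              /* models taking the queue lock */
	if (m->queue == queue) {
		/*
		 * On the queue with the lock held: in this model the page
		 * cannot be concurrently reallocated as unmanaged.
		 */
		assert((m->oflags & VPO_UNMANAGED) == 0);
		m->queue = PQ_NONE;
		processed = true;
	}
	pq->locked = false;             /* models dropping the lock */
	return processed;
}
```

A page that was already reallocated as unmanaged has `queue == PQ_NONE`, so it never reaches the assertion and the call is a no-op.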