
vm_pageout: Scan inactive dirty pages less aggressively
Needs Review · Public

Authored by markj on Mon, Jan 6, 7:16 PM.

Details

Reviewers
alc
kib
Summary

Consider a database workload where the bulk of RAM is used for a
fixed-size file-backed cache. Any leftover pages are used for
filesystem caching or anonymous memory. In particular, there is little
memory pressure and the inactive queue is scanned rarely.

Once in a while, the free page count dips a bit below the setpoint,
triggering an inactive queue scan. Since almost all of the memory there
is used by the database cache, the scan encounters only referenced
and/or dirty pages, moving them to the active and laundry queues. In
particular, it ends up completely depleting the inactive queue, even for
a small, non-urgent free page shortage.
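
To make the failure mode concrete, here is a toy model of the scan loop as
described above (all types and helpers are invented for the example; this is
not the FreeBSD code):

```c
/*
 * Toy model of the scan behaviour described above, not the FreeBSD
 * implementation.  All types and helpers here are invented for the example.
 */
#include <stdbool.h>
#include <stddef.h>

struct page {
	struct page	*next;
	bool		 referenced;
	bool		 dirty;
};

static int
inactive_scan(struct page **inactq, int shortage)
{
	struct page *m;
	int freed = 0;

	/*
	 * The scan stops only when the shortage is met or the queue is
	 * empty.  If the queue holds nothing but referenced and/or dirty
	 * pages, nothing is ever freed, so the whole queue is drained in a
	 * single pass, reactivating or laundering every page visited.
	 */
	while (freed < shortage && (m = *inactq) != NULL) {
		*inactq = m->next;
		if (m->referenced) {
			/* Requeue to the active queue (not shown). */
		} else if (m->dirty) {
			/* Requeue to the laundry queue (not shown). */
		} else {
			/* Clean and unreferenced: reclaim it. */
			freed++;
		}
	}
	return (freed);
}
```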

This scan might process many gigabytes worth of pages in one go,
triggering VM object lock contention (on the DB cache file's VM object)
and consuming CPU, which can cause application latency spikes.

Having observed this behaviour, my conclusion is that we should abort
scanning once we've encountered many dirty pages without meeting the
shortage. In general we've tried to make the page daemon control loops
avoid large bursts of work, and if a scan fails to turn up clean pages,
there's not much use in moving everything to the laundry queue at once.

Modify the inactive scan to abort early if we encounter enough dirty
pages without meeting the shortage. If the shortage hasn't been met,
this will trigger shortfall laundering, wherein the laundry thread
will clean as many pages as needed to meet the instantaneous shortfall.
Laundered pages will be placed near the head of the inactive queue, so
will be immediately visible to the page daemon during its next scan of
the inactive queue.
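
As a rough sketch of the shape of this check (the weight name and the exact
comparison are assumptions drawn from the description, not the actual patch):

```c
/*
 * Rough sketch of the early-abort check only.  The weight name and the
 * exact comparison are assumptions based on the description above, not the
 * actual patch.
 */
#include <stdbool.h>

static bool
abort_inactive_scan(long dirty_seen, long remaining_shortage,
    long inact_weight)
{
	/* A weight of 0 disables the early abort, i.e. the old behaviour. */
	if (inact_weight == 0)
		return (false);
	/*
	 * Once the number of dirty pages pushed toward the laundry reaches
	 * some multiple of the remaining shortage, stop the scan; shortfall
	 * laundering will clean just enough pages to cover the deficit.
	 */
	return (dirty_seen >= remaining_shortage * inact_weight);
}
```

With a check of this shape, a larger weight tolerates more dirty pages per
scan, which lines up with the later notes about lowering the weight from 2
to 1 and clamping it in the sysctl handler.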

Since this causes pages to move to the laundry queue more slowly, allow
clustering with inactive pages. I can't see much downside to this in
any case.
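
For illustration, a toy eligibility check capturing the intent (the queue
constants and the allow_inactive knob are invented; this is not the
vm_pageout_cluster() code):

```c
/*
 * Toy eligibility check showing the intent of the clustering change.  The
 * queue constants and the allow_inactive knob are invented for the example;
 * this is not the vm_pageout_cluster() code.
 */
#include <stdbool.h>

enum queue { Q_NONE, Q_ACTIVE, Q_INACTIVE, Q_LAUNDRY };

static bool
cluster_eligible(enum queue q, bool dirty, bool busy, bool allow_inactive)
{
	if (!dirty || busy)
		return (false);
	/*
	 * If only laundry pages may be pulled into the write cluster, and
	 * pages now reach the laundry queue more slowly, clusters shrink;
	 * letting neighbouring dirty inactive pages ride along avoids that.
	 */
	return (q == Q_LAUNDRY || (allow_inactive && q == Q_INACTIVE));
}
```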

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 61563
Build 58447: arc lint + arc unit

Event Timeline

markj requested review of this revision. Mon, Jan 6, 7:16 PM

This scan might process many gigabytes worth of pages in one go,
triggering VM object lock contention (on the DB cache file's VM object)
and consuming CPU, which can cause application latency spikes.

I meant to note that this is exacerbated by the page daemon being multithreaded on high-core-count systems: in this case we had 5 threads all processing the inactive queue over several seconds.

As a side note, I think the pages-per-second (PPS) calculation in vm_pageout_inactive_dispatch() also doesn't work well in this scenario: it counts the number of pages freed, not the number of pages scanned, so a queue full of dirty and/or referenced pages will produce a low PPS score, which makes it more likely that we'll dispatch multiple threads during a shortage.
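
For comparison, a toy version of such a metric (not the code in
vm_pageout_inactive_dispatch()):

```c
/*
 * Toy version of a pages-per-second metric, for illustration only; this is
 * not the code in vm_pageout_inactive_dispatch().
 */
static unsigned long
pages_per_second(unsigned long pages, unsigned long elapsed_seconds)
{
	return (elapsed_seconds > 0 ? pages / elapsed_seconds : pages);
}
```

Feeding it pages freed versus pages scanned for the same hypothetical
one-second scan, say 500 freed out of 100,000 scanned, gives scores of 500
versus 100,000; the freed-based number looks like a stalled page daemon even
though the scan itself is running flat out.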

Permit the inactive weight to have a value of 0, which effectively
restores the old behaviour.

Clamp the weights in the sysctl handler to make a multiplication overflow
less likely.
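
A minimal sketch of what a clamping handler could look like, assuming the
standard sysctl_handle_int() pattern (the name and the bound are
placeholders, not the actual patch):

```c
/*
 * Minimal sketch of a clamping sysctl handler, assuming the standard
 * sysctl_handle_int() pattern.  The function name and the clamp bound are
 * placeholders; this is not the actual patch.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

static int
sysctl_pageout_weight(SYSCTL_HANDLER_ARGS)
{
	u_int val;
	int error;

	val = *(u_int *)arg1;
	error = sysctl_handle_int(oidp, &val, 0, req);
	if (error != 0 || req->newptr == NULL)
		return (error);
	/*
	 * Bound the weight so that a later (weight * shortage) product,
	 * like the one in the abort check, cannot easily overflow.
	 */
	if (val > 65536)
		val = 65536;
	*(u_int *)arg1 = val;
	return (0);
}
```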

Set the inactive weight to 1 instead of 2. In my testing, we are still moving
pages to the laundry quite aggressively (see below), so we don't need the extra
multiplier.

Avoid incrementing oom_seq if there's no instantaneous shortage. Otherwise
it's possible to get spurious OOM kills after an acute page shortage: after the
shortage is resolved, the PID controller will still have positive output for a
period of time and thus will scan the queue. If the inactive queue is full of
dirty pages, the OOM controller will infer that the page daemon is failing to
make progress, but if the shortage has already been resolved, this is wrong.
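
One plausible shape for this guard, sketched with invented names (not the
actual patch):

```c
/*
 * One plausible shape for the oom_seq guard, with invented names; not the
 * actual patch.
 */
struct pd_state {
	int	oom_seq;	/* consecutive scans with no progress */
};

static void
record_scan_result(struct pd_state *st, int pages_freed,
    int instantaneous_shortage)
{
	/*
	 * Only treat a scan as evidence that the page daemon is stuck if it
	 * freed nothing while a real free-page shortage still exists.  A
	 * scan driven purely by residual PID-controller output, after the
	 * shortage has already been resolved, should not push the system
	 * toward an OOM kill.
	 */
	if (pages_freed == 0 && instantaneous_shortage > 0)
		st->oom_seq++;
	else
		st->oom_seq = 0;
}
```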

This problem is not new but is easier to trigger now that we move pages to the
laundry less aggressively.