Page MenuHomeFreeBSD

Add support for multithreading the inactive queue pageout within a domain.
ClosedPublic

Authored by cem on Sep 12 2019, 11:22 PM.
Tags
None
Referenced Files
F103407909: D21629.id75466.diff
Sun, Nov 24, 3:14 PM
Unknown Object (File)
Wed, Oct 30, 3:30 PM
Unknown Object (File)
Tue, Oct 29, 6:29 AM
Unknown Object (File)
Tue, Oct 29, 6:29 AM
Unknown Object (File)
Tue, Oct 29, 6:29 AM
Unknown Object (File)
Tue, Oct 29, 6:29 AM
Unknown Object (File)
Tue, Oct 29, 6:29 AM
Unknown Object (File)
Tue, Oct 29, 3:43 AM

Details

Summary

In very high throughput workloads the inactive scan can become overwhelmed as you have many cores producing pages and a single core freeing. Since Mark's introduction of batched pagequeue operations we can now run multiple inactive threads working on independent batches.

To avoid confusing the pid and other control algorithms I do this in a mpi-like fan out and collect model that is driven from the primary page daemon. It decides whether the shortfall can be overcome with a single thread and if not dispatches multiple threads and waits for their results.

The heuristic is based on timing the pageout activity and averaging a pages-per-second variable which is exponentially decayed. This is visible in sysctl and may be interesting for other purposes.

I have verified that this does indeed double our paging throughput when used with two threads. With four we tend to run into other contention problems. For now I would like to commit this infrastructure with only a single thread enabled.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jeff added reviewers: alc, kib, markj, dougm.
sys/vm/vm_pageout.c
1675 ↗(On Diff #62014)

You had mentioned using the PID controller's integral term to determine whether the page daemon had fallen behind, in particular checking whether it had reached its bound. Does that heuristic not end up working well?

2177 ↗(On Diff #62014)

style(9) prefers "for (;;)". At least, we should just use the C99 bool literals instead.

2182 ↗(On Diff #62014)

Why not just use refcount_release()?

2328 ↗(On Diff #62014)

Maybe "dom%d helper"?

sys/vm/vm_pageout.c
1675 ↗(On Diff #62014)

This gives you a faster response. You don't have to wait for the error to accumulate. I think this is simpler.

2182 ↗(On Diff #62014)

If you have spurious wakeups you would drop the ref below zero. This makes sure only as many threads are running as requested. This could be done with a semaphore but our implementation is ugly.

sys/vm/vm_pageout.c
2182 ↗(On Diff #62014)

Just to reiterate a discussion on slack:

  • I don't think spurious wakeups are something we need to handle when the sleep is protected by a lock, as it is in this case. The sleepqueue code only wakes up threads sleeping on the wchan.
  • All accesses to vmd_inactive_starting are protected by vmd_inactive_starting, so we do not even need to use refcount_* to manipulate it.

We are running into pagedaemon bottlenecking on a high-CPU count system under load; most of the time seems to be spent in inactive scanning. Are you still working on this patch, Jeff?

cem added a reviewer: jeff.

I'll go ahead and take this one, until/unless I hear otherwise. I've rebased the patch onto recent git master, adapted to the blockcount(9) API. I left default PAGEOUT_THREADS 1, but for testing purposes have options PAGEOUT_THREADS=4 in my amd64/conf/GENERIC.

cem marked 4 inline comments as done.
  • Rebase on git master
  • Use blockcount(9) API rather than refcount for started/running thread counts
  • Minimal boot testing: unloaded virtual machine boots kernel ok. haven't tried under load yet.
cem edited subscribers, added: pho; removed: cem.Aug 5 2020, 11:44 PM

FWIW, since yesterday we've run this under some Isilon workloads and it does seem to (1) function correctly and (2) alleviate load on the pagedaemon thread(s) (which were previously a bottleneck on some workloads, running at 100% CPU constantly).

In D21629#576019, @cem wrote:

FWIW, since yesterday we've run this under some Isilon workloads and it does seem to (1) function correctly and (2) alleviate load on the pagedaemon thread(s) (which were previously a bottleneck on some workloads, running at 100% CPU constantly).

I have been testing this for ~9 hours. No problems seen.

Does anyone object to this going in, or would have some time to review this if we waited a few more days?

With PAGEOUT_THREADS 1 as the default, there is little functional change. There are a handful of additional uncontested atomics and vm_pageout_inactive_dispatch wraps vm_pageout_scan_inactive, but it's largely the same code.

It might be nice to make this a tuneable instead of an option; there's no real reason the number cannot be tuned once at boot time. It could even be runtime-adjustable, although that is much more complicated.

In D21629#577169, @cem wrote:

Does anyone object to this going in, or would have some time to review this if we waited a few more days?

With PAGEOUT_THREADS 1 as the default, there is little functional change. There are a handful of additional uncontested atomics and vm_pageout_inactive_dispatch wraps vm_pageout_scan_inactive, but it's largely the same code.

It might be nice to make this a tuneable instead of an option; there's no real reason the number cannot be tuned once at boot time. It could even be runtime-adjustable, although that is much more complicated.

I have no objection. I agree that it should be a tuneable; I thought I had suggested that at some point, apparently not.

sys/vm/vm_param.h
137 ↗(On Diff #75466)

I would add a comment indicating that it's really PAGEOUT_THREADS_PER_DOMAIN, or rename the constant accordingly. Or just make it a tunable and don't add a new kernel config option at all. Tunables can be set from the kernel config now, I believe.

cem marked an inline comment as done.

Drop PAGEOUT_THREADS conf option and substitute vm.pageout_threads_per_domain tunable.

$ ps auxwwwwHS | grep dom0
root  18   0.0  0.0      0   64  -  DL   13:33        0:00.09 [pagedaemon/dom0]
root  18   0.0  0.0      0   64  -  DL   13:33        0:00.00 [pagedaemon/dom0 hel]

$ sysctl vm.pageout_threads_per_domain
vm.pageout_threads_per_domain: 2
This revision was not accepted when it landed; it landed in state Needs Review.Aug 11 2020, 8:38 PM
This revision was automatically updated to reflect the committed changes.