PID Controlled page daemon
Closed, Public

Authored by jeff on Feb 16 2018, 8:49 PM.

Details

Reviewers

alc
markj
kib
scottl
gallatin
imp

Commits

rS329882: Add a generic Proportional Integral Derivative (PID) controller algorithm and
rS329616: PID Controlled page daemon

Summary

This is a rewrite based on work done by Max Laier and myself at Isilon, which was itself based on another PID controller I used in the I/O stack. I'm not sure what to do about the copyright. There was no copyright on the original work because it was embedded in the vm_pageout.c file. The algorithm clearly has no protected IP, as it has been in the literature for generations and was originally inspired by ship steering.

This patch smooths out the page daemon's work and avoids frequent sleep/wakeup cycles by estimating demand, and changes in demand, to adjust the number of pages processed. There is an enormous amount of literature available on PID controllers. I have optimized this one for daemon regulation by preventing negative values and negative integral windup, and by using simple math in discrete time steps. The default gains were tuned against a wide variety of page consumption rates. I might like a slightly faster ramp-up, but that comes at the cost of stability.
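To make the properties described above concrete, here is a minimal sketch of one discrete-time PID step in C. It is not the code under review: the struct, its field names, and the gain scale (thousandths, so 1000 means 1.0) are assumptions chosen for illustration. The sketch clamps the integral to [0, bound] so that a surplus cannot wind it up negatively, and it never returns a negative amount of work.

```c
#include <stdint.h>

/* Hypothetical controller state; all names are illustrative only. */
struct pid_sketch {
	int setpoint;	/* desired value of the input (e.g. free pages) */
	int bound;	/* upper clamp on the integral term */
	int kp, ki, kd;	/* gains in thousandths: 1000 == 1.0 (assumed scale) */
	int integral;	/* accumulated error, kept within [0, bound] */
	int olderror;	/* error seen on the previous step */
};

static int
clamp_int(int v, int lo, int hi)
{
	return (v < lo ? lo : (v > hi ? hi : v));
}

/*
 * One discrete time step.  Returns the amount of work to perform,
 * which is never negative.
 */
static int
pid_sketch_step(struct pid_sketch *pc, int input)
{
	int error, derivative;
	int64_t out;

	/* Proportional term: how far below the setpoint we are. */
	error = pc->setpoint - input;

	/* Integral term: the lower clamp at 0 prevents negative windup. */
	pc->integral = clamp_int(pc->integral + error, 0, pc->bound);

	/* Derivative term: change in error since the previous step. */
	derivative = error - pc->olderror;
	pc->olderror = error;

	/* Integer-only math with 64-bit intermediates. */
	out = ((int64_t)pc->kp * error + (int64_t)pc->ki * pc->integral +
	    (int64_t)pc->kd * derivative) / 1000;

	return (out > 0 ? (int)out : 0);
}
```

A caller would invoke `pid_sketch_step()` once per sampling interval with the current measurement and treat the return value as the number of pages to process in that interval.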

I would appreciate feedback on the discussion in the comments. Is it sufficient for someone else to apply this PID controller to another daemon? Many of our threshold-based regulation systems (high/low water and wake point) produce really undesirable results: sawtooth output, long delays when consumers run out of a resource and pause waiting on the daemon during periods of high demand, and excessive reclamation of resources during periods of low demand. My goal with this PID controller would be to see it applied to eliminate these stalls and improve liveliness and regulation of the system at large. Some other candidates are laundry, buf daemon, bufspace daemon, vnlru, etc.
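For contrast, the threshold-based scheme criticized above can be sketched as a small hysteresis regulator. This is a hypothetical model, not code from any FreeBSD daemon: the names `watermark_sketch` and `watermark_step()` and the per-step `capacity` parameter are invented for illustration. The regulator does nothing until the low water mark is crossed, then runs at full capacity until the high water mark is reached, which is what produces the sawtooth.

```c
/* Hypothetical high/low water regulator; names are illustrative only. */
struct watermark_sketch {
	int low;	/* wake point: start reclaiming below this */
	int high;	/* target: stop reclaiming at or above this */
	int running;	/* nonzero while the daemon is reclaiming */
};

/*
 * Returns how many pages to reclaim in this step, given the current
 * free count and the most the daemon can reclaim per step.
 */
static int
watermark_step(struct watermark_sketch *w, int free_pages, int capacity)
{
	int want;

	if (free_pages < w->low)
		w->running = 1;		/* wake point crossed */
	else if (free_pages >= w->high)
		w->running = 0;		/* high water reached, sleep */
	if (!w->running)
		return (0);

	want = w->high - free_pages;
	return (want < capacity ? want : capacity);
}
```

Between the two marks the output depends only on whether the daemon happens to be awake, not on the rate of demand, so consumers see either no reclamation or maximum reclamation.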

Isilon was able to use tighter targets with lower thresholds, leaving fewer pages free under normal conditions. I have not made that change here, but I think it's worthwhile. On my system the standard target is 1.5 GB free.

On my parallel dd test I have seen improvements of as much as 50%, owing to less frequent blocking.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Event Timeline

jeff created this revision. Feb 16 2018, 8:49 PM

jeff added reviewers: kib, scottl, gallatin.

avg added a subscriber: avg. Feb 16 2018, 8:53 PM

nwhitehorn added a subscriber: nwhitehorn. Feb 16 2018, 8:56 PM

nwhitehorn added inline comments.

vm/vm_pageout.c
1201 ↗	(On Diff #39398)	Can this be in arch-specific code (including cpufunc.h or something)? Would be nice not to leak a bunch of architecture #ifdef in here.

cem added a subscriber: cem. Feb 16 2018, 8:56 PM

jeff added inline comments. Feb 16 2018, 8:57 PM

vm/vm_pageout.c
1201 ↗	(On Diff #39398)	It should indeed, and it is unrelated to the controller. I forgot to elide this from the diff.

gallatin added a reviewer: imp. Feb 16 2018, 8:57 PM

rpokala added a subscriber: rpokala. Feb 16 2018, 9:17 PM

rpokala added inline comments.

vm/vm_pageout.c
1963 ↗	(On Diff #39398)	Since the call to `pidctrl_run()` and the setting of `shortage` are identical on both sides, shouldn't they be hoisted out of the conditional?

jeff marked 2 inline comments as done. Feb 16 2018, 9:19 PM

jeff added inline comments.

vm/vm_pageout.c
1963 ↗	(On Diff #39398)	Yes, you are right. In other versions of this patch the duplication was necessary, but it no longer is.

imp added inline comments. Feb 17 2018, 12:42 AM

kern/subr_pidctrl.c
103 ↗	(On Diff #39398)	I is integral, not interval. It's the sum of the error over time, possibly bounded to some max. P is the same as error here. That's not immediately clear from the code.
114 ↗	(On Diff #39398)	It's reported as a sysctl, so it's not what the sysctl says, but the sum of the current incremental outputs over the tick interval. Also, there's a computational flaw here. Not sure if it matters. But as we're summing the error, the I term will be over-represented over time if we're called too much, and the derivative term will initially be under-represented. We'll still have numbers here, and this will still be a damped control loop, but I'm worried that since we're turning the handle multiple times per update interval, we'll be getting maybe too much output for good theoretical stability. Again, I'm not sure it matters, as these error terms aren't super big, and trying one's best to get under a limit with increasing urgency as the shortage grows is the important bit here.

kbowling added a subscriber: kbowling. Feb 17 2018, 3:27 AM

jeff added inline comments. Feb 17 2018, 8:10 PM

kern/subr_pidctrl.c
103 ↗	(On Diff #39398)	Thanks, that was just a typo. I put this comment here to make it clear that error = proportional. I will rewrite the line to make it more obvious. I would also appreciate it if you read the comment block in the header to see if it makes sense.
114 ↗	(On Diff #39398)	I've gone back and forth on this. It is intentional. The notion is that if you have done a pass of your daemon and you are still in a very low condition, you want to call again and get a better rise time, which is more important than stability in the short term. If we could run the controller at a much faster rate, say every 1ms, you probably wouldn't need this hack. But I think most would object to sampling that frequently, and 100ms is a long time to delay and potentially block threads just to prevent overshooting your target. The integral is only modified by the new error amount, so it should be reflected in the integral as if the whole error value came at the tick boundary. I realize this somewhat subverts the original intent of the algorithm. While it is a good fit for regulating work daemons, it is not perfect. I will experiment with it once more and see if I can get satisfactory behavior in the page daemon without it. It would be a cleaner, simpler algorithm that way.
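The difference between the two integral updates discussed in this thread can be sketched as follows. This is one possible reading of "only modified by the new error amount", not the code in the diff: `integ_sketch`, `integ_naive()`, and `integ_delta()` are hypothetical names, and the sketch ignores the output already handed to the daemon within the interval. The naive update adds the full error on every call; the delta update adds only the change in error since the previous call in the same interval, so repeated calls total what a single call at the tick boundary would have added.

```c
/* Hypothetical integral bookkeeping; names are illustrative only. */
struct integ_sketch {
	int interval;	/* ticks per sampling interval */
	int start;	/* tick at which the current interval began */
	int seen;	/* error already accounted for in this interval */
	int integral;	/* accumulated error, kept within [0, bound] */
	int bound;	/* upper clamp on the integral */
};

static int
clamp_int(int v, int lo, int hi)
{
	return (v < lo ? lo : (v > hi ? hi : v));
}

/* Naive: every call adds the whole error, however often it is called. */
static void
integ_naive(struct integ_sketch *s, int error)
{
	s->integral = clamp_int(s->integral + error, 0, s->bound);
}

/* Delta: within one interval, add only the error not yet accounted for. */
static void
integ_delta(struct integ_sketch *s, int now, int error)
{
	if (now - s->start >= s->interval) {
		/* A new interval begins: the full error counts once. */
		s->start = now;
		s->seen = 0;
	}
	s->integral = clamp_int(s->integral + (error - s->seen), 0, s->bound);
	s->seen = error;
}
```

With three calls in one interval at a constant error of 100, the naive version accumulates 300 while the delta version accumulates 100.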

jeff marked 2 inline comments as done. Feb 18 2018, 1:15 AM

jeff added inline comments.

kern/subr_pidctrl.c
114 ↗	(On Diff #39398)	OK, I tested this again. Removing it really slows things down when you transition from no load to max load. The net result of keeping it is that the controller becomes bi-modal around a saturation point, which is more or less the maximum throughput of the daemon. Below the saturation point it acts as a normal PID controller and will cause the daemon to wake up every 'interval' ticks and do a moderated amount of work. Above the saturation point, the code that accommodates multiple calls per tick kicks in and causes the output to rise aggressively up to the integral bound. This prevents stalls due to resource shortages. It requires the control loop to cooperate by calling the controller again if the resource is still too low, where "too low" is user defined. I think this is a good compromise.
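The cooperation described above, where the control loop calls the controller again while the resource is still too low, might look roughly like the sketch below. Everything here is hypothetical: `daemon_sketch`, `controller_stub()`, and `daemon_wakeup()` are invented names, and the stub simply requests the whole shortage rather than running a real PID step. It only illustrates the loop shape: one pass per wakeup below saturation, repeated passes within the same wakeup above it.

```c
/* Hypothetical daemon state; names are illustrative only. */
struct daemon_sketch {
	int free_pages;		/* current level of the resource */
	int min_free;		/* caller-defined "too low" threshold */
	int max_per_pass;	/* saturation: most one pass can reclaim */
};

/* Stand-in for the controller: asks for the whole shortage. */
static int
controller_stub(int target, int free_pages)
{
	int shortage = target - free_pages;

	return (shortage > 0 ? shortage : 0);
}

/* One wakeup of the daemon.  Returns the number of passes performed. */
static int
daemon_wakeup(struct daemon_sketch *d, int target)
{
	int passes = 0, want, done;

	do {
		want = controller_stub(target, d->free_pages);
		done = want < d->max_per_pass ? want : d->max_per_pass;
		d->free_pages += done;	/* pretend the pass reclaimed pages */
		passes++;
		/* Still too low?  Ask the controller again right away. */
	} while (d->free_pages < d->min_free && done > 0);

	return (passes);
}
```

When the shortage fits within one pass the loop runs once and the daemon sleeps until the next interval; when it does not, the loop keeps calling the controller instead of waiting out the interval.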