Page MenuHomeFreeBSD

Propigate v_free_target updates to pid controllers in each domain
Needs ReviewPublic

Authored by thj on Jun 20 2022, 9:31 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jan 5, 4:06 AM
Unknown Object (File)
Nov 21 2024, 6:44 AM
Unknown Object (File)
Oct 9 2024, 1:37 PM
Unknown Object (File)
Oct 7 2024, 3:42 AM
Unknown Object (File)
Oct 6 2024, 9:55 AM
Unknown Object (File)
Oct 3 2024, 9:32 AM
Unknown Object (File)
Oct 3 2024, 8:17 AM
Unknown Object (File)
Sep 27 2024, 9:43 PM
Subscribers

Details

Reviewers
markj
kib
jeff
khng
Group Reviewers
Klara
Summary

The vm.v_free_target sysctl represents the target number of free pages for the
vm system. It is calculated by adding together the setpoints for each memory
domain.

When written to it does not propagate the new targets back to the domain.
Modify the sysctl to be a proc and equally distribute the target across each of
the domains.

Without this fix we have a customer report of the sysctl not working, this
fixes their expected behaviour.

We are not sure if this is the correct approach, better might be to make this
sysctl read only and update the target when any pid controllers set point is
updated. This modification has the benefit of only exposing a single value for
an operator to fiddle with.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 46039
Build 42928: arc lint + arc unit

Event Timeline

thj requested review of this revision.Jun 20 2022, 9:31 AM

There are some other thresholds that are derived from the free target. zfs_arc_free_target, vmd_pageout_wakeup_thresh and vmd_background_launder_target, at least. It's not obvious to me that the system will behave sanely if those values are left unchanged. Though, today it's possible to modify the per-domain free targets, and the sysctl handler doesn't take any of this into account. So I don't object to this diff, but it would be good to know why your customer wants to set this sysctl in the first place.

sys/vm/vm_meter.c
125

Extra blank line here, and missing blank line after the variable declarations.

128

Why not just access vm_cnt.v_free_target directly?

130

This is assuming that all NUMA domains have the same number of pages, but that's not necessarily the case.

132

IMO it would be better to have a subroutine in subr_pidctrl.c instead of modifying fields directly like this.

There are some other thresholds that are derived from the free target. zfs_arc_free_target, vmd_pageout_wakeup_thresh and vmd_background_launder_target, at least. It's not obvious to me that the system will behave sanely if those values are left unchanged. Though, today it's possible to modify the per-domain free targets, and the sysctl handler doesn't take any of this into account.

Writing this I ended up with a big list of questions, but it seemed easier to start with a patch.

I asked @allanjude about zfs_arc_free_target and he thought that was okay to leave alone, operators modify that anyway. I must have missed the other too, I think those should be updated when we do this update. I think we are lacking the right mechanism to propigate updates to the system wide goal.

I am not sure that v_free_target should remain writable in this form. Right now we initialise the domain free target to 4*min+reserved, it feels to me that adjustments should be made in that manner rather than manually setting the number of pages.

I'd propose something like v_free_target factor, but this could also be calculated from an update to v_free_target. It isn't the best solution , it makes the interface to the vm system more complicated rather than less.

I think we should reduce the number of controls that feed into the free target, either by making v_free_target responsible for updating the domains and the consumers or by doing something else.
I think that one of either v_free_target or setpoint should be made read only, that would simplify this a bit.

So I don't object to this diff, but it would be good to know why your customer wants to set this sysctl in the first place.

The customer is hitting low memory when creating routes that led to the route creation to fail, this was triggered by doing large disk operations (the reproduction case was taking a checksum of a 4G disk image).
The core of their metric is a test is a memory pressure test that checks free pages > free target, if it isn't then they don't try to create routes.

They were modifying v_free_target to try and keep enough free memory around for the route creation to not fail, but that control not updating the domains setpoint meant that no more work was performed.

There are some other thresholds that are derived from the free target. zfs_arc_free_target, vmd_pageout_wakeup_thresh and vmd_background_launder_target, at least. It's not obvious to me that the system will behave sanely if those values are left unchanged. Though, today it's possible to modify the per-domain free targets, and the sysctl handler doesn't take any of this into account. So I don't object to this diff, but it would be good to know why your customer wants to set this sysctl in the first place.

In older versions of FreeBSD, before the introduction of the PID controller, this was the sysctl you tuned to adjust the target free memory value. Now it silently ignores any changes you make to it.
So my original proposal was just to restore what it used to do, by making it update the PID controller setpoints proportionally.

Tom's idea of a new sysctl that controls the 'factor' used to set the original PID setpoint at boot, would be workable as well. But I think both ways, Mark's point about values that are based off of v_free_target may also need to be adjusted.
Do we end up wanting like an eventhandler or something around these values being adjusted, so consumers can be notified when the setpoint, or v_free_target, or the new 'factor', or whatever are modified, and update themselves?

There are some other thresholds that are derived from the free target. zfs_arc_free_target, vmd_pageout_wakeup_thresh and vmd_background_launder_target, at least. It's not obvious to me that the system will behave sanely if those values are left unchanged. Though, today it's possible to modify the per-domain free targets, and the sysctl handler doesn't take any of this into account. So I don't object to this diff, but it would be good to know why your customer wants to set this sysctl in the first place.

In older versions of FreeBSD, before the introduction of the PID controller, this was the sysctl you tuned to adjust the target free memory value. Now it silently ignores any changes you make to it.
So my original proposal was just to restore what it used to do, by making it update the PID controller setpoints proportionally.

Tom's idea of a new sysctl that controls the 'factor' used to set the original PID setpoint at boot, would be workable as well. But I think both ways, Mark's point about values that are based off of v_free_target may also need to be adjusted.
Do we end up wanting like an eventhandler or something around these values being adjusted, so consumers can be notified when the setpoint, or v_free_target, or the new 'factor', or whatever are modified, and update themselves?

I think we would ideally not have anything trying to second-guess the VM system's memory reclamation policy. That is, rather than trying to propagate changes to VM thresholds to other subsystems, those subsystems should be querying the VM in some way. So absent some specific problem caused by the current lack of synchronization, I'd not add an eventhandler.

In D35531#805973, @thj wrote:

I asked @allanjude about zfs_arc_free_target and he thought that was okay to leave alone, operators modify that anyway. I must have missed the other too, I think those should be updated when we do this update. I think we are lacking the right mechanism to propigate updates to the system wide goal.

I am not sure that v_free_target should remain writable in this form. Right now we initialise the domain free target to 4*min+reserved, it feels to me that adjustments should be made in that manner rather than manually setting the number of pages.

I'd propose something like v_free_target factor, but this could also be calculated from an update to v_free_target. It isn't the best solution , it makes the interface to the vm system more complicated rather than less.

I think we should reduce the number of controls that feed into the free target, either by making v_free_target responsible for updating the domains and the consumers or by doing something else.
I think that one of either v_free_target or setpoint should be made read only, that would simplify this a bit.

So I don't object to this diff, but it would be good to know why your customer wants to set this sysctl in the first place.

The customer is hitting low memory when creating routes that led to the route creation to fail, this was triggered by doing large disk operations (the reproduction case was taking a checksum of a 4G disk image).
The core of their metric is a test is a memory pressure test that checks free pages > free target, if it isn't then they don't try to create routes.

BTW, this heuristic might not work properly on NUMA systems, with or without the patch. UMA currently implements a "strict first-touch" slab allocation policy, which means that it will indicate failure if the current NUMA domain doesn't have sufficient free pages, and that situation can arise even when the global number of free pages is greater than the global free target. I'm assuming here that route creation involves memory allocations from UMA.