
hwpstate: Add CPPC enable tunable
Closed, Public

Authored by cy on Tue, Jan 20, 7:11 PM.

Details

Summary

The Framework 13 runs very hot when the maximum frequency is available. By
disabling CPPC (reverting to Cool'n'Quiet 2.0) we can use powerd to
limit the CPU frequency to 2200 MHz, thereby reducing the CPU temperature.
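
The powerd cap described above could be written as an rc.conf fragment along these lines. This is only a sketch: it assumes the limit is applied with powerd's -M option (maximum frequency, see powerd(8)); the author's actual powerd settings are not shown in this revision.

```sh
# /etc/rc.conf fragment. rc.conf is sourced by sh, so these are plain assignments.
# Assumption: the 2200 MHz cap from the summary is applied through powerd's -M flag.
powerd_enable="YES"
powerd_flags="-M 2200"   # -M: highest frequency (MHz) powerd may select
```

With CPPC enabled the thread suggests powerd cannot be used this way, which is why the cap depends on the new tunable being set to disable CPPC.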

Some systems may run slower with CPPC enabled. See PR/292615 for that
issue.

PR: 292615

Test Plan

Running here since CPPC was committed.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

cy requested review of this revision. Tue, Jan 20, 7:11 PM
cy edited the test plan for this revision.

The question I have is, do we want CPPC to be enabled by default or disabled by default? The patch leaves it enabled by default.

I can confirm that turning this off also "fixes" my problem with a T14 4750 CPU thinkpad, which I describe in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=292615

However, for my situation the original commit 3e6e4e4a0d42fa24f3b2a1c087e9ad25f9594081 is clearly broken and needs to be fixed; otherwise we presumably slow down all AMD laptop users to a crawl until they learn about the tunable.

Mine never slowed down. It simply ran hot at between 98C and 100C.

In D54803#1252662, @cy wrote:

Mine never slowed down. It simply ran hot at between 98C and 100C.

You used powerd to control frequency, I did not. That might be the difference.

Before the patch and without powerd the machine ran even faster, at 3300 MHz, pegging the temperature up to 100C during buildworld. This is why I used powerd to artificially limit the clock on that machine.

I'm fine with the patch as it is. It is going to be useful in cases where the hardware doesn't do self-tuning properly.

But that shouldn't be the end of the story.

At the moment, I'm not convinced at all that any of your use cases shows an actual problem with the new CPPC support code.

@cy: You say that, without powerd, the temperature is even higher. So CPPC does seem to be having an effect, although with the default tuning it is indeed very strange that your CPUs don't get cooler when the machine is idle (can you confirm?), which could be a hardware problem. Have you tried raising the values of the epp sysctl knobs (IIRC, they are below something like dev.hwpstate_amdX for each CPU X) above 50? What do the knobs debug.hwpstate_amdX report?

@cracauer: Default setting for EPP is 50. This was done to mimic what we already do in CPPC support for Intel processors. Can you try setting all dev.hwpstate_amdX.epp to 0 and see if you're recovering your initial behavior (and then possibly play with more values between 0 and 50)?

We need more investigation on your use cases before deciding on the default value. My general take is that we should enable the most modern mechanisms by default as much as possible. E.g., if it's really a hardware problem, we may want to have a list of quirks if that's feasible. It may be that we want to change the default EPP value, but then that would probably also mean changing it for Intel processors (at the very least, re-examining it; CPPC hardware mechanisms probably differ between AMD and Intel). Also, your use cases involve loading cpufreq, and I don't find it unreasonable that people doing that are required to tune it in "uncommon" cases (a qualifier which we don't yet know whether it applies to your use cases).

This revision is now accepted and ready to land. Wed, Jan 21, 9:37 AM

Also, your use cases involve loading cpufreq, and I don't find it unreasonable that people doing that are required to tune it in "uncommon" cases (a qualifier which we don't yet know whether it applies to your use cases).

I don't do anything to load cpufreq manually, it comes up with the GENERIC kernel.

@khng mentions that there is if (resource_disabled(HWP_AMD_CLASSNAME, 0)) return; in hwpstate_identify() so maybe setting hint.hwpstate_amd.disabled="1" in hint file also works?

@cracauer: Default setting for EPP is 50. This was done to mimic what we already do in CPPC support for Intel processors. Can you try setting all dev.hwpstate_amdX.epp to 0 and see if you're recovering your initial behavior (and then possibly play with more values between 0 and 50)?

I tried 0, 10, 25, 37 and 50. The CPU speed stayed at the 11x slower speed in benchmarks.

I don't do anything to load cpufreq manually, it comes up with the GENERIC kernel.

Oh, you're right, I looked into my own config files by mistake.

@khng mentions that there is if (resource_disabled(HWP_AMD_CLASSNAME, 0)) return; in hwpstate_identify() so maybe setting hint.hwpstate_amd.disabled="1" in hint file also works?

That hint completely disables hwpstate_amd, which also contains the good old regular P-state driver. The new knob proposed in this revision just forces the use of that old code instead of the more modern CPPC support.
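
To make the distinction above concrete, here is a small sh model of the three outcomes. It is only a sketch, not the driver's logic: the function name and its arguments are made up for illustration, the real name of the tunable is not given in this thread, and a CPPC-capable CPU is assumed.

```sh
# Model of which hwpstate_amd code path ends up in use.
# $1: the hint's "disabled" value (1 = hwpstate_amd disabled via the hint)
# $2: the CPPC enable tunable's value (hypothetical parameter; 1 = CPPC allowed)
hwpstate_mode() {
    hint_disabled=$1
    cppc_enable=$2
    if [ "$hint_disabled" -eq 1 ]; then
        echo "none"     # hwpstate_identify() returns early: no driver at all
    elif [ "$cppc_enable" -eq 1 ]; then
        echo "cppc"     # default in this revision: CPPC stays enabled
    else
        echo "pstate"   # old P-state code, the case powerd can drive
    fi
}
```

For example, hwpstate_mode 0 0 prints "pstate", the configuration this revision makes possible, while hwpstate_mode 1 1 prints "none", which is what the hint would give.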

@cracauer: Default setting for EPP is 50. This was done to mimic what we already do in CPPC support for Intel processors. Can you try setting all dev.hwpstate_amdX.epp to 0 and see if you're recovering your initial behavior (and then possibly play with more values between 0 and 50)?

I tried 0, 10, 25, 37 and 50. The CPU speed stayed at the 11x slower speed in benchmarks.

Strange. Could you please give the output of debug.hwpstate_amdX for these varying values of EPP?

https://www.cons.org/tmp/epp.log
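
An EPP sweep like the one requested above could be scripted roughly as follows. This is a dry-run sketch that only prints commands. The knob name dev.hwpstate_amd.N.epp is an assumption based on the usual dev.<driver>.<unit> layout (the thread only says "something like dev.hwpstate_amdX"), and the function name is made up for illustration.

```sh
# Print, without executing, the commands for an EPP sweep.
# $1: number of CPUs; remaining arguments: EPP values to try.
# Assumption: per-CPU knobs are named dev.hwpstate_amd.N.epp.
epp_sweep_cmds() {
    ncpu=$1
    shift
    for epp in "$@"; do
        cpu=0
        while [ "$cpu" -lt "$ncpu" ]; do
            echo "sysctl dev.hwpstate_amd.$cpu.epp=$epp"
            cpu=$((cpu + 1))
        done
        # Same debug query as used elsewhere in this thread.
        echo "sysctl debug | grep hwpstate"
    done
}
```

Running epp_sweep_cmds 2 0 10 25 37 50 lists the commands for a two-CPU machine and the values tried above; the output can be reviewed first and then piped to sh as root on the machine under test.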

I'm fine with the patch as it is. It is going to be useful in cases where the hardware doesn't do self-tuning properly.

But that shouldn't be the end of the story.

At the moment, I'm not convinced at all that any of your use cases shows an actual problem with the new CPPC support code.

@cy: You say that, without powerd, the temperature is even higher. So CPPC does seem to be having an effect, although with the default tuning it is indeed very strange that your CPUs don't get cooler when the machine is idle (can you confirm?), which could be a hardware problem. Have you tried raising the values of the epp sysctl knobs (IIRC, they are below something like dev.hwpstate_amdX for each CPU X) above 50? What do the knobs debug.hwpstate_amdX report?

Yes, because I use powerd to limit the CPU frequency on this machine. If I don't limit the CPU frequency, i.e. let it go to 3300 MHz, the machine will run hot at almost 100C. When I limit the frequency to 2200 MHz using powerd, a buildworld will run between 45C and 55C. It's currently idling at 32.6C.

When CPPC is enabled the CPU will run as high as 4300 MHz or higher as the CPU supports frequency boost higher than baseline frequency. Again it will max out at 100C after which it will reduce its frequency to run at 98C. Certainly not healthy for the CPU. BTW, this is a Framework 13 laptop. I doubt the cooling was designed properly to support such a CPU.

My debug.hwpstat* output.

stinky# sysctl debug | grep hwpstate
debug.hwpstate_pstate_limit: 0
debug.hwpstate_verify: 0
debug.hwpstate_verbose: 0
stinky#

The CPU is: AMD Ryzen 7 7840U w/ Radeon 780M Graphics

@khng mentions that there is if (resource_disabled(HWP_AMD_CLASSNAME, 0)) return; in hwpstate_identify() so maybe setting hint.hwpstate_amd.disabled="1" in hint file also works?

That hint completely disables hwpstate_amd, which also contains the good old regular P-state driver. The new knob proposed in this revision just forces the use of that old code instead of the more modern CPPC support.

Oh, that's right, and a good point. I believe that RDTUN is the best way to go here.

BTW, I suggest that we land this patch first, and continue the discussion at https://bugs.freebsd.org/292615

This revision was automatically updated to reflect the committed changes.

BTW, I suggest that we land this patch first, and continue the discussion at https://bugs.freebsd.org/292615

It may be that commenting on that bug rather than this (closed) revision is better in this particular case, although in general I'm not convinced. In any case, constant back-and-forth movements are impractical. Not that I have a solution to propose, but perhaps some people have ideas, and at least I'd like to raise the concern (where could we discuss development occurring in a given area in general?). Transferring most of the info here back into the bug.