cpufreq(4): allow overriding P-state configuration
Needs Review (Public)

Authored by jo_bruelltuete.com on May 19 2023, 12:18 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

The P-state configuration registers are writable, so we can tweak voltage and frequency.
Add a tunable that allows overriding the BIOS-configured P-states.

Test Plan

Works on my Ryzen 3700X.
Setting P2 to 1 GHz is noticeably slower and uses slightly less power under load. Likewise, setting all P-states to 3 GHz uses more power when busy.
Idle power (for the whole system) does not change measurably; the difference only shows up when the CPU is busy.
powerd was not running during testing.

$ kenv debug.hwpstate_override_pstatecfgmsr="0x8000000049120890 0x80000000459a0c84 0x80000000459A0828"
$ kldload /boot/modules/cpufreq.ko

$ sysctl dev.cpu.0
dev.cpu.0.freq_levels: 3600/3960 2200/1980 1000/1980

$ sysctl dev.cpu.0.freq=3600
$ sha512 -t
SHA512 time trial. Digesting 100000 10000-byte blocks ... done
Digest = 65898ddd069bab932fca2bf7923dd28d074789d3617904e68e0017a6b7dfa2c654e673c5eab502dd6930b6aef5fe36e126698e73822e0092c25c489f456079a2
Time = 2.062212 seconds
Speed = 462.452118 MiB/second

$ sysctl dev.cpu.0.freq=2200
$ sha512 -t
SHA512 time trial. Digesting 100000 10000-byte blocks ... done
Digest = 65898ddd069bab932fca2bf7923dd28d074789d3617904e68e0017a6b7dfa2c654e673c5eab502dd6930b6aef5fe36e126698e73822e0092c25c489f456079a2
Time = 3.378108 seconds
Speed = 282.310181 MiB/second

$ sysctl dev.cpu.0.freq=1000
$ sha512 -t
SHA512 time trial. Digesting 100000 10000-byte blocks ... done
Digest = 65898ddd069bab932fca2bf7923dd28d074789d3617904e68e0017a6b7dfa2c654e673c5eab502dd6930b6aef5fe36e126698e73822e0092c25c489f456079a2
Time = 7.435838 seconds
Speed = 128.253769 MiB/second
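
For readers wondering how the three hex values in the kenv line map to the reported freq_levels, here is a minimal userland sketch of the frequency decode. It assumes the family 17h P-state definition layout (frequency ID in bits 7:0, divisor ID in bits 13:8, enable in bit 63) and the 200 * FID / DID formula. The macro and function names are made up for illustration and are not the driver's. The result agrees with the 3600/2200/1000 MHz levels shown above, but this is not the patch's code.

```c
#include <stdint.h>

/* Assumed family 17h P-state definition fields (illustrative names). */
#define PSTATE_FID(msr)     ((unsigned)((msr) & 0xff))         /* bits 7:0  */
#define PSTATE_DID(msr)     ((unsigned)(((msr) >> 8) & 0x3f))  /* bits 13:8 */
#define PSTATE_ENABLED(msr) ((((msr) >> 63) & 1) != 0)         /* bit 63    */

/*
 * Return the core frequency in MHz encoded in a P-state definition
 * value, or 0 if the state is disabled or the divisor is zero.
 */
static unsigned
pstate_freq_mhz(uint64_t msr)
{
	unsigned did = PSTATE_DID(msr);

	if (!PSTATE_ENABLED(msr) || did == 0)
		return (0);
	return (200 * PSTATE_FID(msr) / did);
}
```

For example, 0x8000000049120890 has FID 0x90 (144) and DID 0x08, so 200 * 144 / 8 = 3600 MHz.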

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

markj added inline comments.
sys/x86/cpufreq/hwpstate_amd.c
451

Presumably we want to be able to check all of the states before writing any MSRs?

Is there any additional validation that we should do? Are we assuming that this function is running on a specific CPU, e.g., the BSP?

thanks for looking at the patch

sys/x86/cpufreq/hwpstate_amd.c
451

Is there any additional validation that we should do? Are we assuming that this function is running on a specific CPU, e.g., the BSP?

Well spotted, this is actually missing a CPU_FOREACH.
Funnily enough, it worked anyway... maybe just by accident. I'll update it.

Presumably we want to be able to check all of the states before writing any MSRs?

I was gonna treat this tunable as a footgun. If a sysadmin puts in bogus hex numbers, then maybe treat that as intentional.
But I'm not particularly attached to that notion. I can change it to parse-first-write-later.
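
To make "parse-first-write-later" concrete, here is a rough userland sketch, not the patch itself. Everything in it is hypothetical: the function names, the MAX_PSTATES bound, the sanity check (enable bit set, non-zero divisor field), and the fake_write() stand-in for the real MSR write. Kernel code would use the kernel's own string-to-number helpers rather than strtoull().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PSTATES 8	/* assumed upper bound on P-state slots */

/* Hypothetical sanity check: bit 63 set and bits 13:8 non-zero. */
static int
pstate_cfg_valid(uint64_t msr)
{
	return ((msr >> 63) != 0 && ((msr >> 8) & 0x3f) != 0);
}

/*
 * Parse a blank-separated list of hex values into out[].
 * Returns the count, or -1 if any token is malformed, fails the
 * sanity check, or the list is too long. Nothing is written here.
 */
static int
parse_pstate_override(const char *s, uint64_t out[MAX_PSTATES])
{
	char *end;
	int n = 0;

	for (;;) {
		while (*s == ' ' || *s == '\t')
			s++;
		if (*s == '\0')
			break;
		if (n == MAX_PSTATES || *s == '-' || *s == '+')
			return (-1);
		errno = 0;
		unsigned long long v = strtoull(s, &end, 16);
		if (end == s || errno != 0)
			return (-1);
		if (*end != '\0' && *end != ' ' && *end != '\t')
			return (-1);
		if (!pstate_cfg_valid(v))
			return (-1);
		out[n++] = v;
		s = end;
	}
	return (n);
}

/* Stand-in for the real MSR write: records values in an array. */
static uint64_t fake_msr[MAX_PSTATES];

static void
fake_write(int index, uint64_t value)
{
	fake_msr[index] = value;
}

/* Write only after the whole list has parsed and validated. */
static int
apply_pstate_override(const char *s, void (*write_msr)(int, uint64_t))
{
	uint64_t vals[MAX_PSTATES];
	int n = parse_pstate_override(s, vals);

	if (n <= 0)
		return (-1);
	for (int i = 0; i < n; i++)
		write_msr(i, vals[i]);
	return (n);
}
```

With this shape, a string like "0x8000000049120890 zzz" is rejected as a whole, so the first P-state is not overwritten before the second token turns out to be bogus.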

jo_bruelltuete.com marked an inline comment as done.
sys/x86/cpufreq/hwpstate_amd.c
206–207

hm... more questions...

who owns the thread that calls hwpstate_goto_pstate()?
Here, its CPU affinity is changed, and it is not restored to what it was on entry once the function is done.

The call chain is something like:
sysctl thingy -> cpufreq_curr_sysctl -> hwpstate_set -> hwpstate_goto_pstate

for example, powerd will try to set the frequency via sysctl().
Is the sysctl handler called from powerd's thread?

sys/x86/cpufreq/hwpstate_amd.c
206–207

This code will run in whichever thread invoked the sysctl. This could be powerd's thread, yes.

I think this function is missing a call to sched_unbind() after it's finished per-CPU operations. Currently it will return to usermode with the thread having been pinned to a specific CPU, which is almost certainly a bug.

451

this is actually missing a CPU_FOREACH.

CPU_FOREACH generally isn't going to be enough: to write an MSR for a particular CPU the thread must be executing on that CPU.

See also the x86_msr_op() function, which lets you read/write to MSRs on different CPUs using inter-processor interrupts. That's an alternative to binding the current thread to different CPUs in succession.

I was gonna treat this tunable as a footgun.

I'd suggest putting "debug" in the name somehow.

x86_msr_op looks much better and already does the right thing wrt unbind. Thanks for the suggestion!

sys/x86/cpufreq/hwpstate_amd.c
451

I was gonna treat this tunable as a footgun.

I'd suggest putting "debug" in the name somehow.

The full tunable name is "debug.hwpstate_override_pstatecfgmsr"