The P-state configuration registers are writable, so we can tweak voltage and frequency.
Add a tunable that allows overriding the BIOS-configured P-states.
Details
- Reviewers
- None
Works on my Ryzen 3700X.
Setting P2 to 1 GHz is noticeably slower and uses slightly less power under load. Likewise, setting all P-states to 3 GHz uses more power when busy.
Whole-system idle power does not change measurably; the difference only shows when the CPU is busy.
powerd was not running when testing.
$ kenv debug.hwpstate_override_pstatecfgmsr="0x8000000049120890 0x80000000459a0c84 0x80000000459A0828"
$ kldload /boot/modules/cpufreq.ko
$ sysctl dev.cpu.0
dev.cpu.0.freq_levels: 3600/3960 2200/1980 1000/1980
$ sysctl dev.cpu.0.freq=3600
$ sha512 -t
SHA512 time trial. Digesting 100000 10000-byte blocks ... done
Digest = 65898ddd069bab932fca2bf7923dd28d074789d3617904e68e0017a6b7dfa2c654e673c5eab502dd6930b6aef5fe36e126698e73822e0092c25c489f456079a2
Time = 2.062212 seconds
Speed = 462.452118 MiB/second
$ sysctl dev.cpu.0.freq=2200
$ sha512 -t
SHA512 time trial. Digesting 100000 10000-byte blocks ... done
Digest = 65898ddd069bab932fca2bf7923dd28d074789d3617904e68e0017a6b7dfa2c654e673c5eab502dd6930b6aef5fe36e126698e73822e0092c25c489f456079a2
Time = 3.378108 seconds
Speed = 282.310181 MiB/second
$ sysctl dev.cpu.0.freq=1000
$ sha512 -t
SHA512 time trial. Digesting 100000 10000-byte blocks ... done
Digest = 65898ddd069bab932fca2bf7923dd28d074789d3617904e68e0017a6b7dfa2c654e673c5eab502dd6930b6aef5fe36e126698e73822e0092c25c489f456079a2
Time = 7.435838 seconds
Speed = 128.253769 MiB/second
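As a cross-check on the override values in the test plan, here is a minimal userspace sketch that decodes the core frequency from a P-state definition value. It assumes the Family 17h field layout (FID in bits 7:0, DID in bits 13:8, enable in bit 63) and the formula 200 * FID / DID MHz; the macro and function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Family 17h P-state definition layout (not taken from the patch). */
#define PSTATE_FID(msr)	((unsigned)((msr) & 0xff))		/* bits 7:0 */
#define PSTATE_DID(msr)	((unsigned)(((msr) >> 8) & 0x3f))	/* bits 13:8 */
#define PSTATE_EN(msr)	((bool)(((msr) >> 63) & 1))		/* bit 63 */

/*
 * Return the core frequency in MHz encoded in a P-state definition value,
 * or 0 if the state is disabled or the divisor field is zero.
 */
unsigned
pstate_freq_mhz(uint64_t msr)
{
	unsigned did = PSTATE_DID(msr);

	if (!PSTATE_EN(msr) || did == 0)
		return (0);
	return ((200 * PSTATE_FID(msr)) / did);
}
```

Under these assumptions the three values passed via kenv decode to 3600, 2200 and 1000 MHz, which matches the reported dev.cpu.0.freq_levels.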
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped
- Unit
Tests Skipped
Event Timeline
sys/x86/cpufreq/hwpstate_amd.c | ||
---|---|---|
451 | Presumably we want to be able to check all of the states before writing any MSRs? Is there any additional validation that we should do? Are we assuming that this function is running on a specific CPU, e.g., the BSP? |
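One way to read the "check all of the states before writing any MSRs" suggestion is a parse-then-commit pattern. The following is a userspace sketch only: parse_pstate_override and MAX_PSTATES are hypothetical names, the limit of eight states and the divisor check are assumptions, and the kernel code would use its own parsing routines and do the MSR writes after this step succeeds.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PSTATES 8	/* assumption: eight P-state definition MSRs */

/*
 * Parse whitespace-separated hex values from `s`.
 * Returns the number of values parsed, or -1 if any token is rejected.
 * `out` is written only after every token has been accepted, so a bad
 * value late in the string leaves the caller's state untouched.
 */
int
parse_pstate_override(const char *s, uint64_t out[MAX_PSTATES])
{
	uint64_t tmp[MAX_PSTATES];
	unsigned long long v;
	char *end;
	int n = 0;

	for (;;) {
		while (*s == ' ' || *s == '\t')
			s++;			/* skip separators */
		if (*s == '\0')
			break;
		if (n == MAX_PSTATES || !isxdigit((unsigned char)*s))
			return (-1);
		errno = 0;
		v = strtoull(s, &end, 16);	/* accepts an optional 0x prefix */
		if (end == s || errno == ERANGE)
			return (-1);
		if (*end != '\0' && *end != ' ' && *end != '\t')
			return (-1);
		/* Hypothetical sanity check: an enabled state needs a nonzero divisor. */
		if ((v >> 63) != 0 && ((v >> 8) & 0x3f) == 0)
			return (-1);
		tmp[n++] = v;
		s = end;
	}
	memcpy(out, tmp, (size_t)n * sizeof(tmp[0]));
	return (n);
}
```

Whether semantic checks belong here at all is the open question in this thread; the sketch only shows that rejecting the whole string up front is cheap.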
thanks for looking at the patch
sys/x86/cpufreq/hwpstate_amd.c | ||
---|---|---|
451 |
Well spotted, this is actually missing a CPU_FOREACH.
I was gonna treat this tunable as a footgun. If a sysadmin puts bogus hex numbers in, then maybe we treat that as intentional. |
sys/x86/cpufreq/hwpstate_amd.c | ||
---|---|---|
206–207 | hm... more questions... who owns the thread that calls hwpstate_goto_pstate()? The call chain is something like: for example, powerd will try to set the frequency via sysctl(). |
sys/x86/cpufreq/hwpstate_amd.c | ||
---|---|---|
206–207 | This code will run in whichever thread invoked the sysctl. This could be powerd's thread, yes. I think this function is missing a call to sched_unbind() after it's finished per-CPU operations. Currently it will return to usermode with the thread having been pinned to a specific CPU, which is almost certainly a bug. | |
451 |
CPU_FOREACH generally isn't going to be enough: to write an MSR for a particular CPU the thread must be executing on that CPU. See also the x86_msr_op() function, which lets you read/write to MSRs on different CPUs using inter-processor interrupts. That's an alternative to binding the current thread to different CPUs in succession.
I'd suggest putting "debug" in the name somehow. |
x86_msr_op looks much better and already does the right thing wrt unbind. thanks for the suggestion!
sys/x86/cpufreq/hwpstate_amd.c | ||
---|---|---|
451 |
The full tunable name is "debug.hwpstate_override_pstatecfgmsr" |