Page MenuHomeFreeBSD

Fix temperature reporting on AMD 2990WX
ClosedPublic

Authored by gallatin on Aug 23 2018, 12:06 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 17, 5:17 PM
Unknown Object (File)
Thu, Jan 16, 5:45 AM
Unknown Object (File)
Wed, Jan 15, 1:22 PM
Unknown Object (File)
Thu, Jan 9, 5:44 PM
Unknown Object (File)
Dec 4 2024, 5:59 AM
Unknown Object (File)
Dec 4 2024, 5:58 AM
Unknown Object (File)
Dec 4 2024, 5:58 AM
Unknown Object (File)
Dec 4 2024, 5:58 AM
Subscribers

Details

Summary

Update the AMD family 17h temperature reporting based on AMD Tech Doc 56255 OSRR, section 4.2.1:

  • For CPUS w/CUR_TEMP_RANGE_SEL set, scale the reported temp in the range -49..206, eg, subtract 49C

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Change looks functionally correct to me. Some style-y nitpicks below:

sys/dev/amdtemp/amdtemp.c
135–139 ↗(On Diff #47144)

I'd move these definitions to be up by AMDTEMP_17H_CUR_TMP.

FWIW, the PPR calls the register THM_TCON_CUR_TMP, the high 11 bits "CUR_TEMP," and bit 19 "CUR_TEMP_RANGE_SEL."

I'd suggest expanding the comment. The doc for the RANGE_SEL bit says "0=Report on 0C to 225C scale range. 1=Report on -49C to 206C scale range." I suspect the 225C is a typo for 255C.

The other bits in this register related to something called slewing and "Enable MCM for THM" (multi-chip module? Not sure what that does).

There are some nearby registers related to thermal temp limit and whether that has been tripped, but I don't think they really relate to this driver. It still might be interesting to display them (off-topic for this change).

Thank you so much for the pointer to the docs!!! It is much nicer to write from real docs, rather than crib from undocumented changes in another project.

Looking at the docs, it says that the range of CUR_TEMP is: 0..225C when CUR_TEMP_RANGE_SEL is not set, and -49C ... 206C when it is. So, based on my reading of the docs, our current scaling is wrong, since it does not scale from 0..225, but from 0..255. Eg, with our current scaling, max possible value, 0x7ff, would result in a temp of 255 not 225. So I've updated the scaling to reflect this. Interestingly, when bit 19 is set, the range is again 256 wide, so we can use the old scaling there.

gallatin edited the summary of this revision. (Show Details)

Thank you so much for the pointer to the docs!!! It is much nicer to write from real docs, rather than crib from undocumented changes in another project.

Agreed!

Looking at the docs, it says that the range of CUR_TEMP is: 0..225C when CUR_TEMP_RANGE_SEL is not set, and -49C ... 206C when it is.

Yeah, like I said earlier, I think that's a pretty clear typo for 255C. The temperature scale does not change between the modes and 206+49=255.

So, based on my reading of the docs, our current scaling is wrong, since it does not scale from 0..225, but from 0..255. Eg, with our current scaling, max possible value, 0x7ff, would result in a temp of 255 not 225. So I've updated the scaling to reflect this. Interestingly, when bit 19 is set, the range is again 256 wide, so we can use the old scaling there.

I don't believe this is correct.

cem requested changes to this revision.EditedAug 23 2018, 5:12 PM
In D16855#359281, @cem wrote:

Yeah, like I said earlier, I think that's a pretty clear typo for 255C. The temperature scale does not change between the modes and 206+49=255.

The right document for the temperature sensor is SB-TSI (40821), which you may note is dated 2010 and has not changed from earlier sensors (i.e., amdtemp_gettemp directly above uses the same math).

§ 2.6 SB-TSI Temperature and Threshold Encodings:

SB-TSI CPU temperature readings and limit registers encode the temperature in increments of 0.125 from 0 to 255.875.

Essentially the encoding is: one byte (8 bits) for the integer portion of the value, and the remaining 3 bits specify the fractional temperature in increments of 1/8 °C.

This revision now requires changes to proceed.Aug 23 2018, 5:12 PM
gallatin edited the summary of this revision. (Show Details)

Thanks for the clarification on the non-offset range!

Thanks! Mostly looks good to me. A few quibbles:

sys/dev/amdtemp/amdtemp.c
115–118 ↗(On Diff #47194)

Sentences should end in a period.

I don't love the explanation of the range select bit but it's fine.

608 ↗(On Diff #47194)

Why add excess parentheses?

610 ↗(On Diff #47194)

Does this calculation work without producing a warning (the negative value 490 is treated as an unsigned value)? I'd just use -= and positive 490.

gallatin marked an inline comment as done.
sys/dev/amdtemp/amdtemp.c
608 ↗(On Diff #47194)

typo, didn't notice in a plain text diff; removed

610 ↗(On Diff #47194)

It compiled just fine, without warnings.

I would have preferred -= with a positive 490 as well, but I was trying to be consistent with the sc->offset. Since we both prefer it the other way around, let's do it that way.

Perfect, looks great to me! Thanks for adding this.

This revision is now accepted and ready to land.Aug 26 2018, 1:30 AM
This revision was automatically updated to reflect the committed changes.