Page MenuHomeFreeBSD

amdtemp driver update
Needs ReviewPublic

Authored by rozhuk.im-gmail.com on Feb 22 2017, 11:59 PM.
Tags
Referenced Files
F103285573: D9759.diff
Sat, Nov 23, 2:11 AM
F103246285: D9759.id26074.diff
Fri, Nov 22, 2:31 PM
Unknown Object (File)
Thu, Nov 21, 12:30 PM
Unknown Object (File)
Thu, Nov 21, 9:06 AM
Unknown Object (File)
Thu, Nov 21, 2:26 AM
Unknown Object (File)
Wed, Nov 20, 11:08 AM
Unknown Object (File)
Wed, Nov 20, 1:44 AM
Unknown Object (File)
Tue, Nov 19, 11:09 PM

Details

Summary

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194792

Completely rewritten amdtemp driver.
There is no longer binding to the processor ID, all the new processors will be supported without the need to modify the code.

Maybe for older CPUs (FAMILY 0x0f) that it would not work, I have not had a chance to test.

sensor_offset - can be customized for each sensor separately.
In fact there is only one sensor to the processor.

Thermtrip Status - all registers are read-only.
Hardware Thermal Control (HTC) - everything except HtcTmpLmt available for recording.
Reported Temperature Control - everything except CurTmp available for recording.
SB-TSI - most writable.
Register values ​​can be found in the processor documentation.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

At first, Thanks for your great work __/|\__
Technically I'm not right person to talk deeply about this.
But for helping to reduce amount of works for next reviewer (and also for you ;D),
Some style(9) points:
use the form:

type
name()
{
}

instead of

type
name() {
}
/usr/src/sys/dev/amdtemp/amdtemp.c
38

Where is it used?
I think maybe, maybe you can use KASSERT?

allanjude added inline comments.
/usr/src/sys/dev/amdtemp/amdtemp.c
4

this change looks like it is inadvertent

This needs a lot of style fixes. Please read the style man page and check other drivers for examples.

sys/dev/amdtemp/amdtemp.c
38 ↗(On Diff #25675)

No can do.

99 ↗(On Diff #25675)

{ should be in the line above.

788 ↗(On Diff #25675)

I understand why people like the inverted comparison, but it's not the current FreeBSD style.

I will fix style a bi later.

/usr/src/sys/dev/amdtemp/amdtemp.c
4

My code is old. Then I start there was 2009-2011.

38

This is for disable sysctl static type checks.

sys/dev/amdtemp/amdtemp.c
38 ↗(On Diff #25675)

Without this build fail.
I dont know how to fix.
Have you any ideas?

sys/dev/amdtemp/amdtemp.c
38 ↗(On Diff #25675)

What's the build error?

sys/dev/amdtemp/amdtemp.c
38 ↗(On Diff #25675)

/usr/src/sys/modules/amdtemp/../../dev/amdtemp/amdtemp.c:1077:7: error: static_assert expression is not an integral constant expression

regs[i].flags, sc,
^~~~~~~~~~~~~~~~~~

/usr/src/sys/sys/sysctl.h:733:13: note: expanded from macro 'SYSCTL_ADD_PROC'

CTASSERT(((access) & CTLTYPE) != 0);                            \
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~

/usr/src/sys/sys/systm.h:103:36: note: expanded from macro 'CTASSERT'
#define CTASSERT(x) _Static_assert(x, "compile-time assertion failed")

^

/usr/src/sys/modules/amdtemp/../../dev/amdtemp/amdtemp.c:1093:7: error: static_assert expression is not an integral constant expression

regs[i].flags, sc,
^~~~~~~~~~~~~~~~~~

/usr/src/sys/sys/sysctl.h:733:13: note: expanded from macro 'SYSCTL_ADD_PROC'

CTASSERT(((access) & CTLTYPE) != 0);                            \
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~

/usr/src/sys/sys/systm.h:103:36: note: expanded from macro 'CTASSERT'
#define CTASSERT(x) _Static_assert(x, "compile-time assertion failed")

^

2 errors generated.

  • amdtemp.o ---
  • [amdtemp.o] Error code 1
sys/dev/amdtemp/amdtemp.c
38 ↗(On Diff #25675)

Change the code to pass non-variable access controls to sysctl_add_proc.
Defeating CTASSERT() is not an option.

rozhuk.im-gmail.com edited edge metadata.

Style update.
CTASSERT() removed.

How can we test this on all hardware versions? I don't have AMD hardware. You could ask on the mailing list.

sys/dev/amdtemp/amdtemp.c
61 ↗(On Diff #25963)

{ in line above.

98 ↗(On Diff #25963)

{ in line above

102 ↗(On Diff #25963)

{ in line above. Please fix all instances.

116 ↗(On Diff #25963)

No reason for this to be packed, right?
Use a CTASSERT(sizeof(...)) to make sure the size is correct.

700 ↗(On Diff #25963)

*Does

814 ↗(On Diff #25963)

Fits in line above.

1060 ↗(On Diff #25963)

inverted comparison

I have some AMD: E350, 5350, APU 6800, Phenom II X4 955 - test OK.

sys/dev/amdtemp/amdtemp.c
814 ↗(On Diff #25963)

?

I have some AMD: E350, 5350, APU 6800, Phenom II X4 955 - test OK.

Yes, but that's not enough. Please ask on the current mailing list if anyone is willing to help you with testing.

sys/dev/amdtemp/amdtemp.c
904 ↗(On Diff #25980)

Better way to write all these bit comparisons is: if (sc->flags & AMDTEMP_XXX) {...}.

No need for comparison with zero and removes the extra parentheses.
Please change all of them in the file.

814 ↗(On Diff #25963)

if (bid != 0x07 && bid !=0x09 ...

Use the whole line up to 78 columns.

rpaulo edited edge metadata.

We can commit this once we get more testing done.

sys/dev/amdtemp/amdtemp.c
830 ↗(On Diff #26015)

Too many parentheses.

This revision is now accepted and ready to land.Mar 6 2017, 4:35 AM

Will there be man page updates?

rozhuk.im-gmail.com edited edge metadata.

Since revision 310051 sysctl_add_oid have +1 param, add macro to handle it.

This revision now requires review to proceed.Mar 8 2017, 5:34 AM

On

CPU: AMD Ryzen 7 1700X Eight-Core Processor          (3393.69-MHz K8-class CPU)

I get the following results:

sysctl dev.amdtemp

dev.amdtemp.0.rtc.sensor_offset: 0
dev.amdtemp.0.rtc.PerStepTimeUp: 0
dev.amdtemp.0.rtc.PerStepTimeDn: 0
dev.amdtemp.0.rtc.TmpMaxDiffUp: 0
dev.amdtemp.0.rtc.TmpSlewDnEn: 0
dev.amdtemp.0.rtc.CurTmpTjSel: -49.0C
dev.amdtemp.0.rtc.CurTmp: 0.1C
dev.amdtemp.0.%parent: hostb10
dev.amdtemp.0.%pnpinfo:
dev.amdtemp.0.%location:
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:

According to the BIOS on this machine, the CPU idles around 56C, which probably includes the +20C that AMD adds to Tctl for the 1700X and 1800X.

There are a lot more bits set in amd_pminfo 0x6599 vs. 0x7d9 on my FX-8320E. Looks like we need some more documentation ...

Update: None of the data reported above varies with CPU load though I can hear the CPU fan speed ramp up and down.

sys/dev/amdtemp/amdtemp.c
701 ↗(On Diff #26074)

On my Ryzen 1700X this test fails because CPUID_TO_FAMILY(cpuid) returns zero, causing the driver to fail to attach. If I change this to:

if (CPUID_TO_FAMILY(cpu_id) < 0x0f)

then the driver attaches.

I m waiting for documentation from AMD to Ryzen CPU, looks like 17h have new interfaces.

amd_pminfo (part of CPUID, as I remember) has many flags, only few of them used for temperature report interfaces.
All work based on AMD documentation for BIOS and kernel developers, 09h-16h processors families.

Add support Bristol Ridge, possible Ryzen - need check
Remove TSI reading methods

add Ryzen specific SMU registers.

I have not reviewed the entire driver because I ran into enough issues I want addressed first to warrant a highlevel comment first.

There is no need to change 80+% of driver lines to make these enhancements. There are a few nice changes here it would be good to have, but they are combined with a bunch of gratuitous changes. In fact, the functional enhancements should be split up into a few patches anyway.

I think the idea of scribbling to random PCI dbsf numbers via amdtemp_pci_write / amdtemp_pci_read runs very contra the NewBus style of drivers. But I may be mistaken.

Finally, I am unsure on the value of basically all of the extra register decoding. What use is TTS / HTC?

sys/dev/amdtemp/amdtemp.c
171 ↗(On Diff #32630)

Please use C99 initializers here so it is clear what fields the magic numbers are associated with when the structure definition is far away.

179 ↗(On Diff #32630)

Not 2731?

239 ↗(On Diff #32630)

sysctl naming convention is lowercase

480 ↗(On Diff #32630)

This change is a style(9) regression from the previous code.

509 ↗(On Diff #32630)

This check seems to defeat the point of trying to read the value from PCI space, doesn't it?

526 ↗(On Diff #32630)

This change is a style regression from the existing code.

531 ↗(On Diff #32630)

This change is gratuitous?

584 ↗(On Diff #32630)

Can pci_io be set to any other value?

639 ↗(On Diff #32630)

This change seems gratuitous?

668 ↗(On Diff #32630)

What's this number?

767 ↗(On Diff #32630)

Gratuitous change?

772–776 ↗(On Diff #32630)

Gratuitous change?

783 ↗(On Diff #32630)

Gratuitous change?

815 ↗(On Diff #32630)

style(9) nit: No space between i and ++

816 ↗(On Diff #32630)

style(9) nit: if (sc->sysctl_cpu[i] != NULL) {

822–823 ↗(On Diff #32630)

Probably reverse the order of these two?

850 ↗(On Diff #32630)

Seems like gratuitous changes? And the extra space contradicts style(9).

rozhuk.im-gmail.com marked 4 inline comments as done.
rozhuk.im-gmail.com marked 4 inline comments as done.
rozhuk.im-gmail.com added inline comments.
sys/dev/amdtemp/amdtemp.c
179 ↗(On Diff #32630)

What is 2731?

526 ↗(On Diff #32630)

This is temporary change for ryzen debug.

sys/dev/amdtemp/amdtemp.c
179 ↗(On Diff #32630)

floor(0°C in deci-Kelvins). See r300421 and r285994.

526 ↗(On Diff #32630)

That doesn't make sense. It doesn't change functionality; it's purely a style change. How does a style change help debug?

More proper temp calculatuion based on RangeUnajusted, according to "Preliminary BIOS and Kernel Developer’s Guide (BKDG) for AMD Family 16h Models 00h-0Fh (Kabini) Processors".