Page MenuHomeFreeBSD

qoriq_therm: Fix sensor detection
AcceptedPublic

Authored by mindal_semihalf.com on Nov 30 2021, 12:18 PM.

Details

Reviewers
mmel
mw
wma
manu
Summary

Instead of assuming that we have six thermal sensors parse the thermal-zones node to obtain the actual number.
With this we can also assign a more descriptive name to each sensor.
Also use this opportunity to fix a memory leak in DT sensor calibration data parsing routine.

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

@mmel Do you have any objections to merge this patch?

This revision is now accepted and ready to land.Dec 6 2021, 8:35 AM

I'm sorry, but I don't agree with this.
The temperature sensor is a different entity from a temperature zone which we can not mix. A given SOC always has a fixed number of temperature sensors but the number of temperature zones is determined by the design of the board/laptop/equipment. In addition, one temperature zone can be controlled by multiple temperature sensors (where each can have a different weight) and can control multiple cooling devices. A sensor may or may not be a member of a thermal zone and yet its value may be useful to the user (and thus accessible, for example, by sysctl(8)).
I have an initial implementation (almost ready to commit) of a temperature sensor framework and a very early implementation of thermal zones - both necessary for temperature controlled cooling. The problem is that I don't have the free time to finish them right now... So take what you want, or give me time over Christmas, I'll try to finish it to a committable state.

My motivation for this patch is that on LS1028A there are only two accessible sensors.
From what I can see thermal-zones node is the only place where we can find the number of sensors supported.
Attempting to parse the "extra" sensors yields the following:

Sensor site2 timeouted
soctherm0: Sensor site2 timeouted
hw.temperature.ssoctherm0: Sensor site3 timeouted
ite2: -0.0C
soctherm0: Sensor site3 timeouted
hw.temperature.ssoctherm0: Sensor site4 timeouted
ite3: -0.0C
soctherm0: Sensor site4 timeouted
hw.temperature.ssoctherm0: Sensor site5 timeouted
ite4: -0.0C
soctherm0: Sensor site5 timeouted
hw.temperature.ssoctherm0: Sensor site6 timeouted
ite5: -0.0C
soctherm0: Sensor site6 timeouted
hw.temperature.site6: -0.0C

Even with your patch for handling thermal-zones imho we'll still need to read the amount of sensors on a given soc.

A given SOC always has a fixed number of temperature sensors but the number of temperature zones is determined by the design of the board/laptop/equipment.

I don't think that is the case, at least not on freescale SoCs. I did a simple grep serach and the only places where thermal-zones is mentioned are the SoC related .dtsi files.

Sorry, I didn't realize we were talking about a different SoC.
For me, the only relevant source on the number of sensors is TRM, not the DT files.
The current code is ready for variable number of sensors, just fill a new struct tsensor array according to TRM and use that instead of default_sensors. Unfortunately, due to a strange habit in QorIQ DT we have to use compatible string of the root node itself instead of the compatible string of the thermal controller :( . See comment on (original) line 311.
Where do you see the problem?

Again, a thermal zone is an area with the same thermal policy, nothing else. Each SoC should have one or more thermal zones for clock throttling of given part of SoC, each board with active cooling should have another (possibly shared with the above) for driving of cooling device. I plan to try push a similar change to the upstream for Honeycomb board. Having a fixed (and loud) fan is not a fun...
You may look to arm/am57xx-beagle-x15-common.dtsi for proper board level thermal zone definition...

Hmm, ok I see.
I agree that parsing thermal-zones here is somewhat hacky.
As you pointed out ideally we would have a separate driver for thermal zones.
Once your thermal-zones driver lands we could use it to expose all sensors and remove all sysctls from this one.
This will achieve two things:

  1. Only sensors described in thermal-zones are exposed, mitigating the timeout issue on LS1028A.
  2. The sysctls themselves can have a meaningful name. Right now we have something like:
hw.temperature.site0
hw.temperature.site1
(...)

IMHO this is not user-friendly approach.
Now if we use DTS labels instead on LS1028A we get:

hw.temperature.ddr-controller
hw.temperature.core-cluster

I see. But why you cannot use something like this (tested only on Honecomb) https://github.com/strejda/freebsd/commit/40eb737b2e9ca485daef8bf2effa96b053f847f9 ? Advantage of this code is that it can stay unchanged after thermal-zone driver was introduced..
I see. But why can't you use something like this (tested only on Honeycomb) https://github.com/strejda/freebsd/commit/40eb737b2e9ca485daef8bf2effa96b053f847f9 ? The advantage of this code is that it can remain unchanged after the thermal zone driver has been introduced.

I guess that your patches should work just fine, it's just that I don't really like the idea of extracting data from DT and hardcoding it in a driver.