Page MenuHomeFreeBSD

intel_thermal: Add Intel Processor Thermal driver
Needs ReviewPublic

Authored by guest-seuros on Jan 26 2026, 2:05 AM.
Referenced Files
Unknown Object (File)
Sun, May 17, 11:10 AM
Unknown Object (File)
Mon, May 11, 4:04 PM
Unknown Object (File)
Mon, May 11, 1:46 PM
Unknown Object (File)
Mon, May 11, 9:19 AM
Unknown Object (File)
Mon, May 11, 4:46 AM
Unknown Object (File)
Mon, May 11, 1:19 AM
Unknown Object (File)
Sun, May 10, 9:49 PM
Unknown Object (File)
Sun, May 10, 7:02 PM
Subscribers

Details

Summary

This device exposes RAPL power limits (PL1/PL2) and TDP information via sysctl. The power unit is read from the RAPL_UNIT MMIO register at attach time, with fallback to MSR_RAPL_POWER_UNIT (MSR 0x606) and then to the Skylake default of 1/8 W steps if both reads fail.

Sysctls provided:

dev.intel_thermal.0.pl1    - Long-term power limit (mW)
dev.intel_thermal.0.pl2    - Short-term power limit (mW)
dev.intel_thermal.0.tdp    - Thermal Design Power (mW)
dev.intel_thermal.0.locked - Firmware lock status

Note: Only the Skylake device (0x1903) has been tested. Later generations use the same BDF (B0D4F0) with different PCI device IDs and would need per-generation validation before being added.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 72398
Build 69281: arc lint + arc unit

Event Timeline

guest-seuros created this revision.

Wire into build; add PL2 enabled/lock bit; align sysctl names

Rename to intelthermal for consistency; add man page

guest-seuros added reviewers: ziaee, kib.
guest-seuros retitled this revision from intel/intel_proc_thermal: Add Intel Processor Thermal driver to intelthermal: Add Intel Processor Thermal driver.

@ziaee , shall we name this driver intelthermal or intel_thermal to mirror acpi_thermal < already in tree ?

PS: this diff need rebasing/fix once D54881 land.

@ziaee , shall we name this driver intelthermal or intel_thermal to mirror acpi_thermal < already in tree ?

PS: this diff need rebasing/fix once D54881 land.

Yes, I'm all in favor of it, and I checked with srcmgr@ that having underscores is OK.

Yes, I'm all in favor of it, and I checked with srcmgr@ that having underscores is OK.

Meaning intel_thermal(4), if that wasn't clear.

There's no issues with _ in driver names unless it gets too carried away: _i_n_t_e_l___t_h_e_r_m_a_l_ would be right out.

Though we generally try to have the driver module match the driver name, and here they don't match. So I'd suggest renaming the module to intel_thermal as well.

! In D54882#1291723, @guest-seuros wrote:
@ziaee , shall we name this driver intelthermal or intel_thermal to mirror acpi_thermal < already in tree ?

intel_thermal is much better for consistency with acpi_thermal. I was not intending to say that underscores are prohibited in the other review, I was intending to say that naming the driver exactly what a different named in linux is confusing, especially because the "_core" appendix is not mentioned in my glance through the datasheets or any other doc and also seems like a lot of underscores. Sorry for the confusion.

share/man/man4/intelthermal.4
1–5 ↗(On Diff #175349)

please consider using the preferred license template for all of these. in the last one, you changed just the one in the manual and the driver was committed with the other files still showing this style.

remove stale _intel_pmc_core from modules Makefile

this looks fine to me, anyone else?

This revision is now accepted and ready to land.Thu, Apr 30, 1:47 AM
share/man/man4/intel_thermal.4
11
13–25
29

better to use a name, then define the acronym.

47–57
share/man/man4/intel_thermal.4
11

I had Skylake in an earlier revision/commit, but since i have other architecture now, i removed it.

The message in the footer is just informative of what was implemented/tested.

Once this land, i open another diff.

olce requested changes to this revision.Fri, May 1, 3:26 PM

It seems that access to MMIO registers is wrong here, in a similar way as for the proposed Intel PMC driver (D54881). Offsets RAPL_PKG_POWER_* are applied to BAR(0), whereas they should be applied to MCHBAR in the host bridge's PCI config space. See also inline comments for some nits.

I have a bit alarming concerns about how this (and D54881) have been coded, on which information they are based exactly, how they have been tested and where all this is heading.

The supposed device ID according to the Intel Gen10 doc is 0x3Exx (should be at PCI 0:0:0), and actually refers to the host bridge, whereas here 0x1903 is used. I can see a mention of 0x1900 in some Gen6 doc. Concern here is, IIUC, you are basically claiming the host bridge device just for its "thermal" part. If no other driver claims it at the moment, that's fine (a very quick search did not return candidates), but further dev may need to refactor this into a kind of sub-driver. At the very least, this needs to be documented.

In the Intel documents I have read (which include "Intel 10th Gen Core Processor Datasheet Vol 2" that you're quoting explicitly), I see no mention at all of any B0D4 thermal device. And there's actually no use of this ID anywhere in this driver, it's just mentioned in the main source herald comment and the manual page. Where does this ID come from? What does it mean? If it has no bearing on what is done here, it should be removed, else its meaning/purpose should be made explicit. After some research, let me guess: I suspect this ID comes from Linux's processor_thermal_device_pci_legacy.c, where it appears in a comment, unexplained, and is not used anywhere in that driver either. And, thus, I suspect some the aspects of this driver have been inspired by some Linux code, which would not be wrong but should be acknowledged.

The offsets used look problematic. RAPL_PKG_POWER_INFO is 0x5994, which according to Intel's doc is RP_STATE_LIMITS, whose description is: "This register allows SW to limit the maximum base frequency for the Integrated GFX Engine (GT) allowed during run-time.", which just doesn't seem to be the information you are after. RAPL_PKG_POWER_INFO is 0x59A0, an offset that does not appear at all in the same Intel doc. No more luck in the datasheet for Intel Gen6 (Skylake) processors. Where do these offset come from? Typically, I see a different offset in Linux that should be used for RAPL_PKG_POWER_INFO.

I suspect you have been "vibe coding". A mention that it's the case and which IA was used would be appropriate. And let me add that, while getting some help from AI is OK, it does not exempt you (and reviewers) from doing the work of checking what the produced code actually does and if it matches existing docs and possibly other code, and documenting thoroughly your research and findings. The work you're not doing yourself is on the reviewers, which does not scale and, to be frank, can be annoying. How was this code tested? Given the problem reported at start, I don't see how you can get any values that make sense as power values.

In Linux, the RAPL driver apparently has 3 ways of accessing registers, among which there is access through the PCH (as you're trying to do here), non-architectural MSRs and TPMI, which I didn't know about but quickly explored (see, e.g. https://github.com/intel/tpmi_power_management). I'm wondering if creating a PCH driver, where each specific model must be added, is really worth it (but if you have changes for other models cooking). Personally, I'm interested in having power statistics for recent generation of Intel platforms (>= Gen 10, and in particular Core Ultra Series 1), and generally think we should target modern hardware as much as possible. Do you have any idea about what differences there are in terms of hardware support and "end-user" functionality between the 3 access methods I have just mentioned?

sys/dev/intel/intel_thermal.c
32

They are not relative to BAR 0 according to the doc. See general comment.

219

It's inconsistent to hardcode the shift here but define a constant for the divisor. Additionally, the hardware only provides a shift.

So, what's controlling should be the shift. There should be some POWER_UNIT_DEFAULT_SHIFT constant, and I'd just drop power_unit_div entirely (including the sysctl knob; the driver already does the unit conversions, so I don't see a point for that anyway).

This revision now requires changes to proceed.Fri, May 1, 3:26 PM
This comment was removed by guest-seuros.

Address review: document BAR0/MCHBAR relationship with measured register dumps; correct comment to note distinct physical windows

@olce Thank you for the review.

Your feedback is correct.

This was one of my first kernel drivers, written few months ago.
The wrong offsets were not caught because, on the machine I used at the time, reads from 0x5994 returned plausible-looking values while the correct datasheet register returned 42879 W. That discrepancy made the wrong offset appear correct... the vendor firmware had MCHBAR misconfigured in a way that made the "wrong" register return something sane.

While debugging a separate firmware issue on that machine, I ported the board to coreboot. After that, MCHBAR initialization changed, and 0x5994 clearly maps to RP_STATE_LIMITS, not package power information. I should have retested the driver after the firmware change. That was my mistake.

Regarding vibe coding: I don't use AI to generate code. A vibe-coded result would look like a table of every known Intel PCI ID, this diff has exactly one, because that is the only device I have tested.

The issue was a hardware quirk was used to be standard value, even when datasheet was saying otherwise.

I do use AI to help reformulate comments and documentation, or to understand something.


I have verified the corrected offsets on the board with the current revision:

intel_thermal0: PL1 15000 mW, PL2 18750 mW, TDP 15000 mW


This driver is not final, once we add other generations, the code will mutate to support different architecture difference. (The current code is provided as if future architecture don't exist yet)

guest-seuros retitled this revision from intelthermal: Add Intel Processor Thermal driver to intel_thermal: Add Intel Processor Thermal driver.Sun, May 3, 11:11 AM
guest-seuros edited the summary of this revision. (Show Details)

Thanks for the update.

At the moment, this one is on hold because Tom Jones has been independently working on a RAPL driver based on MSRs, which is now under review at D56790 and D56791. It looks like that MSR approach should work even on Skylake processors, in which case we would much prefer it to the driver here (simpler and much better documented way to obtain the same information).

Thanks for the update.

At the moment, this one is on hold because Tom Jones has been independently working on a RAPL driver based on MSRs, which is now under review at D56790 and D56791. It looks like that MSR approach should work even on Skylake processors, in which case we would much prefer it to the driver here (simpler and much better documented way to obtain the same information).

Ok, but surely there's a way to leverage both of them? Why does intel have two approaches (MMIO/BAR, MSRs) for this stuff? Which families implement which features and bus attachments?