
hmp(4): introduce Heterogeneous MultiProcessing support
Needs Review, Public

Authored by minsoochoo0122_proton.me on Apr 21 2026, 10:21 AM.
Tags
None

Details

Reviewers
imp
olce
adrian
Summary

Add initial support for HMP.

The hmp(4) framework sits between scheduler and providers. It aims to
forward information from provider to scheduler to enable hybrid
scheduling.

Support for controlling hardware from scheduler (e.g. Arm SCMI) will be
added later.

For now, disable this option by default and enable when it becomes
stable enough.

Sponsored by: FreeBSD Foundation

Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 73753
Build 70636: arc lint + arc unit

Event Timeline

Changes from D56547:

  • Moved from struct pcpu to DPCPU so that struct pcpu does not depend on options HMP.
  • Reworked the score system. Since the scheduler only cares about performance and efficiency scores, the other capabilities are not needed.
  • Capacities don't need atomic operations since they are written only once, at boot.
  • Added sysctls.

@koinec_yahoo.co.jp Could you please work on intelhfi based on this patch series?

Thank you very much for improving the HMP code and creating the manual.

Understood. I will proceed with supporting this version.

  • Remove dynamic and throttle flags as they are not needed anymore
  • Add hmp_lowest_capacity_cpu()
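A user-space sketch of what a helper such as hmp_lowest_capacity_cpu() might do, assuming the capacities are held in a plain array indexed by CPU ID. The real signature in the patch may differ; lowest_capacity_cpu() below is a hypothetical stand-in, and only the hmp_capacity_t name (uint32_t, per the later update in this thread) comes from the revision.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t hmp_capacity_t;

/*
 * Hypothetical stand-in for hmp_lowest_capacity_cpu(): return the index
 * of the CPU with the lowest capacity, or -1 when there are no CPUs.
 * Ties resolve to the lowest CPU ID.
 */
static int
lowest_capacity_cpu(const hmp_capacity_t *cap, size_t ncpus)
{
	int best = -1;

	for (size_t i = 0; i < ncpus; i++) {
		if (best == -1 || cap[i] < cap[(size_t)best])
			best = (int)i;
	}
	return (best);
}
```

A kernel version would more likely walk the CPUs with the CPU_FOREACH() macro and read per-CPU data instead of a flat array.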

Thank you very much for the improvements.
Is it possible to retain the flag field and the definition of the "Throttled" flag?

If the intention behind removing it is that it might not be immediately used in the ULE scheduler, I understand the reasoning. I am currently learning about the ULE scheduler while testing intelhfi. Since ULE focuses on distributing loads to the lowest-load CPUs while remaining cache-aware, I agree that the Throttled flag isn't strictly necessary for ULE itself.

However, I view HMP(4) as a forward-looking mechanism designed to share scores across heterogeneous multicore architectures, rather than something optimized solely for the ULE scheduler. Also, considering that the Intel Hardware Feedback Interface defines a "Throttled" state, I assume this flag was initially introduced with Arm architecture development in mind.
Therefore, notification of the "Throttled" state could become essential in the future for managing heterogeneous multicore environments. For this reason, I would like to request keeping the Throttled state configurable rather than removing it.

On the other hand, since "Capacity" represents a static capability, dynamic updates would be handled by a Provider that sets the "Score." Thus, I agree that the "Dynamic" flag can be safely removed.

Additionally, the Intel Hardware Feedback Interface defines a state for notifying power-efficient cores (such as LP-E cores) to aggregate all tasks. Specifically, when the HFI table's Efficiency score reaches its maximum value of 255, it requests aggregating all tasks to that specific core to maximize battery efficiency. This is conceptually the opposite of the "Throttled" state.
To support this behavior, would it be possible to add a "Consolidated" flag to the flag field?

> • Remove dynamic and throttle flags as they are not needed anymore

> Thank you very much for the improvements.
> Is it possible to retain the flag field and the definition of the "Throttled" flag?
>
> If the intention behind removing it is that it might not be immediately used in the ULE scheduler, I understand the reasoning. I am currently learning about the ULE scheduler while testing intelhfi. Since ULE focuses on distributing loads to the lowest-load CPUs while remaining cache-aware, I agree that the Throttled flag isn't strictly necessary for ULE itself.
>
> However, I view HMP(4) as a forward-looking mechanism designed to share scores across heterogeneous multicore architectures, rather than something optimized solely for the ULE scheduler. Also, considering that the Intel Hardware Feedback Interface defines a "Throttled" state, I assume this flag was initially introduced with Arm architecture development in mind.
> Therefore, notification of the "Throttled" state could become essential in the future for managing heterogeneous multicore environments. For this reason, I would like to request keeping the Throttled state configurable rather than removing it.

Currently, the sole goal of hmp(4) is integration with the ULE scheduler (or any other future scheduler). When I added the throttled flag, I thought it would be useful for thread placement, so that the scheduler could avoid placing threads on throttled cores. But this comes with two drawbacks:

  1. Assume we have a two-core arm64 board and run buildworld (or any other heavy workload) on it. Both cores are then likely to be flagged as throttled. The scheduler cannot simply refuse to assign threads to either core, since the threads have to run somewhere. Instead, it will conclude that all cores are throttled and place the thread on a random core, but then we have wasted O(n) time reaching that conclusion. (O(n) on a 2-core board won't be a big issue, but imagine this happening on a 256-core server.)
  2. Throttling should be handled by a thermal-related subsystem such as cpufreq. For Arm specifically, I believe writing a cpufreq_scmi driver would be a better way to handle the throttled flag. You might ask: what if we control the cores through hmp(4)? But then we would have two subsystems (cpufreq and hmp) doing the same work, which introduces another layer of uncertainty.
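The cost described in point 1 can be illustrated with a hypothetical placement loop (not code from the patch or from sched_ule(4); the function and parameter names are made up, and real ULE placement also weighs load and topology). When every CPU is throttled, the loop inspects all n entries only to fall back to a default CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical placement that skips throttled CPUs. Returns the first
 * non-throttled CPU, or `fallback` if all CPUs are throttled. The number
 * of entries inspected is reported through `scanned`.
 */
static int
pick_cpu_avoiding_throttled(const bool *throttled, size_t ncpus,
    int fallback, size_t *scanned)
{
	*scanned = 0;
	for (size_t i = 0; i < ncpus; i++) {
		(*scanned)++;
		if (!throttled[i])
			return ((int)i);
	}
	/* Every CPU was throttled: the full O(n) scan bought nothing. */
	return (fallback);
}
```

On a two-CPU system with both entries marked throttled, the loop scans both entries and returns the fallback; on a 256-CPU system the same outcome costs 256 checks per placement decision.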

> Additionally, the Intel Hardware Feedback Interface defines a state for notifying power-efficient cores (such as LP-E cores) to aggregate all tasks. Specifically, when the HFI table's Efficiency score reaches its maximum value of 255, it requests aggregating all tasks to that specific core to maximize battery efficiency. This is conceptually the opposite of the "Throttled" state.
> To support this behavior, would it be possible to add a "Consolidated" flag to the flag field?

The hmp(4) subsystem works like a greatest common divisor: it only abstracts what seems to be common across many providers. This flag, however, seems to be specific to intelhfi.

Moving many tasks at once is generally avoided, because migrating threads without a reason is costly (e.g. cache misses, TLB misses). And even without a bulk migration, if a core's efficiency score is high enough, tasks should gradually be balanced onto it over time by the scheduling algorithm, so I don't see why the aggregation needs to happen all at once. But as you said, if it is good for battery efficiency, it's worth discussing with people interested in schedulers and laptop projects.

Thank you very much for the explanation. I understand it much better now.

I understand that "throttled" should indeed be handled by the thermal management mechanism and that it also affects performance.
Also, regarding "Consolidated" flag as you pointed out, I also recognize it as a feature specific to Intel HFI.

Based on your feedback, I have uploaded an intelhfi patch with the HMP flags removed. I would appreciate it if you could check it.

  • total_capacity should have type hmp_capacity_t
  • hmp_capacity_t is now uint32_t, since total_capacity would overflow with 64 or more cores. hmp_score_t has also been changed to uint32_t to match.
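The 64-core threshold follows from simple arithmetic, assuming the previous type was uint16_t and each core's capacity can reach 1024 (the scale used elsewhere in this thread): 64 × 1024 = 65536, one more than UINT16_MAX (65535). The sketch below is illustrative only; MAX_CORE_CAPACITY and both functions are hypothetical.

```c
#include <stdint.h>

#define MAX_CORE_CAPACITY 1024u	/* assumed per-core maximum */

/* Sum capacities into a 16-bit accumulator (the assumed old type). */
static uint16_t
total_capacity16(unsigned ncores)
{
	uint16_t total = 0;

	for (unsigned i = 0; i < ncores; i++)
		total = (uint16_t)(total + MAX_CORE_CAPACITY); /* wraps mod 65536 */
	return (total);
}

/* Same sum with a 32-bit accumulator, as hmp_capacity_t now uses. */
static uint32_t
total_capacity32(unsigned ncores)
{
	uint32_t total = 0;

	for (unsigned i = 0; i < ncores; i++)
		total += MAX_CORE_CAPACITY;
	return (total);
}
```

With 63 cores the 16-bit total is still correct (64512), but with 64 cores it wraps to 0, while the 32-bit total is 65536.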
Inline comment on sys/sys/hmp.h, line 73:

Should this be of type "hmp_capacity_t"?

(This may not be the appropriate place to post this, but please forgive me as I don't know of any other suitable location.)

As part of testing intelhfi(4), I attempted to modify sched_ule(4) to utilize hmp(4), and I was able to achieve a performance improvement of up to +10-15% in some cases. I plan to post the diff later.
However, incorporating this modification requires changes to the hmp(4) side as well. Specifically, I want to modify the following:

  • Capacity: I want to change the meaning of the value. I want the slowest core's capacity value to be 1024, and the other cores' capacity value to be 1024 divided by their performance ratio to the slowest core. For example, a core twice as fast would have a capacity value of 512.
  • Score: I want to set a range of 0-255. (I understand that the range is not clearly defined, even in the manual. I want to clarify this.) A higher value simply means faster, and unlike capacity value, this should be an absolute value, not a performance ratio.
  • I want the search for the highest/lowest capacity value for each core to be performed by sched_ule(4), and I want to avoid using Scheduler Helper Functions in hmp(4). (Sorry, but I want to remove these functions.) This is because the existing functions in sched_ule(4) already explore the CPU topology and search for load.
  • Due to the above change in the meaning of the capacity value, displaying percentages in hmp(4) becomes difficult. Therefore, I want to remove this function. Sorry, too.
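The proposed capacity definition can be sketched as follows, assuming each core's relative performance is available as a non-zero integer. The perf[] input and the capacity_from_perf() name are hypothetical, not part of hmp(4) or the patch. Integer division truncates, so a core 1.5 times as fast as the slowest one gets 682 rather than 682.67.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u

/*
 * Proposed scheme (sketch): the slowest core gets 1024, and every other
 * core gets 1024 divided by its performance ratio to the slowest core,
 * so faster cores get smaller values. All perf[] entries must be > 0.
 */
static void
capacity_from_perf(const uint32_t *perf, uint32_t *cap, size_t ncpus)
{
	uint32_t slowest = UINT32_MAX;

	for (size_t i = 0; i < ncpus; i++) {
		assert(perf[i] > 0);
		if (perf[i] < slowest)
			slowest = perf[i];
	}
	/* 1024 / (perf[i] / slowest) == 1024 * slowest / perf[i] */
	for (size_t i = 0; i < ncpus; i++)
		cap[i] = (uint32_t)((uint64_t)CAPACITY_SCALE * slowest / perf[i]);
}
```

With perf = {100, 200, 150}, the resulting capacities are 1024, 512 and 682.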

I apologize for suggesting that this would involve removing part of the wonderful patch you have so kindly created. Would you be able to forgive me?

All of the above modifications are synchronized with sched_ule(4), so I have also created a patch for hmp(4) addressing the above points. Therefore, I would like to post it together with this update.

Finally, I am currently dealing with the above post and the advice from intelhfi(4) in response to it, so please understand that it will take me a few weeks to complete everything.

> (This may not be the appropriate place to post this, but please forgive me as I don't know of any other suitable location.)

You are always welcome to share hmp-related stuff here.

> As part of testing intelhfi(4), I attempted to modify sched_ule(4) to utilize hmp(4), and I was able to achieve a performance improvement of up to +10-15% in some cases. I plan to post the diff later.

Thank you very much! I'm currently in school, so I can't spend much time working on this, but if you are willing to help, I'd appreciate it a lot :) Please feel free to take your time posting the diff.

> However, incorporating this modification requires changes to the hmp(4) side as well. Specifically, I want to modify the following:
>
> • Capacity: I want to change the meaning of the value. I want the slowest core's capacity value to be 1024, and the other cores' capacity value to be 1024 divided by their performance ratio to the slowest core. For example, a core twice as fast would have a capacity value of 512.

The concept of capacity actually comes from hardware and ACPI people, not from me. At least in the device tree, it accurately represents the maximum workload a processor can handle (work done per Hz × maximum frequency). However, if the diff shows a compelling reason to change it, I'm okay with that.
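For comparison, here is a sketch of how a device-tree-derived capacity is commonly computed: the capacity-dmips-mhz property (work per MHz) multiplied by the core's maximum frequency, with the largest result scaled to 1024. The scaling to 1024 follows the convention Linux uses for Arm CPU topology; whether hmp(4) normalizes exactly this way is an assumption, and capacity_from_dt() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u

/*
 * Raw capacity = capacity-dmips-mhz * maximum frequency (MHz). The
 * largest raw value is scaled to 1024, so faster cores get larger values.
 */
static void
capacity_from_dt(const uint32_t *dmips_mhz, const uint32_t *max_mhz,
    uint32_t *cap, size_t ncpus)
{
	uint64_t raw_max = 0;

	for (size_t i = 0; i < ncpus; i++) {
		uint64_t raw = (uint64_t)dmips_mhz[i] * max_mhz[i];
		if (raw > raw_max)
			raw_max = raw;
	}
	assert(raw_max > 0);
	for (size_t i = 0; i < ncpus; i++) {
		uint64_t raw = (uint64_t)dmips_mhz[i] * max_mhz[i];
		cap[i] = (uint32_t)(raw * CAPACITY_SCALE / raw_max);
	}
}
```

With capacity-dmips-mhz values of 1024 and 512 at the same maximum frequency, this yields 1024 and 512. Note the direction: here the fastest core gets 1024, whereas the proposed scheme gives 1024 to the slowest core.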

> • Score: I want to set a range of 0-255. (I understand that the range is not clearly defined, even in the manual. I want to clarify this.) A higher value simply means faster, and unlike capacity value, this should be an absolute value, not a performance ratio.

amdhfi uses u32 for perf/eff scores. I will check the rationale for the 0-255 range in your ULE patch.

> • I want the search for the highest/lowest capacity value for each core to be performed by sched_ule(4), and I want to avoid using Scheduler Helper Functions in hmp(4). (Sorry, but I want to remove these functions.) This is because the existing functions in sched_ule(4) already explore the CPU topology and search for load.

This was something I was worrying about in the design phase. I think the diff will clarify it.

> • Due to the above change in the meaning of the capacity value, displaying percentages in hmp(4) becomes difficult. Therefore, I want to remove this function. Sorry, too.

If that's what we should do, I'm happy to do so.

> I apologize for suggesting that this would involve removing part of the wonderful patch you have so kindly created. Would you be able to forgive me?

Of course! Thank you for dedicating your time to this :)

> All of the above modifications are synchronized with sched_ule(4), so I have also created a patch for hmp(4) addressing the above points. Therefore, I would like to post it together with this update.

Please don't overwrite this revision. Instead, create a new one with the hmp modifications and add it as a child revision of this one.

> Finally, I am currently dealing with the above post and the advice from intelhfi(4) in response to it, so please understand that it will take me a few weeks to complete everything.

No worries.