
sched_ule(4): Improve CPU core selection logic using capacity and score from hmp(4) (support for heterogeneous multi-core architectures)
Needs Review, Public

Authored by koinec_yahoo.co.jp on Mon, Jun 29, 12:15 PM.

Details

Summary

I modified sched_ule(4) to select CPU cores based on two inputs: the per-core performance capacity provided by the heterogeneous multi-core support module hmp(4) (developed under D56546 and D56547), and the recommended usage score (determined by hardware or by manual configuration).
I also fixed an issue where sched_ule(4) assumed that all CPU groups contain the same number of CPU cores.
The changes are as follows:

  • cpu_search_highest(): Keep the random number distribution within the CPU core capacity range.
  • cpu_search_highest(): Normalize CPU core counts to a common multiple when comparing load between CPU core groups.
  • cpu_search_highest(): Take into account the CPU core capacity obtained from hmp(4).
  • cpu_search_lowest(): Use the hmp(4) score for the dispersion range of the randomized load score.
  • cpu_search_lowest(): Use the capacity value to determine the penalty value for SMT groups.
  • cpu_search_lowest(): Consider the hmp(4) score when selecting a CPU core.
  • cpu_search_lowest(): Normalize CPU core counts to a common multiple when comparing load between CPU core groups.
  • cpu_search_lowest(): Improve how the number of tasks within a CPU group is passed along inside cpu_search_lowest().
  • cpu_search_lowest(): Improve how the presence or absence of a preferred CPU is passed along inside cpu_search_lowest().
  • cpu_search_lowest(): Take into account the CPU core capacity obtained from hmp(4).

This enables more efficient CPU core selection in heterogeneous multi-core environments.
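
To illustrate the "common multiple" comparison and the capacity weighting listed above, here is a minimal userland sketch. It is not code from this patch: struct grp_load, its fields, and both function names are hypothetical, and the actual logic in cpu_search_lowest() and cpu_search_highest() may differ. The sketch assumes that comparing load_a / ncpus_a with load_b / ncpus_b can be done without division by scaling both sides to the common multiple ncpus_a * ncpus_b, and that the same trick applies when dividing by capacity instead of core count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-group summary; not a structure from sched_ule.c. */
struct grp_load {
	uint32_t load;		/* total load of the group */
	uint32_t ncpus;		/* number of CPU cores in the group */
	uint32_t capacity;	/* summed core capacity, arbitrary units */
};

/*
 * Compare load per core between two groups of possibly different size.
 * Both sides are scaled to the common multiple a->ncpus * b->ncpus, so
 * no division (and no rounding) is needed.
 * Returns <0 if a is less loaded per core, 0 if equal, >0 if more.
 */
static int
cmp_load_per_core(const struct grp_load *a, const struct grp_load *b)
{
	uint64_t la, lb;

	/* An empty group has no meaningful per-core load. */
	assert(a->ncpus > 0 && b->ncpus > 0);
	la = (uint64_t)a->load * b->ncpus;
	lb = (uint64_t)b->load * a->ncpus;
	return ((la > lb) - (la < lb));
}

/*
 * Same comparison, but per unit of capacity instead of per core, so a
 * group of faster cores is treated as able to absorb more load.
 */
static int
cmp_load_per_capacity(const struct grp_load *a, const struct grp_load *b)
{
	uint64_t la, lb;

	assert(a->capacity > 0 && b->capacity > 0);
	la = (uint64_t)a->load * b->capacity;
	lb = (uint64_t)b->load * a->capacity;
	return ((la > lb) - (la < lb));
}
```

With a 4-core group carrying a load of 6 and a 2-core group carrying a load of 4, a comparison of raw totals would call the 2-core group less loaded, whereas cmp_load_per_core() reports the 4-core group as less loaded (1.5 versus 2.0 per core). The 64-bit products cannot overflow because each factor is at most 32 bits wide.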

Details regarding the specific modifications and the background behind them are provided in the attached report.

Please note that this patch must be applied together with the hmp(4) patches (D56546, D56547, D56548) and an additional hmp(4) modification patch (D57939) designed to support it.
Furthermore, making the feature operational requires the following: on Intel hybrid architecture CPUs, intelhfi(4) (D44454, D44456, D44457, D44458, D44459) plus an additional modification patch (D57940); on other architectures, a driver for manually setting capacity/score (scheduled for future submission) and manual score adjustment.

Test Plan

Verification testing on two CPU configurations confirmed a performance improvement of approximately 12–27% for processes whose number of simultaneous threads does not exceed the number of physical P-cores.
I also confirmed that there is virtually no performance degradation for processes running with more simultaneous threads than that.

Please refer to the report mentioned above for details regarding the specific testing environment, conditions, measurements, and evaluation.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped