hmp(4): Interface changes for sched_ule(4) integration

Authored by koinec_yahoo.co.jp on Mon, Jun 29, 11:51 AM.

Summary

To enable sched_ule(4) to utilize hmp(4), this patch makes the following interface changes:

Currently, the capacity value is a simple performance metric ranging from 0 to 1024, normalized so that the fastest CPU core is assigned 1024. For the integration with sched_ule(4), I intend to redefine it: the lowest-performance CPU core will be assigned 1024, and every other CPU core will be assigned 1024 divided by its performance ratio relative to the lowest-performance core. This approach reduces overhead. Accordingly, an accessor function will be added so that sched_ule(4) can retrieve the capacity value.
In sched_ule(4), the cpu_search_lowest() and cpu_search_highest() functions already traverse the CPU topology to assign tasks or to select source CPU cores for load balancing. The scheduler helper functions within hmp(4) that search for the highest-performance CPU core are therefore redundant and will be removed, along with their associated variables.
To ensure that a score value is always available, a default provider for the score value will be implemented. Additionally, the score range will be defined as 0–1024.
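
As an illustration of the capacity redefinition described above, the following is a minimal sketch and not the code from this patch. The function names, the HMP_CAPACITY_SCALE macro, and the raw "perf" inputs are hypothetical. It also assumes that "performance ratio" means the core's raw performance divided by that of the lowest-performance core, and that all inputs are non-zero.

```c
#include <stdint.h>

/* Hypothetical name for the 1024 normalization constant. */
#define HMP_CAPACITY_SCALE 1024u

/*
 * Current meaning: the fastest core is assigned 1024 and slower
 * cores are scaled down proportionally. perf_max must be non-zero.
 */
static inline uint32_t
capacity_current(uint32_t perf, uint32_t perf_max)
{
	return ((uint32_t)(((uint64_t)perf * HMP_CAPACITY_SCALE) / perf_max));
}

/*
 * Proposed meaning: the slowest core is assigned 1024 and every other
 * core gets 1024 / (perf / perf_min), computed here in integer
 * arithmetic as (perf_min * 1024) / perf. perf must be non-zero.
 */
static inline uint32_t
capacity_proposed(uint32_t perf, uint32_t perf_min)
{
	return ((uint32_t)(((uint64_t)perf_min * HMP_CAPACITY_SCALE) / perf));
}
```

Under these assumptions, a core twice as fast as the slowest one gets 512 under the proposed definition, while the slowest core itself gets 1024.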

Please refer to the report attached to the sched_ule(4) proposal (D57941) for the rationale behind these changes.
Note that this patch depends on the hmp(4) patches (D56546, D56547, D56548), which must also be applied.

Whether the hmp(4) manual (D56548) needs to be updated is currently under consideration.

Test Plan

I have verified performance on a local dual-CPU system with the following applied: the sched_ule(4) proposal (D57941), intelhfi(4) (D44454, D44456, D44457, D44458, D44459), and the proposal to enable intelhfi(4) support for the sched_ule(4) changes (D57939). The results show a performance improvement of approximately 12–27% for processes whose thread count is equal to or less than the number of physical P-cores.
I have also confirmed that there is virtually no performance degradation for processes whose thread count exceeds the number of physical P-cores.

Please refer to the report attached to the sched_ule(4) proposal (D57941) for details regarding the measurement environment, conditions, data, and evaluation.