
[RFC] add cpuset_getmyaffinity_count
Abandoned (Public)

Authored by mjg on Feb 28 2023, 10:02 PM.

Reviewers
markj
Summary

The name is pretty bad and was just slapped together for illustrative purposes.

The problem: it has become an idiom to query the affinity, compute the CPU count from it and blindly spawn a thread pool of that size, regardless of whether the pool is going to have any work. Programs doing this include cmake, go, the LLVM linker and others.

Secondary minor problem: there is no *handy* way to get the count.

Consider a case of package building on a multi-socket box, like flix1 in zoo. With 2 sockets * 26 cores * 2 threads, that's 104 threads in total. It is beneficial to exploit memory affinity by preparing all build jails in their respective domains and making sure the package builds spawned in them don't leave those domains. This means giving each of them a cpuset of 52 threads. Since most packages vary over the course of the build in how many CPUs their compilation can use, it makes sense to spawn one build for each CPU thread, with -j 10 or so. This gives the scheduler freedom to put threads wherever it sees fit within the socket, while making sure there is no idle time, as there is always enough work available.

However, as numerous programs, including the linker, query the cpuset to find the number of threads to spawn, the entire setup suffers greatly from oversubscription. The linker has an argument to limit how many threads it spawns, but that does not cover other programs.
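A rough worst-case count for the setup above, assuming each of the 52 builders in a domain runs one such program at the same moment and ignoring the -j 10 jobs inside each build (an illustration derived from the numbers given, not a measurement):

```latex
\begin{aligned}
\text{CPU threads} &= 2 \times 26 \times 2 = 104 \\
\text{cpuset per domain} &= 104 / 2 = 52 \\
\text{builders per domain} &= 52 \quad \text{(one per CPU thread)} \\
\text{worker threads} &= 52 \times 52 = 2704 \text{ competing for } 52 \text{ CPUs}
\end{aligned}
```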

Here is a list of binaries which spawned 52 or more threads during a complete package build:

7zz R asm b b-boot basisu blender build-script-build cargo cgo cmake compile debugfs doxygen dubhash dumpe2fs e2fsck execgen file-png gdk-pixbuf-pixdata gegl gh gimp-2.10 git go go_bootstrap gofmt goimports goyacc inkscape jar jarsigner java javac javadoc javah javap jdeps keytool ld.lld ldc2 link lld mdbook-linkcheck mlir-linalg-ods-gen mlir-linalg-ods-yam mono-sgen node octave-cli-7.3.0 perl povray37 prereqs protoc-gen-gogoroac python python3.9 rawtherapee-cli resize2fs rmic rospo rsvg-convert ruby30 synfig tar2sqfs tune2fs wire zig zig2

Most of them probably don't even have a dedicated switch to limit it.

Hence cpuset_getmyaffinity_count, which returns the count, with the result overridable by an environment variable. Consider this a -j equivalent for threading.
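A minimal sketch of the described behavior, under stated assumptions: the review gives only the name and the idea of an environment override, not a signature or a variable name, so `thread_count_policy()`, `my_thread_count()` and `CPUSET_AFFINITY_COUNT` are hypothetical. The affinity query uses the existing `cpuset_getaffinity(2)` with `CPU_WHICH_PID` and id `-1` (the calling process) and is only compiled on FreeBSD; the override policy is plain C.

```c
/*
 * Hypothetical sketch of a "-j for threads" helper in the spirit of the
 * proposed cpuset_getmyaffinity_count(). Not the patch under review.
 */
#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/cpuset.h>
#endif
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Assumed variable name; the review does not specify one. */
#define ENV_THREAD_OVERRIDE "CPUSET_AFFINITY_COUNT"

/*
 * Pure policy: use the override string if it parses as a positive integer
 * that fits in an int, otherwise fall back to the affinity count.
 */
static int
thread_count_policy(int affinity_count, const char *override)
{
	char *end;
	long v;

	assert(affinity_count > 0);
	if (override == NULL || *override == '\0')
		return (affinity_count);
	errno = 0;
	v = strtol(override, &end, 10);
	if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
		return (affinity_count);	/* ignore malformed values */
	return ((int)v);
}

#ifdef __FreeBSD__
/*
 * Count the CPUs in the calling process's affinity mask, then apply the
 * (hypothetical) environment override. Returns 1 if the query fails.
 */
static int
my_thread_count(void)
{
	cpuset_t mask;

	CPU_ZERO(&mask);
	if (cpuset_getaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
	    sizeof(mask), &mask) != 0)
		return (1);
	return (thread_count_policy((int)CPU_COUNT(&mask),
	    getenv(ENV_THREAD_OVERRIDE)));
}
#endif
```

Keeping the parsing separate from the syscall makes the policy testable on any platform. Malformed, zero or negative overrides fall back to the affinity count; the override is deliberately not capped at the CPU count, which is one possible reading of the -j analogy.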

Test Plan

The change is not even compile-tested at the moment.

Diff Detail

Repository: rG FreeBSD src repository
Lint: Skipped
Unit: Tests Skipped

Event Timeline

mjg requested review of this revision. Feb 28 2023, 10:02 PM
mjg edited the test plan for this revision.

So you plan to convert all of those programs to use this new interface? Why can't you set the cpuset for each builder jail instead?

it makes sense to spawn one build for each cpu thread with -j 10 or so

This requires a lot of RAM per core.

So you plan to convert all of those programs to use this new interface? Why can't you set the cpuset for each builder jail instead?

I did set a cpuset for each builder; I explained why this is not sufficient given the threading situation. Something akin to make's -j switch is needed.

I do plan to patch the most commonly seen perpetrators, which would in particular include lld, cmake, go and msgmerge (that's from gettext).

it makes sense to spawn one build for each cpu thread with -j 10 or so

This requires a lot of RAM per core.

Not if they only spawn at most one compiler, which they often do. And even then, I've got the RAM.

In D38829#883964, @mjg wrote:

So you plan to convert all of those programs to use this new interface? Why can't you set the cpuset for each builder jail instead?

I did set a cpuset for each builder; I explained why this is not sufficient given the threading situation. Something akin to make's -j switch is needed.

You can use a cpuset that's smaller than the entire domain's cpuset.

I do plan to patch the most commonly seen perpetrators, which would in particular include lld, cmake, go and msgmerge (that's from gettext).

At least lld, cmake and go can already be handled with flags and environment variables. I don't really understand why you'd want to patch all the projects and FreeBSD, and wait until all those patches are released, to gain a marginal improvement.

it makes sense to spawn one build for each cpu thread with -j 10 or so

This requires a lot of RAM per core.

Not if they only spawn at most one compiler, which they often do. And even then, I've got the RAM.