
devel/llvm19: prune build on 32-bit archs
Closed, Public

Authored by brooks on Aug 6 2024, 9:51 PM.
Tags
None
Subscribers

Details

Summary

Default to BE_NATIVE (no cross-build support) on 32-bit platforms.
BE_AMDGPU and BE_WASM might be useful, but cross building for other
targets seems generally not worth supporting out of the box.

Completely disable MLIR and POLLY on 32-bit. Just building MLIR fails
routinely on armv7 and there aren't a lot of direct users (it's used by
FLANG, but FLANG is 64-bit only). Polly is pretty niche and adds
quite a bit of build time.

Diff Detail

Repository
R11 FreeBSD ports repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

brooks requested review of this revision. Aug 6 2024, 9:51 PM
brooks created this revision.

FYI: I'm updating to a context with llvm19 19.1.0.r2 because of the new debug-information size tradeoffs, which lead me to avoid defaulting to -g for my optimized-but-with-debug-information builds. In my historical aarch64-as-aarch64 -g usage context, the llvm19 package ended up with a 27.9 GiByte flat size and took a little under 2 hr for the pre-package build stages and slightly over 4 hr 40 min to package on the Windows Dev Kit 2023, so slightly over 6 hr 40 min total.

Since it turned out that I could not use Win11Pro Hyper-V to form my intended test contexts for armv7 builds on the Windows Dev Kit 2023, I'm using an RPi4B instead. config.txt can use total_mem=2048 to limit the amount of RAM used, and the initial experiments are based on that: 3584 MiBytes of swap space and only 4 cores. I'm using USE_TMPFS=no and ALLOW_MAKE_JOBS, with no MAKE_JOBS_NUMBER_LIMIT or the like imposed, and -gline-tables-only just to see if that fits. No other builders are active in parallel with the devel/llvm19 build. The aarch64 environment (like on the builders) allows larger armv7 processes than my native armv7 context does. I'm using the updated Makefile (without any changes by me).
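
For reference, a minimal sketch of that test configuration, assuming the standard RPi firmware and poudriere spellings (the -gline-tables-only part is my own local arrangement and is shown only as an illustrative make.conf override):

# config.txt on the RPi firmware (MSDOS) partition: cap usable RAM
total_mem=2048

# poudriere.conf
USE_TMPFS=no
ALLOW_MAKE_JOBS=yes

# make.conf used for the builds (illustrative; no MAKE_JOBS_NUMBER_LIMIT
# or the like is set, and line-table-only debug information is kept)
CFLAGS+=	-gline-tables-only
CXXFLAGS+=	-gline-tables-only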

Hmm. I just realized one notable oddity with this experiment that I'd forgotten to put back to normal: forcing --threads=1 for ld.lld whenever pointers are 32 bits or smaller (checked via sizeof(void*)):

diff --git a/contrib/llvm-project/lld/ELF/Driver.cpp b/contrib/llvm-project/lld/ELF/Driver.cpp
index 8b2c32b15348..299daf7dd6fa 100644
--- a/contrib/llvm-project/lld/ELF/Driver.cpp
+++ b/contrib/llvm-project/lld/ELF/Driver.cpp
@@ -1587,6 +1587,9 @@ static void readConfigs(opt::InputArgList &args) {
             arg->getValue() + "'");
     parallel::strategy = hardware_concurrency(threads);
     config->thinLTOJobs = v;
+  } else if (sizeof(void*) <= 4) {
+    log("set maximum concurrency to 1, specify --threads= to change");
+    parallel::strategy = hardware_concurrency(1);
   } else if (parallel::strategy.compute_thread_count() > 16) {
     log("set maximum concurrency to 16, specify --threads= to change");
     parallel::strategy = hardware_concurrency(16);
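
With that patch in place, the default can still be overridden per build via the flag the log message names, e.g. with an illustrative make.conf line such as:

LDFLAGS+=	-Wl,--threads=4

(That is only an example of undoing the forced default for a particular build.)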

I'm interested in comparing/contrasting both ways, so I'll let this run.

Testing on the 2 GiByte Cortex-A7 OrangePi+ 2ed takes a lot longer than an RPi4B does (which in turn takes longer than the Windows Dev Kit 2023 would have), so my preliminary experiments are RPi4B-focused to guide later testing.

I'm using a locally patched top that monitors and reports various "Max(imum)Obs(erved)NAME" figures.

As a cross-check on the intent: the build log with no manual OPTIONS settings shows that the resulting defaults are:

---Begin OPTIONS List---
===> The following configuration options are available for llvm19-19.1.0.r2:
     BE_AMDGPU=off: AMD GPU backend (required by mesa)
     BE_WASM=off: WebAssembly backend (required by firefox via wasi)
     CLANG=on: Build clang
     DOCS=on: Build and/or install documentation
     EXTRAS=on: Extra clang tools
     LIT=on: Install lit and FileCheck test tools
     LLD=on: Install lld, the LLVM linker
     LLDB=on: Install lldb, the LLVM debugger
     PYCLANG=on: Install python bindings to libclang
     STATIC_LIBS=on: Install static libraries (does not effect sanitizers)
====> Options available for the single BACKENDS: you have to select exactly one of them
     BE_FREEBSD=off: Backends for FreeBSD architectures
     BE_NATIVE=on: Backend(s) for this architecture (ARM)
     BE_STANDARD=off: All non-experimental backends
===> Use 'make config' to modify these settings
---End OPTIONS List---

So BE_AMDGPU and BE_WASM are *not* being built by default for armv7. This matches my reading of the Makefile content as well. But I do not find the summary clear as to the intent for these two.

I got the 2 GiByte RAM Cortex-A7 OrangePi+ 2ed restored to a state where only devel/llvm19 needs to build. So likely days instead of most or all of a week, presuming it can complete.

Swap space happens to be: 3685 MiBytes

It built on the RPi4B just fine:

[00:01:31] [02] [00:00:00] Building   devel/llvm19@default | llvm19-19.1.0.r2
[12:01:54] [02] [12:00:23] Finished   devel/llvm19@default | llvm19-19.1.0.r2: Success ending TMPFS: 0.00 GiB

(The "ending TMPFS: ?.?? GiB" reporting is something that I add to my local poudriere-devel.)

The relevant "MaxObsNAME" figures are:

Mem: . . . , 1380Mi MaxObsActive, 681092Ki MaxObsWired, 1846Mi MaxObs(Act+Wir+Lndry)
Swap: 3584Mi Total, . . . , 106368Ki MaxObsUsed, 1507Mi MaxObs(Act+Lndry+SwapUsed), 1938Mi MaxObs(A+Wir+L+SU), 1958Mi (A+W+L+SU+InAct)

Also:

# dmesg -a | grep " memory "
real memory  = 2066735104 (1970 MB)
avail memory = 1990144000 (1897 MB)

So:

ORIGINAL_AVAIL_RAM+SwapSpace   was somewhat over:  1897 MiByte + 3584 MiByte == 5481 MiByte
ORIGINAL_AVAIL_RAM+MaxSwapUsed was slightly under: 1897 MiByte +  104 MiByte == 2001 MiByte
So:                spare space was slightly over:  5481 MiByte - 2001 MiByte == 3480 MiByte
And: 5481 MiByte / 1958Mi MaxObs[A+W+L+SU+InAct] is somewhat over 2.7, which is a fair sized margin for RAM+SWAP.

But that does not address process-size or fragmentation limitations. However, since my -gline-tables-only debug information is not stripped, a normal build that omits such information would have more process (and system) space available.

Reminder: THIS WAS BASED ON HAVING FORCED --threads=1 FOR ld.lld FOR THE 32-BIT (4-BYTE) POINTER-SIZE CONTEXT (armv7), however. That avoids having up to something like 4*4 threads active if, for example, 4 ld.lld instances ever overlapped in time.

The OrangePi+ 2ed test run will likely take 3 to 4 times as long as the RPi4B test run, if it completes nicely. So likely still more than a day before I'll have those results.

For reference (just examples):

graphics/mesa-libs  requires a devel/llvm* to build
graphics/mesa-dri   requires a devel/llvm* to run (via the library dependency)

x11/xorg                requires a devel/llvm* to run (via graphics/mesa-dri)
x11-servers/xorg-server requires a devel/llvm* to run (via graphics/mesa-dri)
x11/xorg-minimal        requires a devel/llvm* to run (via x11-servers/xorg-server)

www/firefox         requires a devel/llvm* to build (via devel/wasi-libcxx)
www/firefox-esr     requires a devel/llvm* to build (via devel/wasi-libcxx)
www/librewolf       requires a devel/llvm* to build (via devel/wasi-libcxx)
www/tor-browser     requires a devel/llvm* to build (via devel/wasi-libcxx)
www/waterfox        requires a devel/llvm* to build (via devel/wasi-libcxx)
www/thunderbird     requires a devel/llvm* to build (via devel/wasi-libcxx)

So once a devel/llvm* with the updated Makefile content became
the default one for the build servers, https://pkg.freebsd.org/
would no longer distribute those dependent packages for the
32-bit platforms. Personal builds for the likes of armv7 would
be required instead.
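
Such a personal build would look roughly like the following (an illustrative poudriere sequence; the jail and tree names are placeholders):

# enable the non-default backends for the local devel/llvm19 build
poudriere options -j armv7jail -p default devel/llvm19
# (select BE_AMDGPU and/or BE_WASM in the dialog)

# then build the dependent packages with those options in effect
poudriere bulk -j armv7jail -p default graphics/mesa-dri www/firefox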

I could imagine some feedback once that happens and things
become unavailable. In particular, the xorg items seem likely
to be involved in a lot of contexts.

Making folks aware ahead of time might be of some help, if that
can be arranged.

Maybe I should enable, say, BE_AMDGPU and BE_WASM and
see how the builds go. (It would still be based on --threads=1 for
ld.lld being forced at this point.) At least that would supply some
RAM+SWAP usage examples for reference in making judgments.

This seems reasonable.

devel/llvm19/Makefile
96

It should always be possible to build for all FreeBSD architectures.

With BE_AMDGPU and BE_WASM enabled, I already see that the build has gone through a stage (AMDGPU building) that nearly used up the swap space I've set up:

avail memory = 1990144000 (1897 MB)
ORIGINAL_AVAIL_RAM+SwapSpace   was somewhat over:  1897 MiByte + 3584 MiByte == 5481 MiByte

Mem: . . . , 1479Mi MaxObsActive, 441188Ki MaxObsWired, 1891Mi MaxObs(Act+Wir+Lndry)
Swap: 3584Mi Total, . . . , 3209Mi MaxObsUsed, 4597Mi MaxObs(Act+Lndry+SwapUsed), 5025Mi MaxObs(A+Wir+L+SU), 5062Mi (A+W+L+SU+InAct)

5481 MiByte / 5062Mi MaxObs[A+W+L+SU+InAct] is slightly over 1.08, not enough margin to depend on for RAM+SWAP of around 1897 MiByte + 3584 MiByte, especially if future growth in memory requirements is considered.

The log file so far does not show WASM build activity. So avoiding BE_AMDGPU as a default looks like a good idea for build-ability, given the memory space requirements (RAM+SWAP). Also, having more builders active in parallel would clearly contribute to such issues.

Note: this is with the automatic --threads=1 use for ld.lld. The "MaxObs" short-term load average was shown as 5.06 when I looked. I've no clue whether that was during the AMDGPU build activity.

Depending on the criteria for devel/llvm19@lite (llvm19-lite), BE_AMDGPU might not be a good fit there, given the much more extensive memory space usage of the AMDGPU build.

I'll stop it, disable BE_AMDGPU, and try again in order to see sooner what BE_WASM leads to.

devel/llvm19/Makefile
96

Setting the default to BE_NATIVE does not prevent overriding that to build BE_FREEBSD instead. The existing:

OPTIONS_SINGLE=         BACKENDS
OPTIONS_SINGLE_BACKENDS=BE_FREEBSD BE_NATIVE BE_STANDARD

allows picking any one of the 3 alternatives as the default, and similarly for overriding the default. BE_FREEBSD is a strict superset of BE_NATIVE, and BE_STANDARD is a strict superset of BE_FREEBSD. Thus at most one of the 3 ever needs to be picked, never 2 at a time. (Enabling BE_AMDGPU adds the AMDGPU backend to both BE_NATIVE and BE_FREEBSD; enabling BE_WASM adds the WebAssembly backend to both. BE_AMDGPU and BE_WASM are always part of BE_STANDARD.)
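
As an illustration, a make.conf override that picks BE_FREEBSD instead of the BE_NATIVE default could use the standard per-port OPTIONS knobs (hypothetical lines, not part of this change):

devel_llvm19_SET=	BE_FREEBSD
devel_llvm19_UNSET=	BE_NATIVE

(Running make config in the port directory accomplishes the same interactively.)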

I've not tested memory space use and time requirements for BE_FREEBSD without BE_AMDGPU in a very long time (if ever), and I do not know whether it is generally practical on armv7. (I've recently added notes to this D46239 about BE_AMDGPU memory space use for 4 cores with ALLOW_MAKE_JOBS; one has to plan ahead for sufficient swap space to make RAM+SWAP sufficient.)

There does not seem to be a resource-usage-based reason to avoid BE_WASM as a default:

Unlike the BE_AMDGPU (so: AMDGPU) build activity, the BE_WASM (so: WebAssembly) build activity does not seem to have contributed much to the RAM+SWAP use compared to when both were disabled. Unless some later stage does something significant related to WebAssembly, that conclusion should hold. (Much of the small difference may actually come from my looking at the log while the build is still going, not from the build itself.)

The build is still going; I'll check whether things still look that way after it finishes.

With BE_WASM (so: WebAssembly) enabled, it built just fine, taking 12:09:45 vs. the 12:01:54 on the RPi4B without either BE_AMDGPU or BE_WASM. Slightly less RAM+SWAP usage is showing (so: within the range of variation).

[00:01:21] [02] [00:00:00] Building   devel/llvm19@default | llvm19-19.1.0.r2
[12:09:45] [02] [12:08:24] Finished   devel/llvm19@default | llvm19-19.1.0.r2: Success ending TMPFS: 0.00 GiB

Mem: . . . , 1395Mi MaxObsActive, 679696Ki MaxObsWired, 1841Mi MaxObs(Act+Wir+Lndry)
Swap: 3584Mi Total, . . . , 85080Ki MaxObsUsed, 1494Mi MaxObs(Act+Lndry+SwapUsed), 1918Mi MaxObs(A+Wir+L+SU), 1924Mi (A+W+L+SU+InAct)

Next I'll see what using BE_FREEBSD instead of BE_NATIVE leads to.

My general position is that 32-bit systems are not reasonable places for software development, so there's little value in cross-compilation support, and we might as well save the cycles in the default case. The question with BE_AMDGPU and BE_WASM is: are they dependencies for things that make sense to run on a given 32-bit platform?

With BE_FREEBSD and BE_WASM but not BE_AMDGPU, it built just fine but took somewhat longer on the RPi4B (no surprise): 13:11:04 vs. BE_NATIVE with BE_WASM's 12:09:45. Its AVAIL_RAM+SWAP use [3070 MiBytes (A+W+L+SU+InAct)] was between that of BE_NATIVE+BE_WASM without BE_AMDGPU [1958 MiBytes] and with BE_AMDGPU [5062+ MiBytes, stopped early due to size vs. configuration]:

[00:01:31] [02] [00:00:00] Building   devel/llvm19@default | llvm19-19.1.0.r2
[13:11:04] [02] [13:09:33] Finished   devel/llvm19@default | llvm19-19.1.0.r2: Success ending TMPFS: 0.00 GiB

Mem: . . . , 1464Mi MaxObsActive, 677852Ki MaxObsWired, 1890Mi MaxObs(Act+Wir+Lndry)
Swap: 3584Mi Total, . . . , 1203Mi MaxObsUsed, 2643Mi MaxObs(Act+Lndry+SwapUsed), 3065Mi MaxObs(A+Wir+L+SU), 3070Mi (A+W+L+SU+InAct)

It indicates needing over 1 GiByte of swap space for the 4-core ALLOW_MAKE_JOBS-based build (with only the one builder active), which also had to accommodate the -gline-tables-only information [so there is some room for growth when such information is absent].

Again, the tests have --threads=1 forced for ld.lld for 32-bit platforms, here armv7.

So, overall: BE_AMDGPU is the big RAM+SWAP usage option. BE_FREEBSD is next but has noticeably smaller such usage. BE_WASM makes little difference. [I'm not going to test BE_STANDARD.] It might well be that BE_FREEBSD+BE_AMDGPU+BE_WASM would be similar to BE_NATIVE+BE_AMDGPU+BE_WASM for RAM+SWAP needs (but not for time taken).

The native armv7, 2 GiByte, Cortex-A7 OrangePi+ 2ed build of devel/llvm19 with BE_NATIVE but without BE_AMDGPU and BE_WASM finally finished successfully, over 46 hrs after it started, so somewhat under 4 times as long as on the RPi4B. The OPi+2ed has somewhat more "avail memory" than resulted from the RPi4B total_mem assignment I'd used.

# dmesg -a | grep " memory "
real memory  = 2113130496 (2015 MB)
avail memory = 2053459968 (1958 MB)

[00:05:23] [02] [00:00:00] Building   devel/llvm19@default | llvm19-19.1.0.r2
[1D:22:46:22] [02] [1D:22:40:59] Finished   devel/llvm19@default | llvm19-19.1.0.r2: Success ending TMPFS: 0.00 GiB

Mem: . . . , 1484Mi MaxObsActive, 264892Ki MaxObsWired, 1773Mi MaxObs(Act+Wir+Lndry)
Swap: 3685Mi Total, . . . , 11284Ki MaxObsUsed, 1560Mi MaxObs(Act+Lndry+SwapUsed), 1778Mi MaxObs(A+Wir+L+SU), 1924Mi (A+W+L+SU+InAct)

It indicates using just under 12 MiBytes of swap space for the 4-core ALLOW_MAKE_JOBS-based build (with only the one builder active), which again had to accommodate the -gline-tables-only information [so there is some room for growth when such information is absent].

Again: the tests have --threads=1 forced for ld.lld for 32-bit platforms, here armv7. USE_TMPFS=no too.

So the proposed defaults fit this 2 GiByte RAM context, although some SWAP space needs to be part of the configuration for reliability/growth. (I do not plan on testing a 1 GiByte RPi*, either as native armv7 or as aarch64-as-armv7.)

If it turns out that default threading in ld.lld leads to OOM or process-allocation failures, then there is a viable, simple-to-implement alternative that makes --threads=1 implicit for 32-bit ld.lld and gets the kind of results my testing shows.

I'd be happy to add a --threads=1 patch since that seems to be useful. I'm not sure if we should do it for all 32-bit platforms or make it an option that defaults to enabled on 32-bit platforms.


I'm fine with alternatives to how I enabled testing with --threads=1-based ld.lld; I just did enough to allow the tests. An option with defaults that vary based on the targeted platform seems reasonable to me.
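
A minimal sketch of what such an option might look like in the port's Makefile, using the standard OPTIONS helpers (hypothetical option and patch-file names; the extra patch would carry a change like the Driver.cpp one shown earlier, adjusted to the port's source paths):

OPTIONS_DEFINE+=	LLD_1THREAD
LLD_1THREAD_DESC=	Default ld.lld to --threads=1 (for low-memory 32-bit systems)
# default the option to enabled on 32-bit architectures
OPTIONS_DEFAULT_armv7=	LLD_1THREAD
OPTIONS_DEFAULT_i386=	LLD_1THREAD
# (similarly for any other 32-bit architectures the port supports)
# apply the lld change only when the option is selected
LLD_1THREAD_EXTRA_PATCHES=	${PATCHDIR}/extra-patch-lld-default-threads-1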

This revision was not accepted when it landed; it landed in state Needs Review. Aug 27 2024, 7:23 PM
This revision was automatically updated to reflect the committed changes.