Page MenuHomeFreeBSD

Use LTO/symbol versioning to optimize (de-clutter) boot bits and pieces
Needs ReviewPublic

Authored by sobomax on Apr 19 2024, 7:05 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 2, 4:32 AM
Unknown Object (File)
Nov 29 2024, 4:25 PM
Unknown Object (File)
Nov 23 2024, 9:45 AM
Unknown Object (File)
Nov 20 2024, 10:12 AM
Unknown Object (File)
Nov 17 2024, 3:03 PM
Unknown Object (File)
Nov 13 2024, 5:39 AM
Unknown Object (File)
Nov 10 2024, 10:49 PM
Unknown Object (File)
Nov 10 2024, 9:42 PM

Details

Reviewers
imp
sjg
tsoome
manu
Group Reviewers
bhyve
Summary

This is a bit of experiment that I wanted to carry out for a long time and now finally got a chance to do. It's basically about enabling LTO on all major bits in stand and see if it helps to bring the bloat down. And it turned out surprisingly easy to pull off and very fruitful. Below is the summary table of the results.

ScreenShot1274.png (522×753 px, 65 KB)

The change consists of the following:

  1. Enabling LTO on all bits and pieces (i.e. build intermediary static libs as IR code).
  1. Using symbol versioning tables to demote all but one symbol to locals.
  1. Fixing (more like a workarounding) a fallout of symbol being present multiple times in ZFS code. Has a lot to do with the hackish way ZFS is composed by including ".c" file into another ".c".
  1. Fixing unrelated bug in efi/boot1/Makefile that would pull file already provide by the libsa that is being linked in.
  1. Fix to pre-existing issue which is that EFI code needs to use -fshort-wchar (by the EFI book), and has the appropriate flag set. At the same time it links to the libsa, which has default wchar size of 32 bits on aarch64 and amd64. That makes lld unhappy. Such issue was easily missed before since such information won't survive the lowering to native code. I don't think anything else uses wchars in the code, so I opted to just enable it for everything.

As it can be seen, the gain is quite substantial esp for the EFI parts. Not so much because of LTO but simply because there seems to be quite a lot of garbage being linked in right now. The only significant loss is 40 bytes off boot2, but we are still within a safe margin. If you are look at the table holistically, the new size "hierarchy" makes much more sense, since EFI component is not more than 100K in size, so there is no real reason for the EFI versions of the loader being 2x in size.

Test Plan
  1. Some cursory tests were performed to confirm that userboot.so is functional. It is found to be capable to load kernel from the UFS partition.
  2. EFI loader has been populated and loaded. Confirmed to be functional and being able to do basic functionality enumerating disks, loading and booting kernels and modules from USF and ZFS volumes.

More tests are needed for the MBR/bios boot pieces.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

sobomax edited the summary of this revision. (Show Details)

impressive gains if true... However, the 'no-whole-archive' stuff makes me ask the question "doesn't that eliminate all the commands that we build on linker sets?"

Also wonder how making things weak actually does anything useful... We don't include these in multiple places, and the efi ones certainly aren't included twice...

I'll have to look at this in more detail, but making sure all the commands and other things we build on linker sets are still around would be good to test in the mean time.

stand/efi/libefi/devpath.c
44 ↗(On Diff #137404)

Why? These should only be in one file.

In D44872#1023204, @imp wrote:

impressive gains if true... However, the 'no-whole-archive' stuff makes me ask the question "doesn't that eliminate all the commands that we build on linker sets?"

Also wonder how making things weak actually does anything useful... We don't include these in multiple places, and the efi ones certainly aren't included twice...

I'll have to look at this in more detail, but making sure all the commands and other things we build on linker sets are still around would be good to test in the mean time.

Thanks @imp for the review. I will look into those linker sets, those indeed might potentially be optimized out. However, I still don't see how that would make it up for the 2x code side increase. Also if you look at the table before and after more holistically, the new version is making more sense kind-off. There is no reason for the "loader_lua.efi" to be 2x size of "loader[_lua]", i.e. 450k more. Based on the gptboot vs gptboot.efi, the "EFI" component is at tops 100kb.

  • devpath.c needs not to be pulled in explicitly any longer.
stand/efi/libefi/devpath.c
44 ↗(On Diff #137404)

Why? These should only be in one file.

It's this:

ys -Ddouble=jagged-little-pill -Dfloat=floaty-mcfloatface -ffunction-sections -fdata-sections -DLOADER_GELI_SUPPORT -I/usr/home/sobomax/projects/freebsd13/stand/libsa/geli -DLOADER_DISK_SUPPORT -ffreestanding -mno-mmx -mno-sse -mno-avx -mno-avx2 -msoft-float -fPIC -mno-red-zone -mno-relax -I. -Iinclude -DEFI_BOOT1 -DEFI_ZFS_BOOT -I/usr/home/sobomax/projects/freebsd13/stand/efi/include -I/usr/home/sobomax/projects/freebsd13/stand/efi/include/amd64 -I/usr/home/sobomax/projects/freebsd13/sys/contrib/dev/acpica/include -DEFI_UFS_BOOT -I/usr/home/sobomax/projects/freebsd13/stand/common -fPIC -pipe -std=gnu99 -Wno-format-zero-length -Wsystem-headers -Werror -Wno-pointer-sign -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Wno-error=unused-but-set-variable -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef -Wno-address-of-packed-member -Wno-switch -Wno-switch-enum -Wno-knr-promoted-parameter -Wno-parentheses -Oz -Qunused-arguments  -nostdlib -Wl,-T/usr/home/sobomax/projects/freebsd13/stand/efi/loader/arch/amd64/ldscript.amd64,-Bsymbolic,-znotext -pie -Wl,--version-script=/usr/home/sobomax/projects/freebsd13/stand/efi/boot1/../loader/Symbol.map -Wl,-znocombreloc   -o boot1.sym boot1.o proto.o self_reloc.o start.o ufs_module.o devpath.o zfs_module.o  -Wl,--whole-archive /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/libsa/libsa.a -Wl,--no-whole-archive
ld: error: duplicate symbol: efi_lookup_image_devpath
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_lookup_devpath
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_close_devpath
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_devpath_name
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_free_devpath_name
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_name_to_devpath
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_name_to_devpath16
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_devpath_free
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_devpath_last_node
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

ld: error: duplicate symbol: efi_devpath_trim
>>> defined in devpath.o
>>> defined in /usr/obj/usr/home/sobomax/projects/freebsd13/amd64.amd64/stand/efi/libefi/libefi.a(devpath.o)

This is an aftermath of the following:

commit 4cf36aa1017f9e1ca75544f6e5393a80dd9c0bea
Author: Warner Losh <imp@FreeBSD.org>
Date:   Mon May 6 18:38:46 2019 +0000

    Reach over and pull in devpath.c from libefi

    This allows us to remove three nearly identical functions because the
    differences don't matter, and the size difference is trivial.

@imp I think at one point you would be pulling just the devpath.c from the libsa, but then you gave up and started to pull the whole circus. But it continued to pull that one still. Fixed in the new rev.

sobomax retitled this revision from WiP: Use LTO/symbol versioning to optimize (de-clutter) boot bits and pieces to Use LTO/symbol versioning to optimize (de-clutter) boot bits and pieces.
sobomax edited the summary of this revision. (Show Details)

@imp I think at one point you would be pulling just the devpath.c from the libsa, but then you gave up and started to pull the whole circus. But it continued to pull that one still. Fixed in the new rev.

That may well be. IT's a good catch if so.

This is very interesting work... Most of my head-scratching, though, comes from things that don't make snese. And if we're doing both the devpath and libsa, that's an 'answer' to why: that's clearly bogus and we shouldn't do it.

I'll try to grab this version and see if I can desconstruct what I still see as several different sub-fixes / themes in this so I can understand what's causing them and how we can avoid that in the future...

stand/efi/Makefile.inc
8

A big part of this is to try to support UTF-16 that UEFI uses... I think we only do that in a few places, so we should understand why we do that (likely a misunderstaning on my part from years ago). I see it on 3 strings only in efi_main and boot1.c.

stand/efi/boot1/Makefile
32

This is completely independent of all this.
I'll commit it separately to start to winnow down the issues.

61

This should also be removed.

boot1.efi isn't worth optimizing, especially if we have to add all the attribute((weak))
For me, it only goes from 157184 down to 132608 (with the changes I described here). I forgot to check without the adjustments with these changes unvarnished... It's still 15% savings...
And for boot1.efi, we don't want the full libsa... there's several subparts to libsa that kinda live together in this library: low level, libc stuff, filesystem support. We don't want all of that in boot1.efi, so adding whole-archive is counter productive and unnecessary. We don't use the filesystem support at all in boot1.efi.

stand/efi/boot1/Makefile
91

I dropped both of these whole-archives... and I was able to delete all the attribute __weak stuff that you added for zfs which conflicted.

stand/libsa/zfs/zfsimpl.c
163

All the attribute((weak))s from this file

sys/cddl/boot/zfs/blkptr.c
46

and this one too.

@imp thank for looking into this! And cherry-picking stuff to reduce this are totally cool with me. :) I forgot to mention that the "whole-archive" was an empirical workaround for some weird linking "symbol not found issues" with LTO enabled that I could not overcome or explain otherwise. I am not 100% sure if this some deficiency of LTO in lld's version that we have in 13.1 that I was experimenting with and if it fixed in -current. I was planning to setup full fledged -current build environment to see if it's really needed, but it got boggled down by some 3-weeks travelling spree going at the moment (still ongoing).

The "whole-archive" adverse effects are offset by the symbol versioning, so anything not referenced is simply dropped by the linker, which I actually verified. I should be back to my office in about 2 weeks and plan to picking this up where we left.