This is a bit of an experiment that I have wanted to carry out for a long time and now finally got a chance to do. It's basically about enabling LTO on all major bits in stand and seeing whether it helps to bring the bloat down. It turned out to be surprisingly easy to pull off and very fruitful. Below is the summary table of the results.
{F81675464, size=full}
The change consists of the following major parts:
1. Enabling LTO on all bits and pieces.
2. Using symbol versioning tables to demote all but one symbol to locals.
3. Fixing (or rather, working around) the fallout from symbols being present multiple times in the ZFS and EFI code. This has a lot to do with the hackish way ZFS and EFI are composed, by including one ".c" file into another ".c" file.
There was also a smallish fix to a pre-existing issue: the EFI code needs to be built with -fshort-wchar (by the EFI book) and has the appropriate flag set, but it links against libsa, which is built with the default 32-bit wchar size on aarch64 and amd64. That mismatch makes lld unhappy. Such an issue was easily missed before, since that information does not survive the lowering to native code. I don't think anything else in the code uses wchars, so I opted to just enable the flag for everything.
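To illustrate why the wchar mismatch matters, here is a minimal sketch in C. It is not part of the patch; the helper names and the `EFI_CHAR_BYTES` macro are made up for illustration. It only shows that the size of `wchar_t`, and therefore the layout of every wide string, depends on whether the translation unit was compiled with -fshort-wchar.

```c
#include <stddef.h>
#include <wchar.h>

/* Assumption for illustration: EFI text is made of 16-bit characters. */
#define EFI_CHAR_BYTES 2

/*
 * Hypothetical helper: returns 1 if this translation unit was built with
 * a wchar_t that matches the EFI character size, 0 otherwise.
 */
int
wchar_matches_efi(void)
{
	return (sizeof(wchar_t) == EFI_CHAR_BYTES);
}

/*
 * Hypothetical helper: number of bytes occupied by the wide literal
 * L"boot", including its terminating NUL (5 wide characters in total).
 */
size_t
wide_literal_bytes(void)
{
	return (sizeof(L"boot"));
}
```

Built with -fshort-wchar, `wide_literal_bytes()` would be 10; built with a 32-bit `wchar_t`, it would be 20. Two objects that disagree on this cannot safely pass wide strings to each other, which is presumably what lld is complaining about once it can see the information at LTO time.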
The patch is against releng/13.2 (this is what I had on my machine) and is mainly for discussion at this point. If the feedback is positive, I'll bring it up to the latest master.
As can be seen, the gain is quite substantial, especially for the EFI parts. This is not so much because of LTO itself but simply because there seems to be quite a lot of garbage being linked in right now. The only loss is 40 bytes off boot2, but we are still within a safe margin.