On powerpc64, powerpc64le and riscv64 some software wrongly assumes that
it runs on powerpc or riscv (32-bit).
Details
- Reviewers
- imp - alfredo 
- Group Reviewers
- PowerPC 
- Commits
- rG83bf6ab56829: uname: switch machine to HW_MACHINE_ARCH
make universe
Diff Detail
- Repository
- rG FreeBSD src repository
Event Timeline
If you're going by the output of 'uname -a' or 'uname -m', on FreeBSD you can see the MACHINE_ARCH with uname -p instead.
I'm sure this would break things...
What software? We've had this "issue" for ~2 decades since powerpc grew 64 bit support...
Why now?
One would be Perl's uname(): https://github.com/Perl/perl5/blob/blead/ext/POSIX/POSIX.xs#L3349. This causes e.g. OpenSSL's build system to be explicitly told what architecture to build (see security/openssl port).
Another would be Firefox's generation of default UA: https://searchfox.org/mozilla-central/source/netwerk/protocol/http/nsHttpHandler.cpp#931
I'm pretty sure there are others; it's just that powerpc and riscv are not that widely used. It also benefits armv*, since with this change either armv6 or armv7 should be printed instead of plain arm.
This doesn't change anything for amd64 and i386 and very little for aarch64 (aarch64 will be printed instead of arm64), so I'm not sure what breakage you have in mind.
Please advise on how else to solve this issue. Other examples of it happening are the hacks in https://cgit.freebsd.org/ports/tree/www/py-adblock/Makefile#n121 or https://cgit.freebsd.org/ports/tree/devel/py-orjson/Makefile#n75
MACHINE_ARCH does this already. Why not use that? We use it quite extensively in the base system to catch issues like this, or all the mips64 stuff or whatever. It's always defined for the host system in make.
OK, so just to be sure, you're asking to fix multiple applications that currently use the standard way of checking architecture to use FreeBSD's non-standard way instead of correcting FreeBSD to use the standard?
First off, there's no standard here. That's the problem. POSIX defines this interface, but is vague with 'hardware type' as the only description. BSD traditionally has gone with a kernel type interpretation and had a separate '-p' to uname (which wasn't in struct utsname) to get the specific ABI that is running. Linux has adopted that, but most other Unixes have not. To be clear: I'm asking you to retain the BSD traditional behavior in the absence of more complete data on its impact. People have made similar sorts of changes in the past w/o knowing the full implications only to put the project in a difficult to back out from situation.
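To make the distinction above concrete, here is a minimal shell sketch that prints both values side by side. The function name describe_host is made up for illustration, and the fallback to "unknown" is an assumption for systems where uname -p fails or prints nothing, since -p is not one of the options POSIX specifies.

```sh
#!/bin/sh
# describe_host (hypothetical helper): show the two names discussed above.
describe_host() {
    machine=$(uname -m)
    # -p is an extension; fall back to "unknown" if it fails or prints nothing.
    processor=$(uname -p 2>/dev/null) || processor=
    printf 'machine=%s processor=%s\n' "$machine" "${processor:-unknown}"
}

describe_host
```

Going by the comments in this thread, a FreeBSD powerpc64le host should report machine=powerpc processor=powerpc64le.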
If you mean 'change to the linux way of reporting this' then please say so and that's a different discussion. That's what this change does: it departs from BSD's and FreeBSD's historic behavior and moves to Linux's behavior. I worry about all the places that assume the historic behavior, which this change may break, but short of at least an exp run, we won't know those. The best way to decide which way to move forward is with more data, and we're currently missing half of the data. However, we'd need it done on something like aarch64/arm64 where there's a difference already and much better ports support than powerpc.
Given its long history, perhaps this should be discussed in arch@. That way people will know why, the tradeoffs can be made and if there's some large current user that relies on the current behavior, then we'd discover it. And people wouldn't be surprised when it happens. And you could explain up front that uname -p and uname -m are not changing. I know 100% there are lots of scripts that depend on those not changing, and stating so up front will keep people from raising a red-herring.
Also, there's at least one bug in this implementation that I've noted which should be fixed if you want to go this way.
Inline comment on lib/libc/gen/__xuname.c, line 130:
At the very least, this also needs to change to UNAME_p since that's what's being returned here.
I started a similar discussion some time ago on IRC and the only conclusion I got was there's no standard. I looked at https://en.wikipedia.org/wiki/Uname to try to find some pattern, but found none as well and I agreed to keep as is to avoid breaking things. However I understand that keeping as is today adds an unnecessary (and avoidable) extra burden to the already small number of ports maintainers that do an excellent job keeping powerpc* ports in shape. Since no regular package delivery for powerpc* existed and we are about to have them (with the new hardware that was acquired), it's the perfect moment to discuss it.
I'm in favor of a change, but I'm not sure how it should be. FreeBSD's natural choice for 64-bit is "powerpc64 and powerpc64le", however GNU coreutils use "ppc64 and ppc64le" so I believe the majority of the build tools and scripts are expecting this instead and the change wouldn't be as effective as we wish, unless they are already expecting the full "powerpc64*" string.
I agree we need to involve more brains. No matter the decision it must be set in stone :)
IMO we can use powerpc64 and powerpc64le. What I'm trying to change is not to use powerpc if the system is actually on powerpc64, powerpc64le or even powerpcspe. This already causes some issues and I'm 100% sure there are still some issues not known to me.
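The misdetection described here can be sketched in shell. guess_bits is a hypothetical stand-in for the kind of name-based check an upstream build script might perform; it is not taken from any of the projects linked above.

```sh
#!/bin/sh
# guess_bits NAME (hypothetical): infer word size from a machine name,
# the way a name-matching build script might.
guess_bits() {
    case "$1" in
        *64|*64le) echo 64 ;;   # powerpc64, powerpc64le, riscv64, amd64, ...
        *)         echo 32 ;;   # everything else is assumed to be 32-bit
    esac
}

guess_bits powerpc       # name from uname -m on any FreeBSD powerpc* host
guess_bits powerpc64le   # name from uname -p on a powerpc64le host
```

The first call yields 32 and the second 64, which is the gap between what uname -m says and what the userland actually is.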
Is there any way we can push this issue further? Or is the comment about UNAME_p the only thing that blocks it from being committed?
Ping.
Another example:
https://github.com/numpy/numpy/blob/main/numpy/distutils/ccompiler_opt.py#L959
Since python on all powerpc* just says "powerpc", numpy is downgraded to undefined hell, which probably implies worse performance.
edited for clarity and to remove some undeserved snark
Yes. Not doing that is a deal killer. The patch hasn't been updated to fix that, so I've not commented further. Please make that small change and add a comment.
I'd prefer if there was an exp run possible for aarch64, but that part of the project hasn't yet caught up with aarch64 being tier 1, and I've not had time to get my aarch64 box updated enough to do a package run, so I'll not block it... But I suspect this will break things in arm land.
I think the additional examples have exactly the same analysis as before: The programs are still making decisions based on a non-standardized item in ways that are highly linux specific... and since our encodings differ from Linux for arm and aarch64, I think it opens up a can of worms for the non-powerpc side, which is a much larger user base. So if it does, we may need to revert this and the powerpc world will need to upstream patches to cope like the arm and aarch64 worlds have done.
It's possible that even powerpc* will ultimately regress overall until some ports are fixed. For example, I haven't yet tried forcing numpy to build as ppc64 and ppc64le instead of undefined; it's possible that it won't build in such configurations. It may also be the case with other ports, but this can all be fixed.
And well, I'm not asking for it to be merged to stable, and users of current should expect breakages when they use current anyway.
I will update the patch.
Does this change the status of the man page content for uname?
The options are as follows:
-a      Behave as though the options -m, -n, -r, -s, and -v were
        specified.
[Edited to correct inaccurate recall of old details.]
FYI: Various Linux based OS's do not have a uniform definition for uname -p on the same hardware (and addressing width OS when that can vary on the hardware) but seem to only standardize uname -m . So I do not expect that this really standardizes on what Linux does from that point of view: FreeBSD has a specific definition of uname -p output for the type of context.
Old mid 2017 details, extracted from an old list exchange:
I looked at:
A) gnu's coreutils and its uname.c
B) Ubuntu's coreutils and its uname.c [an (A) variant but not via a .patch file]
C) Gentoo's coreutils and its uname.c [a patch applied to (A)]
All 3 have different handling of -p and -i in uname (even for the same kernel) via having different source code that does different things to generate the output.
While all 3 do the same thing for -m it appears that various Linux distributions tend to tailor what -p and -i do, making the output vary.
It does not look like depending on uname -p or uname -i output across different Linux
distributions would be a good idea (not very portable).
I'll note that gentoo had the following example:
# uname -p
ARMv7 Processor rev 5 (v7l)
First, uname -m and uname -p aren't changing.
Second, I already made the point that there's no standard names... It was one of my big objections.
Third, for powerpc, this brings things in closer alignment with what Linux does, but for 32-bit arm, it's a crap shoot as you note there and may cause regressions. For x86 and aarch64 it should change nothing, though there we're also different than Linux too. riscv I don't know, but it's kinda super new.
uname(1) is unaffected. This only affects uname(3).
Okay. So:
/usr/local/share/poudriere/jail.sh:                 -e 's#:\(setenv.*\),UNAME_m=.*,UNAME_p=[^:,]*\([,:]\)#:\1\2#' \
/usr/local/share/poudriere/common.sh:           login_env="${login_env},UNAME_m=${arch%.*},UNAME_p=${arch#*.}"
/usr/local/share/poudriere/common.sh:   export "UNAME_m=${arch%.*}"
/usr/local/share/poudriere/common.sh:   unset UNAME_m
vs. [from UNAME(3)]
UNAME_m  If the environment variable UNAME_m is set, it will override the
         machine member.
Looks like the poudriere UNAME_m use in common.sh is in export_cross_env(), unset_cross_env(), and is associated with the need_cross_build() use in update_version_env() via the login_env line.
The presumed tie to uname(3) in uname(1) is now less obvious, given the new -m vs. uname(3) result differences. The reference below could be confusing.
ENVIRONMENT
     An environment variable composed of the string UNAME_ followed by any
     flag to the uname utility (except for -a) will allow the corresponding
     data to be set to the contents of the environment variable.  See uname(3)
     for more information.
Note: When the relevant values for uname -m and uname(3) are distinct, it seems impossible to set UNAME_m to a single value and keep both uname -m and uname(3) each under their individual new distinctions. One either breaks uname -m or breaks __xuname.c's new meaning. poudriere's scripts use uname -m .
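The conflict in the note above can be modelled with a small shell sketch. Both reporting functions are hypothetical emulations written for this discussion, not the libc or uname(1) code: each takes the value it would report by default and lets a set UNAME_m replace it. Treating any set value, even an empty one, as an override is an assumption of the sketch.

```sh
#!/bin/sh
# Emulation only: $1 is the default value; UNAME_m, if set, replaces it.
apply_uname_m() {
    if [ -n "${UNAME_m+set}" ]; then
        printf '%s\n' "$UNAME_m"
    else
        printf '%s\n' "$1"
    fi
}

# Defaults on a powerpc64 host with the patch applied, per the thread.
uname1_m() { apply_uname_m powerpc; }     # uname -m keeps HW_MACHINE
uname3_m() { apply_uname_m powerpc64; }   # uname(3) would use HW_MACHINE_ARCH

unset UNAME_m
echo "default:  uname -m=$(uname1_m) uname(3)=$(uname3_m)"
UNAME_m=powerpc
echo "override: uname -m=$(uname1_m) uname(3)=$(uname3_m)"
unset UNAME_m
```

With the variable unset the two interfaces disagree, as the patch intends; once UNAME_m is set, both collapse to the same string, so one of the two meanings is lost.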
Well, Poudriere should use uname -p. Using -p doesn't make sense. On powerpc*, it prints just powerpc on all 4 platforms and is totally misleading on 3 of them. On armv{6,7}, it's imprecise (do we deal with arm as in the removed ARMv5, armeb, armv6 or maybe armv7?), although not as incorrect as on powerpc*. On riscv64, it makes other software think it's actually riscv32, which FreeBSD doesn't even support. I have posted previously links here to software and ugly workarounds that I needed to use.
EDIT: I meant "Using -m doesn't make sense." in my 2nd sentence.
poudriere uses both uname -m and uname -p :
# grep -r "uname.*-[mp]" /usr/local/share/poudriere/ | more
/usr/local/share/poudriere/common.sh:   _arch="$(uname -m).$(uname -p)"
/usr/local/share/poudriere/common.sh:   # uname -m (advised by dillon)
/usr/local/share/poudriere/common.sh:   if { sysctl -n kern.supported_archs 2>/dev/null || uname -m; } | \
(I'll not list the uses of _arch . The other use is as a fail over when FreeBSD variants do not have kern.supported_archs .)
In other words, poudriere forms both parts of the likes of arm64.aarch64 and powerpc.powerpc64 . Is FreeBSD overall set up to allow use of just the 2nd part of the pair? Is poudriere ? Are all ports built other ways?
The UNAME_m use for cross builds might feed back into this use of the -m and -p pair. But I've not checked. You might want to get advice specific to poudriere. But the use for cross builds will override what you have done in uname(3).
Poudriere is doing exactly the right thing. It uses them for different things. uname -m is for kernel APIs, and uname -p is for userland ABIs. We use both in our build process in a nuanced way, and it works.
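The pairing described above can be sketched with the same parameter expansions that appear in the common.sh lines quoted earlier. split_arch is a hypothetical helper, not a poudriere function.

```sh
#!/bin/sh
# split_arch PAIR (hypothetical): split "machine.machine_arch" into its parts.
split_arch() {
    pair=$1
    machine=${pair%.*}        # text before the last dot, e.g. "powerpc"
    machine_arch=${pair#*.}   # text after the first dot, e.g. "powerpc64"
    printf 'UNAME_m=%s UNAME_p=%s\n' "$machine" "$machine_arch"
}

split_arch powerpc.powerpc64
split_arch arm64.aarch64
```

The first half names the kernel side and the second half the userland ABI, which is why the two values are kept separate rather than merged.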
Given how much this broke, I'm going to go out on a limb and say you'll never be able to change this. You must patch all the software to conform with FreeBSD's quirky way of splitting these. While uname(1) didn't change, a lot of things use uname(3) rather than sysctl for this information, and you'll be playing whack-a-mole fixing the things that are already working. So I'm back to dead set against this because it causes too many POLA violations.
What about isolating this to powerpc* by changing the MACHINE value in 'powerpc/include/param.h' to be [powerpc|powerpcspe|powerpc64|powerpc64le], according to the target arch?
Given what I see in https://en.wikipedia.org/wiki/Uname, I'd consider having MACHINE/MACHINE_ARCH strings being our target arch, like "powerpc64le/powerpc64le". I can be wrong but I don't see any advantage in having "MACHINE=powerpc" if it isn't the powerpc32 bit platform.
Examples of Coreutils-based OS are using "$TARGET_ARCH/Unknown", or "$TARGET_ARCH/$TARGET_ARCH":
- armv7l/Unknown
- x86_64/Unknown
- sparc64/sparc64
- ppc64/ppc64
- i686/i686
MacOS uses "$TARGET_ARCH/$TARGET" instead (so it would be something like "powerpc64le/powerpc"):
- x86_64/i386
- arm64/arm
- i386/i386
I think the MacOS way would make more sense if it were in the inverse order. But to avoid reinventing the wheel, I'd vote for the GNU Coreutils way of using the same value for both, like we already do today, but with $TARGET_ARCH instead of $TARGET (so powerpc64le/powerpc64le). These are Tier-2 archs and we never had regular ports packages; the servers are on the way, so this is the best time to break things if we have to.
Edit: Just to be clearer, I'm not proposing to change the values of the TARGET and TARGET_ARCH variables; they will remain the same. I'm proposing to change the string that uname(1) and uname(3) report for MACHINE and -m to something like powerpc64le instead of just powerpc, which makes packages' build scripts misdetect the system as 32-bit powerpc.
You are wrong. Very wrong. Wikipedia is a bad source.
MACHINE *MUST*BE* powerpc. It is deeply entwined into our build system that MACHINE is identical to the location where the kernel sources are kept. The advantage is that it breaks the build completely if it is anything else.
MACHINE_ARCH can be whatever. It alone defines the ABI that's used. Changing these definitions is a non-starter. It cannot be done.
Coreutils isn't FreeBSD. This would violate POLA and break too many things.
Again, MacOS made different choices.
TARGET == MACHINE 
TARGET_ARCH == MACHINE_ARCH
That's not possible to change without huge effort. It is utterly fundamental in how FreeBSD builds. What you are asking for is simply not possible. The connection between these two is too deeply ingrained to change at this time.
MACHINE is always the kernel architecture, where the sources are stored.
MACHINE_ARCH is always the binaries that are produced.
Anything else will be man years of effort to fix in base FreeBSD.
I'm sorry I'm so earnest, but what you are asking for is too fundamental a change....
It's not a case of value judgment. The page is a collection of uname outputs on different platforms, and that's what the build scripts shipped in packages have historically relied on. The scripts are not aware of FreeBSD's powerpc64* way.
Right. arm64 and amd64 don't have this problem because FreeBSD's choice for 64-bit there was to split it into another MACHINE string, making "uname -m" return the 64-bit name, while powerpc64 remained powerpc for both 32 and 64 bits.
powerpc64* is Tier 2 and I don't see this as violating POLA gratuitously; it's for the sake of increasing compatibility and reducing the ports maintenance burden. In the powerpc context, man-years of effort are already being spent fixing packages that detect the wrong architecture because they aren't aware of the FreeBSD way for uname on powerpc64*.
Would a "hack" activated by poudriere/ports to change the behavior of uname(1) and uname(3) to report HW_MACHINE_ARCH instead of HW_MACHINE, be acceptable?
It breaks native builds of the base tree. And that's hard to fix without super-gross hacks. I can't think of a good way that doesn't introduce a lot of ifdef powerpc hacks to the build. If it were only powerpc vs powerpc64, then we could maybe do it with a symlink and by fixing places in the tree that know that MACHINE==powerpc means something (though they should use MACHINE_CPUARCH). But it's not just powerpc64 vs powerpc, there's also powerpc64le. So I'm skeptical here.
I doubt it would work. The hack is easy to write: just set UNAME_m=$(uname -p) when building packages (but not for buildworld, etc). However, even this wouldn't do what you want... At the very least, kernel module building would break, since MACHINE wouldn't be powerpc in this case. It may also break some utilities that grovel in the kernel (though they may not, since /usr/include/machine would be built correctly... though anything that needs to know where kernel sources are might not work).
Replying to my own question, there's a hack implemented already. Ports scripts can set the UNAME_m environment variable to override the sysctl hw.machine value:
root@:~ # uname -m
powerpc
root@:~ # export UNAME_m=powerpc64
root@:~ # uname -m
powerpc64
Mid air collision! I didn't see your response about UNAME_m before I figured it out, sorry.
I think this is sufficient to make most packages build correctly. The drawback is that users building packages directly from sources will have to apply the hack manually if needed.
FYI:
# sysctl hw.machine
hw.machine: arm64
# env UNAME_m=$(uname -p) uname -m
aarch64
# env UNAME_m=$(uname -p) sysctl hw.machine
hw.machine: arm64
(I do not have powerpc* access any more. That is the only reason why I used arm64.aarch64 above.)
Yea, you'll likely find that some ports need it and some don't. You may need a way to manage that. Ideally they all get fixed. We went through that with mips and it wasn't terrible. It's likely to be a lot messier than you are representing when you look at all the details. mips sure was, and there were no easy solutions there either.
Also, we may be able to place this into the build environment of individual ports or port families that are known to be troublesome. This would at least document the ones that need help, though maybe at the cost of extra work per-port that you're trying to avoid (but maybe less extra work than fixing disparate build systems).
Putting it in at the /usr/ports/Mk level may be possible, but also a fruitful area of experimentation. There may be issues with acceptance here though depending on the hacks.
Having Poudriere have a regular expression list would allow you to set it for all, but not set it for the */*-kmod ports too.
Perhaps a USING+=linux-uname or something might make things more systematic and easier to do.
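One way to picture the per-port selection suggested above is a shell sketch like the following. Everything here is hypothetical: wants_uname_override and build_env_for are invented names, the */*-kmod pattern is the only rule taken from the discussion, and nothing in this thread shows such a mechanism existing in poudriere or the ports framework.

```sh
#!/bin/sh
# wants_uname_override ORIGIN (hypothetical): succeed unless the port is a
# kernel module port, which needs MACHINE to stay the kernel name.
wants_uname_override() {
    case "$1" in
        */*-kmod) return 1 ;;
        *)        return 0 ;;
    esac
}

# build_env_for ORIGIN ARCH (hypothetical): print the extra environment
# assignment for a port build, or nothing for excluded ports.
build_env_for() {
    if wants_uname_override "$1"; then
        printf 'UNAME_m=%s\n' "$2"
    fi
}

build_env_for www/firefox powerpc64le
build_env_for graphics/drm-kmod powerpc64le
```

In a real setup the second argument would presumably come from uname -p; it is passed explicitly here so the sketch behaves the same on any host.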
This divergence is quite old. We've been forever fixing problems with amd64 vs x86_64 in builds as well, and that dates back 15 years...
For ports of things that upstream basically limits to linux targeting, there will likely be examples something like:
powerpc64le vs. ppc64le
powerpc64 vs. ppc64
powerpc vs. ppc
for uname -m result testing issues. Using UNAME_m=$(uname -p) by itself would not directly address such distinctions. I've no clue how common this sort of lack of powerpc* name handling might be over the upstream ports.
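The remaining gap can be sketched as a translation table in shell. linux_machine_name is a hypothetical helper; the pairs are the ones mentioned in this thread, and names not listed are passed through unchanged.

```sh
#!/bin/sh
# linux_machine_name ARCH (hypothetical): map a FreeBSD architecture name to
# the spelling that Linux-oriented build scripts tend to test for.
linux_machine_name() {
    case "$1" in
        powerpc64le) echo ppc64le ;;
        powerpc64)   echo ppc64 ;;
        powerpc)     echo ppc ;;
        amd64)       echo x86_64 ;;
        arm64)       echo aarch64 ;;
        *)           echo "$1" ;;   # unknown names pass through as-is
    esac
}

for arch in powerpc64le powerpc64 powerpc amd64 riscv64; do
    printf '%s -> %s\n' "$arch" "$(linux_machine_name "$arch")"
done
```

A table like this would have to sit on top of any UNAME_m override for ports whose upstream only recognises the ppc* spellings.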
I think naming like powerpc64le is ok, ports often check also for that. The problem is misleading naming "powerpc" for 64-bits, especially LE.
I'll try to come up with a patch specifically for ports that will work around this issue there.