
python.mk: don't write bytecode whilst building under PEP-517
Abandoned, Public

Authored by vishwin on Feb 7 2023, 10:58 PM.
Tags: None

Details

Reviewers: tcberner, fluffy, arrowd, antoine
Group Reviewers: Python, portmgr

Summary

Python writes bytecode on import by default, which succeeds whenever the executing user can write to the directories holding the imported modules (typically, when it owns those files and directories). This is problematic for package builds because the filesystem gets polluted. The problem shows up when packages are built as root, but not when they are built as an unprivileged user.

This passes PYTHONDONTWRITEBYTECODE via ${MAKE_ENV} to disable the default behaviour when building under PEP-517. Disabling this behaviour globally, as far as packaging is concerned, warrants further consideration in the future.

Reported by: sunpoet, pi
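A minimal way to observe the behaviour described above, assuming a CPython 3 interpreter and a writable temporary directory: import a throwaway module in a child interpreter, once with default settings and once with PYTHONDONTWRITEBYTECODE set, and check whether a __pycache__ directory appears next to it. The module name demo_mod and the helper import_leaves_pycache are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_leaves_pycache(dont_write: bool) -> bool:
    """Import a throwaway module in a child interpreter and report
    whether a __pycache__ directory was created next to it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_mod.py").write_text("VALUE = 1\n")
        env = dict(os.environ)
        # Clear settings that would skew the comparison.
        env.pop("PYTHONDONTWRITEBYTECODE", None)
        env.pop("PYTHONPYCACHEPREFIX", None)
        # Make the temporary directory importable in the child.
        env["PYTHONPATH"] = tmp
        if dont_write:
            # Same effect as running the child interpreter with -B.
            env["PYTHONDONTWRITEBYTECODE"] = "1"
        subprocess.run([sys.executable, "-c", "import demo_mod"], env=env, check=True)
        return Path(tmp, "__pycache__").is_dir()


if __name__ == "__main__":
    print("default:", import_leaves_pycache(dont_write=False))
    print("PYTHONDONTWRITEBYTECODE=1:", import_leaves_pycache(dont_write=True))
```

With default settings the child can write to the temporary directory it owns, so it leaves bytecode behind; with the variable set it does not.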

Test Plan

Build any port that uses PEP-517.

Diff Detail

Repository: rP FreeBSD ports repository
Lint: No Lint Coverage
Unit: No Test Coverage
Build Status: Buildable 49617; Build 46507: arc lint + arc unit

Event Timeline

I have no idea what actually writes bytecode, or when it gets written, but it feels like this should be put in a place where it is picked up by all ports that will use some python bits, like directly in bsd.port.mk so all ports get it.

There will still be violations at runtime, this doesn't fix anything.

root@131amd64-default:~ # find /usr/local/lib/python3.9/site-packages/flit_core | grep pycache
root@131amd64-default:~ # python3.9
Python 3.9.16 (main, Dec 19 2022, 17:55:24)
[Clang 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a on freebsd13
Type "help", "copyright", "credits" or "license" for more information.

>>> import flit_core

root@131amd64-default:~ # find /usr/local/lib/python3.9/site-packages/flit_core | grep pycache
/usr/local/lib/python3.9/site-packages/flit_core/__pycache__
/usr/local/lib/python3.9/site-packages/flit_core/__pycache__/__init__.cpython-39.pyc

antoine requested changes to this revision. Feb 8 2023, 8:41 AM

Unless you have something working that compiles bytecode on pep-517 package installation and nukes stale bytecode on pep-517 package de-installation, the workaround committed by sunpoet in de6965254c3a007efcf697c3d455b54d2aeb2383 should be restored.

This revision now requires changes to proceed. Feb 8 2023, 8:41 AM
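What antoine asks for (compile bytecode on installation, remove stale bytecode on deinstallation) can be sketched with the standard library alone. This is illustrative only: compile_tree and purge_bytecode are hypothetical helper names, not part of python.mk or pkg, and wiring them into install/deinstall triggers is not shown.

```python
import compileall
import shutil
from pathlib import Path


def compile_tree(pkg_dir: Path) -> bool:
    """Install-time step: byte-compile every .py under pkg_dir.
    Returns True only if every file compiled successfully."""
    # quiet=1 prints errors only; compile_dir returns a truthy value on success.
    return bool(compileall.compile_dir(str(pkg_dir), quiet=1))


def purge_bytecode(pkg_dir: Path) -> int:
    """Deinstall-time step: remove every __pycache__ directory under
    pkg_dir (meant to be pointed at a single package's directory, not
    all of site-packages). Returns the number of directories removed."""
    caches = [p for p in pkg_dir.rglob("__pycache__") if p.is_dir()]
    for cache in caches:
        shutil.rmtree(cache)
    return len(caches)
```

Removing whole __pycache__ directories on deinstallation also catches bytecode written later at runtime (for example, by root importing a module), which a static plist cannot track.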

From what I understand, you removed sunpoet's fix because of some hypothetical future problem that may never happen.

We already have a very easy way to fix the problem you mentioned: if a patch-level Python upgrade ever removes an opcode and that breaks things at runtime, we simply bump PORTREVISION for all packages that install .py modules.

(If the problem only happens during minor python upgrades, well, we don't need to do anything, packages will be rebuilt and reinstalled because of the lang/pythonxy dependency change.)
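For context on the opcode concern: every .pyc file starts with a magic number that CPython bumps when its bytecode format changes, and the import system discards cached bytecode whose magic number does not match the running interpreter (recompiling from source, and rewriting the cache if it can). The helper below is a hypothetical sketch that checks only that header; it does not check whether a .pyc is stale relative to its source.

```python
import importlib.util
from pathlib import Path


def pyc_matches_interpreter(pyc_path: Path) -> bool:
    """True if the .pyc header carries the running interpreter's magic
    number. A mismatch means the import system will ignore this file.
    Source freshness (mtime or hash) is not checked here."""
    magic = importlib.util.MAGIC_NUMBER
    with open(pyc_path, "rb") as fh:
        return fh.read(len(magic)) == magic
```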

In D38429#874508, @mat wrote:

> From what I understand, you removed sunpoet's fix because of some hypothetical future problem that may never happen.

This problem exists right now, because bytecode is not to be packaged, period. We (and many other operating system-level packagers) have been doing it wrong for years, and only as a crutch due to lack of a trigger mechanism that we now have. D34739 uses trigger support; having bytecode packaged hampers development of that, because proper operation cannot be verified.

In D38429#874500, @mat wrote:

> I have no idea what actually writes bytecode, or when it gets written, but it feels like this should be put in a place where it is picked up by all ports that will use some python bits, like directly in bsd.port.mk so all ports get it.

That is part of the "additional consideration", where to put PYTHONDONTWRITEBYTECODE.

> There will still be violations at runtime, this doesn't fix anything.
>
> root@131amd64-default:~ # find /usr/local/lib/python3.9/site-packages/flit_core | grep pycache
> root@131amd64-default:~ # python3.9
> Python 3.9.16 (main, Dec 19 2022, 17:55:24)
> [Clang 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a on freebsd13
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> import flit_core
>
> root@131amd64-default:~ # find /usr/local/lib/python3.9/site-packages/flit_core | grep pycache
> /usr/local/lib/python3.9/site-packages/flit_core/__pycache__
> /usr/local/lib/python3.9/site-packages/flit_core/__pycache__/__init__.cpython-39.pyc

Because you are running as root, which almost always owns the installed files.
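The permission point can be expressed as a rough heuristic (the helper name is hypothetical, and it assumes PYTHONPYCACHEPREFIX is unset): importing a source module can leave a .pyc behind only if bytecode writing is enabled and the process can write the module's __pycache__ directory, or create it in the module's own directory. Root passes the second check almost everywhere under /usr/local, which is why the flit_core session above produced a __pycache__ directory.

```python
import importlib.util
import os
import sys
from pathlib import Path


def import_could_write_bytecode(source: str) -> bool:
    """Heuristic: could importing `source` as the current user write a .pyc?

    Assumes PYTHONPYCACHEPREFIX is unset and ignores whether a fresh .pyc
    already exists (in which case nothing new would be written).
    """
    if sys.dont_write_bytecode:
        return False
    cache_dir = Path(importlib.util.cache_from_source(source)).parent
    # An existing __pycache__ must be writable; otherwise the import system
    # needs to create it inside the module's own directory.
    target = cache_dir if cache_dir.is_dir() else cache_dir.parent
    return os.access(target, os.W_OK)
```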

Started an upstream discussion https://github.com/python/cpython/issues/101702 since I'm not sure of any good way to pass the variable globally in the environment for every stage (important for the interactive stage)

> Because you are running as root, which almost always owns the installed files.

So what, we have to forbid users running python bits to do anything as root?

In D38429#875110, @mat wrote:

> So what, we have to forbid users running python bits to do anything as root?

More that the default behaviour of writing bytecode into the same directories as the imported modules was really only designed with non-root execution in mind.

> In D38429#874508, @mat wrote:
>
> > From what I understand, you removed sunpoet's fix because of some hypothetical future problem that may never happen.
>
> This problem exists right now, because bytecode is not to be packaged, period. We (and many other operating system-level packagers) have been doing it wrong for years, and only as a crutch due to lack of a trigger mechanism that we now have. D34739 uses trigger support; having bytecode packaged hampers development of that, because proper operation cannot be verified.

Well, the branch you work on should be self-contained. You cannot commit the first part (as in removal of packaging of bytecode) without also committing something that will generate the bytecode, or prevent any bytecode from being generated.

sunpoet was right in restoring the current way we do things. The fact that it is wrong WRT how python wants people to do things is irrelevant, it is what works for us right now. Your code review should include the removal of the bits that generate the bytecode, and you cannot leave the ports tree in the state it currently is.

How do the package systems in other operating systems deal with Python bytecode?

> How do the package systems in other operating systems deal with Python bytecode?

Debian installs bytecode at the moment. I don't know if it's a good idea to remove it: users would lose that advantage, and the performance of some programs might degrade.

Most others package bytecode, but again only as a crutch, due to the lack of a trigger mechanism like the one we have. Most others also don't have the strict/static plist that we do; pacman, for instance, just takes whatever is in their equivalent of ${STAGEDIR} and calls it a day. Therein lies our problem: there is no reliable way to track bytecode for the purposes of a static plist with the latest Python packaging tooling, not even with wildcards. (We had a fighting chance with the old distutils method, but that's another story.) This is compounded by the lack of stability guarantees for the bytecode itself, which further cements that these files are little more than instruction caches tailored to each individual installation.
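To illustrate why a static plist struggles here: for a single source file CPython can produce up to three cache files (unoptimized, -O and -OO), and every name embeds the interpreter's cache tag. The sketch below (bytecode_names is a hypothetical helper) uses importlib.util.cache_from_source to list them; under CPython 3.9 with no pycache prefix, the names end in __init__.cpython-39.pyc, __init__.cpython-39.opt-1.pyc and __init__.cpython-39.opt-2.pyc.

```python
import importlib.util
import sys
from typing import List


def bytecode_names(source: str) -> List[str]:
    """Cache file names CPython may write for one source file:
    unoptimized, -O (opt-1) and -OO (opt-2)."""
    return [
        importlib.util.cache_from_source(source, optimization=level)
        for level in ("", 1, 2)
    ]


if __name__ == "__main__":
    print("cache tag:", sys.implementation.cache_tag)
    for name in bytecode_names("/usr/local/lib/python3.9/site-packages/flit_core/__init__.py"):
        print(name)
```

Which of these files exist on a given system depends on how the interpreter was invoked, which is exactly what a plist written at build time cannot know.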

Don't get me wrong, I'm not against having bytecode around for CPython execution, because performance benefits do exist amongst other things. But they can't needlessly pollute the filesystem in an inconsistent and non-repeatable manner.

> Don't get me wrong, I'm not against having bytecode around for CPython execution, because performance benefits do exist amongst other things. But they can't needlessly pollute the filesystem in an inconsistent and non-repeatable manner.

What about the way freebsd packages have historically installed python bytecode was inconsistent and/or non-repeatable? I apologize if I missed the reference somewhere in the discussion.

To add another point of view, Fedora has a take on this as well:

https://fedoraproject.org/wiki/Changes/Python_Optional_Bytecode_Cache

Therein is a proposal to ship optional bytecode as subpackages (even separate packages each for non-optimized bytecode and level 1 and level 2 optimized bytecode). FreeBSD tends not to do that kind of thing, relying rather on port options (cf. DOCS). Linux has more of a tendency to create separate packages to allow users to not install 'optional' bits (e.g., -devel for .a and include files and section 3 man pages). For FreeBSD, a "global" PYBYTECODE option similar to DOCS feels reasonable.

They (Fedora) don't seem to discuss (and maybe are not overly concerned about) strictly mimicking what some native Python packaging tools do, namely generating bytecode at install time. They even discuss shipping only .pyc instead of .py as one possibility, but the drawbacks discussed seem to indicate that is unlikely except for special-case packages such as "data-only" or machine-generated .py files.

Or just do what FreeBSD python ports have historically done - ship with bytecode in the plist. I am not familiar with reports of inconsistencies due to that behavior, although I understand how it could technically be possible. That possibility should occur in predictable situations I think (e.g., in some cases of base python updates), and bumping PORTREVISION as necessary should be able to cover that.

> [...] there is no reliable way to track bytecode for the purposes of a static plist with the latest Python packaging tooling, not even with wildcards.

At the moment, pip records .pyc files in the RECORD file (even if they would be optional), so why not replace py-installer with py-pip? I use it as PEP517_INSTALL_CMD in devel/py-virtualenv, and it works well.
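If a RECORD file is used as the source of truth, its .pyc entries are easy to separate out: RECORD is a CSV file whose first column is the installed path, followed by the hash and size. A minimal sketch with the standard csv module; split_record is a hypothetical name, and where python.mk would find the RECORD file is left open.

```python
import csv
import io
from typing import List, Tuple


def split_record(record_text: str) -> Tuple[List[str], List[str]]:
    """Split the paths listed in a RECORD file into
    (bytecode files, everything else)."""
    bytecode: List[str] = []
    other: List[str] = []
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue  # tolerate blank lines
        path = row[0]
        (bytecode if path.endswith(".pyc") else other).append(path)
    return bytecode, other
```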

pip is unacceptably heavy dependency-wise and cannot be bootstrapped the way installer, build et al. can (circular dependencies would result). It is also not intended for "system" use. installer currently does not track bytecode in RECORD or anywhere else.

Well, I think it's easy to record pyc files generated by py-installer with a little modification to python.mk...
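One possible shape for such a modification, as a hedged sketch rather than a worked-out patch: byte-compile the staged tree after installer runs, then list every .pyc under the staged prefix, relative to it, so the entries can be appended to the plist. The function name and its arguments are hypothetical; passing the final installation prefix as compileall's ddir keeps the staging directory out of the file names recorded in the compiled code. How the entries would map onto python.mk's plist conventions is left open.

```python
import compileall
from pathlib import Path
from typing import List


def compile_and_list_bytecode(stage_prefix: str, final_prefix: str) -> List[str]:
    """Byte-compile every .py under stage_prefix (the staged install,
    e.g. ${STAGEDIR}${PREFIX}) and return every .pyc found there as a
    path relative to stage_prefix, sorted.

    final_prefix is where the files will live after installation; it is
    passed as ddir so tracebacks show installed paths, not staging paths.
    """
    root = Path(stage_prefix)
    if not compileall.compile_dir(str(root), ddir=final_prefix, quiet=1):
        raise RuntimeError(f"byte-compilation failed under {root}")
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.pyc"))
```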