
lang/python: add bytecode trigger
Needs Review (Public)

Authored by tcberner on Apr 1 2022, 8:22 PM.
Tags
None

Details

Summary

Prototype to automatically recreate the __pycache__ when site-packages is touched.

There are some hard-coded assumptions...

It now seems to be able to clean up py-sip4 fine:

[00:02:29] [01] [00:00:05] Finished devel/py-sip4@py38 | py38-sip4-4.19.25: Success

Install:

[140amd64-main-job-01] Installing py38-sip4-4.19.25...
>=> Recreating python bytecode ...
   >=> wrote bytecode for /usr/local/lib/python3.8/site-packages/PyQt5
>=> Recreating top-level python bytecode ...
   >=> wrote bytecode for /usr/local/lib/python3.8/site-packages
>=> Cleaning stale python bytecode ...

Deinstall:

[140amd64-main-job-01] [1/1] Deleting files for py38-sip4-4.19.25: .......... done
>=> Recreating python bytecode ...
>=> Recreating top-level python bytecode ...
>=> Cleaning stale python bytecode ...
  >=> removed stale bytecode sipdistutils.cpython-38.pyc
  >=> removed stale bytecode sipconfig.cpython-38.pyc
  >=> removed stale bytecode sipconfig.cpython-38.opt-1.pyc
  >=> removed stale bytecode sipdistutils.cpython-38.opt-1.pyc
  >=> removed empty cache directory /usr/local/lib/python3.8/site-packages/__pycache__
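For readers unfamiliar with what the "Cleaning stale python bytecode" step has to do: a .pyc is stale when the module it was compiled from no longer exists. The actual trigger is written in Lua; as a rough Python sketch of the same idea, assuming the PEP 3147 __pycache__ layout, with clean_stale_bytecode as a hypothetical helper name:

```python
import importlib.util
import os


def clean_stale_bytecode(root):
    """Remove .pyc files under root whose source .py no longer exists.

    Hypothetical helper; assumes the PEP 3147 __pycache__ layout.
    Returns the list of removed files and directories.
    """
    removed = []
    # Walk bottom-up so a __pycache__ emptied here can be removed right away.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        if os.path.basename(dirpath) != "__pycache__":
            continue
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            pyc = os.path.join(dirpath, name)
            try:
                # Maps __pycache__/mod.cpython-38.opt-1.pyc back to mod.py.
                source = importlib.util.source_from_cache(pyc)
            except ValueError:
                continue  # not a PEP 3147 cache file name; leave it alone
            if not os.path.exists(source):
                os.remove(pyc)
                removed.append(pyc)
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

importlib.util.source_from_cache only parses the file name, so caches left behind by other interpreter versions are matched too; whether that is wanted in a per-version site-packages is a design choice.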

Diff Detail

Repository
rP FreeBSD ports repository

Event Timeline

For those who've lived under a rock, I'm not under mentorship.

This error is transient, but it's probably something simple and stupid, given the nil indexing. The error doesn't really happen in a non-package-building environment, for instance.

But a full revert breaks things back to the filesystem violations when building as root.

> For those who've lived under a rock, I'm not under mentorship.

Sorry if I missed your release from mentorship; I just glanced at this part of the c17ddfb commit message, which reads:

Co-authored by: tcberner
Approved by: tcberner (mentor)
Differential Revision: https://reviews.freebsd.org/D34739

and I have no reason to question the (mentor) tag there. My apologies, but there is no reason to get personal. Call me out if I make the same mistake twice.

> This error is transient, but it's probably something simple and stupid, given the nil indexing. The error doesn't really happen in a non-package-building environment, for instance.

The error is not transient, but hit py-breathe every single time I have built it in poudriere, which is *the* prime platform to support these days.

> But a full revert breaks things back to the filesystem violations when building as root.

Yeah, I appreciate the effort, but I appreciate neither the long silence (a mail or a comment here saying "whoops, I am on it" or "I will get to it the day after tomorrow" within 9 days is not asking too much) nor a commit to high-profile ports without an exp-run.

now that we apparently have to appease non-fatal errors not part of the build, as if package builders are sentient beings

yeet bytecode from USE_PYTHON=distutils because to hell with them

this is the part that makes an exp-run actually worthwhile. all independent compileall invocations that write to ${PYTHONPREFIX_SITELIBDIR} are on watch.

> now that we apparently have to appease non-fatal errors not part of the build, as if package builders are sentient beings

It was non-fatal, but the rest of the Lua trigger didn't execute, which brought the runtime violations back.

> now that we apparently have to appease non-fatal errors not part of the build, as if package builders are sentient beings

No need to get touchy and distract from the fact that the commit introduced new bugs, and that the bug had not been acknowledged in 9 days.
Only now are you back at the drawing board.

mandree requested changes to this revision.Feb 28 2023, 7:36 PM

So, the trigger is redundantly duplicated across all Python versions. This is bad style and error-prone.

Please refactor this to a common script (either we keep it in Tools/ or else in a separate port), and just run it from the trigger (which has the port-specific %%PYTHON_*%% variables) with the necessary arguments.
That also makes it amenable to separate testing and maintenance independent of the Python package.

This revision now requires changes to proceed.Feb 28 2023, 7:36 PM
This revision now requires review to proceed.Feb 28 2023, 7:37 PM

> So, the trigger is redundantly duplicated across all Python versions. This is bad style and error-prone.
>
> Please refactor this to a common script (either we keep it in Tools/ or else in a separate port), and just run it from the trigger (which has the port-specific %%PYTHON_*%% variables) with the necessary arguments.
> That also makes it amenable to separate testing and maintenance independent of the Python package.

No.

The initial pass had this as a separate port that the Python ports themselves depended on. It was discarded for a couple of reasons. Bytecode is a CPython implementation detail and thus only applicable to it; no other Python implementation employs bytecode. Also, every CPython version differs somewhat: even for compileall, 3.9 gained a flag that allows for one invocation instead of three, which also shortens the trigger. Thus there is no technical justification for separating this from the CPython ports themselves, however redundant things may appear.

It is not in our best interest to let even a relatively minor implementation detail discourage anyone from exploring ports of other Python implementations here. For instance, PyPy has been a wishlist item for years.
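To illustrate the compileall difference mentioned above: starting with Python 3.9, compileall.compile_dir() accepts a sequence of optimization levels (and the command line accepts -o more than once), so one pass can write the plain, opt-1 and opt-2 caches, while older versions need one pass per level. A minimal sketch, where compile_tree and its defaults are hypothetical:

```python
import compileall
import sys


def compile_tree(path, levels=(0, 1, 2), quiet=1):
    """Byte-compile every module under path at the given optimization levels.

    Hypothetical helper: returns True only if every file compiled successfully.
    """
    if sys.version_info >= (3, 9):
        # 3.9+: a single call accepts a list of optimization levels.
        return bool(compileall.compile_dir(path, quiet=quiet,
                                           optimize=list(levels)))
    # Older interpreters: one pass per optimization level.
    ok = True
    for level in levels:
        ok = bool(compileall.compile_dir(path, quiet=quiet,
                                         optimize=level)) and ok
    return ok
```

The version branch is the only interpreter-specific part; the rest is identical across CPython versions.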

mandree requested changes to this revision.Feb 28 2023, 7:58 PM

You are distracting again, and your arguments are not relevant. Functions can have arguments, so you can cater for the few differences. A Python port that isn't CPython need not call the refactored function at all.

This revision now requires changes to proceed.Feb 28 2023, 7:58 PM

> You are distracting again, and your arguments are not relevant. Functions can have arguments, so you can cater for the few differences. A Python port that isn't CPython need not call the refactored function at all.

Then I challenge you to present your take on bytecode management outside packaging, employing Tools/ and whatnot like you mentioned, compliant with the living PEP-376 and PEP-627 standards and consistent in behaviour with pure Python packaging tooling like standalone pip.

This revision now requires review to proceed.Feb 28 2023, 8:20 PM
mandree requested changes to this revision.Feb 28 2023, 9:04 PM
mandree added a reviewer: mandree.

@vishwin remove me again from reviewers and that will have grave consequences.

mandree requested changes to this revision.Feb 28 2023, 9:05 PM

we cannot have duplicated code all around. This is unmaintainable, error-prone, and Charlie has not given any non-refuted reason that would justify code duplication.

This revision now requires changes to proceed.Feb 28 2023, 9:05 PM

Also, there is practically no error checking and handling. This seems inappropriate for code that is applied to high-profile ports with lots of users.

lang/python311/files/python3.11.ucl.in
46 (On Diff #118052)

this shows an utter misunderstanding of context. The trigger does not get to tell pkg how it has to behave, but the trigger lua script needs to live with whatever pkg does. Always, not just intermittently. This comment is out of line.

> @vishwin remove me again from reviewers and that will have grave consequences.

Please, there is no need to get distracted by emotional things. Let's focus on the technical discussion and find solutions. Most of the technical points from everyone are worth taking into consideration.

This is a non-trivial change and we're doing things other package managers haven't done (which doesn't mean it is bad; others may even learn from it), so wider testing and feedback, with adjustments at different stages (pre- and post-merge), is common and not a bad thing.

antoine requested changes to this revision.Mar 9 2023, 7:57 AM

These 2 changes (no-compile / bytecode trigger) must be separate commits and separate exp-runs.

tcberner reclaimed this revision.
tcberner added a reviewer: vishwin.
This revision now requires changes to proceed.Mar 13 2023, 6:09 PM

Revert the no-compile change (for a separate review).

  • Fix print statements again (no arrows requested)
  • Reinstate python-pycache port containing the trigger

@mandree, @vishwin could you please check the rework.

  • I merged it back into a single port that installs flavored triggers (see change to Mk/Uses/trigger.mk)
  • The trigger tries to be a bit more restrictive in recreating / cleaning up only bytecode files for paths reported by pkg (via arg).
Mk/Uses/trigger.mk
14

@bapt opinions?

lang/python-pycache/files/python-update-cache.ucl.in
46

^ Oh, I noticed that this is not adapted to only build in the directory anymore.

  • Trigger: ignore pkg's arg and walk site-lib-dir on our own
lang/python-pycache/files/python-update-cache.ucl.in
15

^ this is no longer needed, if pkg.arg is not used

130

129-130: no longer needed if pkg.arg is not used

lang/python-pycache/Makefile
11

you have to add "USE_PYTHON=flavors allflavors" to have a working ports tree

There are still Lua errors, for instance:

===>   py39-sphinx-5.3.0,1 depends on package: py39-sphinxcontrib-htmlhelp>=2.0.0 - not found
===>   Installing existing package /packages/All/py39-sphinxcontrib-htmlhelp-2.0.0.pkg
[131amd64-default-foo-job-18] Installing py39-sphinxcontrib-htmlhelp-2.0.0...
[131amd64-default-foo-job-18] Extracting py39-sphinxcontrib-htmlhelp-2.0.0: .......... done
Creating Python bytecode files...
Cleaning Python bytecode files...
pkg-static: Failed to execute lua trigger: attempt to index a nil value
pkg-static: lua script failed

To be honest, I am not convinced we should pursue that direction. Only Debian seems to be doing something like that, and they use a Python script, called by a trigger, to compile and clean up. None of the other OSes I have checked do anything in that direction.

I personally do not see what problem this approach solves compared to the complexity it adds; at the very least this needs to be clarified.

In my opinion, not knowing much about the Python ecosystem, it seems better to implement something in the ports framework to ensure we have all the cache generated at packaging time, rather than generated via a hook.
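For comparison, the packaging-time approach boils down to byte-compiling the staged files while recording their final install location in the bytecode, which is what compileall's -d (ddir) option is for. A minimal sketch, assuming a hypothetical staging directory and /usr/local/lib/python3.8/site-packages as the install path; compile_staged and recorded_filename are hypothetical helpers, and the 16-byte header skip assumes Python 3.7+:

```python
import compileall
import marshal


def compile_staged(stage_dir, final_dir, levels=(0, 1, 2)):
    """Byte-compile stage_dir as if it were installed at final_dir.

    ddir makes the paths recorded in the bytecode (used in tracebacks)
    point at final_dir instead of the staging location. Hypothetical helper.
    """
    ok = True
    for level in levels:
        ok = bool(compileall.compile_dir(stage_dir, ddir=final_dir, quiet=1,
                                         optimize=level)) and ok
    return ok


def recorded_filename(pyc_path):
    """Return the source path stored in a .pyc (skips the 16-byte header)."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    return marshal.loads(data[16:]).co_filename
```

The resulting files would then be listed in the package like any other staged file, which is what lets pkg checksum them.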

In D34739#895706, @bapt wrote:

> To be honest, I am not convinced we should pursue that direction. Only Debian seems to be doing something like that, and they use a Python script, called by a trigger, to compile and clean up. None of the other OSes I have checked do anything in that direction.
>
> I personally do not see what problem this approach solves compared to the complexity it adds; at the very least this needs to be clarified.
>
> In my opinion, not knowing much about the Python ecosystem, it seems better to implement something in the ports framework to ensure we have all the cache generated at packaging time, rather than generated via a hook.

Ok, so your proposal is to re-introduce https://cgit.freebsd.org/ports/commit/?id=de6965254c3a007efcf697c3d455b54d2aeb2383

In D34739#895706, @bapt wrote:

> To be honest, I am not convinced we should pursue that direction. Only Debian seems to be doing something like that, and they use a Python script, called by a trigger, to compile and clean up. None of the other OSes I have checked do anything in that direction.

Because nearly nobody else has a trigger functionality like we do.

> I personally do not see what problem this approach solves compared to the complexity it adds; at the very least this needs to be clarified.

CPython bytecode is not deterministic at all, even with every attempt to mitigate such. The whole point of bytecode is to serve as instruction caching, but there is absolutely no guarantee that any pre-packaged bytecode gets used as intended (just run compileall on stuff that comes with bytecode and see how much gets rebuilt). Furthermore, bytecode file sizes in Python 3.12 have only gotten bigger.

Python's own packaging ecosystem does not even include bytecode in binary distributions; they are always built on the target system and only if the user allows it (which is the default).

See https://wiki.freebsd.org/Python/CompiledPackages for technical details, it's messy no matter the approach.
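One way to see what "gets rebuilt" means in practice: since Python 3.7 (PEP 552), a .pyc begins with a 16-byte header holding the interpreter's magic number, a flags word, and either the source mtime and size or a source hash. For timestamp-based files, the import system (and compileall without -f) regenerates the cache when these stop matching. A minimal sketch for inspecting that header; read_pyc_header and timestamp_pyc_is_current are hypothetical helpers, and the staleness check only approximates the import system's logic:

```python
import importlib.util
import os
import struct


def read_pyc_header(pyc_path):
    """Parse the 16-byte PEP 552 header of a .pyc (Python 3.7+ layout)."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & 0b01:
        # Hash-based pyc: bit 1 says whether the source is checked at import.
        return {"magic": magic, "hash_based": True,
                "check_source": bool(flags & 0b10),
                "source_hash": header[8:16]}
    mtime, size = struct.unpack("<II", header[8:16])
    return {"magic": magic, "hash_based": False, "mtime": mtime, "size": size}


def timestamp_pyc_is_current(source_path, pyc_path):
    """False if the pyc is not a timestamp pyc valid for this interpreter."""
    hdr = read_pyc_header(pyc_path)
    if hdr["hash_based"] or hdr["magic"] != importlib.util.MAGIC_NUMBER:
        return False
    st = os.stat(source_path)
    # Both fields are stored modulo 2**32.
    return (hdr["mtime"] == (int(st.st_mtime) & 0xFFFFFFFF)
            and hdr["size"] == (st.st_size & 0xFFFFFFFF))
```

Touching a source file (changing only its mtime) is enough to make a timestamp-based cache count as stale.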

> In D34739#895706, @bapt wrote:
>
> > To be honest, I am not convinced we should pursue that direction. Only Debian seems to be doing something like that, and they use a Python script, called by a trigger, to compile and clean up. None of the other OSes I have checked do anything in that direction.
>
> Because nearly nobody else has a trigger functionality like we do.

Untrue. Fedora and Debian, and by extension many derived distros, have that.

> > I personally do not see what problem this approach solves compared to the complexity it adds; at the very least this needs to be clarified.
>
> CPython bytecode is not deterministic at all, even with every attempt to mitigate such. The whole point of bytecode is to serve as instruction caching, but there is absolutely no guarantee that any pre-packaged bytecode gets used as intended (just run compileall on stuff that comes with bytecode and see how much gets rebuilt). Furthermore, bytecode file sizes in Python 3.12 have only gotten bigger.
>
> Python's own packaging ecosystem does not even include bytecode in binary distributions; they are always built on the target system and only if the user allows it (which is the default).

The thing is:

  1. the current python.mk ports stuff does not package __pycache__ properly for all Python versions, but should
  2. you hack files after pkg knows them: either you get checksum mismatches, or "pkg which $PREFIX/some/file" does not know about it. Both are unacceptable.
  3. poudriere, if it needs to RUN a requisite Python package that creates __pycache__ entries newly, will make the *USER* look bad although it's not that *USER* package that caused extra __pycache__ dirs and files to be created, but the lack of them being installed in the first place.

> See https://wiki.freebsd.org/Python/CompiledPackages for technical details, it's messy no matter the approach.

Part of which may be the lack of a way to tell pkg the results of post-install scripts.

As a more general comment, I removed some immature code. I am not too troubled about *how* we solve it, but we need robust code that properly traps errors, properly reports them, is fully tested, and DOES NOT GET COMMITTED BEFORE it's properly tested and approved by all. Python isn't your average leaf port sandbox in the playground, but high-profile stuff. And we surely should not commit anything in the next few days (hours) before 2023Q2 has branched. If something is mature and does not break the semantics of other ports, we can still MFH to quarterly later, or just really clean up for 2023Q3 before July.

> 1. the current python.mk ports stuff does not package __pycache__ properly for all Python versions, but should
> 2. you hack files after pkg knows them: either you get checksum mismatches, or "pkg which $PREFIX/some/file" does not know about it. Both are unacceptable.

When bytecode does not pass the checks, it is recompiled on import. If a user with write privileges to the locations of imported modules (usually root) runs the code, the results overwrite any existing bytecode, causing checksum mismatches anyway.

> 3. poudriere, if it needs to RUN a requisite Python package that creates __pycache__ entries newly, will make the *USER* look bad although it's not that *USER* package that caused extra __pycache__ dirs and files to be created, but the lack of them being installed in the first place.

D39306 eliminates that. Otherwise, it only happens when building as root (which has not been poudriere's default setting for years) or as another user with write privileges to the locations of imported modules.

I also wonder why we don't install a 3.7+ compatible Python script and call that from the lua trigger. It would be far more useful (as in count of available developers) to write this in Python and not Lua, and we'd get proper debugging and logging libs for free. Lua is a bit minimalistic for the rather complex task at hand.
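A rough sketch of the kind of script described here, using only the standard library: it byte-compiles each module individually with py_compile, logs every failure with its file name and optimization level, and returns a non-zero status if anything failed, so a trigger calling it could detect and report errors. The interface (a directory argument plus repeatable -o levels) and the logger name are hypothetical, not an existing ports tool:

```python
import argparse
import logging
import os
import py_compile

log = logging.getLogger("update-pycache")  # hypothetical logger name


def compile_all(root, levels):
    """Compile every .py under root; return the number of failed compiles."""
    failures = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into existing cache directories.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            for level in levels:
                try:
                    py_compile.compile(path, optimize=level, doraise=True)
                except (py_compile.PyCompileError, OSError) as exc:
                    failures += 1
                    log.error("level %d: %s: %s", level, path, exc)
    return failures


def main(argv=None):
    parser = argparse.ArgumentParser(description="Recreate __pycache__ (sketch)")
    parser.add_argument("root", help="directory to walk, e.g. site-packages")
    parser.add_argument("-o", dest="levels", type=int, action="append",
                        choices=(0, 1, 2), help="optimization level (repeatable)")
    args = parser.parse_args(argv)
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    levels = args.levels or [0, 1, 2]
    if not os.path.isdir(args.root):
        log.error("not a directory: %s", args.root)
        return 2
    failures = compile_all(args.root, levels)
    log.info("compiled %s with %d failure(s)", args.root, failures)
    return 1 if failures else 0
```

A standalone script would end with the usual if __name__ == "__main__": sys.exit(main()) guard; the Lua side would then only need to run it and check the exit status.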

> In D34739#895706, @bapt wrote:
>
> > To be honest, I am not convinced we should pursue that direction. Only Debian seems to be doing something like that, and they use a Python script, called by a trigger, to compile and clean up. None of the other OSes I have checked do anything in that direction.
> >
> > I personally do not see what problem this approach solves compared to the complexity it adds; at the very least this needs to be clarified.
> >
> > In my opinion, not knowing much about the Python ecosystem, it seems better to implement something in the ports framework to ensure we have all the cache generated at packaging time, rather than generated via a hook.
>
> Ok, so your proposal is to re-introduce https://cgit.freebsd.org/ports/commit/?id=de6965254c3a007efcf697c3d455b54d2aeb2383

At a quick glance it looks simpler, but I may be wrong.

> In D34739#895706, @bapt wrote:
>
> > To be honest, I am not convinced we should pursue that direction. Only Debian seems to be doing something like that, and they use a Python script, called by a trigger, to compile and clean up. None of the other OSes I have checked do anything in that direction.
>
> Because nearly nobody else has a trigger functionality like we do.

That is wrong: I implemented triggers because we were the last not to have such functionality, and I looked at how it was implemented elsewhere.

> > I personally do not see what problem this approach solves compared to the complexity it adds; at the very least this needs to be clarified.
>
> CPython bytecode is not deterministic at all, even with every attempt to mitigate such. The whole point of bytecode is to serve as instruction caching, but there is absolutely no guarantee that any pre-packaged bytecode gets used as intended (just run compileall on stuff that comes with bytecode and see how much gets rebuilt). Furthermore, bytecode file sizes in Python 3.12 have only gotten bigger.
>
> Python's own packaging ecosystem does not even include bytecode in binary distributions; they are always built on the target system and only if the user allows it (which is the default).
>
> See https://wiki.freebsd.org/Python/CompiledPackages for technical details, it's messy no matter the approach.

As far as I know, from the time I participated in the reproducible-builds events, the issues have been fixed. At the very least, instead of just claiming it is not deterministic, can you state what makes it non-deterministic?

In D34739#895772, @bapt wrote:

> As far as I know, from the time I participated in the reproducible-builds events, the issues have been fixed. At the very least, instead of just claiming it is not deterministic, can you state what makes it non-deterministic?

It's still not, never has and probably never will. While PEP-552 adds an alternative to mtime comparisons (a computed hash), the use of a magic number that can vary between environments (in addition to interpreter versions) already makes it non-deterministic. Further, PEP-552 acknowledges other facets of non-determinism, particularly inconsistent marshalling of set objects (such an implementation is necessary by default for security reasons).

Increased file sizes in the upcoming CPython 3.12 are also concerning.
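A small experiment illustrating the mtime part of this, assuming Python 3.7+: compiling the same, unchanged source at two different mtimes gives different headers in the default timestamp mode, but identical headers in PEP 552 checked-hash mode. Only the 16-byte headers are compared, so this says nothing about the marshalled code itself (e.g. set ordering); header_after_touch and compare_modes are hypothetical helpers:

```python
import os
import py_compile
import tempfile

Mode = py_compile.PycInvalidationMode


def header_after_touch(source, mode, mtime, cfile):
    """Set source's mtime, compile it with the given mode, return the header."""
    os.utime(source, (mtime, mtime))
    py_compile.compile(source, cfile=cfile, doraise=True, invalidation_mode=mode)
    with open(cfile, "rb") as f:
        return f.read(16)


def compare_modes():
    """Map each invalidation mode to whether its header ignores the mtime."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "m.py")
        with open(src, "w") as f:
            f.write("VALUE = 42\n")
        out = os.path.join(d, "m.pyc")
        results = {}
        for mode in (Mode.TIMESTAMP, Mode.CHECKED_HASH):
            # Same source text, two different mtimes.
            first = header_after_touch(src, mode, 1_600_000_000, out)
            second = header_after_touch(src, mode, 1_700_000_000, out)
            results[mode.name] = first == second
        return results


print(compare_modes())  # prints {'TIMESTAMP': False, 'CHECKED_HASH': True}
```

As far as I know, py_compile (and therefore compileall) also defaults to checked-hash mode when SOURCE_DATE_EPOCH is set (Python 3.8+), which is the usual lever for reproducible builds.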

> In D34739#895772, @bapt wrote:
>
> > As far as I know, from the time I participated in the reproducible-builds events, the issues have been fixed. At the very least, instead of just claiming it is not deterministic, can you state what makes it non-deterministic?
>
> It's still not, never has and probably never will. While PEP-552 adds an alternative to mtime comparisons (a computed hash), the use of a magic number that can vary between environments (in addition to interpreter versions) already makes it non-deterministic. Further, PEP-552 acknowledges other facets of non-determinism, particularly inconsistent marshalling of set objects (such an implementation is necessary by default for security reasons).
>
> Increased file sizes in the upcoming CPython 3.12 are also concerning.

Given you have been proven wrong several times in the past, I need to ask these:

  1. Please show figures
  2. Please show how these contribute to a relevant increase of .txz and/or .tbz2 archives, and how these are out of balance.

Using testport with the following EXTRA_PATCHES_OFF (since compileall runs unconditionally during stage):

--- Makefile.pre.in.orig        2023-02-07 12:05:45 UTC
+++ Makefile.pre.in
@@ -1601,33 +1601,6 @@ libinstall:      build_all $(srcdir)/Modules/xxmodule.c
                $(INSTALL_DATA) $(srcdir)/Modules/xxmodule.c \
                        $(DESTDIR)$(LIBDEST)/distutils/tests ; \
        fi
-       -PYTHONPATH=$(DESTDIR)$(LIBDEST)  $(RUNSHARED) \
-               $(PYTHON_FOR_BUILD) -Wi $(DESTDIR)$(LIBDEST)/compileall.py \
-               -j0 -d $(LIBDEST) -f \
-               -x 'bad_coding|badsyntax|site-packages|lib2to3/tests/data' \
-               $(DESTDIR)$(LIBDEST)
-       -PYTHONPATH=$(DESTDIR)$(LIBDEST) $(RUNSHARED) \
-               $(PYTHON_FOR_BUILD) -Wi -O $(DESTDIR)$(LIBDEST)/compileall.py \
-               -j0 -d $(LIBDEST) -f \
-               -x 'bad_coding|badsyntax|site-packages|lib2to3/tests/data' \
-               $(DESTDIR)$(LIBDEST)
-       -PYTHONPATH=$(DESTDIR)$(LIBDEST) $(RUNSHARED) \
-               $(PYTHON_FOR_BUILD) -Wi -OO $(DESTDIR)$(LIBDEST)/compileall.py \
-               -j0 -d $(LIBDEST) -f \
-               -x 'bad_coding|badsyntax|site-packages|lib2to3/tests/data' \
-               $(DESTDIR)$(LIBDEST)
-       -PYTHONPATH=$(DESTDIR)$(LIBDEST) $(RUNSHARED) \
-               $(PYTHON_FOR_BUILD) -Wi $(DESTDIR)$(LIBDEST)/compileall.py \
-               -j0 -d $(LIBDEST)/site-packages -f \
-               -x badsyntax $(DESTDIR)$(LIBDEST)/site-packages
-       -PYTHONPATH=$(DESTDIR)$(LIBDEST) $(RUNSHARED) \
-               $(PYTHON_FOR_BUILD) -Wi -O $(DESTDIR)$(LIBDEST)/compileall.py \
-               -j0 -d $(LIBDEST)/site-packages -f \
-               -x badsyntax $(DESTDIR)$(LIBDEST)/site-packages
-       -PYTHONPATH=$(DESTDIR)$(LIBDEST) $(RUNSHARED) \
-               $(PYTHON_FOR_BUILD) -Wi -OO $(DESTDIR)$(LIBDEST)/compileall.py \
-               -j0 -d $(LIBDEST)/site-packages -f \
-               -x badsyntax $(DESTDIR)$(LIBDEST)/site-packages
        -PYTHONPATH=$(DESTDIR)$(LIBDEST) $(RUNSHARED) \
                $(PYTHON_FOR_BUILD) -m lib2to3.pgen2.driver $(DESTDIR)$(LIBDEST)/lib2to3/Grammar.txt
        -PYTHONPATH=$(DESTDIR)$(LIBDEST) $(RUNSHARED) \

| port | size with pyc | size without pyc |
|---|---|---|
| python3.7 | 111 MiB flat, 18 MiB tzst | 49 MiB flat, 11 MiB tzst |
| python3.8 | 116 MiB flat, 19 MiB tzst | 52 MiB flat, 11 MiB tzst |
| python3.9 | 119 MiB flat, 19 MiB tzst | 51 MiB flat, 11 MiB tzst |
| python3.10 | 121 MiB flat, 19 MiB tzst | 52 MiB flat, 12 MiB tzst |
| python3.11 | 194 MiB flat, 24 MiB tzst | 58 MiB flat, 12 MiB tzst |
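To reproduce the flat-size columns on an installed tree, a rough sketch that sums apparent file sizes under a directory, splitting .pyc files from everything else (it does not measure compressed package sizes). pyc_size_split and mib are hypothetical helpers, and which directory to point it at is up to the caller:

```python
import os


def pyc_size_split(root):
    """Return (pyc_bytes, other_bytes) for regular files under root."""
    pyc = other = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            size = os.path.getsize(path)
            if name.endswith(".pyc"):
                pyc += size
            else:
                other += size
    return pyc, other


def mib(n):
    """Convert bytes to MiB."""
    return n / (1024 * 1024)
```

Read the same way, the table puts python3.11's bytecode at roughly 194 - 58 = 136 MiB flat, against 121 - 52 = 69 MiB for python3.10.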

Python 3.12 has not been ported yet. I apparently misread what I originally found about bytecode sizes there, and I can no longer find it. However, Python 3.11's ballooned bytecode sizes are a known issue upstream, and I don't expect the mentioned "low-hanging fruit" to reduce Python 3.12's bytecode size much.

Existing upstream issue, still open, about (lack of) reproducibility. It's so bad that CPython's own tests aren't even deterministic.

> In D34739#895772, @bapt wrote:
>
> > As far as I know, from the time I participated in the reproducible-builds events, the issues have been fixed. At the very least, instead of just claiming it is not deterministic, can you state what makes it non-deterministic?
>
> It's still not, never has and probably never will. While PEP-552 adds an alternative to mtime comparisons (a computed hash), the use of a magic number that can vary between environments (in addition to interpreter versions) already makes it non-deterministic. Further, PEP-552 acknowledges other facets of non-determinism, particularly inconsistent marshalling of set objects (such an implementation is necessary by default for security reasons).
>
> Increased file sizes in the upcoming CPython 3.12 are also concerning.

The mtime as a source of non-determinism is not a valid reason for a trigger, as the mtime is set in stone. As for the other parts (marshalling), I will need to read PEP-552 :(

Anyway, if we want to generate those pyc files on the target via a trigger, have you checked what Debian is doing? IMHO their approach looks sane (at least at a quick read), and if we are not planning on doing the same kind of thing, then when pushing our own trigger we should state explicitly why we took a different path.

(They have two Python scripts which are called by their trigger.) That would mean our trigger is stripped down to only calling those scripts (except for the "cleanup" section, which we don't have here).

In D34739#896901, @bapt wrote:

> Anyway, if we want to generate those pyc files on the target via a trigger, have you checked what Debian is doing? IMHO their approach looks sane (at least at a quick read), and if we are not planning on doing the same kind of thing, then when pushing our own trigger we should state explicitly why we took a different path.
>
> (They have two Python scripts which are called by their trigger.) That would mean our trigger is stripped down to only calling those scripts (except for the "cleanup" section, which we don't have here).

Debian's implementation is not suitable for us. Their whole Python build and packaging process is wrapped and customised to the n-th degree, which suits them, but not us. The only bit I can see us taking some inspiration from is their bare-bones alternate command sequence for bytecode removal, which runs before the removal transaction and queries pkgdb. Overall, their implementation is more Keywords/ than trigger.
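The removal-side idea mentioned here (deleting bytecode before the package's files are removed) could be sketched in Python as follows: given the .py files a package owns, delete the matching cache files at each optimization level. How that file list would be obtained from pkgdb is left open; remove_bytecode_for is a hypothetical helper, not Debian's or pkg's code, and it only targets the running interpreter's cache tag:

```python
import importlib.util
import os


def remove_bytecode_for(py_files, optimizations=("", "1", "2")):
    """Delete this interpreter's cached .pyc files for the given sources."""
    removed = []
    for src in py_files:
        for opt in optimizations:
            # "" is the plain .pyc, "1"/"2" map to .opt-1/.opt-2.pyc.
            cfile = importlib.util.cache_from_source(src, optimization=opt)
            try:
                os.remove(cfile)
            except FileNotFoundError:
                continue  # nothing cached at this level
            removed.append(cfile)
        # Drop the __pycache__ directory if this left it empty.
        cache_dir = os.path.join(os.path.dirname(src), "__pycache__")
        try:
            os.rmdir(cache_dir)
        except OSError:
            pass  # missing, or still holds other modules' bytecode
    return removed
```

Because cache_from_source only computes names, this works whether or not the .py files still exist at the time it runs.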