diff --git a/AUTHORS b/AUTHORS index 5eff238ae413..f805a204ecb7 100644 --- a/AUTHORS +++ b/AUTHORS @@ -1,58 +1,58 @@ Authors of XZ Utils =================== XZ Utils is developed and maintained by Lasse Collin . Major parts of liblzma are based on code written by Igor Pavlov, specifically the LZMA SDK . Without this code, XZ Utils wouldn't exist. The SHA-256 implementation in liblzma is based on code written by Wei Dai in Crypto++ Library . A few scripts have been adapted from GNU gzip. The original versions were written by Jean-loup Gailly, Charles Levert, and Paul Eggert. Andrew Dudman helped adapting the scripts and their man pages for XZ Utils. The initial version of the threaded .xz decompressor was written by Sebastian Andrzej Siewior. The initial version of the .lz (lzip) decoder was written by Michał Górny. Architecture-specific CRC optimizations were contributed by - Ilya Kurdyukov, Hans Jansen, and Chenxi Mao. + Ilya Kurdyukov, Chenxi Mao, and Xi Ruoyao. Other authors: - Jonathan Nieder - Joachim Henke Special author: Jia Tan was a co-maintainer in 2022-2024. He and the team behind him inserted a backdoor (CVE-2024-3094) into XZ Utils 5.6.0 and 5.6.1 releases. He suddenly disappeared when this was discovered. Many people have contributed improvements or reported bugs. Most of these people are mentioned in the file THANKS. The translations of the command line tools and man pages have been contributed by many people via the Translation Project: - https://translationproject.org/domain/xz.html - https://translationproject.org/domain/xz-man.html The authors of the translated man pages are in the header comments of the man page files. In the source package, the authors of the translations are in po/*.po and po4a/*.po files. Third-party code whose authors aren't listed here: - GNU getopt_long() in the 'lib' directory is included for platforms that don't have a usable getopt_long(). 
- The build system files from GNU Autoconf, GNU Automake, GNU Libtool, GNU Gettext, Autoconf Archive, and related files. diff --git a/COPYING b/COPYING index aed21531497c..ef3371389d7d 100644 --- a/COPYING +++ b/COPYING @@ -1,83 +1,70 @@ XZ Utils Licensing ================== Different licenses apply to different files in this package. Here is a summary of which licenses apply to which parts of this package: - liblzma is under the BSD Zero Clause License (0BSD). - The command line tools xz, xzdec, lzmadec, and lzmainfo are under 0BSD except that, on systems that don't have a usable getopt_long, GNU getopt_long is compiled and linked in from the 'lib' directory. The getopt_long code is under GNU LGPLv2.1+. - The scripts to grep, diff, and view compressed files have been adapted from GNU gzip. These scripts (xzgrep, xzdiff, xzless, and xzmore) are under GNU GPLv2+. The man pages of the scripts are under 0BSD; they aren't based on the man pages of GNU gzip. - Most of the XZ Utils specific documentation that is in plain text files (like README, INSTALL, PACKAGERS, NEWS, and ChangeLog) are under 0BSD unless stated otherwise in the file itself. The files xz-file-format.txt and lzma-file-format.xt are in the public domain but may be distributed under the terms of 0BSD too. - Translated messages and man pages are under 0BSD except that some old translations are in the public domain. - Test files and test code in the 'tests' directory, and debugging utilities in the 'debug' directory are under the BSD Zero Clause License (0BSD). - The GNU Autotools based build system contains files that are under GNU GPLv2+, GNU GPLv3+, and a few permissive licenses. These files don't affect the licensing of the binaries being built. - The 'extra' directory contains files that are under various free software licenses. These aren't built or installed as part of XZ Utils. + The following command may be helpful in finding per-file license + information. 
It works on xz.git and on a clean file tree extracted + from a release tarball. + + sh build-aux/license-check.sh -v + For the files under the BSD Zero Clause License (0BSD), if a copyright notice is needed, the following is sufficient: Copyright (C) The XZ Utils authors and contributors If you copy significant amounts of 0BSD-licensed code from XZ Utils into your project, acknowledging this somewhere in your software is polite (especially if it is proprietary, non-free software), but it is not legally required by the license terms. Here is an example of a good notice to put into "about box" or into documentation: This software includes code from XZ Utils . The following license texts are included in the following files: - COPYING.0BSD: BSD Zero Clause License - COPYING.LGPLv2.1: GNU Lesser General Public License version 2.1 - COPYING.GPLv2: GNU General Public License version 2 - COPYING.GPLv3: GNU General Public License version 3 - A note about old XZ Utils releases: - - XZ Utils releases 5.4.6 and older and 5.5.1alpha have a - significant amount of code put into the public domain and - that obviously remains so. The switch from public domain to - 0BSD for newer releases was made in Febrary 2024 because - public domain has (real or perceived) legal ambiguities in - some jurisdictions. - - There is very little *practical* difference between public - domain and 0BSD. The main difference likely is that one - shouldn't claim that 0BSD-licensed code is in the public - domain; 0BSD-licensed code is copyrighted but available under - an extremely permissive license. Neither 0BSD nor public domain - require retaining or reproducing author, copyright holder, or - license notices when distributing the software. (Compare to, - for example, BSD 2-Clause "Simplified" License which does have - such requirements.) - If you have questions, don't hesitate to ask for more information. The contact information is in the README file. 
diff --git a/ChangeLog b/ChangeLog index 2d36d7bb1043..577dce5e12a2 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,13908 +1,17389 @@ -commit 9331ce4009ddc839f5191d234cc41b2d4797376d +commit a522a226545730551f7e7c2685fab27cf567746c Author: Lasse Collin -Date: 2024-10-01 12:21:22 +0300 +Date: 2025-04-03 14:34:43 +0300 - Bump version and soname for 5.6.3 + Bump version and soname for 5.8.1 src/liblzma/Makefile.am | 2 +- src/liblzma/api/lzma/version.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -commit f52857ffde768058db0e0e13f68a2660ca9f1330 +commit 1c462c2ad86ff85766928638431029cd0b0dc995 Author: Lasse Collin -Date: 2024-10-01 12:17:39 +0300 +Date: 2025-04-03 14:34:43 +0300 - Add NEWS for 5.6.3 + Add NEWS for 5.8.1 - NEWS | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - 1 file changed, 125 insertions(+) + NEWS | 30 ++++++++++++++++++++++++++++++ + 1 file changed, 30 insertions(+) -commit b8f52990b5d47a50902bf33cd2305ce985457bac +commit 513cabcf7f5ce1c3ed0619e791393fc53d1dbbd0 Author: Lasse Collin -Date: 2024-10-01 12:10:23 +0300 +Date: 2025-04-03 14:34:43 +0300 - Update THANKS + Tests: Call lzma_code() in smaller chunks in fuzz_common.h - (cherry picked from commit 1ebbe915d4e0d877154261b5f8103719a6722975) + This makes it easy to crash fuzz_decode_stream_mt when tested + against the code from 5.8.0. + + Obviously this might make it harder to reach some other code path now. + The previous code has been in use since 2018 when fuzzing was added + in 106d1a663d4b ("Tests: Add a fuzz test program and a config file + for OSS-Fuzz."). 
- THANKS | 2 ++ - 1 file changed, 2 insertions(+) + tests/ossfuzz/fuzz_common.h | 31 ++++++++++++++++++++++++------- + 1 file changed, 24 insertions(+), 7 deletions(-) -commit 51f6f455873911894f155e6997bc23a9be8f42ba +commit 48440e24a25911ae59e8518b67a1e0f6f1c293bf Author: Lasse Collin -Date: 2024-10-01 12:10:23 +0300 +Date: 2025-04-03 14:34:43 +0300 - Tests/Windows: Add the application manifest to the test programs - - This ensures that the test programs get executed the same way as - the binaries that are installed. + Tests: Add a fuzzing target for the multithreaded .xz decoder - (cherry picked from commit 74702ee00ecfd080d8ab11118cd25dbe6c437ec0) + It doesn't seem possible to trigger the CVE-2025-31115 bug with this + fuzzing target at the moment. It's because the code in fuzz_common.h + passes the whole input buffer to lzma_code() at once. - CMakeLists.txt | 14 ++++++++++---- - tests/Makefile.am | 10 ++++++++++ - tests/tests.cmake | 33 ++++++++++++++++++++++++++++++++- - tests/tests_w32res.rc | 18 ++++++++++++++++++ - 4 files changed, 70 insertions(+), 5 deletions(-) + tests/ossfuzz/fuzz_decode_stream_mt.c | 47 +++++++++++++++++++++++++++++++++++ + 1 file changed, 47 insertions(+) -commit bf518b9ba446327a062ddfe67e7e0a5baed2394f +commit 0c80045ab82c406858d9d5bcea9f48ebc3d0a81d Author: Lasse Collin -Date: 2024-10-01 12:10:23 +0300 +Date: 2025-04-03 14:34:42 +0300 - Windows: Embed an application manifest in the EXE files - - IMPORTANT: This includes a security fix to command line tool - argument handling. - - Some toolchains embed an application manifest by default to declare - UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11 - to let the app access features newer than those of Vista. - - We want all the above but also two more things: - - - Declare that the app is long path aware to support paths longer - than 259 characters (this may also require a registry change). - - - Force the code page to UTF-8. 
This allows the command line tools - to access files whose names contain characters that don't exist - in the current legacy code page (except unpaired surrogates). - The UTF-8 code page also fixes security issues in command line - argument handling which can be exploited with malicious filenames. - See the new file w32_application.manifest.comments.txt. + liblzma: mt dec: Fix lack of parallelization in single-shot decoding - Thanks to Orange Tsai and splitline from DEVCORE Research Team - for discovering this issue. - - Thanks to Vijay Sarvepalli for reporting the issue to me. + Single-shot decoding means calling lzma_code() by giving it the whole + input at once and enough output buffer space to store the uncompressed + data, and combining this with LZMA_FINISH and no timeout + (lzma_mt.timeout = 0). This way the file is decoded with a single + lzma_code() call if possible. - Thanks to Kelvin Lee for testing with MSVC and helping with - the required build system fixes. + The bug prevented the decoder from starting more than one worker thread + in single-shot mode. The issue was noticed when reviewing the code; + there are no bug reports. Thus maybe few have tried this mode. 
- (cherry picked from commit 46ee0061629fb075d61d83839e14dd193337af59) + Fixes: 64b6d496dc81 ("liblzma: Threaded decoder: Always wait for output if LZMA_FINISH is used.") - CMakeLists.txt | 18 +++ - src/Makefile.am | 4 +- - src/common/common_w32res.rc | 5 + - src/common/w32_application.manifest | 28 ++++ - src/common/w32_application.manifest.comments.txt | 178 +++++++++++++++++++++++ - 5 files changed, 232 insertions(+), 1 deletion(-) + src/liblzma/common/stream_decoder_mt.c | 11 +++++++++-- + 1 file changed, 9 insertions(+), 2 deletions(-) -commit 5718ce932e6ad4262d5fffc9e2a7a838f963d7e5 +commit 8188048854e8d11071b8a50d093c74f4c030acc9 Author: Lasse Collin -Date: 2024-09-29 14:46:52 +0300 +Date: 2025-04-03 14:34:42 +0300 - Windows: Set DLL name accurately in StringFileInfo on Cygwin and MSYS2 + liblzma: mt dec: Don't modify thr->in_size in the worker thread - Now the information in the "Details" tab in the file properties - dialog matches the naming convention of Cygwin and MSYS2. This - is only a cosmetic change. + Don't set thr->in_size = 0 when returning the thread to the stack of + available threads. Not only is it useless, but the main thread may + read the value in SEQ_BLOCK_THR_RUN. With valid inputs, it made + no difference if the main thread saw the original value or 0. With + invalid inputs (when worker thread stops early), thr->in_size was + no longer modified after the previous commit with the security fix + ("Don't free the input buffer too early"). - (cherry picked from commit dad153091552b52a41b95ec4981c6951f1cae487) + So while the bug appears harmless now, it's important to fix it because + the variable was being modified without proper locking. It's trivial + to fix because there is no need to change the value. Only main thread + needs to set the value in (in SEQ_BLOCK_THR_INIT) when starting a new + Block before the worker thread is activated. 
+ + Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.") + Reviewed-by: Sebastian Andrzej Siewior + Thanks-to: Sam James - src/liblzma/liblzma_w32res.rc | 10 +++++++++- - 1 file changed, 9 insertions(+), 1 deletion(-) + src/liblzma/common/stream_decoder_mt.c | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) -commit e77c0ca61d12ebac433b7661840cb18d7031700a +commit d5a2ffe41bb77b918a8c96084885d4dbe4bf6480 Author: Lasse Collin -Date: 2024-09-25 15:47:55 +0300 +Date: 2025-04-03 14:34:42 +0300 - common_w32res.rc: White space edits + liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115) - LANGUAGE and VS_VERSION_INFO begin new statements so put an empty line - between them. + The input buffer must be valid as long as the main thread is writing + to the worker-specific input buffer. Fix it by making the worker + thread not free the buffer on errors and not return the worker thread to + the pool. The input buffer will be freed when threads_end() is called. - (cherry picked from commit 8940ecb96fe9f0f2a9cfb8b66fe9ed31ffbea904) - - src/common/common_w32res.rc | 15 ++++++++------- - 1 file changed, 8 insertions(+), 7 deletions(-) - -commit e0ba0f26d9f3f53cedc92fb13303924c39d00392 -Author: Lasse Collin -Date: 2024-09-28 20:09:50 +0300 - - CMake: Add the resource files to the Cygwin and MSYS2 builds + With invalid input, the bug could at least result in a crash. The + effects include heap use after free and writing to an address based + on the null pointer plus an offset. - Autotools-based build has always done this so this is for consistency. + The bug has been there since the first committed version of the threaded + decoder and thus affects versions from 5.3.3alpha to 5.8.0. - However, the CMake build won't create the DEF file when building - for Cygwin or MSYS2 because in that context it should be useless. - (If Cygwin or MSYS2 is used to host building of normal Windows - binaries then the DEF file is still created.) 
+ As the commit message in 4cce3e27f529 says, I had made significant + changes on top of Sebastian's patch. This bug was indeed introduced + by my changes; it wasn't in Sebastian's version. + + Thanks to Harri K. Koskinen for discovering and reporting this issue. - (cherry picked from commit c3b9dad07d3fd9319f88386b7095019bcea45ce1) + Fixes: 4cce3e27f529 ("liblzma: Add threaded .xz decompressor.") + Reported-by: Harri K. Koskinen + Reviewed-by: Sebastian Andrzej Siewior + Thanks-to: Sam James - CMakeLists.txt | 16 ++++++++++------ - 1 file changed, 10 insertions(+), 6 deletions(-) + src/liblzma/common/stream_decoder_mt.c | 31 ++++++++++++++++++++++--------- + 1 file changed, 22 insertions(+), 9 deletions(-) -commit 69637d0c323c0d7d9619cff637c7ce97dabc4f02 +commit c0c835964dfaeb2513a3c0bdb642105152fe9f34 Author: Lasse Collin -Date: 2024-09-28 15:19:14 +0300 +Date: 2025-04-03 14:34:42 +0300 - CMake: Fix Windows resource file dependencies + liblzma: mt dec: Simplify by removing the THR_STOP state - If common_w32res.rc is modified, the resource files need to be rebuilt. - In contrast, the liblzma*.map files truly are link dependencies. + The main thread can directly set THR_IDLE in threads_stop() which is + called when errors are detected. threads_stop() won't return the stopped + threads to the pool or free the memory pointed by thr->in anymore, but + it doesn't matter because the existing workers won't be reused after + an error. The resources will be cleaned up when threads_end() is + called (reinitializing the decoder always calls threads_end()). 
- (cherry picked from commit da4f275bd1c18b897e5c2dd0043546de3accce0a) + Reviewed-by: Sebastian Andrzej Siewior + Thanks-to: Sam James - CMakeLists.txt | 17 +++++++++-------- - 1 file changed, 9 insertions(+), 8 deletions(-) + src/liblzma/common/stream_decoder_mt.c | 75 +++++++++++++--------------------- + 1 file changed, 29 insertions(+), 46 deletions(-) -commit af8533459c60d7bc5b55f2f516251af4572169e4 +commit 831b55b971cf579ee16a854f177c36b20d3c6999 Author: Lasse Collin -Date: 2024-09-29 01:20:03 +0300 +Date: 2025-04-03 14:34:42 +0300 - CMake: Checking for CYGWIN covers MSYS2 too - - On MSYS2, both CYGWIN and MSYS are set. + liblzma: mt dec: Fix a comment - (cherry picked from commit 1c673c0aac7f7dee8dda2c1140351c8417a71e47) + Reviewed-by: Sebastian Andrzej Siewior + Thanks-to: Sam James - CMakeLists.txt | 2 +- + src/liblzma/common/stream_decoder_mt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit eca08e4c204db404911e513f95110dcb0fb919bd +commit b9d168eee4fb6393b4fe207c0aeb5faee316ca1a Author: Lasse Collin -Date: 2024-09-28 09:37:30 +0300 +Date: 2025-04-03 14:34:30 +0300 - Translations: Add the SPDX license identifier to pt_BR.po - - (cherry picked from commit 6aaa0173b839e28429d43a8b62d257ad2f3b4521) + liblzma: Add assertions to lzma_bufcpy() - po/pt_BR.po | 2 ++ - 1 file changed, 2 insertions(+) + src/liblzma/common/common.c | 6 ++++++ + 1 file changed, 6 insertions(+) -commit 85801c96c32456300177fbbad1506b07f5dd0a47 +commit c8e0a4897b4d0f906966f5d4d4f662221d64f3ae Author: Lasse Collin -Date: 2024-09-25 16:41:37 +0300 +Date: 2025-04-02 16:40:22 +0300 - Windows/CMake: Use the correct resource file for lzmadec.exe - - CMakeLists.txt was using xzdec_w32res.rc for both xzdec and lzmadec. 
- - Fixes: 998d0b29536094a89cf385a3b894e157db1ccefe - (cherry picked from commit dc7b9f24b737e4e55bcbbdde6754883f991c2cfb) + DOS: Update Makefile to fix the build - CMakeLists.txt | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + dos/Makefile | 2 ++ + 1 file changed, 2 insertions(+) -commit a341d19c835a8c10fcf561b00b548c53af43381e +commit 307c02ed698a69763ef1c9c0df4ff24727442118 Author: Lasse Collin -Date: 2024-09-25 21:29:59 +0300 +Date: 2025-03-29 12:41:32 +0200 - Translations: Update the Brazilian Portuguese translation + sysdefs.h: Avoid <stdalign.h> even with C11 compilers + + Oracle Developer Studio 12.6 on Solaris 10 claims C11 support in + __STDC_VERSION__ and supports _Alignas. However, <stdalign.h> is missing. + We only need alignas, so define it to _Alignas with C11/C17 compilers. + If something included <stdalign.h> later, it shouldn't cause problems. - (cherry picked from commit b834ae5f80911a3819d6cdb484f61b257174c544) + Thanks to Ihsan Dogan for reporting the issue and testing the fix. + + Fixes: c0e7eaae8d6eef1e313c9d0da20ccf126ec61f38 - po/pt_BR.po | 144 ++++++++++++++++++++++-------------------------------------- - 1 file changed, 53 insertions(+), 91 deletions(-) + src/common/sysdefs.h | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) -commit e69c0b9b2e00ade984393ef9cabac57342072328 +commit 7ce38b318339d6c01378a77585e08169ca3a604e Author: Lasse Collin -Date: 2024-09-17 01:21:15 +0300 +Date: 2025-03-29 12:32:05 +0200 Update THANKS - - (cherry picked from commit eceb023d4c129fd63ee881a2d8696eaf52ad1532) THANKS | 1 + 1 file changed, 1 insertion(+) -commit aef9a25b3200457c16846b046222fb2c7967afe0 -Author: Tobias Stoeckmann -Date: 2024-09-16 23:19:46 +0200 +commit 688e51bde4c987589717b2be1a1fde9576c604fc +Author: Lasse Collin +Date: 2025-03-29 12:21:51 +0200 - lzmainfo: Avoid integer overflow - - The MB output can overflow with huge numbers. Most likely these are - invalid .lzma files anyway, but let's avoid garbage output. - - lzmadec was adapted from LZMA Utils.
The original code with this bug - was written in 2005, over 19 years ago. - - Co-authored-by: Lasse Collin - Closes: https://github.com/tukaani-project/xz/pull/144 - (cherry picked from commit 76cfd0a9bb33ae8e534b1f73f6359dc825589f2f) + Translations: Update the Croatian translation - src/lzmainfo/lzmainfo.c | 5 ++--- - 1 file changed, 2 insertions(+), 3 deletions(-) + po/hr.po | 14 +++++++------- + 1 file changed, 7 insertions(+), 7 deletions(-) -commit 40a7f163f56aca6b3c8b83e9382f5e5cb4f8e93b -Author: Tobias Stoeckmann -Date: 2024-09-16 22:04:40 +0200 +commit 173fb5c68b08a8c1369550267be258132b7760c6 +Author: Lasse Collin +Date: 2025-03-25 18:23:57 +0200 - xzdec: Remove unused short option -M - - "xzdec -M123" exited with exit status 1 without printing - any messages. The "M:" entry should have been removed when - the memory usage limiter support was removed from xzdec. - - Fixes: 792331bdee706aa852a78b171040ebf814c6f3ae - Closes: https://github.com/tukaani-project/xz/pull/143 - [ Lasse: Commit message edits ] - - (cherry picked from commit 78355aebb7fb654302e5e33692ba109909dacaff) + doc/SHA256SUMS: Add 5.8.0 - src/xzdec/xzdec.c | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + doc/SHA256SUMS | 6 ++++++ + 1 file changed, 6 insertions(+) -commit c98714a57058ac381365c2ff1e1d1cd63a5742c4 +commit db9258e828bc2cd96e3954f1ddcc9d3530589025 Author: Lasse Collin -Date: 2024-09-10 13:54:47 +0300 +Date: 2025-03-25 15:18:32 +0200 - Update THANKS + Bump version and soname for 5.8.0 - (cherry picked from commit e5758db7bd75587a2499e0771907521a4aa86908) + Also remove the LZMA_UNSTABLE macro. 
- THANKS | 1 + - 1 file changed, 1 insertion(+) + src/liblzma/Makefile.am | 2 +- + src/liblzma/api/lzma/bcj.h | 2 -- + src/liblzma/api/lzma/version.h | 6 +++--- + src/liblzma/common/common.h | 2 -- + src/liblzma/liblzma_generic.map | 2 +- + src/liblzma/liblzma_linux.map | 2 +- + 6 files changed, 6 insertions(+), 10 deletions(-) -commit 4ed449517817b3659b35d19f39703e3c460f46c2 -Author: Firas Khalil Khana -Date: 2024-09-10 12:30:32 +0300 +commit bfb752a38f89ed03fc93d54f11c09f43fda64bc2 +Author: Lasse Collin +Date: 2025-03-25 15:18:32 +0200 - Build: Fix a typo in autogen.sh - - Fixes: e9be74f5b129fe8a5388d588e68b1b7f5168a310 - Closes: https://github.com/tukaani-project/xz/pull/141 - (cherry picked from commit 80ffa38f56657257ed4d90d76f6bd2f2bcb8163c) + Add NEWS for 5.8.0 - autogen.sh | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + NEWS | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 62 insertions(+) -commit 3b83577a1547e72cb78a905ad3d308a799ded485 +commit 6ccbb904da851eb0c174c8dbd43e84da31739720 Author: Lasse Collin -Date: 2024-09-02 20:08:40 +0300 +Date: 2025-03-25 15:18:31 +0200 - Translations: Update Chinese (simplified) translation - - Differences to the zh_CN.po file from the Translation Project: - - - Two uses of \v were fixed. + Translations: Run "make -C po update-po" - - Missing "OPTS" translation in --riscv[=OPTS] was copied from - previous lines. + POT-Creation-Date is set to match the timestamp in 5.7.2beta which + in the Translation Project is known as 5.8.0-pre1. The strings + haven't changed since 5.7.1alpha but a few comments have. + + This is a very noisy commit, but this helps keeping the PO files + similar between the Git repository and stable release tarballs. 
+ + po/ca.po | 964 ++++++++++++++++++++++++++++++++++++++++++++--------------- + po/cs.po | 935 ++++++++++++++++++++++++++++++++++++++++++---------------- + po/da.po | 663 ++++++++++++++++++++++++++++++----------- + po/de.po | 7 +- + po/eo.po | 966 +++++++++++++++++++++++++++++++++++++++++++++--------------- + po/es.po | 7 +- + po/fi.po | 2 +- + po/fr.po | 916 +++++++++++++++++++++++++++++++++++++++++--------------- + po/hu.po | 966 +++++++++++++++++++++++++++++++++++++++++++++--------------- + po/ka.po | 7 +- + po/ko.po | 7 +- + po/nl.po | 7 +- + po/pl.po | 7 +- + po/pt_BR.po | 962 ++++++++++++++++++++++++++++++++++++++++++++--------------- + po/sr.po | 2 +- + po/sv.po | 7 +- + po/tr.po | 7 +- + po/uk.po | 7 +- + po/vi.po | 948 +++++++++++++++++++++++++++++++++++++++++++--------------- + po/zh_CN.po | 940 ++++++++++++++++++++++++++++++++++++++++++++-------------- + po/zh_TW.po | 2 +- + 21 files changed, 6209 insertions(+), 2120 deletions(-) + +commit 891a5f057a6bb2dd2e3ce5e3bdd7a1f1ee03b800 +Author: Lasse Collin +Date: 2025-03-25 15:18:31 +0200 + + Translations: Run po4a/update-po - - "make update-po" was run to remove line numbers from comments. + Also remove the trivial obsolete messages like man page dates. - (cherry picked from commit 68c54e45d042add64a4cb44bfc87ca74d29b87e2) + This is a noisy commit, but this helps keeping the PO files similar + between the Git repository and stable release tarballs. 
- po/zh_CN.po | 102 ++++++++++++++++++++++++------------------------------------ - 1 file changed, 40 insertions(+), 62 deletions(-) + po4a/fr.po | 82 +++++++++++++++++++++++++++++++++++++------------------ + po4a/pt_BR.po | 88 +++++++++++++++++++++++++++++++++++++++++------------------ + po4a/sr.po | 79 ++++++++++++++++++++++++++++++++++------------------- + 3 files changed, 167 insertions(+), 82 deletions(-) -commit 06f4c7edda0387eb6a2d6303804b59dcf4d3db1f +commit 4f52e7387012cb3510b01c937dd9b3a0c6a3ac6c Author: Lasse Collin -Date: 2024-09-02 19:40:50 +0300 +Date: 2025-03-25 15:18:31 +0200 - Translations: Update the Catalan translation - - Differences to the ca.po file from the Translation Project: - - - An overlong line translating --filters-help was wrapped. - - - "make update-po" was used to remove line numbers from the comments - to match the changes in fccebe2b4fd513488fc920e4dac32562ed3c7637 - and 093490b58271e9424ce38a7b1b38bcf61b9c86c6. xz.pot in the TP - is older than these commits. + Translations: Partially fix overtranslation in Serbian man pages - (cherry picked from commit 2230692aa1bcebb586100183831e3daf1714d60a) + Names of environment variables and some other strings must be present + in the original form. The translator couldn't be reached so I'm + changing some of the strings myself. In the "Robot mode" section, + occurrences in the middle of sentences weren't changed to reduce + the chance of grammar breakage, but I kept the translated strings in + parenthesis in the headings. It's not ideal, but now people shouldn't + need to look at the English man page to find the English strings. 
- po/ca.po | 171 ++++++++++++++++++++++++++------------------------------------- - 1 file changed, 69 insertions(+), 102 deletions(-) + po4a/sr.po | 66 ++++++++++++++++++++++++++++++++++++++++++-------------------- + 1 file changed, 45 insertions(+), 21 deletions(-) -commit 406cb5b669e47c0e45c98f1afb7be998084a93d0 +commit ff5d944749b99eb5ab35e2ebaf01d05a59e7169b Author: Lasse Collin -Date: 2024-08-22 11:01:07 +0300 +Date: 2025-03-25 15:18:31 +0200 - Update THANKS - - (cherry picked from commit 5e375987509fab484b7bef0b90be92f241c58c91) + liblzma: Count the extra bytes in LZMA/LZMA2 decoder memory usage - THANKS | 1 + - 1 file changed, 1 insertion(+) + src/liblzma/lz/lz_decoder.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) -commit 3a4a05d75eb41ddc41899324df0511670ceaaf1e -Author: Yifeng Li -Date: 2024-08-22 02:18:49 +0000 +commit 943b012d09f717f7b44284c4e4976ea41264c731 +Author: Lasse Collin +Date: 2025-03-25 15:18:31 +0200 - liblzma: Fix x86-64 movzw compatibility in range_decoder.h + liblzma: Use SSE2 intrinsics instead of memcpy() in dict_repeat() - Support for instruction "movzw" without suffix in "GNU as" was - added in commit [1] and stabilized in binutils 2.27, released - in August 2016. Earlier systems don't accept this instruction - without a suffix, making range_decoder.h's inline assembly - unable to build on old systems such as Ubuntu 16.04, creating - error messages like: + SSE2 is supported on every x86-64 processor. The SSE2 code is used on + 32-bit x86 if compiler options permit unconditional use of SSE2. - lzma_decoder.c: Assembler messages: - lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi' - lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi' - lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx' - lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi' + dict_repeat() copies short random-sized unaligned buffers. 
At least + on glibc, FreeBSD, and Windows (MSYS2, UCRT, MSVCRT), memcpy() is + clearly faster than byte-by-byte copying in this use case. Compared + to the memcpy() version, the new SSE2 version reduces decompression + time by 0-5 % depending on the machine and libc. It should never be + slower than the memcpy() version. - Change "movzw" to "movzwl" for compatibility. + However, on musl 1.2.5 on x86-64, the memcpy() version is the slowest. + Compared to the memcpy() version: - [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2 + - The byte-by-version takes 6-7 % less time to decompress. + - The SSE2 version takes 16-18 % less time to decompress. - Suggested-by: Lasse Collin - Tested-by: Yifeng Li - Signed-off-by: Yifeng Li - Fixes: 3182a330c1512cc1f5c87b5c5a272578e60a5158 - Fixes: https://github.com/tukaani-project/xz/issues/121 - Closes: https://github.com/tukaani-project/xz/pull/136 - (cherry picked from commit 6cd7c8607843c337edfe2c472aa316602a393754) + The numbers are from decompressing a Linux kernel source tarball in + single-threaded mode on older AMD and Intel systems. The tarball + compresses well, and thus dict_repeat() performance matters more + than with some other files. 
- src/liblzma/rangecoder/range_decoder.h | 24 ++++++++++++------------ - 1 file changed, 12 insertions(+), 12 deletions(-) + src/liblzma/lz/lz_decoder.c | 14 ++++++-- + src/liblzma/lz/lz_decoder.h | 87 ++++++++++++++++++++++++++++++++++++++++----- + 2 files changed, 90 insertions(+), 11 deletions(-) -commit 4669f06d1a8d31de4b8b5861b5e8afd82cacd721 +commit bc14e4c94e788d42eeab984298391fc0ca46f969 Author: Lasse Collin -Date: 2024-07-19 20:02:43 +0300 +Date: 2025-03-25 15:18:31 +0200 - Build: Comment that elf_aux_info(3) will be available on OpenBSD >= 7.6 + liblzma: Add "restrict" to a few functions in lz_decoder.h + + This doesn't make any difference in practice because compilers can + already see that writing through the dict->buf pointer cannot modify + the contents of *dict itself: The LZMA decoder makes a local copy of + the lzma_dict structure, and even if it didn't, the pointer to + lzma_dict in the LZMA decoder is already "restrict". - (cherry picked from commit bf901dee5d4c46609645e50311c0cb2dfdcf9738) + It's nice to add "restrict" anyway. uint8_t is typically unsigned char + which can alias anything. Without the above conditions or "restrict", + compilers could need to assume that writing through dict->buf might + modify *dict. This would matter in dict_repeat() because the loops + refer to dict->buf and dict->pos instead of making local copies of + those members for the duration of the loops. If compilers had to + assume that writing through dict->buf can affect *dict, then compilers + would need to emit code that reloads dict->buf and dict->pos after + every write through dict->buf. 
- CMakeLists.txt | 2 +- - configure.ac | 17 +++++++++++------ - 2 files changed, 12 insertions(+), 7 deletions(-) + src/liblzma/lz/lz_decoder.h | 7 ++++--- + 1 file changed, 4 insertions(+), 3 deletions(-) -commit 9edddda5636d7b3504a033c31e8ea763e293fd35 +commit e82ee090c567e560f51a056775a17f534d159d65 Author: Lasse Collin -Date: 2024-07-13 22:10:37 +0300 +Date: 2025-03-25 15:18:30 +0200 - liblzma: Tweak a comment + liblzma: Define LZ_DICT_INIT_POS for initial dictionary position - (cherry picked from commit 7c292dd0bf23cefcdf4b1509f3666322e08a7ede) + It's more readable. - src/liblzma/simple/arm64.c | 4 ++-- - 1 file changed, 2 insertions(+), 2 deletions(-) + src/liblzma/lz/lz_decoder.c | 4 ++-- + src/liblzma/lz/lz_decoder.h | 9 ++++++--- + 2 files changed, 8 insertions(+), 5 deletions(-) -commit 1a93ab55d1563f5eb9b2c1b8240384046fe4bb97 +commit 8e7cd0091e5239334437decbe1989662d45a2f47 Author: Lasse Collin -Date: 2024-07-11 22:17:56 +0300 +Date: 2025-03-25 15:18:30 +0200 - CMake: Bump maximum policy version to 3.30 + Windows: Update README-Windows.txt about UCRT - CMakeLists.txt | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + windows/README-Windows.txt | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) -commit cfe4465742ad2963fb0d9795e258615d7c1cf32d +commit 2c24292d341e505e5579fccac3bce5bc71d839ef Author: Lasse Collin -Date: 2024-07-09 14:27:51 +0300 +Date: 2025-03-25 15:18:15 +0200 Update THANKS - - (cherry picked from commit 028185dd4889e3d6235ff13560160ebca6985021) THANKS | 1 + 1 file changed, 1 insertion(+) -commit 0f47db18d04434203b350bde4909a5e468f197cc +commit 48053c90898fa191a216aefca01626520a7413f4 Author: Lasse Collin -Date: 2024-07-06 14:04:48 +0300 +Date: 2025-03-17 15:33:25 +0200 - xz: Remove the TODO comment about --recursive - - It won't be implemented. find + xargs is more flexible, for example, - it allows compressing small files in parallel. An example for that - has been included in the xz man page since 2010. 
- - (cherry picked from commit baecfa142644eb5f5c6dd6f8e2f531c362fa3747) + Translations: Update the Italian translation - src/xz/args.c | 1 - - 1 file changed, 1 deletion(-) + po/it.po | 32 ++++++++++++++++---------------- + 1 file changed, 16 insertions(+), 16 deletions(-) -commit 07f52c3528e43c4a925a3fc59a933c89f5604d92 +commit 8d6f06a65f50358fad13567f5dd8af41ef1d2b58 Author: Lasse Collin -Date: 2024-07-03 20:45:48 +0300 +Date: 2025-03-17 15:28:56 +0200 - CMake: Link xz against Threads::Threads if using pthreads + Translations: Update the Portuguese translation - The liblzma target was recently changed to link against Threads::Threads - with the PRIVATE keyword. I had forgotten that xz itself depends on - pthreads too due to pthread_sigmask(). Thus, the build broke when - building shared liblzma and pthread_sigmask() wasn't in libc. - - Thanks to Peter Seiderer for the bug report. - - Fixes: ac05f1b0d7cda1e7ae79775a8dfecc54601d7f1c - Fixes: https://github.com/tukaani-project/xz/issues/129#issuecomment-2204522994 - (cherry picked from commit b3e53122f42796aaebd767bab920cf7bedf69966) + The language tag in the Translation Project is pt, not pt_PT, + thus I changed the "Language:" line to pt. 
- CMakeLists.txt | 13 +++++++++++++ - 1 file changed, 13 insertions(+) + po/pt.po | 1045 +++++++++++++++++++++++++++++++------------------------------- + 1 file changed, 526 insertions(+), 519 deletions(-) -commit eccb4d258b01651d06a2a31b8b68be9b04b7998c +commit c3439b039f46fe547ad603e16dc3bd63c1ca9b0c Author: Lasse Collin -Date: 2024-07-02 22:49:33 +0300 +Date: 2025-03-14 13:02:21 +0200 - Update THANKS - - (cherry picked from commit 5742ec1fc7f2cf1c82cfe3477bb90594a4658374) + Translations: Update the Italian translation - THANKS | 1 + - 1 file changed, 1 insertion(+) + po/it.po | 1020 +++++++++++++++++++++++++++++++------------------------------- + 1 file changed, 516 insertions(+), 504 deletions(-) -commit c9bd00327f064778babb014302718a18d65cf7d3 -Author: Sam James -Date: 2024-06-28 14:18:35 +0300 +commit 79b4ab8d79528dd633a84df2d29e63f5d13ccbdf +Author: Lasse Collin +Date: 2025-03-12 20:48:39 +0200 - CI: Speed up Valgrind job by using --trace-children-skip-by-arg=... - - This addresses the issue I mentioned in - 6c095a98fbec70b790253a663173ecdb669108c4 and speeds up the Valgrind - job a bit, because non-xz tools aren't run unnecessarily with - Valgrind by the script tests. + Translations: Update the Italian man page translations - (cherry picked from commit 7e99856f66c07852c4e0de7aa01951e9147d86b0) + Only trivial additions but this keeps the file in sync with the TP. - .github/workflows/ci.yml | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + po4a/it.po | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) -commit 495de6ec9d7834c4ef4d5286844ef7b784eb951b +commit 515b6fc8557825e1335012b3b1c8cf71e2c38775 Author: Lasse Collin -Date: 2024-06-25 16:00:22 +0300 +Date: 2025-03-12 19:38:54 +0200 - Build: Prepend, not append, PTHREAD_CFLAGS to LIBS - - It shouldn't make any difference because LIBS should be empty - at that point in configure. 
But prepending is the correct way - because in general the libraries being added might require other - libraries that come later on the command line. - - (cherry picked from commit 2402e8a1ae92676fa0d4cb1b761d7f62f005c098) + Translations: Update the Italian man page translations - configure.ac | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + po4a/it.po | 129 ++++++++++++++++++++++++++++++++++++------------------------- + 1 file changed, 77 insertions(+), 52 deletions(-) -commit 55bf3f49a812e20a21e42323e39526bb31d9341a +commit 333b7c0b776295f0941269b4e6cdb1a0ba5f6218 Author: Lasse Collin -Date: 2024-06-25 14:24:29 +0300 +Date: 2025-03-10 21:00:31 +0200 - Build: Use AC_LINK_IFELSE to handle implicit function declarations - - It's more robust in case the compiler allows pre-C99 implicit function - declarations. If an x86 intrinsic is missing and gets treated as - implicit function, the linking step will very probably fail. This - isn't the only way to workaround implicit function declarations but - it might be the simplest and cleanest. - - The problem hasn't been observed in the wild. - - There are a couple more AC_COMPILE_IFELSE uses in configure.ac. - Of these, Landlock check calls prctl() and in theory could have - the same problem. In practice it doesn't as the check program - looks for several other things too. However, it was changed to - AC_LINK_IFELSE still to look more correct. - - Similarly, m4/tuklib_cpucores.m4 and m4/tuklib_physmem.m4 were - updated although they haven't given any trouble either. They - have worked all these years because those check programs rely - on specific headers and types: if headers or types are missing, - compilation will fail. Using the linker makes these checks more - similar to the ones in cmake/tuklib_*.cmake which always link. 
- - (cherry picked from commit 7bb46f2b7b3989c1b589a247a251470f65e91cda) + Translations: Update the Korean man page translations - configure.ac | 8 ++++++-- - m4/tuklib_cpucores.m4 | 8 ++++---- - m4/tuklib_physmem.m4 | 17 +++++++++++------ - 3 files changed, 21 insertions(+), 12 deletions(-) + po4a/ko.po | 139 +++++++++++++++++++++++++++++++++++-------------------------- + 1 file changed, 80 insertions(+), 59 deletions(-) -commit b45270d88f0de1b2e8bf510f0e370a5db4067e1f +commit ae52ebd27dc0be5e1ba62fb0c45255d8563fcd88 Author: Lasse Collin -Date: 2024-06-24 23:35:59 +0300 +Date: 2025-03-10 20:56:57 +0200 - Build: Use AC_LINK_IFELSE instead of -Werror - - AC_COMPILE_IFELSE needed -Werror because Clang <= 14 would merely - warn about the unsupported attribute and implicit function declaration. - Changing to AC_LINK_IFELSE handles the implicit declaration because - the symbol __crc32d is unlikely to exist in libc. + Translations: Update the German man page translations + + po4a/de.po | 102 ++++++++++++++++++++++++++++++++++++++----------------------- + 1 file changed, 63 insertions(+), 39 deletions(-) + +commit 1028e52c93d2292b44ff7bae8e721025d2f2c94d +Author: Lasse Collin +Date: 2025-03-10 13:13:30 +0200 + + CMake: Fix tuklib_use_system_extensions - Note that the other part of the check is that #include - must work. If the header is missing, most compilers give an error - and the linking step won't be attempted. + Revert back to a macro so that list(APPEND CMAKE_REQUIRED_DEFINITIONS) + will affect the calling scope. I had forgotten that while CMake + functions inherit the variables from the parent scope, the changes + to them are local unless using set(... PARENT_SCOPE). - Avoiding -Werror makes the check more robust in case CFLAGS contains - warning flags that break -Werror anyway (but this isn't the only check - in configure.ac that has this problem). Using AC_LINK_IFELSE also makes - the check more similar to how it is done in CMakeLists.txt. 
+ This also means that the commit message in 5bb77d0920dc is wrong. The + commit itself is still fine, making it clearer that -DHAVE_SYS_PARAM_H + is only needed for specific check_c_source_compiles() calls. - (cherry picked from commit 35eb57355ad1c415a838d26192d5af84abb7cf39) + Fixes: c1ea7bd0b60eed6ebcdf9a713ca69034f6f07179 - configure.ac | 12 +----------- - 1 file changed, 1 insertion(+), 11 deletions(-) + cmake/tuklib_common.cmake | 7 +++++-- + 1 file changed, 5 insertions(+), 2 deletions(-) -commit 2c3e4cbbdcefe214ef3033a725049034b73e9756 +commit 80e48836024ec2d7cbd557575be6da3d1f055cba Author: Lasse Collin -Date: 2024-06-24 23:34:34 +0300 +Date: 2025-03-10 11:38:55 +0200 - Build: Sync the compile check changes from CMakeLists.txt + INSTALL: Document -bmaxdata on AIX - It's nice to keep these in sync. The use of main() will later allow - AC_LINK_IFELSE usage too which may avoid the more fragile -Werror. + This is based on a pull request and AIX docs. I haven't tested the + instructions myself. - (cherry picked from commit 5a728813c378cc3c4c9c95793762452418d08f1b) + Closes: https://github.com/tukaani-project/xz/pull/137 - configure.ac | 15 ++++++++------- - 1 file changed, 8 insertions(+), 7 deletions(-) + INSTALL | 5 +++++ + 1 file changed, 5 insertions(+) -commit 809e69f1f574dad3c9b00d4f01b9ef1a492319f3 +commit ab319186b6d0454285ff4941a777ac95e580f60f Author: Lasse Collin -Date: 2024-06-25 16:11:13 +0300 +Date: 2025-03-10 11:37:19 +0200 - CMake: Use configure_file() to copy a file - - I had missed this simpler method before. It does create a dependency - so that if .in.h changes the copying is done again. 
+ Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 4434671a04436038f88ab0feaa251cc8d7abb683 +Author: Collin Funk +Date: 2025-03-09 19:14:31 -0700 + + tuklib_physmem: Silence -Wsign-conversion on AIX - (cherry picked from commit de215a0517645d16343f3a5336d3df884a4f665f) + Closes: https://github.com/tukaani-project/xz/pull/168 - CMakeLists.txt | 17 +++++++---------- - 1 file changed, 7 insertions(+), 10 deletions(-) + src/common/tuklib_physmem.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) -commit 52a8c87f37f4bd133f670722d2d4b73a74e352bc +commit 18bcaa4fafc935d89ffde94301fa6427907306bf Author: Lasse Collin -Date: 2024-06-25 15:51:48 +0300 +Date: 2025-03-09 22:10:38 +0200 - CMake: Always add pthread flags into CMAKE_REQUIRED_LIBRARIES - - It was weird to add CMAKE_THREAD_LIBS_INIT in CMAKE_REQUIRED_LIBRARIES - only if CLOCK_MONOTONIC is available. Alternative would be to remove - the thread libs from CMAKE_REQUIRED_LIBRARIES after the check for - pthread_condattr_setclock() but keeping the libs should be fine too. - Then it's ready in case more pthread functions were wanted some day. - - (cherry picked from commit e620f35097c0ad20cd76d8258750aa706758ced9) + Translations: Update the Romanian man page translations - CMakeLists.txt | 6 ++++-- - 1 file changed, 4 insertions(+), 2 deletions(-) + po4a/ro.po | 110 ++++++++++++++++++++++++++++++++++++------------------------- + 1 file changed, 66 insertions(+), 44 deletions(-) -commit 1591747bf692d10c3b2fd92c9dc8ba931626fd84 +commit 1e17b7f42fe2f9df279f44ad7043d3753cd00363 Author: Lasse Collin -Date: 2024-06-24 22:41:10 +0300 +Date: 2025-03-09 21:28:15 +0200 - CMake: Fix three checks if building with -flto - - In CMake, check_c_source_compiles() always links too. With - link-time optimization, unused functions may get omitted if - main() doesn't depend on them. 
Consider the following which - tries to check if somefunction() is available when - has been included: - - #include - int foo(void) { return somefunction(); } - int main(void) { return 0; } - - LTO may omit foo() completely because the program as a whole doesn't - need it and then the program will link even if the symbol somefunction - isn't available in libc or other library being linked in, and then - the test may pass when it shouldn't. - - What happens if doesn't declare somefunction()? - Shouldn't the test fail in the compilation phase already? It should - but many compilers don't follow the C99 and later standards that - prohibit implicit function declarations. Instead such compilers - assume that somefunction() exists, compilation succeeds (with a - warning), and then linker with LTO omits the call to somefunction(). - - Change the tests so that they are part of main(). If compiler accepts - implicitly declared functions, LTO cannot omit them because it has to - assume that they might have side effects and thus linking will fail. - On the other hand, if the functions/intrinsics being used are supported, - they might get optimized away but in that case it's fine because they - really are supported. - - It is fine to use __attribute__((target(...))) for main(). At least - it works with GCC 4.9 to 14.1 on x86-64. 
- - Reported-by: Sam James - (cherry picked from commit 114cba69dbb96003e676c8c87a2e9943b12d065f) + Translations: Update the Croatian translation - CMakeLists.txt | 19 ++++++++----------- - 1 file changed, 8 insertions(+), 11 deletions(-) + po/hr.po | 19 +++++++++++-------- + 1 file changed, 11 insertions(+), 8 deletions(-) -commit cc386f4ff4b87ff895fbc30fd3b13ee6e6152ace +commit ff85e6130d5940896915cdbb99aa9ece9d41240b Author: Lasse Collin -Date: 2024-06-24 21:06:18 +0300 +Date: 2025-03-09 21:23:34 +0200 - CMake: Improve the comment about LIBS - - (cherry picked from commit d3f20382fc1bd865eb70a65455d5022ed05caac8) + Translations: Update the Romanian translation - CMakeLists.txt | 6 ++++++ - 1 file changed, 6 insertions(+) + po/ro.po | 24 +++++++++++++----------- + 1 file changed, 13 insertions(+), 11 deletions(-) -commit 65aaa0f87048f78a3f69c4ec0ad03723a2354fa7 +commit a5bfb33f30f77e656723d365db8b06e089d3de61 Author: Lasse Collin -Date: 2024-06-24 17:39:54 +0300 +Date: 2025-03-09 21:11:34 +0200 - CI: Workaround buggy config.guess on Ubuntu 22.04LTS and 24.04LTS - - Check for the wrong triplet from config.guess and override it with - the --build option on the configure command line. Then i386 assembly - autodetection will work. - - These Ubuntu versions (and as of writing, also Debian unstable) - ship config.guess version 2022-01-09 which contains a bug that - was fixed in version 2022-05-08. It results in a wrong configure - triplet when using CC="gcc -m32" to build i386 binaries. 
- - Upstream fix: - https://git.savannah.gnu.org/cgit/config.git/commit/?id=f56a7140386d08a531bcfd444d632b28c61a6329 - - More information: - https://mail.gnu.org/archive/html/config-patches/2022-05/msg00003.html - - (cherry picked from commit 1bf83cded2955282fe1a868f08c83d4e5d6dca4a) + Translations: Update the Ukrainian man page translations - build-aux/ci_build.bash | 9 +++++++++ - 1 file changed, 9 insertions(+) + po4a/uk.po | 107 ++++++++++++++++++++++++++++++++++++------------------------- + 1 file changed, 64 insertions(+), 43 deletions(-) -commit 810f1a8aee9edb3bff430559f4b832cd0ec50797 +commit 5bb77d0920dcf949d8eb04eb19204b7b199e42df Author: Lasse Collin -Date: 2024-06-24 15:24:52 +0300 +Date: 2025-03-09 14:43:07 +0200 - CI: Use CC="gcc -m32" to get i386 compiler on x86-64 - - The old method put it in CFLAGS which is a wrong place because - config.guess doesn't read CFLAGS. + CMake: Use cmake_push_check_state in tuklib_cpucores and tuklib_physmem - (cherry picked from commit dbcdabf68fee9ed694b68c3a82e6adbeff20b679) + Now the changes to CMAKE_REQUIRED_DEFINITIONS are temporary and don't + leak to the calling code. - .github/workflows/ci.yml | 4 ++-- - 1 file changed, 2 insertions(+), 2 deletions(-) + cmake/tuklib_cpucores.cmake | 3 +++ + cmake/tuklib_physmem.cmake | 4 +++- + 2 files changed, 6 insertions(+), 1 deletion(-) -commit dde14ded9a3240fd524d9bc01c9ceeb4d7909e95 +commit c1ea7bd0b60eed6ebcdf9a713ca69034f6f07179 Author: Lasse Collin -Date: 2024-06-24 14:54:17 +0300 +Date: 2025-03-09 14:06:35 +0200 - CI: Let CMake use the CC environment variable + CMake: Revise tuklib_use_system_extensions - CC from environment is used to initialize CMAKE_C_COMPILER so - setting CMAKE_C_COMPILER explicitly isn't needed. + Define NetBSD and Darwin/macOS feature test macros. Autoconf defines + these too (and a few others). - The syntax in ci_build.bash was broken in case one wished to put - spaces in CC. + Define the macros on Windows except with MSVC. 
The _GNU_SOURCE macro + makes a difference with mingw-w64. - (cherry picked from commit 0c1e6d900bac127464fb30a854776e1810ab5f16) + Use a function instead of a macro. Don't take the TARGET_OR_ALL argument + because there's always global effect because the global variable + CMAKE_REQUIRED_DEFINITIONS is modified. - build-aux/ci_build.bash | 4 ---- - 1 file changed, 4 deletions(-) + CMakeLists.txt | 2 +- + cmake/tuklib_common.cmake | 27 +++++++++++++++------------ + 2 files changed, 16 insertions(+), 13 deletions(-) -commit 85a55e1120bebac2f3cd9af8965f4a6335eeeb9b +commit 4243c45a48ef8c103d77b75d9f93d48adcb631db Author: Lasse Collin -Date: 2024-06-20 18:12:21 +0300 +Date: 2025-03-08 14:54:29 +0200 - CMake: Keep existing options in LIBS when adding -lrt - - This makes no difference yet because -lrt is currently the only option - that might be added to LIBS. + doc/SHA256SUMS: Add 5.7.2beta + + doc/SHA256SUMS | 3 +++ + 1 file changed, 3 insertions(+) + +commit cc7f2fc1cf9f3c63cbce90ee92bfbb004f98140b +Author: Lasse Collin +Date: 2025-03-08 14:29:57 +0200 + + Bump version and soname for 5.7.2beta + + src/liblzma/Makefile.am | 2 +- + src/liblzma/api/lzma/version.h | 4 ++-- + src/liblzma/liblzma_generic.map | 2 +- + src/liblzma/liblzma_linux.map | 2 +- + 4 files changed, 5 insertions(+), 5 deletions(-) + +commit 62e44b36167de27541776dcf677ed04077c9fd19 +Author: Lasse Collin +Date: 2025-03-08 14:24:38 +0200 + + Add NEWS for 5.7.2beta + + NEWS | 35 +++++++++++++++++++++++++++++++++++ + 1 file changed, 35 insertions(+) + +commit 70f1f203789433b5d7b8b22e1655abc465d659f7 +Author: Lasse Collin +Date: 2025-03-08 14:23:00 +0200 + + COPYING: Remove the note about old releases + + COPYING | 19 ------------------- + 1 file changed, 19 deletions(-) + +commit db9827dc38ff79de747a6fc7a99619e961dbc5e6 +Author: Lasse Collin +Date: 2025-03-08 14:22:28 +0200 + + xz: Update the man page about the environment variables again + + src/xz/xz.1 | 22 +++++++++++----------- + 1 file changed, 11 
insertions(+), 11 deletions(-) + +commit 99c584891bd1d946561cebded2226df9b83f1efb +Author: Lasse Collin +Date: 2025-03-06 19:26:09 +0200 + + liblzma: Edit spelling in a comment - (cherry picked from commit 75ce4797d49621710e6da95d8cb91541028c6d68) + It was found with codespell. - CMakeLists.txt | 2 +- + src/liblzma/api/lzma/container.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit e24a762f1be6bf379df73b7fe0a115ccae139a35 +commit 7a234c8c05a8f64efde013cd6a6d31a90b7d0d28 Author: Lasse Collin -Date: 2024-06-15 18:07:04 +0300 +Date: 2025-03-06 19:14:23 +0200 - CMake: Fix indentation + xz: Update the man page about the environment variables + + src/xz/xz.1 | 26 ++++++++++++++++++++++++-- + 1 file changed, 24 insertions(+), 2 deletions(-) + +commit 808f05af3ef40730d40b3798666757bd866484f1 +Author: Lasse Collin +Date: 2025-03-06 17:37:39 +0200 + + Docs: Add a few TRANSLATORS comments to man pages - (cherry picked from commit c715dec8e800b65145918cfb0ee9bbc90faa8aad) + All translators know that --command-line-options must not be translated. + With some other strings it's not obvious when the untranslated string + must be preserved. These comments hopefully help. - CMakeLists.txt | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + src/scripts/xzmore.1 | 2 ++ + src/xz/xz.1 | 22 ++++++++++++++++++++++ + 2 files changed, 24 insertions(+) -commit 99555b721b55263a6892b1093f2806f09a92e1fb +commit 051de255f00dda331e2a6fa189a6e7fe56a7c69b Author: Lasse Collin -Date: 2024-06-15 23:34:29 +0300 +Date: 2025-03-06 16:34:32 +0200 - CMake: Link Threads::Threads as PRIVATE to liblzma + Scripts: Mark the LZMA Utils script aliases as deprecated - This way pthread options aren't passed to the linker when linking - against shared liblzma but they are still passed when linking against - static liblzma. (Also, one never needs the include path of the - threading library to use liblzma since liblzma's API headers - don't #include . 
But tends to be in the - default include path so here this change makes no difference.) + The deprecated aliases are lzcmp, lzdiff, lzless, lzmore, + lzgrep, lzegrep, and lzfgrep. The commands that start with + the xz prefix have identical behavior, for example, both + lzgrep and xzgrep handle all supported file formats. - One cannot mix target_link_libraries() calls that use the scope - (PRIVATE, PUBLIC, or INTERFACE) keyword and calls that don't use it. - The calls without the keyword are like PUBLIC except perhaps when - they aren't, or something like that... It seems best to always - specify a scope keyword as the meanings of those three keywords - at least are clear. + This doesn't affect lzma, unlzma, lzcat, lzmadec, or lzmainfo. + The last release of LZMA Utils was made in 2008, but the lzma + compatibility alias for the gzip-like tool is still in common use. + Deprecating it would cause unnecessary breakage. + + src/scripts/xzdiff.1 | 5 ++++- + src/scripts/xzgrep.1 | 6 +++++- + src/scripts/xzless.1 | 4 +++- + src/scripts/xzmore.1 | 4 +++- + 4 files changed, 15 insertions(+), 4 deletions(-) + +commit 4941ea454c02cf15a64d6434a0778fc2a81282fc +Author: Lasse Collin +Date: 2025-03-02 21:13:04 +0200 + + Translations: Add Serbian man page translations + + po4a/po4a.conf | 2 +- + po4a/sr.po | 3892 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 3893 insertions(+), 1 deletion(-) + +commit d142d96f24daa451edaabfca8594e202932b3c0b +Author: Lasse Collin +Date: 2025-03-02 20:42:14 +0200 + + Translations: Update Georgian translation + + po/ka.po | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 9b7e45d841195c8fd8d286e26f810df28c53dd16 +Author: Lasse Collin +Date: 2025-02-28 21:07:21 +0200 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 9351592710e0df3238b09d39c545a643c50ac88f +Author: Lasse Collin +Date: 2025-02-22 16:04:58 +0200 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 
insertion(+) + +commit 9023be7831faca2f28def55e16c39e3a42e1e262 +Author: Lasse Collin +Date: 2025-02-19 16:33:52 +0200 + + Translations: Update the Croatian translation + + po/hr.po | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit 2eaf242c56e8c65db83d48b018fa44aeafeb33a5 +Author: Lasse Collin +Date: 2025-02-17 21:46:15 +0200 + + Build: Fix out-of-tree builds when using the replacement getopt_long + + Nowadays $(top_builddir)/lib/getopt.h depends on headers in + $(top_srcdir)/lib, so both have to be in the include path. + CMake-based build already did this. + + Fixes: 7e884c00d0093c38339f17fb1d280eec493f42ca + + src/lzmainfo/Makefile.am | 6 ++++-- + src/xz/Makefile.am | 6 ++++-- + src/xzdec/Makefile.am | 6 ++++-- + 3 files changed, 12 insertions(+), 6 deletions(-) + +commit 41322b2c60cd2c67a1053cb40d27e573420185b7 +Author: Lasse Collin +Date: 2025-02-17 18:25:52 +0200 + + m4/getopt.m4: Remove an outdated comment + + m4/getopt.m4 | 3 --- + 1 file changed, 3 deletions(-) + +commit 03c23a4952bce1b50a1d213ca2d1c15acd76a489 +Author: Lasse Collin +Date: 2025-02-17 18:11:58 +0200 + + Build: Allow forcing the use of the replacement getopt_long + + Now one can pass gl_replace_getopt=yes to configure to force the use + of GNU getopt_long from the lib directory. This only checks that the + value of gl_replace_getopt is non-empty, so one cannot force the + replacement to be disabled.
+ + Closes: https://github.com/tukaani-project/xz/pull/166 + + m4/getopt.m4 | 5 +++-- + 1 file changed, 3 insertions(+), 2 deletions(-) + +commit c23b837d15960ecc0d537f0260f389904e1e7f02 +Author: Lasse Collin +Date: 2025-02-17 18:11:42 +0200 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 2672a38f1159babf9ba3cca429f644bb823a8bdd +Author: Lasse Collin +Date: 2025-02-12 19:23:31 +0200 + + Update THANKS + + THANKS | 2 ++ + 1 file changed, 2 insertions(+) + +commit 4fdcbfaf3f222299747c6a815762a74eeb1b0b23 +Author: Lasse Collin +Date: 2025-02-11 12:13:41 +0200 + + Update THANKS + + THANKS | 3 +++ + 1 file changed, 3 insertions(+) + +commit 0d553568f1af9a35779ecac41392a6c871786930 +Author: Lasse Collin +Date: 2025-02-08 11:39:08 +0200 + + Translations: Update the Polish translation + + po/pl.po | 802 ++++++++++++++++++++++++++++++++++++--------------------------- + 1 file changed, 464 insertions(+), 338 deletions(-) + +commit 9f165076aebb3b5115d2b6520529db8fa11a6bdd +Author: Lasse Collin +Date: 2025-02-07 19:12:03 +0200 + + Docs: Update TODO a little + + TODO | 22 ++++------------------ + 1 file changed, 4 insertions(+), 18 deletions(-) + +commit f5aa292c534f87b9dd588e667d1c65ed31e5f289 +Author: Lasse Collin +Date: 2025-02-07 18:50:56 +0200 + + Add researcher credits of CVE-2022-1271 and CVE-2024-47611 to THANKS + + These are specific phrases that were included in the advisories and + NEWS. It's nice to have them in THANKS as well. + + THANKS | 4 ++++ + 1 file changed, 4 insertions(+) + +commit 7cf463b5add70e3fb48a10de3965c8beb6c01ad9 +Author: Lasse Collin +Date: 2025-02-07 18:43:00 +0200 + + Update THANKS + + THANKS | 5 +++++ + 1 file changed, 5 insertions(+) + +commit 6b7fe7e27b77038592e2c2e31df955059dda7d1d +Author: Lasse Collin +Date: 2025-02-04 14:12:46 +0200 + + Docs: Update the "Translations" section in README + + Make it clearer that translations cannot be accepted if they don't + come via the Translation Project. 
+ + Column headings have been handled automatically for years and now --help + is autowrapped too, so the related instructions can be removed. + + README | 107 ++++++++++++++++++++++++----------------------------------------- + 1 file changed, 39 insertions(+), 68 deletions(-) + +commit 2c7aee94936babf84b61b55420e503a0b2629ec1 +Author: Lasse Collin +Date: 2025-02-04 13:23:53 +0200 + + debug/translations.bash: Revise a little + + Make it work for out-of-tree builds without requiring one to specify + the location of the xz executable. + + Add xz --filters-help. + + Make the output shorter by reducing the number of xz -lvv test files. + + Show the value of LANGUAGE environment variable. + + Show the xz.git version using git describe --abbrev=8 instead of =4. + + debug/translation.bash | 24 +++++++++++------------- + 1 file changed, 11 insertions(+), 13 deletions(-) + +commit c6b15e7045209002bbbf4979c48072af01c20d8d +Author: Lasse Collin +Date: 2025-02-04 13:20:52 +0200 + + Build: Use "git describe --abbrev=8" in snapshot tarball names + + 8 is more likely to be reproducible than the old 4 without being + excessively long for a small repository like this. + + Makefile.am | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 0ce97987c5b27cfb6f98984e5fd7477880e0cf33 +Author: Lasse Collin +Date: 2025-02-04 19:37:17 +0200 + + Update THANKS + + THANKS | 2 ++ + 1 file changed, 2 insertions(+) + +commit 353c33355cb12e5016d49052fd1e90d15568aa37 +Author: Lasse Collin +Date: 2025-02-03 16:29:31 +0200 + + Translations: Update the Serbian translation + + po/sr.po | 805 ++++++++++++++++++++++++++++++++++++--------------------------- + 1 file changed, 458 insertions(+), 347 deletions(-) + +commit 887dc281885052bced32b3aa309506ea58a2e78e +Author: Lasse Collin +Date: 2025-02-03 16:15:38 +0200 + + Translations: Update Chinese (traditional) translation + + Since there are no spaces between words, the unsophisticated automatic + word wrapping code needs some help. 
Compared to the version in the + Translation Project, I added a few \t characters which the word + wrapping code interprets as zero width spaces (hopefully they are + placed correctly). These edits can be seen with this command: + + grep -v ^# po/zh_TW.po | grep --color -F '\t' + + po/zh_TW.po | 843 +++++++++++++++++++++++++++++++++--------------------------- + 1 file changed, 471 insertions(+), 372 deletions(-) + +commit 0f1454cf5f460a4095f47f8f73f5a290e9777d7f +Author: Lasse Collin +Date: 2025-02-03 16:12:44 +0200 + + Update THANKS + + THANKS | 2 ++ + 1 file changed, 2 insertions(+) + +commit 23ea031820086d302a213be005a091df763b8a7b +Author: Lasse Collin +Date: 2025-02-02 14:15:07 +0200 + + Build: Update posix-shell.m4 from Gnulib + + Tabs have been converted to spaces and a "serial" number has been + added. The previous version was from 2008/2009. There are no functional + changes since then but now it's clearer that the copy in XZ Utils + isn't outdated. + + The new file was picked from the Gnulib commit + 81a4c1e3b7692e95c0806d948cbab9148ad85ef2. A later commit adds + a warranty disclaimer to the license, which obviously is fine, + but I didn't find a SPDX license identifier for the new license, + so for simplicity I used the earlier commit. + + m4/posix-shell.m4 | 31 ++++++++++++++++--------------- + 1 file changed, 16 insertions(+), 15 deletions(-) + +commit 84c33c0384aa4604ff7956f2fae6f83ea60ba96b +Author: Lasse Collin +Date: 2025-02-02 12:51:03 +0200 + + Build: Check for -fsanitize= also in $CC + + People may put -fsanitize in CC instead of CFLAGS so check both. + Landlock sandbox isn't compatible with sanitizers so it's nice + to catch the incompatible options at configure time. + + Don't attempt to do the same in CMakeLists.txt; the check for + CMAKE_C_FLAGS / CFLAGS shall be enough there. The extra flags from + the CC environment variable go into the undocumented internal variable + CMAKE_C_COMPILER_ARG1 (all flags from CC go into that same variable). 
+ Peeking the internal variable merely for improved diagnostics isn't + worth it. + + Fixes: 88588b1246d8c26ffbc138b3e5c413c5f14c3179 + + configure.ac | 5 +++-- + 1 file changed, 3 insertions(+), 2 deletions(-) + +commit a7304ea4a7daede9789a8fe422b714e372737120 +Author: Lasse Collin +Date: 2023-09-26 19:11:20 +0300 + + Build: Remove the FIXME about -Werror checks + + configure.ac | 7 ------- + 1 file changed, 7 deletions(-) + +commit 1780bba74075da5e7764615bd323e95e19057dee +Author: Lasse Collin +Date: 2023-09-26 19:10:51 +0300 + + Build: If using a GCC compatible compiler, ensure that -Werror works + + The check can be skipped by passing SKIP_WERROR_CHECK=yes to configure. + It won't be documented anywhere else than in the error message. + + Ways to test: + + ./configure CC=gcc CFLAGS=-Wunused-macros + ./configure CC=clang CFLAGS=-Weverything + ./configure CC=clang CFLAGS=-Weverything SKIP_WERROR_CHECK=yes + + configure.ac | 26 ++++++++++++++++++++++++++ + 1 file changed, 26 insertions(+) + +commit 3aca2daefbdedd7cc0fb75ddde6b714273b1cc1d +Author: Lasse Collin +Date: 2025-02-02 14:30:15 +0200 + + Update THANKS + + THANKS | 4 ++++ + 1 file changed, 4 insertions(+) + +commit 186ff78ab40ceb07cde139506cab42a927ca99d2 +Author: Lasse Collin +Date: 2025-02-01 12:49:09 +0200 + + Translations: Update Romanian translation + + po/ro.po | 12 ++++++------ + 1 file changed, 6 insertions(+), 6 deletions(-) + +commit 40a8ce3e10747ca5233610cc2cb704fc303c48e4 +Author: Lasse Collin +Date: 2025-01-30 18:16:43 +0200 + + Translations: Update Korean man page translations + + po4a/ko.po | 146 ++++++++++++++++++++++++------------------------------------- + 1 file changed, 56 insertions(+), 90 deletions(-) + +commit 1787f9bd18ea8798d64b636cdefe6d0fda9b8f72 +Author: Lasse Collin +Date: 2025-01-30 18:15:52 +0200 + + Translations: Add Italian man page translations + + po4a/it.po | 3876 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + po4a/po4a.conf | 2 +- + 2 files changed, 3877 
insertions(+), 1 deletion(-) + +commit 9b9182e561787a811fc0178489589f28c3e0174c +Author: Lasse Collin +Date: 2025-01-29 22:18:29 +0200 + + Translations: Update the Finnish translation + + po/fi.po | 13 +++++++------ + 1 file changed, 7 insertions(+), 6 deletions(-) + +commit 7d73ff7a9d8eab6270f0b1ff7d10c0aa6f5ba53f +Author: Lasse Collin +Date: 2025-01-29 20:50:03 +0200 + + lzmainfo: Use tuklib_mbstr_wrap for --help text + + Some languages have so long strings that they need to be wrapped. + + CMakeLists.txt | 4 ++++ + src/lzmainfo/Makefile.am | 2 ++ + src/lzmainfo/lzmainfo.c | 36 ++++++++++++++++++++++++++---------- + 3 files changed, 32 insertions(+), 10 deletions(-) + +commit c56eb4707627d700695813fccdddd1483eac4f21 +Author: Lasse Collin +Date: 2025-01-29 20:00:06 +0200 + + Translations: Update the Croatian translation + + po/hr.po | 926 ++++++++++++++++++++++++++++++++++++--------------------------- + 1 file changed, 529 insertions(+), 397 deletions(-) + +commit 69f4aec0a2442ab81f9ab66e5871a6546aefb0fc +Author: Lasse Collin +Date: 2025-01-29 19:56:01 +0200 + + Translations: Update the Finnish translation + + po/fi.po | 911 +++++++++++++++++++++++++++++++++------------------------------ + 1 file changed, 483 insertions(+), 428 deletions(-) + +commit d49dde33cf5f488bb38b1f57e172c4e3343fb383 +Author: Lasse Collin +Date: 2025-01-29 19:55:27 +0200 + + Translations: Update the German man page translations + + po4a/de.po | 147 +++++++++++++++++++++++-------------------------------------- + 1 file changed, 55 insertions(+), 92 deletions(-) + +commit 23b99fc4a1f35bec5d63ffd02b14cacbdce9fe3c +Author: Lasse Collin +Date: 2025-01-29 19:55:17 +0200 + + Translations: Update the German translation + + po/de.po | 825 +++++++++++++++++++++++++++++++++++---------------------------- + 1 file changed, 460 insertions(+), 365 deletions(-) + +commit 7edab2bde0606b42229d9c04fe664069e38de3fb +Author: Lasse Collin +Date: 2025-01-29 19:55:05 +0200 + + Translations: Update the Turkish 
translation + + po/tr.po | 892 +++++++++++++++++++++++++++++++++++---------------------------- + 1 file changed, 490 insertions(+), 402 deletions(-) + +commit fac4d0fa5277d7a1f621707621ee9516f0bdbac5 +Author: Lasse Collin +Date: 2025-01-29 19:54:36 +0200 + + Translations: Add the Dutch translation + + po/LINGUAS | 1 + + po/nl.po | 1268 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 1269 insertions(+) + +commit abe5092f24b55dde9f7f78fac1bf810bce173273 +Author: Lasse Collin +Date: 2025-01-29 19:53:50 +0200 + + Translations: Update the Georgian translation + + po/ka.po | 153 +++++++++++++++++++++++++++++++++++++++++++++++---------------- + 1 file changed, 115 insertions(+), 38 deletions(-) + +commit b97b23c78d8100eec363c3e999c511560366d347 +Author: Lasse Collin +Date: 2025-01-29 19:53:21 +0200 + + Translations: Update the Spanish translation + + po/es.po | 824 ++++++++++++++++++++++++++++++++++----------------------------- + 1 file changed, 450 insertions(+), 374 deletions(-) + +commit c68318cb49e0562bd22e88724ce85e76c6789a3a +Author: Lasse Collin +Date: 2025-01-29 19:53:06 +0200 + + Translations: Update the Korean translation + + po/ko.po | 785 +++++++++++++++++++++++++++++++++++++-------------------------- + 1 file changed, 460 insertions(+), 325 deletions(-) + +commit 153ee17f635962a474499f786ea1de1e1a2bb276 +Author: Lasse Collin +Date: 2025-01-29 19:52:42 +0200 + + Translations: Update the Romanian man page translations + + po4a/ro.po | 141 +++++++++++++++++++++++-------------------------------------- + 1 file changed, 54 insertions(+), 87 deletions(-) + +commit 6ed308197e1f9d6c7a5cfe5aae301e75544017c4 +Author: Lasse Collin +Date: 2025-01-29 19:51:59 +0200 + + Translations: Update the Romanian translation + + po/ro.po | 818 +++++++++++++++++++++++++++++++++++---------------------------- + 1 file changed, 461 insertions(+), 357 deletions(-) + +commit 06028803e19219f642aa9abddd3525c43594ec6c +Author: Lasse Collin +Date: 2025-01-29 
19:50:50 +0200 + + Translations: Update the Ukrainian man page translations + + po4a/uk.po | 142 +++++++++++++++++++++++-------------------------------------- + 1 file changed, 54 insertions(+), 88 deletions(-) + +commit 8cbaf896a65a53c1d1e7e2ffc80d6ea216b1e8df +Author: Lasse Collin +Date: 2025-01-29 19:50:26 +0200 + + Translations: Update the Ukrainian translation + + po/uk.po | 813 ++++++++++++++++++++++++++++++++++++--------------------------- + 1 file changed, 460 insertions(+), 353 deletions(-) + +commit 81c352907b8048b97d9868947026701a49f377ef +Author: Lasse Collin +Date: 2025-01-29 19:48:43 +0200 + + Translations: Update the Swedish translation + + po/sv.po | 847 ++++++++++++++++++++++++++++++++++----------------------------- + 1 file changed, 462 insertions(+), 385 deletions(-) + +commit 999ce263718a52ba74245c3e2a416ab11494d1b1 +Author: Lasse Collin +Date: 2025-01-28 16:33:32 +0200 + + tuklib_physmem: Clean up disabled code + + src/common/tuklib_physmem.c | 9 +-------- + 1 file changed, 1 insertion(+), 8 deletions(-) + +commit 4d7e7c9d94f7a5ad4931a5bbd6ed9d00173fa1ab +Author: Lasse Collin +Date: 2025-01-28 16:28:18 +0200 + + Windows: Avoid an error message on broken pipe + + Also make xz not process more input files after a broken pipe has + been detected. This matches the behavior on POSIX. If all files + are being written to standard output, trying with the next file is + pointless when it's known that standard output won't accept more data. + + xzdec already stopped after the first error. 
It does so with all + errors, so it differs from xz: + + $ xz -dc not_found_1 not_found_2 + xz: not_found_1: No such file or directory + xz: not_found_2: No such file or directory + + $ xzdec not_found_1 not_found_2 + xzdec: not_found_1: No such file or directory + + Reported-by: Vincent Torri + + src/xz/file_io.c | 13 +++++++++++++ + src/xzdec/xzdec.c | 11 ++++++++++- + 2 files changed, 23 insertions(+), 1 deletion(-) + +commit 95b638480aa8203e547c709c651f421c22db1718 +Author: Lasse Collin +Date: 2025-01-23 19:59:17 +0200 + + doc/SHA256SUMS: Add 5.6.4 and 5.7.1alpha + + doc/SHA256SUMS | 9 +++++++++ + 1 file changed, 9 insertions(+) + +commit cdae0df31e4c2dfb1e885941cd1998e5a2b6e39d +Author: Lasse Collin +Date: 2025-01-23 11:50:42 +0200 + + Bump version and soname for 5.7.1alpha + + src/liblzma/Makefile.am | 2 +- + src/liblzma/api/lzma/version.h | 2 +- + src/liblzma/liblzma_generic.map | 2 +- + src/liblzma/liblzma_linux.map | 2 +- + 4 files changed, 4 insertions(+), 4 deletions(-) + +commit 4d2af2c43bae25ef4ef9cd88304471d4859aa322 +Author: Lasse Collin +Date: 2025-01-23 11:48:43 +0200 + + Translations: Run po4a/update-po + + po4a/de.po | 64 +++++++++++++++++++++++++++++++++++++++++++++++++---------- + po4a/fr.po | 57 +++++++++++++++++++++++++++++++++++++++++++++++----- + po4a/ko.po | 64 +++++++++++++++++++++++++++++++++++++++++++++++++---------- + po4a/pt_BR.po | 57 +++++++++++++++++++++++++++++++++++++++++++++++----- + po4a/ro.po | 64 +++++++++++++++++++++++++++++++++++++++++++++++++---------- + po4a/uk.po | 64 +++++++++++++++++++++++++++++++++++++++++++++++++---------- + 6 files changed, 320 insertions(+), 50 deletions(-) + +commit ff0b825505e60e21b32e33c42f551c8f34ba393f +Author: Lasse Collin +Date: 2025-01-23 11:40:46 +0200 + + Add NEWS for 5.7.1alpha + + NEWS | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 107 insertions(+) + +commit f6cd3e3bfc8d1f5a76dd55170968bf4582b95baf +Author: Lasse Collin +Date: 2025-01-23 
11:40:46 +0200 + + Add NEWS for 5.6.4 + + NEWS | 45 +++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 45 insertions(+) + +commit b3af3297e4d6cf0eafb48155aa97bb06c82a9228 +Author: Lasse Collin +Date: 2025-01-23 11:40:46 +0200 + + NEWS: The security fix in 5.6.3 is known as CVE-2024-47611 + + NEWS | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +commit a04b9dd0c7c74fabd8c393d2dc68a221276d6e29 +Author: Lasse Collin +Date: 2025-01-22 16:55:09 +0200 + + windows/build.bash: Fix error message + + Fixes: 1ee716f74085223c8fbcae1d5a384e6bf53c0f6a + + windows/build.bash | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 4eae859ae8ad7072eaa74aeaee79a2c3c12c55cb +Author: Lasse Collin +Date: 2025-01-22 15:03:55 +0200 + + Windows: Disable MinGW-w64's stdio functions in size-optimized builds + + This only affects builds with UCRT. With legacy MSVCRT, the replacement + functions are always enabled. + + Omitting the MinGW-w64 replacements saves over 20 KiB per executable. + The downside is that --enable-small or XZ_SMALL=ON disables thousand + separator support in xz messages. If someone is OK with the slower + speed of slightly smaller builds, lack of thousand separators won't + matter. + + Don't override __USE_MINGW_ANSI_STDIO if it is already defined (via + CPPFLAGS or such method). + + src/common/sysdefs.h | 30 +++++++++++++++++++++--------- + src/xz/util.c | 6 +++++- + 2 files changed, 26 insertions(+), 10 deletions(-) + +commit a831bc185bdd44c06847eae8df2d35cc281f65da +Author: Lasse Collin +Date: 2025-01-20 16:44:27 +0200 + + liblzma: Add raw ARM64, RISC-V, and x86 BCJ filter APIs + + Put them behind the LZMA_UNSTABLE macro for now. + + These low-level special APIs might become useful in erofs-utils. 
+ + src/liblzma/api/lzma/bcj.h | 99 +++++++++++++++++++++++++++++++++++++++++ + src/liblzma/common/common.h | 2 + + src/liblzma/liblzma_generic.map | 10 +++++ + src/liblzma/liblzma_linux.map | 10 +++++ + src/liblzma/simple/arm64.c | 18 ++++++++ + src/liblzma/simple/riscv.c | 18 ++++++++ + src/liblzma/simple/x86.c | 24 ++++++++++ + 7 files changed, 181 insertions(+) + +commit 6f5cdd4534faf7db4b6c123651d6a606bc59b98c +Author: Lasse Collin +Date: 2025-01-20 16:31:49 +0200 + + xz: Unify a few strings with liblzma + + Avoid having both "%s: foo" and "foo" as translatable strings + so that translators don't need to handle it twice. + + src/xz/options.c | 11 ++++++----- + src/xz/util.c | 4 ++-- + 2 files changed, 8 insertions(+), 7 deletions(-) + +commit 713fdaa8b06a83f18b06811aba7b9bd7b7cbf1cb +Author: Lasse Collin +Date: 2025-01-20 16:31:49 +0200 + + xz: Translate error messages from lzma_str_to_filters() + + liblzma doesn't use gettext but the messages are included in xz.pot, + so xz can translate the messages. 
+ + src/xz/coder.c | 9 +++------ + 1 file changed, 3 insertions(+), 6 deletions(-) + +commit f2e2b267cab8d7aa0b0a58c325546ee5070c0028 +Author: Lasse Collin +Date: 2025-01-20 16:31:49 +0200 + + liblzma: Mark string conversion messages as translatable + + po/POTFILES.in | 1 + + src/liblzma/common/string_conversion.c | 96 ++++++++++++++++++++-------------- + 2 files changed, 59 insertions(+), 38 deletions(-) + +commit f49d7413d9a0d480ded6d448c1ef7475ae6cd1c9 +Author: Lasse Collin +Date: 2025-01-20 16:31:35 +0200 + + liblzma: Tweak a few error messages in lzma_str_to_filters() + + src/liblzma/common/string_conversion.c | 9 +++++---- + 1 file changed, 5 insertions(+), 4 deletions(-) + +commit da359c360e986b21cd8d7b888c6a80f56b9d49c7 +Author: Lasse Collin +Date: 2025-01-19 20:11:54 +0200 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit f032373561cefaf07f92ffe3fbc471ec6770456e +Author: Lasse Collin +Date: 2025-01-19 19:40:32 +0200 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 51f038f8cbd5d8a95954c05bfcbbc32f2a313615 +Author: Lasse Collin +Date: 2025-01-13 08:44:58 +0200 + + liblzma: memcmplen.h: Use 8-byte method on 64-bit unaligned archs + + Previously it was enabled only on x86-64 and ARM64 when also support + for unaligned access was detected or manually enabled at build time. + + In the default build configuration, the 8-byte method is now enabled + also on 64-bit RISC-V and 64-bit PowerPC (both endiannesses). It was + reported that on big endian POWER9, encoding time may reduce 12-13 %. + + This change only affects builds with GCC and Clang because the code + uses __builtin_ctzll or __builtin_clzll. + + Thanks to Marcus Comstedt for testing on POWER9.
+ + src/liblzma/common/memcmplen.h | 3 +-- + 1 file changed, 1 insertion(+), 2 deletions(-) + +commit 96336b0110d47756a9fd2a103fbf0a99e905fbed +Author: Lasse Collin +Date: 2025-01-12 13:06:17 +0200 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 150356207c8d6a3e0af465b676430d19d62f884c +Author: Lasse Collin +Date: 2025-01-12 12:59:20 +0200 + + liblzma: Fix the encoder breakage on big endian ARM64 + + When the 8-byte method was enabled for ARM64, a check for endianness + wasn't added. This broke the LZMA/LZMA2 encoder. Test suite caught it. + + Fixes: cd64dd70d5665b6048829c45772d08606f44672e + Co-authored-by: Marcus Comstedt + + src/liblzma/common/memcmplen.h | 9 +++++++-- + 1 file changed, 7 insertions(+), 2 deletions(-) + +commit b01b0958025a2da284b53a583f313f8140636cb5 +Author: Lasse Collin +Date: 2025-01-12 11:04:27 +0200 + + Windows: Update manifest comments about long UTF-8 filenames + + src/common/w32_application.manifest.comments.txt | 23 +++++++++++++++-------- + 1 file changed, 15 insertions(+), 8 deletions(-) + +commit 0dfc67d37ebb038be8a9b17b536d1b561d52e81a +Author: Lasse Collin +Date: 2025-01-12 10:47:58 +0200 + + Windows: Update build.bash and its README-Windows.txt to UCRT + + While MSVCRT builds are possible, UCRT works better with UTF-8. + A 32-bit build is included still but hopefully it's not actually + needed anymore. + + windows/README-Windows.txt | 17 ++++++++--------- + windows/build.bash | 20 ++++++++++++++------ + 2 files changed, 22 insertions(+), 15 deletions(-) + +commit 7b3eb2db6c4ba24b5eb438e58ab1ca57e14e59c2 +Author: Lasse Collin +Date: 2025-01-10 13:11:40 +0200 + + Translations: Update Serbian translation + + I rewrapped a few overlong lines. Those edits aren't in the + Translation Project. Automatic wrapping in the master branch + means that these strings need to be updated soon anyway. 
+ + po/sr.po | 346 ++++++++++++++++++++++----------------------------------------- + 1 file changed, 121 insertions(+), 225 deletions(-) + +commit 950da11ce09c90412dcbca29689575037640667a +Author: Lasse Collin +Date: 2025-01-08 19:26:29 +0200 + + Build: Use --sort=name in TAR_OPTIONS + + Use also LC_COLLATE=C to make the sorting locale-independent. + Sorting makes the file order reproducible. + + Makefile.am | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +commit 75d91d6b39ea3e2fae8f027dcec01be2dca9594d +Author: Lasse Collin +Date: 2025-01-08 19:08:08 +0200 + + xz: Workaround broken O_SEARCH in musl + + Testing with musl 1.2.5 and Linux 6.12, O_SEARCH doesn't result + in a file descriptor that works with fsync() although it should work. + See the added comment. + + The same issue affected gzip --synchronous: + + https://bugs.gnu.org/75405 + + Thanks to Paul Eggert. + + src/xz/file_io.c | 11 +++++++++++ + 1 file changed, 11 insertions(+) + +commit ea92eae122a3ccefa61087f84fd99b417fc9ee3c +Author: Lasse Collin +Date: 2025-01-07 21:34:33 +0200 + + Revert "xz: O_SEARCH cannot be used for fsync()" + + This reverts commit 4014e2479c7b0273f15bd0c9c017c5fe859b0d8f. + + POSIX-conforming O_SEARCH should allow fsync(). + + src/xz/file_io.c | 21 +++++++++++---------- + 1 file changed, 11 insertions(+), 10 deletions(-) + +commit 4014e2479c7b0273f15bd0c9c017c5fe859b0d8f +Author: Lasse Collin +Date: 2025-01-05 21:43:11 +0200 + + xz: O_SEARCH cannot be used for fsync() + + Opening a directory with O_SEARCH results in a file descriptor that can + be used with functions like openat(). Such a file descriptor cannot be + used with fsync(). Use O_RDONLY instead. + + In musl, O_SEARCH becomes Linux-specific O_PATH. A file descriptor + from O_PATH doesn't allow fsync(). + + Seems that it's not possible to fsync() a directory that has write + and search permissions but not read permission. 
+ + Fixes: 2a9e91d796d091740489d951fa7780525e4275f1 + + src/xz/file_io.c | 21 ++++++++++----------- + 1 file changed, 10 insertions(+), 11 deletions(-) + +commit ad2b57cb477b753293c25a01fc24c7f84ee523c2 +Author: Lasse Collin +Date: 2025-01-05 20:48:28 +0200 + + CI: Make ctest show errors from failed tests + + build-aux/ci_build.bash | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit c405264c031aceaf68dfd1546d6337afcebd48e5 +Author: Lasse Collin +Date: 2025-01-05 20:14:49 +0200 + + tuklib_mbstr_nonprint: Preserve the value of errno + + A typical use case is like this: + + printf("%s: %s\n", tuklib_mask_nonprint(filename), strerror(errno)); + + tuklib_mask_nonprint() may call mbrtowc() and malloc() which may modify + errno. If errno isn't preserved, the error message might be wrong if + a compiler decides to call tuklib_mask_nonprint() before strerror(). + + Fixes: 40e573305535960574404d2eae848b248c95ea7e + + src/common/tuklib_mbstr_nonprint.c | 17 ++++++++++++++--- + src/common/tuklib_mbstr_nonprint.h | 4 +++- + 2 files changed, 17 insertions(+), 4 deletions(-) + +commit 2a9e91d796d091740489d951fa7780525e4275f1 +Author: Lasse Collin +Date: 2025-01-05 20:14:49 +0200 + + xz: Use fsync() before deleting the input file, and add --no-sync + + xz's default behavior is to delete the input file after successful + compression or decompression (unless writing to standard output). + If the system crashes soon after the deletion, it is possible that + the newly written file has not yet hit the disk while the previous + delete operation might have. In that case neither the original file + nor the written file is available. + + Call fsync() on the file. On POSIX systems, sync also the directory + where the file was created. + + Add a new option --no-sync which disables fsync() usage. It can avoid + a (possibly significant) performance penalty when processing many + small files. 
+ It's fine to use --no-sync when one knows that the files + are easy to recreate or restore after a system crash. + + Using fsync() after every flush initiated by --flush-timeout was + considered. It wasn't implemented at least for now. + + - --flush-timeout is typically used when writing to stdout. If stdout + is a file, xz cannot (portably) sync the directory of the file. + One would need to create the output file first, sync the directory, + and then run xz with fsync() enabled. + + - If xz --flush-timeout output goes to a file, it's possible to use + a separate script to sync the file, for example, once per minute + while telling xz to flush more frequently. + + - Not supporting syncing with --flush-timeout was simpler. + + Portability notes: + + - On systems that lack O_SEARCH (like Linux), "xz dir/file" will now + fail if "dir" cannot be opened for reading. If "dir" still has + write and search permissions (like d-wx------ in "ls -l"), + previously xz would have been able to compress "dir/file" still. + Now it only works if using --no-sync (or --keep or --stdout). + + - <libgen.h> and dirname() should be available on all POSIX systems, + and aren't needed on non-POSIX systems. + + - fsync() is available on all POSIX systems. The directory syncing + could be changed to fdatasync() although at least on ext4 it + doesn't seem to make a performance difference in xz's usage. + fdatasync() would need a build system check to support (old) + special cases, for example, MINIX 3.3.0 doesn't have fdatasync() + and Solaris 10 needs -lrt. + + - On native Windows, _commit() is used to replace fsync(). Directory + syncing isn't done and shouldn't be needed. (In Cygwin, fsync() on + directories is a no-op.) + + - DJGPP has fsync() for files. ;-) + + Using fsync() was considered somewhere around 2009 and again in 2016 but + those times the idea was rejected. For comparison, GNU gzip 1.7 (2016) + added the option --synchronous which enables fsync().
+ + Co-authored-by: Sebastian Andrzej Siewior + Fixes: https://bugs.debian.org/814089 + Link: https://www.mail-archive.com/xz-devel@tukaani.org/msg00282.html + Closes: https://github.com/tukaani-project/xz/pull/151 + + src/xz/args.c | 14 ++++++ + src/xz/args.h | 2 +- + src/xz/file_io.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- + src/xz/file_io.h | 6 +++ + src/xz/message.c | 3 ++ + src/xz/sandbox.c | 5 ++- + src/xz/xz.1 | 24 ++++++++++- + 7 files changed, 177 insertions(+), 6 deletions(-) + +commit 2e28c7145747b3287283f13c9d2becd73a7c4a1f +Author: Lasse Collin +Date: 2024-12-27 09:15:50 +0200 + + xz: Use "goto" for error handling in io_open_dest_real() + + src/xz/file_io.c | 20 +++++++++----------- + 1 file changed, 9 insertions(+), 11 deletions(-) + +commit 75107217670a97b7b772833669d88c3c2f188e37 +Author: Lasse Collin +Date: 2025-01-05 12:10:05 +0200 + + liblzma: Always validate the first digit of a preset string + + lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The + call from str_to_filters() detects the string type from the first + character(s) and as a side-effect it validates the first digit of + the preset string. So this change makes no difference there. + + However, the call from parse_options() doesn't pre-validate the string. + parse_lzma12_preset() will return an invalid value which is passed to + lzma_lzma_preset() which safely rejects it. The bug still affects the + error message: + + $ xz --filters=lzma2:preset=X + xz: Error in --filters=FILTERS option: + xz: lzma2:preset=X + xz: ^ + xz: Unsupported preset + + After the fix: + + $ xz --filters=lzma2:preset=X + xz: Error in --filters=FILTERS option: + xz: lzma2:preset=X + xz: ^ + xz: Unsupported preset + + The ^ now correctly points to the X and not past it because the X itself + is the problematic character.
+ + Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203 + + src/liblzma/common/string_conversion.c | 4 ++++ + 1 file changed, 4 insertions(+) + +commit 52ff32433734d03befd85a5bf00fba77d6501455 +Author: Lasse Collin +Date: 2025-01-05 11:40:34 +0200 + + xz: Fix getopt_long argument type in --filters* + + Forgetting the argument (or not using = to separate the option from + the argument) resulted in lzma_str_to_filters() being called with NULL + as input string argument. The function handles it fine but xz passes + the NULL to printf() too: + + $ xz --filters + xz: Error in --filters=FILTERS option: + xz: (null) + xz: ^ + xz: Unexpected NULL pointer argument(s) to lzma_str_to_filters() + + Now it's correct: + + $ xz --filters + xz: option '--filters' requires an argument + + The --filters-help option doesn't take any arguments. + + Fixes: 9ded880a0221f4d1256845fc4ab957ffd377c760 + Fixes: d6af7f347077b22403133239592e478931307759 + Fixes: a165d7df1964121eb9df715e6f836a31c865beef + + src/xz/args.c | 22 +++++++++++----------- + 1 file changed, 11 insertions(+), 11 deletions(-) + +commit 2655c81b5e92278b0fd51f6537c1116f8349b02a +Author: Lasse Collin +Date: 2025-01-04 20:04:56 +0200 + + xzdec: Don't leave Landlock file descriptor open for no reason + + This fix is similar to 48ff3f06521ca326996ab9a04d1b342098960427. + + Fixes: d74fb5f060b76db709b50f5fd37490394e52f975 + + src/xzdec/xzdec.c | 2 ++ + 1 file changed, 2 insertions(+) + +commit 35df4c2bc0500e60ba9d0d163d37a6d110d6841e +Author: Lasse Collin +Date: 2025-01-04 20:02:18 +0200 + + xz: Make --single-stream imply --keep + + Suggested by xx on #tukaani on 2024-04-12. + + src/xz/args.c | 3 +++ + src/xz/xz.1 | 9 ++++++++- + 2 files changed, 11 insertions(+), 1 deletion(-) + +commit 6f412814a8019700248229ce972530159a0d9872 +Author: Lasse Collin +Date: 2025-01-04 19:57:07 +0200 + + Update AUTHORS + + The contributions have been rewritten. 
+ + AUTHORS | 2 +- + src/liblzma/check/crc32_arm64.h | 1 - + src/liblzma/check/crc32_fast.c | 1 - + src/liblzma/check/crc_common.h | 1 - + 4 files changed, 1 insertion(+), 4 deletions(-) + +commit 5651d153031a7ee2581cdba9bff658031826cb50 +Author: Lasse Collin +Date: 2025-01-04 15:02:16 +0200 + + xz: Avoid printf formats like %2$s + + It's a POSIX feature that isn't in standard C. It's not available on + Windows. Even MinGW-w64 with __USE_MINGW_ANSI_STDIO doesn't support + it even though it supports POSIX %'d for thousand separators. + + Gettext's <libintl.h> provides overrides for printf and other functions + which do support the %2$s formats. Translations use them. But xz should + work on Windows without <libintl.h> too. + + Fixes: 3e9177fd206d20d6d8acc7d203c25a9ae0549229 + + src/xz/message.c | 51 ++++++++++++++++++++++++++++++++------------------- + 1 file changed, 32 insertions(+), 19 deletions(-) + +commit 63b246c90e7677c617faab1d3f6fc5c643b5e7cf +Author: Lasse Collin +Date: 2025-01-04 14:41:37 +0200 + + tuklib_mbstr_wrap: Add printf format attribute + + It's supported by GCC 3.x already. + + src/common/tuklib_common.h | 7 +++++++ + src/common/tuklib_mbstr_wrap.h | 1 + + 2 files changed, 8 insertions(+) + +commit a7313c01d9b8db71ffb61dc1dd7c4ea928824b4b +Author: Lasse Collin +Date: 2025-01-04 13:44:12 +0200 + + xz: Translate a Windows-specific string + + Originally I thought that native Windows builds wouldn't be translated + but nowadays at least MSYS2 ships such binaries. + + src/xz/file_io.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 00eb6073c088be9e7516dfc00a13ef520827b57c +Author: Lasse Collin +Date: 2025-01-02 15:32:10 +0200 + + xz: Use my_landlock.h + + A slightly silly thing is that xz may now query the ABI version up to + three times. We could call my_landlock_ruleset_attr_forbid_all() only + once and cache the result but it didn't seem worth doing.
+ + CMakeLists.txt | 1 + + src/xz/sandbox.c | 72 ++++++++++---------------------------------------------- + 2 files changed, 13 insertions(+), 60 deletions(-) + +commit 0fc5a625d7cc4ad51fde9367de088b9ad3bd40f6 +Author: Lasse Collin +Date: 2025-01-02 15:32:10 +0200 + + xzdec: Use my_landlock.h + + CMakeLists.txt | 1 + + src/xzdec/xzdec.c | 34 ++++++---------------------------- + 2 files changed, 7 insertions(+), 28 deletions(-) + +commit 38cb8ec9fd70d25fca6b473de44cf61586238552 +Author: Lasse Collin +Date: 2025-01-02 15:32:10 +0200 + + Add my_landlock.h with helper functions to use Linux Landlock + + This supports up to Landlock ABI version 6. The current code in + xz and xzdec only support up to ABI version 4. + + src/Makefile.am | 1 + + src/common/my_landlock.h | 141 +++++++++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 142 insertions(+) + +commit 672da29bb3a209a727ae46c0df948d7eea69f2e2 +Author: Lasse Collin +Date: 2025-01-01 18:46:50 +0200 + + liblzma: Silence warnings from "clang -Wimplicit-fallthrough" + + src/liblzma/lzma/lzma_decoder.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 1a8a1ad9a1e3179ce267baa551fb17b30624b4dd +Author: Lasse Collin +Date: 2025-01-01 15:34:51 +0200 + + Build: Use -Wimplicit-fallthrough=5 when supported + + Now that we have the FALLTHROUGH macro, use the strictest mode with + GCC so that comment-based fallthrough markings are no longer accepted. + + In GCC, -Wextra includes -Wimplicit-fallthrough=3 and + -Wimplicit-fallthrough is the same as -Wimplicit-fallthrough=3. + Thus, the strict mode requires specifying -Wimplicit-fallthrough=5. + + Clang has -Wimplicit-fallthrough which is *not* enabled by -Wextra. + Clang doesn't have a variant that takes an argument. Thus we need + to check for -Wimplicit-fallthrough. Do it before checking for + -Wimplicit-fallthrough=5 so that the latter overrides the former + when using GCC. 
+ + CMakeLists.txt | 2 ++ + configure.ac | 2 ++ + 2 files changed, 4 insertions(+) + +commit 94adc996e45cc5cad9352cc3271d3a1a2f5c4c22 +Author: Lasse Collin +Date: 2025-01-01 15:30:50 +0200 + + Replace "Fall through" comments with FALLTHROUGH + + src/liblzma/common/alone_decoder.c | 3 +-- + src/liblzma/common/auto_decoder.c | 5 ++--- + src/liblzma/common/block_decoder.c | 6 ++---- + src/liblzma/common/block_encoder.c | 6 ++---- + src/liblzma/common/common.c | 2 +- + src/liblzma/common/file_info.c | 22 +++++++++------------- + src/liblzma/common/index_decoder.c | 9 +++------ + src/liblzma/common/index_encoder.c | 6 ++---- + src/liblzma/common/index_hash.c | 7 +++---- + src/liblzma/common/lzip_decoder.c | 14 +++++--------- + src/liblzma/common/stream_decoder.c | 16 ++++++---------- + src/liblzma/common/stream_decoder_mt.c | 25 +++++++++---------------- + src/liblzma/common/stream_encoder_mt.c | 10 ++++------ + src/liblzma/lzma/lzma2_encoder.c | 9 +++------ + src/xz/args.c | 2 +- + src/xz/list.c | 3 +-- + 16 files changed, 54 insertions(+), 91 deletions(-) + +commit f31c3a6647b5a5d056324a9c83e6b2c940ebec22 +Author: Lasse Collin +Date: 2025-01-01 15:08:51 +0200 + + sysdefs.h: Add FALLTHROUGH macro + + src/common/sysdefs.h | 9 +++++++++ + 1 file changed, 9 insertions(+) + +commit e34dbd6a0ae7a560a5508d51fc0bd142c5a320dc +Author: Lasse Collin +Date: 2025-01-01 15:06:15 +0200 + + xzdec: Fix language in a comment + + src/xzdec/xzdec.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 16821252c504071f5c2012e415e59cbf5fb79820 +Author: Lasse Collin +Date: 2025-01-02 13:35:48 +0200 + + Windows: Make NLS require UCRT and gettext-runtime >= 0.23.1 + + Also remove the recently-added workaround from tuklib_gettext.h. + Requiring a new enough gettext-runtime is cleaner. I guess it's + mostly MSYS2 where xz is built with translation support, so once + MSYS2 has Gettext >= 0.23.1, this requirement shouldn't be a problem + in practice. 
+ + CMakeLists.txt | 29 ++++++++++++++++++++++++++ + configure.ac | 29 ++++++++++++++++++++++++++ + src/common/tuklib_gettext.h | 51 --------------------------------------------- + 3 files changed, 58 insertions(+), 51 deletions(-) + +commit aa1807ed942579f700a08ab091b796cf04e31aec +Author: Lasse Collin +Date: 2025-01-02 11:52:17 +0200 + + windows/build-with-cmake.bat: Fix ENABLE_NLS to XZ_NLS + + Fixes: 29f77c7b707f2458fb047e77497354b195e05b14 + + windows/build-with-cmake.bat | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit ea21c76aa2406ba06ac154fe57741734c04f260f +Author: Lasse Collin +Date: 2024-12-30 11:21:57 +0200 + + Build: Use git log --pretty=medium when creating ChangeLog + + It's the default in git-log. Specifying it explicitly is good in case + a user has set format.pretty to a different value. + + Makefile.am | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 08050c0788ce5bac0ffd572e9784a2749c4a13df +Author: Lasse Collin +Date: 2024-12-30 10:51:33 +0200 + + Windows: Update MinGW-w64 + CMake instructions to recommend UCRT + + windows/INSTALL-MinGW-w64_with_CMake.txt | 38 +++++++++++++++++++------------- + 1 file changed, 23 insertions(+), 15 deletions(-) + +commit 653732bd6f06d8f465bf353bf6e1c16f1405b906 +Author: Lasse Collin +Date: 2024-12-30 10:51:26 +0200 + + xz man page: Describe the source file deletion in -z and -d options + + The DESCRIPTION section always explained it, and the OPTIONS section + only described the differences to the default behavior. However, new + users in a hurry may skip reading DESCRIPTION. The default behavior + is a bit dangerous, thus it's good to repeat in --compress and + --decompress docs that source file is removed after successful operation. 
+ + Fixes: https://github.com/tukaani-project/xz/issues/150 + + src/xz/xz.1 | 17 ++++++++++++++++- + 1 file changed, 16 insertions(+), 1 deletion(-) + +commit bb79f79b278fd4fb06a0bcd5ab3445c468f9baaf +Author: Lasse Collin +Date: 2024-12-27 21:52:28 +0200 + + Build: Set libtool -version-info so that it matches with CMake + + In the past, they haven't been in sync in development versions + although they (of course) have been in stable releases. + + src/liblzma/Makefile.am | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit cf54f70e14c218faf5019ffa2fa769ed73772ee8 +Author: Lasse Collin +Date: 2024-12-28 18:28:56 +0200 + + CMake/macOS: Use GNU Libtool compatible shared library versioning + + Because this increases the Mach-O compatibility_version, this commit + shouldn't cause any ABI compatibility trouble for existing CMake users + on macOS. This is assuming that they won't later downgrade to an older + liblzma version that was built with CMake before this commit. + + Meson allows customising the Mach-O versioning too. So the three + build systems can be configured to be compatible. + + CMakeLists.txt | 51 ++++++++++++++++++++++++++++++++++++++++++++++++--- + 1 file changed, 48 insertions(+), 3 deletions(-) + +commit 94e17916689d38bc09bf35e602ed6f6276034b59 +Author: Lasse Collin +Date: 2024-12-28 14:49:45 +0200 + + CMake: Edit a comment + + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 6b50590725aeae8a2aed06faa3238cb9f8771c1b +Author: Lasse Collin +Date: 2024-12-28 20:39:49 +0200 + + version.sh: Omit an unwanted dot from development versions + + It printed 5.7.0.alpha instead of 5.7.0alpha. 
+ + Fixes: e7a42cda7c827e016619e8cab15e2faf5d4181ae + + build-aux/version.sh | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit f7a248f56e94310a080051c4a709c08514fa48b1 +Author: Lasse Collin +Date: 2024-12-27 16:25:07 +0200 + + CMake: Remove a duplicate word from a comment + + CMakeLists.txt | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 8b7c55d148f4a9b3702207164e862437ddffad33 +Author: Lasse Collin +Date: 2024-12-27 16:23:12 +0200 + + INSTALL: Document CMAKE_DLL_NAME_WITH_SOVERSION + + INSTALL | 19 +++++++++++++++++++ + 1 file changed, 19 insertions(+) + +commit 260d5d36203955a7148ae1ab05d0931c942028d5 +Author: Lasse Collin +Date: 2024-12-26 21:27:18 +0200 + + xz: Fix comments + + src/xz/file_io.c | 4 ++-- + src/xz/file_io.h | 4 ++-- + 2 files changed, 4 insertions(+), 4 deletions(-) + +commit bf6da9a573a780cd1a7fb1728ef55d09e58dad11 +Author: Dexter Castor Döpping +Date: 2024-12-22 13:44:03 +0100 + + CMake: Disable unity builds project-wide + + liblzma and xz can't be compiled as a unity/jumbo build because of + redeclarations and type name reuse. The CMake documentation recommends + setting UNITY_BUILD to false in this case. + + This is especially important if we're compiled as a subproject and the + consumer wants to use CMAKE_UNITY_BUILD=ON for the rest of their code + base. + + Closes: https://github.com/tukaani-project/xz/pull/158 + + CMakeLists.txt | 6 ++++++ + 1 file changed, 6 insertions(+) + +commit f8c328eed1bf0a0168132025a52116b7735f894c +Author: Lasse Collin +Date: 2024-12-20 08:51:18 +0200 + + Windows: Workaround a UTF-8 issue in Gettext's libintl_setlocale() + + See the comment. In this package, locale is set at program startup and + not changed later, so the point (2) in the comment isn't a problem. 
+ + Fixes: 46ee0061629fb075d61d83839e14dd193337af59 + + src/common/tuklib_gettext.h | 51 +++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 51 insertions(+) + +commit 03533906093529701ba91081907d8977991997de +Author: Lasse Collin +Date: 2024-12-20 06:50:36 +0200 + + Revert "Windows: Use UTF-8 locale when active code page is UTF-8" + + This reverts commit 0d0b574cc45045d6150d397776340c068df59e2a. + + src/common/tuklib_gettext.h | 32 ++------------------------------ + 1 file changed, 2 insertions(+), 30 deletions(-) + +commit 4b319e05afef4eab2fbafb6223f25d128ec99fce +Author: Lasse Collin +Date: 2024-12-19 18:31:09 +0200 + + xzdec: Use setlocale() instead of tuklib_gettext_setlocale() + + xzdec isn't translated and doesn't need libintl on Windows even + when NLS is enabled, thus libintl_setlocale() cannot interfere + with the locale settings. Thus, standard setlocale() works perfectly. + + In the commit 78868b6e, the explanation in the commit message is wrong. + + Fixes: 78868b6ed63fa4c89f73e3dfed27abfb8b0d46db + + src/xzdec/xzdec.c | 9 +++------ + 1 file changed, 3 insertions(+), 6 deletions(-) + +commit 34b80e282ea76ec793eaedaef58a36c3913dec78 +Author: Lasse Collin +Date: 2024-12-19 19:36:15 +0200 + + Windows: Revert the setlocale(LC_ALL, ".UTF8") documentation + + Only leave the FindFirstFileA() notes from 20dfca81, reverting + the incorrect setlocale() notes. On Windows, Gettext's <libintl.h> + overrides setlocale() with libintl_setlocale() wrapper. I hadn't + noticed this, and thus my conclusions were wrong.
+ + Fixes: 20dfca8171dad4c64785ac61d5b68972c444877b + + src/common/w32_application.manifest.comments.txt | 21 +-------------------- + 1 file changed, 1 insertion(+), 20 deletions(-) + +commit 5794cda064ce980450eaa5a4e2c71bd317168ce4 +Author: Lasse Collin +Date: 2024-12-18 17:49:05 +0200 + + tuklib_mbstr_wrap: Silence a warning from Clang + + Fixes: ca529c3f41a4a19a59e2e252e6dd9255f130c634 + + src/common/tuklib_mbstr_wrap.c | 9 +++++++++ + 1 file changed, 9 insertions(+) + +commit 16c9796ef970ae349c54fef9a346e394d7cc4c75 +Author: Lasse Collin +Date: 2024-12-18 14:00:09 +0200 + + Update THANKS + + THANKS | 2 ++ + 1 file changed, 2 insertions(+) + +commit 3b5c8a1fcab385eed9cc95684223fddd7cf5a053 +Author: Lasse Collin +Date: 2024-12-18 14:00:09 +0200 + + Update TODO + + Fixes: 5f6dddc6c911df02ba660564e78e6de80947c947 + + TODO | 3 --- + 1 file changed, 3 deletions(-) + +commit 22a35e64ce3d331b668f15f858a7bb3da3acc78e +Author: Lasse Collin +Date: 2024-12-18 14:00:09 +0200 + + lzmainfo: Use tuklib_mbstr_nonprint + + CMakeLists.txt | 3 +++ + src/lzmainfo/Makefile.am | 1 + + src/lzmainfo/lzmainfo.c | 16 ++++++++++------ + 3 files changed, 14 insertions(+), 6 deletions(-) + +commit 03111595ee713e0f94fb4f4a19a15594d5149347 +Author: Lasse Collin +Date: 2024-12-18 14:00:09 +0200 + + xzdec: Use tuklib_mbstr_nonprint + + CMakeLists.txt | 3 +++ + src/xzdec/Makefile.am | 2 ++ + src/xzdec/xzdec.c | 15 +++++++++++---- + 3 files changed, 16 insertions(+), 4 deletions(-) + +commit d22f96921fd2f94d842f3cc2e5f729cb3cca5122 +Author: Lasse Collin +Date: 2024-12-18 14:00:09 +0200 + + xz: Use tuklib_mbstr_nonprint + + Call tuklib_mask_nonprint() on filenames and also on a few other + strings from the command line too. + + The filename printed by "xz --robot --list" (in list.c) is also masked. + It's good to get rid of tabs and newlines which would desync the output + but masking other chars wouldn't be strictly necessary. 
It might matter + with sensible filenames if LC_CTYPE is "C" (when iswprint() might reject + non-ASCII chars) and a script wants to read a filename from xz's output. + Hopefully it's an unusual enough corner case to not be a real problem. + + CMakeLists.txt | 2 ++ + src/xz/Makefile.am | 1 + + src/xz/coder.c | 19 ++++++++----- + src/xz/file_io.c | 81 ++++++++++++++++++++++++++++++++++-------------------- + src/xz/list.c | 32 +++++++++++++-------- + src/xz/main.c | 10 +++++-- + src/xz/message.c | 8 ++++-- + src/xz/options.c | 10 ++++--- + src/xz/private.h | 1 + + src/xz/suffix.c | 12 ++++---- + 10 files changed, 113 insertions(+), 63 deletions(-) + +commit 40e573305535960574404d2eae848b248c95ea7e +Author: Lasse Collin +Date: 2024-12-18 14:00:09 +0200 + + Add tuklib_mbstr_nonprint to mask non-printable characters + + Malicious filenames or other untrusted strings may affect the state of + the terminal when such strings are printed as part of (error) messages. + Add functions that mask such characters. + + It's not enough to handle only single-byte control characters. + In multibyte locales, some control characters are multibyte too, for + example, terminals interpret C1 control characters (U+0080 to U+009F) + that are two bytes as UTF-8. + + Instead of checking for control characters with iswcntrl(), this + uses iswprint() to detect printable characters. This is much stricter. + On Windows it's actually too strict as it rejects some characters that + definitely are printable. + + Gnulib's quotearg would do a lot more but I hope this simpler method + is good enough here. + + Thanks to Ryan Colyer for the discussion about the problems of + the earlier single-byte-only method. + + Thanks to Christian Weisgerber for reporting a bug in an earlier + version of this code. + + Thanks to Jeroen Roovers for a typo fix. 
+ + Closes: https://github.com/tukaani-project/xz/pull/118 + + src/Makefile.am | 2 + + src/common/tuklib_mbstr_nonprint.c | 151 +++++++++++++++++++++++++++++++++++++ + src/common/tuklib_mbstr_nonprint.h | 69 +++++++++++++++++ + 3 files changed, 222 insertions(+) + +commit 36190c8c4bb13d1eab84a30f3650a5ec5ff0e402 +Author: Lasse Collin +Date: 2024-12-18 11:33:09 +0200 + + Translations: Add preliminary Georgian translation + + Most of the auto-wrapped strings are translated already. A few + strings have changed since this was created though. This file + isn't in the Translation Project *yet* because these strings + are still very new. + + Closes: https://github.com/tukaani-project/xz/pull/145 + + po/LINGUAS | 1 + + po/ka.po | 1186 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 1187 insertions(+) + +commit 4a0c4f92b820b84ace625a95305a9d56cb662f4e +Author: Lasse Collin +Date: 2024-10-30 20:50:20 +0200 + + xz: Make one string simpler for translators + + Leading spaces in the string can get miscounted by translators. + + src/xz/list.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 3fcf547e926f6c0414b23459f7b43164f7e8c378 +Author: Lasse Collin +Date: 2024-12-17 10:26:10 +0200 + + lzmainfo: Sync the translatable strings with xz + + src/lzmainfo/lzmainfo.c | 20 ++++++++++++-------- + 1 file changed, 12 insertions(+), 8 deletions(-) + +commit 3e9177fd206d20d6d8acc7d203c25a9ae0549229 +Author: Lasse Collin +Date: 2024-12-17 10:26:10 +0200 + + xz: Use automatic word wrapping for help texts + + --long-help is now one line longer because --lzma1 is now on its + own line. + + CMakeLists.txt | 2 + + src/xz/Makefile.am | 3 +- + src/xz/message.c | 482 ++++++++++++++++++++++++++++++++++------------------- + 3 files changed, 313 insertions(+), 174 deletions(-) + +commit a0eecc9eb23ac583ccf442de3f5c106d4b09482d +Author: Lasse Collin +Date: 2024-12-16 18:46:45 +0200 + + po/Makevars: Add --keyword=W_:... 
to XGETTEXT_OPTIONS + + The text was copied from tuklib_gettext.h. + + Also rearrange the --keyword options to be last on the line. + + po/Makevars | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit ca529c3f41a4a19a59e2e252e6dd9255f130c634 +Author: Lasse Collin +Date: 2024-12-16 18:43:52 +0200 + + Add tuklib_mbstr_wrap for automatic word wrapping + + Automatic word wrapping makes translators' work easier and reduces + errors like misaligned columns or overlong lines. Right-to-left + languages and languages that don't use spaces between words will + still need extra effort. (xz hasn't been translated to any RTL + language so far.) + + cmake/tuklib_mbstr.cmake | 4 + + m4/tuklib_mbstr.m4 | 2 +- + src/Makefile.am | 2 + + src/common/tuklib_gettext.h | 11 ++ + src/common/tuklib_mbstr_wrap.c | 285 +++++++++++++++++++++++++++++++++++++++++ + src/common/tuklib_mbstr_wrap.h | 203 +++++++++++++++++++++++++++++ + 6 files changed, 506 insertions(+), 1 deletion(-) + +commit 314b83cebad0244a0015a8abc6d8d086b581c215 +Author: Lasse Collin +Date: 2024-12-17 17:57:18 +0200 + + Build: Sort filenames to ASCII order in Makefile.am + + src/Makefile.am | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit df399c52554dfdf60259ca2cce97adbcfff39dc0 +Author: Lasse Collin +Date: 2024-10-21 18:51:24 +0300 + + tuklib_mbstr_width: Add tuklib_mbstr_width_mem() + + It's a new function split from tuklib_mbstr_width(). + It's useful with partial strings that aren't terminated with \0. 
+ + src/common/tuklib_mbstr.h | 17 +++++++++++++++++ + src/common/tuklib_mbstr_width.c | 8 ++++++++ + 2 files changed, 25 insertions(+) + +commit 51081efae4c52c226e96da95313916eba99f885f +Author: Lasse Collin +Date: 2024-12-16 20:08:27 +0200 + + tuklib_mbstr_width: Update a comment about shift states + + src/common/tuklib_mbstr_width.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + +commit 7ff1b0ac53866877bdfd79acf5fee0269058c58b +Author: Lasse Collin +Date: 2024-10-21 18:47:56 +0300 + + tuklib_mbstr_width: Don't mention shift states in the API docs + + It is assumed that this code won't be used with charsets that use + locking shift states. + + src/common/tuklib_mbstr.h | 8 ++------ + 1 file changed, 2 insertions(+), 6 deletions(-) + +commit 3c16105936320e4095dbe84fa9a33a4a6d46a597 +Author: Lasse Collin +Date: 2024-10-21 18:41:41 +0300 + + tuklib_mbstr_width: Use stricter return value checking + + This should make no difference in practice (at least if mbrtowc() + isn't broken). + + src/common/tuklib_mbstr_width.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit b797c44c42ea54fe1c52722a2fca0c9618575598 +Author: Lasse Collin +Date: 2024-12-16 20:06:07 +0200 + + tuklib_mbstr_width: Change the behavior when wcwidth() is not available + + If wcwidth() isn't available (Windows), previously it was assumed + that one byte == one column in the terminal. Now it is assumed that + one multibyte character == one column. This works better with UTF-8. + Languages that only use single-width characters without any combining + characters should work correctly with this. + + In xz, none of po/*.po contain combining characters and only ko.po, + zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only" + those three translations in xz are broken on Windows with the + UTF-8 code page. Broken means that column headings in xz -lvv and + (only in the master branch) strings in --long-help are misaligned, + so it's not a huge problem. 
I don't know if those three languages + displayed perfectly before the UTF-8 change because I hadn't tested + translations with native Windows builds before. + + Fixes: 46ee0061629fb075d61d83839e14dd193337af59 + + src/common/tuklib_mbstr_width.c | 13 +++++++++++-- + 1 file changed, 11 insertions(+), 2 deletions(-) + +commit 78868b6ed63fa4c89f73e3dfed27abfb8b0d46db +Author: Lasse Collin +Date: 2024-12-18 14:23:13 +0200 + + xzdec: Use setlocale() via tuklib_gettext_setlocale() + + xzdec isn't translated and didn't have locale-specific behavior + in the past. On Windows with UTF-8 in the application manifest, + setting the locale makes a difference though: + + - Without any setlocale() call, non-ASCII filenames don't display + properly in Command Prompt unless one first uses "chcp 65001" + to set the console code page to UTF-8. + + - setlocale(LC_ALL, "") is enough to make non-ASCII filenames + print correctly in Command Prompt without using "chcp 65001", + assuming that the non-UTF-8 code page (like 850) supports + those non-ASCII characters. + + - setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and + such functions use an UTF-8 locale instead of a legacy code page. + The tuklib_gettext_setlocale() macro takes care of this (without + enabling any translations). + + Fixes: 46ee0061629fb075d61d83839e14dd193337af59 + + src/xzdec/xzdec.c | 12 ++++++++++++ + 1 file changed, 12 insertions(+) + +commit 0d0b574cc45045d6150d397776340c068df59e2a +Author: Lasse Collin +Date: 2024-12-17 14:59:37 +0200 + + Windows: Use UTF-8 locale when active code page is UTF-8 + + XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611. + This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus + non-ASCII characters from translations became mojibake. 
+ + Fixes: 46ee0061629fb075d61d83839e14dd193337af59 + + src/common/tuklib_gettext.h | 32 ++++++++++++++++++++++++++++++-- + 1 file changed, 30 insertions(+), 2 deletions(-) + +commit 20dfca8171dad4c64785ac61d5b68972c444877b +Author: Lasse Collin +Date: 2024-12-17 15:01:29 +0200 + + Windows: Document the need for setlocale(LC_ALL, ".UTF8") + + Also warn about unpaired surrogates and (somewhat UTF-8-specific) + MAX_PATH issue in FindFirstFileA(). + + Fixes: 46ee0061629fb075d61d83839e14dd193337af59 + + src/common/w32_application.manifest.comments.txt | 28 +++++++++++++++++++++++- + 1 file changed, 27 insertions(+), 1 deletion(-) + +commit 4e936f234056e5831013ed922145b666b04bb1e3 +Author: Lasse Collin +Date: 2024-12-18 14:12:22 +0200 + + xzdec: Call tuklib_progname_init() early enough + + If the early pledge() call on OpenBSD fails, it calls my_errorf() + which requires the "progname" variable. + + Fixes: d74fb5f060b76db709b50f5fd37490394e52f975 + + src/xzdec/xzdec.c | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit 61feaf681bd793dc5c919732b44bca7dcf2ed1b8 +Author: Lasse Collin +Date: 2024-12-15 19:08:32 +0200 + + CMake: Bump maximum policy version to 3.31 + + With CMake 3.31, there were a few warnings from + CMP0177 "install() DESTINATION paths are normalized". + These occurred because the install(FILES) command in + my_install_man_lang() is called with a DESTINATION path + that contains two consecutive slashes, for example, + "share/man//man1". Such a path is for the English man pages. + With translated man pages, the language code goes between + the slashes. The warning was probably triggered because the + extra slash gets removed by the normalization. 
+ + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit b0bb84dd7bbdcc85243386a0051c7b2cb5fc6a18 +Author: Lasse Collin +Date: 2024-12-15 18:35:27 +0200 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit bee0c044d30a6ad3b3d94901c27e7519f6f46e27 +Author: Dexter Castor Döpping +Date: 2024-12-08 18:24:29 +0100 + + liblzma: Fix incorrect macro name in a comment + + Fixes: 33b8a24b6646a9dbfd8358405aec466b13078559 + Closes: https://github.com/tukaani-project/xz/pull/155 + + src/liblzma/api/lzma/lzma12.h | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 2cfa1ad0a9eb62b1847cf13f9aee290158978a3a +Author: Lasse Collin +Date: 2024-12-17 10:36:43 +0200 + + license-check.sh: Add an exception for doc/SHA256SUMS + + Fixes: 36b531022f24a2ab57a2dfb9e5052f1c176e9d9a + + build-aux/license-check.sh | 1 + + 1 file changed, 1 insertion(+) + +commit 36b531022f24a2ab57a2dfb9e5052f1c176e9d9a +Author: Lasse Collin +Date: 2024-12-01 21:38:17 +0200 + + doc/SHA256SUMS: Add the list of SHA-256 hashes of release files + + The release files are signed but verifying the signatures cannot + catch certain types of attacks: + + 1. A malicious maintainer could make more than one variant of + a package. One could be for general distribution. Another + with malicious content could be targeted to specific users, + for example, distributing the malicious version on a mirror + controlled by the attacker. + + 2. If the signing key of an honest maintainer was compromised + without being detected, a similar situation as described + above could occur. + + SHA256SUMS could be put on the project website but having it in + the Git repository makes it obvious that old lines aren't modified + when the file is updated. + + Hashes of uncompressed files are included too. This way tarballs + can be recompressed and the hashes can still be verified. 
+ + .gitattributes | 1 + + doc/SHA256SUMS | 218 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 219 insertions(+) + +commit fe9e66993fdbcc2981c7361b9b034a451eb0fc42 +Author: Lasse Collin +Date: 2024-11-30 12:05:59 +0200 + + Docs: Remove .github/SECURITY.md + + One of the reasons to have this file in the xz repository was to + show vulnerability reporting info in the Security section on GitHub. + On 2024-11-25, I added SECURITY.md to the tukaani-project organization + on GitHub: + + https://github.com/tukaani-project/.github/blob/main/SECURITY.md + + GitHub shows that file in all projects in the organization unless + overridden by a project-specific SECURITY.md. Thus, removing + the file from the xz repo makes GitHub show the organization-wide + text instead. + + Maintaining a single copy for the whole GitHub organization makes + things simpler. It's also nicer to have fewer GitHub-specific files + in the xz repo. Information how to report bugs (including security + issues) is available in README and on the home page too. + + The OpenSSF Scorecard tool didn't find .github/SECURITY.md from the + xz repository. There was a suggestion to move the file to the top-level + directory where Scorecard should find it. However, Scorecard does find + the organization-wide SECURITY.md. 
Thus, the file isn't needed in the + xz repository to score points in the Scorecard game: + + https://scorecard.dev/viewer/?uri=github.com/tukaani-project/xz + + Closes: https://github.com/tukaani-project/xz/issues/148 + Closes: https://github.com/tukaani-project/xz/pull/149 + + .github/SECURITY.md | 14 -------------- + 1 file changed, 14 deletions(-) + +commit b36177273602ebc83e9cc58517f63a7b6af33f70 +Author: Lasse Collin +Date: 2024-11-30 10:27:14 +0200 + + Translations: Update the Chinese (traditional) translation + + po/zh_TW.po | 201 +++++++++++++++++++++++++----------------------------------- + 1 file changed, 84 insertions(+), 117 deletions(-) + +commit c15115f7ede492f20c91b08ba485f9426f60233f +Author: Lasse Collin +Date: 2024-10-30 19:54:34 +0200 + + liblzma: Optimize the loop conditions in BCJ filters + + Compilers cannot optimize the addition "i + 4" away since theoretically + it could overflow. + + src/liblzma/simple/arm.c | 4 +++- + src/liblzma/simple/arm64.c | 4 +++- + src/liblzma/simple/armthumb.c | 7 ++++++- + src/liblzma/simple/ia64.c | 4 +++- + src/liblzma/simple/powerpc.c | 4 +++- + src/liblzma/simple/sparc.c | 5 +++-- + 6 files changed, 21 insertions(+), 7 deletions(-) + +commit 9f69e71e78621fd056f5eaaad7cdcd9279310fb5 +Author: Lasse Collin +Date: 2024-11-25 16:26:54 +0200 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 48ff3f06521ca326996ab9a04d1b342098960427 +Author: Mark Wielaard +Date: 2024-11-25 12:28:44 +0200 + + xz: Landlock: Fix a file descriptor leak + + src/xz/sandbox.c | 1 + + 1 file changed, 1 insertion(+) + +commit dbca3d078ec581600600abebbb18769d3d713914 +Author: Sam James +Date: 2024-10-02 03:04:03 +0100 + + CI: update FreeBSD, NetBSD, OpenBSD, Solaris actions + + Checked the changes and they're all innocuous. This should hopefully + fix the "externally managed" pip error in these jobs that started + recently. 
+ + .github/workflows/freebsd.yml | 2 +- + .github/workflows/netbsd.yml | 2 +- + .github/workflows/openbsd.yml | 2 +- + .github/workflows/solaris.yml | 2 +- + 4 files changed, 4 insertions(+), 4 deletions(-) + +commit a94b85bea3f04d8c1f4e2e6f648a9a15bc6ce58f +Author: Lasse Collin +Date: 2024-10-01 12:17:39 +0300 + + Add NEWS for 5.6.3 + + NEWS | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 125 insertions(+) + +commit be4bf94446b6286a5dffdde85fc1d21448f4edff +Author: Lasse Collin +Date: 2024-10-01 14:49:41 +0300 + + cmake/tuklib_large_file_support.cmake: Add a missing include + + v5.2 didn't build with CMake. Other branches had + include(CMakePushCheckState) in top-level CMakeLists.txt + which made the build work. + + Fixes: 597f49b61475438a43a417236989b2acc968a686 + + cmake/tuklib_large_file_support.cmake | 1 + + 1 file changed, 1 insertion(+) + +commit 1ebbe915d4e0d877154261b5f8103719a6722975 +Author: Lasse Collin +Date: 2024-10-01 12:10:23 +0300 + + Update THANKS + + THANKS | 2 ++ + 1 file changed, 2 insertions(+) + +commit 74702ee00ecfd080d8ab11118cd25dbe6c437ec0 +Author: Lasse Collin +Date: 2024-10-01 12:10:23 +0300 + + Tests/Windows: Add the application manifest to the test programs + + This ensures that the test programs get executed the same way as + the binaries that are installed. + + CMakeLists.txt | 14 ++++++++++---- + tests/Makefile.am | 10 ++++++++++ + tests/tests.cmake | 33 ++++++++++++++++++++++++++++++++- + tests/tests_w32res.rc | 18 ++++++++++++++++++ + 4 files changed, 70 insertions(+), 5 deletions(-) + +commit 7ddf2273e0e4654582ee65db19d44431bfdb5791 +Author: Lasse Collin +Date: 2024-10-01 12:10:23 +0300 + + license-check.sh: Add an exception for w32_application.manifest + + The file gets embedded as is into executables, thus it cannot + hold a license identifier. 
+ + build-aux/license-check.sh | 1 + + 1 file changed, 1 insertion(+) + +commit 46ee0061629fb075d61d83839e14dd193337af59 +Author: Lasse Collin +Date: 2024-10-01 12:10:23 +0300 + + Windows: Embed an application manifest in the EXE files + + IMPORTANT: This includes a security fix to command line tool + argument handling. + + Some toolchains embed an application manifest by default to declare + UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11 + to let the app access features newer than those of Vista. + + We want all the above but also two more things: + + - Declare that the app is long path aware to support paths longer + than 259 characters (this may also require a registry change). + + - Force the code page to UTF-8. This allows the command line tools + to access files whose names contain characters that don't exist + in the current legacy code page (except unpaired surrogates). + The UTF-8 code page also fixes security issues in command line + argument handling which can be exploited with malicious filenames. + See the new file w32_application.manifest.comments.txt. + + Thanks to Orange Tsai and splitline from DEVCORE Research Team + for discovering this issue. + + Thanks to Vijay Sarvepalli for reporting the issue to me. + + Thanks to Kelvin Lee for testing with MSVC and helping with + the required build system fixes. + + CMakeLists.txt | 18 +++ + src/Makefile.am | 4 +- + src/common/common_w32res.rc | 5 + + src/common/w32_application.manifest | 28 ++++ + src/common/w32_application.manifest.comments.txt | 178 +++++++++++++++++++++++ + 5 files changed, 232 insertions(+), 1 deletion(-) + +commit dad153091552b52a41b95ec4981c6951f1cae487 +Author: Lasse Collin +Date: 2024-09-29 14:46:52 +0300 + + Windows: Set DLL name accurately in StringFileInfo on Cygwin and MSYS2 + + Now the information in the "Details" tab in the file properties + dialog matches the naming convention of Cygwin and MSYS2. This + is only a cosmetic change. 
+ + src/liblzma/liblzma_w32res.rc | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +commit 8940ecb96fe9f0f2a9cfb8b66fe9ed31ffbea904 +Author: Lasse Collin +Date: 2024-09-25 15:47:55 +0300 + + common_w32res.rc: White space edits + + LANGUAGE and VS_VERSION_INFO begin new statements so put an empty line + between them. + + src/common/common_w32res.rc | 15 ++++++++------- + 1 file changed, 8 insertions(+), 7 deletions(-) + +commit c3b9dad07d3fd9319f88386b7095019bcea45ce1 +Author: Lasse Collin +Date: 2024-09-28 20:09:50 +0300 + + CMake: Add the resource files to the Cygwin and MSYS2 builds + + Autotools-based build has always done this so this is for consistency. + + However, the CMake build won't create the DEF file when building + for Cygwin or MSYS2 because in that context it should be useless. + (If Cygwin or MSYS2 is used to host building of normal Windows + binaries then the DEF file is still created.) + + CMakeLists.txt | 16 ++++++++++------ + 1 file changed, 10 insertions(+), 6 deletions(-) + +commit da4f275bd1c18b897e5c2dd0043546de3accce0a +Author: Lasse Collin +Date: 2024-09-28 15:19:14 +0300 + + CMake: Fix Windows resource file dependencies + + If common_w32res.rc is modified, the resource files need to be rebuilt. + In contrast, the liblzma*.map files truly are link dependencies. + + CMakeLists.txt | 17 +++++++++-------- + 1 file changed, 9 insertions(+), 8 deletions(-) + +commit 1c673c0aac7f7dee8dda2c1140351c8417a71e47 +Author: Lasse Collin +Date: 2024-09-29 01:20:03 +0300 + + CMake: Checking for CYGWIN covers MSYS2 too + + On MSYS2, both CYGWIN and MSYS are set. 
+ + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 6aaa0173b839e28429d43a8b62d257ad2f3b4521 +Author: Lasse Collin +Date: 2024-09-28 09:37:30 +0300 + + Translations: Add the SPDX license identifier to pt_BR.po + + po/pt_BR.po | 2 ++ + 1 file changed, 2 insertions(+) + +commit dc7b9f24b737e4e55bcbbdde6754883f991c2cfb +Author: Lasse Collin +Date: 2024-09-25 16:41:37 +0300 + + Windows/CMake: Use the correct resource file for lzmadec.exe + + CMakeLists.txt was using xzdec_w32res.rc for both xzdec and lzmadec. + + Fixes: 998d0b29536094a89cf385a3b894e157db1ccefe + + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit b834ae5f80911a3819d6cdb484f61b257174c544 +Author: Lasse Collin +Date: 2024-09-25 21:29:59 +0300 + + Translations: Update the Brazilian Portuguese translation + + po/pt_BR.po | 144 ++++++++++++++++++++++-------------------------------------- + 1 file changed, 53 insertions(+), 91 deletions(-) + +commit eceb023d4c129fd63ee881a2d8696eaf52ad1532 +Author: Lasse Collin +Date: 2024-09-17 01:21:15 +0300 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 76cfd0a9bb33ae8e534b1f73f6359dc825589f2f +Author: Tobias Stoeckmann +Date: 2024-09-16 23:19:46 +0200 + + lzmainfo: Avoid integer overflow + + The MB output can overflow with huge numbers. Most likely these are + invalid .lzma files anyway, but let's avoid garbage output. + + lzmadec was adapted from LZMA Utils. The original code with this bug + was written in 2005, over 19 years ago. + + Co-authored-by: Lasse Collin + Closes: https://github.com/tukaani-project/xz/pull/144 + + src/lzmainfo/lzmainfo.c | 5 ++--- + 1 file changed, 2 insertions(+), 3 deletions(-) + +commit 78355aebb7fb654302e5e33692ba109909dacaff +Author: Tobias Stoeckmann +Date: 2024-09-16 22:04:40 +0200 + + xzdec: Remove unused short option -M + + "xzdec -M123" exited with exit status 1 without printing + any messages. 
The "M:" entry should have been removed when + the memory usage limiter support was removed from xzdec. + + Fixes: 792331bdee706aa852a78b171040ebf814c6f3ae + Closes: https://github.com/tukaani-project/xz/pull/143 + [ Lasse: Commit message edits ] + + src/xzdec/xzdec.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit e5758db7bd75587a2499e0771907521a4aa86908 +Author: Lasse Collin +Date: 2024-09-10 13:54:47 +0300 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 80ffa38f56657257ed4d90d76f6bd2f2bcb8163c +Author: Firas Khalil Khana +Date: 2024-09-10 12:30:32 +0300 + + Build: Fix a typo in autogen.sh + + Fixes: e9be74f5b129fe8a5388d588e68b1b7f5168a310 + Closes: https://github.com/tukaani-project/xz/pull/141 + + autogen.sh | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 68c54e45d042add64a4cb44bfc87ca74d29b87e2 +Author: Lasse Collin +Date: 2024-09-02 20:08:40 +0300 + + Translations: Update Chinese (simplified) translation + + Differences to the zh_CN.po file from the Translation Project: + + - Two uses of \v were fixed. + + - Missing "OPTS" translation in --riscv[=OPTS] was copied from + previous lines. + + - "make update-po" was run to remove line numbers from comments. + + po/zh_CN.po | 102 ++++++++++++++++++++++++------------------------------------ + 1 file changed, 40 insertions(+), 62 deletions(-) + +commit 2230692aa1bcebb586100183831e3daf1714d60a +Author: Lasse Collin +Date: 2024-09-02 19:40:50 +0300 + + Translations: Update the Catalan translation + + Differences to the ca.po file from the Translation Project: + + - An overlong line translating --filters-help was wrapped. + + - "make update-po" was used to remove line numbers from the comments + to match the changes in fccebe2b4fd513488fc920e4dac32562ed3c7637 + and 093490b58271e9424ce38a7b1b38bcf61b9c86c6. xz.pot in the TP + is older than these commits. 
+ + po/ca.po | 171 ++++++++++++++++++++++++++------------------------------------- + 1 file changed, 69 insertions(+), 102 deletions(-) + +commit 3e7723ce26f74c71919984a6180504b4548cbb7e +Author: Lasse Collin +Date: 2024-08-22 14:06:16 +0300 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit d3e0e679b2b8b428598bb8ba56a17715190814db +Author: Lasse Collin +Date: 2024-08-22 14:06:16 +0300 + + CMake: Don't install lzmadec.1 symlinks if XZ_TOOL_LZMADEC=OFF + + Thanks-to: 榆柳松 (ZhengSen Wang) + Fixes: fb50c6ba1d4c9405e5b12b5988b01a3002638c5d + Closes: https://github.com/tukaani-project/xz/pull/134 + + CMakeLists.txt | 12 ++++++++++-- + 1 file changed, 10 insertions(+), 2 deletions(-) + +commit acdf21033abe347d9a279e9fe757f90ed16c1dbb +Author: Lasse Collin +Date: 2024-08-22 14:06:16 +0300 + + CMake: Fix the build when XZ_TOOL_LZMADEC=OFF + + Co-developed-by: 榆柳松 (ZhengSen Wang) + Fixes: fb50c6ba1d4c9405e5b12b5988b01a3002638c5d + Fixes: https://github.com/tukaani-project/xz/pull/134 + + CMakeLists.txt | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) + +commit 5e375987509fab484b7bef0b90be92f241c58c91 +Author: Lasse Collin +Date: 2024-08-22 11:01:07 +0300 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 6cd7c8607843c337edfe2c472aa316602a393754 +Author: Yifeng Li +Date: 2024-08-22 02:18:49 +0000 + + liblzma: Fix x86-64 movzw compatibility in range_decoder.h + + Support for instruction "movzw" without suffix in "GNU as" was + added in commit [1] and stabilized in binutils 2.27, released + in August 2016. 
Earlier systems don't accept this instruction + without a suffix, making range_decoder.h's inline assembly + unable to build on old systems such as Ubuntu 16.04, creating + error messages like: + + lzma_decoder.c: Assembler messages: + lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi' + lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi' + lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx' + lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi' + + Change "movzw" to "movzwl" for compatibility. + + [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2 + + Suggested-by: Lasse Collin + Tested-by: Yifeng Li + Signed-off-by: Yifeng Li + Fixes: 3182a330c1512cc1f5c87b5c5a272578e60a5158 + Fixes: https://github.com/tukaani-project/xz/issues/121 + Closes: https://github.com/tukaani-project/xz/pull/136 + + src/liblzma/rangecoder/range_decoder.h | 24 ++++++++++++------------ + 1 file changed, 12 insertions(+), 12 deletions(-) + +commit bf901dee5d4c46609645e50311c0cb2dfdcf9738 +Author: Lasse Collin +Date: 2024-07-19 20:02:43 +0300 + + Build: Comment that elf_aux_info(3) will be available on OpenBSD >= 7.6 + + CMakeLists.txt | 2 +- + configure.ac | 17 +++++++++++------ + 2 files changed, 12 insertions(+), 7 deletions(-) + +commit f7103c2c2a8fa51d1f308ba7387beeff20a0d4dd +Author: Lasse Collin +Date: 2024-07-19 19:42:26 +0300 + + Revert "liblzma: Add ARM64 CRC32 instruction support detection on OpenBSD" + + This reverts commit dc03f6290f5b9bd3d50c7e12e58dee870889d599. + + OpenBSD 7.6 will support elf_aux_info(3), and the detection code used + on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop + the OpenBSD-specific sysctl() method. + + Thanks to Christian Weisgerber. 
+ + CMakeLists.txt | 6 ------ + configure.ac | 9 --------- + src/liblzma/check/crc32_arm64.h | 15 --------------- + src/liblzma/check/crc_common.h | 1 - + 4 files changed, 31 deletions(-) + +commit 7c292dd0bf23cefcdf4b1509f3666322e08a7ede +Author: Lasse Collin +Date: 2024-07-13 22:10:37 +0300 + + liblzma: Tweak a comment + + src/liblzma/simple/arm64.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 6408edac5529d6ec0abf52794074f229c8362303 +Author: Lasse Collin +Date: 2024-07-11 22:17:56 +0300 + + CMake: Bump maximum policy version to 3.30 + + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 9231c39ffb518196d6664a86e5325e744621a21b +Author: Lasse Collin +Date: 2024-07-06 15:13:19 +0300 + + CMake: Require CMake 3.20 or later + + This allows a few cleanups. + + CMakeLists.txt | 78 ++++++++++++++++++++-------------------------------------- + 1 file changed, 27 insertions(+), 51 deletions(-) + +commit 028185dd4889e3d6235ff13560160ebca6985021 +Author: Lasse Collin +Date: 2024-07-09 14:27:51 +0300 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit baecfa142644eb5f5c6dd6f8e2f531c362fa3747 +Author: Lasse Collin +Date: 2024-07-06 14:04:48 +0300 + + xz: Remove the TODO comment about --recursive + + It won't be implemented. find + xargs is more flexible, for example, + it allows compressing small files in parallel. An example for that + has been included in the xz man page since 2010. 
+ + src/xz/args.c | 1 - + 1 file changed, 1 deletion(-) + +commit f691d58fae82bd815c5f86ffad10fe9b6b59dad8 +Author: Lasse Collin +Date: 2024-07-06 14:04:16 +0300 + + Document --disable-loongarch-crc32 in INSTALL + + INSTALL | 8 ++++++++ + 1 file changed, 8 insertions(+) + +commit b3e53122f42796aaebd767bab920cf7bedf69966 +Author: Lasse Collin +Date: 2024-07-03 20:45:48 +0300 + + CMake: Link xz against Threads::Threads if using pthreads + + The liblzma target was recently changed to link against Threads::Threads + with the PRIVATE keyword. I had forgotten that xz itself depends on + pthreads too due to pthread_sigmask(). Thus, the build broke when + building shared liblzma and pthread_sigmask() wasn't in libc. + + Thanks to Peter Seiderer for the bug report. + + Fixes: ac05f1b0d7cda1e7ae79775a8dfecc54601d7f1c + Fixes: https://github.com/tukaani-project/xz/issues/129#issuecomment-2204522994 + + CMakeLists.txt | 13 +++++++++++++ + 1 file changed, 13 insertions(+) + +commit 5742ec1fc7f2cf1c82cfe3477bb90594a4658374 +Author: Lasse Collin +Date: 2024-07-02 22:49:33 +0300 + + Update THANKS + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit 2d13d10357ecad243d7e4ff1de0e6b437c38a47a +Author: Lasse Collin +Date: 2024-07-02 20:23:35 +0300 + + CMake: Improve NLS error messages + + CMakeLists.txt | 11 +++++++---- + 1 file changed, 7 insertions(+), 4 deletions(-) + +commit 628d8d2c4fdf9e6a91c7bba7a743f400a94c2909 +Author: Lasse Collin +Date: 2024-07-02 20:19:47 +0300 + + CMake: Update the comment at the top of CMakeLists.txt + + While po/*.gmo files won't be used from the release tarball, + the generated translated man pages will be used still. Those + are text files and po4a has slightly more dependencies than + gettext tools so installing po4a might be a bit more challenging + in some situations. 
+ + CMakeLists.txt | 17 +++++++---------- + 1 file changed, 7 insertions(+), 10 deletions(-) + +commit b4b23c94fd4429abc663ced28d5cdc9cf7eb7507 +Author: Lasse Collin +Date: 2024-07-02 20:12:40 +0300 + + CMake: Drop support for pre-generated po/*.gmo files + + When a release tarball is created using Autotools, the tarball includes + po/*.gmo files which are binary files generated from po/*.po. Other + tarball creation methods don't and won't create the .gmo files. + + It feels clearer if CMake will never install pre-generated binary files + from the source package. If people are able to install CMake, they + likely are able to install gettext tools as well (assuming they want + translations). + + CMakeLists.txt | 66 +++++++++++++++++++--------------------------------------- + 1 file changed, 21 insertions(+), 45 deletions(-) + +commit fb99f8e8c50171b898cb79fe1dc703d5f91e4f0a +Author: Lasse Collin +Date: 2024-07-02 19:14:50 +0300 + + CMake: Make XZ_NLS handling more robust + + If a user set XZ_NLS=ON but find_package(Intl) failed or CMake version + wasn't at least 3.20, the configuration would fail in a cryptic way. + + If XZ_NLS is enabled, require that CMake is new enough and that either + gettext tools or pre-generated .gmo files are available. Otherwise fail + the configuration. Previously missing gettext tools and .gmo files would + only result in a warning. + + Missing man page translations are still only a warning. + + Thanks to Peter Seiderer for the bug report. 
+ + Fixes: https://github.com/tukaani-project/xz/issues/129 + Closes: https://github.com/tukaani-project/xz/pull/130 + + CMakeLists.txt | 82 ++++++++++++++++++++++++++++++++-------------------------- + 1 file changed, 46 insertions(+), 36 deletions(-) + +commit ec6157570ea8a8e38158894e530d35416ff6a0f8 +Author: Lasse Collin +Date: 2024-07-02 19:39:05 +0300 + + CI: Add gettext as a dependency to CMake builds + + .github/workflows/ci.yml | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 24f0f7e399de03bb2ff675d97b723d14f17ed6ac +Author: Lasse Collin +Date: 2024-07-02 18:43:56 +0300 + + CMake: Fix ENABLE_NLS comment too + + Fixes: 29f77c7b707f2458fb047e77497354b195e05b14 + + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit a0df0676130bc565af0ec911e68a1d0fbc3ed0fb +Author: Lasse Collin +Date: 2024-07-02 18:02:50 +0300 + + CMake: The compile definition is ENABLE_NLS, not XZ_NLS + + The CMake variables were renamed and accidentally also + the compile definition was renamed. As a result, translation + support wasn't actually enabled in the executables. + + Fixes: 29f77c7b707f2458fb047e77497354b195e05b14 + + CMakeLists.txt | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 45d08abc33ccc52d2f050dcec458badc2ce59d0b +Author: Lasse Collin +Date: 2024-07-01 17:33:20 +0300 + + Update AUTHORS and THANKS + + AUTHORS | 2 +- + THANKS | 1 + + 2 files changed, 2 insertions(+), 1 deletion(-) + +commit 7baf6835cfbf9c85ba37f9ffb7d4f87fb86a474e +Author: Xi Ruoyao +Date: 2024-06-28 13:36:43 +0300 + + liblzma: Speed up CRC32 calculation on 64-bit LoongArch + + The crc.w.{b/h/w/d}.w instructions in LoongArch can calculate the CRC32 + result for 1/2/4/8 bytes in a single operation. Using these is much + faster compared to the generic method. + + Optimized CRC32 is enabled unconditionally on 64-bit LoongArch because + the LoongArch specification says that CRC32 instructions shall be + implemented for 64-bit processors. 
Optimized CRC32 isn't enabled for + 32-bit LoongArch processors because not enough information is available + about them. + + Co-authored-by: Lasse Collin + + Closes: https://github.com/tukaani-project/xz/pull/86 + + CMakeLists.txt | 25 ++++++++++++++ + configure.ac | 40 +++++++++++++++++++++++ + src/liblzma/check/Makefile.inc | 3 +- + src/liblzma/check/crc32_fast.c | 2 ++ + src/liblzma/check/crc32_loongarch.h | 65 +++++++++++++++++++++++++++++++++++++ + src/liblzma/check/crc_common.h | 15 +++++++++ + 6 files changed, 149 insertions(+), 1 deletion(-) + +commit 0ed893668554fb0758003289f8a6af9bd08b89d1 +Author: Lasse Collin +Date: 2024-06-28 14:20:49 +0300 + + liblzma: ARM64 CRC32: Align the buffer faster + + Instead of doing it byte by byte, use the 1/2/4-byte CRC32 instructions. + + src/liblzma/check/crc32_arm64.h | 54 ++++++++++++++++++++++++++++++----------- + 1 file changed, 40 insertions(+), 14 deletions(-) + +commit 7e99856f66c07852c4e0de7aa01951e9147d86b0 +Author: Sam James +Date: 2024-06-28 14:18:35 +0300 + + CI: Speed up Valgrind job by using --trace-children-skip-by-arg=... + + This addresses the issue I mentioned in + 6c095a98fbec70b790253a663173ecdb669108c4 and speeds up the Valgrind + job a bit, because non-xz tools aren't run unnecessarily with + Valgrind by the script tests. + + .github/workflows/ci.yml | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 2402e8a1ae92676fa0d4cb1b761d7f62f005c098 +Author: Lasse Collin +Date: 2024-06-25 16:00:22 +0300 + + Build: Prepend, not append, PTHREAD_CFLAGS to LIBS + + It shouldn't make any difference because LIBS should be empty + at that point in configure. But prepending is the correct way + because in general the libraries being added might require other + libraries that come later on the command line. 
+ + configure.ac | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 7bb46f2b7b3989c1b589a247a251470f65e91cda +Author: Lasse Collin +Date: 2024-06-25 14:24:29 +0300 + + Build: Use AC_LINK_IFELSE to handle implicit function declarations + + It's more robust in case the compiler allows pre-C99 implicit function + declarations. If an x86 intrinsic is missing and gets treated as + an implicit function, the linking step will very probably fail. This + isn't the only way to work around implicit function declarations but + it might be the simplest and cleanest. + + The problem hasn't been observed in the wild. + + There are a couple more AC_COMPILE_IFELSE uses in configure.ac. + Of these, the Landlock check calls prctl() and in theory could have + the same problem. In practice it doesn't as the check program + looks for several other things too. However, it was changed to + AC_LINK_IFELSE still to look more correct. + + Similarly, m4/tuklib_cpucores.m4 and m4/tuklib_physmem.m4 were + updated although they haven't given any trouble either. They + have worked all these years because those check programs rely + on specific headers and types: if headers or types are missing, + compilation will fail. Using the linker makes these checks more + similar to the ones in cmake/tuklib_*.cmake which always link. + + configure.ac | 8 ++++++-- + m4/tuklib_cpucores.m4 | 8 ++++---- + m4/tuklib_physmem.m4 | 17 +++++++++++------ + 3 files changed, 21 insertions(+), 12 deletions(-) + +commit 35eb57355ad1c415a838d26192d5af84abb7cf39 +Author: Lasse Collin +Date: 2024-06-24 23:35:59 +0300 + + Build: Use AC_LINK_IFELSE instead of -Werror + + AC_COMPILE_IFELSE needed -Werror because Clang <= 14 would merely + warn about the unsupported attribute and implicit function declaration. + Changing to AC_LINK_IFELSE handles the implicit declaration because + the symbol __crc32d is unlikely to exist in libc. + + Note that the other part of the check is that #include + must work.
If the header is missing, most compilers give an error + and the linking step won't be attempted. + + Avoiding -Werror makes the check more robust in case CFLAGS contains + warning flags that break -Werror anyway (but this isn't the only check + in configure.ac that has this problem). Using AC_LINK_IFELSE also makes + the check more similar to how it is done in CMakeLists.txt. + + configure.ac | 12 +----------- + 1 file changed, 1 insertion(+), 11 deletions(-) + +commit 5a728813c378cc3c4c9c95793762452418d08f1b +Author: Lasse Collin +Date: 2024-06-24 23:34:34 +0300 + + Build: Sync the compile check changes from CMakeLists.txt + + It's nice to keep these in sync. The use of main() will later allow + AC_LINK_IFELSE usage too which may avoid the more fragile -Werror. + + configure.ac | 15 ++++++++------- + 1 file changed, 8 insertions(+), 7 deletions(-) + +commit 5279828635a95abdef82e691fc4979d362780e63 +Author: Lasse Collin +Date: 2024-06-24 20:14:43 +0300 + + CMake: Not experimental anymore + + While the CMake support has gotten a lot less testing than + the Autotools-based build, the supported features should now + be equal. The output may differ slightly, for example, + liblzma.pc may have + + Libs.private: -pthread -lpthread + + with Autotools on GNU/Linux. CMake doesn't put any options + in Libs.private because on modern glibc the pthread functions + are in libc. The options aren't required to link static + liblzma into an application. + + Autotools-based build doesn't generate or install + lib/cmake/liblzma-*.cmake files. This means that on most + platforms one cannot rely on + + find_package(liblzma 5.2.5 REQUIRED CONFIG) + + or such finding those files. + + CMakeLists.txt | 9 ++++++--- + 1 file changed, 6 insertions(+), 3 deletions(-) + +commit de215a0517645d16343f3a5336d3df884a4f665f +Author: Lasse Collin +Date: 2024-06-25 16:11:13 +0300 + + CMake: Use configure_file() to copy a file + + I had missed this simpler method before.
It does create a dependency + so that if .in.h changes the copying is done again. + + CMakeLists.txt | 17 +++++++---------- + 1 file changed, 7 insertions(+), 10 deletions(-) + +commit e620f35097c0ad20cd76d8258750aa706758ced9 +Author: Lasse Collin +Date: 2024-06-25 15:51:48 +0300 + + CMake: Always add pthread flags into CMAKE_REQUIRED_LIBRARIES + + It was weird to add CMAKE_THREAD_LIBS_INIT in CMAKE_REQUIRED_LIBRARIES + only if CLOCK_MONOTONIC is available. Alternative would be to remove + the thread libs from CMAKE_REQUIRED_LIBRARIES after the check for + pthread_condattr_setclock() but keeping the libs should be fine too. + Then it's ready in case more pthread functions were wanted some day. + + CMakeLists.txt | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) + +commit 068a70e54932ca32ca2922aff5a67a62615c650b +Author: Sam James +Date: 2024-06-24 19:25:30 +0100 + + CMake: Tweak comments + + Co-authored-by: Lasse Collin + + CMakeLists.txt | 15 +++++++-------- + 1 file changed, 7 insertions(+), 8 deletions(-) + +commit 3c95c93bca593bdd54ac5cc01526b12c82c78faa +Author: Lasse Collin +Date: 2024-06-24 22:42:01 +0300 + + CMake: Edit white space for consistency + + CMakeLists.txt | 26 +++++++++++++------------- + 1 file changed, 13 insertions(+), 13 deletions(-) + +commit 114cba69dbb96003e676c8c87a2e9943b12d065f +Author: Lasse Collin +Date: 2024-06-24 22:41:10 +0300 + + CMake: Fix three checks if building with -flto + + In CMake, check_c_source_compiles() always links too. With + link-time optimization, unused functions may get omitted if + main() doesn't depend on them. 
Consider the following which + tries to check if somefunction() is available when + has been included: + + #include + int foo(void) { return somefunction(); } + int main(void) { return 0; } + + LTO may omit foo() completely because the program as a whole doesn't + need it and then the program will link even if the symbol somefunction + isn't available in libc or other library being linked in, and then + the test may pass when it shouldn't. + + What happens if doesn't declare somefunction()? + Shouldn't the test fail in the compilation phase already? It should + but many compilers don't follow the C99 and later standards that + prohibit implicit function declarations. Instead such compilers + assume that somefunction() exists, compilation succeeds (with a + warning), and then linker with LTO omits the call to somefunction(). + + Change the tests so that they are part of main(). If compiler accepts + implicitly declared functions, LTO cannot omit them because it has to + assume that they might have side effects and thus linking will fail. + On the other hand, if the functions/intrinsics being used are supported, + they might get optimized away but in that case it's fine because they + really are supported. + + It is fine to use __attribute__((target(...))) for main(). At least + it works with GCC 4.9 to 14.1 on x86-64. 
+ + Reported-by: Sam James + + CMakeLists.txt | 19 ++++++++----------- + 1 file changed, 8 insertions(+), 11 deletions(-) + +commit 78e882205e1f1e91df2af2cb7da00fe205dede99 +Author: Lasse Collin +Date: 2024-06-24 21:19:14 +0300 + + CMake: Use MATCHES instead of multiple STREQUAL + + CMakeLists.txt | 11 ++++------- + 1 file changed, 4 insertions(+), 7 deletions(-) + +commit d3f20382fc1bd865eb70a65455d5022ed05caac8 +Author: Lasse Collin +Date: 2024-06-24 21:06:18 +0300 + + CMake: Improve the comment about LIBS + + CMakeLists.txt | 6 ++++++ + 1 file changed, 6 insertions(+) + +commit 33ec377729a3889e58d98934b2777b2754a3e045 +Author: Lasse Collin +Date: 2024-06-24 20:01:25 +0300 + + CMake: Fix a typo in a message + + It was spotted with codespell. + + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 2a47be823cd6c717bc91fa29c7710c9b1ae0331f +Author: Lasse Collin +Date: 2024-06-24 19:58:54 +0300 + + Document CMake options in INSTALL + + INSTALL | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----- + 1 file changed, 106 insertions(+), 9 deletions(-) + +commit 3faf4e8079a46bd46e05cd1234365724a6a33802 +Author: Lasse Collin +Date: 2024-06-24 17:18:44 +0300 + + CI: Don't omit crc32 from the list with CMake anymore + + XZ_CHECKS accepts it but works without too. + + build-aux/ci_build.bash | 10 +--------- + 1 file changed, 1 insertion(+), 9 deletions(-) + +commit 1bf83cded2955282fe1a868f08c83d4e5d6dca4a +Author: Lasse Collin +Date: 2024-06-24 17:39:54 +0300 + + CI: Workaround buggy config.guess on Ubuntu 22.04LTS and 24.04LTS + + Check for the wrong triplet from config.guess and override it with + the --build option on the configure command line. Then i386 assembly + autodetection will work. + + These Ubuntu versions (and as of writing, also Debian unstable) + ship config.guess version 2022-01-09 which contains a bug that + was fixed in version 2022-05-08. 
It results in a wrong configure + triplet when using CC="gcc -m32" to build i386 binaries. + + Upstream fix: + https://git.savannah.gnu.org/cgit/config.git/commit/?id=f56a7140386d08a531bcfd444d632b28c61a6329 + + More information: + https://mail.gnu.org/archive/html/config-patches/2022-05/msg00003.html + + build-aux/ci_build.bash | 9 +++++++++ + 1 file changed, 9 insertions(+) + +commit dbcdabf68fee9ed694b68c3a82e6adbeff20b679 +Author: Lasse Collin +Date: 2024-06-24 15:24:52 +0300 + + CI: Use CC="gcc -m32" to get i386 compiler on x86-64 + + The old method put it in CFLAGS which is a wrong place because + config.guess doesn't read CFLAGS. + + .github/workflows/ci.yml | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 0c1e6d900bac127464fb30a854776e1810ab5f16 +Author: Lasse Collin +Date: 2024-06-24 14:54:17 +0300 + + CI: Let CMake use the CC environment variable + + CC from environment is used to initialize CMAKE_C_COMPILER so + setting CMAKE_C_COMPILER explicitly isn't needed. + + The syntax in ci_build.bash was broken in case one wished to put + spaces in CC. 
+ + build-aux/ci_build.bash | 4 ---- + 1 file changed, 4 deletions(-) + +commit a3d6eb797c1bd9b0425ef6754e475e43e62bf075 +Author: Lasse Collin +Date: 2024-06-20 23:25:42 +0300 + + CMake: Add autodetection for 32-bit x86 CRC assembly usage + + CMakeLists.txt | 33 ++++++++++++++++++--------------- + 1 file changed, 18 insertions(+), 15 deletions(-) + +commit dbc14f213e5cf866f1f42b7c6381a91e1189908c +Author: Lasse Collin +Date: 2024-06-20 23:00:59 +0300 + + CMake: Move option(XZ_ASM_I386) downwards a few lines + + CMakeLists.txt | 16 ++++++++-------- + 1 file changed, 8 insertions(+), 8 deletions(-) + +commit e5c2b07b489b155c1bebd5cb5e5b94325c2fef1a +Author: Lasse Collin +Date: 2024-06-20 18:45:41 +0300 + + DOS: Update Makefile and config.h for the CRC changes + + dos/Makefile | 4 ++-- + dos/config.h | 3 +++ + 2 files changed, 5 insertions(+), 2 deletions(-) + +commit fe77c4e130d62dc3f9c1de40a18c0c6caa5a4d88 +Author: Lasse Collin +Date: 2024-06-23 15:35:35 +0300 + + liblzma: Tidy up crc_common.h + + Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with + the other ARM64-specific lines. That macro isn't used outside this + file. + + ARM64 CLMUL implementation doesn't exist yet and thus CRC64_ARM64_CLMUL + isn't used anywhere yet. + + It's not ideal that the single-letter CRC utility macros are here + as they pollute the namespace of the LZ encoder files. Those could + be moved to their own crc_macros.h like they were in 5.2.x but in practice + this is fine enough already. + + src/liblzma/check/crc_common.h | 62 ++++++++++++++++++++++++++++-------------- + 1 file changed, 42 insertions(+), 20 deletions(-) + +commit 7484d375384f551d475ff44a93590a225e0cb8f6 +Author: Lasse Collin +Date: 2024-06-23 14:22:08 +0300 + + liblzma: Move lzma_crcXX_table[][] declarations to crc_common.h + + LZ encoder needs lzma_crc32_table[0] but otherwise those tables + are private to the CRC code. In contrast, the other things in + check.h are needed in several places.
+ + src/liblzma/check/check.h | 18 ------------------ + src/liblzma/check/crc32_small.c | 3 +++ + src/liblzma/check/crc_common.h | 18 ++++++++++++++++++ + src/liblzma/lz/lz_encoder_hash.h | 4 ++-- + 4 files changed, 23 insertions(+), 20 deletions(-) + +commit 85b081f5d4598342b8c155a2c08697fb2adc372c +Author: Lasse Collin +Date: 2024-06-19 18:38:22 +0300 + + liblzma: Make 32-bit x86 CRC assembly co-exist with CLMUL + + Now runtime detection of CLMUL support can pick between the CLMUL and + the generic assembly implementations. Whatever overhead this has for + builds that omit CLMUL completely isn't important because builds for + any non-ancient system are likely to include the CLMUL code too. + + Handle the CRC tables in crcXX_fast.c files because now these files + are built even when assembly code is used. + + If 32-bit x86 assembly is enabled then it will always be built even + if compiler flags were such that CLMUL would be allowed unconditionally. + That is, runtime detection will be used anyway. This keeps the build + rules simpler. + + In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC + is used without runtime detection. Previously this wasn't needed + because crc32_table.c included the lzma_crc32_table[][] in the build + unless encoder support had been disabled. Including an 8 KiB table + was silly when only 1 KiB is actually used. So now liblzma is 7 KiB + smaller if CLMUL is enabled without runtime detection.
+ + CMakeLists.txt | 8 ++------ + src/liblzma/check/Makefile.inc | 8 ++------ + src/liblzma/check/crc32_fast.c | 14 ++++++++++++- + src/liblzma/check/crc32_table.c | 42 --------------------------------------- + src/liblzma/check/crc32_x86.S | 14 +++++-------- + src/liblzma/check/crc64_fast.c | 18 +++++++++++++---- + src/liblzma/check/crc64_table.c | 37 ---------------------------------- + src/liblzma/check/crc64_x86.S | 14 +++++-------- + src/liblzma/check/crc_common.h | 18 +++++++++-------- + src/liblzma/check/crc_x86_clmul.h | 5 ----- + src/liblzma/lz/lz_encoder.c | 2 +- + src/liblzma/lz/lz_encoder_hash.h | 30 ++++++++++++++++++++-------- + 12 files changed, 74 insertions(+), 136 deletions(-) + +commit 6667d503b5dc9826654e3d9ad505e1883ff6c388 +Author: Lasse Collin +Date: 2024-06-19 17:44:41 +0300 + + liblzma: CRC: Rename crcXX_generic to lzma_crcXX_generic + + This prepares for the possibility that lzma_crc32_generic and + lzma_crc64_generic are extern functions. + + src/liblzma/check/crc32_fast.c | 6 +++--- + src/liblzma/check/crc64_fast.c | 6 +++--- + 2 files changed, 6 insertions(+), 6 deletions(-) + +commit 1dca581ff20aa1cde61e9e5267d3aeb0af9b6845 +Author: Lasse Collin +Date: 2024-06-20 22:55:22 +0300 + + CMake: Define HAVE_CRC_X86_ASM when 32-bit x86 CRC assembly is used + + CMakeLists.txt | 3 +++ + 1 file changed, 3 insertions(+) + +commit f76837acb65676e541d8ee79cd62dbbf27280a62 +Author: Lasse Collin +Date: 2024-05-10 16:00:26 +0300 + + Build: Define HAVE_CRC_X86_ASM when 32-bit x86 CRC assembly is used + + This makes it easier to determine when the CRC tables are needed. 
+ + configure.ac | 9 +++++++-- + 1 file changed, 7 insertions(+), 2 deletions(-) + +commit 9ce0866b070850da4dc837741ff055faa218bdd6 +Author: Lasse Collin +Date: 2024-06-21 00:46:09 +0300 + + CI: Update to the new renamed options in CMakeLists.txt + + build-aux/ci_build.bash | 10 +++++----- + 1 file changed, 5 insertions(+), 5 deletions(-) + +commit 0232e66d5bc5b01a25a447c657e51747626488ab +Author: Lasse Collin +Date: 2024-06-20 18:12:22 +0300 + + CMake: Add XZ_EXTERNAL_SHA256 + + CMakeLists.txt | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--- + 1 file changed, 116 insertions(+), 5 deletions(-) + +commit 4535b80caead82a7ddf7feb988b8fbc773152522 +Author: Lasse Collin +Date: 2024-06-20 18:12:21 +0300 + + CMake: Move threading detection a few lines up + + It feels clearer this way, and when support for external SHA-256 + is added, this will keep the order of the library detection the + same as in configure.ac (check for pthreads before libmd) although + it shouldn't matter in practice. + + CMakeLists.txt | 176 ++++++++++++++++++++++++++++----------------------------- + 1 file changed, 88 insertions(+), 88 deletions(-) + +commit 94d062dbac34d366eb26625034200cc3457e6645 +Author: Lasse Collin +Date: 2024-06-20 18:12:21 +0300 + + CMake: Move the sandbox code out of the liblzma section + + Sandboxing is for the command line tools, not liblzma. + No functional changes. + + CMakeLists.txt | 214 ++++++++++++++++++++++++++++----------------------------- + 1 file changed, 107 insertions(+), 107 deletions(-) + +commit 75ce4797d49621710e6da95d8cb91541028c6d68 +Author: Lasse Collin +Date: 2024-06-20 18:12:21 +0300 + + CMake: Keep existing options in LIBS when adding -lrt + + This makes no difference yet because -lrt is currently the only option + that might be added to LIBS. 
+ + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 47aaa92516fd9609821d04e5e94ca6558e56d62b +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Don't install scripts if the xz tool isn't built + + The scripts need the xz tool. + + CMakeLists.txt | 11 +++++++++-- + tests/tests.cmake | 2 +- + 2 files changed, 10 insertions(+), 3 deletions(-) + +commit fb50c6ba1d4c9405e5b12b5988b01a3002638c5d +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Add XZ_TOOL_XZDEC and XZ_TOOL_LZMADEC + + CMakeLists.txt | 15 ++++++++++++++- + 1 file changed, 14 insertions(+), 1 deletion(-) + +commit def767f7d18ccbd81cd5e5b46c8b6031f3a1de34 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Add XZ_TOOL_LZMAINFO + + CMakeLists.txt | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +commit 5600e370fb7e11eafabc6c3ef5bf6510e859f4f0 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Add XZ_TOOL_XZ + + CMakeLists.txt | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +commit 6a3c4aaa43a90da441e1156c5ffd2e6098f5521f +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + Windows: Drop Visual Studio 2013 support + + This simplifies things a little. Building liblzma with VS2013 probably + still worked but building the command line tools was not supported. + + Microsoft ended support for VS2013 on 2024-04. 
+ + CMakeLists.txt | 9 +++++++-- + src/common/sysdefs.h | 6 +----- + windows/INSTALL-MSVC.txt | 8 ++------ + 3 files changed, 10 insertions(+), 13 deletions(-) + +commit 5d5c92b26246936461a635dda1f95740d7de2058 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Add XZ_TOOL_SCRIPTS + + CMakeLists.txt | 44 +++++++++++++++++++++++++++++--------------- + 1 file changed, 29 insertions(+), 15 deletions(-) + +commit d274a2bc00d235f07e96aaf82c149794cfe82b12 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Add XZ_DOC + + CMakeLists.txt | 45 ++++++++++++++++++++++++--------------------- + 1 file changed, 24 insertions(+), 21 deletions(-) + +commit 188143a50ade67253ed256608f50f78aa1380403 +Author: Lasse Collin +Date: 2024-06-20 21:53:03 +0300 + + CMake: Refactor XZ_SYMBOL_VERSIONING to match configure.ac + + Make the available options and their behavior match + --enable-symbol-versions in configure.ac. + + Don't enable symbol versions on Linux if not using glibc. Previously + the generic variant was selected on Microblaze or if using NVHPC + without checking that libc is glibc. + + Leave the cache variable to "auto" or "yes" if that was specified + instead of setting it to the autodetected value by default. A downside + is that one cannot easily see which variant the autodetection code + has selected. The same applies to XZ_SANDBOX and XZ_THREADS though. + + CMakeLists.txt | 125 ++++++++++++++++++++++++++++++++++----------------------- + 1 file changed, 75 insertions(+), 50 deletions(-) + +commit cc52ef8ed3b75a581262c587f6c06c213a550f86 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Use the same option list for XZ_THREADS as in configure.ac + + Also clarify that "yes" will fail if no threading support is found. + If no threading is wanted, it has to be disabled manually. + + configure.ac doesn't behave this way at the moment. Instead it + assumes pthreads to be present if not targeting Windows. 
If pthreads + actually are missing, the build fails later. + + CMakeLists.txt | 18 ++++++++++-------- + 1 file changed, 10 insertions(+), 8 deletions(-) + +commit 37f7af3452bab0a34ce320c2ad532835f18752d9 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Use the same option list for XZ_SANDBOX as in configure.ac + + It's simpler to document this way. + + CMakeLists.txt | 20 ++++++++++---------- + 1 file changed, 10 insertions(+), 10 deletions(-) + +commit c715dec8e800b65145918cfb0ee9bbc90faa8aad +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Fix indentation + + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit ea379f2f180befabd2039342db8eaeb757fdd2b7 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Add warning options for GCC and Clang + + The list was copied from configure.ac and should be kept in sync. + (Pretend that the deleted comment in CMakeLists.txt didn't exist.) + + There is no need to add equivalent of --enable-werror as CMake >= 3.24 + supports -DCMAKE_COMPILE_WARNING_AS_ERROR=ON. + + CMakeLists.txt | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++----- + 1 file changed, 59 insertions(+), 5 deletions(-) + +commit 74223338197b7dfcd69f56df78b6502805a75f23 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Use \040 instead of \x20 for a space + + This is for consistency with 4c81c9611f8b2e1ad65eb7fa166afc570c58607e + where \040 has to be used because \x20F gets interpreted as three hex + digits. Octal escapes are never longer than three digits.
+ + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit e8854b6bdc956c46dc4232bd07c17163034a00f2 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Add XZ_ASSUME_RAM + + CMakeLists.txt | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +commit e1127e75cb82e0385f02c995771d6fe1420f43c5 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename liblzma_INSTALL_CMAKEDIR to XZ_INSTALL_CMAKEDIR + + CMakeLists.txt | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit 96abfe98c15e431a50a6a31015c5bb05540ab2ff +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Refactor ADDITIONAL_CHECK_TYPES to XZ_CHECKS + + Now "crc32" is in the list too for completeness but it doesn't + actually have any effect. The description of the cache variable + says that "crc32 is always built" so it should be clear enough. + + CMakeLists.txt | 14 +++++++------- + tests/tests.cmake | 17 ++++++++--------- + 2 files changed, 15 insertions(+), 16 deletions(-) + +commit 679500ffe00ecb4f02292129e7529ab7392f3943 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename the cache variable POSIX_SHELL to XZ_POSIX_SHELL + + We still need the variable POSIX_SHELL for configure_file() + but it doesn't need to be a cache variable. 
+ + CMakeLists.txt | 7 ++++--- + 1 file changed, 4 insertions(+), 3 deletions(-) + +commit e5c0eb2e50e5522a0a55e7ba83fe49b04c8a6eef +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ENCODERS and DECODERS to use XZ_ prefix + + CMakeLists.txt | 34 +++++++++++++++++----------------- + tests/tests.cmake | 4 ++-- + 2 files changed, 19 insertions(+), 19 deletions(-) + +commit e7785e2061f95d44aa6c0856b09cc0fbad7d6154 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename MATCH_FINDERS to XZ_MATCH_FINDERS + + CMakeLists.txt | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit 63294806b488a27a28a0960f6a257695dd2b569a +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename SYMBOL_VERSIONING to XZ_SYMBOL_VERSIONING + + CMakeLists.txt | 9 +++++---- + 1 file changed, 5 insertions(+), 4 deletions(-) + +commit ad245b133675d285bca5d48123062e9d1e3f747e +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ENABLE_THREADS to XZ_THREADS + + CMakeLists.txt | 24 +++++++++++------------- + 1 file changed, 11 insertions(+), 13 deletions(-) + +commit 4250d4de32e66e558cc2ebe73b05255633c933ed +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ENABLE_SANDBOX to XZ_SANDBOX + + CMakeLists.txt | 23 +++++++++++------------ + 1 file changed, 11 insertions(+), 12 deletions(-) + +commit 0fdcd0c582f1a38542cd647dde449d9447d5888d +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ENABLE_X86_ASM to XZ_ASM_I386 + + CMakeLists.txt | 10 +++++----- + 1 file changed, 5 insertions(+), 5 deletions(-) + +commit e017d5526e316003fdb2a3f76acbb83443f14ddf +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename CREATE_XZ_SYMLINKS to XZ_TOOL_SYMLINKS + + This only affects the names unxz and xzcat. The xz-prefixed script + symlinks (xzfgrep and such) are always created if scripts are enabled. 
+ + CMakeLists.txt | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit 04cac14fcb9fb302c24e90b04ca4b77d3717b50c +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename CREATE_LZMA_SYMLINKS to XZ_TOOL_LZMA_SYMLINKS + + Update the description too. + + It affects creation of not only the legacy lzma, unlzma, lzcat symlinks + but also lzgrep and other legacy names for the scripts. The last + LZMA Utils release was made in 2008 but these names are still used + in some places to handle .lzma files. + + CMakeLists.txt | 7 ++++--- + 1 file changed, 4 insertions(+), 3 deletions(-) + +commit 612ccebf884eb1a9b6848e230c24f97a03fe917a +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ALLOW_ARM64_CRC32 to XZ_ARM64_CRC32 + + Update description too. + + CMakeLists.txt | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit 3dcc12290d6dffbe7f10f501c141d325bad65901 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ALLOW_CLMUL_CRC to XZ_CLMUL_CRC + + Update description too. 
+ + CMakeLists.txt | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit 4b8faa72442da9aa1a356f5848aae798d8588a7d +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ENABLE_DOXYGEN to XZ_DOXYGEN + + CMakeLists.txt | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +commit b56273ae575bac350e50b0c689269dcab04b04b3 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename LZIP_DECODER to XZ_LZIP_DECODER + + CMakeLists.txt | 4 ++-- + tests/tests.cmake | 2 +- + 2 files changed, 3 insertions(+), 3 deletions(-) + +commit 2343992fcbe8b436da6df888be37713cccaff0ab +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename MICROLZMA_ENCODER/DECODER to XZ_MICROLZMA_ENCODER/DECODER + + CMakeLists.txt | 8 ++++---- + tests/tests.cmake | 2 +- + 2 files changed, 5 insertions(+), 5 deletions(-) + +commit 96f0a6632cc0598a26d93255b0c444df18dc7891 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ENABLE_SMALL to XZ_SMALL + + CMakeLists.txt | 14 +++++++------- + 1 file changed, 7 insertions(+), 7 deletions(-) + +commit 29f77c7b707f2458fb047e77497354b195e05b14 +Author: Lasse Collin +Date: 2024-06-15 18:07:04 +0300 + + CMake: Rename ENABLE_NLS to XZ_NLS + + Also update the description to mention that this affects installation + of translated man pages too. + + Prefixing the cache variables with the project name helps if + the package is used as a subproject in another package. + It also makes the package-specific options group more nicely + in ccmake and cmake-gui. + + CMakeLists.txt | 28 +++++++++++++++------------- + 1 file changed, 15 insertions(+), 13 deletions(-) + +commit ac05f1b0d7cda1e7ae79775a8dfecc54601d7f1c +Author: Lasse Collin +Date: 2024-06-15 23:34:29 +0300 + + CMake: Link Threads::Threads as PRIVATE to liblzma + + This way pthread options aren't passed to the linker when linking + against shared liblzma but they are still passed when linking against + static liblzma. 
(Also, one never needs the include path of the + threading library to use liblzma since liblzma's API headers + don't #include . But tends to be in the + default include path so here this change makes no difference.) + + One cannot mix target_link_libraries() calls that use the scope + (PRIVATE, PUBLIC, or INTERFACE) keyword and calls that don't use it. + The calls without the keyword are like PUBLIC except perhaps when + they aren't, or something like that... It seems best to always + specify a scope keyword as the meanings of those three keywords + at least are clear. + + CMakeLists.txt | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +commit 82986d8c691a294c78b48d8391303e5c428b5437 +Author: Lasse Collin +Date: 2024-06-16 19:39:32 +0300 + + CMake: Add empty lines + + CMakeLists.txt | 2 ++ + 1 file changed, 2 insertions(+) + +commit 2aecffe0f0e14f3ef635e8cd7b405420f2385de2 +Author: Lasse Collin +Date: 2024-06-16 19:37:36 +0300 + + CMake: Use CMAKE_THREAD_LIBS_INIT in liblzma.pc only with pthreads + + This shouldn't make much difference in practice as on Windows + no flags are needed anyway and unitialized variable (when threading + is disabled) expands to empty. But it's clearer this way. + + CMakeLists.txt | 8 +++++++- + 1 file changed, 7 insertions(+), 1 deletion(-) + +commit 664918bd3635ea8e773f06022286ecb0c485166c +Author: Lasse Collin +Date: 2024-06-17 18:20:14 +0300 + + Update THANKS + + THANKS | 3 +++ + 1 file changed, 3 insertions(+) + +commit 5ca96a93488d0f5a530c78b274cac317453807ff +Author: Lasse Collin +Date: 2024-06-16 19:25:07 +0300 + + CMake: Use native newlines in liblzma.pc + + vcpkg doesn't specify the newline type so it should be fine to + use native newlines in liblzma.pc on Windows. 
+ + CMakeLists.txt | 4 +--- + 1 file changed, 1 insertion(+), 3 deletions(-) + +commit ebd155c3a1b87411edae06d3bdaa9659ec057522 +Author: Lasse Collin +Date: 2024-06-16 19:18:56 +0300 + + CMake: Use relative paths in liblzma.pc if possible + + Now liblzma.pc can be relocatable only if using CMake >= 3.20 + but that should be OK as now we shouldn't get broken liblzma.pc + if CMAKE_INSTALL_LIBDIR or CMAKE_INSTALL_INCLUDEDIR contain an + absolute path. + + Thanks to Eli Schwartz. + + CMakeLists.txt | 18 ++++++++++++++---- + 1 file changed, 14 insertions(+), 4 deletions(-) + +commit 7a366d93cfd74ce10201db400be8836199944e36 +Author: Lasse Collin +Date: 2024-06-16 18:33:08 +0300 + + Revert "CMake: Set only "prefix" as an absolute path in liblzma.pc" + + This reverts commit 5d1c649ba9eb7a5b9371252ebfbc2911dc774e69. + + While CMAKE_INSTALL_ tend to be relative paths, they don't need + to be. Thus the commit was broken. A fancier method is required. + + Thanks to Eli Schwartz for the bug report and explanation. + + CMakeLists.txt | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit 30a2d5d51006301a3ddab5ef1f5ff0a9d74dce6f +Author: Lasse Collin +Date: 2024-06-16 13:39:37 +0300 + + liblzma: CRC CLMUL: Omit is_arch_extension_supported() when not needed + + On E2K the function compiles only due to compiler emulation but the + function is never used. It's cleaner to omit the function when it's + not needed even though it's a "static inline" function. + + Thanks to Ilya Kurdyukov. + + src/liblzma/check/crc_x86_clmul.h | 4 ++++ + 1 file changed, 4 insertions(+) + +commit 54eaea5ea49bb8bca4286d4412f19ac73187489e +Author: Lasse Collin +Date: 2024-06-16 13:21:34 +0300 + + liblzma: x86 CLMUL CRC: Rewrite - (cherry picked from commit ac05f1b0d7cda1e7ae79775a8dfecc54601d7f1c) + It's faster with both tiny and large buffers and doesn't require + disabling any sanitizers. With large buffers the extra speed is + from folding four 16-byte chunks in parallel. 
+ + The 32-bit x86 with MSVC reportedly still needs a workaround. + Now the simpler "__asm mov ebx, ebx" trick is enough but it + needs to be in lzma_crc64() instead of crc64_arch_optimized(). + Thanks to Iouri Kharon for testing and the fix. + + Thanks to Ilya Kurdyukov for testing the speed with aligned and + unaligned buffers on a few x86 processors and on E2K v6. + + Thanks to Sam James for general feedback. + + Fixes: https://github.com/tukaani-project/xz/issues/112 + Fixes: https://github.com/tukaani-project/xz/issues/122 - CMakeLists.txt | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) + src/liblzma/check/crc64_fast.c | 8 + + src/liblzma/check/crc_x86_clmul.h | 437 ++++++++++++++++++++------------------ + 2 files changed, 237 insertions(+), 208 deletions(-) -commit 258bae30a2040138c783b5c380cef0ca603663ed +commit c0e7eaae8d6eef1e313c9d0da20ccf126ec61f38 Author: Lasse Collin -Date: 2024-06-16 19:39:32 +0300 +Date: 2024-06-01 14:44:04 +0300 - CMake: Add empty lines - - (cherry picked from commit 82986d8c691a294c78b48d8391303e5c428b5437) + sysdefs.h: Add alignas - CMakeLists.txt | 2 ++ - 1 file changed, 2 insertions(+) + src/common/sysdefs.h | 11 +++++++++++ + 1 file changed, 11 insertions(+) -commit a95a9601a109f0d0d059dea7a5a44efa87ef1401 +commit 20014c261451381d5e2f58e63e7b1fbefd4df4bf Author: Lasse Collin -Date: 2024-06-16 19:37:36 +0300 +Date: 2024-06-11 12:47:59 +0300 - CMake: Use CMAKE_THREAD_LIBS_INIT in liblzma.pc only with pthreads + liblzma: Use a single macro to select CLMUL CRC to build - This shouldn't make much difference in practice as on Windows - no flags are needed anyway and unitialized variable (when threading - is disabled) expands to empty. But it's clearer this way. + This way it's clearer that two things cannot be selected + at the same time. 
+ + src/liblzma/check/crc32_fast.c | 2 +- + src/liblzma/check/crc64_fast.c | 2 +- + src/liblzma/check/crc_x86_clmul.h | 18 ++++++++++-------- + 3 files changed, 12 insertions(+), 10 deletions(-) + +commit d8fb0986171bd6a3066b236fc9a6b3d573c8e441 +Author: Lasse Collin +Date: 2024-06-10 15:31:01 +0300 + + liblzma: CRC32 CLMUL: Refactor the constants and simplify - (cherry picked from commit 2aecffe0f0e14f3ef635e8cd7b405420f2385de2) + By using modulus scaled constants, the final reduction can + be simplified. - CMakeLists.txt | 8 +++++++- - 1 file changed, 7 insertions(+), 1 deletion(-) + src/liblzma/check/crc_x86_clmul.h | 52 +++++++-------------------------------- + 1 file changed, 9 insertions(+), 43 deletions(-) -commit 65a10ddd439ad435d2c0176106b1e2d6b9c1b3a1 +commit ef652ac391ff7e8cda656238dc5b5f83bc1554c2 Author: Lasse Collin -Date: 2024-06-17 18:20:14 +0300 +Date: 2024-06-10 15:12:48 +0300 - Update THANKS + liblzma: CRC64 CLMUL: Refactor the constants - (cherry picked from commit 664918bd3635ea8e773f06022286ecb0c485166c) + Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p + and the p no longer has the lowest bit set (it makes no difference + as the output bits it affects are ignored). - THANKS | 3 +++ - 1 file changed, 3 insertions(+) + src/liblzma/check/crc_x86_clmul.h | 43 +++++++-------------------------------- + 1 file changed, 7 insertions(+), 36 deletions(-) -commit 6ad5739094ac69ac448a84493f2c7ddfc6eb0688 +commit 9f5fc17e32bf5c7c6cfadf40c29a1dedb4cc03ac Author: Lasse Collin -Date: 2024-06-16 19:25:07 +0300 +Date: 2024-06-10 14:45:44 +0300 - CMake: Use native newlines in liblzma.pc + liblzma: Add crc_clmul_consts_gen.c - vcpkg doesn't specify the newline type so it should be fine to - use native newlines in liblzma.pc on Windows. + It's a standalone program that prints the required constants. + It's won't be a part of the normal build of the package. 
+ + src/liblzma/check/Makefile.inc | 1 + + src/liblzma/check/crc_clmul_consts_gen.c | 160 +++++++++++++++++++++++++++++++ + 2 files changed, 161 insertions(+) + +commit 71b147aab7fe4a60ed57b697d5bb490f099894be +Author: Lasse Collin +Date: 2024-05-09 21:44:03 +0300 + + liblzma: Remove CRC_USE_GENERIC_FOR_SMALL_INPUTS - (cherry picked from commit 5ca96a93488d0f5a530c78b274cac317453807ff) + It was already commented out. - CMakeLists.txt | 4 +--- - 1 file changed, 1 insertion(+), 3 deletions(-) + src/liblzma/check/crc32_fast.c | 21 --------------------- + src/liblzma/check/crc64_fast.c | 5 ----- + src/liblzma/check/crc_common.h | 14 -------------- + src/liblzma/check/crc_x86_clmul.h | 9 +-------- + 4 files changed, 1 insertion(+), 48 deletions(-) -commit 4107f2066764bb3a31d114852bc20722d582fd82 +commit f99a7be40645f86959a5b180dfae948dd165e07c Author: Lasse Collin -Date: 2024-06-16 19:18:56 +0300 +Date: 2024-05-09 21:03:39 +0300 - CMake: Use relative paths in liblzma.pc if possible + liblzma: Remove crc_attr_no_sanitize_address - Now liblzma.pc can be relocatable only if using CMake >= 3.20 - but that should be OK as now we shouldn't get broken liblzma.pc - if CMAKE_INSTALL_LIBDIR or CMAKE_INSTALL_INCLUDEDIR contain an - absolute path. + It's not enough to silence the address sanitizer. Also memory and + thread sanitizers would need to be silenced. They, at least currently, + aren't smart enough to see that the extra bytes are discarded from + the xmm registers by later instructions. - Thanks to Eli Schwartz. + Valgrind is smarter, possibly because this kind of code isn't weird + to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions + this idea of doing an aligned read and then discarding the extra + bytes. The sanitizers don't instrument assembly code but Valgrind + checks all code. 
- (cherry picked from commit ebd155c3a1b87411edae06d3bdaa9659ec057522) + It's better to change the implementation to avoid the sanitization + attributes which also look scary in the code. (Somehow they can look + more scary than __asm__ which is implictly unsanitized.) + + See also: + https://github.com/tukaani-project/xz/issues/112 + https://github.com/tukaani-project/xz/issues/122 - CMakeLists.txt | 18 ++++++++++++++---- - 1 file changed, 14 insertions(+), 4 deletions(-) + src/liblzma/check/crc_common.h | 9 --------- + src/liblzma/check/crc_x86_clmul.h | 3 --- + 2 files changed, 12 deletions(-) -commit ff697eb154361417d94284e0c569aa08cacf9031 +commit ead4d151996f8a18bf9b07eb1e175c0a1590e562 Author: Lasse Collin -Date: 2024-06-16 13:39:37 +0300 +Date: 2024-06-10 15:37:49 +0300 - liblzma: CRC CLMUL: Omit is_arch_extension_supported() when not needed - - On E2K the function compiles only due to compiler emulation but the - function is never used. It's cleaner to omit the function when it's - not needed even though it's a "static inline" function. - - Thanks to Ilya Kurdyukov. + Revert "Build: Temporarily disable CRC CLMUL to silence OSS Fuzz" - (cherry picked from commit 30a2d5d51006301a3ddab5ef1f5ff0a9d74dce6f) + This reverts commit 9f1a6d6f9a258886933a22239a5b81af34b28199. - src/liblzma/check/crc_x86_clmul.h | 4 ++++ - 1 file changed, 4 insertions(+) + configure.ac | 4 +--- + 1 file changed, 1 insertion(+), 3 deletions(-) -commit 4e4a568f6a089c867891c2388a19624e312eb2f3 +commit 2178acf8a4d40a93e970cfcf9b807d5ef6c8da92 Author: Lasse Collin Date: 2024-06-12 14:26:44 +0300 CMake: Prefer C11 with a fallback to C99 There is no need to make a similar change in configure.ac. With Autoconf 2.72, the deprecated macro AC_PROG_CC_C99 is an alias for AC_PROG_CC which prefers a C11 compiler. 
- - (cherry picked from commit 2178acf8a4d40a93e970cfcf9b807d5ef6c8da92) CMakeLists.txt | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) -commit 849e757a8cce41bfd6acfaa7dd3b07324363de90 +commit c97e9c12fef4d1093ee2a75236742481361f50f5 Author: Lasse Collin Date: 2024-06-12 14:20:21 +0300 Update THANKS - - (cherry picked from commit c97e9c12fef4d1093ee2a75236742481361f50f5) THANKS | 4 ++++ 1 file changed, 4 insertions(+) -commit 1305056a54e68895e052506bceb26274f52bbc9a +commit 89e9f12e03324b8a186e807b268f34f92d1b2f41 Author: Lasse Collin Date: 2024-06-11 11:15:49 +0300 Tests: Improve the CRC32 test A similar one was already there for CRC64 but nowadays also CRC32 has a CLMUL implementation, so it's good to test it better too. - - (cherry picked from commit 89e9f12e03324b8a186e807b268f34f92d1b2f41) tests/test_check.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) -commit a44493ec41edc98f24ed9933668e7372f5267a40 +commit c7164b1927e3fe7cdba70ee4687e1a590a81043b Author: Lasse Collin Date: 2024-06-11 22:42:26 +0300 xz: Fix white space - - (cherry picked from commit c7164b1927e3fe7cdba70ee4687e1a590a81043b) src/xz/list.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -commit 5e74a6a8138b3c102193d731120139d5a854f2cf +commit 0a32d2072c598de281058b26dc08920fbf0cd2a1 Author: Lasse Collin Date: 2024-06-11 21:59:09 +0300 liblzma: Fix a typo in a comment Thanks to Sam James for spotting it. 
Fixes: f644473a211394447824ea00518d0a214ff3f7f2 - (cherry picked from commit 0a32d2072c598de281058b26dc08920fbf0cd2a1) src/liblzma/check/crc_x86_clmul.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 3f7edc673cf21b3e4db3e2f11746905e0a393db7 +commit afd9b4d282a10186808c3331dad4caf79c02d55f Author: Lasse Collin Date: 2024-05-10 15:52:26 +0300 liblzma: Fix a comment indentation - - (cherry picked from commit afd9b4d282a10186808c3331dad4caf79c02d55f) src/liblzma/check/crc_common.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -commit 8a9cc7ca0867494f39990f0d4cbe0972042f6d59 +commit 50e6bff274568c568930e15094da8217e7d47d28 Author: Lasse Collin Date: 2024-05-09 22:09:12 +0300 liblzma: Fix white space - - (cherry picked from commit 50e6bff274568c568930e15094da8217e7d47d28) src/liblzma/check/crc32_table.c | 10 +++++----- src/liblzma/check/crc_x86_clmul.h | 6 +++--- src/liblzma/check/sha256.c | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-) -commit b29b13082fe578a3bb9384a5939c82055f796a34 +commit caea7844d3824755d053b4743c4913d73ac2db3d +Author: Lasse Collin +Date: 2024-06-01 14:25:29 +0300 + + tuklib: __STDC_VERSION__ in C23 is 202311 + + src/common/tuklib_common.h | 4 +--- + 1 file changed, 1 insertion(+), 3 deletions(-) + +commit 9e73918a4f14be754a23f74dda45ca431939a4a0 Author: RainRat Date: 2024-06-05 15:21:49 -0700 Fix typos Closes: https://github.com/tukaani-project/xz/pull/124 - (cherry picked from commit 9e73918a4f14be754a23f74dda45ca431939a4a0) INSTALL | 2 +- doc/examples/03_compress_custom.c | 2 +- src/common/tuklib_integer.h | 2 +- src/liblzma/api/lzma/container.h | 2 +- src/xz/mytime.c | 2 +- tests/test_filter_str.c | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) -commit 6f66155e01a6467e70db48cddbe790bdb8d87754 +commit 04b23addf3733873667675df2439725f076c2f36 Author: Lasse Collin Date: 2024-06-07 15:47:20 +0300 tuklib_integer: Fix building on OpenBSD/sparc64 that uses GCC 4.2 GCC 4.2 doesn't have __builtin_bswap16() and 
friends so tuklib_integer.h tries to use OS-specific byte swap methods instead. On OpenBSD those macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs and Darwin. An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it is a macro. But since OpenBSD seems to be a special case under this special case of "*BSDs and Darwin", checking for __OpenBSD__ seems the more conservative choice now. Thanks to Christian Weisgerber and Brad Smith who both submitted the same patch a few hours apart. Co-authored-by: Christian Weisgerber Co-authored-by: Brad Smith Closes: https://github.com/tukaani-project/xz/pull/126 - (cherry picked from commit 04b23addf3733873667675df2439725f076c2f36) src/common/tuklib_integer.h | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) -commit 5522759d31e0f1513fffbdf39a955f12d373f121 +commit dc03f6290f5b9bd3d50c7e12e58dee870889d599 +Author: Lasse Collin +Date: 2024-06-07 15:06:59 +0300 + + liblzma: Add ARM64 CRC32 instruction support detection on OpenBSD + + The C code is from Christian Weisgerber, I merely reordered the OSes. + Then I added the build system checks without testing them. + + Also thanks to Brad Smith who submitted a similar patch on GitHub + a few hours after Christian had sent his via email. 
+ + Co-authored-by: Christian Weisgerber + Closes: https://github.com/tukaani-project/xz/pull/125 + + CMakeLists.txt | 6 ++++++ + configure.ac | 9 +++++++++ + src/liblzma/check/crc32_arm64.h | 15 +++++++++++++++ + src/liblzma/check/crc_common.h | 1 + + 4 files changed, 31 insertions(+) + +commit f5c2ae58ec68c665e62c790b842657afcb31474c Author: Lasse Collin Date: 2024-06-05 13:55:43 +0300 Update THANKS - - (cherry picked from commit f5c2ae58ec68c665e62c790b842657afcb31474c) THANKS | 2 ++ 1 file changed, 2 insertions(+) -commit 45aed6f37f17e5fac215290204e03894965cf1d5 +commit e5491dfab9c54dc7078a8d3d07fabb91d6e06418 +Author: Lasse Collin +Date: 2024-06-05 13:42:47 +0300 + + CMake: Include the "alpha" or "beta" suffix in PACKAGE_VERSION + + This way the version string gets into xzgrep and other scripts + in full and also into liblzma.pc. + + For the project() command, a suffixless string is required though. + + CMakeLists.txt | 16 +++++++++++++--- + 1 file changed, 13 insertions(+), 3 deletions(-) + +commit 1d3c61575fda0be6b2d50c9e32a343349d5cd5c0 Author: Lasse Collin Date: 2024-06-05 13:30:28 +0300 CMake: Fix wrong version variable liblzma_VERSION has never existed in the repository. xz_VERSION from the project() command was used for liblzma SOVERSION so use xz_VERSION here too. The wrong variable did no harm in practice as PROJECT_VERSION was used as the fallback. It has the same value as xz_VERSION. Fixes: 7e3493d40eac0c3fa3d5124097745a70e15c41f6 - (cherry picked from commit 1d3c61575fda0be6b2d50c9e32a343349d5cd5c0) CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 198271a6ed0e6ac6820f8f44172a203aa44abe39 +commit 5d1c649ba9eb7a5b9371252ebfbc2911dc774e69 +Author: Lasse Collin +Date: 2024-06-05 12:59:59 +0300 + + CMake: Set only "prefix" as an absolute path in liblzma.pc + + CMake provides variables that are relative to CMAKE_INSTALL_PREFIX + so use them instead of repeating the full path. 
+ + CMakeLists.txt | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit e0d6d05ce0d464e966c0669bbf869202a43cc2f7 Author: Lasse Collin Date: 2024-06-04 23:59:29 +0300 CMake: Fix liblzma filename in Windows environments This is a mess because liblzma DLL outside Cygwin and MSYS2 is liblzma.dll instead of lzma.dll to avoid a conflict with lzma.dll from LZMA SDK. On Cygwin the name was "liblzma-5.dll" while "cyglzma-5.dll" would have been correct (and match what Libtool produces). MSYS2 likely was broken too as it uses the "msys-" prefix. This change has no effect with MinGW-w64 because with that the "lib" prefix was correct already. With MSVC builds this is a small breaking change that requires developers to adjust the library name when linking against liblzma. The liblzma.dll name is kept as is but the import library and static library are now lzma.lib instead of liblzma.lib. This is helpful when using pkgconf because "pkgconf --msvc-syntax --libs liblzma" outputs "lzma.lib" (it's converted from "-llzma" in liblzma.pc). It would be easy to keep the liblzma.lib naming but the pkgconf compatibility seems worth it in the long run. The lzma.lib name is compatible with MinGW-w64 too as -llzma will find also lzma.lib. vcpkg had been patching CMakeLists.txt this way since 2022 but I learned this only recently. The reasoning for the patch makes sense, and while this is a small breaking change with MSVC, it seems like a decent compromise as it keeps the DLL name the same. 2022 patch in vcpkg: https://github.com/microsoft/vcpkg/blob/0707a17ecf1466d64cf1a3c1ee18c8ff02aadb2d/ports/liblzma/win_output_name.patch See the discussion: https://github.com/microsoft/vcpkg/pull/39024 Thanks to Vincent Torri for confirming the naming issue on Cygwin. 
- - (cherry picked from commit e0d6d05ce0d464e966c0669bbf869202a43cc2f7) CMakeLists.txt | 34 ++++++++++++++++++++++++++++++---- 1 file changed, 30 insertions(+), 4 deletions(-) -commit 92e5425979199407080fd80e67c15f2cbf85392b +commit e7a42cda7c827e016619e8cab15e2faf5d4181ae Author: Lasse Collin Date: 2024-06-03 16:55:03 +0300 Fix version.sh compatiblity with Solaris The ancient /bin/tr on Solaris doesn't support '\n'. With /usr/xpg4/bin/tr it works but it might not be in PATH. Another problem was that sed was given input that didn't have a newline at the end. Text files must end with a newline to be portable. Fix both problems: - Handle multiline input within sed itself to avoid one tr invocation. The default sed even on Solaris does understand \n. - Use octals in tr -d. \012 works for ASCII "line feed", it's even used as an example in the Solaris man page. But we must strip also ASCII "carriage return" \015 and EBCDIC "next line" \025. The EBCDIC case got handled with \n previously. Stripping \012 and \015 on EBCDIC system won't matter as those control chars won't be present in the string in the first place. An awk-based solution could be an alternative but it might need special casing on Solaris to used nawk instead of awk. The changes in this commit are smaller and should have a smaller risk for regressions. It's also possible that version.sh will be dropped entirely at some point. 
- - (cherry picked from commit e7a42cda7c827e016619e8cab15e2faf5d4181ae) build-aux/version.sh | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -commit 0c089a33a5b1f5b9451b332484c68e1d6f02631a +commit a61c9ab4751f2710dcd5459c7d74bbf20781f0f9 Author: Lasse Collin Date: 2024-06-03 17:07:11 +0300 CI: Don't require po4a on Solaris - - (cherry picked from commit a61c9ab4751f2710dcd5459c7d74bbf20781f0f9) .github/workflows/solaris.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 83d3792711295656a3de69bbcd98dcb4b06be1c2 +commit 5229bdf5335ce18ed54beb7e646e39927663be86 Author: Lasse Collin Date: 2024-06-03 15:08:15 +0300 CI: Use set -e on Solaris too - - (cherry picked from commit 5229bdf5335ce18ed54beb7e646e39927663be86) .github/workflows/solaris.yml | 1 + 1 file changed, 1 insertion(+) -commit 9c64d4fd787ea7bca3795be55367504a9f47a68c +commit afa938e429c1ce07d26d02999352fb014b62ff3d Author: Lasse Collin Date: 2024-06-03 17:44:50 +0300 CMake: Install liblzma.pc even with MSVC I had misunderstood that it wouldn't be useful with MSVC. vcpkg had been installing liblzma.pc with custom rules since 2020, years before liblzma.pc support was added to CMakeLists.txt. See: https://github.com/microsoft/vcpkg/blob/eb895b95aac6fd7485373702f29f508c42a180a0/ports/liblzma/portfile.cmake https://github.com/microsoft/vcpkg/pull/39024#issuecomment-2145064670 - (cherry picked from commit afa938e429c1ce07d26d02999352fb014b62ff3d) CMakeLists.txt | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) -commit 42754176bd84c4539db55a9e70bdcdd5700c709f +commit 35f8649f08341639a627fd06350e938124ca3622 Author: Sam James Date: 2024-06-03 06:16:23 +0100 ci: don't pin official GH actions via commit, just tag There's no real value in doing it via commit for official GH actions. We can keep using pinned commits for unofficial actions. It's hassle for no gain. Maybe going forward we can limit this further by only being paranoid for the jobs with any access to tokens. 
- - (cherry picked from commit 35f8649f08341639a627fd06350e938124ca3622) .github/workflows/ci.yml | 4 ++-- .github/workflows/freebsd.yml | 2 +- .github/workflows/netbsd.yml | 2 +- .github/workflows/openbsd.yml | 2 +- .github/workflows/solaris.yml | 2 +- .github/workflows/windows-ci.yml | 4 ++-- 6 files changed, 8 insertions(+), 8 deletions(-) -commit 9a5fee7022eddffdfcee32a7e43f64635581b393 +commit e885dae37ff5b1dbc760dabc1e03e866a7302ef2 Author: Christoph Junghans Date: 2024-04-30 07:49:26 -0600 ci: set -e on openbsd Closes: https://github.com/tukaani-project/xz/pull/116 - (cherry picked from commit e885dae37ff5b1dbc760dabc1e03e866a7302ef2) .github/workflows/openbsd.yml | 1 + 1 file changed, 1 insertion(+) -commit a2d66de54f234999a7d42305988cf2c3e0b1b8f6 +commit 21b02dd128cf9e8c76325ec124f70381862dcf19 Author: Christoph Junghans Date: 2024-04-30 07:48:58 -0600 ci: set -e on netbsd - - (cherry picked from commit 21b02dd128cf9e8c76325ec124f70381862dcf19) .github/workflows/netbsd.yml | 1 + 1 file changed, 1 insertion(+) -commit 1bdc70176b59b0e22c0a580c518dc5d0f2fd0723 +commit 8641f0c24c041136670c975b23408184b45431bc Author: Christoph Junghans Date: 2024-04-25 14:56:06 -0700 ci: actually fail on FreeBSD Without "set -e" the job will always be successful. 
See vmactions/freebsd-vm#72 - - (cherry picked from commit 8641f0c24c041136670c975b23408184b45431bc) .github/workflows/freebsd.yml | 1 + 1 file changed, 1 insertion(+) -commit 4132277103acdf1c01f8b5a4c12c0992c330ade4 +commit ef616683ef11f11ffdfbe0624da33905e28a70f9 Author: Andrew Murray Date: 2024-04-25 09:24:46 +1000 Updated actions Closes: https://github.com/tukaani-project/xz/pull/115 - (cherry picked from commit ef616683ef11f11ffdfbe0624da33905e28a70f9) .github/workflows/ci.yml | 4 ++-- .github/workflows/windows-ci.yml | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) -commit 1575414636104773cefc62cf075726c6ee7ae37d +commit 57b440d316da9ac9cb312ee7e6890f5382556f10 Author: Sam James Date: 2024-06-03 02:49:40 +0100 ci: add po4a - - (cherry picked from commit 57b440d316da9ac9cb312ee7e6890f5382556f10) .github/workflows/netbsd.yml | 2 +- .github/workflows/openbsd.yml | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) -commit c3e293037e1bb2bd9efedbb0e75387d1282cc03f +commit 08cdf4be9a673d78efe393b53dd73bf43c81dd95 Author: Sam James Date: 2024-04-13 21:02:04 +0100 ci: add Solaris Inspired by https://github.com/RsyncProject/rsync/commit/3f2a38b01184cae9a931280b534acf5a3dae2e94. It runs on Solaris 5.11 via a VirtualBox VM. 
- - (cherry picked from commit 08cdf4be9a673d78efe393b53dd73bf43c81dd95) .github/workflows/solaris.yml | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) -commit dc6b6011b45b0d0ddd0650f4885e24c68b37fddf +commit b69768c8bd1a34fde311935c551d061ba52d9a3f Author: Sam James Date: 2024-04-14 08:08:00 +0100 xz: list: suppress -Wformat-nonliteral for Solaris Solaris' GCC can't understand that our use is fine, unlike modern compilers: ``` list.c: In function 'print_totals_basic': list.c:1191:4: error: format not a string literal, argument types not checked [-Werror=format-nonliteral] uint64_to_str(totals.files, 0)); ^~~~~~~~~~~~~ cc1: all warnings being treated as errors ``` It's presumably because of older gettext missing format attributes. This is with `gcc (GCC) 7.3.0`. - - (cherry picked from commit b69768c8bd1a34fde311935c551d061ba52d9a3f) src/xz/list.c | 7 +++++++ 1 file changed, 7 insertions(+) -commit 7ce2ac795a812ecf1eb2d6b62f51b55ac799c2a5 +commit bb90e1f66d9beb490c4c99763e79519045968710 Author: Lasse Collin -Date: 2024-05-31 21:36:26 +0300 +Date: 2024-06-03 11:44:28 +0300 - Update THANKS + license-check.sh: Fix reporting of unclear license info - (cherry picked from commit b8d134e61ede9f4a296226d97f5c20721fb4e8e2) - - THANKS | 3 +++ - 1 file changed, 3 insertions(+) - -commit 3ec664d3f652133136587a51d4505b1abe1acdd7 -Author: Lasse Collin -Date: 2024-05-29 18:03:51 +0300 - - Bump version and soname for 5.6.2 - - src/liblzma/Makefile.am | 2 +- - src/liblzma/api/lzma/version.h | 2 +- - 2 files changed, 2 insertions(+), 2 deletions(-) - -commit 3cc0aa702e50b786c52c6f3d3f831a635c4df197 -Author: Lasse Collin -Date: 2024-05-29 18:03:04 +0300 - - Add NEWS for 5.6.2 - - NEWS | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - 1 file changed, 130 insertions(+) - -commit 526d3f7f2c2d5e134157d08b37fb5fd0b125799e -Author: Lasse Collin -Date: 2024-05-29 18:03:04 +0300 - - Add NEWS for 5.4.7 + The main feature was broken because an old 
variable name hadn't + been updated to match the rest of the script. - NEWS | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - 1 file changed, 89 insertions(+) + build-aux/license-check.sh | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) -commit 660b09279e8f544acf120d29194d5c3051b484eb +commit b8d134e61ede9f4a296226d97f5c20721fb4e8e2 Author: Lasse Collin -Date: 2024-05-29 18:03:04 +0300 +Date: 2024-05-31 21:36:26 +0300 - Add NEWS for 5.2.13 + Update THANKS - NEWS | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - 1 file changed, 115 insertions(+) + THANKS | 3 +++ + 1 file changed, 3 insertions(+) -commit 7d76282dac766c0ced8ae24e0f7ce0005f3e377d +commit 162587d3fb3fcedc6eee61eda3ccaaf60c80f0de Author: Lasse Collin Date: 2024-05-29 17:47:13 +0300 Translations: Run po4a/update-po Now the files are in the new formatting without source file line numbers. Future updates should keep the diffs much smaller. po4a/de.po | 1592 ++++++++++--------- po4a/fr.po | 4450 +++++++++++++++++----------------------------------- po4a/ko.po | 1592 ++++++++++--------- po4a/pt_BR.po | 4817 ++++++++++++++++++--------------------------------------- po4a/ro.po | 1592 ++++++++++--------- po4a/uk.po | 1592 ++++++++++--------- 6 files changed, 6114 insertions(+), 9521 deletions(-) -commit 4470c3f7d8954bb47b280ec07ad0bd4be2223083 +commit 50cd8ed002473c5cd53980e70a53e5e6ad646ffe Author: Lasse Collin Date: 2024-05-29 17:44:53 +0300 Translations: Run "make -C po update-po" In the past this wasn't done before releases; the Git repository just contained the files from the Translation Project. But this way it is clearer when comparing release tarballs against the Git repository. In future releases this might no longer be necessary within a stable branch as the .po files won't change so easily anymore when creating a tarball. 
po/ca.po | 567 +++++++++++++++++++++++++--------------- po/cs.po | 821 +++++++++++++++++++++++++++++++++++++-------------------- po/da.po | 809 +++++++++++++++++++++++++++++++++++--------------------- po/de.po | 403 ++++++++++++++-------------- po/eo.po | 403 ++++++++++++++-------------- po/es.po | 403 ++++++++++++++-------------- po/fi.po | 578 +++++++++++++++++++++++++--------------- po/fr.po | 538 +++++++++++++++++++++++--------------- po/hr.po | 403 ++++++++++++++-------------- po/hu.po | 403 ++++++++++++++-------------- po/it.po | 854 +++++++++++++++++++++++++++++++++++++++--------------------- po/ko.po | 403 ++++++++++++++-------------- po/pl.po | 403 ++++++++++++++-------------- po/pt.po | 842 +++++++++++++++++++++++++++++++++++++++-------------------- po/pt_BR.po | 567 +++++++++++++++++++++++++--------------- po/ro.po | 403 ++++++++++++++-------------- po/sr.po | 838 ++++++++++++++++++++++++++++++++++++++-------------------- po/sv.po | 403 ++++++++++++++-------------- po/tr.po | 567 +++++++++++++++++++++++++--------------- po/uk.po | 403 ++++++++++++++-------------- po/vi.po | 403 ++++++++++++++-------------- po/zh_CN.po | 417 +++++++++++++++-------------- po/zh_TW.po | 558 ++++++++++++++++++++++++--------------- 23 files changed, 7257 insertions(+), 5132 deletions(-) -commit 33b8a85face5392b5ac843bdbe3a72f024cad6ef +commit 16dbd865c8833462e1604a1e13f7effe55bb3fe6 +Author: Lasse Collin +Date: 2024-05-29 18:03:04 +0300 + + Add NEWS for 5.6.2 + + NEWS | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 130 insertions(+) + +commit a0eeb5f9369c43508610dcf00140edb8e2be92a6 +Author: Lasse Collin +Date: 2024-05-29 18:03:04 +0300 + + Add NEWS for 5.4.7 + + NEWS | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 89 insertions(+) + +commit 9b476fb93a9672f2e70b56e3e9c7e9cfedd6c162 +Author: Lasse Collin +Date: 2024-05-29 18:03:04 +0300 + + Add NEWS for 5.2.13 + + NEWS | 115 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 115 insertions(+) + +commit 9284f1aea31f0eb23e2ea72f7218b271e2234762 Author: Lasse Collin Date: 2024-05-29 16:33:24 +0300 Build: Update po/*.po files only when needed When po/xz.pot doesn't exist, running "make" or "make dist" will create it. Then the .po files will be updated but only if they actually would change more than the POT-Creation-Date line. Then the .gmo files would be generated from the .po files. This is the case before and after this commit. However, "make dist" and thus "make mydist" did a forced update to the files, updating them even if the only change was the POT-Creation-Date line. This had pros and cons: It made it clear that the .po file really is in sync with the recent strings in the package. On the other hand, it added noise in form of changed files in the source tree and distribution tarballs. It can be ignored with something like "diff -I'^"POT-Creation-Date: '" but it's still a minor annoyance *if* there's not enough value in having the most recent timestamp. Setting DIST_DEPENDS_ON_UPDATE_PO = no means that such forced update won't happen in "make dist" anymore. However, the "mydist" target will use xz.pot-update target which is the same target that is run when xz.pot doesn't exist at all yet. Thus "mydist" will ensure that the translations are up to date, without noise from changes that would affect only the POT-Creation-Date line. Note that po4a always uses msgmerge with --update, so POT-Creation-Date in the man page translations is never the only change in .po files. In that sense this commit makes the message translations behave more similarly to the man page translations. Distribution tarballs will still have non-reproducible POT-Creation-Date in po/xz.pot and po4a/xz-man.pot but those are just two files. Even they could be made reproducible from a Git timestamp if desired. 
- - (cherry picked from commit 9284f1aea31f0eb23e2ea72f7218b271e2234762) Makefile.am | 3 ++- po/Makevars | 6 +++++- 2 files changed, 7 insertions(+), 2 deletions(-) -commit 09daebd66b55799bbc495b84310a86c91bbfc1c8 +commit 4beba1cd62d7f8f7a6f1e899b68292d94c53b599 Author: Lasse Collin Date: 2024-05-28 21:10:33 +0300 po4a/update-po: Disable wrapping in .pot and .po files The .po files from the Translation Project come with unwrapped strings so this matches it. This may reduce the noise in diffs too. When the beginning of a paragraph had changed, the rest of the lines got rewrapped in msgid. Now it's just one very long line that changes when a paragraph has been edited. The --add-location=file option was removed as redundant. The line numbers don't exist in the .pot file due to --porefs file and thus they cannot get copied to the .po files either. - - (cherry picked from commit 4beba1cd62d7f8f7a6f1e899b68292d94c53b599) po4a/update-po | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) -commit 51ad72dae4e516e9292f6f399bd1e4970b77f7c1 +commit b14c130a58a649f9a73392eeb122cb252327c569 Author: Lasse Collin Date: 2024-05-28 18:36:53 +0300 Update contact info in README - - (cherry picked from commit b14c130a58a649f9a73392eeb122cb252327c569) README | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) -commit 18463917f9b255b8f925fa54ab9388319735b14a +commit 75f5f2e014b0ee646963f36bc6a9c840fb272353 Author: Lasse Collin Date: 2024-05-28 13:25:07 +0300 Translations: Use --package-name=xz-man with po4a This is to match reality. See the added comment. - - (cherry picked from commit 75f5f2e014b0ee646963f36bc6a9c840fb272353) po4a/update-po | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) -commit 26bbcb13cd2bbb56fe406544a484b4edfc7e0837 +commit eb217d016cfbbba1babc19a61095b3ea25898af6 Author: Lasse Collin Date: 2024-05-28 13:03:40 +0300 Translations: Omit --package-name from po/Makevars This is closer to the reality in the po/*.po files.
- - (cherry picked from commit eb217d016cfbbba1babc19a61095b3ea25898af6) po/Makevars | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) -commit c35ee804b89556d15bc8cdc16867f4316e69392f +commit d28a4b2520adeeaa1b9e921bf42c7c1f36552c06 +Author: Lasse Collin +Date: 2024-05-27 17:45:51 +0300 + + license-check.sh: Use '--' with slightly untrusted filenames + + Names from git ls-files should be safe but if one runs it on + a tree without the .git dir and there are extra files, it's + safer to have the end of arguments marked with '--'. + + build-aux/license-check.sh | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +commit fda0ec862a34094cf23fc25d0e0a95858c3a3ab5 +Author: Lasse Collin +Date: 2024-05-27 17:41:37 +0300 + + license-check.sh: Use xargs -0 instead of -d + + Neither are in POSIX but -0 is much more portable in practice. + + Despite the old comment, the grep usage should be portable already. + + build-aux/license-check.sh | 11 ++++++----- + 1 file changed, 6 insertions(+), 5 deletions(-) + +commit 9114267038deaecf4832a5cacb5acbe6591ac839 Author: Lasse Collin Date: 2024-05-28 01:17:45 +0300 Translations: Omit man page line numbers from .pot and .po files - - (cherry picked from commit 9114267038deaecf4832a5cacb5acbe6591ac839) po4a/update-po | 5 +++++ 1 file changed, 5 insertions(+) -commit 0f4429d47f9cfe2cdfbad115a7bc2f11221cb217 +commit 093490b58271e9424ce38a7b1b38bcf61b9c86c6 Author: Lasse Collin Date: 2024-05-28 01:06:30 +0300 Translations: Use the xgettext option --add-location=file - - (cherry picked from commit 093490b58271e9424ce38a7b1b38bcf61b9c86c6) po/Makevars | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -commit a93e2c2d1d34a6f609d24a8e62072ce78df7a734 +commit fccebe2b4fd513488fc920e4dac32562ed3c7637 Author: Lasse Collin Date: 2024-05-28 00:43:53 +0300 Translations: Use the msgmerge option --add-location=file This way the PO file diffs are less noisy but the locations of the strings are still present at file level, just 
without line numbers. The option is available since gettext 0.19 (2014). configure.ac requires 0.19.6. - - (cherry picked from commit fccebe2b4fd513488fc920e4dac32562ed3c7637) po/Makevars | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit d4389895592e9a8e0f6391fdad816ae0537bb07b +commit f361d9ae85707a87eb28db400eb7229cec103d58 Author: Lasse Collin Date: 2024-05-27 12:22:08 +0300 Build: Use $(SHELL) instead of sh to run scripts in Makefile.am - - (cherry picked from commit f361d9ae85707a87eb28db400eb7229cec103d58) - Makefile.am | 10 +++++----- - 1 file changed, 5 insertions(+), 5 deletions(-) + Makefile.am | 14 +++++++------- + 1 file changed, 7 insertions(+), 7 deletions(-) -commit 5781414b6e3120098b0060d073aa2b0580ff6f40 +commit a26dece34793a09aac2476f954d162d03e9cf62b Author: Lasse Collin Date: 2024-05-23 17:25:13 +0300 Translations: Change the home page URLs in man page translations Since the source strings have changed, these would get marked as fuzzy and the original string would be used instead. The original and translated strings are identical in this case so it wouldn't matter. But patching the translations helps still because then po4a will show the correct translation percentage. - - (cherry picked from commit a26dece34793a09aac2476f954d162d03e9cf62b) po4a/de.po | 8 ++++---- po4a/fr.po | 4 ++-- po4a/ko.po | 4 ++-- po4a/pt_BR.po | 4 ++-- po4a/ro.po | 8 ++++---- po4a/uk.po | 8 ++++---- 6 files changed, 18 insertions(+), 18 deletions(-) -commit 3670e0616eb9d86e7519d2b76242fd32c6e0c1ae +commit 24387c234b4eed1ef9a7eaa107391740b4095568 Author: Lasse Collin Date: 2024-05-23 15:15:18 +0300 CMake: Add manual support for 32-bit x86 assembly files One has to pass -DENABLE_X86_ASM=ON to cmake to enable the CRC assembly code. Autodetection isn't done. Looking at CMAKE_SYSTEM_PROCESSOR might not work as it comes from uname unless cross-compilation is done using a CMake toolchain file. 
On top of this, if the code is run on modern processors that support the CLMUL instruction, then the C code should be faster (but then one should also be using a x86-64 build if possible). - - (cherry picked from commit 24387c234b4eed1ef9a7eaa107391740b4095568) CMakeLists.txt | 34 +++++++++++++++++++++++++++++++--- 1 file changed, 31 insertions(+), 3 deletions(-) -commit c1b001b09e902ecacabb8a2ae1fc991018a4d1f8 +commit 0fb3c9c3f684f5a25bd425ed079a20a79f0c969d Author: Lasse Collin Date: 2024-05-23 14:26:45 +0300 CMake: Rename USE_DOXYGEN to ENABLE_DOXYGEN It's more consistent with the other option() uses. - - (cherry picked from commit 0fb3c9c3f684f5a25bd425ed079a20a79f0c969d) CMakeLists.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -commit 7213fe39c717d4623c92af715484a71d9a6ff8d0 +commit 6bbec3bda02bf87d24fa095074456e723589921f +Author: Lasse Collin +Date: 2024-05-22 15:21:53 +0300 + + Mention license-check.sh in COPYING + + COPYING | 6 ++++++ + 1 file changed, 6 insertions(+) + +commit 62733592a1cc6f0b41f46ef52e06d1a6fe1ff38a Author: Lasse Collin Date: 2024-05-22 15:21:53 +0300 Use more confident language in COPYING - - (cherry picked from commit 62733592a1cc6f0b41f46ef52e06d1a6fe1ff38a) COPYING | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) -commit 15358be94a4e3f9c20f331b64b3980f3e5283760 +commit a119a4209e8827e1d7c2cfd30cb9f5a9b76f9dff +Author: Lasse Collin +Date: 2024-05-22 15:21:53 +0300 + + Build: Run license-check.sh in "mydist" and "dist-hook" + + In mydist the point is to check using the file list from the Git + repository. In dist-hook it is to check that the TARBALL_IGNORE + patterns work when the .git dir or the "git" command aren't available. + + Refuse to create a distribution tarball if license issues are found. 
+ + Makefile.am | 2 ++ + 1 file changed, 2 insertions(+) + +commit f3434ecfcb45154508752986f4fc670b8f0555dc +Author: Lasse Collin +Date: 2024-05-22 15:21:53 +0300 + + Add build-aux/license-check.sh + + This helps in spotting files that lack SPDX license identifier + and which haven't been explicitly white listed either. The script + requires the .git directory to be present as only the files that + are in the Git repository are checked. + + XZ Utils isn't FSFE REUSE compliant for now. + + Makefile.am | 1 + + build-aux/license-check.sh | 174 +++++++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 175 insertions(+) + +commit 9ae2ebc1e504a1814b0788de95fb5c58c0328dde Author: Lasse Collin Date: 2024-04-29 17:16:38 +0300 Add SPDX license identifiers to files under tests/ossfuzz - - (cherry picked from commit 9ae2ebc1e504a1814b0788de95fb5c58c0328dde) tests/ossfuzz/Makefile | 2 ++ tests/ossfuzz/config/fuzz_decode_alone.options | 2 ++ tests/ossfuzz/config/fuzz_decode_stream.options | 2 ++ tests/ossfuzz/config/fuzz_encode_stream.options | 2 ++ tests/ossfuzz/config/fuzz_lzma.dict | 2 ++ tests/ossfuzz/config/fuzz_xz.dict | 2 ++ 6 files changed, 12 insertions(+) -commit 1aa92c7ffd0bf8f9738ebf3bd1263bd6f5f096a2 +commit 9000d70eb9815bd7f43ffddc1c3316c507aa0e05 Author: Lasse Collin Date: 2024-04-29 17:16:06 +0300 Add SPDX license identifier to .codespellrc - - (cherry picked from commit 9000d70eb9815bd7f43ffddc1c3316c507aa0e05) .codespellrc | 2 ++ 1 file changed, 2 insertions(+) -commit 3c7e400fdcabc0a1b78863948fc17964667a9401 +commit 903c16fcfa5bfad0cdb2a7383d941243bcb12e76 Author: Lasse Collin Date: 2024-05-22 15:12:09 +0300 Move entries po4a/.gitignore to the top level .gitignore The po4a directory is in EXTRA_DIST and thus all files there are included in the package. .gitignore doesn't belong in the package so keep that file out of the po4a directory. 
- - (cherry picked from commit 903c16fcfa5bfad0cdb2a7383d941243bcb12e76) .gitignore | 4 ++++ po4a/.gitignore | 3 --- 2 files changed, 4 insertions(+), 3 deletions(-) -commit 8a99272d4a9358dabdb5bc0b72f4c5240a9dc066 +commit 56f1d5ed68e84ba5dfa328ea2291b8f46c995125 Author: Lasse Collin Date: 2024-05-20 16:55:00 +0300 - CMake: Add comments + Tests: Make the config.h grep patterns Meson compatible + + Now the test scripts detect both + + #define HAVE_DECODER_ARM + #define HAVE_DECODER_ARM 1 + + as support for the ARM filter without confusing it with these: + + #define HAVE_DECODER_ARM64 + #define HAVE_DECODER_ARM64 1 + + Previously only the ones ending with " 1" were accepted for + the macros where this kind of confusion was possible. + + This should help with Meson support because Meson's built-in + features produce config.h entries that are either + + #define FOO 1 + #define FOO 0 + + or: - (cherry picked from commit 9d997d6f9d4f042412e45c7b7a23a14ad2e4f9aa) + #define FOO + #undef FOO + + The former method has a benefit that one can use "#if FOO" and -Wundef + will catch if a #define is missing (for example, it helps catching + typos). But XZ Utils has to use the latter since it has been + convenient with Autoconf's default behavior.[*] While it's easy to + emulate the Autoconf style (#define FOO 1 vs. no #define at all) + in Meson, it results in clumsy code. Thus it's better to change + the few places in the tests where this difference matters. + + [*] While most checks in Autoconf default to the second style above, + a few things use the first style (like AC_CHECK_DECLS). The mix + of both styles is the most confusing as one has to remember which + macro needs #ifdef and which #if. Currently HAVE_VISIBILITY is + only such config.h entry that is 1 or 0. It comes unmodified + from Gnulib's visibility.m4. 
+ + tests/test_compress.sh | 4 ++-- + tests/test_files.sh | 2 +- + 2 files changed, 3 insertions(+), 3 deletions(-) + +commit 9d997d6f9d4f042412e45c7b7a23a14ad2e4f9aa +Author: Lasse Collin +Date: 2024-05-20 16:55:00 +0300 + + CMake: Add comments tests/tests.cmake | 2 ++ 1 file changed, 2 insertions(+) -commit c35259c9e2400f6f88c269d95ecafdb223ff45d2 +commit d35368b33e54bad2f566df99fac29ffea38e34de Author: Lasse Collin Date: 2024-05-20 16:55:00 +0300 CMake: Remove the note that some tests aren't run They are now in the common build configurations. - - (cherry picked from commit d35368b33e54bad2f566df99fac29ffea38e34de) CMakeLists.txt | 2 -- 1 file changed, 2 deletions(-) -commit 30982a215395f19b3837c3da540e1cb3f913569f +commit dc232d584619b2819a9c52d6ad5d8b5d56b392ba Author: Lasse Collin Date: 2024-05-20 16:55:00 +0300 CMake: Add support for test_files.sh - - (cherry picked from commit dc232d584619b2819a9c52d6ad5d8b5d56b392ba) tests/tests.cmake | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) -commit 3a8f81e0ad4cd1c102a03ff09e703cf8cb074afc +commit a7e9230af9d1f87f474fe38886eb977d4149dc9b Author: Lasse Collin Date: 2024-05-20 16:55:00 +0300 Tests: Make test_files.sh more flexible Add a new optional argument to specify the directory of the xz and xzdec executables. If ../config.h doesn't exist, assume that all encoders and decoders are available. 
- - (cherry picked from commit a7e9230af9d1f87f474fe38886eb977d4149dc9b) tests/test_files.sh | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) -commit 0644675c829143112c85455f8a6aa91bfc4e1bbb +commit b40e6efbb48d740b9b5b303e59e344801cbb5bd8 Author: Lasse Collin Date: 2024-05-20 16:55:00 +0300 CMake: Add support for test_compress.sh tests - - (cherry picked from commit b40e6efbb48d740b9b5b303e59e344801cbb5bd8) tests/tests.cmake | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) -commit dcc02a6ca0e0ac4e330e820683754badbcf9815b +commit ac3222d2cb1ff3a15eb6d58f9ea9bc78e8bc3bb2 Author: Lasse Collin Date: 2024-05-20 16:55:00 +0300 Tests: Make test_compress.sh more flexible Add a new optional second argument: directory of the xz and xzdec executables. This is needed with the CMake build where the binaries end up in the top-level build directory. If ../config.h doesn't exist, assume that all encoders and decoders are available. This will make this script usable from CMake in the most common build configuration. NOTE: Since the existence of ../config.h is checked, the working directory of the test script must be a subdir in the build tree! Otherwise ../config.h would look outside the build tree. Use the default check type instead of forcing CRC32 or CRC64. Now the script doesn't need to check if CRC64 is available. - - (cherry picked from commit ac3222d2cb1ff3a15eb6d58f9ea9bc78e8bc3bb2) tests/test_compress.sh | 41 +++++++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 12 deletions(-) -commit c761b7051fb2ebb6da3cbecafe695fb5af7b2c9c +commit 006040b29c83104403621e950ada0c8956c56b3d Author: Lasse Collin Date: 2024-05-20 16:55:00 +0300 CMake: Prepare to support the test_*.sh tests This is a bit hacky since the scripts grep config.h to know which features were built but the CMake build doesn't create config.h. So instead those test scripts will be run only when all relevant features have been enabled.
- - (cherry picked from commit 006040b29c83104403621e950ada0c8956c56b3d) tests/tests.cmake | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) -commit a71bc2d75b95f85fe046f0fd1fb25d36be2b20ba +commit 6167607a6ea72fb74eefb943c4566e3cab528cd2 Author: Lasse Collin Date: 2024-05-20 16:55:00 +0300 Tests: test_suffix.sh: Add a comment - - (cherry picked from commit 6167607a6ea72fb74eefb943c4566e3cab528cd2) tests/test_suffix.sh | 3 +++ 1 file changed, 3 insertions(+) -commit 8fda5ce872632e464a1f9660b3ab8dac939a03c6 +commit 4e9023857d287f624562156b60dc23d2b64c0f10 Author: Lasse Collin Date: 2024-05-18 00:34:07 +0300 Fix typos Thanks to xx on #tukaani. - - (cherry picked from commit 4e9023857d287f624562156b60dc23d2b64c0f10) src/common/mythread.h | 2 +- src/common/tuklib_integer.h | 2 +- src/liblzma/api/lzma/base.h | 2 +- src/liblzma/common/filter_buffer_decoder.c | 2 +- src/liblzma/common/filter_common.c | 2 +- src/scripts/xzgrep.in | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) -commit 2729079bcb8dd1c3ab1a79426690d17f6f8e6f7d +commit b14d08fbbc254485ace9ccfe7908674f608a62ae Author: Lasse Collin Date: 2024-05-18 00:23:52 +0300 liblzma: Fix white space Thanks to xx on #tukaani. - - (cherry picked from commit b14d08fbbc254485ace9ccfe7908674f608a62ae) src/liblzma/simple/simple_coder.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) -commit a289c4dfeb3ded35e129c48b13f46605f0138704 +commit 9f1a6d6f9a258886933a22239a5b81af34b28199 +Author: Lasse Collin +Date: 2024-05-15 23:14:17 +0300 + + Build: Temporarily disable CRC CLMUL to silence OSS Fuzz + + The code makes aligned 16-byte reads which may read up to 15 bytes + before the beginning or past the end of the buffer if the buffer + is misaligned. The unneeded bytes are then ignored. It cannot cross + page boundaries and thus cannot cause access violations. 
+ + This inherently trips address sanitizer which was already disabled + with __attribute__((__no_sanitize_address__)). However, it also + trips memory sanitizer if the extra bytes are uninitialized because + memory sanitizer doesn't see that those bytes then get ignored by + byte shuffling in the xmm registers. + + The plan is to change the code so that all sanitizers pass but it's + not finished yet (performance shouldn't get worse) so as a temporary + measure to keep OSS Fuzz happy, the CLMUL CRC is now disabled even + though I think the code is fine to use (and easy enough to review + the memory accesses in it too). + + configure.ac | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +commit 142e670a413a7bce1a2647f1cf1f33f8ee2dbe88 Author: Lasse Collin Date: 2024-05-13 17:15:04 +0300 xz: Document the static function get_chains_memusage() - - (cherry picked from commit 142e670a413a7bce1a2647f1cf1f33f8ee2dbe88) src/xz/coder.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) -commit 6f0db31713845386ce2419c55b2df89b53b80dd3 +commit 78e984399a64bfee5d11e7308e0bdbc1006db2ca Author: Lasse Collin Date: 2024-05-13 17:07:22 +0300 xz: Rename filters_memusage_max() to get_chains_memusage() - - (cherry picked from commit 78e984399a64bfee5d11e7308e0bdbc1006db2ca) src/xz/coder.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) -commit d7e2bf7e2dc9289a7a5dd0311d19d10de6d7ea1b +commit 54c3db0a83d3e67d89aba92a0957f2dce9b111a7 Author: Lasse Collin Date: 2024-05-13 17:04:05 +0300 xz: Rename filter_memusages to chains_memusages - - (cherry picked from commit 54c3db0a83d3e67d89aba92a0957f2dce9b111a7) src/xz/coder.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -commit 58f200b6d1dc4cbc1ab3315a359120ab6eb84878 +commit d9e1ae79ec90d6a7eafeaceaf0ece4f0c83d4417 Author: Lasse Collin Date: 2024-05-12 22:26:30 +0300 xz: Simplify the memory usage scaling code This is closer to what it was before the --filtersX support was
added, just extended to support scaling all filter chains. The method before this commit was an extended version of the original too but it was done in a more complex way for no clear reason. In case of an error, the complex version printed fewer informative messages (a good thing) but it's not a significant benefit. If the limit is too low even for single-threaded mode, the required amount of memory is now reported like in 5.4.x instead of like in 5.5.1alpha - 5.6.1 which showed the original non-scaled usage. It had been a FIXME in the old code but it's not clear what message makes the most sense. Fixes: 5f0c5a04388f8334962c70bc37a8c2ff8f605e0a - (cherry picked from commit d9e1ae79ec90d6a7eafeaceaf0ece4f0c83d4417) src/xz/coder.c | 163 ++++++++++++++++++++------------------------------------- 1 file changed, 57 insertions(+), 106 deletions(-) -commit 41bdc9fa5cc2fc2a70f4331329ac724773cc2f26 +commit 0ee56983d198b776878432703de664049b1be32e Author: Lasse Collin Date: 2024-05-13 12:14:00 +0300 xz: Edit comments - - (cherry picked from commit 0ee56983d198b776878432703de664049b1be32e) src/xz/coder.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) -commit 52e40c1912dfdbf8c7aa85e3a4c3eb138fa73d5d +commit ec82a49c3553f7206104582dbfb8b64fa433b491 Author: Lasse Collin Date: 2024-05-13 12:03:51 +0300 xz: Rename chain_idx to chain_num - - (cherry picked from commit ec82a49c3553f7206104582dbfb8b64fa433b491) src/xz/coder.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -commit 8a019633319c694423691f58c55fa23a46e45ded +commit a731a6993c34bbbd55abaf9c166718682b1da24f Author: Lasse Collin Date: 2024-05-12 22:29:11 +0300 xz: Edit coding style - - (cherry picked from commit a731a6993c34bbbd55abaf9c166718682b1da24f) src/xz/coder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit e3ad7eda74caea29849e2e9ec01212f5f7d0f574 +commit 32eb176b89243fce3112347fe43a8ad14a9fd2be Author: Lasse Collin Date: 2024-05-12 22:16:05 +0300 xz: Edit comments Fixes:
5f0c5a04388f8334962c70bc37a8c2ff8f605e0a - (cherry picked from commit 32eb176b89243fce3112347fe43a8ad14a9fd2be) src/xz/coder.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) -commit 09cabae2ab47a06f6eee02419a815d4bfd0d9490 +commit b90339f4daa510d2b1b8c550f855a99667f1d004 Author: Lasse Collin Date: 2024-05-12 21:57:49 +0300 xz: Fix grammar in a comment Fixes: cb3111e3ed84152912b5138d690c8d9f00c6ef02 - (cherry picked from commit b90339f4daa510d2b1b8c550f855a99667f1d004) src/xz/coder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit c10b66fbf9b2442741a1f052bdb4ce7009af9cda +commit 4c0bdaf13d651b22ba13bd93f8379724d6ccdc13 Author: Lasse Collin Date: 2024-05-12 21:46:56 +0300 xz: Rename filter_memusages to encoder_memusages - - (cherry picked from commit 4c0bdaf13d651b22ba13bd93f8379724d6ccdc13) src/xz/coder.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) -commit 9132ce3564b2c003bffd6de6294a3d98dccf314e +commit b54aa023e0ec291b06e976e5f094ab0549e7b09b Author: Lasse Collin Date: 2024-05-12 21:42:05 +0300 xz: Edit coding style - - (cherry picked from commit b54aa023e0ec291b06e976e5f094ab0549e7b09b) src/xz/coder.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) -commit d642e13874e93b03959d1de523f1c8ebe9428838 +commit 49f67d3d3f42b640a7dfc4ca04c8934f658e10ce Author: Lasse Collin Date: 2024-05-12 21:31:02 +0300 xz: Rename filters_index to chain_num The reason is the same as in bd0782c1f13e52cd0fd8415208e30e47004a4c68. - - (cherry picked from commit 49f67d3d3f42b640a7dfc4ca04c8934f658e10ce) src/xz/args.c | 8 ++++---- src/xz/coder.c | 8 ++++---- src/xz/coder.h | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-) -commit 47599f3b73f0a2bc18e0a8367d723f1eb0f11b63 +commit ff9e8b3d069ecfa52ec43dcdb198542d1692a492 Author: Lasse Collin Date: 2024-05-12 21:22:43 +0300 xz: Replace a few uint32_t with "unsigned" to reduce the number of casts These hold only tiny values. 
- - (cherry picked from commit ff9e8b3d069ecfa52ec43dcdb198542d1692a492) src/xz/args.c | 2 +- src/xz/coder.c | 17 ++++++++--------- src/xz/coder.h | 2 +- 3 files changed, 10 insertions(+), 11 deletions(-) -commit 8f5ab75c454ea8676ed09c7f6eda8afe87b008ad +commit b5e6c1113b1ba02c282bd9163eccdb521c937a78 Author: Lasse Collin Date: 2024-05-12 21:10:45 +0300 xz: Rename filters_used_mask to chains_used_mask The reason is the same as in bd0782c1f13e52cd0fd8415208e30e47004a4c68. - - (cherry picked from commit b5e6c1113b1ba02c282bd9163eccdb521c937a78) src/xz/coder.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) -commit 3eb7cf9dd5b90a074f741234225d7de51ad88774 +commit 32500dfaadae2ea36fda2e17b49ae7d9ac1acf52 Author: Lasse Collin Date: 2024-05-12 17:14:43 +0300 xz: Move the setting of "check" in coder_set_compression_settings() It's more logical to do it in the beginning instead of in the middle of the filter chain handling. Fixes: d6af7f347077b22403133239592e478931307759 - (cherry picked from commit 32500dfaadae2ea36fda2e17b49ae7d9ac1acf52) src/xz/coder.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) -commit 067961ee0e1adaa66a43fbf8c3be31697554a839 +commit ad146b1f42bbb678175a503a45ce525e779f9b8b Author: Lasse Collin Date: 2024-05-12 17:09:17 +0300 xz: Rename "filters" to "chains" The convention is that lzma_filter filters[LZMA_FILTERS_MAX + 1]; contains the filters of a single filter chain. It was so here as well before the commit d6af7f347077b22403133239592e478931307759. It changes "filters" to a ten-element array of filter chains. It's clearer to call this array-of-arrays "chains". This also renames "filter_idx" to "chain_idx" which is used as an index as in chains[chain_idx]. 
- - (cherry picked from commit ad146b1f42bbb678175a503a45ce525e779f9b8b) src/xz/coder.c | 68 +++++++++++++++++++++++++++++----------------------------- 1 file changed, 34 insertions(+), 34 deletions(-) -commit 6822f6f891d43c97ea379a51223ce8ea69439161 +commit 5a4ae4e4d0105404184e9a82ee08f94e1b7783e0 Author: Lasse Collin Date: 2024-05-12 16:56:15 +0300 xz: Clean up a comment - - (cherry picked from commit 5a4ae4e4d0105404184e9a82ee08f94e1b7783e0) src/xz/coder.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) -commit 0e5e3e7bdcfcdc4b4607665ff0f6ad794e5195af +commit 2de80494ed9a4dc7db395a32a5efb770ce769804 Author: Lasse Collin Date: 2024-05-12 16:52:09 +0300 xz: Add clarifying assertions - - (cherry picked from commit 2de80494ed9a4dc7db395a32a5efb770ce769804) src/xz/coder.c | 4 ++++ 1 file changed, 4 insertions(+) -commit 77bcf6b76a26833923e62b2dec717474d5d44700 +commit 1eaad004bf7748976324672db028e34f42802e61 Author: Lasse Collin Date: 2024-05-10 20:23:33 +0300 xz: Add a clarifying assertion Fixes: 5f0c5a04388f8334962c70bc37a8c2ff8f605e0a - (cherry picked from commit 1eaad004bf7748976324672db028e34f42802e61) src/xz/coder.c | 1 + 1 file changed, 1 insertion(+) -commit df3efc058a256629ea0153b4750d3df308757038 +commit 605094329b986244833c967c04963cacc41a868d Author: Lasse Collin Date: 2024-05-12 16:47:17 +0300 xz: Clarify a comment - - (cherry picked from commit 605094329b986244833c967c04963cacc41a868d) src/xz/coder.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -commit 4ebfe11cd33439675f03e1e3725abf03d6f8251b +commit 8fac2577f2dbb9491afd8500f60d004c9071df3b Author: Lasse Collin Date: 2024-05-12 16:28:25 +0300 xz: Use the info collected in parse_block_list() This is slightly simpler and it avoids looping through the opt_block_list array. 
- - (cherry picked from commit 8fac2577f2dbb9491afd8500f60d004c9071df3b) src/xz/coder.c | 95 ++++++++++++++++++++++++---------------------------------- 1 file changed, 39 insertions(+), 56 deletions(-) -commit bfea6913618357a7034a1d79079bccb688262124 +commit 81d350dab864b985b740742772f3b132d4c52914 Author: Lasse Collin Date: 2024-05-12 15:48:45 +0300 xz: Remember the filter chains and the largest Block in parse_block_list() - - (cherry picked from commit 81d350dab864b985b740742772f3b132d4c52914) src/xz/args.c | 18 ++++++++++++++++++ src/xz/coder.c | 2 ++ src/xz/coder.h | 13 +++++++++++++ 3 files changed, 33 insertions(+) -commit d4e33e73922427a0f5277b91b239af538fd41c06 +commit 46ab56968f7dfdac187710a1223659d832fa1565 Author: Lasse Collin Date: 2024-05-12 15:38:48 +0300 xz: Update a comment and initialization of filters_used_mask - - (cherry picked from commit 46ab56968f7dfdac187710a1223659d832fa1565) src/xz/coder.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) -commit 3c130737c9bb4a5021bb14eb19e9ceae30ffef3a +commit e89293a0baeb8663707c6b4a74fbb310ec698a8f Author: Lasse Collin Date: 2024-05-12 15:08:10 +0300 xz: parse_block_list: Edit integer type casting - - (cherry picked from commit e89293a0baeb8663707c6b4a74fbb310ec698a8f) src/xz/args.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -commit 40c8513b4ee42b8c0fae9b2a229e078ac7e0f87a +commit 87011e40c168255cd2edea129ee68c901770603b Author: Lasse Collin Date: 2024-05-12 14:51:37 +0300 xz: Make filter_memusages a local variable - - (cherry picked from commit 87011e40c168255cd2edea129ee68c901770603b) src/xz/coder.c | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-) -commit cacaf25aa71cd1110cc049d037c11e4075602c35 +commit 347b412a9374e0456bef9da0d7d79174c0b6f1a5 Author: Lasse Collin Date: 2024-05-10 20:33:08 +0300 xz: Remove unused code and simplify opt_mode == MODE_COMPRESS isn't possible when HAVE_ENCODERS isn't defined. 
Thus, when *encoding*, the message about *decoder* memory usage is possible to show only when both encoder and decoder have been built. Since the message is shown only at V_DEBUG, skip the memusage calculation if verbosity level isn't high enough. Fixes: 5f0c5a04388f8334962c70bc37a8c2ff8f605e0a - (cherry picked from commit 347b412a9374e0456bef9da0d7d79174c0b6f1a5) src/xz/coder.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) -commit 3495a6b291f49079485854bb185a52c29d06cd2f +commit 31358c057c9de9d6aba96bae112b2d17942de7cb Author: Lasse Collin Date: 2024-05-10 20:22:58 +0300 xz: Fix integer type from uint64_t to uint32_t lzma_options_lzma.dict_size is uint32_t so use it here too. Fixes: 5f0c5a04388f8334962c70bc37a8c2ff8f605e0a - (cherry picked from commit 31358c057c9de9d6aba96bae112b2d17942de7cb) src/xz/coder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 2861d856deb557734f067c5c471d670f0b0c6684 +commit 3f71e0f3a118e1012526f94fd640a626d30cb599 Author: Lasse Collin Date: 2024-05-08 21:40:07 +0300 debug/translation.bash: Remove an outdated test command Since 5.3.5beta, "xz --lzma2=mf=bt4,nice=2" works even though bt4 needs at least nice=4. It is rounded up internally by liblzma when needed. 
Fixes: 5cd9f0df78cc4f8a7807bf6104adea13034fbb45 - (cherry picked from commit 3f71e0f3a118e1012526f94fd640a626d30cb599) debug/translation.bash | 1 - 1 file changed, 1 deletion(-) -commit 54546babc3feb2786e541b80f9e7216b8f1bd543 +commit b05a516830095a0e1937aeb31c937fb0400408b6 Author: Lasse Collin Date: 2024-05-07 20:41:28 +0300 Fix the date of NEWS for 5.4.5 - - (cherry picked from commit b05a516830095a0e1937aeb31c937fb0400408b6) NEWS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit a7e58d1fdb493d58854ac599347cf64da0cecca4 +commit 6d336aeb97b69c496ddc626af403f6f21c753658 Author: Lasse Collin Date: 2024-05-07 16:21:15 +0300 Build: Update visibility.m4 from Gnulib This fixes the syntax of the "serial" line and renames a temporary variable. - - (cherry picked from commit 6d336aeb97b69c496ddc626af403f6f21c753658) m4/visibility.m4 | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) -commit 07a9cda037042b262ba6c8c18fae4a5b3333d508 +commit ab51e8ee610e2a893906859848f93d5cb0d5ba83 Author: Lasse Collin Date: 2024-05-07 15:05:21 +0300 po4a/update-po: Delete the *.po.authors files These are temporary files that are needed only when running po4a. The top-level Makefile.am puts the whole po4a directory into distribution tarball (it's simpler) so deleting these temporary files is needed to prevent them from getting into tarballs. 
- - (cherry picked from commit ab51e8ee610e2a893906859848f93d5cb0d5ba83) po4a/update-po | 4 ++++ 1 file changed, 4 insertions(+) -commit 1b4e7dca243d8ef297a245b5ee3ce9cd1ca20f56 +commit e4780244a17420cc95d5498cd6e02ad10eac6e5f Author: Lasse Collin Date: 2024-05-07 13:12:17 +0300 xz: Edit comments and coding style - - (cherry picked from commit e4780244a17420cc95d5498cd6e02ad10eac6e5f) src/xz/coder.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) -commit 18683525a78e96ec6d7c2b4e841e94ad39be7096 +commit fe4d8b0c80eaeca3381be302eeb89aba871a7e7c Author: Lasse Collin Date: 2024-05-06 23:08:22 +0300 xz: Omit an incorrect comment It likely was a leftover from a development version of the code. Fixes: 183819bfd9efac8c184d9bf123325719b7eee30f - (cherry picked from commit fe4d8b0c80eaeca3381be302eeb89aba871a7e7c) src/xz/coder.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) -commit 005f0398645b0342c9c1915d422743c77ec1d435 +commit 9bef5b8d17dd5e009d6a6b2becc2dc535da53937 Author: Lasse Collin Date: 2024-05-06 23:04:31 +0300 xz: Add braces to a for-statement and to an if-statement No functional changes. 
Fixes: 5f0c5a04388f8334962c70bc37a8c2ff8f605e0a Fixes: 479fd58d60622331fcbe48fddf756927b9f80d9a - (cherry picked from commit 9bef5b8d17dd5e009d6a6b2becc2dc535da53937) src/xz/coder.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -commit 34be4e6aa62376314fde250ea4f142c18274272f +commit de06b9f0c0a3f72569829ecadbc9c0a3ef099f57 Author: Lasse Collin Date: 2024-05-06 23:00:09 +0300 liblzma: Omit an unneeded array from the x86 filter Fixes: 6aa2a6deeba04808a0fe4461396e7fb70277f3d4 - (cherry picked from commit de06b9f0c0a3f72569829ecadbc9c0a3ef099f57) src/liblzma/simple/x86.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) -commit 79e329b771210c30ea317dd4d99e8968f3e6f9b2 +commit 7da488cb933fdf51cfc14cb5810beb0766224380 Author: Lasse Collin Date: 2024-05-06 22:56:31 +0300 CMake: Add test_suffix.sh to the tests - - (cherry picked from commit 7da488cb933fdf51cfc14cb5810beb0766224380) tests/tests.cmake | 13 +++++++++++++ 1 file changed, 13 insertions(+) -commit 86f33bb90c6cfe6950f1d36c9e5dd7fdc9798124 +commit a805594ed0b4cbf7b81aa28ff46a8ab3c83c6876 Author: Lasse Collin Date: 2024-05-06 22:55:54 +0300 Test: Add CMake support to test_suffix.sh It needs to find the xz executable from a different directory and work without config.h. - - (cherry picked from commit a805594ed0b4cbf7b81aa28ff46a8ab3c83c6876) tests/test_suffix.sh | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) -commit 1e243ab378e8f78ebb3af741fb38354954cf20f9 +commit 50e19489387774bab3c4a988397d0d9c7a142a46 Author: Lasse Collin Date: 2024-05-06 20:45:34 +0300 Update INSTALL about MINIX 3 The latest stable is 3.3.0 and it's from 2014. Don't mention the older versions in INSTALL. 3.3.0 ships with Clang already. Testing with 3.4.0beta6 shows that tuklib_physmem works too so omit comments about that from INSTALL. Visibility warnigns weren't a problem either. Thus it's enough to mention the need for --disable-threads as configure doesn't autodetect the lack of pthreads. 
- - (cherry picked from commit 50e19489387774bab3c4a988397d0d9c7a142a46) INSTALL | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) -commit 8595b5ab3ba766eb6daed890bfe91a16fe329c2c +commit 68d18aea1422a2b86b98b71d0b019233d84e01b0 Author: Lasse Collin Date: 2024-05-02 23:00:16 +0300 Windows: Remove the "doc/api" line from README-Windows.txt Fixes: 252aa1d67bc015eeba462803ab72edeb7744d864 - (cherry picked from commit 68d18aea1422a2b86b98b71d0b019233d84e01b0) windows/README-Windows.txt | 2 -- 1 file changed, 2 deletions(-) -commit a3f163a4ad97189744107e964e4dea505fbcc252 +commit 8ede961374613aa302a13571d662cfaea1cf91f7 Author: Lasse Collin Date: 2024-05-02 22:59:04 +0300 Build: Don't copy doc/api from source tree to distribution tarball It was copied if it existed. This was intentional when autogen.sh still built liblzma API docs with Doxygen. Fixes: d3a77ebc04bf1db8d52de2d9b0f07877bc4fd139 - (cherry picked from commit 8ede961374613aa302a13571d662cfaea1cf91f7) Makefile.am | 5 ----- 1 file changed, 5 deletions(-) -commit cb0e847fe07099c1ef6d8076f6a46e17bc431acb +commit 9a6761aa35ed84d30bd2fda2333a4fdf3f46ecdc Author: Sam James Date: 2024-05-02 13:26:40 +0100 ci: add SPDX headers I've checked over each of these and they're straightforward applications of the relevant Github Actions. - - (cherry picked from commit 9a6761aa35ed84d30bd2fda2333a4fdf3f46ecdc) .github/workflows/freebsd.yml | 2 ++ .github/workflows/netbsd.yml | 2 ++ .github/workflows/openbsd.yml | 2 ++ 3 files changed, 6 insertions(+) -commit c3c854dc759fe0c5549aa0a730be9e259243edb6 +commit 81efe6119f86e3274e512c9eca5ec22b2196c2b3 Author: Yaroslav Halchenko Date: 2024-03-29 14:37:24 -0400 codespell: Ignore the THANKS file and debbugs.gnu.org URL This way "codespell -i 0" is silent. This is the first commit from https://github.com/tukaani-project/xz/pull/93 with trivial edits by Lasse Collin. 
- - (cherry picked from commit 81efe6119f86e3274e512c9eca5ec22b2196c2b3) .codespellrc | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) -commit 3216301aa20fcf9d5a7485e35a295d5c451d9658 +commit 905bfc74fe2670fd9c39014803017ab53d325401 Author: Lasse Collin Date: 2024-04-30 14:37:11 +0300 Add .gitattributes to clean up git-archive output - - (cherry picked from commit 905bfc74fe2670fd9c39014803017ab53d325401) .gitattributes | 7 +++++++ 1 file changed, 7 insertions(+) -commit f99e7c69ada9e0db0ee1ebbc38c8ce9390cd9788 +commit 3334c71d3d4294a4f6569df3ba9bcf2443dfa501 Author: Lasse Collin Date: 2024-04-19 12:11:09 +0300 xzdec: Support Landlock ABI version 4 This was added to xz in 02e3505991233901575b7eabc06b2c6c62a96899 but I forgot to do the same in xzdec. The Landlock sandbox in xzdec could be stricter as now it's active only for the last file being decompressed. In xz, read-only sandbox is used for multi-file case. On the other hand, xz doesn't go to the strictest mode when processing the last file when more than one file was specified; xzdec does. - - (cherry picked from commit 3334c71d3d4294a4f6569df3ba9bcf2443dfa501) src/xzdec/xzdec.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) -commit bfe9be7a46cfd3b3069c15f7ba1432192bca1f5b +commit 278563ef8f2b8d98d7f2c85e1a64ec1bc21d26d8 Author: Lasse Collin Date: 2024-04-30 22:22:45 +0300 liblzma: Fix incorrect function type error from sanitizer Clang 17 with -fsanitize=address,undefined: src/liblzma/common/filter_common.c:366:8: runtime error: call to function encoder_find through pointer to incorrect function type 'const lzma_filter_coder *(*)(unsigned long)' src/liblzma/common/filter_encoder.c:187: note: encoder_find defined here Use a wrapper function to get the correct type neatly. This reduces the number of casts needed too. This issue could be a problem with control flow integrity (CFI) methods that check the function type on indirect function calls. 
Fixes: 3b34851de1eaf358cf9268922fa0eeed8278d680 - (cherry picked from commit 278563ef8f2b8d98d7f2c85e1a64ec1bc21d26d8) src/liblzma/common/filter_decoder.c | 15 ++++++++++++--- src/liblzma/common/filter_encoder.c | 17 +++++++++++++---- 2 files changed, 25 insertions(+), 7 deletions(-) -commit 882eadc5b820b6b1495fc91ba3573ac2aa6c1df3 +commit 77c8f60547decefca8f2d0c905d9c708c38ee8ff Author: Lasse Collin Date: 2024-04-30 21:41:11 +0300 xz: Avoid arithmetic on a null pointer It's undefined behavior. The result wasn't ever used as it occurred in the last iteration of a loop. Clang 17 with -fsanitize=address,undefined: $ src/xz/xz --block-list=123 src/xz/args.c:164:12: runtime error: applying non-zero offset 1 to null pointer Fixes: 88ccf47205d7f3aa314d358c72ef214f10f68b43 Co-authored-by: Sam James - (cherry picked from commit 77c8f60547decefca8f2d0c905d9c708c38ee8ff) src/xz/args.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) -commit ec5458e1c9b2beb416781e81ad4ff22b0149b99d +commit 64503cc2b76a388ced4ec5f68234a07f0dcddcd5 Author: Lasse Collin Date: 2024-04-27 20:42:00 +0300 CMake: Support building liblzma API docs using Doxygen This is disabled by default to match the default in Autotools. Use -DUSE_DOXYGEN=ON to enable Doxygen usage. This uses the update-doxygen script, thus this is under if(UNIX) although Doxygen itself can run on Windows too. - - (cherry picked from commit 64503cc2b76a388ced4ec5f68234a07f0dcddcd5) CMakeLists.txt | 40 +++++++++++++++++++++++++++++++--------- 1 file changed, 31 insertions(+), 9 deletions(-) -commit 8c93ced56bcb23df723dab23b7477d580720f522 +commit 0a7f5a80d8532a1d8cfa0a902c9d1ad7651eca37 Author: Lasse Collin Date: 2024-04-20 23:36:39 +0300 CMake: List API headers in LIBLZMA_API_HEADERS variable This way the same list will be usable in more than one location. 
- - (cherry picked from commit 0a7f5a80d8532a1d8cfa0a902c9d1ad7651eca37) CMakeLists.txt | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) -commit f7c9bab0372db357511e42c9c610a2cfe5fca9b1 +commit 541406bee3f09e9813103c6406b10fc6ab2e0d30 Author: Lasse Collin Date: 2024-04-19 15:16:42 +0300 PACKAGERS: Document the optional Doxygen usage Also add a note that packagers should check the licensing of the Doxygen output. - - (cherry picked from commit 541406bee3f09e9813103c6406b10fc6ab2e0d30) PACKAGERS | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) -commit 28e7d130cb843e96d7e6b0358f8dd58bd1b2a275 +commit e21efdf96f39378fe417479f89e97046680406f5 Author: Lasse Collin Date: 2024-04-27 17:47:09 +0300 Build: Add --enable-doxygen to generate and install API docs It requires Doxygen. This option is disabled by default. - - (cherry picked from commit e21efdf96f39378fe417479f89e97046680406f5) INSTALL | 6 ++++++ configure.ac | 10 +++++++++- src/liblzma/api/Makefile.am | 19 +++++++++++++++++++ 3 files changed, 34 insertions(+), 1 deletion(-) -commit cca7e6c05bc6cc51c0271c36856b7fe29f65c648 +commit 0ece09a575d7e542bda8825808ddd6cf7de8cc4b Author: Lasse Collin Date: 2024-04-19 15:15:17 +0300 Doxygen: update-doxygen: Support out-of-tree builds Also, now $0 is used to refer to the script itself. - - (cherry picked from commit 0ece09a575d7e542bda8825808ddd6cf7de8cc4b) doxygen/update-doxygen | 110 ++++++++++++++++++++++++++++++------------------- 1 file changed, 68 insertions(+), 42 deletions(-) -commit 8090d3dc7f0eea4a3a61f4f6d46a0d0866e345fe +commit 2c519f641f266fd897edf680827d9c905f411440 Author: Lasse Collin Date: 2024-04-28 21:08:00 +0300 Doxygen: Simplify Doxyfile and add SPDX license identifier This omits all comments and a few non-default options that weren't needed. Now it contains no copyrighted content from Doxygen itself. 
- - (cherry picked from commit 2c519f641f266fd897edf680827d9c905f411440) doxygen/Doxyfile | 2698 +----------------------------------------------------- 1 file changed, 25 insertions(+), 2673 deletions(-) -commit 0721b8bfe558502669f06c97601fe59ad0d52541 +commit bdba39a57530d11b88440df8024002be3d09e4a1 Author: Lasse Collin Date: 2024-04-19 15:14:02 +0300 Doxygen: Don't strip JavaScript anymore The stripping method worked well with Doxygen 1.8 and 1.9 but it doesn't work with Doxygen 1.10 anymore. Since we won't ship pre-generated liblzma API docs anymore, the extra bloat and extra license info of the JavaScript files won't affect the upstream source package anymore. - - (cherry picked from commit bdba39a57530d11b88440df8024002be3d09e4a1) doxygen/update-doxygen | 21 --------------------- 1 file changed, 21 deletions(-) -commit 1ddb40f6fd286c3c6ef510735112db1ac1b60936 +commit d3a77ebc04bf1db8d52de2d9b0f07877bc4fd139 Author: Lasse Collin Date: 2024-04-19 17:26:41 +0300 Build: Remove old Doxygen rules from top-level Makefile.am - - (cherry picked from commit d3a77ebc04bf1db8d52de2d9b0f07877bc4fd139) Makefile.am | 12 ------------ 1 file changed, 12 deletions(-) -commit 092af76234b1bc79380427456b3215aa0b80f339 +commit fd7faa4c338a42a6a40e854b837d285ae2e8c609 Author: Lasse Collin Date: 2024-04-19 15:10:06 +0300 Update COPYING to match the autogen.sh and mydist changes - - (cherry picked from commit fd7faa4c338a42a6a40e854b837d285ae2e8c609) COPYING | 11 ----------- 1 file changed, 11 deletions(-) -commit 77bce9a0a250cfb20333ee0dca036b3193dd4941 +commit b2bc55d8a0a9f2f59bfd4302067300e650f6baa3 Author: Lasse Collin Date: 2024-04-19 17:23:43 +0300 Build: Don't run update-doxygen as part of "make mydist" - - (cherry picked from commit b2bc55d8a0a9f2f59bfd4302067300e650f6baa3) Makefile.am | 1 - 1 file changed, 1 deletion(-) -commit 3a2fc62f59b2e8cc45f8d8fd9988b4305efe4bff +commit e9be74f5b129fe8a5388d588e68b1b7f5168a310 Author: Lasse Collin Date: 2024-04-19 15:09:48 +0300 
autogen.sh: Don't generated Doxygen docs anymore - - (cherry picked from commit e9be74f5b129fe8a5388d588e68b1b7f5168a310) autogen.sh | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-) -commit b04c16f9a5a8675a87783305568cadfa3f17d999 +commit 252aa1d67bc015eeba462803ab72edeb7744d864 Author: Lasse Collin Date: 2024-04-19 17:41:36 +0300 windows/build.bash: Omit Doxygen docs from the package They will be omitted from the source tarball and I don't want to make Doxygen a dependency of build.bash. - - (cherry picked from commit 252aa1d67bc015eeba462803ab72edeb7744d864) windows/build.bash | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -commit d4dd3c8f6169adf50cad8fe6872e0f5fcb82475c +commit 634095364d87444d62d8ec54c134c0cd4705f5d7 Author: Lasse Collin Date: 2024-04-19 14:14:47 +0300 README: Don't mention PDF man pages anymore - - (cherry picked from commit 634095364d87444d62d8ec54c134c0cd4705f5d7) README | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -commit be90720d6cd7fbb1b170794445815f579b444a6f +commit dc684bf76ea23574ee9d88382057381e04e6089a Author: Lasse Collin Date: 2024-04-19 14:10:39 +0300 Build: Omit PDF man pages from the package pdf-local rule was added to create the PDFs still with "make pdf". The install rules are missing but that likely doesn't matter at all. 
- - (cherry picked from commit dc684bf76ea23574ee9d88382057381e04e6089a) Makefile.am | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) -commit f724552d0c1ae2e3aa693d80d8d0da962dfac4e8 +commit e3531ab4125cbd5c01ebd3200791350960547189 Author: Lasse Collin Date: 2024-04-19 13:54:39 +0300 windows/build.bash: Don't copy PDF man pages to the package - - (cherry picked from commit e3531ab4125cbd5c01ebd3200791350960547189) windows/README-Windows.txt | 2 +- windows/build.bash | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -commit 00e774819c6550a8eac219e9f6f083ab2b155505 +commit 710a4573ef2cbd19c66318c3b2d1388e418e26c7 Author: Lasse Collin Date: 2024-04-28 01:34:50 +0300 Tests: test_index: Fix failures when features are disabled Fixes: cd88423e76d54eb72aea037364f3ebb21f122503 - (cherry picked from commit 710a4573ef2cbd19c66318c3b2d1388e418e26c7) tests/test_index.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) -commit 51133ad71eecc19bdb3ab287a0732fd9441753f4 +commit aaff75c3486c4489ce88b0efb36b41cf138af7c3 Author: Lasse Collin Date: 2024-04-20 17:09:11 +0300 CMake: Keep the build working if the "tests" directory is missing This moves the tests section as is from CMakeLists.txt into tests/tests.cmake. CMakeLists.txt now includes tests/tests.cmake if the latter file exists. Now it's possible to delete the whole "tests" directory and building with CMake will still work normally, just without the tests. This way the tests are readily available for those who want them, and those who won't run the tests anyway have a straightforward way to ensure that nothing from the "tests" directory can affect the build process. 
- - (cherry picked from commit aaff75c3486c4489ce88b0efb36b41cf138af7c3) CMakeLists.txt | 76 ++--------------------------------------------- tests/Makefile.am | 1 + tests/tests.cmake | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 92 insertions(+), 73 deletions(-) -commit 85b5595b67f0081b2a900104ed7589de4bb75e12 +commit a5f2aa5618fe9183706c9c514c3067985f6c338b Author: Lasse Collin Date: 2024-04-20 13:12:50 +0300 Tests: Remove x86 and SPARC BCJ tests These are very old but the exact test file isn't easy to reproduce as it was compiled from a short C program (bcj_test.c) long ago. These tests weren't very good anyway, just a little better than nothing. - - (cherry picked from commit a5f2aa5618fe9183706c9c514c3067985f6c338b) tests/Makefile.am | 7 ---- tests/bcj_test.c | 64 --------------------------------- tests/compress_prepared_bcj_sparc | Bin 1240 -> 0 bytes tests/compress_prepared_bcj_x86 | Bin 1388 -> 0 bytes tests/files/README | 8 ----- tests/files/good-1-sparc-lzma2.xz | Bin 612 -> 0 bytes tests/files/good-1-x86-lzma2.xz | Bin 716 -> 0 bytes tests/test_compress_prepared_bcj_sparc | 4 --- tests/test_compress_prepared_bcj_x86 | 4 --- 9 files changed, 87 deletions(-) -commit d8228d1ea08155a17acaadd76ed95805d3b0a929 +commit d879686469c9c4bf2a7c0bb6420ebe4530fc8f07 Author: Lasse Collin Date: 2024-04-27 18:30:40 +0300 Tests: test_index: Edit a misleading test - - (cherry picked from commit d879686469c9c4bf2a7c0bb6420ebe4530fc8f07) tests/test_index.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) -commit 2358ef8238f166c49e66f438e7494d4d352eb113 +commit 612005bbdb0dea9dc09e9e2e9cc16a15c1480acd Author: Lasse Collin Date: 2024-04-27 16:46:01 +0300 Tests: test_index: Use minimal values to test integer overflow - - (cherry picked from commit 612005bbdb0dea9dc09e9e2e9cc16a15c1480acd) tests/test_index.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -commit 54f4a4162aae8796580489013583d6148be5a473 +commit 
4ad88b2544c2aaf8de8f38af54587098cbe66c1d Author: Lasse Collin Date: 2024-04-27 15:13:39 +0300 Tests: test_index: Test lzma_index_buffer_decode() more - - (cherry picked from commit 4ad88b2544c2aaf8de8f38af54587098cbe66c1d) tests/test_index.c | 29 ++++++++++++++++++++++++++--- 1 file changed, 26 insertions(+), 3 deletions(-) -commit 85ab59a6b70db33f320a3ea7a854249cb693dea2 +commit 575b11b0d291e66c5fce31ce7a72f11436d57c83 Author: Lasse Collin Date: 2024-04-27 15:08:29 +0300 Tests: test_index: Test that *i = NULL is done on LZMA_PROG_ERROR On LZMA_DATA_ERROR from lzma_index_buffer_decode(), *i = NULL was already done but this adds a test for that case too. - - (cherry picked from commit 575b11b0d291e66c5fce31ce7a72f11436d57c83) tests/test_index.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) -commit fb42599e44dde417305c7d92fd782147ca923079 +commit 2c970debdb285823f01f75e875561d893345ac2b Author: Lasse Collin Date: 2024-04-27 15:01:25 +0300 Tests: test_index: Test lzma_index_buffer_encode() with empty output buf - - (cherry picked from commit 2c970debdb285823f01f75e875561d893345ac2b) tests/test_index.c | 3 +++ 1 file changed, 3 insertions(+) -commit 20cac20f63a96a39391f2d613bef0f7bd6553495 +commit cd88423e76d54eb72aea037364f3ebb21f122503 Author: Lasse Collin Date: 2024-04-27 14:59:55 +0300 Tests: test_index: Replace if-statements with tuktest assertions - - (cherry picked from commit cd88423e76d54eb72aea037364f3ebb21f122503) tests/test_index.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) -commit 91e3ea8735752db5d0373991e84607196070aeaa +commit 7f865577a6224fbbb5f5ca52574b62ea8ac9bf51 Author: Lasse Collin Date: 2024-04-27 14:56:16 +0300 Tests: test_index: Make it clear that my_alloc() has no integer overflows liblzma guarantees that the product of the allocation size arguments will fit in size_t. 
Putting the pre-increment in the if-statement was clearly wrong although in practice it didn't matter here as the function is called only a couple of times. - - (cherry picked from commit 7f865577a6224fbbb5f5ca52574b62ea8ac9bf51) tests/test_index.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) -commit df1659a6c8367db69e82e2ea59ad5f959cf4e615 +commit 12313a3b6596cdcf012e180597f84d231f8730d3 Author: Lasse Collin Date: 2024-04-27 14:51:52 +0300 Tests: test_index: Verify also iter.block.number_in_stream - - (cherry picked from commit 12313a3b6596cdcf012e180597f84d231f8730d3) tests/test_index.c | 2 ++ 1 file changed, 2 insertions(+) -commit e083e95dbfda73900109cca4c82c8713d0a1da21 +commit ad2654010d9d641ce1601beeff00630027e6bcd4 Author: Lasse Collin Date: 2024-04-27 14:51:06 +0300 Tests: test_index: Check cases that aren't a multiple of 4 bytes - - (cherry picked from commit ad2654010d9d641ce1601beeff00630027e6bcd4) tests/test_index.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) -commit b0d3b86ecf1881d10e6614b64b0fcc6c16a3b08f +commit 2524fcf2b68b662035437cee8edbe80067c0c240 Author: Lasse Collin Date: 2024-04-27 14:40:25 +0300 Tests: test_index: Edit comments and white space - - (cherry picked from commit 2524fcf2b68b662035437cee8edbe80067c0c240) tests/test_index.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) -commit bae288ea6ffb976c36e2387c03d75ce84a8a1034 +commit 71eed2520e2eecae89bade9dceea16e56cfa2ea0 Author: Lasse Collin Date: 2024-04-27 14:33:38 +0300 liblzma: index_decoder: Fix missing initializations on LZMA_PROG_ERROR If the arguments to lzma_index_decoder() or lzma_index_buffer_decode() were such that LZMA_PROG_ERROR was returned, the lzma_index **i argument wasn't touched even though the API docs say that *i = NULL is done if an error occurs. This obviously won't be done even now if i == NULL but otherwise it is best to do it due to the wording in the API docs. 
In practice this matters very little: The problem can occur only if the functions are called with invalid arguments, that is, the calling application must already have a bug. - - (cherry picked from commit 71eed2520e2eecae89bade9dceea16e56cfa2ea0) src/liblzma/common/index_decoder.c | 11 +++++++++++ 1 file changed, 11 insertions(+) -commit f10cb93f335900a29e50f990b751996ef026b3a3 +commit 0478473953f50716a2bc37b619b1c7dc2682b1ad Author: Lasse Collin Date: 2024-04-26 18:25:18 +0300 CMake: Bump maximum policy version to 3.29 - - (cherry picked from commit 0478473953f50716a2bc37b619b1c7dc2682b1ad) CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 59055d70cdd3df091264ae9da793821bfd65314d +commit a607e2b40d23f7d998dbaba76692aa30b4c3d9d3 Author: Sam James Date: 2024-04-13 22:30:44 +0100 ci: add NetBSD - - (cherry picked from commit a607e2b40d23f7d998dbaba76692aa30b4c3d9d3) .github/workflows/netbsd.yml | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) -commit 812c1f95f37751aaa1e020fc2360949a674842fd +commit 72c210336de26fb87a928160d025fa10a638d23b Author: Sam James Date: 2024-04-13 23:49:26 +0100 ci: add FreeBSD - - (cherry picked from commit 72c210336de26fb87a928160d025fa10a638d23b) .github/workflows/freebsd.yml | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) -commit d2a4f963c28b864aa179464f7827cc10c6e1365d +commit b526ec2dbfb5889845ea60548c4f5b1f97d84ab2 Author: Sam James Date: 2024-04-13 23:16:08 +0100 ci: add OpenBSD - - (cherry picked from commit b526ec2dbfb5889845ea60548c4f5b1f97d84ab2) .github/workflows/openbsd.yml | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) -commit 493bc57c33385bda5ad32d01ab73dcfe8f5e7ced +commit c7ef767c49351743d8d011574abb9e200bf6b24f Author: Sam James Date: 2024-04-15 05:53:01 +0100 liblzma: outqueue: add header guard Reported by github's codeql. 
- - (cherry picked from commit c7ef767c49351743d8d011574abb9e200bf6b24f) src/liblzma/common/outqueue.h | 5 +++++ 1 file changed, 5 insertions(+) -commit cede418d4f8e1fb4c8a30839fa5d3b14743e83d4 +commit 55dcae3056d95cb2ddb8b560c12ba7596bc79f2c Author: Sam James Date: 2024-04-15 05:53:56 +0100 liblzma: easy_preset: add header guard Reported by github's codeql. - - (cherry picked from commit 55dcae3056d95cb2ddb8b560c12ba7596bc79f2c) src/liblzma/common/easy_preset.h | 5 +++++ 1 file changed, 5 insertions(+) -commit 6e76a25df28b47407a201bf0381fa6d3c80cb0bb +commit 4ffc60f32397371769b7d6b5e3ed8626292d58df Author: Lasse Collin Date: 2024-04-25 14:00:57 +0300 tuklib_integer: Rename bswapXX to byteswapXX The __builtin_bswapXX from GCC and Clang are preferred when they are available. This can allow compilers to emit the x86 MOVBE instruction instead of doing a load + byteswap as two instructions (which would happen if the byteswapping is done in inline asm). bswap16, bswap32, and bswap64 exist in system headers on *BSDs and Darwin. #defining bswap16 on NetBSD results in a warning about macro redefinition. It's safest to avoid this namespace conflict completely. No OS supported by tuklib_integer.h uses byteswapXX names and a web search doesn't immediately find any obvious danger of namespace conflicts. So let's try these still-pretty-short names for the macros. Thanks to Sam James for pointing out the compiler warning on NetBSD 10.0. 
- - (cherry picked from commit 4ffc60f32397371769b7d6b5e3ed8626292d58df) src/common/tuklib_integer.h | 47 ++++++++++++++++++++------------------ src/liblzma/check/crc32_fast.c | 4 ++-- src/liblzma/check/crc32_tablegen.c | 2 +- src/liblzma/check/crc64_fast.c | 4 ++-- src/liblzma/check/crc64_tablegen.c | 2 +- 5 files changed, 31 insertions(+), 28 deletions(-) -commit 0ca14871f306b97ce81bfe44c4a39b6b2af31bb3 +commit 08ab0966a75b501aa7c717622223f0c13a113c75 Author: Lasse Collin Date: 2024-04-24 01:20:26 +0300 liblzma: API doc cleanups - - (cherry picked from commit 08ab0966a75b501aa7c717622223f0c13a113c75) src/liblzma/api/lzma/container.h | 2 +- src/liblzma/api/lzma/index.h | 6 +++--- src/liblzma/api/lzma/vli.h | 5 ++--- 3 files changed, 6 insertions(+), 7 deletions(-) -commit 94a462850bc8718f5dd5b30116bce2165b2403c2 +commit 3ac8a9bb4cccbee88350696dc9c645c48d77c989 Author: Lasse Collin Date: 2024-04-23 16:35:33 +0300 Tests: test_filter_str: Add a few assertions - - (cherry picked from commit 3ac8a9bb4cccbee88350696dc9c645c48d77c989) tests/test_filter_str.c | 4 ++++ 1 file changed, 4 insertions(+) -commit 72058ca22a7f3c9c67ed58be624f8302c6337cd7 +commit 26c69be80523b05c84dea86c47c4ddd9a10945d7 Author: Lasse Collin Date: 2024-04-23 16:35:08 +0300 Tests: test_filter_str: Move one assertion and add a comment - - (cherry picked from commit 26c69be80523b05c84dea86c47c4ddd9a10945d7) tests/test_filter_str.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -commit c59ebbe1c6dd18b78a046aae3133702dd52c352e +commit 4f6af853bc99904efb8b6c28a0af7b81a8476c1b Author: Lasse Collin Date: 2024-04-23 16:26:06 +0300 Tests: test_filter_str: Tweak comments and white space - - (cherry picked from commit 4f6af853bc99904efb8b6c28a0af7b81a8476c1b) tests/test_filter_str.c | 3 +++ 1 file changed, 3 insertions(+) -commit ceda860934b0272689d0722ceeb490cf9c559956 +commit c92663aa1bd576e0615498a4189acf0df12e84b9 Author: Lasse Collin Date: 2024-04-23 16:25:22 +0300 Tests: test_filter_str: Add 
missing RISC-V case Fixes: 89ea1a22f4ed3685b053b7260bc5acf6c75d1664 - (cherry picked from commit c92663aa1bd576e0615498a4189acf0df12e84b9) tests/test_filter_str.c | 3 +++ 1 file changed, 3 insertions(+) -commit 2234b7cc472e62f3401216a71261579342fa2959 +commit b0366df1d7ed26268101f9303a001c91c0806dfc Author: Lasse Collin Date: 2024-04-22 22:23:32 +0300 Tests: test_filter_str: Test *error_pos more thoroughly - - (cherry picked from commit b0366df1d7ed26268101f9303a001c91c0806dfc) tests/test_filter_str.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 76 insertions(+), 1 deletion(-) -commit 3ba3ef57f929670adb1f9c5e5207a81a29374237 +commit 70d12dd069bb9bb0d6bb1c8fafc4e6f77780263d Author: Lasse Collin Date: 2024-04-22 21:54:39 +0300 liblzma: lzma_str_to_filters: Set *error_pos on all errors The API docs clearly say that if error_pos isn't NULL then *error is always set on any error. However, it wasn't touched if str == NULL or filters == NULL or unsupported flags were specified. 
Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203 - (cherry picked from commit 70d12dd069bb9bb0d6bb1c8fafc4e6f77780263d) src/liblzma/common/string_conversion.c | 6 ++++++ 1 file changed, 6 insertions(+) -commit 57ad820e15381344a812c78ce9b67a77a60b9cf3 +commit ed8e552395701fbf046027cebc8be4a6755b263f Author: Lasse Collin Date: 2024-04-22 20:31:25 +0300 liblzma: Clean up white space - - (cherry picked from commit ed8e552395701fbf046027cebc8be4a6755b263f) src/liblzma/lz/lz_encoder.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit ba0b5bfe7cb3cdbd9a4e3c268e10c304cb834e8a +commit 2f06920f20b1ad63b7953dc09569e1d424998849 Author: Lasse Collin Date: 2024-04-22 18:35:19 +0300 Tests: test_filter_flags: Edit comments and style - - (cherry picked from commit 2f06920f20b1ad63b7953dc09569e1d424998849) tests/test_filter_flags.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) -commit d2ed6759596185ac6a9c69ea713c27cd4bd1d9ba +commit b101e1d1dbc81577c0c9aa0cb89cf2e46a15eb82 Author: Lasse Collin Date: 2024-04-22 16:39:44 +0300 Tests: Fix C99/C11 compatibility when features are disabled The array could become empty and then the initializer would be simply {} which is allowed only in GNU-C and C23. 
- - (cherry picked from commit b101e1d1dbc81577c0c9aa0cb89cf2e46a15eb82) tests/test_filter_flags.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) -commit 9a70e93fef3fd5943484e56f1881a7c6e3296027 +commit f8f3a220ac8afcb8cb2812917d3b77e00c2eab0d Author: Lasse Collin Date: 2024-04-21 20:32:16 +0300 DOS: Omit useless defines from config.h - - (cherry picked from commit f8f3a220ac8afcb8cb2812917d3b77e00c2eab0d) dos/config.h | 12 ------------ 1 file changed, 12 deletions(-) -commit dc4740f720e08bdd496aa2736db3b7aea6dd3d1e +commit fc1921b04b8840caaa777c2bd5340d41b259da20 Author: Lasse Collin Date: 2024-04-21 20:27:50 +0300 Build: Omit useless checks for fcntl.h, limits.h, and sys/time.h - - (cherry picked from commit fc1921b04b8840caaa777c2bd5340d41b259da20) configure.ac | 6 ------ 1 file changed, 6 deletions(-) -commit 6e210d5766b25d36729152a13c5889bb0605a1e3 +commit 6aa2a6deeba04808a0fe4461396e7fb70277f3d4 Author: Lasse Collin Date: 2024-04-19 22:04:21 +0300 liblzma: Silence a warning from Coverity static analysis It is logical why it cannot know for sure that the value has to be at most 4 if it is less than 16. The x86 filter is based on a very old LZMA SDK version. Newer ones have quite a different implementation for the same filter. Thanks to Sam James. 
- - (cherry picked from commit 6aa2a6deeba04808a0fe4461396e7fb70277f3d4) src/liblzma/simple/x86.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) -commit 4019b012f29008ea6545aba6fe6c141a2d920ae2 +commit e89d3e83b4496d0b5410870634970c0aa9721d59 Author: Lasse Collin Date: 2024-04-19 23:18:19 +0300 Update .gitignore - - (cherry picked from commit e89d3e83b4496d0b5410870634970c0aa9721d59) .gitignore | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) -commit 09a0311a1e8cdefbcfab9e490cdd41c97a459d24 +commit 86fc4ee859709da0ff9617a1490f13ddac0a109b Author: Lasse Collin Date: 2024-04-19 20:53:24 +0300 Tests: test_lzip_decoder: Tweak coding style and comments - - (cherry picked from commit 86fc4ee859709da0ff9617a1490f13ddac0a109b) tests/test_lzip_decoder.c | 58 +++++++++++++++++++++++------------------------ 1 file changed, 28 insertions(+), 30 deletions(-) -commit 3117336a0291309ddd2a54d2966a589f9f806850 +commit 38be573a279bd7b608ee7d8509ec10884e6fb0d5 Author: Lasse Collin Date: 2024-04-19 20:51:36 +0300 Tests: test_lzip_decoder: Remove redundant initializations - - (cherry picked from commit 38be573a279bd7b608ee7d8509ec10884e6fb0d5) tests/test_lzip_decoder.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) -commit f78081eb12c804ec4f5a3dc569b859646b16e9e5 +commit d7e4bc53eacfab9f3de95d8252bdfdc9419079c9 Author: Lasse Collin Date: 2024-04-19 20:47:24 +0300 Tests: test_lzip_decoder: Remove unneeded tuktest_malloc() calls - - (cherry picked from commit d7e4bc53eacfab9f3de95d8252bdfdc9419079c9) tests/test_lzip_decoder.c | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) -commit 7413383e4280065b79ca70abe4d8ebc78055b35a +commit eeca8f7c5baf1ad69606bb734d5001763466d58f Author: Lasse Collin Date: 2024-04-15 20:35:07 +0300 xz: Fix white space error. Thanks to xx on #tukaani. 
- - (cherry picked from commit eeca8f7c5baf1ad69606bb734d5001763466d58f) src/xz/args.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit eed2f26c0edb6e31a50d48bab4ff619778690a1e +commit 462ca9409940a19f743daee6b3bcc611277d0007 Author: Sam James Date: 2024-04-11 23:01:44 +0100 xz: add missing noreturn for message_filters_help Fixes: a165d7df1964121eb9df715e6f836a31c865beef - (cherry picked from commit 462ca9409940a19f743daee6b3bcc611277d0007) src/xz/message.h | 1 + 1 file changed, 1 insertion(+) -commit 2633d8df616405bd54fd748d7bf887ebc4505b88 +commit 863f13d2828b99b0539ce73f9cf85bde32358034 Author: Sam James Date: 2024-04-11 19:34:04 +0100 xz: signals: suppress -Wsign-conversion on macOS On macOS, we get: ``` signals.c: In function 'signals_init': signals.c:76:17: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion] 76 | sigaddset(&hooked_signals, sigs[i]); | ^~~~~~~~~ signals.c:81:17: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion] 81 | sigaddset(&hooked_signals, message_progress_sigs[i]); | ^~~~~~~~~ signals.c:86:9: error: conversion to 'sigset_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Werror=sign-conversion] 86 | sigaddset(&hooked_signals, SIGTSTP); | ^~~~~~~~~ ``` We use `int` for `hooked_signals` but we can't just cast to whatever `sigset_t` is because `sigset_t` is an opaque type. It's an unsigned int on macOS. On macOS, `sigaddset` is implemented as a macro. Just suppress -Wsign-conversion for `signals_init` for macOS given there's no real nice way of fixing this. 
- - (cherry picked from commit 863f13d2828b99b0539ce73f9cf85bde32358034) src/xz/signals.c | 7 +++++++ 1 file changed, 7 insertions(+) -commit 50fb269c7a9cf62a9f3fe08859e2aa4348b600a7 +commit fcbd0d199933a69713cb293cbd7409a757d854cd Author: Lasse Collin Date: 2024-04-13 22:19:40 +0300 Tests: test_microlzma: Add a "FIXME?" about LZMA_FINISH handling - - (cherry picked from commit fcbd0d199933a69713cb293cbd7409a757d854cd) tests/test_microlzma.c | 8 ++++++++ 1 file changed, 8 insertions(+) -commit 3e2ff2d38c54c8fc7ce15aaf91185dc105d9c92c +commit 0fe2dfa68355d2b165544b2bc8babf77dcc2039e Author: Lasse Collin Date: 2024-04-13 18:05:31 +0300 Tests: test_microlzma: Tweak comments, coding style, and minor details A few lines were reordered, a few ARRAY_SIZE were changed to sizeof, and a few uint32_t were changed to size_t. No real functional changes were intended. - - (cherry picked from commit 0fe2dfa68355d2b165544b2bc8babf77dcc2039e) tests/test_microlzma.c | 149 +++++++++++++++++++++++++++---------------------- 1 file changed, 83 insertions(+), 66 deletions(-) -commit ebc8b8de19d641c37ab7959a224bcd0ff4c0833f +commit 97f0ee0f1f903f4e7c4ea23e9b89d687025d2992 Author: Ryan Carsten Schmidt Date: 2024-04-12 19:31:13 -0500 CI: Use only the active CPUs on macOS hw.ncpu counts all CPUs including inactive ones. hw.activecpu counts only the active CPUs. - - (cherry picked from commit 97f0ee0f1f903f4e7c4ea23e9b89d687025d2992) build-aux/ci_build.bash | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 1e63f7d53648beb6dd5acb5771850d7c4bc30477 +commit 73f629e321b74f68c9954728fa4f19261afccf46 Author: Sam James Date: 2024-04-10 18:33:55 +0100 ci: rename ci_build.sh -> ci_build.bash We discussed the name and it's less cognitive load to just call it '.bash' so you don't have an immediate question about if bashisms are OK. 
- - (cherry picked from commit 73f629e321b74f68c9954728fa4f19261afccf46) .github/workflows/ci.yml | 52 ++++++++++++++++---------------- .github/workflows/windows-ci.yml | 20 ++++++------ build-aux/{ci_build.sh => ci_build.bash} | 0 3 files changed, 36 insertions(+), 36 deletions(-) -commit aea54a4724414466a20afd7493156d40d0a2741c +commit 8709407a9ef8e7e8aec117879400e4dd3e227ada Author: Sam James Date: 2024-04-10 17:42:23 +0100 ci: build in parallel by default - - (cherry picked from commit 8709407a9ef8e7e8aec117879400e4dd3e227ada) build-aux/ci_build.sh | 2 ++ 1 file changed, 2 insertions(+) -commit 4381fcf00b2fabb6dcc9fd5cf35d520feb9e775a +commit 65bf7e0a1ca6386f17608e8afb84ac470c18d23f Author: Sam James Date: 2024-04-10 15:41:08 +0100 ci: default to -O2 We need this for when we're passing sanitizer flags or -gdwarf-4 for Clang with Valgrind. Just always start with -O2 if CFLAGS isn't set in the environment and append what was passed on the command line. - - (cherry picked from commit 65bf7e0a1ca6386f17608e8afb84ac470c18d23f) build-aux/ci_build.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -commit 752ba5ed99ec754bafbdc4d87a2876cb2566ecc4 +commit bc899f9e0700ad153bd65f4804c4de7515c8a847 Author: Sam James Date: 2024-04-10 15:17:47 +0100 ci: make automake's test runner verbose on failures This is a lot easier to work with than the save-logs thing the action tries to do... - - (cherry picked from commit bc899f9e0700ad153bd65f4804c4de7515c8a847) build-aux/ci_build.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit cc21af171599ffe0419fc32a30edd3ef7d479865 +commit b5e3470442531717b2457b40ab412740296af1bc Author: Sam James Date: 2024-04-10 12:38:51 +0100 ci: make UBSAN abort on errors Unfortunately, UBSAN doesn't do this by default. See also the change I made in Meson for this in October [0]. 
[0] https://github.com/mesonbuild/meson/commit/7b7d2e060b447de9c2642848847370a58711ac1c - - (cherry picked from commit b5e3470442531717b2457b40ab412740296af1bc) .github/workflows/ci.yml | 1 + 1 file changed, 1 insertion(+) -commit 2d2d5f14b392cd1aeddab7ce34fd50ba5422e5b5 +commit 6c095a98fbec70b790253a663173ecdb669108c4 Author: Sam James Date: 2024-04-10 11:43:10 +0100 ci: test Valgrind Using `--trace-children=yes` has a trade-off here, as it makes `test_scripts.sh` pretty slow when calling various non-xz utilities. But I also feel like it's not useless to have Valgrind used there and it's not easy to exclude Valgrind just for that one test... I did consider using AX_VALGRIND_CHECK [0][1] but I couldn't get it working immediately with some conditionally-built tests and I wondered if it was worth spending time on at least while we're debating xz's future build system situation. [0] https://www.gnu.org/software/autoconf-archive/ax_valgrind_check.html [1] https://tecnocode.co.uk/2014/12/23/automatically-valgrinding-code-with-ax_valgrind_check/ - - (cherry picked from commit 6c095a98fbec70b790253a663173ecdb669108c4) .github/workflows/ci.yml | 11 ++++++++++- build-aux/ci_build.sh | 8 +++++--- 2 files changed, 15 insertions(+), 4 deletions(-) -commit 5d20a612051fac3ca6d99abe3cd7e0e3370e5b67 +commit 6286c1900c2d2ca33d9b1b397122c7bcdb9a4d59 Author: Lasse Collin Date: 2024-04-10 23:20:02 +0300 liblzma: CRC: Simplify table omission macros A macro is useful to prevent a single #if directive from getting too ugly but only one macro is needed for all archs. 
- - (cherry picked from commit 6286c1900c2d2ca33d9b1b397122c7bcdb9a4d59) src/liblzma/check/crc32_table.c | 10 ++++------ src/liblzma/check/crc64_table.c | 4 ++-- src/liblzma/check/crc_common.h | 5 +++-- 3 files changed, 9 insertions(+), 10 deletions(-) -commit 2a80827e23169c624560ac89714bf5084cbead43 +commit 45da936c879acf4f053a3055665bf1b10ded4462 Author: Lasse Collin Date: 2024-04-10 23:09:40 +0300 liblzma: ARM64 CRC: Fix omission of CRC32 table The macro name had an odd typo so the table wasn't omitted when it should have. Fixes: 1940f0ec28f08c0ac72c1413d9706fb82eabe6ad - (cherry picked from commit 45da936c879acf4f053a3055665bf1b10ded4462) src/liblzma/check/crc32_table.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit a54117377151356c1e2494ba1febc245cb71b51c +commit 308a9af85400b0e2019f0f012c8354e831d06d65 Author: Lasse Collin Date: 2024-04-10 22:21:51 +0300 Build: If ARM64 feature detection func is found, stop looking for others This can speed up configure a tiny bit. Fixes: c5f6d79cc9515a7f22d7ea4860c6cc394b295732 - (cherry picked from commit 308a9af85400b0e2019f0f012c8354e831d06d65) configure.ac | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 9223ad6e78a666cc9f9aba135d1755fec184a24a +commit fc43cecd32bf9d5f8caa599206b15c9569af1eb6 Author: Lasse Collin Date: 2024-04-10 22:04:27 +0300 liblzma: ARM64 CRC32: Change style of the macOS code to match FreeBSD I didn't test this but it shouldn't change any functionality. Fixes: 761f5b69a4c778c8bcb09279b845b07c28790575 - (cherry picked from commit fc43cecd32bf9d5f8caa599206b15c9569af1eb6) src/liblzma/check/crc32_arm64.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) -commit 32ceb2c36a0e450037bbe906c2a1ea42607b9d21 +commit 1024cd4cd966b998fedec51e385e9ee9a49b3c57 Author: Lasse Collin Date: 2024-04-10 21:59:27 +0300 liblzma: ARM64 CRC32: Add error checking to FreeBSD-specific code Also add parenthesis to the return statement. I didn't test this. 
Fixes: 761f5b69a4c778c8bcb09279b845b07c28790575 - (cherry picked from commit 1024cd4cd966b998fedec51e385e9ee9a49b3c57) src/liblzma/check/crc32_arm64.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -commit 42915101e914dba353c236925bc1d5e4826d3f7a +commit 2337f7021c860b026e3e849e60a9ae8d09ec0ea0 Author: Lasse Collin Date: 2024-04-10 21:56:33 +0300 liblzma: ARM64 CRC32: Use negation instead of subtracting from 8 Subtracting from 0 is negation, this just keeps warnings away. Fixes: 761f5b69a4c778c8bcb09279b845b07c28790575 - (cherry picked from commit 2337f7021c860b026e3e849e60a9ae8d09ec0ea0) src/liblzma/check/crc32_arm64.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 42a9482b48f0171852fbaddbdc729a56f2daa547 +commit d8fffd01aa1a3c18e437a222abd34699e23ff5e7 Author: Lasse Collin Date: 2024-04-10 21:55:10 +0300 liblzma: ARM64 CRC32: Tweak coding style and comments - - (cherry picked from commit d8fffd01aa1a3c18e437a222abd34699e23ff5e7) src/liblzma/check/crc32_arm64.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) -commit 38a3ec5a7e2ddeee3686be64b037aa1377f31fd1 +commit 780d2c236de0e4749655696c2e0c26fb7565afd3 +Author: Lasse Collin +Date: 2024-04-09 21:55:01 +0300 + + Update SECURITY.md. + + .github/SECURITY.md | 25 ++++++++----------------- + 1 file changed, 8 insertions(+), 17 deletions(-) + +commit 986865ea2f9d1f8dbef4a130926df106b0f6d41a Author: Lasse Collin Date: 2024-04-09 17:47:01 +0300 CI: Remove ifunc support. - - (cherry picked from commit 986865ea2f9d1f8dbef4a130926df106b0f6d41a) .github/workflows/ci.yml | 13 +++---------- build-aux/ci_build.sh | 5 +---- 2 files changed, 4 insertions(+), 14 deletions(-) -commit 34d1252f093944ff350a88a6196539f95902ad41 +commit 689ae2427342a2ea1206eb5ca08301baf410e7e0 Author: Lasse Collin Date: 2024-04-09 17:43:16 +0300 liblzma: Remove ifunc support. This is *NOT* done for security reasons even though the backdoor relied on the ifunc code. 
Instead, the reason is that in this project ifunc provides little benefits but it's quite a bit of extra code to support it. The only case where ifunc *might* matter for performance is if the CRC functions are used directly by an application. In normal compression use it's completely irrelevant. - - (cherry picked from commit 689ae2427342a2ea1206eb5ca08301baf410e7e0) CMakeLists.txt | 79 --------------------------------------- INSTALL | 8 ---- configure.ac | 79 --------------------------------------- src/liblzma/check/crc32_fast.c | 48 +++--------------------- src/liblzma/check/crc64_fast.c | 21 ----------- src/liblzma/check/crc_common.h | 9 +---- src/liblzma/check/crc_x86_clmul.h | 11 +----- 7 files changed, 8 insertions(+), 247 deletions(-) -commit a594b39685051cd1ec866360bc4dd6c22f301bb4 +commit 6b4c859059a7eb9b0547590c081668e14ecf8af6 Author: Lasse Collin Date: 2024-04-08 22:04:41 +0300 tests/files/README: Update the main heading. - - (cherry picked from commit 6b4c859059a7eb9b0547590c081668e14ecf8af6) tests/files/README | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -commit fa76e3ef597ee2e9d150461a42d270a386204042 +commit 2a851e06b891ce894f918faff32a6cca6fdecee6 Author: Lasse Collin Date: 2024-04-08 22:02:45 +0300 tests/files/README: Explain how to recreate the ARM64 test files. - - (cherry picked from commit 2a851e06b891ce894f918faff32a6cca6fdecee6) tests/files/README | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) -commit 112fa0aba6be30968811c9131f1b995cf9e92e75 +commit 3d09b721b94e18fe1f853a04799697f5de10b291 Author: Lasse Collin Date: 2024-04-08 21:51:55 +0300 debug: Add generator for the ARM64 test file data. 
- - (cherry picked from commit 3d09b721b94e18fe1f853a04799697f5de10b291) debug/Makefile.am | 3 +- debug/testfilegen-arm64.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 118 insertions(+), 1 deletion(-) -commit 1a1f3d0323d5991a3238566e8f517d5116358b5c +commit 31ef676567c9d6fcc4ec9fc833c312f7a7c21c48 Author: Lasse Collin Date: 2024-04-08 21:19:38 +0300 xz man page: Use .ft CR instead of CW to silence warnings from groff. - - (cherry picked from commit 31ef676567c9d6fcc4ec9fc833c312f7a7c21c48) src/xz/xz.1 | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) -commit 9f9203f574f895c40a86a83c45c6bb79c25bb5d2 +commit 780cbf29d5a88db2b546e9b7b019c4c33ca72685 Author: Lasse Collin Date: 2024-04-08 19:28:35 +0300 Fix NEWS for 5.6.0 and 5.6.1. - - (cherry picked from commit 780cbf29d5a88db2b546e9b7b019c4c33ca72685) NEWS | 6 ++++++ 1 file changed, 6 insertions(+) -commit 12876b33c79e36d7e51e8ba6ab7162bd2129cb5b +commit bfd0c7c478e93a1911b845459549ff94587b6ea2 Author: Lasse Collin Date: 2024-04-08 19:22:26 +0300 Remove the XZ logo. - - (cherry picked from commit bfd0c7c478e93a1911b845459549ff94587b6ea2) COPYING | 5 - COPYING.CC-BY-SA-4.0 | 427 --------------------------------------------------- Makefile.am | 2 - README | 2 - doc/xz-logo.png | Bin 6771 -> 0 bytes doxygen/Doxyfile | 6 +- doxygen/footer.html | 13 -- 7 files changed, 3 insertions(+), 452 deletions(-) -commit 879295d91f06c241fd8a8fc1ca95776dbeb45f93 +commit 77a294d98a9d2d48f7e4ac273711518bf689f5c4 Author: Lasse Collin Date: 2024-04-08 18:27:39 +0300 Update maintainer and author info. The other maintainer suddenly disappeared. 
- - (cherry picked from commit 77a294d98a9d2d48f7e4ac273711518bf689f5c4) AUTHORS | 9 +++++++-- README | 10 +++------- THANKS | 1 - src/liblzma/api/lzma.h | 2 +- 4 files changed, 11 insertions(+), 11 deletions(-) -commit 859617d30d81317236e004b323fed0883f932dcf +commit 8dd03d4484ccf80022722a16d0ed9b37f2b58072 Author: Lasse Collin Date: 2024-04-08 18:05:32 +0300 Docs: Update .xz file format specification to 1.2.1. This only reverts the XZ URL changes. - - (cherry picked from commit 8dd03d4484ccf80022722a16d0ed9b37f2b58072) doc/xz-file-format.txt | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) -commit eeb74fba1f6ea334a519015938b4a26c6ba5d4eb +commit 17aa2e1a796d3f758802df29afc89dcf335db567 Author: Lasse Collin Date: 2024-04-08 17:33:56 +0300 Update website URLs back to tukaani.org. The XZ projects were moved back to their original URLs. - - (cherry picked from commit 17aa2e1a796d3f758802df29afc89dcf335db567) + .github/SECURITY.md | 2 +- CMakeLists.txt | 2 +- COPYING | 3 +-- README | 4 ++-- configure.ac | 2 +- doc/faq.txt | 2 +- doc/lzma-file-format.txt | 12 ++++++------ dos/config.h | 2 +- src/liblzma/api/lzma.h | 2 +- src/xz/xz.1 | 6 +++--- src/xzdec/xzdec.1 | 4 ++-- windows/README-Windows.txt | 2 +- - 11 files changed, 20 insertions(+), 21 deletions(-) + 12 files changed, 21 insertions(+), 22 deletions(-) -commit a7b9cd70004bfc1abadc7e865dfce765f7b8b59d +commit 2739db981023373a2ddabc7b456c7e658bb4f582 Author: Lasse Collin Date: 2024-04-08 17:07:08 +0300 xzdec: Tweak coding style and comments. - - (cherry picked from commit 2739db981023373a2ddabc7b456c7e658bb4f582) src/xzdec/xzdec.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) -commit ebe9d6d8cb27168706078009b3f64da8fde63833 +commit 408b6adb2a07d07c6535f859571cca38837caaf3 Author: Lasse Collin Date: 2024-04-08 15:53:46 +0300 tests/ossfuzz: Tiny fix to a comment. 
- - (cherry picked from commit 408b6adb2a07d07c6535f859571cca38837caaf3) tests/ossfuzz/fuzz_decode_stream.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 78ab47d65d916207233abbcdb0ccfd6efb946c05 +commit db4dd74a344580e0b81436598d9741a3454245b0 +Author: Lasse Collin +Date: 2024-04-09 18:22:16 +0300 + + Update THANKS. + + THANKS | 1 + + 1 file changed, 1 insertion(+) + +commit e93e13c8b3bec925c56e0c0b675d8000a0f7f754 +Author: Lasse Collin +Date: 2024-04-08 15:32:58 +0300 + + Remove the backdoor found in 5.6.0 and 5.6.1 (CVE-2024-3094). + + While the backdoor was inactive (and thus harmless) without inserting + a small trigger code into the build system when the source package was + created, it's good to remove this anyway: + + - The executable payloads were embedded as binary blobs in + the test files. This was a blatant violation of the + Debian Free Software Guidelines. + + - On machines that see lots bots poking at the SSH port, the backdoor + noticeably increased CPU load, resulting in degraded user experience + and thus overwhelmingly negative user feedback. + + - The maintainer who added the backdoor has disappeared. + + - Backdoors are bad for security. + + This reverts the following without making any other changes: + + 6e636819 Tests: Update two test files. + a3a29bbd Tests: Test --single-stream can decompress bad-3-corrupt_lzma2.xz. + 0b4ccc91 Tests: Update RISC-V test files. + 8c9b8b20 liblzma: Fix typos in crc32_fast.c and crc64_fast.c. + 82ecc538 liblzma: Fix false Valgrind error report with GCC. + cf44e4b7 Tests: Add a few test files. + 3060e107 Tests: Use smaller dictionary size in RISC-V test files. + e2870db5 Tests: Add two RISC-V Filter test files. + + The RISC-V test files also have real content that tests the filter + but the real content would fit into much smaller files. A generator + program would need to be available as well. 
+ + Thanks to Andres Freund for finding and reporting it and making + it public quickly so others could act without a delay. + See: https://www.openwall.com/lists/oss-security/2024/03/29/4 + + src/liblzma/check/crc32_fast.c | 7 +++++-- + src/liblzma/check/crc64_fast.c | 4 +++- + src/liblzma/check/crc_common.h | 25 ------------------------- + tests/files/README | 27 --------------------------- + tests/files/bad-3-corrupt_lzma2.xz | Bin 512 -> 0 bytes + tests/files/bad-dict_size.lzma | Bin 41 -> 0 bytes + tests/files/good-1-riscv-lzma2-1.xz | Bin 7424 -> 0 bytes + tests/files/good-1-riscv-lzma2-2.xz | Bin 7432 -> 0 bytes + tests/files/good-2cat.xz | Bin 136 -> 0 bytes + tests/files/good-large_compressed.lzma | Bin 35421 -> 0 bytes + tests/files/good-small_compressed.lzma | Bin 258 -> 0 bytes + tests/test_files.sh | 11 ----------- + 12 files changed, 8 insertions(+), 66 deletions(-) + +commit f9cf4c05edd14dedfe63833f8ccbe41b55823b00 Author: Lasse Collin Date: 2024-03-30 14:36:28 +0200 CMake: Fix sabotaged Landlock sandbox check. It never enabled it. - - (cherry picked from commit f9cf4c05edd14dedfe63833f8ccbe41b55823b00) CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 5f178c364c3b5c6fe87099b7624d5c76995ff8e6 -Author: Lasse Collin -Date: 2024-05-22 14:08:33 +0300 +commit af071ef7702debef4f1d324616a0137a5001c14c +Author: Jia Tan +Date: 2024-03-26 01:50:02 +0800 - Delete SECURITY.md from v5.6 - - It's too easily out of date in the stable branches. - It's not included in the release packages anyway. + Docs: Simplify SECURITY.md. - .github/SECURITY.md | 29 ----------------------------- - 1 file changed, 29 deletions(-) + .github/SECURITY.md | 8 +------- + 1 file changed, 1 insertion(+), 7 deletions(-) -commit b3a756188004a16de5956c368e3b0efd1a9bccb0 +commit 0b99783d63f27606936bb79a16c52d0d70c0b56f Author: Lasse Collin Date: 2024-03-22 17:46:30 +0200 liblzma: memcmplen.h: Add a comment why subtraction is used. 
- - (cherry picked from commit 0b99783d63f27606936bb79a16c52d0d70c0b56f) src/liblzma/common/memcmplen.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) -commit 94939a145f362ff8b09fb37fc72901743f7f5cb2 +commit 8a25ba024d55610c448c6e4f1400a00bae51b493 Author: Lasse Collin Date: 2024-03-15 17:43:39 +0200 INSTALL: Document arguments of --enable-symbol-versions. - - (cherry picked from commit 8a25ba024d55610c448c6e4f1400a00bae51b493) INSTALL | 43 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 39 insertions(+), 4 deletions(-) -commit fa14c8aaf0d0266b7e0c3b7c766159299c1a0f18 +commit 49324b711f9d42b3543bf2f3ae598eaa03360bd5 Author: Lasse Collin Date: 2024-03-15 17:15:50 +0200 Build: Use only the generic symbol versioning with NVIDIA HPC Compiler. This does the previous commit with CMake. AC_EGREP_CPP uses AC_REQUIRE so the outermost if-commands must be changed to AS_IF to ensure that things wont break some day. See 5a5bd7f871818029d5ccbe189f087f591258c294. - - (cherry picked from commit 49324b711f9d42b3543bf2f3ae598eaa03360bd5) configure.ac | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) -commit 73baa8d99b51c7623ed95afe6411302d9ff56864 +commit c273123ed0ebaebf49994057a7fe98aae7f42c40 Author: Lasse Collin Date: 2024-03-15 16:36:35 +0200 CMake: Use only the generic symbol versioning with NVIDIA HPC Compiler. It doesn't support the __symver__ attribute or __asm__(".symver ..."). The generic symbol versioning can still be used since it only needs linker support. - - (cherry picked from commit c273123ed0ebaebf49994057a7fe98aae7f42c40) CMakeLists.txt | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) -commit 886633f42376f4648d931917733c8a59fb2e1f6c +commit df7f487648d18a3992386a59b8a061edca862d17 Author: Lasse Collin Date: 2024-03-13 21:38:24 +0200 Update THANKS. 
- - (cherry picked from commit df7f487648d18a3992386a59b8a061edca862d17) THANKS | 1 + 1 file changed, 1 insertion(+) -commit 760f622f0d73632df2347aaca7ac7ff5761e98b6 +commit 3217b82b3ec023bf8338249134a076bea0ea30ec Author: Lasse Collin Date: 2024-03-13 21:30:18 +0200 liblzma: Minor comment edits. - - (cherry picked from commit 3217b82b3ec023bf8338249134a076bea0ea30ec) src/liblzma/common/string_conversion.c | 4 ++-- src/liblzma/delta/delta_decoder.c | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) -commit 403b4c78b81f67bc3787542f55f555407253316c +commit 096bc0e3f8fb4bfc4d2f3f64a7f219401ffb4c31 Author: Sergey Kosukhin Date: 2024-03-13 13:07:13 +0100 liblzma: Fix building with NVHPC (NVIDIA HPC SDK). NVHPC compiler has several issues that make it impossible to build liblzma: - the compiler cannot handle unions that contain pointers that are not the first members; - the compiler cannot handle the assembler code in range_decoder.h (LZMA_RANGE_DECODER_CONFIG has to be set to zero); - the compiler fails to produce valid code for delta_decode if the vectorization is enabled, which results in failed tests. This introduces NVHPC-specific workarounds that address the issues. - - (cherry picked from commit 096bc0e3f8fb4bfc4d2f3f64a7f219401ffb4c31) src/liblzma/common/string_conversion.c | 6 ++++-- src/liblzma/delta/delta_decoder.c | 3 +++ src/liblzma/rangecoder/range_decoder.h | 1 + 3 files changed, 8 insertions(+), 2 deletions(-) -commit 1888fb49f629340758e98e69d5aa328f6f73c5e1 +commit 2ad7fad67080e88fa7fc191f9d613d8b7add9c62 Author: Lasse Collin Date: 2024-03-13 21:17:10 +0200 CMake: Disable symbol versioning on non-glibc Linux. This better matches what configure.ac does. For example, musl has only basic symbol versioning support: https://wiki.musl-libc.org/functional-differences-from-glibc.html#Symbol_versioning configure.ac tries to enable symbol versioning only with glibc so now CMake does the same. 
- - (cherry picked from commit 2ad7fad67080e88fa7fc191f9d613d8b7add9c62) CMakeLists.txt | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) -commit 4b3c84e8eebbcf712fc2396dbb8117cce2d72464 +commit 82f0c0d39eb2c026b1d96ee706f70ace868d4ed4 Author: Lasse Collin Date: 2024-03-13 20:32:46 +0200 CMake: Make symbol versioning configurable. - - (cherry picked from commit 82f0c0d39eb2c026b1d96ee706f70ace868d4ed4) CMakeLists.txt | 62 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 42 insertions(+), 20 deletions(-) -commit 69d1e20208eb9bd1f4f1c8ee4e49cc82d681a877 +commit 45d33bfc45e4295b8ad743bc2ae61cc724f98076 Author: Lasse Collin Date: 2024-03-13 19:47:36 +0200 Build: Style tweaks to configure.ac. The AC_MSG_ERROR line is overlong anyway as are a few other AC_MSG_ERROR lines already. - - (cherry picked from commit 45d33bfc45e4295b8ad743bc2ae61cc724f98076) configure.ac | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) -commit 051d6b5c85a874c78249693865fd751088f403a2 +commit f56ed6fac6619b56b005878d3b5210e2f0d721c0 Author: Sergey Kosukhin Date: 2024-03-12 20:03:49 +0100 Build: Let the users override the symbol versioning variant. There are cases when the users want to decide themselves whether they want to have the generic (even on GNU/Linux) or the linux (even if we do not recommend that) symbol versioning variant. The former might be needed to circumvent compiler issues (i.e. the compiler does not support all features that are required for the linux versioning), the latter might help in overriding the assumptions made in the configure script. - - (cherry picked from commit f56ed6fac6619b56b005878d3b5210e2f0d721c0) configure.ac | 91 +++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 50 insertions(+), 41 deletions(-) -commit 95dcea4b5df0b180af461e4584d2bcf7725e3aef -Author: Lasse Collin -Date: 2024-04-09 18:22:16 +0300 - - Update THANKS. 
- - THANKS | 1 + - 1 file changed, 1 insertion(+) - -commit 1107712e372f7593ad729764c0c2644d0e4aa675 -Author: Lasse Collin -Date: 2024-04-08 15:32:58 +0300 - - Remove the backdoor found in 5.6.0 and 5.6.1 (CVE-2024-3094). - - While the backdoor was inactive (and thus harmless) without inserting - a small trigger code into the build system when the source package was - created, it's good to remove this anyway: - - - The executable payloads were embedded as binary blobs in - the test files. This was a blatant violation of the - Debian Free Software Guidelines. - - - On machines that see lots bots poking at the SSH port, the backdoor - noticeably increased CPU load, resulting in degraded user experience - and thus overwhelmingly negative user feedback. - - - The maintainer who added the backdoor has disappeared. - - - Backdoors are bad for security. - - This reverts the following without making any other changes: - - 6e636819 Tests: Update two test files. - a3a29bbd Tests: Test --single-stream can decompress bad-3-corrupt_lzma2.xz. - 0b4ccc91 Tests: Update RISC-V test files. - 8c9b8b20 liblzma: Fix typos in crc32_fast.c and crc64_fast.c. - 82ecc538 liblzma: Fix false Valgrind error report with GCC. - cf44e4b7 Tests: Add a few test files. - 3060e107 Tests: Use smaller dictionary size in RISC-V test files. - e2870db5 Tests: Add two RISC-V Filter test files. - - The RISC-V test files also have real content that tests the filter - but the real content would fit into much smaller files. A generator - program would need to be available as well. - - Thanks to Andres Freund for finding and reporting it and making - it public quickly so others could act without a delay. 
- See: https://www.openwall.com/lists/oss-security/2024/03/29/4 - - src/liblzma/check/crc32_fast.c | 7 +++++-- - src/liblzma/check/crc64_fast.c | 4 +++- - src/liblzma/check/crc_common.h | 25 ------------------------- - tests/files/README | 27 --------------------------- - tests/files/bad-3-corrupt_lzma2.xz | Bin 512 -> 0 bytes - tests/files/bad-dict_size.lzma | Bin 41 -> 0 bytes - tests/files/good-1-riscv-lzma2-1.xz | Bin 7424 -> 0 bytes - tests/files/good-1-riscv-lzma2-2.xz | Bin 7432 -> 0 bytes - tests/files/good-2cat.xz | Bin 136 -> 0 bytes - tests/files/good-large_compressed.lzma | Bin 35421 -> 0 bytes - tests/files/good-small_compressed.lzma | Bin 258 -> 0 bytes - tests/test_files.sh | 11 ----------- - 12 files changed, 8 insertions(+), 66 deletions(-) - -commit fd1b975b7851e081ed6e5cf63df946cd5cbdbb94 -Author: Jia Tan -Date: 2024-03-09 11:42:50 +0800 - - Bump version and soname for 5.6.1. - - src/liblzma/Makefile.am | 2 +- - src/liblzma/api/lzma/version.h | 2 +- - 2 files changed, 2 insertions(+), 2 deletions(-) - -commit a2cda572498e96163fe4e2bde096d5dd7b814668 +commit a4f2e20d8466369b1bb277c66f75c9e4ba9cc378 Author: Jia Tan Date: 2024-03-09 11:27:27 +0800 Add NEWS for 5.6.1 NEWS | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) -commit 8583c6021124e388bce044a09f00ebabfd6165a7 +commit f01be8ad754a905d8c418601767480ec11621b02 Author: Jia Tan Date: 2024-03-09 10:43:20 +0800 Translations: Add missing --riscv option to man page translations. 
po4a/de.po | 702 +++++++++++++++++++++++++++++----------------------------- po4a/fr.po | 549 ++++++++++++++++++++++----------------------- po4a/ko.po | 702 +++++++++++++++++++++++++++++----------------------------- po4a/pt_BR.po | 641 +++++++++++++++++++++++++++-------------------------- po4a/ro.po | 702 +++++++++++++++++++++++++++++----------------------------- po4a/uk.po | 702 +++++++++++++++++++++++++++++----------------------------- 6 files changed, 2024 insertions(+), 1974 deletions(-) -commit 74b138d2a6529f2c07729d7c77b1725a8e8b16f1 +commit 6e636819e8f070330d835fce46289a3ff72a7b89 Author: Jia Tan Date: 2024-03-09 10:18:29 +0800 Tests: Update two test files. The original files were generated with random local to my machine. To better reproduce these files in the future, a constant seed was used to recreate these files. tests/files/bad-3-corrupt_lzma2.xz | Bin 484 -> 512 bytes tests/files/good-large_compressed.lzma | Bin 35430 -> 35421 bytes 2 files changed, 0 insertions(+), 0 deletions(-) -commit 3ec6dfd656bdd40ede2a5f11e6be338988e38be4 +commit a3a29bbd5d86183fc7eae8f0182dace374e778d8 Author: Jia Tan Date: 2024-03-09 10:08:32 +0800 Tests: Test --single-stream can decompress bad-3-corrupt_lzma2.xz. The first stream in this file is valid, so this tests that xz properly stops after decompressing it. tests/test_files.sh | 11 +++++++++++ 1 file changed, 11 insertions(+) -commit a67dcce6109c2f932a0a86abb0d7a95d3c31fb3e +commit 0b4ccc91454dbcf0bf521b9bd51aa270581ee23c Author: Jia Tan Date: 2024-03-09 10:05:32 +0800 Tests: Update RISC-V test files. This increases code coverage and tests for possible shifting bugs. 
tests/files/good-1-riscv-lzma2-1.xz | Bin 7512 -> 7424 bytes tests/files/good-1-riscv-lzma2-2.xz | Bin 7512 -> 7432 bytes 2 files changed, 0 insertions(+), 0 deletions(-) -commit 058337b0f1da9f166049ecc972fa5c499c1af08c +commit 8c9b8b2063daa78ead9f648c2ec3c91e8615dffb Author: Jia Tan Date: 2024-03-09 09:52:32 +0800 liblzma: Fix typos in crc32_fast.c and crc64_fast.c. src/liblzma/check/crc32_fast.c | 4 ++-- src/liblzma/check/crc64_fast.c | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) -commit cd5de9c1bbab3dd41b34b37a89c193fb6ff51ca5 +commit b93a8d7631d9517da63f03e0185455024a4609e8 Author: Jia Tan Date: 2024-03-09 09:49:55 +0800 Tests: Replace HAVE_MICROLZMA usage in CMake and Autotools builds. This reverts commit adaacafde6661496ca2814b1e94a3ba5186428cb. CMakeLists.txt | 15 ++++++++++----- configure.ac | 9 ++------- tests/Makefile.am | 9 ++++++--- tests/test_microlzma.c | 12 ++++-------- 4 files changed, 22 insertions(+), 23 deletions(-) -commit 651a1545c8b6150051a0b44857136efd419afc6f +commit 82ecc538193b380a21622aea02b0ba078e7ade92 Author: Jia Tan Date: 2024-03-09 09:20:57 +0800 liblzma: Fix false Valgrind error report with GCC. With GCC and a certain combination of flags, Valgrind will falsely trigger an invalid write. This appears to be due to the omission of instructions to properly save, set up, and restore the frame pointer. The IFUNC resolver is a leaf function since it only calls a function that is inlined. So sometimes GCC omits the frame pointer instructions in the resolver unless this optimization is explictly disabled. This fixes https://bugzilla.redhat.com/show_bug.cgi?id=2267598. 
src/liblzma/check/crc32_fast.c | 9 +++------ src/liblzma/check/crc64_fast.c | 7 +++---- src/liblzma/check/crc_common.h | 25 +++++++++++++++++++++++++ 3 files changed, 31 insertions(+), 10 deletions(-) -commit 6e97b299f1b22e366ec42ba5dc5b9d0746e87b84 +commit 3007e74ef250f0ce95d97ffbdf2282284f93764d Author: Lasse Collin Date: 2024-03-05 23:21:26 +0200 liblzma: Fix a typo in a comment in the RISC-V filter. src/liblzma/simple/riscv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -commit 4e1c97052b5f14f4d6dda99d12cbbd01e66e3712 +commit 72d2933bfae514e0dbb123488e9f1eb7cf64175f Author: Jia Tan Date: 2024-03-05 00:34:46 +0800 liblzma: Use attribute no_profile_instrument_function with ifunc. Thanks to Sam James for determining this was the attribute needed to workaround the GCC bug and for his version of the patch in Gentoo. src/liblzma/check/crc32_fast.c | 5 +++++ src/liblzma/check/crc64_fast.c | 3 +++ 2 files changed, 8 insertions(+) -commit ed957d39426695e948b06de0ed952a2fbbe84bd1 +commit e5faaebbcf02ea880cfc56edc702d4f7298788ad Author: Jia Tan Date: 2024-03-05 00:27:31 +0800 Build: Require attribute no_profile_instrument_function for ifunc usage. Using __attribute__((__no_profile_instrument_function__)) on the ifunc resolver works around a bug in GCC -fprofile-generate: it adds profiling code even to ifunc resolvers which can make the ifunc resolver crash at program startup. This attribute was not introduced until GCC 7 and Clang 13, so ifunc won't be used with prior versions of these compilers. This bug was brought to our attention by: https://bugs.gentoo.org/925415 And was reported to upstream GCC by: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11411 CMakeLists.txt | 7 +++++++ configure.ac | 7 +++++++ 2 files changed, 14 insertions(+) -commit e98ddaf85a1a8fb3cc863637f83356cc9db31e13 +commit 7eeadd279a24c26ca7ff1292b7df802b89409eb7 Author: Lasse Collin Date: 2024-03-04 19:23:18 +0200 liblzma: Fix a comment in the RISC-V filter. 
src/liblzma/simple/riscv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -commit 319cec142f67fe294e0486402f1569f223d9a83d +commit 5f3d0595296cc3035eae9e7bb6c3ffb1e1267333 Author: Lasse Collin Date: 2024-02-29 16:35:52 +0200 CMake: Warn if translated man pages are missing. CMakeLists.txt | 9 +++++++++ 1 file changed, 9 insertions(+) -commit 46c3e113d8eeb1a731a60829fa7f5d1b519f7f26 +commit 4cd1042ee752d61370c685d0d8b20c1e935672f7 Author: Lasse Collin Date: 2024-02-29 16:35:52 +0200 CMake: Warn if gettext tools and pre-created .gmo files are missing. It's only done with CMake >= 3.20 and if library support for translation was already found. Sort of fixes: https://github.com/tukaani-project/xz/issues/82 CMakeLists.txt | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) -commit 86bec8334bb1dcb6d9293a11cdccd895b17f364b +commit a94b42362c8e807f92236d6d63373f04991e3a50 Author: Lasse Collin Date: 2024-02-28 18:26:25 +0200 xz: Add comments. src/xz/coder.c | 10 ++++++++++ 1 file changed, 10 insertions(+) -commit 5c91b454c24e043ca8f2cc7d2b09bd091dafe655 +commit bbf112e32307a75a54a9e170bc392811443d5c87 Author: Jia Tan Date: 2024-02-27 23:42:41 +0800 xz: Change logging level for thread reduction to highest verbosity only. Now that multi threaded encoding is the default, users do not need to see a warning message everytime the number of threads is reduced. On some machines, this could happen very often. It is not unreasonable for users to need to set double verbose mode to see this kind of information. To see these warning messages -vv or --verbose --verbose must be passed to set xz into the highest possible verbosity mode. These warnings had caused automated testing frameworks to fail when they expected no output to stderr. Thanks to Sebastian Andrzej Siewior for reporting this and for the initial version of the patch. 
src/xz/coder.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -commit d0e57b2f159f8fd03a9a89f2f593a768d0487898 +commit 649f6447441510d593a88475ad6df4bcdf74ce48 Author: Lasse Collin Date: 2024-02-26 23:06:13 +0200 Fix sorting in THANKS. THANKS | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -commit d416be55ac02af1144fed455fb18b710147bb490 +commit 1255b7d849bf53f196a842ef2a508ed0ff577eaa Author: Jia Tan Date: 2024-02-26 23:39:29 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) -commit f06b33edd2aeabdb11836a2bf0b681768dad29d3 +commit eee579fff50099ba163c12305e81a4bd42b7dd53 Author: Chien Wong Date: 2024-02-25 21:38:13 +0800 xz: Add missing RISC-V on the filter list in the man page Signed-off-by: Chien Wong src/xz/xz.1 | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -commit a100f9111c8cc7f5b5f0e4a5e8af3de7161c7975 +commit 328c52da8a2bbb81307644efdb58db2c422d9ba7 Author: Jia Tan Date: 2024-02-26 23:02:06 +0800 Build: Fix Linux Landlock feature test in Autotools and CMake builds. The previous Linux Landlock feature test assumed that having the linux/landlock.h header file was enough. The new feature tests also requires that prctl() and the required Landlock system calls are supported. CMakeLists.txt | 25 ++++++++++++++++++++++--- configure.ac | 27 ++++++++++++++++++++++++++- src/xz/sandbox.c | 2 +- src/xz/sandbox.h | 2 +- src/xzdec/xzdec.c | 8 ++++---- 5 files changed, 54 insertions(+), 10 deletions(-) -commit d85efdc8911e6e8964ec920af44c8a6fe0a4c3c2 +commit eb8ad59e9bab32a8d655796afd39597ea6dcc64d Author: Jia Tan Date: 2024-02-26 20:06:10 +0800 Tests: Add test_microlzma to .gitignore and CMakeLists.txt. .gitignore | 1 + CMakeLists.txt | 1 + 2 files changed, 2 insertions(+) -commit 42ee4256739779005a7f921946c8a8e483d1f2ed +commit 9eed1b9a3ae140e93a82febc05a0181e9a4f5093 Author: Jia Tan Date: 2024-02-26 19:56:25 +0800 Tests: Correct license header in test_microlzma.c. 
tests/test_microlzma.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -commit c83349dfd9cf9c495005b6d30e2fd34a9cafc18a +commit 8bf9f72ee1c05b9e205a72807e8a9e304785673d Author: Jia Tan Date: 2024-02-25 21:41:55 +0800 Fix typos in NEWS and CMakeLists. CMakeLists.txt | 2 +- NEWS | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -commit 2d7d862e3ffa8cec4fd3fdffcd84e984a17aa429 +commit 5d8d915ebe2e345820a0f54d1baf8d7d4824c0c7 Author: Jia Tan -Date: 2024-02-24 15:55:08 +0800 +Date: 2024-02-24 16:30:06 +0800 - Bump version and soname for 5.6.0. + Bump version and soname for 5.7.0alpha. + + Like 5.5.0alpha, 5.7.0alpha won't be released, it's just to mark that + the branch is not stable. + + Once again there is no API/ABI stability for new features in devel + versions. The major soname won't be bumped even if API/ABI of new + features breaks between devel releases. src/liblzma/Makefile.am | 2 +- src/liblzma/api/lzma/version.h | 6 +++--- src/liblzma/liblzma_generic.map | 2 +- src/liblzma/liblzma_linux.map | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) commit a18fb1edef0d0aac12a09eed05e9c448c777af7b Author: Jia Tan Date: 2024-02-24 15:50:36 +0800 Add NEWS for 5.6.0. NEWS | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 143 insertions(+) commit 24355c5280bc95e3d594432d60bb8432aa6af173 Author: Jia Tan Date: 2024-02-22 22:27:01 +0800 Translations: Remove obsolete and fuzzy matches from some translations. The French and Brazilian Portuguese man page translations have not been updated since the switch from public domain to 0BSD. The old GPLv2 strings have now been removed from these files. 
po4a/fr.po | 4702 +++++++++++++++++++++++++++++++++++++---------------- po4a/pt_BR.po | 4987 ++++++++++++++++++++++++++++++++++++++++----------------- 2 files changed, 6832 insertions(+), 2857 deletions(-) commit 02ca4a7d7b703e2ec63e00b70feec825e919dbc1 Author: Jia Tan Date: 2024-02-21 00:31:54 +0800 Translations: Patch man pages to avoid fuzzy matches. This will be fixed in the next round of translations, but this avoids having a fuzzy match or not fixing the English version. po4a/de.po | 2 +- po4a/ko.po | 2 +- po4a/ro.po | 2 +- po4a/uk.po | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) commit 898aad9fc711e03452d24d9e2c5b7f77a6f9ce64 Author: Jia Tan Date: 2024-02-21 00:30:43 +0800 xzmore: Fix typo in xzmore.1. Thanks to Yuri Chornoivan. src/scripts/xzmore.1 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 5631aa206c8d16b4eeab85a46b8b698f4fc4cdba Author: Jia Tan Date: 2024-02-24 12:12:16 +0800 Translations: Update the Vietnamese translation. po/vi.po | 505 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 309 insertions(+), 196 deletions(-) commit a65fd7ce9d6228e87faf61dc56a35984d0088248 Author: Jia Tan Date: 2024-02-24 12:06:40 +0800 Translations: Update the Esperanto translation. po/eo.po | 502 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 306 insertions(+), 196 deletions(-) commit cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0 Author: Jia Tan Date: 2024-02-23 23:09:59 +0800 Tests: Add a few test files. tests/files/README | 19 +++++++++++++++++++ tests/files/bad-3-corrupt_lzma2.xz | Bin 0 -> 484 bytes tests/files/bad-dict_size.lzma | Bin 0 -> 41 bytes tests/files/good-2cat.xz | Bin 0 -> 136 bytes tests/files/good-large_compressed.lzma | Bin 0 -> 35430 bytes tests/files/good-small_compressed.lzma | Bin 0 -> 258 bytes 6 files changed, 19 insertions(+) commit 39f4a1a86ad80b2d064b812cee42668e6c8b8c73 Author: Jia Tan Date: 2024-02-23 20:58:36 +0800 Tests: Add MicroLZMA test. 
tests/Makefile.am | 4 +- tests/test_microlzma.c | 548 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 551 insertions(+), 1 deletion(-) commit adaacafde6661496ca2814b1e94a3ba5186428cb Author: Jia Tan Date: 2024-02-23 20:57:59 +0800 Build: Define HAVE_MICROLZMA when it is configured. CMakeLists.txt | 4 ++++ configure.ac | 9 +++++++-- 2 files changed, 11 insertions(+), 2 deletions(-) commit eea78216d27182ca917bf00e02feaab058a4d21e Author: Jia Tan Date: 2024-02-23 20:27:15 +0800 xz: Fix Capsicum sandbox compile error. user_abort_pipe[] was still being used instead of the parameters. src/xz/sandbox.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 32b0a3ce19224f9074d01a4ffbc1655b05fcb82d Author: Jia Tan Date: 2024-02-23 16:12:32 +0800 Build: Fix ARM64 CRC32 instruction feature test. Old versions of Clang reported the unsupported function attribute and __crc32d() function as warnings instead of errors, so the feature test passed when it shouldn't have, causing a compile error at build time. -Werror was added to this feature test to fix this. The change is not needed for CMake because check_c_source_compiles() also performs linking and the error is caught then. Thanks to Sebastian Andrzej Siewior for reporting this. configure.ac | 10 ++++++++++ 1 file changed, 10 insertions(+) commit 4c81c9611f8b2e1ad65eb7fa166afc570c58607e Author: Lasse Collin Date: 2024-02-22 19:16:35 +0200 CMake: Add LOCALEDIR to the windres workaround. LOCALEDIR may contain spaces like in "C:\Program Files". CMakeLists.txt | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) commit de4337fd89ca7db5feb97b5c40143404f6e22986 Author: Lasse Collin Date: 2024-02-22 15:18:25 +0200 xz: Landlock: Fix error message if input file is a directory. 
If xz is given a directory, it should look like this: $ xz /usr/bin xz: /usr/bin: Is a directory, skipping The Landlock rules didn't allow opening directories for reading: $ xz /usr/bin xz: /usr/bin: Permission denied The simplest fix was to allow opening directories for reading. While it's a bit silly to allow it solely for the error message, it shouldn't make the sandbox significantly weaker. The single-file use case (like when called from GNU tar) is still as strict as possible: all Landlock restrictions are enabled before (de)compression starts. src/xz/sandbox.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) commit 120da10ae139ea52ca4275452adf8eda02d07cc8 Author: Lasse Collin Date: 2024-02-22 14:41:29 +0200 liblzma: Disable branchless C version in range decoder. Thanks to Sebastian Andrzej Siewior and Sam James for benchmarking on various systems. src/liblzma/rangecoder/range_decoder.h | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) commit 00440f52be9ac2c7438c7b0cb1082f12399632c6 Author: Lasse Collin Date: 2024-02-21 17:41:32 +0200 INSTALL: Clarify that --disable-assembler affects only 32-bit x86. INSTALL | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) commit 11405be84ea294497e12d03d7219f607063f4a00 Author: Lasse Collin Date: 2024-02-19 18:41:37 +0200 Windows: build.bash: Include COPYING.0BSD in the package. windows/build.bash | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit c27cf64e3e27f4968431d65be7098a12a3a80d30 Author: Lasse Collin Date: 2024-02-18 17:59:46 +0200 Windows: build.bash: include liblzma-crt-mixing.txt in the package. windows/build.bash | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) commit 8d38941baed05de4ff7cc775de14833108f62184 Author: Lasse Collin Date: 2024-02-18 17:17:43 +0200 Windows: Major update to Windows build instructions. 
INSTALL | 68 ++++----- windows/INSTALL-MSVC.txt | 23 +-- windows/INSTALL-MinGW-w64_with_Autotools.txt | 49 +++++++ windows/INSTALL-MinGW-w64_with_CMake.txt | 203 +++++++++++++++++++++++++++ windows/INSTALL-MinGW.txt | 138 ------------------ windows/README-Windows.txt | 2 + windows/build-with-cmake.bat | 35 +++++ windows/liblzma-crt-mixing.txt | 70 +++++++++ 8 files changed, 404 insertions(+), 184 deletions(-) commit 4b5b0d352348ff510ffb50a3b5b71788857d37a1 Author: Lasse Collin Date: 2024-02-18 15:15:04 +0200 Windows: Update windows/README-Windows.txt. It's for binary packages built with windows/build.bash. windows/README-Windows.txt | 104 ++++++++++++++++++--------------------------- 1 file changed, 41 insertions(+), 63 deletions(-) commit 1ee716f74085223c8fbcae1d5a384e6bf53c0f6a Author: Lasse Collin Date: 2024-02-18 15:15:04 +0200 Windows: Update windows/build.bash. Support for the old MinGW was dropped. Only MinGW-w64 with GCC is supported now. The script now supports also cross-compilation from GNU/Linux (tests are not run). MSYS2 and also the old MSYS 1.0.11 work for building on Windows. The i686 and x86_64 toolchains must be in PATH to build both 32-bit and 64-bit versions. Parallel builds are done if "nproc" from GNU coreutils is available. MinGW-w64 runtime copyright information file was renamed from COPYING-Windows.txt to COPYING.MinGW-w64-runtime.txt which is the filename used by MinGW-w64 itself. Its existence is now mandatory, it's checked at the beginning of the script. The file TODO is no longer copied to the package. windows/build.bash | 191 +++++++++++++++++++++++++++++++---------------------- 1 file changed, 112 insertions(+), 79 deletions(-) commit 60462e42609a1d961868a1d1ebecc713c6d27e2e Author: Jia Tan Date: 2024-02-20 23:32:22 +0800 Translations: Update the Romanian man page translations. 
po4a/ro.po | 1715 +++++++++++++++++++++++++++++++----------------------------- 1 file changed, 875 insertions(+), 840 deletions(-) commit 10d733e5b8929c642e00891cfa9ead9c2cdd2e05 Author: Jia Tan Date: 2024-02-20 23:30:25 +0800 Translations: Update the Korean man page translations. po4a/ko.po | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 797a34b72ac6baff237d7a546fa941d8f78f2f62 Author: Jia Tan Date: 2024-02-20 21:03:53 +0800 Translations: Update the Spanish translation. po/es.po | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 5c3751d019f023e091df9a653e2bb1f6ea8b0d49 Author: Jia Tan Date: 2024-02-20 20:18:07 +0800 Translations: Update the Romanian translation. po/ro.po | 470 ++++++++++++++++++++++++++++++--------------------------------- 1 file changed, 227 insertions(+), 243 deletions(-) commit e2d31154ecc750935436e8b62c6b073b2cfa84e3 Author: Jia Tan Date: 2024-02-20 20:15:50 +0800 Translations: Update the Croatian translation. po/hr.po | 648 ++++++++++++++++++++++++++++++++++----------------------------- 1 file changed, 355 insertions(+), 293 deletions(-) commit 704500f994d5ac271bfcfd592275c5a7da4dc8d2 Author: Jia Tan Date: 2024-02-20 20:05:44 +0800 Translations: Update the German man page translations. po4a/de.po | 1696 +++++++++++++++++++++++++++++++----------------------------- 1 file changed, 873 insertions(+), 823 deletions(-) commit 1cfd3dca3fef321b06db73c3c9e13f347c2e2f5f Author: Jia Tan Date: 2024-02-20 19:58:25 +0800 Translations: Update the German translation. po/de.po | 427 +++++++++++++++++++++++++++++++++------------------------------ 1 file changed, 225 insertions(+), 202 deletions(-) commit 28b9b3f16cc7c6e5b42e691994569c17f4561c9a Author: Jia Tan Date: 2024-02-20 19:56:52 +0800 Translations: Update the Hungarian translation. 
po/hu.po | 556 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 338 insertions(+), 218 deletions(-) commit 00b06cd0af6ad2ee93d3006bf80417db060c2b04 Author: Lasse Collin Date: 2024-02-19 16:48:05 +0200 CMake: Fix building of lzmainfo when translations are enabled. CMakeLists.txt | 2 ++ 1 file changed, 2 insertions(+) commit b0d1422b6037bfea6f6723683bd82a8e6d77026c Author: Lasse Collin Date: 2024-02-19 13:38:42 +0200 CMake: Don't assume that -fvisibility=hidden is supported outside Windows. The original code was good enough for supporting GNU/Linux and a few others but it wasn't very portable. CMake doesn't support Solaris Studio's -xldscope=hidden. If it ever does, things should still work with this commit as Solaris Studio supports not only its own __global but also the GNU C __attribute__((visibility("default"))). Support for the attribute was added in 2007 to Sun Studio 12 compiler version 5.9. CMakeLists.txt | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-) commit 2ced9d34bef4dce52ecbbf84d0903ab0aae1442c Author: Lasse Collin Date: 2024-02-19 12:20:59 +0200 CMake: Revise the component splitting. CMakeLists.txt | 57 +++++++++++++++++++++++++++++++-------------------------- 1 file changed, 31 insertions(+), 26 deletions(-) commit 426bdc709c169d39b31dec410016779de117ef69 Author: Lasse Collin Date: 2024-02-17 21:45:07 +0200 CMake: Update the main comment and document CMAKE_BUILD_TYPE=Release. CMakeLists.txt | 79 ++++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 63 insertions(+), 16 deletions(-) commit 4430e075f7ccfc47972d6ca0aa1c3779fc265e10 Author: Lasse Collin Date: 2024-02-17 21:27:48 +0200 CMake: Use -O2 instead of -O3 in CMAKE_BUILD_TYPE=Release. -O3 doesn't seem useful for speed but it makes the code bigger. CMake makes is difficult for users to simply override the optimization level: CFLAGS / CMAKE_C_FLAGS aren't helpful because they go before CMAKE_C_FLAGS_RELEASE. 
Of course, users can override CMAKE_C_FLAGS_RELEASE directly but then they have to remember to add also -DNDEBUG to disable assertions. This commit changes -O3 to -O2 in CMAKE_C_FLAGS_RELEASE if and only if CMAKE_C_FLAGS_RELEASE cache variable doesn't already exist. So if a custom value is passed on the command line (or reconfiguring an already-configured build), the cache variable won't be modified. CMakeLists.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) commit 025eb6d7879e4c4e8cb29716b371e0f4c1aea660 Author: Lasse Collin Date: 2024-02-18 14:59:52 +0200 CMake: Handle symbol versioning on MicroBlaze specially. This is to match configure.ac. CMakeLists.txt | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) commit 2edd1a35b2507d1ce68b52dbaebe23c4850a74ce Author: Lasse Collin Date: 2024-02-17 22:18:12 +0200 CMake: Keep build working even if lib/*.[ch] are removed. CMakeLists.txt | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) commit d753e2ce4715552884afadc4ed6fbf8ccca6efac Author: Lasse Collin Date: 2024-02-17 18:10:40 +0200 CMake: Install documentation. CMakeLists.txt | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) commit 7a0405bea9cb0df9318b70f779f82b2c473e98ac Author: Lasse Collin Date: 2024-02-17 15:35:35 +0200 CMake: Bump maximum policy version to 3.28. CMP0154 doesn't affect us since we don't use FILE_SET. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit c2264ffbe3892d28930b89b0123efc369cabc143 Author: Lasse Collin Date: 2024-02-17 15:35:35 +0200 CMake: Build lzmainfo. CMakeLists.txt | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) commit 998d0b29536094a89cf385a3b894e157db1ccefe Author: Lasse Collin Date: 2024-02-17 15:35:35 +0200 CMake: Build lzmadec. 
CMakeLists.txt | 76 ++++++++++++++++++++++++++++++++-------------------------- 1 file changed, 42 insertions(+), 34 deletions(-) commit 74e8bc7417a0f37ca7ed5ee0127d33c69b3100b9 Author: Lasse Collin Date: 2024-02-17 15:35:35 +0200 CMake: Add test_scripts.sh to the tests. In contrast to Automake, skipping of this test when decoders are disabled is handled at CMake side instead of test_scripts.sh because CMake-build doesn't create config.h. CMakeLists.txt | 14 ++++++++++++++ tests/test_scripts.sh | 13 ++++++++----- 2 files changed, 22 insertions(+), 5 deletions(-) commit 4808f238a731befcd46c2117c62a1caaf4403989 Author: Lasse Collin Date: 2024-02-17 15:35:35 +0200 CMake: Install scripts. Compared to the Autotools-based build, this has simpler handling for the shell (@POSIX_SHELL@) and extra PATH entry for the scripts (configure has --enable-path-for-scripts=PREFIX). The simpler metho should be enough for non-ancient systems and Solaris. CMakeLists.txt | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 82 insertions(+), 1 deletion(-) commit 3462362ebd94d835c664e94ad8f414cfe7590ca7 Author: Lasse Collin Date: 2024-02-17 15:35:35 +0200 Scripts: Use @PACKAGE_VERSION@ instead of @VERSION@. PACKAGE_VERSION was already used in liblzma.pc.in. This way only one version @foo@ is used. src/scripts/xzdiff.in | 2 +- src/scripts/xzgrep.in | 2 +- src/scripts/xzless.in | 2 +- src/scripts/xzmore.in | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) commit 67610c245ba6c68cf65991693bab9312b7dc987b Author: Lasse Collin Date: 2024-02-17 15:35:35 +0200 CMake: Simplify symlink creation and install translated man pages. It helps that cmake_install.cmake doesn't parallelize installation so symlinks can be created so that the target is always known to exist (a requirement on Windows in some cases). This bumps the minimum CMake version from 3.13 to 3.14 to use file(CREATE_LINK ...). 
It could be made to work on 3.13 by calling "cmake -E create_symlink" but it's uglier code and slower in "make install". 3.14 should be a reasonable version to require nowadays, especially since the Autotools build is still the primary build system for most OSes. CMakeLists.txt | 195 +++++++++++++++++++++++++++++---------------------------- 1 file changed, 98 insertions(+), 97 deletions(-) commit 50cc1d8a5a8154428bf240c7e4972e32b17d99bf Author: Lasse Collin Date: 2024-02-17 15:35:35 +0200 CMake: Add support for building and installing xz with translations. If gettext tools are available, the .po files listed in po/LINGUAS are converted using msgfmt. This allows building with translations directly from xz.git without Autotools. If gettext tools aren't available, the Autotools-created .gmo files in the "po" directory will be used. This allows CMake-based build to use translations from Autotools-generated tarball. If translation support is found (Intl_FOUND) but both the gettext tools and the pre-generated .gmo files are missing, then "make" will fail. CMakeLists.txt | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 66 insertions(+), 2 deletions(-) commit 746c471643009947f94a3494a1148f74c7381b56 Author: Lasse Collin Date: 2024-02-19 11:58:33 +0200 liblzma: Remove commented-out code. src/liblzma/rangecoder/range_decoder.h | 3 --- 1 file changed, 3 deletions(-) commit 4ce300ce0884c6e552de2af9ae8050b47b01f0e7 Author: Lasse Collin Date: 2024-02-17 23:07:35 +0200 xz: Delete old commented-out code. src/xz/message.c | 19 ------------------- 1 file changed, 19 deletions(-) commit cae9a5e0bf422e6c5e64180805904f7ed02dc3aa Author: Lasse Collin Date: 2024-02-17 23:07:35 +0200 xz: Use stricter pledge(2) and Landlock sandbox. This makes these sandboxing methods stricter when no files are created or deleted. 
That is, it's a middle ground between the initial sandbox and the strictest single-file-to-stdout sandbox: this allows opening files for reading but output has to go to stdout. src/xz/main.c | 46 +++++++++++++++++++++++++++++++++------------- src/xz/sandbox.c | 32 ++++++++++++++++++++++++++++++++ src/xz/sandbox.h | 4 ++++ 3 files changed, 69 insertions(+), 13 deletions(-) commit 02e3505991233901575b7eabc06b2c6c62a96899 Author: Lasse Collin Date: 2024-02-17 23:07:35 +0200 xz: Support Landlock ABI version 4. Linux 6.7 added support for ABI version 4 which restricts TCP connections which xz won't need and thus those can be forbidden now. Since the ABI version is handled at runtime, supporting version 4 won't cause any compatibility issues. Note that new enough kernel headers are required to get version 4 support enabled at build time. src/xz/sandbox.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) commit 374868d81d473ab56556a1cfd6b1b36a1fab348b Author: Lasse Collin Date: 2024-02-17 23:07:35 +0200 xz: Move sandboxing code to sandbox.c and improve Landlock sandbox. Landlock is now always used just like pledge(2) is: first in more permissive mode and later (under certain common conditions) in a strict mode that doesn't allow opening more files. I put pledge(2) first in sandbox.c because it's the simplest API to use and still somewhat fine-grained for basic applications. So it's the simplest thing to understand for anyone reading sandbox.c. CMakeLists.txt | 2 + src/xz/Makefile.am | 2 + src/xz/file_io.c | 170 +----------------------------- src/xz/file_io.h | 6 -- src/xz/main.c | 50 +++------ src/xz/private.h | 6 +- src/xz/sandbox.c | 295 +++++++++++++++++++++++++++++++++++++++++++++++++++++ src/xz/sandbox.h | 39 +++++++ 8 files changed, 357 insertions(+), 213 deletions(-) commit 7312dfbb02197c7f990c7a3cefd027a9387d1473 Author: Lasse Collin Date: 2024-02-17 23:07:35 +0200 xz: Tweak comments. 
src/xz/main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) commit c701a5909ad9882469fbab4fab5d2d5556d3ba78 Author: Lasse Collin Date: 2024-02-17 23:07:35 +0200 xz: Fix message_init() description. Also explicitly initialize progress_automatic to make it clear that it can be read before message_init() sets it. Static variable was initialized to false by default already so this is only for clarity. src/xz/main.c | 3 ++- src/xz/message.c | 2 +- src/xz/message.h | 5 ++++- 3 files changed, 7 insertions(+), 3 deletions(-) commit 9466306719f3b76e92fac4e55fbfd89ec92295fa Author: Lasse Collin Date: 2024-02-17 19:35:47 +0200 Build: Makefile.am: Sort EXTRA_DIST. Dirs first, then files in case-sensitive ASCII order. Makefile.am | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) commit f3440e78c9517db75bfa52e1a378fad60b073bbe Author: Lasse Collin Date: 2024-02-17 19:25:05 +0200 Build: Don't install TODO. Makefile.am | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit a7a3b62e2ab03c82b2bd5c78da1d1fb8b8490381 Author: Jia Tan Date: 2024-02-18 01:09:11 +0800 Translations: Update the Korean man page translations. po4a/ko.po | 1707 +++++++++++++++++++++++++++++++----------------------------- 1 file changed, 871 insertions(+), 836 deletions(-) commit 9b315db2d5e74700f3dc0755eb86c27947c0b393 Author: Jia Tan Date: 2024-02-18 01:08:32 +0800 Translations: Update the Korean translation. po/ko.po | 423 +++++++++++++++++++++++++++++++++------------------------------ 1 file changed, 223 insertions(+), 200 deletions(-) commit 56246607dff177b0410d140fcca4a42c865723dc Author: Lasse Collin Date: 2024-02-17 16:23:14 +0200 Build: Install translated lzmainfo man pages. All other translated man pages were being installed but lzmainfo had been forgotten. 
src/lzmainfo/Makefile.am | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) commit f1d6b88aefcced538403c5c2606ba57065b16e70 Author: Lasse Collin Date: 2024-02-17 16:01:32 +0200 liblzma: Avoid implementation-defined behavior in the RISC-V filter. GCC docs promise that it works and a few other compilers do too. Clang/LLVM is documented source code only but unsurprisingly it behaves the same as others on x86-64 at least. But the certainly-portable way is good enough here so use that. src/liblzma/simple/riscv.c | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) commit 843ddc5f617b91ae132d6bab0f2f2d9c9fcd214a Author: Lasse Collin Date: 2024-02-17 15:48:28 +0200 liblzma: Wrap a line exceeding 80 chars. src/liblzma/rangecoder/range_decoder.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit e9053c907250c70d98b319d95fa54cb94fc76869 Author: Sebastian Andrzej Siewior Date: 2024-02-16 21:50:15 +0100 liblzma/rangecoder: Exclude x32 from the x86-64 optimisation. The x32 port has a x86-64 ABI in term of all registers but uses only 32bit pointer like x86-32. The assembly optimisation fails to compile on x32. Given the state of x32 I suggest to exclude it from the optimisation rather than trying to fix it. Signed-off-by: Sebastian Andrzej Siewior src/liblzma/rangecoder/range_decoder.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 3d198fb13b87f8803442e5799d465f7434a70555 Author: Jia Tan Date: 2024-02-17 21:05:07 +0800 Translations: Update the Spanish translation. po/es.po | 427 +++++++++++++++++++++++++++++++++------------------------------ 1 file changed, 226 insertions(+), 201 deletions(-) commit cf278bfe60a25b54b3786f06503bc61272970820 Author: Jia Tan Date: 2024-02-17 20:43:29 +0800 Translations: Update the Swedish translation. 
po/sv.po | 434 +++++++++++++++++++++++++++++++++------------------------------ 1 file changed, 230 insertions(+), 204 deletions(-) commit b0f1a41be50560cc6cb528e8e96b02b2067c52c2 Author: Jia Tan Date: 2024-02-17 20:41:38 +0800 Translations: Update the Polish translation. po/pl.po | 424 +++++++++++++++++++++++++++++++++------------------------------ 1 file changed, 224 insertions(+), 200 deletions(-) commit d74ed48b30c631b6a4c7e7858b06828293bf8520 Author: Jia Tan Date: 2024-02-17 20:41:02 +0800 Translations: Update the Ukrainian translation. po/uk.po | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 711e22d5c5f3bac39ac904efb3ede874a66e2045 Author: Lasse Collin Date: 2024-02-16 17:53:34 +0200 Translations: Use the same sentence in xz.pot-header that the TP uses. po/xz.pot-header | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit fb5f6aaf18584672d0fee5dbe41fd30fc6bf5422 Author: Jia Tan Date: 2024-02-16 22:53:46 +0800 Fix typos discovered by codespell. AUTHORS | 2 +- NEWS | 2 +- src/liblzma/rangecoder/range_decoder.h | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) commit c64723bbb094e29b4edd98f6fcce866e1b569b42 Author: Jia Tan Date: 2024-02-16 22:52:41 +0800 Translations: Update the Ukrainian man page translations. po4a/uk.po | 1710 +++++++++++++++++++++++++++++++----------------------------- 1 file changed, 873 insertions(+), 837 deletions(-) commit 2895195ed0f68b245c7bd568c126ba6e685fa1d6 Author: Jia Tan Date: 2024-02-16 22:51:04 +0800 Translations: Update the Ukrainian translation. po/uk.po | 466 ++++++++++++++++++++++++++++++--------------------------------- 1 file changed, 225 insertions(+), 241 deletions(-) commit 4c20781f4c8f04879b64d631a4f44b4909147bde Author: Lasse Collin Date: 2024-02-15 22:32:52 +0200 Translations: Omit the generic copyright line from man page headers. 
po4a/update-po | 1 + 1 file changed, 1 insertion(+) commit 4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e Author: Jia Tan Date: 2024-02-15 22:26:43 +0800 Update m4/.gitignore. m4/.gitignore | 1 + 1 file changed, 1 insertion(+) commit 5394a1665b7a108a54cb8b4ef3ebe59d3dbcca3a Author: Lasse Collin Date: 2024-02-14 21:11:49 +0200 Tests: tuktest.h: Treat Clang separately from GCC. Don't assume that Clang defines __GNUC__ as the extensions are available in clang-cl as well (and possibly in some other Clang variants?). tests/tuktest.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit cce7330b9f23485a0879422e0c3395a7065439ac Author: Lasse Collin Date: 2024-02-14 21:11:03 +0200 Tests: tuktest.h: Add a missing word to a comment. tests/tuktest.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 5dd8fc9452a3373cedc27379067ce638f992c741 Author: Lasse Collin Date: 2024-02-14 21:10:10 +0200 Tests: tuktest.h: Fix the comment about STest. tests/tuktest.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 6f1790254a03c5edf0f2976f773220f070450acd Author: Jia Tan Date: 2024-02-15 01:53:40 +0800 Bump version for 5.5.2beta. src/liblzma/api/lzma/version.h | 4 ++-- src/liblzma/liblzma_generic.map | 2 +- src/liblzma/liblzma_linux.map | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) commit 924fdeedf48113fb1e0646d86bd89a356d21a055 Author: Lasse Collin Date: 2024-02-14 19:46:11 +0200 liblzma: Fix validate_map.sh. Adding the SPDX license identifier changed the line numbers. src/liblzma/validate_map.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 22140a2df6161b0110e6b4afa5ea0a07c5b60b01 Author: Lasse Collin Date: 2024-02-14 19:38:34 +0200 Build: Start the generated ChangeLog from around 5.4.0 instead of 5.2.0. Makefile.am | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 0b8cefa136c21d403a01b78517f4decb50172bdb Author: Lasse Collin Date: 2024-02-14 19:27:46 +0200 Fixed NEWS for 5.5.2beta. 
NEWS | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) commit a4557bad96361d93ea171ed859ac5a696fca824f Author: Lasse Collin Date: 2024-02-14 19:21:45 +0200 liblzma: Silence warnings in --enable-small build. src/liblzma/lzma/lzma_decoder.c | 2 ++ src/liblzma/rangecoder/range_decoder.h | 1 + 2 files changed, 3 insertions(+) commit 38edf473236d00b3e100dc4c4f0bf43a4993fed2 Author: Lasse Collin Date: 2024-02-14 19:15:58 +0200 Build: Install COPYING.0BSD as part of docs. Makefile.am | 1 + 1 file changed, 1 insertion(+) commit b74e10bd839bcdc239afb5300ffaee195f34c217 Author: Lasse Collin Date: 2024-02-14 19:14:05 +0200 Docs: List COPYING.0BSD in README. README | 1 + 1 file changed, 1 insertion(+) commit dfdb60ffe933a1f1497d300dbb4513ed17ec6f0e Author: Lasse Collin Date: 2024-02-14 19:11:48 +0200 Docs: Include doc/examples/11_file_info.c in tarballs. It was added in 2017 in c2e29f06a7d1e3ba242ac2fafc69f5d6e92f62cd but it never got into any release tarballs because it was forgotten to be added to Makefile.am. Makefile.am | 1 + 1 file changed, 1 insertion(+) commit 160b6862646d95dfdbd73ab7f1031ede0f54992d Author: Lasse Collin Date: 2024-02-14 19:05:58 +0200 liblzma: Silence a warning. src/liblzma/rangecoder/range_decoder.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit eeedd4d0925ea417add04ceb42a6c0829244b50c Author: Lasse Collin Date: 2024-02-14 18:32:27 +0200 Add NEWS for 5.5.2beta. NEWS | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) commit 8af7db854f903068d72a9a0d21103cb0c5027fa8 Author: Lasse Collin Date: 2024-02-13 14:32:47 +0200 xz: Mention lzmainfo if trying to use 'lzma --list'. This kind of fixes the problem reported here: https://bugs.launchpad.net/ubuntu/+source/xz-utils/+bug/1291020 src/xz/list.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) commit 0668907ff736e4cd16738c10d39a2bc9e851aefb Author: Lasse Collin Date: 2024-02-14 14:58:36 +0200 liblzma: Add comments. 
src/liblzma/lzma/lzma_decoder.c | 9 +++++++++ src/liblzma/rangecoder/range_decoder.h | 11 +++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-) commit 109f1913d4824c8214d5bbd38ebebf62c37572da Author: Lasse Collin Date: 2024-02-13 17:00:17 +0200 Scripts: Add lz4 support to xzgrep and xzdiff. src/scripts/xzdiff.1 | 8 +++++--- src/scripts/xzdiff.in | 14 +++++++++----- src/scripts/xzgrep.1 | 6 ++++-- src/scripts/xzgrep.in | 1 + 4 files changed, 19 insertions(+), 10 deletions(-) commit de55485cb23af56c5adbe3239b935c957ff8ac4f Author: Lasse Collin Date: 2024-02-13 14:05:13 +0200 liblzma: Choose the range decoder variants using a bitmask macro. src/liblzma/rangecoder/range_decoder.h | 64 ++++++++++++++++++++++++++++------ 1 file changed, 53 insertions(+), 11 deletions(-) commit 0709c2b2d7c1d8f437b003f691880fd7810e5be5 Author: Lasse Collin Date: 2024-02-13 11:38:10 +0200 xz: Fix outdated threading related info on the man page. src/xz/xz.1 | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) commit 3182a330c1512cc1f5c87b5c5a272578e60a5158 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: Range decoder: Add x86-64 inline assembly. It's compatible with GCC and Clang. src/liblzma/rangecoder/range_decoder.h | 491 +++++++++++++++++++++++++++++++++ 1 file changed, 491 insertions(+) commit cba2edc991dffba7cd4891dbc1bd26cb950cf053 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: Range decoder: Add branchless C code. It's used only for basic bittrees and fixed-size reverse bittree because those showed a clear benefit on x86-64 with GCC and Clang. The other methods were more mixed and thus are commented out but they should be tested on other archs. src/liblzma/rangecoder/range_decoder.h | 76 ++++++++++++++++++++++++++++++++++ 1 file changed, 76 insertions(+) commit e290a72d6dee71faf3a90c9678b2f730083666a7 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: Clarify a comment. 
src/liblzma/lzma/lzma_decoder.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) commit 5e04706b91ca90d6befd4da24a588a55e631d4a9 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: LZMA decoder: Optimize loop comparison. But now it needs one more local variable. src/liblzma/lzma/lzma_decoder.c | 5 ++--- src/liblzma/rangecoder/range_decoder.h | 10 +++++++++- 2 files changed, 11 insertions(+), 4 deletions(-) commit 88276f9f2cb4871c7eb86952d93d07c1cf6caa66 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: Optimize literal_subcoder() macro slightly. src/liblzma/lzma/lzma_common.h | 22 ++++++++++++---------- src/liblzma/lzma/lzma_decoder.c | 12 ++++++------ src/liblzma/lzma/lzma_encoder.c | 6 +++--- src/liblzma/lzma/lzma_encoder_optimum_normal.c | 2 +- src/liblzma/lzma/lzma_encoder_private.h | 4 ++-- 5 files changed, 24 insertions(+), 22 deletions(-) commit 5938f6de4d8ec9656776cd69e78ddfd6c3ad84e5 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: LZ decoder: Add unlikely(). src/liblzma/lz/lz_decoder.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 9c252e3ed086c6b72590b2531586c42596d4a9d9 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: LZ decoder: Remove a useless unlikely(). src/liblzma/lz/lz_decoder.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit f3872a59475456c5d365cad9f1c5be514cfa54b5 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: Optimize LZ decoder slightly. Now extra buffer space is reserved so that repeating bytes for any single match will never need to copy from two places (both the beginning and the end of the buffer). This simplifies dict_repeat() and helps a little with speed. This seems to reduce .lzma decompression time about 2 %, so with .xz and CRC it could be slightly less. The small things add up still. 
src/liblzma/lz/lz_decoder.c | 43 ++++++++++++----- src/liblzma/lz/lz_decoder.h | 101 +++++++++++++++++++++------------------- src/liblzma/lzma/lzma_decoder.c | 4 +- 3 files changed, 88 insertions(+), 60 deletions(-) commit eb518446e578acf079abae5f1ce28db7b6e59bc1 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: LZMA decoder: Get rid of next_state[]. It's not completely obvious if this is better in the decoder. It should be good if compiler can avoid creating a branch (like using CMOV on x86). This also makes lzma_encoder.c use the new macros. src/liblzma/lzma/lzma_common.h | 14 ++++++++++++++ src/liblzma/lzma/lzma_decoder.c | 30 ++++++++---------------------- src/liblzma/lzma/lzma_encoder.c | 4 ++-- 3 files changed, 24 insertions(+), 24 deletions(-) commit e0c0ee475c0800c08291ae45e0d66aa00d5ce604 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: LZMA decoder improvements. This adds macros for bittree decoding which prepares the code for alternative C versions and inline assembly. src/liblzma/lzma/lzma_decoder.c | 264 ++++++++++----------------------- src/liblzma/rangecoder/range_common.h | 4 + src/liblzma/rangecoder/range_decoder.h | 142 ++++++++++++++++-- 3 files changed, 210 insertions(+), 200 deletions(-) commit de5c5e417645ad8906ef914bc059d08c1462fc29 Author: Jia Tan Date: 2024-02-12 17:09:10 +0200 liblzma: Creates Non-resumable and Resumable modes for lzma_decoder. The new decoder resumes the first decoder loop in the Resumable mode. Then, the code executes in Non-resumable mode until it detects that it cannot guarantee to have enough input/output to decode another symbol. The Resumable mode is how the decoder has always worked. Before decoding every input bit, it checks if there is enough space and will save its location to be resumed later. When the decoder has more input/output, it jumps back to the correct sequence in the Resumable mode code. 
When the input/output buffers are large, the Resumable mode is much slower than the Non-resumable because it has more branches and is harder for the compiler to optimize since it is in a large switch block. Early benchmarking shows significant time improvement (8-10% on gcc and clang x86) by using the Non-resumable code as much as possible. src/liblzma/lz/lz_decoder.h | 14 +- src/liblzma/lzma/lzma_decoder.c | 720 ++++++++++++++++++++++++++++------------ 2 files changed, 521 insertions(+), 213 deletions(-) commit e446ab7a18abfde18f8d1cf02a914df72b1370e3 Author: Jia Tan Date: 2024-02-12 17:09:10 +0200 liblzma: Creates separate "safe" range decoder mode. The new "safe" range decoder mode is the same as old range decoder, but now the default behavior of the range decoder will not check if there is enough input or output to complete the operation. When the buffers are close to fully consumed, the "safe" operations must be used instead. This will improve speed because it will reduce the number of branches needed for most of the range decoder operations. src/liblzma/lzma/lzma_decoder.c | 108 ++++++++------------------------- src/liblzma/rangecoder/range_decoder.h | 77 +++++++++++++++++------ 2 files changed, 82 insertions(+), 103 deletions(-) commit 7f6d9ca329ff3e01d4b0be7366eb4f5c93da41b9 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 doxygen/footer.html: Add missing closing tags and don't open a new tab. The footer template from Doxygen has the closing </body> and </html> as Doxygen doesn't add them otherwise. target="_blank" was omitted as it's not useful here but it can be slightly annoying as one cannot just go back in the browser history. Since the footer links to the license file in the same directory and not to CC website, the rel attributes can be omitted. doxygen/footer.html | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) commit 26d1527d34d52b0f5d632d4fb636fb33d0867e92 Author: Lasse Collin Date: 2024-02-13 13:19:10 +0200 Tweak the expressions in AUTHORS.
AUTHORS | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) commit d231d56580175fa040fdd3c6207a58243ce6217b Author: Lasse Collin Date: 2024-02-13 13:07:33 +0200 Translations: Add the man page translators into man page header comment. It looked odd to only have the original English authors listed in the header comments of the translated files. po4a/.gitignore | 1 + po4a/po4a.conf | 14 +++++++------- po4a/update-po | 18 ++++++++++++++++++ 3 files changed, 26 insertions(+), 7 deletions(-) commit 6d35fcb936474fca1acaebfd9502c097b6fde88e Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Translations: Translate also messages of lzmainfo. lzmainfo has had translation support since 2009 at least but it was never added to po/POTFILES.in so the messages weren't translated. It's a very rarely needed tool so it's not too bad. This also adds src/xz/mytime.c to po/POTFILES.in although there are no translatable strings. It's simpler this way so that it won't be forgotten if strings were ever added to that file. po/POTFILES.in | 2 ++ 1 file changed, 2 insertions(+) commit a9f369dd54b05f9ac4e00ead9d765d04fc259868 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Translations: Add custom .pot header with SPDX license identifier. The same is used for both po/xz.pot and po4a/xz-man.pot. Makefile.am | 1 + po/xz.pot-header | 7 +++++++ po4a/update-po | 8 ++++++++ 3 files changed, 16 insertions(+) commit 469cd6653bb96e83c5cf1031c204d34566b15f44 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Translations: po4a/update-po: Add copyright notice to xz-man.pot. All man pages are under 0BSD now so this is simple now. po4a/update-po | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 28ce45e38fbed4b5f54f2013e38dab47d22bf699 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Update COPYING about the man pages of the scripts. 
COPYING | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit e48287bf51afd5184ea74de1dcade9e153f873f7 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 xzdiff, xzgrep, and xzmore: Rewrite the man pages. The main reason is a kind of silly one: xz-man.pot contains strings from all man pages in XZ Utils. The man pages of xzdiff, xzgrep, and xzmore were under GPLv2 and the rest under 0BSD. Thus xz-man.pot contained strings under two licences. po4a creates the translated man pages from the combined 0BSD+GPLv2 xz-man.pot. I haven't liked this mixing in xz-man.pot but the Translation Project requires that all man pages must be in the same .pot file. So a separate xz-man-gpl.pot wasn't an option. Since these man pages are short, rewriting them was quick enough. Now xz-man.pot is entirely under 0BSD and marking the per-file licenses is simpler. As a bonus, some wording hopefully is now slightly better although it's perhaps a matter of taste. NOTE: In xzgrep.1, the EXIT STATUS section was written by me in the commit d796b6d7fdb8b7238b277056cf9146cce25db604 so that's why that section could be taken as is from the old xzgrep.1. src/scripts/xzdiff.1 | 94 ++++++++++++++++++++++++----------------- src/scripts/xzgrep.1 | 116 ++++++++++++++++++++++++++++++++------------------- src/scripts/xzmore.1 | 79 ++++++++++++++++++++--------------- 3 files changed, 173 insertions(+), 116 deletions(-) commit 3e551b111b8ae8150f1a1040364dbafc034f22be Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 xzless: Update man page slightly. The xz tool can decompress three file formats and xzless has always supported uncompressed files too. src/scripts/xzless.1 | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 40f36da2262d13d6e1ba8449caa855512ae626d7 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Translations: Change po/Makevars to add a copyright notice to po/xz.pot. 
po/Makevars | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 24192854e2ea5c06997431a98bda3c36c5da1497 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Translations: Update po/Makevars to use the template from gettext 0.22.4. Also add SPDX license identifier now that there is a known license. po/Makevars | 51 ++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 46 insertions(+), 5 deletions(-) commit b94154957370116480b43bcabca25fc52deb9853 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: Include the SPDX license identifier 0BSD to generated files. Perhaps the generated files aren't even copyrightable but using the same license for them as for the rest of the liblzma keeps things more consistent for tools that look for license info. src/liblzma/check/crc32_table_be.h | 4 +++- src/liblzma/check/crc32_table_le.h | 4 +++- src/liblzma/check/crc32_tablegen.c | 16 ++++++++++------ src/liblzma/check/crc64_table_be.h | 4 +++- src/liblzma/check/crc64_table_le.h | 4 +++- src/liblzma/check/crc64_tablegen.c | 8 +++++--- src/liblzma/lz/lz_encoder_hash_table.h | 4 +++- src/liblzma/lzma/fastpos_table.c | 4 +++- src/liblzma/lzma/fastpos_tablegen.c | 12 +++++++----- src/liblzma/rangecoder/price_table.c | 4 +++- src/liblzma/rangecoder/price_tablegen.c | 12 +++++++----- 11 files changed, 50 insertions(+), 26 deletions(-) commit 8e4ec794836bc1701d8c9bd5e347b8ce8cc5bbb4 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 liblzma: Fix compilation of price_tablegen.c. It is built and run only manually so this didn't matter unless one wanted to regenerate the price_table.c. src/liblzma/rangecoder/price_tablegen.c | 5 +++++ src/liblzma/rangecoder/range_common.h | 5 ++++- 2 files changed, 9 insertions(+), 1 deletion(-) commit e99bff3ffbcdf2634fd5bd13887627ec7dbfecaf Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Add SPDX license identifiers to GPL, LGPL, and FSFULLR files. 
extra/scanlzma/scanlzma.c | 2 ++ lib/Makefile.am | 2 ++ lib/getopt-cdefs.h | 2 ++ lib/getopt-core.h | 2 ++ lib/getopt-ext.h | 2 ++ lib/getopt-pfx-core.h | 2 ++ lib/getopt-pfx-ext.h | 2 ++ lib/getopt.c | 2 ++ lib/getopt.in.h | 2 ++ lib/getopt1.c | 2 ++ lib/getopt_int.h | 2 ++ m4/ax_pthread.m4 | 2 ++ m4/getopt.m4 | 2 ++ m4/posix-shell.m4 | 2 ++ m4/visibility.m4 | 2 ++ src/scripts/xzdiff.1 | 3 +-- src/scripts/xzdiff.in | 1 + src/scripts/xzgrep.1 | 3 +-- src/scripts/xzgrep.in | 1 + src/scripts/xzless.in | 1 + src/scripts/xzmore.1 | 3 +-- src/scripts/xzmore.in | 1 + 22 files changed, 37 insertions(+), 6 deletions(-) commit 22af94128b89a131f5e58ae69bee5e50227c15da Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Add SPDX license identifier into 0BSD source code files. .github/workflows/ci.yml | 2 ++ .github/workflows/windows-ci.yml | 2 ++ CMakeLists.txt | 2 ++ Makefile.am | 3 +-- autogen.sh | 1 + build-aux/ci_build.sh | 3 ++- build-aux/manconv.sh | 3 ++- build-aux/version.sh | 3 ++- cmake/remove-ordinals.cmake | 2 ++ cmake/tuklib_common.cmake | 4 ++++ cmake/tuklib_cpucores.cmake | 4 ++++ cmake/tuklib_integer.cmake | 4 ++++ cmake/tuklib_large_file_support.cmake | 4 ++++ cmake/tuklib_mbstr.cmake | 4 ++++ cmake/tuklib_physmem.cmake | 4 ++++ cmake/tuklib_progname.cmake | 4 ++++ configure.ac | 4 +++- debug/Makefile.am | 3 +-- debug/crc32.c | 2 ++ debug/full_flush.c | 2 ++ debug/hex2bin.c | 2 ++ debug/known_sizes.c | 2 ++ debug/memusage.c | 2 ++ debug/repeat.c | 2 ++ debug/sync_flush.c | 2 ++ debug/translation.bash | 1 + doc/examples/01_compress_easy.c | 2 ++ doc/examples/02_decompress.c | 2 ++ doc/examples/03_compress_custom.c | 2 ++ doc/examples/04_compress_easy_mt.c | 2 ++ doc/examples/11_file_info.c | 2 ++ doc/examples/Makefile | 3 +-- dos/Makefile | 2 ++ dos/config.h | 2 ++ doxygen/update-doxygen | 3 ++- extra/7z2lzma/7z2lzma.bash | 3 ++- m4/tuklib_common.m4 | 8 ++++++-- m4/tuklib_cpucores.m4 | 8 ++++++-- m4/tuklib_integer.m4 | 8 ++++++-- m4/tuklib_mbstr.m4 | 8 
++++++-- m4/tuklib_physmem.m4 | 8 ++++++-- m4/tuklib_progname.m4 | 8 ++++++-- po/POTFILES.in | 2 ++ po4a/po4a.conf | 2 ++ po4a/update-po | 3 ++- src/Makefile.am | 3 +-- src/common/common_w32res.rc | 2 ++ src/common/mythread.h | 2 ++ src/common/sysdefs.h | 2 ++ src/common/tuklib_common.h | 2 ++ src/common/tuklib_config.h | 2 ++ src/common/tuklib_cpucores.c | 2 ++ src/common/tuklib_cpucores.h | 2 ++ src/common/tuklib_exit.c | 2 ++ src/common/tuklib_exit.h | 2 ++ src/common/tuklib_gettext.h | 2 ++ src/common/tuklib_integer.h | 2 ++ src/common/tuklib_mbstr.h | 2 ++ src/common/tuklib_mbstr_fw.c | 2 ++ src/common/tuklib_mbstr_width.c | 2 ++ src/common/tuklib_open_stdxxx.c | 2 ++ src/common/tuklib_open_stdxxx.h | 2 ++ src/common/tuklib_physmem.c | 2 ++ src/common/tuklib_physmem.h | 2 ++ src/common/tuklib_progname.c | 2 ++ src/common/tuklib_progname.h | 2 ++ src/liblzma/Makefile.am | 3 +-- src/liblzma/api/Makefile.am | 3 +-- src/liblzma/api/lzma.h | 2 ++ src/liblzma/api/lzma/base.h | 2 ++ src/liblzma/api/lzma/bcj.h | 2 ++ src/liblzma/api/lzma/block.h | 2 ++ src/liblzma/api/lzma/check.h | 2 ++ src/liblzma/api/lzma/container.h | 2 ++ src/liblzma/api/lzma/delta.h | 2 ++ src/liblzma/api/lzma/filter.h | 2 ++ src/liblzma/api/lzma/hardware.h | 2 ++ src/liblzma/api/lzma/index.h | 2 ++ src/liblzma/api/lzma/index_hash.h | 2 ++ src/liblzma/api/lzma/lzma12.h | 2 ++ src/liblzma/api/lzma/stream_flags.h | 2 ++ src/liblzma/api/lzma/version.h | 2 ++ src/liblzma/api/lzma/vli.h | 2 ++ src/liblzma/check/Makefile.inc | 4 ++-- src/liblzma/check/check.c | 2 ++ src/liblzma/check/check.h | 2 ++ src/liblzma/check/crc32_arm64.h | 2 ++ src/liblzma/check/crc32_fast.c | 2 ++ src/liblzma/check/crc32_small.c | 2 ++ src/liblzma/check/crc32_table.c | 2 ++ src/liblzma/check/crc32_tablegen.c | 2 ++ src/liblzma/check/crc32_x86.S | 2 ++ src/liblzma/check/crc64_fast.c | 2 ++ src/liblzma/check/crc64_small.c | 2 ++ src/liblzma/check/crc64_table.c | 2 ++ src/liblzma/check/crc64_tablegen.c | 2 ++ 
src/liblzma/check/crc64_x86.S | 2 ++ src/liblzma/check/crc_common.h | 2 ++ src/liblzma/check/crc_x86_clmul.h | 2 ++ src/liblzma/check/sha256.c | 2 ++ src/liblzma/common/Makefile.inc | 3 +-- src/liblzma/common/alone_decoder.c | 2 ++ src/liblzma/common/alone_decoder.h | 2 ++ src/liblzma/common/alone_encoder.c | 2 ++ src/liblzma/common/auto_decoder.c | 2 ++ src/liblzma/common/block_buffer_decoder.c | 2 ++ src/liblzma/common/block_buffer_encoder.c | 2 ++ src/liblzma/common/block_buffer_encoder.h | 2 ++ src/liblzma/common/block_decoder.c | 2 ++ src/liblzma/common/block_decoder.h | 2 ++ src/liblzma/common/block_encoder.c | 2 ++ src/liblzma/common/block_encoder.h | 2 ++ src/liblzma/common/block_header_decoder.c | 2 ++ src/liblzma/common/block_header_encoder.c | 2 ++ src/liblzma/common/block_util.c | 2 ++ src/liblzma/common/common.c | 2 ++ src/liblzma/common/common.h | 2 ++ src/liblzma/common/easy_buffer_encoder.c | 2 ++ src/liblzma/common/easy_decoder_memusage.c | 2 ++ src/liblzma/common/easy_encoder.c | 2 ++ src/liblzma/common/easy_encoder_memusage.c | 2 ++ src/liblzma/common/easy_preset.c | 2 ++ src/liblzma/common/easy_preset.h | 2 ++ src/liblzma/common/file_info.c | 2 ++ src/liblzma/common/filter_buffer_decoder.c | 2 ++ src/liblzma/common/filter_buffer_encoder.c | 2 ++ src/liblzma/common/filter_common.c | 2 ++ src/liblzma/common/filter_common.h | 2 ++ src/liblzma/common/filter_decoder.c | 2 ++ src/liblzma/common/filter_decoder.h | 2 ++ src/liblzma/common/filter_encoder.c | 2 ++ src/liblzma/common/filter_encoder.h | 2 ++ src/liblzma/common/filter_flags_decoder.c | 2 ++ src/liblzma/common/filter_flags_encoder.c | 2 ++ src/liblzma/common/hardware_cputhreads.c | 2 ++ src/liblzma/common/hardware_physmem.c | 2 ++ src/liblzma/common/index.c | 2 ++ src/liblzma/common/index.h | 2 ++ src/liblzma/common/index_decoder.c | 2 ++ src/liblzma/common/index_decoder.h | 2 ++ src/liblzma/common/index_encoder.c | 2 ++ src/liblzma/common/index_encoder.h | 2 ++ 
src/liblzma/common/index_hash.c | 2 ++ src/liblzma/common/lzip_decoder.c | 2 ++ src/liblzma/common/lzip_decoder.h | 2 ++ src/liblzma/common/memcmplen.h | 2 ++ src/liblzma/common/microlzma_decoder.c | 2 ++ src/liblzma/common/microlzma_encoder.c | 2 ++ src/liblzma/common/outqueue.c | 2 ++ src/liblzma/common/outqueue.h | 2 ++ src/liblzma/common/stream_buffer_decoder.c | 2 ++ src/liblzma/common/stream_buffer_encoder.c | 2 ++ src/liblzma/common/stream_decoder.c | 2 ++ src/liblzma/common/stream_decoder.h | 2 ++ src/liblzma/common/stream_decoder_mt.c | 2 ++ src/liblzma/common/stream_encoder.c | 2 ++ src/liblzma/common/stream_encoder_mt.c | 2 ++ src/liblzma/common/stream_flags_common.c | 2 ++ src/liblzma/common/stream_flags_common.h | 2 ++ src/liblzma/common/stream_flags_decoder.c | 2 ++ src/liblzma/common/stream_flags_encoder.c | 2 ++ src/liblzma/common/string_conversion.c | 2 ++ src/liblzma/common/vli_decoder.c | 2 ++ src/liblzma/common/vli_encoder.c | 2 ++ src/liblzma/common/vli_size.c | 2 ++ src/liblzma/delta/Makefile.inc | 3 +-- src/liblzma/delta/delta_common.c | 2 ++ src/liblzma/delta/delta_common.h | 2 ++ src/liblzma/delta/delta_decoder.c | 2 ++ src/liblzma/delta/delta_decoder.h | 2 ++ src/liblzma/delta/delta_encoder.c | 2 ++ src/liblzma/delta/delta_encoder.h | 2 ++ src/liblzma/delta/delta_private.h | 2 ++ src/liblzma/liblzma.pc.in | 3 +-- src/liblzma/liblzma_generic.map | 2 ++ src/liblzma/liblzma_linux.map | 2 ++ src/liblzma/liblzma_w32res.rc | 2 ++ src/liblzma/lz/Makefile.inc | 3 +-- src/liblzma/lz/lz_decoder.c | 2 ++ src/liblzma/lz/lz_decoder.h | 2 ++ src/liblzma/lz/lz_encoder.c | 2 ++ src/liblzma/lz/lz_encoder.h | 2 ++ src/liblzma/lz/lz_encoder_hash.h | 2 ++ src/liblzma/lz/lz_encoder_mf.c | 2 ++ src/liblzma/lzma/Makefile.inc | 3 +-- src/liblzma/lzma/fastpos.h | 2 ++ src/liblzma/lzma/fastpos_tablegen.c | 2 ++ src/liblzma/lzma/lzma2_decoder.c | 2 ++ src/liblzma/lzma/lzma2_decoder.h | 2 ++ src/liblzma/lzma/lzma2_encoder.c | 2 ++ src/liblzma/lzma/lzma2_encoder.h | 2 
++ src/liblzma/lzma/lzma_common.h | 2 ++ src/liblzma/lzma/lzma_decoder.c | 2 ++ src/liblzma/lzma/lzma_decoder.h | 2 ++ src/liblzma/lzma/lzma_encoder.c | 2 ++ src/liblzma/lzma/lzma_encoder.h | 2 ++ src/liblzma/lzma/lzma_encoder_optimum_fast.c | 2 ++ src/liblzma/lzma/lzma_encoder_optimum_normal.c | 2 ++ src/liblzma/lzma/lzma_encoder_presets.c | 2 ++ src/liblzma/lzma/lzma_encoder_private.h | 2 ++ src/liblzma/rangecoder/Makefile.inc | 3 +-- src/liblzma/rangecoder/price.h | 2 ++ src/liblzma/rangecoder/price_tablegen.c | 2 ++ src/liblzma/rangecoder/range_common.h | 2 ++ src/liblzma/rangecoder/range_decoder.h | 2 ++ src/liblzma/rangecoder/range_encoder.h | 2 ++ src/liblzma/simple/Makefile.inc | 3 +-- src/liblzma/simple/arm.c | 2 ++ src/liblzma/simple/arm64.c | 2 ++ src/liblzma/simple/armthumb.c | 2 ++ src/liblzma/simple/ia64.c | 2 ++ src/liblzma/simple/powerpc.c | 2 ++ src/liblzma/simple/riscv.c | 2 ++ src/liblzma/simple/simple_coder.c | 2 ++ src/liblzma/simple/simple_coder.h | 2 ++ src/liblzma/simple/simple_decoder.c | 2 ++ src/liblzma/simple/simple_decoder.h | 2 ++ src/liblzma/simple/simple_encoder.c | 2 ++ src/liblzma/simple/simple_encoder.h | 2 ++ src/liblzma/simple/simple_private.h | 2 ++ src/liblzma/simple/sparc.c | 2 ++ src/liblzma/simple/x86.c | 2 ++ src/liblzma/validate_map.sh | 1 + src/lzmainfo/Makefile.am | 3 +-- src/lzmainfo/lzmainfo.c | 2 ++ src/lzmainfo/lzmainfo_w32res.rc | 2 ++ src/scripts/Makefile.am | 3 +-- src/xz/Makefile.am | 3 +-- src/xz/args.c | 2 ++ src/xz/args.h | 2 ++ src/xz/coder.c | 2 ++ src/xz/coder.h | 2 ++ src/xz/file_io.c | 2 ++ src/xz/file_io.h | 2 ++ src/xz/hardware.c | 2 ++ src/xz/hardware.h | 2 ++ src/xz/list.c | 2 ++ src/xz/list.h | 2 ++ src/xz/main.c | 2 ++ src/xz/main.h | 2 ++ src/xz/message.c | 2 ++ src/xz/message.h | 2 ++ src/xz/mytime.c | 2 ++ src/xz/mytime.h | 2 ++ src/xz/options.c | 2 ++ src/xz/options.h | 2 ++ src/xz/private.h | 2 ++ src/xz/signals.c | 2 ++ src/xz/signals.h | 2 ++ src/xz/suffix.c | 2 ++ src/xz/suffix.h | 2 ++ 
src/xz/util.c | 2 ++ src/xz/util.h | 2 ++ src/xz/xz_w32res.rc | 2 ++ src/xzdec/Makefile.am | 3 +-- src/xzdec/lzmadec_w32res.rc | 2 ++ src/xzdec/xzdec.c | 2 ++ src/xzdec/xzdec_w32res.rc | 2 ++ tests/Makefile.am | 3 +-- tests/bcj_test.c | 2 ++ tests/code_coverage.sh | 1 + tests/create_compress_files.c | 2 ++ tests/ossfuzz/fuzz_common.h | 2 ++ tests/ossfuzz/fuzz_decode_alone.c | 2 ++ tests/ossfuzz/fuzz_decode_stream.c | 2 ++ tests/ossfuzz/fuzz_encode_stream.c | 2 ++ tests/test_bcj_exact_size.c | 2 ++ tests/test_block_header.c | 2 ++ tests/test_check.c | 2 ++ tests/test_compress.sh | 1 + tests/test_compress_generated_abc | 1 + tests/test_compress_generated_random | 1 + tests/test_compress_generated_text | 1 + tests/test_compress_prepared_bcj_sparc | 1 + tests/test_compress_prepared_bcj_x86 | 1 + tests/test_files.sh | 1 + tests/test_filter_flags.c | 2 ++ tests/test_filter_str.c | 2 ++ tests/test_hardware.c | 2 ++ tests/test_index.c | 2 ++ tests/test_index_hash.c | 2 ++ tests/test_lzip_decoder.c | 2 ++ tests/test_memlimit.c | 2 ++ tests/test_scripts.sh | 1 + tests/test_stream_flags.c | 2 ++ tests/test_suffix.sh | 1 + tests/test_vli.c | 2 ++ tests/tests.h | 2 ++ tests/tuktest.h | 2 ++ windows/build.bash | 3 ++- 290 files changed, 588 insertions(+), 58 deletions(-) commit 23de53421ea258cde6a3c33a038b1e9d08f771d1 Author: Lasse Collin Date: 2024-02-12 23:25:54 +0200 liblzma: Sync the AUTHORS fix about SHA-256 to lzma.h. src/liblzma/api/lzma.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) commit 689e0228baeb95232430e90d628379db89583d71 Author: Lasse Collin Date: 2024-02-12 17:09:10 +0200 Change most public domain parts to 0BSD. Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt were not touched. COPYING.0BSD was added. 
.github/workflows/ci.yml | 3 - .github/workflows/windows-ci.yml | 3 - CMakeLists.txt | 3 - COPYING | 112 ++++++++++++++----------- COPYING.0BSD | 11 +++ Makefile.am | 3 - PACKAGERS | 11 +-- autogen.sh | 3 - build-aux/ci_build.sh | 3 - build-aux/manconv.sh | 3 - build-aux/version.sh | 3 - cmake/remove-ordinals.cmake | 3 - cmake/tuklib_common.cmake | 3 - cmake/tuklib_cpucores.cmake | 3 - cmake/tuklib_integer.cmake | 3 - cmake/tuklib_large_file_support.cmake | 3 - cmake/tuklib_mbstr.cmake | 3 - cmake/tuklib_physmem.cmake | 3 - cmake/tuklib_progname.cmake | 3 - configure.ac | 3 - debug/Makefile.am | 3 - debug/crc32.c | 3 - debug/full_flush.c | 3 - debug/hex2bin.c | 3 - debug/known_sizes.c | 3 - debug/memusage.c | 3 - debug/repeat.c | 3 - debug/sync_flush.c | 3 - debug/translation.bash | 3 - doc/examples/01_compress_easy.c | 3 - doc/examples/02_decompress.c | 3 - doc/examples/03_compress_custom.c | 3 - doc/examples/04_compress_easy_mt.c | 3 - doc/examples/11_file_info.c | 3 - doc/examples/Makefile | 3 - dos/Makefile | 3 - doxygen/update-doxygen | 3 - extra/7z2lzma/7z2lzma.bash | 3 - m4/tuklib_common.m4 | 3 - m4/tuklib_cpucores.m4 | 3 - m4/tuklib_integer.m4 | 3 - m4/tuklib_mbstr.m4 | 3 - m4/tuklib_physmem.m4 | 3 - m4/tuklib_progname.m4 | 3 - po4a/update-po | 3 - src/Makefile.am | 3 - src/common/common_w32res.rc | 3 - src/common/mythread.h | 3 - src/common/sysdefs.h | 3 - src/common/tuklib_common.h | 3 - src/common/tuklib_cpucores.c | 3 - src/common/tuklib_cpucores.h | 3 - src/common/tuklib_exit.c | 3 - src/common/tuklib_exit.h | 3 - src/common/tuklib_gettext.h | 3 - src/common/tuklib_integer.h | 3 - src/common/tuklib_mbstr.h | 3 - src/common/tuklib_mbstr_fw.c | 3 - src/common/tuklib_mbstr_width.c | 3 - src/common/tuklib_open_stdxxx.c | 3 - src/common/tuklib_open_stdxxx.h | 3 - src/common/tuklib_physmem.c | 3 - src/common/tuklib_physmem.h | 3 - src/common/tuklib_progname.c | 3 - src/common/tuklib_progname.h | 3 - src/liblzma/Makefile.am | 3 - src/liblzma/api/Makefile.am | 
3 - src/liblzma/api/lzma.h | 13 ++- src/liblzma/api/lzma/base.h | 3 - src/liblzma/api/lzma/bcj.h | 3 - src/liblzma/api/lzma/block.h | 3 - src/liblzma/api/lzma/check.h | 3 - src/liblzma/api/lzma/container.h | 3 - src/liblzma/api/lzma/delta.h | 3 - src/liblzma/api/lzma/filter.h | 3 - src/liblzma/api/lzma/hardware.h | 3 - src/liblzma/api/lzma/index.h | 3 - src/liblzma/api/lzma/index_hash.h | 3 - src/liblzma/api/lzma/lzma12.h | 3 - src/liblzma/api/lzma/stream_flags.h | 3 - src/liblzma/api/lzma/version.h | 3 - src/liblzma/api/lzma/vli.h | 3 - src/liblzma/check/Makefile.inc | 3 - src/liblzma/check/check.c | 3 - src/liblzma/check/check.h | 3 - src/liblzma/check/crc32_arm64.h | 3 - src/liblzma/check/crc32_fast.c | 3 - src/liblzma/check/crc32_small.c | 3 - src/liblzma/check/crc32_table.c | 3 - src/liblzma/check/crc32_tablegen.c | 3 - src/liblzma/check/crc32_x86.S | 3 - src/liblzma/check/crc64_fast.c | 3 - src/liblzma/check/crc64_small.c | 3 - src/liblzma/check/crc64_table.c | 3 - src/liblzma/check/crc64_tablegen.c | 3 - src/liblzma/check/crc64_x86.S | 3 - src/liblzma/check/crc_common.h | 3 - src/liblzma/check/crc_x86_clmul.h | 3 - src/liblzma/check/sha256.c | 3 - src/liblzma/common/Makefile.inc | 3 - src/liblzma/common/alone_decoder.c | 3 - src/liblzma/common/alone_decoder.h | 3 - src/liblzma/common/alone_encoder.c | 3 - src/liblzma/common/auto_decoder.c | 3 - src/liblzma/common/block_buffer_decoder.c | 3 - src/liblzma/common/block_buffer_encoder.c | 3 - src/liblzma/common/block_buffer_encoder.h | 3 - src/liblzma/common/block_decoder.c | 3 - src/liblzma/common/block_decoder.h | 3 - src/liblzma/common/block_encoder.c | 3 - src/liblzma/common/block_encoder.h | 3 - src/liblzma/common/block_header_decoder.c | 3 - src/liblzma/common/block_header_encoder.c | 3 - src/liblzma/common/block_util.c | 3 - src/liblzma/common/common.c | 3 - src/liblzma/common/common.h | 3 - src/liblzma/common/easy_buffer_encoder.c | 3 - src/liblzma/common/easy_decoder_memusage.c | 3 - 
src/liblzma/common/easy_encoder.c | 3 - src/liblzma/common/easy_encoder_memusage.c | 3 - src/liblzma/common/easy_preset.c | 3 - src/liblzma/common/easy_preset.h | 3 - src/liblzma/common/file_info.c | 3 - src/liblzma/common/filter_buffer_decoder.c | 3 - src/liblzma/common/filter_buffer_encoder.c | 3 - src/liblzma/common/filter_common.c | 3 - src/liblzma/common/filter_common.h | 3 - src/liblzma/common/filter_decoder.c | 3 - src/liblzma/common/filter_decoder.h | 3 - src/liblzma/common/filter_encoder.c | 3 - src/liblzma/common/filter_encoder.h | 3 - src/liblzma/common/filter_flags_decoder.c | 3 - src/liblzma/common/filter_flags_encoder.c | 3 - src/liblzma/common/hardware_cputhreads.c | 3 - src/liblzma/common/hardware_physmem.c | 3 - src/liblzma/common/index.c | 3 - src/liblzma/common/index.h | 3 - src/liblzma/common/index_decoder.c | 3 - src/liblzma/common/index_decoder.h | 3 - src/liblzma/common/index_encoder.c | 3 - src/liblzma/common/index_encoder.h | 3 - src/liblzma/common/index_hash.c | 3 - src/liblzma/common/lzip_decoder.c | 3 - src/liblzma/common/lzip_decoder.h | 3 - src/liblzma/common/memcmplen.h | 3 - src/liblzma/common/microlzma_decoder.c | 3 - src/liblzma/common/microlzma_encoder.c | 3 - src/liblzma/common/outqueue.c | 3 - src/liblzma/common/outqueue.h | 3 - src/liblzma/common/stream_buffer_decoder.c | 3 - src/liblzma/common/stream_buffer_encoder.c | 3 - src/liblzma/common/stream_decoder.c | 3 - src/liblzma/common/stream_decoder.h | 3 - src/liblzma/common/stream_decoder_mt.c | 3 - src/liblzma/common/stream_encoder.c | 3 - src/liblzma/common/stream_encoder_mt.c | 3 - src/liblzma/common/stream_flags_common.c | 3 - src/liblzma/common/stream_flags_common.h | 3 - src/liblzma/common/stream_flags_decoder.c | 3 - src/liblzma/common/stream_flags_encoder.c | 3 - src/liblzma/common/string_conversion.c | 3 - src/liblzma/common/vli_decoder.c | 3 - src/liblzma/common/vli_encoder.c | 3 - src/liblzma/common/vli_size.c | 3 - src/liblzma/delta/Makefile.inc | 3 - 
src/liblzma/delta/delta_common.c | 3 - src/liblzma/delta/delta_common.h | 3 - src/liblzma/delta/delta_decoder.c | 3 - src/liblzma/delta/delta_decoder.h | 3 - src/liblzma/delta/delta_encoder.c | 3 - src/liblzma/delta/delta_encoder.h | 3 - src/liblzma/delta/delta_private.h | 3 - src/liblzma/liblzma.pc.in | 3 - src/liblzma/liblzma_w32res.rc | 3 - src/liblzma/lz/Makefile.inc | 3 - src/liblzma/lz/lz_decoder.c | 3 - src/liblzma/lz/lz_decoder.h | 3 - src/liblzma/lz/lz_encoder.c | 3 - src/liblzma/lz/lz_encoder.h | 3 - src/liblzma/lz/lz_encoder_hash.h | 3 - src/liblzma/lz/lz_encoder_mf.c | 3 - src/liblzma/lzma/Makefile.inc | 3 - src/liblzma/lzma/fastpos.h | 3 - src/liblzma/lzma/fastpos_tablegen.c | 3 - src/liblzma/lzma/lzma2_decoder.c | 3 - src/liblzma/lzma/lzma2_decoder.h | 3 - src/liblzma/lzma/lzma2_encoder.c | 3 - src/liblzma/lzma/lzma2_encoder.h | 3 - src/liblzma/lzma/lzma_common.h | 3 - src/liblzma/lzma/lzma_decoder.c | 3 - src/liblzma/lzma/lzma_decoder.h | 3 - src/liblzma/lzma/lzma_encoder.c | 3 - src/liblzma/lzma/lzma_encoder.h | 3 - src/liblzma/lzma/lzma_encoder_optimum_fast.c | 3 - src/liblzma/lzma/lzma_encoder_optimum_normal.c | 3 - src/liblzma/lzma/lzma_encoder_presets.c | 3 - src/liblzma/lzma/lzma_encoder_private.h | 3 - src/liblzma/rangecoder/Makefile.inc | 3 - src/liblzma/rangecoder/price.h | 3 - src/liblzma/rangecoder/price_tablegen.c | 3 - src/liblzma/rangecoder/range_common.h | 3 - src/liblzma/rangecoder/range_decoder.h | 3 - src/liblzma/rangecoder/range_encoder.h | 3 - src/liblzma/simple/Makefile.inc | 3 - src/liblzma/simple/arm.c | 3 - src/liblzma/simple/arm64.c | 3 - src/liblzma/simple/armthumb.c | 3 - src/liblzma/simple/ia64.c | 3 - src/liblzma/simple/powerpc.c | 3 - src/liblzma/simple/riscv.c | 3 - src/liblzma/simple/simple_coder.c | 3 - src/liblzma/simple/simple_coder.h | 3 - src/liblzma/simple/simple_decoder.c | 3 - src/liblzma/simple/simple_decoder.h | 3 - src/liblzma/simple/simple_encoder.c | 3 - src/liblzma/simple/simple_encoder.h | 3 - 
src/liblzma/simple/simple_private.h | 3 - src/liblzma/simple/sparc.c | 3 - src/liblzma/simple/x86.c | 3 - src/liblzma/validate_map.sh | 3 - src/lzmainfo/Makefile.am | 3 - src/lzmainfo/lzmainfo.1 | 4 +- src/lzmainfo/lzmainfo.c | 3 - src/lzmainfo/lzmainfo_w32res.rc | 3 - src/scripts/Makefile.am | 3 - src/scripts/xzless.1 | 4 +- src/xz/Makefile.am | 3 - src/xz/args.c | 3 - src/xz/args.h | 3 - src/xz/coder.c | 3 - src/xz/coder.h | 3 - src/xz/file_io.c | 3 - src/xz/file_io.h | 3 - src/xz/hardware.c | 3 - src/xz/hardware.h | 3 - src/xz/list.c | 3 - src/xz/list.h | 3 - src/xz/main.c | 3 - src/xz/main.h | 3 - src/xz/message.c | 3 - src/xz/message.h | 3 - src/xz/mytime.c | 3 - src/xz/mytime.h | 3 - src/xz/options.c | 3 - src/xz/options.h | 3 - src/xz/private.h | 3 - src/xz/signals.c | 3 - src/xz/signals.h | 3 - src/xz/suffix.c | 3 - src/xz/suffix.h | 3 - src/xz/util.c | 3 - src/xz/util.h | 3 - src/xz/xz.1 | 4 +- src/xz/xz_w32res.rc | 3 - src/xzdec/Makefile.am | 3 - src/xzdec/lzmadec_w32res.rc | 3 - src/xzdec/xzdec.1 | 4 +- src/xzdec/xzdec.c | 3 - src/xzdec/xzdec_w32res.rc | 3 - tests/Makefile.am | 3 - tests/bcj_test.c | 3 - tests/code_coverage.sh | 3 - tests/create_compress_files.c | 3 - tests/files/README | 3 +- tests/ossfuzz/fuzz_common.h | 3 - tests/ossfuzz/fuzz_decode_alone.c | 3 - tests/ossfuzz/fuzz_decode_stream.c | 3 - tests/ossfuzz/fuzz_encode_stream.c | 3 - tests/test_bcj_exact_size.c | 3 - tests/test_block_header.c | 3 - tests/test_check.c | 3 - tests/test_compress.sh | 3 - tests/test_files.sh | 3 - tests/test_filter_flags.c | 3 - tests/test_filter_str.c | 3 - tests/test_hardware.c | 3 - tests/test_index.c | 3 - tests/test_index_hash.c | 3 - tests/test_lzip_decoder.c | 3 - tests/test_memlimit.c | 3 - tests/test_scripts.sh | 3 - tests/test_stream_flags.c | 3 - tests/test_suffix.sh | 3 - tests/test_vli.c | 3 - tests/tests.h | 3 - tests/tuktest.h | 3 - windows/README-Windows.txt | 11 +-- windows/build.bash | 3 - 288 files changed, 100 insertions(+), 911 deletions(-) 
commit 76946dc4336c831fe2cc26696a035d807dd3cf13
Author: Lasse Collin
Date:   2024-02-09 17:20:31 +0200

    Fix SHA-256 authors.

    The initial commit 5d018dc03549c1ee4958364712fb0c94e1bf2741
    in 2007 had a comment in sha256.c that the code is based on
    Crypto++ Library 5.5.1. In 2009 the Authors list in sha256.c
    and the AUTHORS file were updated with information that the
    code had come from Crypto++ but via 7-Zip. I know I had viewed
    7-Zip's SHA-256 code but back then the C code was nearly
    identical to Crypto++, so I don't know why I thought the author
    info would need that extra step via 7-Zip for this single file.

    Another error is that I had mixed sha.* and shacal2.* files
    when checking for author info in Crypto++. The shacal2.* files
    aren't related to liblzma's sha256.c and thus Kevin Springle's
    code in Crypto++ isn't either.

 AUTHORS                    |  6 ++----
 src/liblzma/check/sha256.c | 14 ++++----------
 2 files changed, 6 insertions(+), 14 deletions(-)

commit 21d9cbae9eecca28ce373d3d9464defd2cf5d851
Author: Lasse Collin
Date:   2024-02-09 17:20:31 +0200

    Remove macosx/build.sh.

    It was last updated in 2013.

 Makefile.am     |   1 -
 macosx/build.sh | 113 --------------------------------------------------------
 2 files changed, 114 deletions(-)

commit eac2c3c67f9113a225fb6667df862edd30366931
Author: Lasse Collin
Date:   2024-02-09 17:20:31 +0200

    Doc: Remove doc/examples_old.

    It was good to keep these around in parallel with the newer
    examples but I think it's OK to remove the old ones at this point.

 Makefile.am                       |   5 --
 doc/examples_old/xz_pipe_comp.c   | 127 --------------------------------------
 doc/examples_old/xz_pipe_decomp.c | 123 ------------------------------------
 3 files changed, 255 deletions(-)

commit 89ea1a22f4ed3685b053b7260bc5acf6c75d1664
Author: Jia Tan
Date:   2024-02-13 22:38:58 +0800

    Tests: Add RISC-V filter support in a few places.
 tests/test_filter_flags.c | 6 ++++++
 tests/test_filter_str.c   | 6 ++++++
 2 files changed, 12 insertions(+)

commit 45663443eb2b377e6171529380fee312f1adcdf4
Author: Jia Tan
Date:   2024-02-13 22:37:07 +0800

    liblzma: Fix build error if only RISC-V BCJ filter is enabled.

    If any other BCJ filter was enabled for encoding or decoding, then
    this was not a problem.

 src/liblzma/common/string_conversion.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

commit 2f15597d677bc35743c777d4cf3bfa698b478681
Author: Jia Tan
Date:   2024-02-13 22:56:24 +0800

    Translations: Update the Korean translation.

 po/ko.po | 526 ++++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 284 insertions(+), 242 deletions(-)

commit df873143ad1615c6d6aaa1bf8808b1676091dfe3
Author: Jia Tan
Date:   2024-02-13 01:55:53 +0800

    Translations: Update the Korean man page translations.

 po4a/ko.po | 1375 ++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 770 insertions(+), 605 deletions(-)

commit b3f415eddb150341865a1af47959c3baba076b33
Author: Jia Tan
Date:   2024-02-13 01:53:33 +0800

    Translations: Update the Chinese (simplified) translation.

 po/zh_CN.po | 424 ++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 268 insertions(+), 156 deletions(-)

commit 9860d418d296eb3c721e5384fb367c0499b579c8
Author: Lasse Collin
Date:   2024-02-09 23:21:01 +0200

    xzless: Use ||- in LESSOPEN with "less" 451 and newer.

 src/scripts/xzless.in | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

commit fd0692b0525e6c26b496492be9e2c865cab734f8
Author: Lasse Collin
Date:   2024-02-09 23:00:05 +0200

    xzless: Use --show-preproc-errors with "less" 632 and newer.

    This makes "less" show a warning if a decompression error occurred.

 src/scripts/xzless.in | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

commit adb073da76a920b5a81e6b32254f4ddb054dc57a
Author: Jia Tan
Date:   2024-02-09 23:59:54 +0800

    liblzma: Fix typo discovered by codespell.
src/liblzma/check/crc32_arm64.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 55d9fc883d221cbace951a370f1fb144698f8c2e Author: Jia Tan Date: 2024-02-09 20:01:06 +0800 Translations: Update the Swedish translation. po/sv.po | 420 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 254 insertions(+), 166 deletions(-) commit 55ba4a1ea321499c805eedfa811ffde690bae311 Author: Jia Tan Date: 2024-02-08 20:09:04 +0800 Translations: Update the Spanish translation. po/es.po | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) commit 7f2293cd804a89d3c3b2d3ed573560ca9e1520ae Author: Jia Tan Date: 2024-02-07 21:34:35 +0800 Translations: Update the Spanish translation. po/es.po | 419 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 253 insertions(+), 166 deletions(-) commit f4af2036bc625739d6d33d9e1fede583a25c3828 Author: Jia Tan Date: 2024-02-07 21:28:32 +0800 Translations: Update the Polish translation. po/pl.po | 411 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 249 insertions(+), 162 deletions(-) commit e5e93bb816043c559cddf03a3b7ba13bec353ee4 Author: Jia Tan Date: 2024-02-07 19:40:12 +0800 Translations: Update the German translation. po/de.po | 396 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 242 insertions(+), 154 deletions(-) commit 28f18ff8e26902762fb007c13be235b4ac1ac071 Author: Jia Tan Date: 2024-02-07 19:27:25 +0800 Translations: Update the German man page translations. po4a/de.po | 1353 +++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 752 insertions(+), 601 deletions(-) commit cabfbc7947da05aa5dfe39bec9759e076f940e3c Author: Jia Tan Date: 2024-02-06 23:44:06 +0800 Translations: Update the Romanian translation. 
po/ro.po | 416 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 252 insertions(+), 164 deletions(-) commit bf20c94f5d748cea2147779f4fa7e2fd2eb8555e Author: Jia Tan Date: 2024-02-06 23:45:02 +0800 Translations: Update the Romanian man page translations. po4a/ro.po | 1759 +++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 966 insertions(+), 793 deletions(-) commit 7c25ec9feb0241e4affb7432681cc4f5696f3a96 Author: Jia Tan Date: 2024-02-07 20:56:57 +0800 Translations: Update the Ukrainian translation. po/uk.po | 397 ++++++++++++++++++++++++++++++++++++++------------------------- 1 file changed, 242 insertions(+), 155 deletions(-) commit b3523250e9eef10b017473754c1e1c9e31f10374 Author: Jia Tan Date: 2024-02-06 23:30:03 +0800 Translations: Update the Ukrainian man page translations. po4a/uk.po | 1363 ++++++++++++++++++++++++++++++++++-------------------------- 1 file changed, 764 insertions(+), 599 deletions(-) commit a5c177f514f4c90e0d2f6045636fca6c2e80a20d Author: Jia Tan Date: 2024-02-02 01:39:28 +0800 Update AUTHORS. AUTHORS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 7f68a68c19d0ae57bd0e802be0ea8f974e41299f Author: Jia Tan Date: 2024-02-02 01:38:51 +0800 liblzma: Update Authors list in crc32_arm64.h. src/liblzma/check/crc32_arm64.h | 1 + 1 file changed, 1 insertion(+) commit 97f9ba50b84e67b3dcb5b17dd5d3e1d14f9ad1d0 Author: Jia Tan Date: 2024-02-01 16:07:03 +0800 liblzma: Check HAVE_USABLE_CLMUL before omitting CRC32 table. This was split from the prior commit so it could be easily applied to the 5.4 branch. Closes: https://github.com/tukaani-project/xz/pull/77 src/liblzma/check/crc32_table.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit ca9015f4dead2c97b48f5a6933631b0a448b65b9 Author: Jia Tan Date: 2024-02-01 16:06:29 +0800 liblzma: Check HAVE_USABLE_CLMUL before omitting CRC64 table. 
    If liblzma is configured with --disable-clmul-crc
    CFLAGS="-msse4.1 -mpclmul", then it will fail to compile because the
    generic version must be used but the CRC tables were not included.

 src/liblzma/check/crc64_table.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

commit 2f1552a91c825e87013925e1a67a0930e7aef592
Author: Jia Tan
Date:   2024-01-23 18:02:13 +0800

    liblzma: Only use ifunc in crcXX_fast.c if it's needed.

    The code was using HAVE_FUNC_ATTRIBUTE_IFUNC instead of CRC_USE_IFUNC.
    ifunc is incompatible with ARM64 because the runtime detection there
    requires non-inline function calls.

 src/liblzma/check/crc32_fast.c | 6 +++---
 src/liblzma/check/crc64_fast.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

commit 30a25f3742287697bc57a1bef86c19ecf5129322
Author: Jia Tan
Date:   2024-01-22 22:08:45 +0800

    Docs: Add --disable-arm64-crc32 description to INSTALL.

 INSTALL | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

commit 1940f0ec28f08c0ac72c1413d9706fb82eabe6ad
Author: Jia Tan
Date:   2024-01-22 21:36:09 +0800

    liblzma: Omit CRC tables when not needed with ARM64 optimizations.

    This is similar to the existing x86-64 CLMUL conditions to omit the
    tables. They were slightly refactored to improve readability.

 src/liblzma/check/crc32_table.c | 18 +++++++++++++++---
 src/liblzma/check/crc64_table.c |  7 ++++++-
 src/liblzma/check/crc_common.h  |  5 ++++-
 3 files changed, 25 insertions(+), 5 deletions(-)

commit 761f5b69a4c778c8bcb09279b845b07c28790575
Author: Jia Tan
Date:   2024-01-22 20:54:56 +0800

    liblzma: Rename crc32_aarch64.h to crc32_arm64.h.

    Even though the proper name for the architecture is aarch64, this
    project uses ARM64 throughout. So the rename is for consistency.

    Additionally, crc32_arm64.h was slightly refactored with the
    following changes:

       * Added MSVC, FreeBSD, and macOS support in
         is_arch_extension_supported().

       * crc32_arch_optimized() now checks the size when aligning the
         buffer.
* crc32_arch_optimized() loop conditions were slightly modified to avoid both decrementing the size and incrementing the buffer pointer. * Use the intrinsic wrappers defined in because GCC and Clang name them differently. * Minor spacing and comment changes. CMakeLists.txt | 2 +- src/liblzma/check/Makefile.inc | 2 +- src/liblzma/check/crc32_aarch64.h | 109 ---------------------------------- src/liblzma/check/crc32_arm64.h | 119 ++++++++++++++++++++++++++++++++++++++ src/liblzma/check/crc32_fast.c | 3 +- src/liblzma/check/crc64_fast.c | 3 - 6 files changed, 122 insertions(+), 116 deletions(-) commit 455a08609caa3223066a717fb01bfa42c5dba47d Author: Jia Tan Date: 2024-01-22 20:49:30 +0800 liblzma: Refactor crc_common.h. The CRC_GENERIC is now split into CRC32_GENERIC and CRC64_GENERIC, since the ARM64 optimizations will be different between CRC32 and CRC64. For the same reason, CRC_ARCH_OPTIMIZED is split into CRC32_ARCH_OPTIMIZED and CRC64_ARCH_OPTIMIZED. ifunc will only be used with x86-64 CLMUL because the runtime detection methods needed with ARM64 are not compatible with ifunc. src/liblzma/check/crc32_fast.c | 8 +-- src/liblzma/check/crc64_fast.c | 8 +-- src/liblzma/check/crc_common.h | 108 ++++++++++++++++++++++++++++------------- 3 files changed, 82 insertions(+), 42 deletions(-) commit 61908e816049af7a9f43ea804a57ee8570e2e644 Author: Jia Tan Date: 2024-01-22 00:42:28 +0800 CMake: Add support for ARM64 CRC32 instruction detection. CMakeLists.txt | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) commit c5f6d79cc9515a7f22d7ea4860c6cc394b295732 Author: Jia Tan Date: 2024-01-22 00:36:47 +0800 Build: Add support for ARM64 CRC32 instruction detection. This adds --enable-arm64-crc32/--disable-arm64-crc32 (enabled by default) for using the ARM64 CRC32 instruction. This can be disabled if one knows the binary will never need to run on an ARM64 machine with this instruction extension. 
configure.ac | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) commit 849d0f282a6a890c5cf5a0e0f02980b12d9ebb0f Author: Chenxi Mao Date: 2024-01-09 17:23:11 +0800 Speed up CRC32 calculation on ARM64 The CRC32 instructions in ARM64 can calculate the CRC32 result for 8 bytes in a single operation, making the use of ARM64 instructions much faster compared to the general CRC32 algorithm. Optimized CRC32 will be enabled if ARM64 has CRC extension running on Linux. Signed-off-by: Chenxi Mao CMakeLists.txt | 1 + src/liblzma/check/Makefile.inc | 3 +- src/liblzma/check/crc32_aarch64.h | 109 ++++++++++++++++++++++++++++++++++++++ src/liblzma/check/crc32_fast.c | 5 +- src/liblzma/check/crc64_fast.c | 5 +- src/liblzma/check/crc_common.h | 16 +++--- 6 files changed, 130 insertions(+), 9 deletions(-) commit b43c3e48bf6097095eef36d44cdbec811074940a Author: Jia Tan Date: 2024-01-26 19:05:51 +0800 Bump version number for 5.5.1alpha. src/liblzma/api/lzma/version.h | 2 +- src/liblzma/liblzma_generic.map | 2 +- src/liblzma/liblzma_linux.map | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) commit c7a7ae1500ea90bd3c2d54533e4f433933eb598f Author: Jia Tan Date: 2024-01-26 19:00:52 +0800 Add NEWS for 5.5.1alpha NEWS | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) commit 0ef8192e8d5af4e6200d5d4aee22d1f177f7a2df Author: Jia Tan Date: 2024-01-26 18:54:24 +0800 Add NEWS for 5.4.6. NEWS | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) commit 93de7e751d17731315a899264f2a7239d7d2d316 Author: Lasse Collin Date: 2024-01-24 20:00:57 +0200 Move doc/logo/xz-logo.png to "doc" and Doxygen footer to "doxygen". The footer isn't a complete HTML file so having it in the doxygen directory is a tiny bit clearer. 
 Makefile.am                                    |   2 +-
 doc/{logo => }/xz-logo.png                     | Bin
 doxygen/Doxyfile                               |   4 ++--
 doc/logo/copyright.html => doxygen/footer.html |   0
 4 files changed, 3 insertions(+), 3 deletions(-)

commit 00fa01698df51c58ae2acf8c7fa4e1fb159f75a9
Author: Jia Tan
Date:   2024-01-09 17:05:01 +0800

    README: Add COPYING.CC-BY-SA-4.0 entry to section 1.1.

    The Overall documentation section (1.1) table spacing had to be
    adjusted since the filename was very long.

 README | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

commit e280470040b27c5e58d78b25b9e2bb71fc6c3882
Author: Jia Tan
Date:   2024-01-09 16:56:16 +0800

    Build: Add the logo and license to the release.

 Makefile.am | 2 ++
 1 file changed, 2 insertions(+)

commit b1ee6cf259bb49ce91abe9f622294524e37edf4c
Author: Jia Tan
Date:   2024-01-09 16:44:42 +0800

    COPYING: Add the license for the XZ logo.

 COPYING              |   5 +
 COPYING.CC-BY-SA-4.0 | 427 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 432 insertions(+)

commit 31293ae7074802cc7286089a89c7b552d930c97f
Author: Jia Tan
Date:   2024-01-09 16:40:56 +0800

    Doxygen: Added the XZ logo and copyright information.

    The PROJECT_LOGO field is now used to include the XZ logo. The footer
    of each page now lists the copyright information instead of the default
    footer. The license is also copied to satisfy the copyright and so the
    link in the documentation can be local.

 doc/logo/copyright.html |  11 +++++++++++
 doc/logo/xz-logo.png    | Bin 0 -> 6771 bytes
 doxygen/Doxyfile        |   6 +++---
 3 files changed, 14 insertions(+), 3 deletions(-)

commit 6daa4d0ea46a8441f21f609149f3633158bf4704
Author: Lasse Collin
Date:   2024-01-23 18:29:28 +0200

    xz: Use threaded mode by default (as if --threads=0 was used).

    This hopefully does more good than bad:

      + It's faster by default.

      + Only the threaded compressor creates files that
        can be decompressed in threaded mode.

      - Compression ratio is worse, usually not too much though.
        When it matters, -T1 must be used.
- Memory usage increases. - Scripts that assume single-threaded mode but don't use -T1 will possibly use too much resources, for example, if they run multiple xz processes in parallel to compress multiple files. - Output from single-threaded and multi-threaded compressors differ but such changes could happen for other reasons too (they just haven't happened since 5.0.0). src/xz/hardware.c | 6 +++++- src/xz/message.c | 4 ++-- src/xz/xz.1 | 9 +++++++++ 3 files changed, 16 insertions(+), 3 deletions(-) commit a2dd2dc8e5307a7280bb99868bc478560facba2c Author: Jia Tan Date: 2024-01-23 23:52:49 +0800 CI: Use RISC-V filter when building with BCJ support. build-aux/ci_build.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 3060e1070b2421b26c0e17794c1307ec5622f11d Author: Jia Tan Date: 2024-01-23 23:52:14 +0800 Tests: Use smaller dictionary size in RISC-V test files. tests/files/good-1-riscv-lzma2-1.xz | Bin 7512 -> 7512 bytes tests/files/good-1-riscv-lzma2-2.xz | Bin 7516 -> 7512 bytes 2 files changed, 0 insertions(+), 0 deletions(-) commit 44ff2fa5c94dc345c4dd69195a19fc5238df60b3 Author: Jia Tan Date: 2024-01-23 23:50:57 +0800 Tests: Skip RISC-V test files if decoder was not built. tests/test_files.sh | 5 +++++ 1 file changed, 5 insertions(+) commit 6133a3f30049d3beaf7d22535b1e5d38e109be4e Author: Lasse Collin Date: 2024-01-23 16:11:54 +0200 xz: Man page: Add more examples of LZMA2 options with BCJ filters. src/xz/xz.1 | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-) commit 50255feeaabcc7e7db22b858a6bd64a9b5b4f16d Author: Lasse Collin Date: 2024-01-23 00:09:48 +0200 liblzma: RISC-V filter: Use byte-by-byte access. Not all RISC-V processors support fast unaligned access so it's better to read only one byte in the main loop. This can be faster even on x86-64 when compared to reading 32 bits at a time as half the time the address is only 16-bit aligned. 
The downside is larger code size on archs that do support fast unaligned access. src/liblzma/simple/riscv.c | 114 +++++++++++++++++++++++++++++++++------------ 1 file changed, 84 insertions(+), 30 deletions(-) commit db5eb5f563e8baa8d912ecf576f53391ff861596 Author: Jia Tan Date: 2024-01-22 23:33:39 +0800 xz: Update xz -lvv for RISC-V filter. Version 5.6.0 will be shown, even though upcoming alphas and betas will be able to support this filter. 5.6.0 looks nicer in the output and people shouldn't be encouraged to use an unstable version in production in any way. src/xz/list.c | 10 ++++++++++ 1 file changed, 10 insertions(+) commit e2870db5be1503e6a489fc3d47daf950d6f62723 Author: Jia Tan Date: 2024-01-22 23:33:39 +0800 Tests: Add two RISC-V Filter test files. These test files achieve 100% code coverage in src/liblzma/simple/riscv.c. They contain all of the instructions that should be filtered and a few cases that should not. tests/files/README | 8 ++++++++ tests/files/good-1-riscv-lzma2-1.xz | Bin 0 -> 7512 bytes tests/files/good-1-riscv-lzma2-2.xz | Bin 0 -> 7516 bytes 3 files changed, 8 insertions(+) commit b26a89869315ece2f6d9d10d32d45f672550f245 Author: Jia Tan Date: 2024-01-22 23:33:39 +0800 xz: Update message in --long-help for RISC-V Filter. src/xz/message.c | 1 + 1 file changed, 1 insertion(+) commit 283f778908873eca61388029fc418fa800c9d7d7 Author: Jia Tan Date: 2024-01-22 23:33:39 +0800 xz: Update the man page for the RISC-V Filter. A special note was added to suggest using four-byte alignment when the compressed instruction extension is not present in a RISC-V binary. src/xz/xz.1 | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit ac3691ccca051d67f60b4a3b05b88e511d0b1b28 Author: Jia Tan Date: 2024-01-22 23:33:39 +0800 Tests: Add RISC-V Filter test in test_compress.sh. 
tests/test_compress.sh | 1 + 1 file changed, 1 insertion(+) commit 2959dbc7358efcf421ce51bc9cd7eae8fdd8fec4 Author: Jia Tan Date: 2024-01-22 23:33:39 +0800 liblzma: Update string_conversion.c to support RISC-V Filter. src/liblzma/common/string_conversion.c | 5 +++++ 1 file changed, 5 insertions(+) commit 34372a5adbe5a7f6bf29498410ba3a463a720966 Author: Jia Tan Date: 2024-01-22 23:33:39 +0800 CMake: Support RISC-V BCJ Filter for encoding and decoding. CMakeLists.txt | 1 + 1 file changed, 1 insertion(+) commit 440a2eccb082dc13400c09e22308a58fef85146c Author: Jia Tan Date: 2024-01-22 23:33:39 +0800 liblzma: Add RISC-V BCJ filter. The new Filter ID is 0x0B. Thanks to Chien Wong for the initial version of the Filter, the xz CLI updates, and the Autotools build system modifications. Thanks to Igor Pavlov for his many contributions to the design of the filter. configure.ac | 4 +- src/liblzma/api/lzma/bcj.h | 5 + src/liblzma/common/filter_common.c | 9 + src/liblzma/common/filter_decoder.c | 8 + src/liblzma/common/filter_encoder.c | 10 + src/liblzma/simple/Makefile.inc | 4 + src/liblzma/simple/riscv.c | 688 ++++++++++++++++++++++++++++++++++++ src/liblzma/simple/simple_coder.h | 9 + src/xz/args.c | 7 + 9 files changed, 742 insertions(+), 2 deletions(-) commit 5540f4329bbdb4deb4850d4af48b18ad074bba19 Author: Jia Tan Date: 2024-01-19 23:08:14 +0800 Docs: Update .xz file format specification to 1.2.0. The new RISC-V filter was added to the specification, in addition to updating the specification URL. doc/xz-file-format.txt | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) commit 22d86192f8cf00902a1f90ee2a83ca600794459b Author: Jia Tan Date: 2024-01-19 23:08:14 +0800 xz: Update website URLs in the man pages. src/xz/xz.1 | 6 +++--- src/xzdec/xzdec.1 | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) commit 6b63c4c6139fa1bb21b570521d3d2b4a608bc34d Author: Jia Tan Date: 2024-01-19 23:08:14 +0800 liblzma: Update website URL. 
dos/config.h | 2 +- src/liblzma/api/lzma.h | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) commit fce4758018f3a3589236f3fe7999fd9dd08c77e9 Author: Jia Tan Date: 2024-01-19 23:08:14 +0800 Docs: Update website URLs. .github/SECURITY.md | 2 +- COPYING | 3 ++- README | 4 ++-- doc/faq.txt | 2 +- doc/lzma-file-format.txt | 18 +++++++++--------- windows/README-Windows.txt | 3 ++- 6 files changed, 17 insertions(+), 15 deletions(-) commit c26812c5b2c8a2a47f43214afe6b0b840c73e4f5 Author: Jia Tan Date: 2024-01-19 23:08:14 +0800 Build: Update website URL. CMakeLists.txt | 2 +- configure.ac | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit fbb3ce541ef79cad1710e88a27a5babb5f6f8e5b Author: Lasse Collin Date: 2024-01-11 15:01:50 +0200 liblzma: CRC: Add a comment to crc_x86_clmul.h about BUILDING_ macros. src/liblzma/check/crc_x86_clmul.h | 6 ++++++ 1 file changed, 6 insertions(+) commit 4f518c1b6b7b7ce5dcefea81acd44d7a086a8882 Author: Lasse Collin Date: 2024-01-11 15:22:36 +0200 liblzma: CRC: Remove crc_always_inline, use lzma_always_inline instead. Now crc_simd_body() in crc_x86_clmul.h is only called once in a translation unit, we no longer need to be so cautious about ensuring the always-inline behavior. src/liblzma/check/crc_common.h | 20 -------------------- src/liblzma/check/crc_x86_clmul.h | 2 +- 2 files changed, 1 insertion(+), 21 deletions(-) commit 35c03ec6bf66f1b159964c9721a2dce0e2859b20 Author: Lasse Collin Date: 2024-01-11 14:39:46 +0200 liblzma: CRC: Update CLMUL comments to more generic wording. src/liblzma/check/crc32_fast.c | 16 ++++++++-------- src/liblzma/check/crc64_fast.c | 10 +++++----- 2 files changed, 13 insertions(+), 13 deletions(-) commit 66f080e8016129576536482ac377e2ecac7a2b90 Author: Lasse Collin Date: 2024-01-10 18:23:31 +0200 liblzma: Rename arch-specific CRC functions and macros. CRC_CLMUL was split to CRC_ARCH_OPTIMIZED and CRC_X86_CLMUL. CRC_ARCH_OPTIMIZED is defined when an arch-optimized version is used. 
    Currently the x86 CLMUL implementations are the only arch-optimized
    versions, and these also use the CRC_X86_CLMUL macro to tell when
    crc_x86_clmul.h needs to be included.

    is_clmul_supported() was renamed to is_arch_extension_supported().
    crc32_clmul() and crc64_clmul() were renamed to
    crc32_arch_optimized() and crc64_arch_optimized().
    This way the names make sense with arch-specific non-CLMUL
    implementations as well.

 src/liblzma/check/crc32_fast.c    | 13 +++++++------
 src/liblzma/check/crc64_fast.c    | 13 +++++++------
 src/liblzma/check/crc_common.h    |  9 ++++++---
 src/liblzma/check/crc_x86_clmul.h | 21 +++++++++++----------
 4 files changed, 31 insertions(+), 25 deletions(-)

commit 3dbed75b0b9c7087c76fe687acb5cf582cd57b99
Author: Lasse Collin
Date:   2024-01-10 18:19:21 +0200

    liblzma: Fix a comment in crc_common.h.

 src/liblzma/check/crc_common.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

commit 419f55f9dfc2df8792902b8953d50690121afeea
Author: Lasse Collin
Date:   2023-10-20 23:35:10 +0300

    liblzma: Avoid extern lzma_crc32_clmul() and lzma_crc64_clmul().

    A CLMUL-only build will have the crcxx_clmul() inlined into
    lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul()
    was needed. Notes about shared liblzma on ELF platforms:

      - On platforms that support ifunc and -fvisibility=hidden, this
        was silly because a CLMUL-only build would have that single
        extra jump instruction as overhead.

      - On platforms that support neither -fvisibility=hidden nor linker
        version script (liblzma*.map), jumping to lzma_crcxx_clmul()
        would go via PLT so a few more instructions of overhead (still
        not a big issue but silly nevertheless).

    There was a downside with static liblzma too: if an application only
    needs lzma_crc64(), static linking would make the linker include the
    CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though
    the CRC32 code wouldn't be needed, thus increasing code size of the
    executable (assuming that -ffunction-sections isn't used).
    Also, now compilers are likely to inline crc_simd_body()
    even if they don't support the always_inline attribute
    (or MSVC's __forceinline). Quite possibly all compilers
    that build the code do support such an attribute. But now
    it likely isn't a problem even if the attribute wasn't supported.

    Now all x86-specific stuff is in crc_x86_clmul.h. Other archs
    can then have their own headers with their own
    is_clmul_supported() and crcxx_clmul().

    Another bonus is that the build system doesn't need to care if
    crc_clmul.c is needed.

    is_clmul_supported() stays as an inline function as it's not needed
    when doing a CLMUL-only build (avoids a warning about an unused
    function).

 CMakeLists.txt                                     |  7 +-
 configure.ac                                       |  1 -
 src/liblzma/check/Makefile.inc                     |  6 +-
 src/liblzma/check/crc32_fast.c                     |  9 ++-
 src/liblzma/check/crc64_fast.c                     |  9 ++-
 src/liblzma/check/crc_common.h                     | 64 ----------------
 src/liblzma/check/{crc_clmul.c => crc_x86_clmul.h} | 86 ++++++++++++++++++----
 7 files changed, 91 insertions(+), 91 deletions(-)

commit e3833e297dfb5021a197bda34ba2a795e30aaf8a
Author: Lasse Collin
Date:   2023-10-21 00:06:52 +0300

    liblzma: crc_clmul.c: Add crc_attr_target macro.

    This reduces the number of the complex #if directives.

 src/liblzma/check/crc_clmul.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

commit d164ac0e62904126f7920c25f9a2875c8cd28b97
Author: Lasse Collin
Date:   2023-10-20 22:49:48 +0300

    liblzma: Simplify existing cases with lzma_attr_no_sanitize_address.

 src/liblzma/check/crc_clmul.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

commit 9523c1300d22fa715765c181cf991d14d6112fb1
Author: Lasse Collin
Date:   2023-10-20 21:53:35 +0300

    liblzma: #define crc_attr_no_sanitize_address in crc_common.h.

 src/liblzma/check/crc_common.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

commit 93d144f0930821590524247bd174afd38003d7f0
Author: Lasse Collin
Date:   2023-10-20 23:25:14 +0300

    liblzma: CRC: Add empty lines.
And remove one too. src/liblzma/check/crc32_fast.c | 2 ++ src/liblzma/check/crc64_fast.c | 3 +++ src/liblzma/check/crc_clmul.c | 1 - 3 files changed, 5 insertions(+), 1 deletion(-) commit 0c7e854ffd27f1cec2e9b0e61601d6f90bfa10ae Author: Lasse Collin Date: 2023-10-20 23:19:33 +0300 liblzma: crc_clmul.c: Tidy up the location of MSVC pragma. It makes no difference in practice. src/liblzma/check/crc_clmul.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 15cf3f04f270d707a5c91cc0208b23b6db42b774 Author: Lasse Collin Date: 2023-12-20 21:16:24 +0200 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit cd64dd70d5665b6048829c45772d08606f44672e Author: Lasse Collin Date: 2023-12-20 21:15:16 +0200 liblzma: Use 8-byte method in memcmplen.h on ARM64. It requires fast unaligned access to 64-bit integers and a fast instruction to count trailing zeros in a 64-bit integer (__builtin_ctzll()). This perhaps should be enabled on some other archs too. Thanks to Chenxi Mao for the original patch: https://github.com/tukaani-project/xz/pull/75 (the first commit) According to the numbers there, this may improve encoding speed by about 3-5 %. This enables the 8-byte method on MSVC ARM64 too which should work but wasn't tested. src/liblzma/common/memcmplen.h | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) commit 12c90c00f05e19da3c0c91d8cd8e0d0d45965606 Author: Lasse Collin Date: 2023-12-20 21:01:06 +0200 liblzma: Check also for __clang__ in memcmplen.h. This change hopefully makes no practical difference as Clang likely was detected via __GNUC__ or _MSC_VER already. src/liblzma/common/memcmplen.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 133c5851eb917c6d99d0b623c1689c8518e65f38 Author: Jia Tan Date: 2023-12-21 21:39:08 +0800 Translations: Update the French translation.
po/fr.po | 632 +++++++++++++++++++++++++++++++++++++-------------------------- 1 file changed, 370 insertions(+), 262 deletions(-) commit 710cbc186cad0ac601c38bd6bf31167648a5581e Author: Jia Tan Date: 2023-12-21 16:39:53 +0800 xz: Add a comment to Capsicum sandbox setup. This comment is repeated in xzdec.c to help remind us why all the capabilities are removed from stdin in certain situations. src/xz/file_io.c | 1 + 1 file changed, 1 insertion(+) commit 4e1c695676bafbaecc9fb307f6ee94138ae72c12 Author: Jia Tan Date: 2023-12-20 22:19:19 +0800 Docs: Update --enable-sandbox option in INSTALL. xzdec now also uses the sandbox when it's configured. INSTALL | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) commit ebddf20214143a8e002ab897e95e880bb4c5ac44 Author: Jia Tan Date: 2023-12-20 22:39:13 +0800 CMake: Move sandbox detection outside of xz section. The sandbox is now enabled for xzdec as well, so it no longer belongs in just the xz section. xz and xzdec are always built, except for older MSVC versions, so there isn't a need to conditionally show the sandbox configuration. CMake will do a little unnecessary work on older MSVC versions that can't build xz or xzdec, but this is a very small downside. CMakeLists.txt | 178 +++++++++++++++++++++++++++++++-------------------------- 1 file changed, 98 insertions(+), 80 deletions(-) commit 5feb09266fd2928ec0a4dcb98c1dc7f053111316 Author: Jia Tan Date: 2023-12-20 22:43:44 +0800 Build: Allow sandbox to be configured for just xzdec. If xz is disabled, then xzdec can still use the sandbox. configure.ac | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) commit d74fb5f060b76db709b50f5fd37490394e52f975 Author: Jia Tan Date: 2023-12-19 21:18:28 +0800 xzdec: Add sandbox support for Pledge, Capsicum, and Landlock. A very strict sandbox is used when the last file is decompressed. The likely most common use case of xzdec is to decompress a single file.
The Pledge sandbox is applied to the entire process with slightly more relaxed promises, until the last file is processed. Thanks to Christian Weisgerber for the initial patch adding Pledge sandboxing. src/xzdec/xzdec.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 139 insertions(+), 7 deletions(-) commit b34b6a9912d6165e34ba0db151b7f9941d2e06d5 Author: Jia Tan Date: 2023-12-20 21:31:34 +0800 liblzma: Initialize lzma_lz_encoder pointers with NULL. This fixes the recent change to lzma_lz_encoder that used memzero instead of the NULL constant. On some compilers the NULL constant (always 0) may not equal the NULL pointer (this only needs to guarantee that it does not point to a valid memory address). Later code compares the pointers to the NULL pointer so we must initialize them with the NULL pointer instead of 0 to guarantee code correctness. src/liblzma/lz/lz_encoder.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) commit 183a62f0b540ff4d23cc19b2b6bc2525f0bd64df Author: Jia Tan Date: 2023-12-16 20:51:38 +0800 liblzma: Set all values in lzma_lz_encoder to NULL after allocation. The first member of lzma_lz_encoder doesn't necessarily need to be set to NULL since it will always be set before anything tries to use it. However the function pointer members must be set to NULL since other functions rely on this NULL value to determine if this behavior is supported or not. This fixes a somewhat serious bug, where the options_update() and set_out_limit() function pointers are not set to NULL. This seems to have been forgotten since these function pointers were added many years after the original two (code() and end()). The problem is that by not setting this to NULL we are relying on the memory allocation to zero things out if lzma_filters_update() is called on a LZMA1 encoder. The function pointer for set_out_limit() is less serious because there is not an API function that could call this in an incorrect way.
set_out_limit() is only called by the MicroLZMA encoder, which must use LZMA1 where set_out_limit() is always set. It's currently not possible to call set_out_limit() on an LZMA2 encoder. So calling lzma_filters_update() on an LZMA1 encoder had undefined behavior since it's possible that memory could be manipulated so the options_update member pointed to a different instruction sequence. This is unlikely to be a bug in an existing application since it relies on calling lzma_filters_update() on an LZMA1 encoder in the first place. For instance, it does not affect xz because lzma_filters_update() can only be used when encoding to the .xz format. This is fixed by using memzero() to set all members of lzma_lz_encoder to NULL after it is allocated. This ensures this mistake will not occur here in the future if any additional function pointers are added. src/liblzma/lz/lz_encoder.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) commit 1a1bb381db7a20cf86cb45a350e5cca35224d017 Author: Jia Tan Date: 2023-12-16 20:30:55 +0800 liblzma: Tweak a comment. src/liblzma/lz/lz_encoder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 55810780e04f759747b02683fb8020b8cd022a85 Author: Jia Tan Date: 2023-12-16 20:28:21 +0800 liblzma: Make parameter names in function definition match declaration. lzma_raw_encoder() and lzma_raw_encoder_init() used "options" as the parameter name instead of "filters" (used by the declaration). "filters" is more clear since the parameter represents the list of filters passed to the raw encoder, each of which contains filter options. src/liblzma/common/filter_encoder.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit 5dad6f628af742bab826819760deb677597445f7 Author: Jia Tan Date: 2023-12-16 20:18:47 +0800 liblzma: Improve lzma encoder init function consistency. lzma_encoder_init() did not check for NULL options, but lzma2_encoder_init() did.
This is more of a code style improvement than anything else to help make lzma_encoder_init() and lzma2_encoder_init() more similar. src/liblzma/lzma/lzma_encoder.c | 3 +++ 1 file changed, 3 insertions(+) commit e1b1a9d6370b788bd6078952c6c201e12bc27cbf Author: Jia Tan Date: 2023-12-16 11:20:20 +0800 Docs: Update repository URL in Changelog. ChangeLog | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit f9b82bc64a9405e486575c65c1729229eb0a8198 Author: Jia Tan Date: 2023-12-15 16:56:31 +0800 CI: Update Upload Artifact Action. .github/workflows/ci.yml | 2 +- .github/workflows/windows-ci.yml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit d0b24efe6cdc47db5b0fdf6306f70a2e0e63e49e Author: Jia Tan Date: 2023-12-07 21:48:07 +0800 Tests: Silence -Wsign-conversion warning on GCC version < 10. Since GCC version 10, GCC no longer complains about simple implicit integer conversions with arithmetic operators. For instance: uint8_t a = 5; uint32_t b = a + 5; Gives a warning on GCC 9 and earlier but this: uint8_t a = 5; uint32_t b = (a + 5) * 2; Gives a warning with GCC 10+. tests/test_block_header.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 4a972a8ee3ed88ac14067c1d2f15b78988e5dae8 Author: Jia Tan Date: 2023-12-06 18:39:03 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit ee2f48350099201694a7586e41d7aa2f09fc74da Author: Jia Tan Date: 2023-12-06 18:30:25 +0800 Tests: Minor cleanups to OSS-Fuzz files. Most of these fixes are small typos and tweaks. A few were caused by bad advice from me. Here is the summary of what is changed: - Author line edits - Small comment changes/additions - Using the return value in the error messages in the fuzz targets' coder initialization code - Removed fuzz_encode_stream.options. This set a max length, which may prevent some worthwhile code paths from being properly exercised. - Removed the max_len option from fuzz_decode_stream.options for the same reason as fuzz_encode_stream.
The alone decoder fuzz target still has this restriction. - Altered the dictionary contents for fuzz_lzma.dict. Instead of keeping the properties static and varying the dictionary size, the properties are varied and the dictionary size is kept small. The dictionary size doesn't have much impact on the code paths but the properties do. Closes: https://github.com/tukaani-project/xz/pull/73 tests/ossfuzz/Makefile | 3 ++ tests/ossfuzz/config/fuzz_decode_stream.options | 1 - tests/ossfuzz/config/fuzz_lzma.dict | 34 +++++++++++----------- tests/ossfuzz/fuzz_common.h | 16 +++++------ tests/ossfuzz/fuzz_decode_alone.c | 15 +++++----- tests/ossfuzz/fuzz_decode_stream.c | 15 +++++----- tests/ossfuzz/fuzz_encode_stream.c | 38 +++++++++++++++---------- 7 files changed, 66 insertions(+), 56 deletions(-) commit 483bb90eec7c83e1c2bcd06287714afd62d8c17d Author: Maksym Vatsyk Date: 2023-12-05 16:31:09 +0100 Tests: Add fuzz_encode_stream ossfuzz target. This fuzz target handles .xz stream encoding. The first byte of input is used to dynamically set the preset level in order to increase the fuzz coverage of complex critical code paths. tests/ossfuzz/config/fuzz_encode_stream.options | 2 + tests/ossfuzz/fuzz_encode_stream.c | 79 +++++++++++++++++++++++++ 2 files changed, 81 insertions(+) commit 7ca8c9869df82756c3128c4fcf1058da4d18aa48 Author: Maksym Vatsyk Date: 2023-12-04 17:23:24 +0100 Tests: Add fuzz_decode_alone OSS-Fuzz target This fuzz target handles LZMA alone decoding. A new fuzz dictionary .dict was also created with common LZMA header values to help speed up the discovery of valid headers. tests/ossfuzz/config/fuzz_decode_alone.options | 3 ++ tests/ossfuzz/config/fuzz_lzma.dict | 22 ++++++++++++++ tests/ossfuzz/fuzz_decode_alone.c | 41 ++++++++++++++++++++++++++ 3 files changed, 66 insertions(+) commit 37581a77ad5a49615325b1d1925fdc402b1e1d5a Author: Maksym Vatsyk Date: 2023-12-04 17:21:29 +0100 Tests: Update OSS-Fuzz Makefile.
All .c files can be built as separate fuzz targets. This simplifies the Makefile by allowing us to use wildcards instead of having a Makefile target for each fuzz target. tests/ossfuzz/Makefile | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) commit 28ce6a1c2a74866c51f7996a6869679c236d3c94 Author: Maksym Vatsyk Date: 2023-12-04 17:20:08 +0100 Tests: Move common OSS-Fuzz target code to .h file. tests/ossfuzz/fuzz_common.h | 56 ++++++++++++++++++++++++++++++++++++ tests/ossfuzz/fuzz_decode_stream.c | 59 ++++++++++---------------------------- 2 files changed, 71 insertions(+), 44 deletions(-) commit bf0521ea1591c25b9d510c1b8be86073e9d847c6 Author: Maksym Vatsyk Date: 2023-12-04 17:18:20 +0100 Tests: Rename OSS-Fuzz files. tests/ossfuzz/config/fuzz.options | 2 -- tests/ossfuzz/config/fuzz_decode_stream.options | 3 +++ tests/ossfuzz/config/{fuzz.dict => fuzz_xz.dict} | 0 tests/ossfuzz/{fuzz.c => fuzz_decode_stream.c} | 0 4 files changed, 3 insertions(+), 2 deletions(-) commit 685094b8e1c1aa1bf934de0366ca42ef599d25f7 Author: Jia Tan Date: 2023-11-30 23:10:43 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 3b3023e00b0071e10f589bbc3674e0ec432b8add Author: Kian-Meng Ang Date: 2023-11-30 23:01:19 +0800 Tests: Fix typos tests/test_index.c | 2 +- tests/test_lzip_decoder.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) commit 424d46ead8cbc0da57f406b76926ec4ed47437f5 Author: Kian-Meng Ang Date: 2023-11-30 22:59:47 +0800 xz: Fix typo src/xz/file_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 35558adf9c45e5597f2c8dbd969885dd484038d2 Author: Jia Tan Date: 2023-11-30 20:41:00 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit fd170e8557727bed6bec0518c16415064d972e4e Author: Jia Tan Date: 2023-11-22 21:20:12 +0800 CI: Test musl libc builds on Ubuntu runner. 
.github/workflows/ci.yml | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) commit db2b4aa068a492c0013279a4ed43803e8ff9bb3e Author: Jia Tan Date: 2023-11-22 21:12:15 +0800 CI: Allow ci_build.sh to set a different C compiler. build-aux/ci_build.sh | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) commit ff7badef53c2cd698d4b72b945f34dfd0835e13c Author: Jia Tan Date: 2023-11-24 21:19:12 +0800 CMake: Use consistent indentation with check_c_source_compiles(). CMakeLists.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit d4af167570f2c14b002ee18a39d5b1e7e5a892b1 Author: Jia Tan Date: 2023-11-22 20:33:36 +0800 CMake: Change __attribute__((__ifunc__())) detection. This renames ALLOW_ATTR_IFUNC to USE_ATTR_IFUNC and applies the ifunc detection changes that were made to the Autotools build. Fixes: https://github.com/tukaani-project/xz/issues/70 CMakeLists.txt | 53 +++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 45 insertions(+), 8 deletions(-) commit 20ecee40a0053fd16371ef0628046bf45e548d72 Author: Jia Tan Date: 2023-11-24 20:19:11 +0800 Docs: Update INSTALL for --enable_ifunc change. INSTALL | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) commit ffb456593d695d70052a2f71c7a2e6269217d194 Author: Jia Tan Date: 2023-11-21 20:56:55 +0800 Build: Change --enable-ifunc handling. Some compilers support __attribute__((__ifunc__())) even though the dynamic linker does not. The compiler is able to create the binary but it will fail on startup. So it is not enough to just test if the attribute is supported. The default value for enable_ifunc is now auto, which will attempt to compile a program using __attribute__((__ifunc__())). There are additional checks in this program if glibc is being used or if it is running on FreeBSD. Setting --enable-ifunc will skip this test and always enable __attribute__((__ifunc__())), even if it is not supported.
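For illustration only, a minimal sketch of what a program using __attribute__((__ifunc__())) can look like. This is not the test program from configure.ac. All demo_* names are hypothetical, and the preprocessor guard (GCC-compatible compiler, ELF, glibc, no AddressSanitizer) is an assumption made to keep the sketch buildable elsewhere by falling back to a plain function.

```c
#include <assert.h>
#include <limits.h>

// Hypothetical implementation that the resolver selects.
static int
demo_add_one_generic(int x)
{
	assert(x < INT_MAX);
	return x + 1;
}

#if defined(__GNUC__) && defined(__ELF__) && defined(__GLIBC__) \
		&& !defined(__SANITIZE_ADDRESS__)
// The resolver runs while the dynamic linker processes relocations,
// so it should not depend on calls that go through the PLT.
// Here it only returns the address of a static function.
static int (*demo_resolve_add_one(void))(int)
{
	return demo_add_one_generic;
}

// Calls to demo_add_one() are bound to whatever the resolver returned.
int demo_add_one(int x)
		__attribute__((__ifunc__("demo_resolve_add_one")));
#else
// Assumed fallback for toolchains or C libraries without ifunc support.
int
demo_add_one(int x)
{
	return demo_add_one_generic(x);
}
#endif
```

A caller simply uses demo_add_one(41) like any other function. Whether the ifunc path is compiled in depends on the guard above, which is stricter than a bare attribute check for the reason given in the commit message: the compiler accepting the attribute does not mean the dynamic linker supports it.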
configure.ac | 61 +++++++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 44 insertions(+), 17 deletions(-) commit 12b89bcc9915090eb42ae638e565af44b6832a23 Author: Lasse Collin Date: 2023-11-23 17:39:10 +0200 xz: Tweak a comment. src/xz/util.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 2ab2e4b5a542eab93902985ce4e642719a8b7a4e Author: Jia Tan Date: 2023-11-23 22:13:39 +0800 xz: Use is_tty() in message.c. src/xz/message.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) commit 584e3a258f32d579b1d07f99b4dc6e856c10ac7e Author: Jia Tan Date: 2023-11-23 22:04:35 +0800 xz: Create separate is_tty() function. The new is_tty() will report if a file descriptor is a terminal or not. On POSIX systems, it is a wrapper around isatty(). However, the native Windows implementation of isatty() will return true for all character devices, not just terminals. So is_tty() has a special case for Windows so it can use alternative Windows API functions to determine if a file descriptor is a terminal. This fixes a bug with MSVC and MinGW-w64 builds that refused to read from or write to non-terminal character devices because xz thought it was a terminal. For instance: xz foo -c > /dev/null would fail because /dev/null was assumed to be a terminal. src/xz/util.c | 30 +++++++++++++++++++++++------- src/xz/util.h | 14 ++++++++++++++ 2 files changed, 37 insertions(+), 7 deletions(-) commit 6b05f827f50e686537e9a23c49c5aa4c0aa6b23d Author: Jia Tan Date: 2023-11-22 20:39:41 +0800 tuklib_integer: Fix typo discovered by codespell. Based on internet dictionary searches, 'choise' is an outdated spelling of 'choice'. src/common/tuklib_integer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 659aca0d695807c0762d4101765189e4e33d1e2c Author: Lasse Collin Date: 2023-11-17 19:35:19 +0200 xz: Move the check for --suffix with --format=raw a few lines earlier. Now it reads from argv[] instead of args->arg_names. 
src/xz/args.c | 44 ++++++++++++++++++++++---------------------- 1 file changed, 22 insertions(+), 22 deletions(-) commit ca278eb2b7f5a4940f5ab18955297b398d423824 Author: Jia Tan Date: 2023-11-17 20:35:11 +0800 Tests: Create test_suffix.sh. This tests some complicated interactions with the --suffix= option. The suffix option must be used with --format=raw, but can optionally be used to override the default .xz suffix. This test also verifies some recent bugs have been correctly solved and to hopefully avoid further regressions in the future. tests/Makefile.am | 2 + tests/test_suffix.sh | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 191 insertions(+) commit 2a732aba22da1b0d4a1241cb32280ed010ba03ce Author: Jia Tan Date: 2023-11-17 20:19:26 +0800 xz: Fix a bug with --files and --files0 in raw mode without a suffix. The following command caused a segmentation fault: xz -Fraw --lzma1 --files=foo when foo was a valid file. The usage of --files or --files0 was not being checked when compressing or decompressing in raw mode without a suffix. The suffix checking code was meant to validate that all files to be processed are "-" (if not writing to standard out), meaning the data is only coming from standard in. In this case, there were no file names to check since --files and --files0 store their file name in a different place. Later code assumed the suffix was set and caused a segmentation fault. Now, the above command results in an error. src/xz/args.c | 5 +++++ 1 file changed, 5 insertions(+) commit 299920bab9ae258a247366339264e8aefca9e3ce Author: Jia Tan Date: 2023-11-17 20:04:58 +0800 Tests: Fix typo in a comment. tests/test_files.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit f481523baac946fa3bc13d79186ffaf0c0b818a7 Author: Jia Tan Date: 2023-11-15 23:40:13 +0800 xz: Refactor suffix test with raw format. 
The previous version set opt_stdout, but this caused an issue with copying an input file to standard out when decompressing an unknown file type. The following needs to result in an error: echo foo | xz -df since -c, --stdout is not used. This fixes the previous error by not setting opt_stdout. src/xz/args.c | 38 +++++++++++++------------------------- 1 file changed, 13 insertions(+), 25 deletions(-) commit 837ea40b1c9d4998cac4500b55171bf33e0c31a6 Author: Jia Tan Date: 2023-11-14 20:27:46 +0800 xz: Move suffix check after stdout mode is detected. This fixes a bug introduced in cc5aa9ab138beeecaee5a1e81197591893ee9ca0 when the suffix check was initially moved. This caused a situation that previously worked: echo foo | xz -Fraw --lzma1 | wc -c to fail because the old code knew that this would write to standard out so a suffix was not needed. src/xz/args.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) commit d4f4a4d040ef47a5e82dffd0f067e92716606ddf Author: Jia Tan Date: 2023-11-14 20:27:04 +0800 xz: Detect when all data will be written to standard out earlier. If the -c, --stdout argument is not used, then we can still detect when the data will be written to standard out if all of the provided filenames are "-" (denoting standard in) or if no filenames are provided. src/xz/args.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) commit 2ade7246e7ba729a91460d2fab0f4c7b89d3998b Author: Jia Tan Date: 2023-11-09 01:21:53 +0800 liblzma: Add missing comments to lz_encoder.h. src/liblzma/lz/lz_encoder.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) commit 5fe1450603dc625340b8b7866fb4a83ff748ad06 Author: Jia Tan Date: 2023-11-01 20:18:30 +0800 Add NEWS for 5.4.5. NEWS | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) commit 46007049cd42e606543dbe650feb17bdf4469c29 Author: Lasse Collin Date: 2023-10-31 21:41:09 +0200 liblzma: Fix compilation of fastpos_tablegen.c. 
The macro lzma_attr_visibility_hidden has to be defined to make fastpos.h usable. The visibility attribute is irrelevant to fastpos_tablegen.c so simply #define the macro to an empty value. fastpos_tablegen.c is never built by the included build systems and so the problem wasn't noticed earlier. It's just a standalone program for generating fastpos_table.c. Fixes: https://github.com/tukaani-project/xz/pull/69 Thanks to GitHub user Jamaika1. src/liblzma/lzma/fastpos_tablegen.c | 2 ++ 1 file changed, 2 insertions(+) commit 148e20607e95781558bdfc823ecba07b7af4b590 Author: Jia Tan Date: 2023-10-31 21:51:40 +0800 Build: Fix text wrapping in an output message. configure.ac | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) commit 8c36ab79cbf23104ce7a3d533d5ac98cd492e57c Author: Lasse Collin Date: 2023-10-30 18:09:53 +0200 liblzma: Add a note why crc_always_inline exists for now. Solaris Studio is a possible example (not tested) which supports the always_inline attribute but might not get detected by the common.h #ifdefs. src/liblzma/check/crc_common.h | 5 +++++ 1 file changed, 5 insertions(+) commit e7a86b94cd247435ac96bc79ba528b690b9ca388 Author: Lasse Collin Date: 2023-10-22 17:59:11 +0300 liblzma: Use lzma_always_inline in memcmplen.h. src/liblzma/common/memcmplen.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) commit dcfe5632992fb7f06f921da13fcdd84f83d0d285 Author: Lasse Collin Date: 2023-10-30 17:43:03 +0200 liblzma: #define lzma_always_inline in common.h. src/liblzma/common/common.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) commit 41113fe30a47f6fd3e30cb4494dd538e86212edf Author: Lasse Collin Date: 2023-10-22 17:15:32 +0300 liblzma: Use lzma_attr_visibility_hidden on private extern declarations. These variables are internal to liblzma and not exposed in the API. 
src/liblzma/check/check.h | 7 +++++++ src/liblzma/common/stream_flags_common.h | 3 +++ src/liblzma/lz/lz_encoder_hash.h | 1 + src/liblzma/lzma/fastpos.h | 1 + src/liblzma/rangecoder/price.h | 1 + 5 files changed, 13 insertions(+) commit a2f5ca706acc6f7715b8d260a8c6ed50d7717478 Author: Lasse Collin Date: 2023-10-22 17:08:39 +0300 liblzma: #define lzma_attr_visibility_hidden in common.h. In ELF shared libs: -fvisibility=hidden affects definitions of symbols but not declarations.[*] This doesn't affect direct calls to functions inside liblzma as a linker can replace a call to lzma_foo@plt with a call directly to lzma_foo when -fvisibility=hidden is used. [*] It has to be like this because otherwise every installed header file would need to explicitly set the symbol visibility to default. When accessing extern variables that aren't defined in the same translation unit, compiler assumes that the variable has the default visibility and thus indirection is needed. Unlike function calls, linker cannot optimize this. Using __attribute__((__visibility__("hidden"))) with the extern variable declarations tells the compiler that indirection isn't needed because the definition is in the same shared library. About 15+ years ago, someone told me that it would be good if the CRC tables would be defined in the same translation unit as the C code of the CRC functions. While I understood that it could help a tiny amount, I didn't want to change the code because a separate translation unit for the CRC tables was needed for the x86 assembly code anyway. But when visibility attributes are supported, simply marking the extern declaration with the hidden attribute will get identical result. When there are only a few affected variables, this is trivial to do. I wish I had understood this back then already.
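A minimal sketch of the technique, assuming a GCC-compatible compiler. The macro demo_attr_visibility_hidden, the table demo_table, and the function demo_table_lookup() are hypothetical names for this illustration; the real macro is lzma_attr_visibility_hidden and its definition in common.h may differ. The table values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for lzma_attr_visibility_hidden. It expands to
// nothing when the attribute is assumed to be unavailable.
#if defined(__GNUC__) && !defined(_WIN32)
#	define demo_attr_visibility_hidden \
		__attribute__((__visibility__("hidden")))
#else
#	define demo_attr_visibility_hidden
#endif

// Declaration as it could appear in a private header. The attribute
// tells the compiler that the definition is in the same shared library,
// so no indirection is needed to access the variable.
demo_attr_visibility_hidden
extern const uint32_t demo_table[4];

// In a real library the definition would be in another translation
// unit. It is placed here only to keep the sketch self-contained.
const uint32_t demo_table[4] = { 10, 20, 30, 40 };

// Code that reads the table through the hidden declaration.
uint32_t
demo_table_lookup(size_t i)
{
	assert(i < 4);
	return demo_table[i];
}
```

The effect is visible only in the generated code of a shared library built with -fPIC: without the attribute on the declaration, accesses from other translation units would typically be indirect. The behavior of the program is the same either way.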
src/liblzma/common/common.h | 11 +++++++++++ 1 file changed, 11 insertions(+) commit 2c7ee92e44e1e66f0a427555233eb22c78f6c4f8 Author: Lasse Collin Date: 2023-09-30 22:54:28 +0300 liblzma: Refer to MinGW-w64 instead of MinGW in the API headers. MinGW (formerly a MinGW.org Project, later the MinGW.OSDN Project at ) has GCC 9.2.0 as the most recent GCC package (released 2021-02-02). The project might still be alive but the majority of people have switched to MinGW-w64. Thus it seems clearer to refer to MinGW-w64 in our API headers too. Building with MinGW is likely to still work but I haven't tested it in recent years. src/liblzma/api/lzma.h | 4 ++-- src/liblzma/api/lzma/version.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) commit 597f49b61475438a43a417236989b2acc968a686 Author: Lasse Collin Date: 2023-09-27 00:58:17 +0300 CMake: Use -D_FILE_OFFSET_BITS=64 if (and only if) needed. A CMake option LARGE_FILE_SUPPORT is created if and only if -D_FILE_OFFSET_BITS=64 affects sizeof(off_t). This is needed on many 32-bit platforms and even with 64-bit builds with MinGW-w64 to get support for files larger than 2 GiB. CMakeLists.txt | 7 ++++- cmake/tuklib_large_file_support.cmake | 52 +++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+), 1 deletion(-) commit 1bc548b8210366e44ba35b0b11577a8e328c1228 Author: Lasse Collin Date: 2023-09-30 02:14:25 +0300 CMake: Generate and install liblzma.pc if not using MSVC. Autotools based build uses -pthread and thus adds it to Libs.private in liblzma.pc. CMake doesn't use -pthread at all if pthread functions are available in libc so Libs.private doesn't get -pthread either. CMakeLists.txt | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) commit 2add71966f891d315105d6245f724ed4f43a4eff Author: Lasse Collin Date: 2023-09-30 01:13:13 +0300 CMake: Rearrange the PACKAGE_ variables. The windres workaround now replaces spaces with \x20 so the package name isn't repeated.
These changes will help with creation of liblzma.pc. CMakeLists.txt | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) commit a7d1b2825c49dc83f1910eeb8ba0f1dfbd886d91 Author: Lasse Collin Date: 2023-09-29 20:46:11 +0300 liblzma: Add Cflags.private to liblzma.pc.in for MSYS2. It properly adds -DLZMA_API_STATIC when compiling code that will be linked against static liblzma. Having it there on systems other than Windows does no harm. See: https://www.msys2.org/docs/pkgconfig/ src/liblzma/liblzma.pc.in | 1 + 1 file changed, 1 insertion(+) commit 80e0750e3996c1c659e972ce9cf789ca2e99f702 Author: Lasse Collin Date: 2023-09-27 22:46:20 +0300 CMake: Create liblzma.def when building liblzma.dll with MinGW-w64. CMakeLists.txt | 20 ++++++++++++++++++++ cmake/remove-ordinals.cmake | 26 ++++++++++++++++++++++++++ 2 files changed, 46 insertions(+) commit 08d12595f486890cf601b87f36ee0ddbce57728e Author: Lasse Collin Date: 2023-10-26 21:44:42 +0300 CMake: Change one CMAKE_CURRENT_SOURCE_DIR to CMAKE_CURRENT_LIST_DIR. In this case they have identical values. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit e67aaf698de75c73443a5ec786781cbf2034461d Author: Lasse Collin Date: 2023-10-01 19:10:57 +0300 CMake/Windows: Fix the import library filename. Both PREFIX and IMPORT_PREFIX have to be set to "" to get liblzma.dll and liblzma.dll.a. CMakeLists.txt | 1 + 1 file changed, 1 insertion(+) commit 88588b1246d8c26ffbc138b3e5c413c5f14c3179 Author: Lasse Collin Date: 2023-10-25 19:13:25 +0300 Build: Detect -fsanitize= in CFLAGS and incompatible build options. Now configure will fail if -fsanitize= is found in CFLAGS and sanitizer-incompatible ifunc or Landlock sandboxing would be used. These are incompatible with one or more sanitizers. It's simpler to reject all -fsanitize= uses instead of trying to pass those that might not cause problems. CMake-based build was updated similarly.
It lets the configuration finish (SEND_ERROR instead of FATAL_ERROR) so that both error messages can be seen at once. CMakeLists.txt | 29 +++++++++++++++++++++++++++++ configure.ac | 37 +++++++++++++++++++++++++++++++++---- 2 files changed, 62 insertions(+), 4 deletions(-) commit 5e3d890f8862a7d4fbef5e38e11b6c9fbd98f468 Author: Jia Tan Date: 2023-10-24 00:50:08 +0800 CI: Disable sandboxing in fsanitize=address,undefined job. The sandboxing on Linux now supports Landlock, which restricts all supported filesystem actions after xz opens the files it needs. The sandbox is only enabled when one file is input and we are writing to standard out. With fsanitize=address,undefined, the instrumentation needs to read additional files after the sandbox is in place. This forces all xz-based tests to fail, so the sandbox must instead be disabled. .github/workflows/ci.yml | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) commit b1408987ea832e2760e478ae960a636df17a1363 Author: Jia Tan Date: 2023-10-24 00:15:39 +0800 CI: Allow disabling the sandbox in ci_build.sh. build-aux/ci_build.sh | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) commit 91c435cf1c7a1e893706d4d716dfd361621ed824 Author: Lasse Collin Date: 2023-10-11 19:47:44 +0300 CMake: Don't shadow the cache entry ENABLE_THREADS with a normal variable. Using set(ENABLE_THREADS "posix") is confusing because it sets a new normal variable and leaves the cache entry with the same name unchanged. The intent wasn't to change the cache entry so this switches to a different variable name. CMakeLists.txt | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) commit fa1609eb9393ecd30decfed4891c907829f06710 Author: Lasse Collin Date: 2023-10-09 22:28:49 +0300 Docs: Update INSTALL about sandboxing support.
INSTALL | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) commit 8276c7f41c671eee4aa3239490658b23dcfd3021 Author: Lasse Collin Date: 2023-10-09 22:07:52 +0300 xz: Support basic sandboxing with Linux Landlock (ABI versions 1-3). It is enabled only when decompressing one file to stdout, similar to how Capsicum is used. Landlock was added in Linux 5.13. CMakeLists.txt | 12 +++++++++++- configure.ac | 11 ++++++++--- src/xz/file_io.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ src/xz/main.c | 19 +++++++++++++++++++ src/xz/private.h | 3 ++- 5 files changed, 98 insertions(+), 5 deletions(-) commit 3a1e9fd031b9320d769d63b503ef4e82e1b6ea8c Author: Lasse Collin Date: 2023-10-09 21:12:31 +0300 CMake: Edit threading related messages. It's mostly to change from "thread method" to "threading method". CMakeLists.txt | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) commit bf011352528ae3539ea7b780b45b96736ee57a99 Author: Lasse Collin Date: 2023-10-09 20:59:24 +0300 CMake: Use FATAL_ERROR if user-supplied options aren't understood. This way typos are caught quickly and compounding error messages are avoided (a single typo could cause more than one error). This keeps using SEND_ERROR when the system is lacking a feature (like threading library or sandboxing method). This way the whole configuration log will be generated in case someone wishes to report a problem upstream. CMakeLists.txt | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) commit 3f53870c249945d657ca3d75e0993e6267d71f75 Author: Lasse Collin Date: 2023-10-09 18:37:32 +0300 CMake: Add sandboxing support. CMakeLists.txt | 50 +++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) commit 2e2cd11535ad77364cf021297e0b3f162fa3a3d0 Author: Lasse Collin Date: 2023-10-09 18:13:08 +0300 Simplify detection of Capsicum support. This removes support for FreeBSD 10.0 and 10.1 which used instead of . 
Support for FreeBSD 10.1 ended on 2016-12-31. So now FreeBSD >= 10.2 is required to enable Capsicum support. This also removes support for Capsicum on Linux (libcaprights) which seems to have been unmaintained since 2017 and Linux 4.11: https://github.com/google/capsicum-linux configure.ac | 4 +-- m4/ax_check_capsicum.m4 | 85 ------------------------------------------------- src/xz/Makefile.am | 2 +- src/xz/file_io.c | 14 +++----- src/xz/private.h | 2 +- 5 files changed, 9 insertions(+), 98 deletions(-) commit c57858b60e186d020b2dbaf7aabd9b32c71da824 Author: Lasse Collin Date: 2023-09-25 01:46:36 +0300 xz/Windows: Allow clock_gettime with POSIX threads. If winpthreads are used for threading, it's OK to use clock_gettime() from winpthreads too. src/xz/mytime.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) commit dd32f628bb5541ef4e8ce66966ef456a1934084c Author: Lasse Collin Date: 2023-09-25 01:39:26 +0300 mythread.h: Make MYTHREAD_POSIX compatible with MinGW-w64's winpthreads. This might be almost useless but it doesn't need much extra code either. src/common/mythread.h | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) commit 680e52cdd086e92691d8a0bca2c98815565f60ca Author: Lasse Collin Date: 2023-09-23 03:06:36 +0300 CMake: Check for clock_gettime() even on Windows. This mirrors configure.ac although currently MinGW-w64 builds don't use clock_gettime() even if it is found. CMakeLists.txt | 44 +++++++++++++++++++++----------------------- 1 file changed, 21 insertions(+), 23 deletions(-) commit 1c1a8c3ee4dad0064dbe63b8dbc4ac4bc679f419 Author: Lasse Collin Date: 2023-09-23 03:23:32 +0300 Build: Check for clock_gettime() even if not using POSIX threads. See the new comment in the code. This also makes the check for clock_gettime() run with MinGW-w64 with which we don't want to use clock_gettime(). The previous commit already took care of this situation. 
configure.ac | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) commit 46fd991cd2808ef62554853864c946232e7547f0 Author: Lasse Collin Date: 2023-09-24 22:58:53 +0300 xz/Windows: Ensure that clock_gettime() isn't used with MinGW-w64. This commit alone doesn't change anything in the real-world: - configure.ac currently checks for clock_gettime() only when using pthreads. - CMakeLists.txt doesn't check for clock_gettime() on Windows. So clock_gettime() wasn't used with MinGW-w64 before either. clock_gettime() provides monotonic time and it's better than gettimeofday() in this sense. But clock_gettime() is defined in winpthreads, and liblzma or xz needs nothing else from winpthreads. By avoiding clock_gettime(), we avoid the dependency on libwinpthread-1.dll or the need to link against the static version. As a bonus, GetTickCount64() and MinGW-w64's gettimeofday() can be faster than clock_gettime(CLOCK_MONOTONIC, &tv). The resolution is more than good enough for the progress indicator in xz. src/xz/mytime.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) commit cdb4d91f2464b50c985ef7b9517314ea237ddda7 Author: Lasse Collin Date: 2023-09-24 00:21:22 +0300 xz/Windows: Use GetTickCount64() with MinGW-w64 if using Vista threads. src/xz/mytime.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) commit 988e09f27b9b04a43d45d10f92782e0092ee27a9 Author: Jia Tan Date: 2023-10-20 19:17:46 +0800 liblzma: Move is_clmul_supported() back to crc_common.h. This partially reverts creating crc_clmul.c (8c0f9376f58c0696d5d6719705164d35542dd891) where is_clmul_supported() was moved, extern'ed, and renamed to lzma_is_clmul_supported(). This caused a problem when the function call to lzma_is_clmul_supported() results in a call through the PLT. ifunc resolvers run very early in the dynamic loading sequence, so the PLT may not be setup properly at this point. 
Whether the PLT is used or not for lzma_is_clmul_supported() depended upon the compiler-toolchain used and flags. In liblzma compiled with GCC, for instance, GCC will go through the PLT for function calls internal to liblzma if the version scripts and symbol visibility hiding are not used. If lazy-binding is disabled, then it would have made any program linked with liblzma fail during dynamic loading in the ifunc resolver. src/liblzma/check/crc32_fast.c | 2 +- src/liblzma/check/crc64_fast.c | 2 +- src/liblzma/check/crc_clmul.c | 45 ------------------------------------ src/liblzma/check/crc_common.h | 52 +++++++++++++++++++++++++++++++++++++++--- 4 files changed, 51 insertions(+), 50 deletions(-) commit 105c7ca90d4152942e0798580a37f736d02faa22 Author: Jia Tan Date: 2023-10-19 16:23:32 +0800 Build: Remove check for COND_CHECK_CRC32 in check/Makefile.inc. Currently crc32 is always enabled, so COND_CHECK_CRC32 must always be set. Because of this, it makes the recent change to conditionally compile check/crc_clmul.c appear wrong since that file has CLMUL implementations for both CRC32 and CRC64. src/liblzma/check/Makefile.inc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 139757170468f0f1fafdf0a8ffa74363d1ea1d0c Author: Jia Tan Date: 2023-10-19 16:09:01 +0800 CMake: Add ALLOW_CLMUL_CRC option to enable/disable CLMUL. The option is enabled by default, but will only be visible to a user listing cache variables or using a CMake GUI application if the immintrin.h header file is found. This mirrors our Autotools build --disable-clmul-crc functionality. CMakeLists.txt | 44 +++++++++++++++++++++++++------------------- 1 file changed, 25 insertions(+), 19 deletions(-) commit c60b25569d414bb73b705977a4dd342f8f9f1965 Author: Jia Tan Date: 2023-10-19 00:22:50 +0800 liblzma: Fix -fsanitize=address failure with crc_clmul functions. After forcing crc_simd_body() to always be inlined it caused -fsanitize=address to fail for lzma_crc32_clmul() and lzma_crc64_clmul().
The __no_sanitize_address__ attribute was added to lzma_crc32_clmul() and lzma_crc64_clmul(), but not removed from crc_simd_body(). ASAN and inline functions behavior has changed over the years for GCC specifically, so while not strictly required we will keep __attribute__((__no_sanitize_address__)) on crc_simd_body() in case this becomes a requirement in the future. Older GCC versions refuse to inline a function with ASAN if the caller and callee do not agree on sanitization flags (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89124#c3). If the function was forced to be inlined, it will not compile if the callee function has __no_sanitize_address__ but the caller doesn't. src/liblzma/check/crc_clmul.c | 6 ++++++ 1 file changed, 6 insertions(+) commit 9a78971261bc67622cbd7dae02f6966968ac1393 Author: Lasse Collin Date: 2023-10-14 20:16:13 +0300 tuklib_integer: Update the CMake test for fast unaligned access. cmake/tuklib_integer.cmake | 69 ++++++++++++++++++++++++++++++++++++---------- 1 file changed, 54 insertions(+), 15 deletions(-) commit 2f81ac852bc5aafc91c8e2adc66b5114761703c4 Author: Lasse Collin Date: 2023-09-23 23:28:48 +0300 Build: Enabled unaligned access by default on PowerPC64LE and some RISC-V. PowerPC64LE wasn't tested but it seems like a safe change. POWER8 supports unaligned access in little endian mode. Testing on godbolt.org shows that GCC uses unaligned access by default. The RISC-V macro __riscv_misaligned_fast is very new and not in any stable compiler release yet. Documentation in INSTALL was updated to match. Documentation about an autodetection bug when using ARM64 GCC with -mstrict-align was added to INSTALL. CMake files weren't updated yet.
INSTALL | 39 +++++++++++++++++++++++++++++++++++++-- m4/tuklib_integer.m4 | 34 +++++++++++++++++++++++++++------- 2 files changed, 64 insertions(+), 9 deletions(-) commit c8f715f1bca4c30db814fcf1fd2fe88b8992ede2 Author: Lasse Collin Date: 2023-10-14 17:56:59 +0300 tuklib_integer: Revise unaligned reads and writes on strict-align archs. In XZ Utils context this doesn't matter much because unaligned reads and writes aren't used in hot code when TUKLIB_FAST_UNALIGNED_ACCESS isn't #defined. src/common/tuklib_integer.h | 256 ++++++++++++++++++++++++++++++++------------ 1 file changed, 189 insertions(+), 67 deletions(-) commit 6828242735cbf61b93d140383336e1e51a006f2d Author: Lasse Collin Date: 2023-09-23 02:21:49 +0300 tuklib_integer: Add missing write64be and write64le fallback functions. src/common/tuklib_integer.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) commit 1c8884f0af28b3a4690bb573cdf3240a8ec73416 Author: Jia Tan Date: 2023-10-18 19:57:10 +0800 liblzma: Set the MSVC optimization fix to only cover lzma_crc64_clmul(). After testing a 32-bit Release build on MSVC, only lzma_crc64_clmul() has the bug. crc_simd_body() and lzma_crc32_clmul() do not need the optimizations disabled. src/liblzma/check/crc_clmul.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) commit 5ce0f7a48bdf5c3b45430850a4487307afac6143 Author: Lasse Collin Date: 2023-10-18 14:30:00 +0300 liblzma: CRC_USE_GENERIC_FOR_SMALL_INPUTS cannot be used with ifunc. src/liblzma/check/crc_common.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) commit 27735380491bb5ce0d0f41d5244d89c1d0825f6b Author: Lasse Collin Date: 2023-10-17 21:53:11 +0300 liblzma: Include common.h in crc_common.h. crc_common.h depends on common.h. The headers include common.h except when there is a reason to not do so. 
src/liblzma/check/crc_clmul.c | 1 - src/liblzma/check/crc_common.h | 3 +++ 2 files changed, 3 insertions(+), 1 deletion(-) commit e13b7947b92355c334edd594295d3a2c99c4bca1 Author: Jia Tan Date: 2023-10-18 01:23:26 +0800 liblzma: Add include guards to crc_common.h. src/liblzma/check/crc_common.h | 5 +++++ 1 file changed, 5 insertions(+) commit 40abd88afcc61a8157fcd12d78d491caeb8e12be Author: Jia Tan Date: 2023-10-18 22:50:25 +0800 liblzma: Add the crc_always_inline macro to crc_simd_body(). Forcing this to be inline has a significant speed improvement at the cost of a few repeated instructions. The compilers tested on did not inline this function since it is large and is used twice in the same translation unit. src/liblzma/check/crc_clmul.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit a5966c276bd6fa975f0389f8a8dc61393de750b0 Author: Jia Tan Date: 2023-10-18 22:48:19 +0800 liblzma: Create crc_always_inline macro. This macro must be used instead of the inline keyword. On MSVC, it is a replacement for __forceinline which is an MSVC specific keyword that should not be used with inline (it will issue a warning if it is). It does not use a build system check to determine if __attribute__((__always_inline__)) since all compilers that can use CLMUL extensions (except the special case for MSVC) should support this attribute. If this assumption is incorrect then it will result in a bug report instead of silently producing slow code. src/liblzma/check/crc_common.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) commit 96b663f67c0e738a99ba8f35d9f4ced9add74544 Author: Jia Tan Date: 2023-10-14 13:23:23 +0800 liblzma: Refactor CRC comments. A detailed description of the three dispatch methods was added. Also, duplicated comments now only appear in crc32_fast.c or were removed from both crc32_fast.c and crc64_fast.c if they appeared in crc_clmul.c. 
src/liblzma/check/crc32_fast.c | 64 +++++++++++++++++++++++++++++------------- src/liblzma/check/crc64_fast.c | 61 ++++++---------------------------------- 2 files changed, 53 insertions(+), 72 deletions(-) commit 8c0f9376f58c0696d5d6719705164d35542dd891 Author: Jia Tan Date: 2023-10-14 12:17:57 +0800 liblzma: Create crc_clmul.c. Both crc32_clmul() and crc64_clmul() are now exported from crc_clmul.c as lzma_crc32_clmul() and lzma_crc64_clmul(). This ensures that is_clmul_supported() (now lzma_is_clmul_supported()) is not duplicated between crc32_fast.c and crc64_fast.c. Also, it encapsulates the complexity of the CLMUL implementations into a single file and reduces the complexity of crc32_fast.c and crc64_fast.c. Before, CLMUL code was present in crc32_fast.c, crc64_fast.c, and crc_common.h. During the conversion, various cleanups were applied to code (thanks to Lasse Collin) including: - Require using semicolons with MASK_/L/H/LH macros. - Variable typing and const handling improvements. - Improvements to comments. - Fixes to the pragmas used. - Removed unneeded variables. - Whitespace improvements. - Fixed CRC_USE_GENERIC_FOR_SMALL_INPUTS handling. - Silenced warnings and removed the need for some #pragmas CMakeLists.txt | 6 +- configure.ac | 6 +- src/liblzma/check/Makefile.inc | 3 + src/liblzma/check/crc32_fast.c | 120 +----------- src/liblzma/check/crc64_fast.c | 128 +------------ src/liblzma/check/crc_clmul.c | 414 +++++++++++++++++++++++++++++++++++++++++ src/liblzma/check/crc_common.h | 190 +------------------ 7 files changed, 444 insertions(+), 423 deletions(-) commit a3ebc2c516b09616638060806c841bd4bcf7bce3 Author: Jia Tan Date: 2023-10-14 10:23:03 +0800 liblzma: Define CRC_USE_IFUNC in crc_common.h. When ifunc is supported, we can define a simpler macro instead of repeating the more complex check in both crc32_fast.c and crc64_fast.c.
src/liblzma/check/crc32_fast.c | 3 +-- src/liblzma/check/crc64_fast.c | 3 +-- src/liblzma/check/crc_common.h | 5 +++++ 3 files changed, 7 insertions(+), 4 deletions(-) commit f1cd9d7194f005cd66ec03c6635ceae75f90ef17 Author: Hans Jansen Date: 2023-10-12 19:37:01 +0200 liblzma: Added crc32_clmul to crc32_fast.c. src/liblzma/check/crc32_fast.c | 247 ++++++++++++++++++++++++++++++++++++++-- src/liblzma/check/crc32_table.c | 19 +++- 2 files changed, 255 insertions(+), 11 deletions(-) commit 93e6fb08b22c7c13be2dd1e7274fe78413436254 Author: Hans Jansen Date: 2023-10-12 19:23:40 +0200 liblzma: Moved CLMUL CRC logic to crc_common.h. crc64_fast.c was updated to use the code from crc_common.h instead. src/liblzma/check/crc64_fast.c | 257 ++--------------------------------------- src/liblzma/check/crc_common.h | 230 +++++++++++++++++++++++++++++++++++- 2 files changed, 240 insertions(+), 247 deletions(-) commit 233885a437f8b55a5c8442984ebc0aaa579e92de Author: Hans Jansen Date: 2023-10-12 19:07:50 +0200 liblzma: Rename crc_macros.h to crc_common.h. CMakeLists.txt | 2 +- src/liblzma/check/Makefile.inc | 2 +- src/liblzma/check/crc32_fast.c | 2 +- src/liblzma/check/crc64_fast.c | 2 +- src/liblzma/check/{crc_macros.h => crc_common.h} | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) commit 37947d4a7565b87e4cec8b89229d35b0a3f8d2cd Author: Gabriela Gutierrez Date: 2023-09-26 15:55:13 +0000 CI: Bump and ref actions by commit SHA in windows-ci.yml Referencing actions by commit SHA in GitHub workflows guarantees you are using an immutable version. Actions referenced by tags and branches are more vulnerable to attacks, such as the tag being moved to a malicious commit or a malicious commit being pushed to the branch. It's important to make sure the SHA's are from the original repositories and not forks. 
For reference: https://github.com/msys2/setup-msys2/releases/tag/v2.20.1 https://github.com/msys2/setup-msys2/commit/27b3aa77f672cb6b3054121cfd80c3d22ceebb1d https://github.com/actions/checkout/releases/tag/v4.1.0 https://github.com/actions/checkout/commit/8ade135a41bc03ea155e62e844d188df1ea18608 https://github.com/actions/upload-artifact/releases/tag/v3.1.3 https://github.com/actions/upload-artifact/commit/a8a3f3ad30e3422c9c7b888a15615d19a852ae32 Signed-off-by: Gabriela Gutierrez .github/workflows/windows-ci.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit f28cc9bd481ce493da11f98c18526d324211599a Author: Gabriela Gutierrez Date: 2023-09-26 14:35:08 +0000 CI: Bump and ref actions by commit SHA in ci.yml Referencing actions by commit SHA in GitHub workflows guarantees you are using an immutable version. Actions referenced by tags and branches are more vulnerable to attacks, such as the tag being moved to a malicious commit or a malicious commit being pushed to the branch. It's important to make sure the SHA's are from the original repositories and not forks. For reference: https://github.com/actions/checkout/releases/tag/v4.1.0 https://github.com/actions/checkout/commit/8ade135a41bc03ea155e62e844d188df1ea18608 https://github.com/actions/upload-artifact/releases/tag/v3.1.3 https://github.com/actions/upload-artifact/commit/a8a3f3ad30e3422c9c7b888a15615d19a852ae32 Signed-off-by: Gabriela Gutierrez .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit f74f1740067b75042497edbfa6ea457ff75484b9 Author: Jia Tan Date: 2023-10-12 20:12:18 +0800 Build: Update visibility.m4 from Gnulib. Updating from version 6 -> 8 from upstream. Declarations for variables and function bodies were added to avoid unnecessary failures with -Werror. m4/visibility.m4 | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) commit 5c4bca521e6fb435898a0012b3276eee70a6dadf Author: Lasse Collin Date: 2023-10-06 19:36:35 +0300 Update THANKS. 
THANKS | 1 + 1 file changed, 1 insertion(+) commit d91cb6e884c73d0b05d7e7d68ad4e6eb29f4b44b Author: Lasse Collin Date: 2023-10-06 18:55:57 +0300 CMake/Windows: Fix when the windres workaround is applied. CMake doesn't set WIN32 on CYGWIN but the workaround is probably needed on Cygwin too. Same for MSYS and MSYS2. The workaround must not be used with Clang that is acting in MSVC mode. This fixes it by checking for the known environments that need the workaround instead of using "NOT MSVC". Thanks to Martin Storsjö. https://github.com/tukaani-project/xz/commit/0570308ddd9c0e39e85597ebc0e31d4fc81d436f#commitcomment-129098431 CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 01e34aa1171b04f8b28960b1cc6135a903e0c13d Author: Jia Tan Date: 2023-09-29 22:11:54 +0800 CI: Disable CLANG64 MSYS2 environment until bug is resolved. lld 17.0.1 searches for libraries to link first in the toolchain directories before the local directory when building. This is a problem for us because liblzma.a is installed in MSYS2 CLANG64 by default and xz.exe will thus use the installed library instead of the one being built. This causes tests to fail when they are expecting features to be disabled. More importantly, it will compile xz.exe with an incorrect liblzma and could cause unexpected behavior by being unable to update liblzma code in static builds. The CLANG64 environment can be tested again once this is fixed. Link to bug: https://github.com/llvm/llvm-project/issues/67779. .github/workflows/windows-ci.yml | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) commit 30d0c35327f3639cb11224872aa58fdbf0b1526e Author: Jia Tan Date: 2023-09-29 20:14:39 +0800 CMake: Rename xz and man page symlink custom targets. The Ninja Generator for CMake cannot have a custom target and its BYPRODUCTS have the same name. This has prevented Ninja builds on Unix-like systems since the xz symlinks were introduced in 80a1a8bb838842a2be343bd88ad1462c21c5e2c9.
CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 506d03127a8565442b028ec991e1578124fd3025 Author: Jia Tan Date: 2023-09-29 19:58:44 +0800 CMake: Specify LINKER_LANGUAGE for libgnu target to fix Ninja Generator. CMake is unable to guess the linker language for just a header file so it must be explicitly set. CMakeLists.txt | 6 ++++++ 1 file changed, 6 insertions(+) commit 0570308ddd9c0e39e85597ebc0e31d4fc81d436f Author: Lasse Collin Date: 2023-09-27 19:54:35 +0300 CMake: Fix Windows build with Clang/LLVM 17. llvm-windres 17.0.0 has more accurate emulation of GNU windres, so the hack for GNU windres must now be used with llvm-windres too. LLVM 16.0.6 has the old behavior and there likely won't be more 16.x releases. So we can simply check for >= 17.0.0. See also: https://github.com/llvm/llvm-project/commit/2bcc0fdc58a220cb9921b47ec8a32c85f2511a47 CMakeLists.txt | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) commit 5a9af95f85a7e5d4f9c10cb8cf737651a921f1d1 Author: Lasse Collin Date: 2023-09-26 21:47:13 +0300 liblzma: Update a comment. The C standards don't allow an empty translation unit which can be avoided by declaring something, without exporting any symbols. When I committed f644473a211394447824ea00518d0a214ff3f7f2 I had a feeling that some specific toolchain somewhere didn't like empty object files (assembler or maybe "ar" complained) but I cannot find anything to confirm this now. Quite likely I remembered nonsense. I leave this here as a note to my future self. :-) src/liblzma/check/crc64_table.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) commit 8ebaf3f665ddc7e4f19c613005050dde5ccbe499 Author: Jia Tan Date: 2023-09-27 00:02:11 +0800 liblzma: Avoid compiler warning without creating extra symbol. When the generic fast crc64 method is used, then we omit lzma_crc64_table[][]. 
Similar to d9166b52cf3458a4da3eb92224837ca8fc208d79, we can avoid compiler warnings with -Wempty-translation-unit (Clang) or -pedantic (GCC) by creating a never used typedef instead of an extra symbol. src/liblzma/check/crc64_table.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) commit 092d21db2e5eea19fe079264ce48c178989c7606 Author: Lasse Collin Date: 2023-09-26 17:24:15 +0300 Build: Update the comment about -Werror usage in checks. configure.ac | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) commit a37a2763383e6c204fe878e1416dd35e7711d3a9 Author: Lasse Collin Date: 2023-09-26 15:00:43 +0300 Build: Fix __attribute__((ifunc(...))) detection with clang -Wall. Now if user-supplied CFLAGS contains -Wall -Wextra -Wpedantic the two checks that need -Werror will still work. At CMake side there is add_compile_options(-Wall -Wextra) but it didn't affect the -Werror tests. So with both Autotools and CMake only user-supplied CFLAGS could make the checks fail when they shouldn't. This is not a full fix as things like -Wunused-macros in user-supplied CFLAGS will still cause problems with both GCC and Clang. CMakeLists.txt | 8 ++++++++ configure.ac | 8 ++++++++ 2 files changed, 16 insertions(+) commit 9c42f936939b813f25d0ff4e99c3eb9c2d17a0d2 Author: Lasse Collin Date: 2023-09-26 13:51:31 +0300 Build: Fix underquoted AC_LANG_SOURCE. It made no practical difference in this case. configure.ac | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 9f1444a8a5c0e724b2c7ef83424f642f07a95982 Author: Lasse Collin Date: 2023-09-26 13:14:37 +0300 Build: Silence two Autoconf warnings. There were two uses of AC_COMPILE_IFELSE that didn't use AC_LANG_SOURCE and Autoconf warned about these. The omission had been intentional but it turned out that this didn't do what I thought it would. Autoconf 2.71 manual gives an impression that AC_LANG_SOURCE inserts all #defines that have been made with AC_DEFINE so far (confdefs.h). 
The idea was that omitting AC_LANG_SOURCE would mean that only the exact code included in the AC_COMPILE_IFELSE call would be compiled. With C programs this is not true: the #defines get added without AC_LANG_SOURCE too. There seems to be no neat way to avoid this. Thus, with the C language at least, adding AC_LANG_SOURCE makes no other difference than silencing a warning from Autoconf. The generated "configure" remains identical. (Docs of AC_LANG_CONFTEST say that the #defines have been inserted since Autoconf 2.63b and that AC_COMPILE_IFELSE uses AC_LANG_CONFTEST. So the behavior is documented if one also reads the docs of macros that one isn't calling directly.) Any extra code, including #defines, can cause problems for these two tests because these tests must use -Werror. CC=clang CFLAGS=-Weverything is the most extreme example. It enables -Wreserved-macro-identifier which warns about #define __EXTENSIONS__ 1 because it begins with two underscores. It's possible to write a test file that passes -Weverything but it becomes impossible when Autoconf inserts confdefs.h. So this commit adds AC_LANG_SOURCE to silence Autoconf warnings. A different solution is needed for -Werror tests. configure.ac | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) commit 519e47c2818acde571fadc79551294527fe6cc22 Author: Jia Tan Date: 2023-09-26 01:17:11 +0800 CMake: Remove accidental extra newline. CMakeLists.txt | 1 - 1 file changed, 1 deletion(-) commit bbb42412da6a02705ba3e668e90840c2683e4e67 Author: Jia Tan Date: 2023-09-26 00:47:26 +0800 Build: Remove Gnulib dependency from tests. The tests do not use any Gnulib replacements so they do not need to link libgnu.a or have /lib in the include path. tests/Makefile.am | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) commit d265f6b75691c6c8fa876eb5320c3ff5aed17dfa Author: Jia Tan Date: 2023-09-26 00:43:43 +0800 CMake: Remove /lib from tests include path. 
The tests never included anything from /lib, so this was not needed. CMakeLists.txt | 1 - 1 file changed, 1 deletion(-) commit 9fb5de41f2fb654ca952d4bda15cf3777c2b720f Author: Jia Tan Date: 2023-09-24 22:10:41 +0800 Scripts: Change quoting style from `...' to '...'. src/scripts/xzdiff.in | 2 +- src/scripts/xzgrep.in | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit eaebdef4d4de3c088b0905f42626b74e0d23abf3 Author: Jia Tan Date: 2023-09-24 22:10:18 +0800 xz: Change quoting style from `...' to '...'. src/xz/args.c | 6 +++--- src/xz/file_io.c | 2 +- src/xz/main.c | 4 ++-- src/xz/message.c | 14 +++++++------- src/xz/options.c | 2 +- src/xz/suffix.c | 2 +- src/xz/util.c | 6 +++--- 7 files changed, 18 insertions(+), 18 deletions(-) commit f6667702bf075a05fbe336dbf3576ad1a82ec645 Author: Jia Tan Date: 2023-09-24 22:09:47 +0800 liblzma: Change quoting style from `...' to '...'. This was done for both internal and API headers. src/liblzma/api/lzma/base.h | 18 +++++++++--------- src/liblzma/api/lzma/container.h | 10 +++++----- src/liblzma/api/lzma/filter.h | 6 +++--- src/liblzma/api/lzma/index.h | 8 ++++---- src/liblzma/api/lzma/lzma12.h | 2 +- src/liblzma/lz/lz_encoder.h | 2 +- src/liblzma/rangecoder/range_decoder.h | 2 +- 7 files changed, 24 insertions(+), 24 deletions(-) commit be012b8097a4eaee335b51357d6befa745f753ce Author: Jia Tan Date: 2023-09-24 22:09:16 +0800 Build: Change quoting style from `...' to '...'. configure.ac | 18 +++++++++--------- dos/config.h | 6 +++--- m4/getopt.m4 | 2 +- m4/tuklib_progname.m4 | 2 +- windows/build.bash | 2 +- 5 files changed, 15 insertions(+), 15 deletions(-) commit ce162db07f03495bd333696e66883c8f36abdc1e Author: Jia Tan Date: 2023-09-24 22:05:02 +0800 Docs: Change quoting style from `...' to '...'. These days the ` and ' do not look symmetric. This quoting style has been changed in various apps over the years including the GNU tools. 
INSTALL | 6 +++--- doc/examples/01_compress_easy.c | 2 +- doc/examples/11_file_info.c | 16 ++++++++-------- 3 files changed, 12 insertions(+), 12 deletions(-) commit db17656721e43939bfa4ec13506e7c76f4b86da6 Author: Jia Tan Date: 2023-09-24 21:25:01 +0800 lib: Silence -Wsign-conversion in getopt.c. lib/getopt.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit a6234f677d66888f435010bc0b67de6a32fefcf6 Author: Jia Tan Date: 2023-09-24 20:48:52 +0800 Build: Update getopt.m4 from Gnulib. This file was modified from upstream since we do not need to replace getopt() and can avoid complexity and feature tests. m4/getopt.m4 | 79 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 39 insertions(+), 40 deletions(-) commit 84808b68f1075e8603a8ef95d361a61fdc6a5b10 Author: Jia Tan Date: 2023-09-26 00:09:53 +0800 CMake: Add /lib to include path. CMakeLists.txt | 5 +++++ 1 file changed, 5 insertions(+) commit 01804a0b4b64e0f33568e947e0579263808c59d3 Author: Jia Tan Date: 2023-09-24 20:36:34 +0800 CMake: Update libgnu target with new header files. CMakeLists.txt | 5 +++++ 1 file changed, 5 insertions(+) commit d34558388fe1d8929f6478d61dc322eb4f2900af Author: Jia Tan Date: 2023-09-23 00:47:52 +0800 lib: Update Makefile.am for new header files. lib/Makefile.am | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) commit 52bf644bdf536e20fcc743b712cede135e05eec5 Author: Jia Tan Date: 2023-09-24 20:34:03 +0800 lib: Update getopt1.c from Gnulib. The only difference was maintaining the conditional inclusion for config.h. lib/getopt1.c | 56 ++++++++++++++++++++++---------------------------------- 1 file changed, 22 insertions(+), 34 deletions(-) commit 7e884c00d0093c38339f17fb1d280eec493f42ca Author: Jia Tan Date: 2023-09-23 03:27:00 +0800 lib: Update getopt.in.h from Gnulib with modifications. We can still avoid modifying the contents of this file during configuration to simplify the build systems. 
Gnulib added replacements for inclusion guards for Cygwin. Cygwin should not need getopt_long replacement so this feature can be omitted. is conditionally included to avoid MSVC since it is not available. The definition for _GL_ARG_NONNULL was also copied into this file from Gnulib since this stage is usually done during gnulib-tool. lib/getopt.in.h | 228 +++++++------------------------------------------------- 1 file changed, 29 insertions(+), 199 deletions(-) commit cff05f82066ca3ce9425dafdb086325a8eef8de3 Author: Jia Tan Date: 2023-09-23 00:31:55 +0800 lib: Update getopt_int.h from Gnulib. lib/getopt_int.h | 109 ++++++++++++++++++++++++------------------------------- 1 file changed, 48 insertions(+), 61 deletions(-) commit 04bd86a4b010d43c6a016a3857ecb38dc1d5b024 Author: Jia Tan Date: 2023-09-23 00:27:23 +0800 lib: Update getopt.c from Gnulib with modifications. The code maintains the prior modifications of conditionally including config.h and disabling NLS support. _GL_UNUSED is replaced with the simple cast to void trick. _GL_UNUSED is only used for these two parameters so it's simpler than having to define it. lib/getopt.c | 1134 +++++++++++++++++++--------------------------------------- 1 file changed, 377 insertions(+), 757 deletions(-) commit 56b42be7287844db20b3a3bc1372c6ae8c040d63 Author: Jia Tan Date: 2023-09-23 00:18:56 +0800 lib: Add getopt-cdefs.h for getopt_long update. This was modified slightly from Gnulib. In Gnulib, it expects the @HAVE_SYS_CDEFS_H@ to be replaced. Instead, we can set HAVE_SYS_CDEFS_H on systems that have it and avoid copying another file into the build directory. Since we are not using gnulib-tool, copying extra files requires extra build system updates (and special handling with CMake) so we should avoid it when possible.
lib/getopt-cdefs.h | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) commit 9834e591a4cf9dc2f49e42e26bf28d1d247bc196 Author: Jia Tan Date: 2023-09-23 00:15:25 +0800 lib: Copy new header files from Gnulib without modification. The getopt related files have changed from Gnulib by splitting up getopt.in.h into more modular header files. We could have kept everything in just getopt.in.h, but this will help us continue to update in the future. lib/getopt-core.h | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++ lib/getopt-ext.h | 77 +++++++++++++++++++++++++++++++++++++++++ lib/getopt-pfx-core.h | 66 +++++++++++++++++++++++++++++++++++ lib/getopt-pfx-ext.h | 70 +++++++++++++++++++++++++++++++++++++ 4 files changed, 309 insertions(+) commit 5b7a6f06e93d99d6635a740fd2e12fab66096c93 Author: Lasse Collin Date: 2023-09-22 21:16:52 +0300 Windows: Update the version requirement comments from Win95 to W2k. windows/README-Windows.txt | 10 ++++------ windows/build.bash | 6 +++--- 2 files changed, 7 insertions(+), 9 deletions(-) commit e582f8e0fee46e7cd967f42f465d6bb608b73bc1 Author: Lasse Collin Date: 2023-09-22 21:12:54 +0300 tuklib_physmem: Comment out support for Windows versions older than 2000. src/common/tuklib_physmem.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) commit 7d73d1f0e08f96c4ab7aac91b958e37a3dadf07a Author: Lasse Collin Date: 2023-09-24 16:32:32 +0300 sysdefs.h: Update the comment about __USE_MINGW_ANSI_STDIO. src/common/sysdefs.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) commit 2a9929af0ab7e6c0ab725565034afe3293e51d71 Author: Lasse Collin Date: 2023-09-22 02:33:29 +0300 xz: Windows: Don't (de)compress to special files like "con" or "nul". 
Before this commit, the following writes "foo" to the console and deletes the input file: echo foo | xz > con_xz xz --suffix=_xz --decompress con_xz It cannot happen without --suffix because names like con.xz are also special and so attempting to decompress con.xz (or compress con to con.xz) will already fail when opening the input file. Similar thing is possible when compressing. The following writes to "nul" and the input file "n" is deleted. echo foo | xz > n xz --suffix=ul n Now xz checks if the destination is a special file before continuing. DOS/DJGPP version had a check for this but Windows (and OS/2) didn't. src/xz/file_io.c | 35 ++++++++++++++++++++++++++++------- 1 file changed, 28 insertions(+), 7 deletions(-) commit 01311b81f03cce1c0ce847a3d556f84dbd439343 Author: Lasse Collin Date: 2023-09-21 20:42:52 +0300 CMake: Wrap two overlong lines that are possible to wrap. CMakeLists.txt | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) commit 152d0771ddd0cffcac9042ad1a66f110d228eee2 Author: Lasse Collin Date: 2023-09-21 20:36:31 +0300 CMake: Add a comment about threads on Cygwin. CMakeLists.txt | 1 + 1 file changed, 1 insertion(+) commit 6df988cceffaa3100b428ed816fad334935b27bf Author: Lasse Collin Date: 2023-09-12 23:53:25 +0300 MSVC: Remove Visual Studio project files and update INSTALL-MSVC.txt. CMake is now the preferred build file generator when building with MSVC. 
windows/INSTALL-MSVC.txt | 37 ++-- windows/vs2013/config.h | 157 --------------- windows/vs2013/liblzma.vcxproj | 363 --------------------------------- windows/vs2013/liblzma_dll.vcxproj | 398 ------------------------------------ windows/vs2013/xz_win.sln | 48 ----- windows/vs2017/config.h | 157 --------------- windows/vs2017/liblzma.vcxproj | 363 --------------------------------- windows/vs2017/liblzma_dll.vcxproj | 398 ------------------------------------ windows/vs2017/xz_win.sln | 48 ----- windows/vs2019/config.h | 157 --------------- windows/vs2019/liblzma.vcxproj | 364 --------------------------------- windows/vs2019/liblzma_dll.vcxproj | 399 ------------------------------------- windows/vs2019/xz_win.sln | 51 ----- 13 files changed, 12 insertions(+), 2928 deletions(-) commit edd563daf0da1d00018684614803c77ab62efcd6 Author: Lasse Collin Date: 2023-09-21 19:17:40 +0300 CMake: Require VS2015 or later for building xzdec. xzdec might build with VS2013 but it hasn't been tested. It was never supported before and VS2013 is old anyway so for simplicity only liblzma is supported with VS2013. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit daea64d158a7151ca6c255a0e4554c6d521cd589 Author: Lasse Collin Date: 2023-09-12 23:43:49 +0300 CMake: Allow building xz with Visual Studio 2015 and later. Building the command line tools xz and xzdec with the combination of CMake + Visual Studio 2015/2017/2019/2022 works now. VS2013 update 2 should still be able to build liblzma. VS2013 cannot build the xz command line tool because xz needs snprintf() that roughly conforms to C99. VS2013 is old and no extra code will be added to support it. Thanks to Kelvin Lee and Jia Tan for testing. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 8c2d197c940d246849b2ec48109bb22e54036927 Author: Lasse Collin Date: 2023-09-12 23:34:31 +0300 MSVC: #define inline and restrict only when needed. 
This also drops the check for _WIN32 as that shouldn't be needed. src/common/sysdefs.h | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) commit af66cd585902045e5689a0418103ec81f19f1d0a Author: Lasse Collin Date: 2023-09-12 22:16:56 +0300 CMake: Add support for replacement getopt_long (lib/getopt*). Thanks to Jia Tan for the initial work. I added the libgnu target and made a few related minor edits. CMakeLists.txt | 54 +++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 47 insertions(+), 7 deletions(-) commit e3288fdb45c580cb849f6799cf419c4922004ae5 Author: Lasse Collin Date: 2023-09-12 21:12:34 +0300 CMake: Bump maximum policy version to 3.27. There are several new policies. CMP0149 may affect the Windows SDK version that CMake will choose by default. The new behavior is more predictable, always choosing the latest SDK version by default. The other new policies shouldn't affect this package. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit aff1b479c7b168652bd20305ceed4317d5db6661 Author: Lasse Collin Date: 2023-09-12 20:55:10 +0300 lib/getopt*.c: Include <config.h> only if HAVE_CONFIG_H is defined. The CMake-based build doesn't use config.h. Up-to-date getopt_long in Gnulib is LGPLv2 so at some point it could be included in XZ Utils too but for now this commit is enough to make CMake-based build possible. lib/getopt.c | 4 +++- lib/getopt1.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) commit aa0cd585d2ed1455d35732798e0d90e3520e8ba5 Author: Lasse Collin Date: 2023-09-08 19:08:57 +0300 Doxygen: Add more C macro names to PREDEFINED. doxygen/Doxyfile | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) commit ee7709bae53637e1765ce142ef102914f1423cb5 Author: Lasse Collin Date: 2023-09-11 18:47:26 +0300 liblzma: Move a few __attribute__ uses in function declarations. The API headers have many attributes but these were left as is for now.
src/liblzma/common/common.c | 6 ++++-- src/liblzma/common/common.h | 8 ++++---- src/liblzma/common/memcmplen.h | 3 ++- 3 files changed, 10 insertions(+), 7 deletions(-) commit 217958d88713b5dc73d366d24dd64b2b311b86fe Author: Lasse Collin Date: 2023-09-11 19:03:35 +0300 xz, xzdec, lzmainfo: Use tuklib_attr_noreturn. For compatibility with C23's [[noreturn]], tuklib_attr_noreturn must be at the beginning of declaration (before "extern" or "static", and even before any GNU C's __attribute__). This commit also moves all other function attributes to the beginning of function declarations. "extern" is kept at the beginning of a line so the attributes are listed on separate lines before "extern" or "static". src/lzmainfo/lzmainfo.c | 6 ++++-- src/xz/coder.c | 3 ++- src/xz/hardware.h | 3 ++- src/xz/message.h | 30 +++++++++++++++++------------- src/xz/options.c | 3 ++- src/xz/util.h | 8 ++++---- src/xzdec/xzdec.c | 9 ++++++--- 7 files changed, 37 insertions(+), 25 deletions(-) commit 18a66fbac031c98f9c2077fc88846e4d07849197 Author: Lasse Collin Date: 2023-09-11 18:53:31 +0300 Remove incorrect uses of __attribute__((__malloc__)). xrealloc() is obviously incorrect, modern GCC docs even mention realloc() as an example where this attribute cannot be used. liblzma's lzma_alloc() and lzma_alloc_zero() would be correct uses most of the time but custom allocators may use a memory pool or otherwise hold the pointer so aliasing issues could happen in theory. The xstrdup() case likely was correct but I removed it anyway. Now there are no __malloc__ attributes left in the code. The allocations aren't in hot paths so this should make no practical difference. src/liblzma/common/common.c | 4 ++-- src/liblzma/common/common.h | 4 ++-- src/xz/util.h | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) commit 74b0e900c92d5b222b36f474f1efa431f8e262f7 Author: Lasse Collin Date: 2023-09-08 18:41:25 +0300 Build: Omit -Wc99-c11-compat since it warns about _Noreturn. 
configure.ac | 1 - 1 file changed, 1 deletion(-) commit 90c94dddfd57b7d744bfad64c54e10d15778144b Author: Lasse Collin Date: 2023-09-08 18:19:26 +0300 tuklib: Update tuklib_attr_noreturn for C11/C17 and C23. This makes no difference for GCC or Clang as they support GNU C's __attribute__((__noreturn__)) but this helps with MSVC: - VS 2019 version 16.7 and later support _Noreturn if the options /std:c11 or /std:c17 are used. This gets handled with the check for __STDC_VERSION__ >= 201112. - When MSVC isn't in C11/C17 mode, __declspec(noreturn) is used. C23 will deprecate _Noreturn (and <stdnoreturn.h>) for [[noreturn]]. This commit anticipates that but the final __STDC_VERSION__ value isn't known yet. src/common/tuklib_common.h | 22 +++++++++++++++++++++- src/common/tuklib_exit.h | 4 ++-- 2 files changed, 23 insertions(+), 3 deletions(-) commit 189f72581329ab281ad6af37f60135910cb1b146 Author: Lasse Collin Date: 2023-09-11 17:22:44 +0300 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 79334e7f20f2bf9e0de095835b48868f1238f584 Author: Lasse Collin Date: 2023-09-05 22:42:10 +0300 MSVC: xz: Make file_io.c and file_io.h compatible with MSVC. Thanks to Kelvin Lee for the original patches and testing the modifications I made. src/xz/file_io.c | 26 ++++++++++++++++++++++++++ src/xz/file_io.h | 10 ++++++++++ 2 files changed, 36 insertions(+) commit c660b8d78b7bda43b12b285550d8c70e8ccec698 Author: Lasse Collin Date: 2023-09-05 21:33:35 +0300 MSVC: xz: Use GetTickCount64() to implement mytime_now(). It's available since Windows Vista. src/xz/mytime.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) commit 5c6f892d411670e3060f4bc309402617a209e57c Author: Kelvin Lee Date: 2023-09-05 15:05:09 +0300 MSVC: xz: Use _stricmp() instead of strcasecmp() in suffix.c.
src/xz/suffix.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) commit e241051f50044259d174e8b4633dd9a1c4478408 Author: Kelvin Lee Date: 2023-09-05 15:01:10 +0300 MSVC: xz: Use _isatty() from <io.h> to implement isatty(). src/xz/message.c | 5 +++++ src/xz/util.c | 5 +++++ 2 files changed, 10 insertions(+) commit d14bba8fc2be02a9fed8c9bcaaf61103451755f8 Author: Kelvin Lee Date: 2023-09-05 15:10:31 +0300 MSVC: xz: Use _fileno() instead of fileno(). src/xz/private.h | 4 ++++ 1 file changed, 4 insertions(+) commit c4edd367678e6a38c42b149856159bf417da7fe1 Author: Kelvin Lee Date: 2023-09-05 15:00:07 +0300 MSVC: xzdec: Use _fileno and _setmode. src/xzdec/xzdec.c | 4 ++++ 1 file changed, 4 insertions(+) commit cfd1054b9b539ee92524901e95d7bb5a1fe670a0 Author: Kelvin Lee Date: 2023-09-05 14:37:50 +0300 MSVC: Don't #include <unistd.h>. lib/getopt.c | 4 +++- lib/getopt.in.h | 4 +++- src/xz/private.h | 5 ++++- src/xzdec/xzdec.c | 5 ++++- 4 files changed, 14 insertions(+), 4 deletions(-) commit adef92f23563a2cc088b31ddee9040ecc96bc996 Author: Lasse Collin Date: 2023-09-19 14:03:45 +0300 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 953e775941a25bfcfa353f802b13e66acb1edf2c Author: Jia Tan Date: 2023-09-14 21:13:23 +0800 CI: Enable CLMUL in address sanitization test. The crc64_clmul() function should be ignored by the address sanitizer now so these builds should still pass. .github/workflows/ci.yml | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) commit f167e79bc98f3f56af2e767b83aa81c2d2b9ed77 Author: Lasse Collin Date: 2023-09-14 16:35:46 +0300 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 4f44ef86758a41a8ec814096f4cb6ee6de04c82e Author: Lasse Collin Date: 2023-09-14 16:34:07 +0300 liblzma: Mark crc64_clmul() with __attribute__((__no_sanitize_address__)). Thanks to Agostino Sarubbo.
Fixes: https://github.com/tukaani-project/xz/issues/62 src/liblzma/check/crc64_fast.c | 8 ++++++++ 1 file changed, 8 insertions(+) commit 7379bb3eed428c0ae734d0cc4a1fd04359d53f08 Author: Jia Tan Date: 2023-09-12 22:36:12 +0800 CMake: Fix time.h checks not running on second CMake run. If CMake was configured more than once, HAVE_CLOCK_GETTIME and HAVE_CLOCK_MONOTONIC would not be set as compile definitions. The check for librt being needed to provide HAVE_CLOCK_GETTIME was also simplified. CMakeLists.txt | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) commit 5d691fe58286b92d704c0dc5cd0c4df22881c6c6 Author: Jia Tan Date: 2023-09-12 22:34:06 +0800 CMake: Fix unconditionally defining HAVE_CLOCK_MONOTONIC. If HAVE_CLOCK_GETTIME was defined, then HAVE_CLOCK_MONOTONIC was always added as a compile definition even if the check for it failed. CMakeLists.txt | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) commit eccf12866527b8d24c7d7f92f755142be8ef9b11 Author: Lasse Collin Date: 2023-08-31 19:50:05 +0300 xz: Refactor thousand separator detection and disable it on MSVC. Now the two variations of the format strings are created with a macro, and the whole detection code can be easily disabled on platforms where thousand separator formatting is known to not work (MSVC has no support, and on DJGPP 2.05 it can have problems in some cases). src/xz/util.c | 89 ++++++++++++++++++++++++++++++----------------------------- 1 file changed, 45 insertions(+), 44 deletions(-) commit f7093cd9d130477c234b40aeda613964171f8f21 Author: Lasse Collin Date: 2023-08-31 18:14:43 +0300 xz: Fix a too relaxed assertion and remove uses of SSIZE_MAX. SSIZE_MAX isn't readily available on MSVC. Removing it means that there is one less thing to worry about when porting to MSVC.
src/xz/file_io.c | 5 ++--- src/xz/file_io.h | 4 ++-- 2 files changed, 4 insertions(+), 5 deletions(-) commit 74c3449d8b816a724b12ebce7417e00fb597309a Author: Jia Tan Date: 2023-08-28 23:14:45 +0800 Tests: Improve invalid unpadded size check in test_lzma_index_append(). This check was extended to test the code added to fix a failing assert in ae5c07b22a6b3766b84f409f1b6b5c100469068a. tests/test_index.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) commit 2544274a8b8a27f4ea6c457d2c4c32eb1e4cd336 Author: Jia Tan Date: 2023-08-28 21:54:41 +0800 Tests: Improve comments in test_index.c. tests/test_index.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 49be29d6380b94e6fb26e511dd2cdbd9afce0f8b Author: Jia Tan Date: 2023-08-28 21:52:54 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 721e3d9f7a82f59f32795d5fb97e0210d1aa839a Author: Jia Tan Date: 2023-08-28 21:50:16 +0800 liblzma: Update assert in vli_ceil4(). The argument to vli_ceil4() should always guarantee the return value is also a valid lzma_vli. Thus the highest three valid lzma_vli values are invalid arguments. All uses of the function ensure this so the assert is updated to match this. src/liblzma/common/index.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit ae5c07b22a6b3766b84f409f1b6b5c100469068a Author: Jia Tan Date: 2023-08-28 21:31:25 +0800 liblzma: Add overflow check for Unpadded size in lzma_index_append(). This was not a security bug since there was no path to overflow UINT64_MAX in lzma_index_append() or when it calls index_file_size(). The bug was discovered by a failing assert() in vli_ceil4() when called from index_file_size() when unpadded_sum (the sum of the compressed size of current Stream and the unpadded_size parameter) exceeds LZMA_VLI_MAX. Previously, the unpadded_size parameter was checked to be not greater than UNPADDED_SIZE_MAX, but no check was done once compressed_base was added. 
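The check described above can be sketched roughly as follows. This is only an illustration, not the code from src/liblzma/common/index.c: the constants are local stand-ins assumed to mirror liblzma's LZMA_VLI_MAX, UNPADDED_SIZE_MIN, and UNPADDED_SIZE_MAX, and the names sketch_vli_ceil4() and sketch_unpadded_sum_ok() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Local stand-ins assumed to mirror the liblzma definitions. They are
// written out by hand for illustration only.
typedef uint64_t vli_t;
#define VLI_MAX (UINT64_MAX / 2)
#define UNPADDED_SIZE_MIN UINT64_C(5)
#define UNPADDED_SIZE_MAX (VLI_MAX & ~UINT64_C(3))

// Round up to the next multiple of four. The three highest valid VLI
// values are rejected because rounding them up would exceed VLI_MAX.
static inline vli_t
sketch_vli_ceil4(vli_t vli)
{
	assert(vli <= UNPADDED_SIZE_MAX);
	return (vli + 3) & ~UINT64_C(3);
}

// Hypothetical helper: returns true if unpadded_size may be appended
// when compressed_base bytes are already accounted for in the Stream.
static inline bool
sketch_unpadded_sum_ok(vli_t compressed_base, vli_t unpadded_size)
{
	// The check on the parameter alone, as done before the fix.
	if (unpadded_size < UNPADDED_SIZE_MIN
			|| unpadded_size > UNPADDED_SIZE_MAX)
		return false;

	// Both operands are at most VLI_MAX, which is half of UINT64_MAX,
	// so the addition below cannot wrap around.
	if (compressed_base > VLI_MAX)
		return false;

	// The additional check on the sum: it must also be a valid
	// argument for sketch_vli_ceil4().
	return compressed_base + unpadded_size <= UNPADDED_SIZE_MAX;
}
```

With a check of this shape in place before the sum is rounded up, the assertion in the rounding helper cannot fire for a sum that was accepted.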
This could not have caused an integer overflow in index_file_size() when called by lzma_index_append(). The calculation for file_size breaks down into the sum of: - Compressed base from all previous Streams - 2 * LZMA_STREAM_HEADER_SIZE (size of the current Stream's header and footer) - stream_padding (can be set by lzma_index_stream_padding()) - Compressed base from the current Stream - Unpadded size (parameter to lzma_index_append()) The sum of everything except for Unpadded size must be less than LZMA_VLI_MAX. This is guaranteed by overflow checks in the functions that can set these values including lzma_index_stream_padding(), lzma_index_append(), and lzma_index_cat(). The maximum value for Unpadded size is enforced by lzma_index_append() to be less than or equal to UNPADDED_SIZE_MAX. Thus, the sum cannot exceed UINT64_MAX since LZMA_VLI_MAX is half of UINT64_MAX. Thanks to Joona Kannisto for reporting this. src/liblzma/common/index.c | 6 ++++++ 1 file changed, 6 insertions(+) commit 1057765aaabfe0f1397b8094531846655376ae38 Author: Jia Tan Date: 2023-08-28 22:18:29 +0800 Translations: Update the Esperanto translation. po/eo.po | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit f2e94d064f305bb8ad77ca70f91d93e55f5cf856 Author: Jia Tan Date: 2023-08-26 20:10:23 +0800 Translations: Update the Esperanto translation. po/eo.po | 47 +++++++++++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 18 deletions(-) commit 2b871f4dbffe3801d0da3f89806b5935f758d5f3 Author: Jia Tan Date: 2023-08-09 20:55:36 +0800 Docs: Update INSTALL for --enable-threads method win95. The Autotools build allows win95 threads and --enable-small together now if the compiler supports __attribute__((__constructor__)). INSTALL | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) commit 356ad5b26b4196f085ce3afa1869154ca81faad8 Author: Jia Tan Date: 2023-08-09 20:54:15 +0800 CMake: Conditionally allow win95 threads and --enable-small.
CMakeLists.txt | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-) commit de574404c4c2f87aca049f232c38526e3ce092aa Author: Jia Tan Date: 2023-08-09 20:35:16 +0800 Build: Conditionally allow win95 threads and --enable-small. When the compiler supports __attribute__((__constructor__)) mythread_once() is never used, even with --enable-small. A configuration with win95 threads and --enable-small will compile and be thread safe so it can be allowed. This isn't a very common configuration since MSVC does not support __attribute__((__constructor__)), but MINGW32 and CLANG32 environments for MSYS2 can use win95 threads and have __attribute__((__constructor__)) support. configure.ac | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) commit 6bf33b704cd31dccf25e68480464aa22d3fcad5a Author: Jamaika1 Date: 2023-08-08 14:07:59 +0200 mythread.h: Fix typo error in Vista threads mythread_once(). The "once_" variable was accidentally referred to as just "once". This prevented building with Vista threads when HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR was not defined. src/common/mythread.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 80cb961e5380a3878246d41341ff91378ca59e05 Author: Jia Tan Date: 2023-08-04 22:17:11 +0800 codespell: Add .codespellrc to set default options. The .codespellrc allows setting default options to avoid false positive matches, set additional dictionaries, etc. For now, codespell can be used locally before committing doc and comment changes. It should help prevent silly errors and fix up commits in the future. .codespellrc | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) commit cd678a6077358935249b64a4a16fe8d17434f9c9 Author: Jia Tan Date: 2023-08-03 20:10:21 +0800 Tests: Style fixes to test_lzip_decoder.c. 
tests/test_lzip_decoder.c | 36 ++++++++++++++++++++++++------------ 1 file changed, 24 insertions(+), 12 deletions(-) commit 1cac5ed4fa45c9861d745b02d80575cb2ff01d81 Author: Jia Tan Date: 2023-08-03 15:56:20 +0800 Translations: Update the Chinese (simplified) translation. po/zh_CN.po | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 16068f6c30b888cdb873f6285af941d00f95741d Author: Lasse Collin Date: 2023-08-02 17:15:12 +0300 xz: Omit an empty paragraph on the man page. src/xz/xz.1 | 1 - 1 file changed, 1 deletion(-) commit 9ae4371b5106189486e850ce777e40f7b6021c0b Author: Jia Tan Date: 2023-08-02 20:30:07 +0800 Add NEWS for 5.4.4. NEWS | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) commit e8c2203b2c76466d8d3387c5212b46151de8e605 Author: Lasse Collin Date: 2023-08-02 15:19:43 +0300 build-aux/manconv.sh: Fix US-ASCII and UTF-8 output. groff defaults to SGR escapes. Using -P-c passes -c to grotty which restores the old behavior. Perhaps there is a better way to get pure plain text output but this works for now. build-aux/manconv.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 9a706167b0d903d92fd134895acb4bc6a5e3e688 Author: Lasse Collin Date: 2023-08-01 19:10:43 +0300 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 33e25a0f5650754c38bed640deedefe3b4fec5ef Author: Lasse Collin Date: 2023-08-01 18:22:24 +0300 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 81db3b889830132334d1f2129bdc93177ac2ca7d Author: ChanTsune <41658782+ChanTsune@users.noreply.github.com> Date: 2023-08-01 18:17:17 +0300 mythread.h: Disable signal functions in builds targeting Wasm + WASI. signal.h in WASI SDK doesn't currently provide sigprocmask() or sigset_t. liblzma doesn't need them so this change makes liblzma and xzdec build against WASI SDK. xz doesn't build yet and the tests don't either as tuktest needs setjmp() which isn't (yet?) implemented in WASI SDK. 
Closes: https://github.com/tukaani-project/xz/pull/57 See also: https://github.com/tukaani-project/xz/pull/56 (The original commit was edited a little by Lasse Collin.) src/common/mythread.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 71c638c611324e606d324c8189fef8fe79db6991 Author: Jia Tan Date: 2023-08-01 21:58:51 +0800 Add newline to end of .gitignore. Newline was accidentally removed in commit 01cbb7f023ee7fda8ddde04bd17cf7d3c2418706. .gitignore | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 42df7c7aa1cca385e509eb33c65136e61890f0bf Author: Dimitri Papadopoulos Orfanos <3234522+DimitriPapadopoulos@users.noreply.github.com> Date: 2023-07-31 14:02:21 +0200 Docs: Fix typos found by codespell CMakeLists.txt | 4 ++-- NEWS | 2 +- configure.ac | 2 +- src/liblzma/api/lzma/container.h | 4 ++-- src/liblzma/api/lzma/filter.h | 2 +- src/liblzma/api/lzma/lzma12.h | 4 ++-- src/liblzma/common/block_buffer_encoder.c | 2 +- src/liblzma/common/common.h | 2 +- src/liblzma/common/file_info.c | 2 +- src/liblzma/common/lzip_decoder.c | 2 +- src/liblzma/common/stream_decoder_mt.c | 8 ++++---- src/liblzma/common/string_conversion.c | 6 +++--- src/liblzma/lz/lz_encoder.h | 2 +- src/liblzma/lzma/lzma_encoder.c | 4 ++-- src/xz/hardware.c | 4 ++-- tests/test_filter_flags.c | 4 ++-- tests/test_index.c | 2 +- tests/test_vli.c | 2 +- 18 files changed, 29 insertions(+), 29 deletions(-) commit 01cbb7f023ee7fda8ddde04bd17cf7d3c2418706 Author: Jia Tan Date: 2023-07-26 20:26:23 +0800 Update .gitignore. .gitignore | 4 ++++ 1 file changed, 4 insertions(+) commit f97a1afd564c48ad9cb94682e10972a72e11fa08 Author: Jia Tan Date: 2023-07-28 22:03:08 +0800 CMake: Conditionally allow the creation of broken symlinks. The CMake build will try to create broken symlinks on Unix and Unix-like platforms. Cygwin and MSYS2 are Unix-like, but may not be able to create broken symlinks. The value of the CYGWIN or MSYS environment variable determines if broken symlinks are valid.
The default for MSYS2 does not allow for broken symlinks, so the CMake build has been broken for MSYS2 since commit 80a1a8bb838842a2be343bd88ad1462c21c5e2c9. CMakeLists.txt | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 75 insertions(+), 7 deletions(-) commit 7190f4cc7c9ade5b9b3675d0cbfa3b6d6ec9cb4f Author: Jia Tan Date: 2023-07-28 21:56:48 +0800 CI: Fix windows-ci dependency installation. All of the MSYS2 environments need make, and it does not come with the toolchain package. The toolchain package will install the needed compiler toolchains since without this package CMake cannot properly generate the Makefiles. .github/workflows/windows-ci.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit a048f472cd9a2245265cb292853cbbcdd4f02001 Author: Jia Tan Date: 2023-07-28 21:54:22 +0800 CI: Update ci_build.sh CMake to always make Unix Makefiles. The default for many of the MSYS2 environments is for CMake to create Ninja build files. This would complicate the build script since we would need a different command to run the tests. It's simpler to always use Unix Makefiles so that "make test" is always a usable target for testing. build-aux/ci_build.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 7870396a0ca945473aa0d1d790f4cbef456610bd Author: Jia Tan Date: 2023-07-25 20:17:23 +0800 CI: Test CMake builds and test framework with MSYS2. .github/workflows/windows-ci.yml | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-) commit 6497d1f8875cb7e3007f714336cc09c06fed235b Author: Jia Tan Date: 2023-07-25 20:14:53 +0800 CI: Windows CI rename system matrix variable -> msys2_env. Calling the MSYS2 environment "system" was a bit vague and should be more specific.
.github/workflows/windows-ci.yml | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) commit 785e4121d9b2921ad36bd3af1cf61fa20a9265bd Author: Jia Tan Date: 2023-07-24 23:11:45 +0800 CI: Add Clang64 MSYS2 environment to Windows CI. .github/workflows/windows-ci.yml | 1 + 1 file changed, 1 insertion(+) commit d9166b52cf3458a4da3eb92224837ca8fc208d79 Author: Jia Tan Date: 2023-07-24 21:43:44 +0800 liblzma: Prevent an empty translation unit in Windows builds. To work around Automake lacking Windows resource compiler support, an empty source file is compiled to overwrite the resource files for static library builds. Translation units without an external declaration are not allowed by the C standard and result in a warning when used with -Wempty-translation-unit (Clang) or -pedantic (GCC). src/liblzma/Makefile.am | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) commit db5019d691f980d622fb56fdcf383af2c3519c98 Author: Jia Tan Date: 2023-07-22 18:37:56 +0800 Translations: Update the Vietnamese translation. po/vi.po | 45 ++++++++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 17 deletions(-) commit f3a055f762ba5b71b746fc2d44a6ababde2c61b5 Author: Jia Tan Date: 2023-07-22 14:55:42 +0800 CI: Add Windows runner for Autotools builds with MSYS2. Only a subset of the tests run by the Linux and MacOS Autotools builds are run. The most interesting tests are the ones that disable threads, encoders, and decoders. The Windows runner will only be run manually since these tests will likely take much longer than the Linux and MacOS runners. This runner should be used before merging any large features and before releases. Currently the clang64 environment fails due to a warning and -Werror is enabled for the CI tests. This is still an early version since the CMake build can be done for MSVC and optionally each of the MSYS2 environments.
GitHub does not allow manually running the CI tests unless the workflow is checked in on the default branch so checking in a minimum version is a good idea. Thanks to Arthur S for proposing the original patch. Closes: https://github.com/tukaani-project/xz/pull/34 .github/workflows/windows-ci.yml | 119 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 119 insertions(+) commit 556536a3525df9e5ed78b8c7057991cfa9edfac8 Author: Jia Tan Date: 2023-07-21 22:11:01 +0800 CI: Add argument to ci_build.sh to pass flags to autogen.sh. build-aux/ci_build.sh | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) commit 39a32d36fc465c4e70f13192eea380e518ba6e8a Author: Jia Tan Date: 2023-07-21 18:05:44 +0800 Tests: Skip .lz files in test_files.sh if not configured. Previously if the lzip decoder was not configured then test_files.sh would pass the lzip tests instead of skipping them. tests/test_files.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 194d12724b30fe42789d12a0184f9d412c449347 Author: Jia Tan Date: 2023-07-20 22:11:13 +0800 Tests: Add ARM64 filter test to test_compress.sh. tests/test_compress.sh | 1 + 1 file changed, 1 insertion(+) commit d850365c444368102c69beaddf849ed463c33467 Author: Jia Tan Date: 2023-07-20 20:30:05 +0800 Translations: Update the Croatian translation. po/hr.po | 49 ++++++++++++++++++++++++++++++------------------- 1 file changed, 30 insertions(+), 19 deletions(-) commit 24049eb7acf6d42a60f00efe4e7289fe8e1797fe Author: Jia Tan Date: 2023-07-20 20:28:32 +0800 Translations: Update the Korean man page translations. po4a/ko.po | 1255 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 629 insertions(+), 626 deletions(-) commit 4d4a4fa07de6cb9d913fb2f97712fddda2527b49 Author: Jia Tan Date: 2023-07-20 20:25:24 +0800 Translations: Update the Korean translation.
po/ko.po | 45 ++++++++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 17 deletions(-) commit 237f06d9c55cf438a7538a598354bcf103f23711 Author: Jia Tan Date: 2023-07-20 20:24:05 +0800 Translations: Update the Polish translation. po/pl.po | 47 +++++++++++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 18 deletions(-) commit 80c2c832136656d5ac7a1bca8bc42d95e13d281a Author: Jia Tan Date: 2023-07-20 20:22:23 +0800 Translations: Update the German man page translations. po4a/de.po | 1255 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 629 insertions(+), 626 deletions(-) commit fdbde14503ca03069d3649aa51926f5f796b89d8 Author: Jia Tan Date: 2023-07-20 20:18:44 +0800 Translations: Update the German translation. po/de.po | 47 +++++++++++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 18 deletions(-) commit 9f3bf5ff5b2b5cf0b252a2bf381238ca49dc4101 Author: Jia Tan Date: 2023-07-20 20:17:10 +0800 Translations: Update the Chinese (simplified) translation. po/zh_CN.po | 47 +++++++++++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 18 deletions(-) commit 376938c588011567c74f1d5a160c0ccce6336d46 Author: Jia Tan Date: 2023-07-20 20:15:47 +0800 Translations: Update the Swedish translation. po/sv.po | 47 +++++++++++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 18 deletions(-) commit 26b0bc6eb82c84559936a7c7080de5c71c8276f8 Author: Jia Tan Date: 2023-07-20 20:14:00 +0800 Translations: Update the Ukrainian man page translations. po4a/uk.po | 1253 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 628 insertions(+), 625 deletions(-) commit 2d02c8b7640b54f3c5aa1c8b5990ba56f322393b Author: Jia Tan Date: 2023-07-20 20:09:15 +0800 Translations: Update the Ukrainian translation. 
po/uk.po | 45 ++++++++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 17 deletions(-) commit f881018b503fd334331c24a09075429558abbce1 Author: Jia Tan Date: 2023-07-20 20:06:57 +0800 Translations: Update the Spanish translation. po/es.po | 47 +++++++++++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 18 deletions(-) commit 791fe6d3ffd6877fa5f852be69d9251397dfaa31 Author: Jia Tan Date: 2023-07-20 20:05:19 +0800 Translations: Update the Romanian translation. po/ro.po | 48 ++++++++++++++++++++++++++++++------------------ 1 file changed, 30 insertions(+), 18 deletions(-) commit 8827e90704f699fe08bb5bed56b1717a2bc0eb77 Author: Jia Tan Date: 2023-07-20 20:02:56 +0800 Translations: Update the Romanian man page translations. po4a/ro.po | 1254 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 629 insertions(+), 625 deletions(-) commit 0184d344fa4f215cd345bb131db9068e077c69b8 Author: Jia Tan Date: 2023-07-19 23:36:00 +0800 liblzma: Suppress -Wunused-function warning. Clang 16.0.0 and earlier have a bug that the ifunc resolver function triggers the -Wunused-function warning. The resolver function is static and only "used" by the __attribute__((__ifunc()__)). At this time, the bug is still unresolved, but has been reported: https://github.com/llvm/llvm-project/issues/63957 This is not a problem in GCC. src/liblzma/check/crc64_fast.c | 10 ++++++++++ 1 file changed, 10 insertions(+) commit 43845fa70fc751736c44c18f4cee42d49bfd1392 Author: Jia Tan Date: 2023-07-18 22:52:25 +0800 liblzma: Reword lzma_str_list_filters() documentation. This further improves the documentation from commit f36ca7982f6bd5e9827219ed4f3c5a1fbf5d7bdf. The previous wording of "supported options" was slightly misleading since the options that are printed are the ones that are relevant for encoding/decoding. It is not about which options can or must be specified. 
src/liblzma/api/lzma/filter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 818701ba1c9dff780b7fbf28f9ab8eb11a25dd67 Author: Jia Tan Date: 2023-07-18 22:49:57 +0800 liblzma: Improve comment in string_conversion.c. The comment used "flag" when referring to decoder options. Just referring to them as options is more clear and consistent. src/liblzma/common/string_conversion.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit b6b7d065853cd4c3f5b8d9be8aea0b6dcb0fe090 Author: Lasse Collin Date: 2023-07-18 17:37:33 +0300 xz: Translate the second "%s: " in message.c since French needs "%s : ". This string is used to print a filename when using "xz -v" and stderr isn't a terminal. src/xz/message.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit be644042c3066d8e7a2834f989671ba74d27f749 Author: Lasse Collin Date: 2023-07-18 14:35:33 +0300 xz: Make "%s: %s" translatable because French needs "%s : %s". src/xz/args.c | 5 ++++- src/xz/coder.c | 8 ++++---- src/xz/file_io.c | 8 ++++---- src/xz/list.c | 11 ++++++----- 4 files changed, 18 insertions(+), 14 deletions(-) commit 97fd5cb669ee0afc48d2087675ab166aff89eaa2 Author: Lasse Collin Date: 2023-07-18 13:57:54 +0300 liblzma: Tweak #if condition in memcmplen.h. Maybe ICC always #defines _MSC_VER on Windows but now it's very clear which code will get used. src/liblzma/common/memcmplen.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 40392c19f71985852d75997f109dea97177d6f3f Author: Lasse Collin Date: 2023-07-18 13:49:43 +0300 liblzma: Omit unnecessary parenthesis in a preprocessor directive. src/liblzma/common/memcmplen.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit abc1d5601b7e419ebc28a1ab4b268613b52e6f98 Author: Jia Tan Date: 2023-07-18 00:51:48 +0800 xz: Update Authors list in a few files. 
src/xz/args.c | 3 ++- src/xz/args.h | 3 ++- src/xz/coder.c | 3 ++- src/xz/coder.h | 3 ++- src/xz/message.c | 3 ++- 5 files changed, 10 insertions(+), 5 deletions(-) commit 289034a168878baa9df6ff6e159110aade69cba5 Author: Jia Tan Date: 2023-07-14 23:20:33 +0800 Docs: Add a new section to INSTALL for Tests. The new Tests section describes basic information about the tests, how to run them, and important details when cross compiling. We have had a few questions about how to compile the tests without running them, so hopefully this information will help others with the same question in the future. Fixes: https://github.com/tukaani-project/xz/issues/54 INSTALL | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 64 insertions(+), 17 deletions(-) commit 1119e5f5a519b0ab71c81fc4dc84c0cc72abe513 Author: Jia Tan Date: 2023-07-14 21:10:27 +0800 Docs: Update README. This adds an entry to "Other implementations of the .xz format" for XZ for Java. README | 4 ++++ 1 file changed, 4 insertions(+) commit f99e2e4e53b7ea89e4eef32ddd4882e0416357c9 Author: Jia Tan Date: 2023-07-13 23:32:10 +0800 xz: Fix typo in man page. The Memory limit information section described three output columns when it actually has six. This was reworded to "multiple" to make it more future proof. src/xz/xz.1 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit f907705eb1f6c5edaafc9668a34c51a989932f1d Author: Jia Tan Date: 2023-07-13 21:46:12 +0800 xz: Minor clean up for coder.c * Moved max_block_list_size from a global to local variable. * Reworded error message in validate_block_list_filter(). * Removed helper function filter_chain_error(). * Changed 1 << X to 1U << X in many places src/xz/coder.c | 53 +++++++++++++++++++++-------------------------------- 1 file changed, 21 insertions(+), 32 deletions(-) commit 9adc9e56157ecbf2948e5036df8567809b9ae177 Author: Jia Tan Date: 2023-07-13 21:26:47 +0800 xz: Update man page Authors and date. 
src/xz/xz.1 | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) commit c12e429f2635da8d8f5749e5f733f451baca6945 Author: Jia Tan Date: 2023-06-20 20:32:59 +0800 xz: Add a section to man page for robot mode --filters-help. src/xz/xz.1 | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) commit e10f2db5d10300c16fa482a136ed31c1aa6e8e8d Author: Jia Tan Date: 2023-06-19 23:11:41 +0800 xz: Slight reword in xz man page for consistency. Changed will print => prints in xz --robot --version description to match --robot --info-memory description. src/xz/xz.1 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit f5dc172a402fa946f3c45a16929d7fe14c9f5e81 Author: Jia Tan Date: 2023-06-19 23:07:10 +0800 xz: Reorder robot mode subsections in the man page. The order is now consistent with the order the command line arguments are documented earlier in the man page. The new order is: 1. --list 2. --info-memory 3. --version Instead of the previous order: 1. --version 2. --info-memory 3. --list src/xz/xz.1 | 192 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 96 insertions(+), 96 deletions(-) commit 9628be23aef2784249fd9f3199799d785d2ec5cc Author: Jia Tan Date: 2023-05-13 00:46:50 +0800 xz: Update man page for new --filters-help option. src/xz/xz.1 | 10 ++++++++++ 1 file changed, 10 insertions(+) commit a165d7df1964121eb9df715e6f836a31c865beef Author: Jia Tan Date: 2023-05-13 00:44:41 +0800 xz: Add a new --filters-help option. The --filters-help can be used to help create filter chains with the --filters and --filtersX options. The message in --long-help is too short to fully explain the syntax to construct complex filter chains. In --robot mode, xz will only print the output from liblzma function lzma_str_list_filters. 
src/xz/args.c | 8 ++++++++ src/xz/message.c | 30 ++++++++++++++++++++++++++++++ src/xz/message.h | 5 +++++ 3 files changed, 43 insertions(+) commit 95f1a414b156ee35d3e71862a14915fdd138f913 Author: Jia Tan Date: 2023-04-21 20:28:11 +0800 xz: Update the man page for --block-list and --filtersX The --block-list option description needed updating since the new --filtersX option changes how it can be used. The new entry for --filters1=FILTERS ... --filters9=FILTERS was created right after the --filters option. src/xz/xz.1 | 106 +++++++++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 80 insertions(+), 26 deletions(-) commit 47a63cad2aa778280e0c1926b7159427ea028cb1 Author: Jia Tan Date: 2023-04-21 19:50:14 +0800 xz: Update --long-help for the new --filtersX option. src/xz/message.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) commit 8b9913a13daca2550d02dfdcdc9be15f55ca4d13 Author: Jia Tan Date: 2023-06-17 20:46:21 +0800 xz: Ignore filter chains that are set but never used in --block-list. If a filter chain is set but not used in --block-list, it introduced unexpected behavior such as requiring an unneeded amount of memory to compress, reducing the number of threads in multi-threaded encoding, and printing an incorrect amount of memory needed to decompress. This also renames filters_init_mask => filters_used_mask. A filter is assumed to be used if it is specified in --filtersX until coder_set_compression_settings() determines which filters are referenced in --block-list. src/xz/coder.c | 66 ++++++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 48 insertions(+), 18 deletions(-) commit 183819bfd9efac8c184d9bf123325719b7eee30f Author: Jia Tan Date: 2023-05-13 20:11:13 +0800 xz: Set the Block size for mt encoding correctly.
When opt_block_size is not used, the Block size for mt encoder is derived from the minimum of the largest Block specified by --block-list and the recommended Block size on all filter chains calculated by lzma_mt_block_size(). This avoids using unnecessary memory and ensures that all Blocks are large enough for the most memory needy filter chain. src/xz/coder.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 67 insertions(+), 1 deletion(-) commit afb2dbec3d857b026486b75e42a4728e12d234cb Author: Jia Tan Date: 2023-05-11 00:09:41 +0800 xz: Validate --flush-timeout for all specified filter chains. src/xz/coder.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) commit 5f0c5a04388f8334962c70bc37a8c2ff8f605e0a Author: Jia Tan Date: 2023-05-13 19:54:33 +0800 xz: Allows --block-list filters to scale down memory usage. Previously, only the default filter chain could have its memory usage adjusted. The filter chains specified with --filtersX were not checked for memory usage. Now, all used filter chains will be adjusted if necessary. src/xz/coder.c | 269 +++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 214 insertions(+), 55 deletions(-) commit 479fd58d60622331fcbe48fddf756927b9f80d9a Author: Jia Tan Date: 2023-05-10 21:50:33 +0800 xz: Do not include block splitting if encoders are disabled. The block splitting logic and split_block() function are not needed if encoders are disabled. This will help slightly reduce the binary size when built without encoders and allow split_block() to use functions that require encoders being enabled. src/xz/coder.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) commit f86ede22500f7ae024ec3ec3f3489ab5a857a3b3 Author: Jia Tan Date: 2023-05-10 22:38:59 +0800 xz: Free filters[] in debug mode. This will only free filter chains created with --filters1-9 since the default filter chain may be set from a static function variable. 
The complexity to free the default filter chain is not worth the burden on code maintenance. src/xz/coder.c | 10 ++++++++++ 1 file changed, 10 insertions(+) commit f281cd0d692ac0c70fc7669b80dddb863ea947e1 Author: Jia Tan Date: 2023-05-13 19:28:23 +0800 xz: Add a message if --block-list is used outside of xz compression. --block-list is only supported with compression in xz format. This avoids silently ignoring --block-list when it cannot be used. src/xz/args.c | 11 +++++++++++ 1 file changed, 11 insertions(+) commit d6af7f347077b22403133239592e478931307759 Author: Jia Tan Date: 2023-04-18 20:29:09 +0800 xz: Create command line options for filters[1-9]. The new command line options are meant to be combined with --block-list. They work as an optional extension to --block-list to specify a custom filter chain for each block listed. The new options allow the creation of up to 9 reusable filter chains. For instance: xz --block-list=1:10MiB,3:5MiB,,2:5MiB,1:0 --filters1=delta--lzma2 \ --filters2=x86--lzma2 --filters3=arm64--lzma2 Will create the following blocks: 1. A block of size 10 MiB with filter chain delta, lzma2. 2. A block of size 5 MiB with filter chain arm64, lzma2. 3. A block of size 5 MiB with filter chain arm64, lzma2. 4. A block of size 5 MiB with filter chain x86, lzma2. 5. A block containing the rest of the file contents with filter chain delta, lzma2. src/xz/args.c | 82 ++++++++++++++++++++++--- src/xz/coder.c | 188 ++++++++++++++++++++++++++++++++++++++++++--------------- src/xz/coder.h | 20 +++++- 3 files changed, 230 insertions(+), 60 deletions(-) commit 072d29250113268536719ad0e040ab8a66fb6435 Author: Jia Tan Date: 2023-05-13 19:36:09 +0800 xz: Use lzma_filters_free() in forget_filter_chain(). This is a little cleaner than the previous implementation of forget_filter_chain(). It is also more consistent since lzma_str_to_filters() will always terminate the filter chain so there is no need to terminate it later in coder_set_compression_settings().
src/xz/coder.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) commit 3d21da5cff4b511633cb6e0d8a1090485c0c1059 Author: Jia Tan Date: 2023-04-17 22:22:45 +0800 xz: Separate string to filter conversion into a helper function. Converting from string to filter will also need to be done for block specific filter chains. src/xz/coder.c | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) commit a6583726e5f950278f96abcf79c04f1056810be6 Author: Jia Tan Date: 2023-01-06 00:03:35 +0800 Tests: Use new --filters option in test_compress.sh tests/test_compress.sh | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) commit 5f3b898d07cc9b7160c7c88b3120b7edabb8a5b0 Author: Jia Tan Date: 2023-01-06 00:03:06 +0800 xz: Update --long-help and man page for new --filters option. src/xz/message.c | 6 ++++++ src/xz/xz.1 | 41 ++++++++++++++++++++++++++++++++++++----- 2 files changed, 42 insertions(+), 5 deletions(-) commit 9ded880a0221f4d1256845fc4ab957ffd377c760 Author: Jia Tan Date: 2023-01-06 00:02:29 +0800 xz: Add --filters option to CLI. The --filters option uses the new lzma_str_to_filters() function to convert a string into a full filter chain. Using this option will reset all previous filters set by --preset, --[filter], or --filters. src/xz/args.c | 9 +++++++-- src/xz/coder.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++-- src/xz/coder.h | 3 +++ 3 files changed, 58 insertions(+), 4 deletions(-) commit 2c189bb00af73dc7ba1a67a9d274d5be03ee3a88 Author: Jia Tan Date: 2023-07-14 21:30:25 +0800 Tests: Improve feature testing for skipping. Fixed a bug where test_compress_* would all fail if arm64 or armthumb filters were enabled for compression but arm was disabled. Since the grep tests only checked for "define HAVE_ENCODER_ARM", this would match on HAVE_ENCODER_ARM64 or HAVE_ENCODER_ARMTHUMB. Now the config.h feature test requires " 1" at the end to prevent the prefix problem. 
have_feature() was also updated for this even though there were no known current bugs affecting it. This is just in case future features have a similar prefix problem. tests/test_compress.sh | 4 ++-- tests/test_files.sh | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) commit 80a6b9bcad016c99c9ba3f3eeb4a619fcadfd357 Author: Jia Tan Date: 2023-07-10 20:56:28 +0800 Translations: Update the Chinese (traditional) translation. po/zh_TW.po | 659 ++++++++++++++++++++++++++++++++++-------------------------- 1 file changed, 377 insertions(+), 282 deletions(-) commit 17f8844e6fc355abf997d77637a7447c4f7bbcbd Author: Jia Tan Date: 2023-07-08 21:24:19 +0800 liblzma: Remove non-portable empty initializer. Commit 78704f36e74205857c898a351c757719a6c8b666 added an empty initializer {} to prevent a warning. The empty initializer is a GNU extension and results in a build failure on MSVC. The -Wpedantic flag warns about empty initializers. src/liblzma/common/stream_encoder_mt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 3aca4f629cd577f0c54f594d5d88722edf0b0413 Author: Jia Tan Date: 2023-07-08 20:03:59 +0800 Translations: Update the Vietnamese translation. po/vi.po | 620 +++++++++++++++++++++++++++++++++++---------------------------- 1 file changed, 349 insertions(+), 271 deletions(-) commit 66bdcfa85fef2911cc80f5f30fed3f9610faccb4 Author: Jia Tan Date: 2023-06-28 20:46:31 +0800 Tests: Fix memory leaks in test_index. Several tests were missing calls to lzma_index_end() to clean up the lzma_index structs. The memory leaks were discovered by using -fsanitize=address with GCC. tests/test_index.c | 11 +++++++++++ 1 file changed, 11 insertions(+) commit fe3bd438fb119f9bad3f08dc29d331e4956196e1 Author: Jia Tan Date: 2023-06-28 20:43:29 +0800 Tests: Fix memory leaks in test_block_header. test_block_header was not properly freeing the filter options between calls to lzma_block_header_decode(). The memory leaks were discovered by using -fsanitize=address with GCC.
tests/test_block_header.c | 38 ++++++++++++++++++++++---------------- 1 file changed, 22 insertions(+), 16 deletions(-) commit 78704f36e74205857c898a351c757719a6c8b666 Author: Jia Tan Date: 2023-06-28 20:31:11 +0800 liblzma: Prevent uninitialized warning in mt stream encoder. This change only impacts the compiler warning since it was impossible for the wait_abs struct in stream_encode_mt() to be used before it was initialized since mythread_condtime_set() will always be called before mythread_cond_timedwait(). Since the mythread.h code is different between the POSIX and Windows versions, this warning was only present on Windows builds. Thanks to Arthur S for reporting the warning and providing an initial patch. src/liblzma/common/stream_encoder_mt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit e3356a204c5ae02db3ec4552b6c1be354e9b6142 Author: Jia Tan Date: 2023-06-28 20:22:38 +0800 liblzma: Prevent warning for MSYS2 Windows build. In lzma_memcmplen(), the <intrin.h> header file is only included if _MSC_VER and _M_X64 are both defined but _BitScanForward64() was previously used if _M_X64 was defined. GCC for MSYS2 defines _M_X64 but not _MSC_VER so _BitScanForward64() was used without including <intrin.h>. Now, lzma_memcmplen() will use __builtin_ctzll() for MSYS2 GCC builds as expected. src/liblzma/common/memcmplen.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) commit 45e250a9e9f3c3e8e8af2983366b170bf54f890e Author: Jia Tan Date: 2023-06-28 21:01:22 +0800 CI: Add test with -fsanitize=address,undefined. ci_build.sh was updated to accept disabling of __attribute__ ifunc and CLMUL. This will allow -fsanitize=address to pass because ifunc is incompatible with -fsanitize=address. The CLMUL implementation has optimizations that potentially read past the buffer and mask out the unwanted bytes. This test will only run on Autotools Linux.
.github/workflows/ci.yml | 23 +++++++++++++++++++---- build-aux/ci_build.sh | 8 +++++++- 2 files changed, 26 insertions(+), 5 deletions(-) commit 596ee722cd7ddf0afae584fc06365adc0e735977 Author: Jia Tan Date: 2023-06-28 20:16:04 +0800 CI: Upgrade checkout action from v2 to v3. .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 86118ea320f867e09e98a8682cc08cbbdfd640e2 Author: Jia Tan Date: 2023-06-27 23:38:32 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 3d1fdddf92321b516d55651888b9c669e254634e Author: Jia Tan Date: 2023-06-27 17:27:09 +0300 Docs: Document the configure option --disable-ifunc in INSTALL. INSTALL | 8 ++++++++ 1 file changed, 8 insertions(+) commit b4cf7a2822e8d30eb2b12a1a07fd04383b10ade3 Author: Lasse Collin Date: 2023-06-27 17:24:49 +0300 Minor tweaks to style and comments. CMakeLists.txt | 8 ++++---- configure.ac | 9 +++++---- 2 files changed, 9 insertions(+), 8 deletions(-) commit 23fb9e3a329117c2968c1e7388b6ef07c782dba1 Author: Lasse Collin Date: 2023-06-27 17:19:49 +0300 CMake: Rename CHECK_ATTR_IFUNC to ALLOW_ATTR_IFUNC. It's so that there's a clear difference in wording compared to liblzma's integrity check types. CMakeLists.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit ee44863ae88e377a5df10db007ba9bfadde3d314 Author: Lasse Collin Date: 2023-06-27 17:05:23 +0300 liblzma: Add ifunc implementation to crc64_fast.c. The ifunc method avoids indirection via the function pointer crc64_func. This works on GNU/Linux and probably on FreeBSD too. The previous __attribute((__constructor__)) method is kept for compatibility with ELF platforms which do not support ifunc. The ifunc method has some limitations, for example, building liblzma with -fsanitize=address will result in segfaults. The configure option --disable-ifunc must be used for such builds. Thanks to Hans Jansen for the original patch.
Closes: https://github.com/tukaani-project/xz/pull/53 src/liblzma/check/crc64_fast.c | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-) commit b72d21202402a603db6d512fb9271cfa83249639 Author: Hans Jansen Date: 2023-06-22 19:49:30 +0200 Add ifunc check to CMakeLists.txt CMake build system will now verify if __attribute__((__ifunc__())) can be used in the build system. If so, HAVE_FUNC_ATTRIBUTE_IFUNC will be defined to 1. CMakeLists.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) commit 23b5c36fb71904bfbe16bb20f976da38dadf6c3b Author: Hans Jansen Date: 2023-06-22 19:46:55 +0200 Add ifunc check to configure.ac configure.ac will now verify if __attribute__((__ifunc__())) can be used in the build system. If so, HAVE_FUNC_ATTRIBUTE_IFUNC will be defined to 1. configure.ac | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) commit dbb3a536ed9873ffa0870321f6873e564c6a9da8 Author: Jia Tan Date: 2023-06-07 00:18:30 +0800 CI: Add apt update command before installing dependencies. Without the extra command, all of the CI tests were automatically failing because the Ubuntu servers could not be reached properly. .github/workflows/ci.yml | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) commit 6bcd516812331de42b347922913230895bebad34 Author: Jia Tan Date: 2023-06-07 00:10:38 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 0d94ba69220d894d2a86081821d2d7a89df5a10b Author: Benjamin Buch Date: 2023-06-06 15:32:45 +0200 CMake: Protects against double find_package Boost iostream uses `find_package` in quiet mode and then again uses `find_package` with required. This second call triggers a `add_library cannot create imported target "ZLIB::ZLIB" because another target with the same name already exists.` This can simply be fixed by skipping the alias part on secondary `find_package` runs. 
CMakeLists.txt | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) commit 045d7aae286ecd2ce163be9e0d9041343a03f89a Author: Jia Tan Date: 2023-05-31 20:26:42 +0800 Translations: Update the Esperanto translation. po/eo.po | 185 +++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 92 insertions(+), 93 deletions(-) commit b0cc7c2dcefe4cbc4e1e697598c14fb687ed0b78 Author: Jia Tan Date: 2023-05-31 20:25:00 +0800 Translations: Update the Croatian translation. po/hr.po | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit af045ef6f848f02cd14c9ad195a5f87bb0c02dce Author: Jia Tan Date: 2023-05-31 20:15:53 +0800 Translations: Update the Chinese (simplified) translation. po/zh_CN.po | 317 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 157 insertions(+), 160 deletions(-) commit e6b92d5817fe91ad27a0f7f57bd0f2144311e383 Author: Jia Tan Date: 2023-05-17 23:12:13 +0800 Translations: Update German translation of man pages. po4a/de.po | 52 ++++++++++++---------------------------------------- 1 file changed, 12 insertions(+), 40 deletions(-) commit 592961ccdbba39c7d60fe37e36764232feb57c60 Author: Jia Tan Date: 2023-05-17 23:09:18 +0800 Translations: Update the German translation. po/de.po | 189 +++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 94 insertions(+), 95 deletions(-) commit 13572cb2c391f5b7503e333c6e05b20bd5bbb524 Author: Jia Tan Date: 2023-05-17 20:30:01 +0800 Translations: Update the Croatian translation. po/hr.po | 187 +++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 93 insertions(+), 94 deletions(-) commit 4e6e425ea8f097c6fb43e69cc9540294dca3680d Author: Jia Tan Date: 2023-05-17 20:26:54 +0800 Translations: Update Korean translation of man pages. 
po4a/ko.po | 3015 ++++++++++++------------------------------------------------ 1 file changed, 568 insertions(+), 2447 deletions(-) commit d5ef1f6faf7c270f60093629257150085ecf19ca Author: Jia Tan Date: 2023-05-17 20:13:01 +0800 Translations: Update the Korean translation. po/ko.po | 319 +++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 158 insertions(+), 161 deletions(-) commit e22d0b0f2e301e7906d0106689d967ed84362028 Author: Jia Tan Date: 2023-05-16 23:49:09 +0800 Translations: Update the Spanish translation. po/es.po | 319 +++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 158 insertions(+), 161 deletions(-) commit f50da74d52d01f6cfd826a921249e289cf671678 Author: Jia Tan Date: 2023-05-16 23:47:23 +0800 Translations: Update the Romanian translation. po/ro.po | 195 ++++++++++++++++++++++++++++++++------------------------------- 1 file changed, 98 insertions(+), 97 deletions(-) commit 4b9ad60a7305e9841b7cb4ea611bdf5fa7271696 Author: Jia Tan Date: 2023-05-16 23:45:43 +0800 Translations: Update Romanian translation of man pages. po4a/ro.po | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) commit cb6fd57f889c5d9fab36ae8c9e10083a5fe32dea Author: Jia Tan Date: 2023-05-16 23:43:51 +0800 Translations: Update Ukrainian translation of man pages. po4a/uk.po | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) commit c3e8fcbc2db4861f92ad15606c995bd255803c52 Author: Jia Tan Date: 2023-05-16 23:37:54 +0800 Translations: Update the Ukrainian translation. po/uk.po | 321 +++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 159 insertions(+), 162 deletions(-) commit 27b81b84fcedbc55aa6e6b21004c44070b15b038 Author: Jia Tan Date: 2023-05-16 23:07:35 +0800 Translations: Update the Polish translation. 
po/pl.po | 316 +++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 155 insertions(+), 161 deletions(-) commit 8024ad636a65ed6ea95c94d57255be4c6724d6ed Author: Jia Tan Date: 2023-05-16 22:52:14 +0800 Translations: Update the Swedish translation. po/sv.po | 319 +++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 158 insertions(+), 161 deletions(-) commit 6699a29673f227c4664826db485ed9f7596320d2 Author: Jia Tan Date: 2023-05-16 21:21:38 +0800 Translations: Update the Esperanto translation. po/eo.po | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) commit f36ca7982f6bd5e9827219ed4f3c5a1fbf5d7bdf Author: Jia Tan Date: 2023-05-13 21:21:54 +0800 liblzma: Slightly rewords lzma_str_list_filters() documentation. Reword "options required" to "supported options". The previous may have suggested that the options listed were all required anytime a filter is used for encoding or decoding. The reword makes this more clear that adjusting the options is optional. src/liblzma/api/lzma/filter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 3374a5359e52f1671d8f831d65827d5020fe2595 Author: Jia Tan Date: 2023-05-11 23:49:23 +0800 liblzma: Adds lzma_nothrow to MicroLZMA API functions. None of the liblzma functions may throw an exception, so this attribute should be applied to all liblzma API functions. src/liblzma/api/lzma/container.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) commit 8f236574986e7c414c0ea059f441982d1387e6a4 Author: Jia Tan Date: 2023-05-09 20:20:06 +0800 liblzma: Exports lzma_mt_block_size() as an API function. The lzma_mt_block_size() was previously just an internal function for the multithreaded .xz encoder. It is used to provide a recommended Block size for a given filter chain. This function is helpful to determine the maximum Block size for the multithreaded .xz encoder when one wants to change the filters between blocks. 
Then, this determined Block size can be provided to lzma_stream_encoder_mt() in the lzma_mt options parameter when initializing the coder. This requires one to know all the filter chains they are using before starting to encode (or at least the filter chain that will need the largest Block size), but that isn't a bad limitation. src/liblzma/api/lzma/container.h | 28 ++++++++++++++++++++++++++++ src/liblzma/common/filter_encoder.c | 16 ++++++++++------ src/liblzma/common/filter_encoder.h | 6 +----- src/liblzma/common/stream_encoder_mt.c | 20 +++++++++----------- src/liblzma/liblzma_generic.map | 5 +++++ src/liblzma/liblzma_linux.map | 5 +++++ src/liblzma/lzma/lzma2_encoder.c | 3 +++ 7 files changed, 61 insertions(+), 22 deletions(-) commit d0f33d672a4da7985ebb5ba8d829f885de49c171 Author: Jia Tan Date: 2023-05-08 22:58:09 +0800 liblzma: Creates IS_ENC_DICT_SIZE_VALID() macro. This creates an internal liblzma macro to test if the dictionary size is valid for encoding. src/liblzma/lz/lz_encoder.c | 4 +--- src/liblzma/lz/lz_encoder.h | 8 ++++++++ 2 files changed, 9 insertions(+), 3 deletions(-) commit c247d06e1f6cada9a76f4f6225cbd97ea760f52f Author: Jia Tan Date: 2023-05-02 20:39:56 +0800 Add NEWS for 5.4.3. NEWS | 10 ++++++++++ 1 file changed, 10 insertions(+) commit 77050b78364ffb6b0f129e742b7c31602d725c08 Author: Jia Tan Date: 2023-05-02 20:39:37 +0800 Add NEWS for 5.2.12. NEWS | 14 ++++++++++++++ 1 file changed, 14 insertions(+) commit 713e15e43eb6279a7ab4bbad3d1325ebfdcf09a0 Author: Jia Tan Date: 2023-05-04 20:38:52 +0800 Translations: Update the Croatian translation. po/hr.po | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 9ad64bdf309844b6ca6c3e8a4dfb6dbaedda0ca9 Author: Jia Tan Date: 2023-05-04 20:30:25 +0800 tuklib_integer.h: Reverts previous commit. Previous commit 6be460dde07113fe3f08f814b61ddc3264125a96 would cause an error if the integer size was 32 bit.
src/common/tuklib_integer.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 6be460dde07113fe3f08f814b61ddc3264125a96 Author: Jia Tan Date: 2023-05-04 19:25:20 +0800 tuklib_integer.h: Changes two other UINT_MAX == UINT32_MAX to >=. src/common/tuklib_integer.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 44c0c5eae990a22ef04e9b88c1a15838a0d00878 Author: Lasse Collin Date: 2023-05-03 22:46:42 +0300 tuklib_integer.h: Fix a recent copypaste error in Clang detection. Wrong line was changed in 7062348bf35c1e4cbfee00ad9fffb4a21aa6eff7. Also, this has >= instead of == since ints larger than 32 bits would work too even if not relevant in practice. src/common/tuklib_integer.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 2cf5ae5b5b279b0b2e69ca4724e7bd705865fe68 Author: Jia Tan Date: 2023-04-25 20:06:15 +0800 CI: Adds a build and test for small configuration. .github/workflows/ci.yml | 5 +++++ 1 file changed, 5 insertions(+) commit 16b81a057a87c2f18e6ed6447f003af0cbdcfe43 Author: Jia Tan Date: 2023-04-25 20:05:26 +0800 CI: ci_build.sh allows configuring small build. build-aux/ci_build.sh | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) commit 78ccd93951f9e988d447bcdd70b24f6df5448d1d Author: Jia Tan Date: 2023-04-20 20:15:00 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit f41df2ac2fed347d3f107f3533e76e000d29c6cb Author: Jia Tan Date: 2023-04-19 22:22:16 +0800 Windows: Include <intrin.h> when needed. Legacy Windows did not need to #include <intrin.h> to use the MSVC intrinsics. Newer versions likely just issue a warning, but the MSVC documentation says to include the header file for the intrinsics we use. GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are needed in tuklib_integer.h to only include <intrin.h> when it is actually needed.
src/common/tuklib_integer.h | 6 ++++++ src/liblzma/common/memcmplen.h | 10 ++++++++++ 2 files changed, 16 insertions(+) commit 7062348bf35c1e4cbfee00ad9fffb4a21aa6eff7 Author: Jia Tan Date: 2023-04-19 21:59:03 +0800 tuklib_integer: Use __builtin_clz() with Clang. Clang has support for __builtin_clz(), but previously Clang would fallback to either the MSVC intrinsic or the regular C code. This was discovered due to a bug where a new version of Clang required the <intrin.h> header file in order to use the MSVC intrinsics. Thanks to Anton Kochkov for notifying us about the bug. src/common/tuklib_integer.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 3938718ce3773c90755785c0df8777f133b7ae29 Author: Lasse Collin Date: 2023-04-14 18:42:33 +0300 liblzma: Update project maintainers in lzma.h. AUTHORS was updated earlier, lzma.h was simply forgotten. src/liblzma/api/lzma.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 2a89670ab295e377f8b44f5bda6d198deb8ea285 Author: Jia Tan Date: 2023-04-13 20:45:19 +0800 liblzma: Cleans up old commented out code. src/liblzma/common/alone_encoder.c | 11 ----------- 1 file changed, 11 deletions(-) commit 0fbb2b87a7b5a1dd9d0f4a5e84ac7919557dbe81 Author: Jia Tan Date: 2023-04-07 20:46:41 +0800 Docs: Add missing word to SECURITY.md. .github/SECURITY.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit fb9c50f38a17bf37581de4034b36c8df8ec90a87 Author: Jia Tan Date: 2023-04-07 20:43:22 +0800 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 537c6cd8a9db0dd6b13683e64ddac2943190d715 Author: Jia Tan Date: 2023-04-07 20:42:12 +0800 Docs: Minor edits to SECURITY.md.
.github/SECURITY.md | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) commit 6549df8dd53f358345957e232648fdb699930074 Author: Gabriela Gutierrez Date: 2023-04-07 12:08:30 +0000 Docs: Create SECURITY.md Signed-off-by: Gabriela Gutierrez .github/SECURITY.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) commit d0faa85df5a5d253a4625d45313cf5e9277e6cd2 Author: Jia Tan Date: 2023-03-28 22:48:24 +0800 CI: Tests for disabling threading on CMake builds. .github/workflows/ci.yml | 3 --- build-aux/ci_build.sh | 4 ++-- 2 files changed, 2 insertions(+), 5 deletions(-) commit 8be5cc3b1359d88b4b30a39067466c0ae0bfbc4d Author: Jia Tan Date: 2023-03-28 22:45:42 +0800 CI: Removes CMakeCache.txt between builds. If the cache file is not removed, CMake will not reset configurations back to their default values. In order to make the tests independent, it is simplest to purge the cache. Unfortunately, this will slow down the tests a little and repeat some checks. build-aux/ci_build.sh | 2 ++ 1 file changed, 2 insertions(+) commit 2cb6028fc31de082b7f927632363bb1426b61aaa Author: Jia Tan Date: 2023-03-28 22:32:40 +0800 CMake: Update liblzma-config.cmake generation. Now that the threading is configurable, the liblzma CMake package only needs the threading library when using POSIX threads. CMakeLists.txt | 33 ++++++++++++++++++++++----------- 1 file changed, 22 insertions(+), 11 deletions(-) commit 4d7fac0b07cc722825ba8d7838c558827e635611 Author: Jia Tan Date: 2023-03-28 22:25:33 +0800 CMake: Allows setting thread method. The thread method is now configurable for the CMake build. It matches the Autotools build by allowing ON (pick the best threading method), OFF (no threading), posix, win95, and vista. If Windows and posix threading are both available, then ON will choose Windows threading.
Windows threading will also not use: target_link_libraries(liblzma Threads::Threads) since on systems like MinGW-w64 it would link the posix threads without purpose. CMakeLists.txt | 144 +++++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 104 insertions(+), 40 deletions(-) commit 20cd905d898c1494dee42b78530769bb9c9f8076 Author: Jia Tan Date: 2023-03-24 23:05:48 +0800 CI: Runs CMake feature tests. Now, CMake will run similar feature disable tests that the Autotools version did before. In order to do this without repeating lines in ci.yml, it now makes sense to use the GitHub Workflow matrix to create a loop. .github/workflows/ci.yml | 169 +++++++++++++++-------------------------------- 1 file changed, 55 insertions(+), 114 deletions(-) commit 4fabdb269f1fc5624b3b94a170c4efb329d1d229 Author: Jia Tan Date: 2023-03-24 20:35:11 +0800 CI: ci_build.sh allows CMake features to be configured. Also included various clean ups for style and helper functions for repeated work. build-aux/ci_build.sh | 233 +++++++++++++++++++++++++++++++------------------- 1 file changed, 143 insertions(+), 90 deletions(-) commit cf3d1f130e50cf63da4bb1031771605f6f443b6a Author: Jia Tan Date: 2023-03-24 20:06:33 +0800 CI: Change ci_build.sh to use bash instead of sh. This script is only meant to be run as part of the CI build/test process on machines that are known to have bash (Ubuntu and MacOS). If this assumption changes in the future, then the bash specific commands will need to be replaced with a more portable option. For now, it is convenient to use bash commands. build-aux/ci_build.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit ddfe164368e779c40d061aa4ccc376129e92f8e1 Author: Jia Tan Date: 2023-03-24 20:05:59 +0800 CMake: Only build xzdec if decoders are enabled. 
CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 116e81f002c503d3c3cd12726db8f9116e58ef25 Author: Jia Tan Date: 2023-03-22 15:42:04 +0800 Build: Removes redundant check for LZMA1 filter support. src/liblzma/lzma/Makefile.inc | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) commit 0ba234f692772595329d225462d391fe2c199d0a Author: Lasse Collin Date: 2023-03-23 15:14:29 +0200 CMake: Bump maximum policy version to 3.26. It adds only one new policy related to FOLDERS which we don't use. This makes it clear that the code is compatible with the policies up to 3.26. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit b0891684b4436aed31510fddcbb218d513bd5489 Author: Jia Tan Date: 2023-03-21 23:36:00 +0800 CMake: Conditionally build xz list.* files if decoders are enabled. CMakeLists.txt | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) commit 2c1a830efb61d9d65906a09c9ee3ce27c2c49227 Author: Jia Tan Date: 2023-02-25 11:46:50 +0800 CMake: Allow configuring features as cache variables. This allows users to change the features they build either in CMakeCache.txt or by using a CMake GUI. The sources built for liblzma are affected by this too, so only the necessary files will be compiled. CMakeLists.txt | 528 ++++++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 391 insertions(+), 137 deletions(-) commit 8be136f667aaeb8f9e16fbd57a83cb282f0c27ff Author: Lasse Collin Date: 2023-03-21 14:07:51 +0200 Build: Add a comment that AC_PROG_CC_C99 is needed for Autoconf 2.69. It's obsolete in Autoconf >= 2.70 and just an alias for AC_PROG_CC but Autoconf 2.69 requires AC_PROG_CC_C99 to get a C99 compiler. configure.ac | 3 +++ 1 file changed, 3 insertions(+) commit 53cc475f2652d9e390ca002018dfd0af0626ef80 Author: Lasse Collin Date: 2023-03-21 14:04:37 +0200 Build: configure.ac: Use AS_IF and AS_CASE where required. 
This makes no functional difference in the generated configure (at least with the Autotools versions I have installed) but this change might prevent future bugs like the one that was just fixed in the commit 5a5bd7f871818029d5ccbe189f087f591258c294. configure.ac | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) commit 3b8890a40233b6c783bb101ec14405e786871775 Author: Lasse Collin Date: 2023-03-21 13:12:03 +0200 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 5a5bd7f871818029d5ccbe189f087f591258c294 Author: Lasse Collin Date: 2023-03-21 13:11:49 +0200 Build: Fix --disable-threads breaking the building of shared libs. This is broken in the releases 5.2.6 to 5.4.2. A workaround for these releases is to pass EGREP='grep -E' as an argument to configure in addition to --disable-threads. The problem appeared when m4/ax_pthread.m4 was updated in the commit 6629ed929cc7d45a11e385f357ab58ec15e7e4ad which introduced the use of AC_EGREP_CPP. AC_EGREP_CPP calls AC_REQUIRE([AC_PROG_EGREP]) to set the shell variable EGREP but this was only executed if POSIX threads were enabled. Libtool code also has AC_REQUIRE([AC_PROG_EGREP]) but Autoconf omits it as AC_PROG_EGREP has already been required earlier. Thus, if not using POSIX threads, the shell variable EGREP would be undefined in the Libtool code in configure. ax_pthread.m4 is fine. The bug was in configure.ac which called AX_PTHREAD conditionally in an incorrect way. Using AS_CASE ensures that all AC_REQUIREs get always run. Thanks to Frank Busse for reporting the bug. Fixes: https://github.com/tukaani-project/xz/issues/45 configure.ac | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) commit dfe1710784c0a3c3a90c17b80c9e1fe19b5fce06 Author: Lasse Collin Date: 2023-03-19 22:45:59 +0200 liblzma: Silence -Wsign-conversion in SSE2 code in memcmplen.h. Thanks to Christian Hesse for reporting the issue. 
Fixes: https://github.com/tukaani-project/xz/issues/44 src/liblzma/common/memcmplen.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit f0c580c5fc38bf49a184b48d76c1d8c057d499ce Author: Jia Tan Date: 2023-03-18 22:10:57 +0800 Add NEWS for 5.4.2. NEWS | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) commit af4925e6043113ec9b5f9c0cf13abf2a18ccb1f6 Author: Jia Tan Date: 2023-03-18 22:10:12 +0800 Add NEWS for 5.2.11. NEWS | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) commit 5a7b930efa7f9849d8da8397e8e5d8638f92be40 Author: Lasse Collin Date: 2023-03-18 16:00:54 +0200 Update the copy of GNU GPLv3 from gnu.org to COPYING.GPLv3. COPYING.GPLv3 | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) commit b473a92891f7e991398a3b5eff305f6f2b6d7293 Author: Lasse Collin Date: 2023-03-18 15:51:57 +0200 Change a few HTTP URLs to HTTPS. The xz man page timestamp was intentionally left unchanged. INSTALL | 2 +- README | 8 ++++---- configure.ac | 2 +- dos/INSTALL.txt | 4 ++-- src/liblzma/api/lzma.h | 8 ++++---- src/liblzma/check/sha256.c | 2 +- src/xz/xz.1 | 2 +- windows/INSTALL-MinGW.txt | 10 +++++----- 8 files changed, 19 insertions(+), 19 deletions(-) commit 8b2f6001b4f412c259a7883427f2f2c8cea98ea8 Author: Jia Tan Date: 2023-03-18 00:40:28 +0800 CMake: Fix typo in a comment. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 76e2315e14c399c15cc90e7930fd4d3d086b0227 Author: Lasse Collin Date: 2023-03-17 18:36:22 +0200 Windows: build.bash: Copy liblzma API docs to the output package. windows/build.bash | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 133cf55edc5ce92952d2709abd992e48ef1f45ee Author: Lasse Collin Date: 2023-03-17 08:53:38 +0200 Windows: Add microlzma_*.c to the VS project files. These should have been included in 5.3.2alpha already. 
windows/vs2013/liblzma.vcxproj | 2 ++ windows/vs2013/liblzma_dll.vcxproj | 2 ++ windows/vs2017/liblzma.vcxproj | 2 ++ windows/vs2017/liblzma_dll.vcxproj | 2 ++ windows/vs2019/liblzma.vcxproj | 2 ++ windows/vs2019/liblzma_dll.vcxproj | 2 ++ 6 files changed, 12 insertions(+) commit 75c9ca450fab6982fda9286b168081c9d54126cd Author: Lasse Collin Date: 2023-03-17 08:43:51 +0200 CMake: Add microlzma_*.c to the build. These should have been included in 5.3.2alpha already. CMakeLists.txt | 2 ++ 1 file changed, 2 insertions(+) commit 0cc3313bd4e569c51e686e5aab8c40c35241d34b Author: Lasse Collin Date: 2023-03-17 08:41:36 +0200 Build: Update comments about unaligned access to mention 64-bit. cmake/tuklib_integer.cmake | 7 +++---- m4/tuklib_integer.m4 | 4 ++-- 2 files changed, 5 insertions(+), 6 deletions(-) commit 5e57e3301319f20c35f8111dea73fa58403b96b1 Author: Lasse Collin Date: 2023-03-17 00:02:30 +0200 Tests: Update .gitignore. .gitignore | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 0007394d54e21bf30abb9a5e09cbc1e8d44a73ac Author: Lasse Collin Date: 2023-03-14 20:04:03 +0200 po4a/update-po: Display the script name consistently in error messages. po4a/update-po | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 509157c80c500426ec853bd992d684ebafc8500c Author: Jia Tan Date: 2023-03-17 01:30:36 +0800 Doc: Rename Doxygen HTML doc directory name liblzma => api. When the docs are installed, calling the directory "liblzma" is confusing since multiple other files in the doc directory are for liblzma. This should also make it more natural for distros when they package the documentation. .gitignore | 2 +- Makefile.am | 18 +++++++++--------- PACKAGERS | 4 ++-- doxygen/Doxyfile | 2 +- doxygen/update-doxygen | 18 +++++++++--------- 5 files changed, 22 insertions(+), 22 deletions(-) commit fd90e2f4c29180b44e33c7ef726f94e4eae54ed3 Author: Jia Tan Date: 2023-03-16 22:07:15 +0800 liblzma: Remove note from lzma_options_bcj about the ARM64 exception. 
This was left in by mistake since an early version of the ARM64 filter used a different struct for its options. src/liblzma/api/lzma/bcj.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 4f50763b981f9056c5f1763dfb26cfa4a26a181d Author: Jia Tan Date: 2023-03-16 21:44:02 +0800 CI: Add doxygen as a dependency. Autogen now requires --no-doxygen or having doxygen installed to run without errors. .github/workflows/ci.yml | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) commit f68f4b27f62f53fdac570885a1f4f23367ce6599 Author: Lasse Collin Date: 2023-03-15 19:19:13 +0200 COPYING: Add a note about the included Doxygen-generated HTML. COPYING | 11 +++++++++++ 1 file changed, 11 insertions(+) commit 8979308528c1f45cb9ee52d511f05232b4ad90a1 Author: Jia Tan Date: 2023-03-16 21:41:09 +0800 Doc: Update PACKAGERS with details about liblzma API docs install. PACKAGERS | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) commit 55ba6e93004842ae0a0792214a23504267ad8f43 Author: Jia Tan Date: 2023-03-16 21:38:32 +0800 liblzma: Add set lzma.h as the main page for Doxygen documentation. The \mainpage command is used in the first block of comments in lzma.h. This changes the previously nearly empty index.html to use the first comment block in lzma.h for its contents. lzma.h is no longer documented separately, but this is for the better since lzma.h only defined a few macros that users do not need to use. The individual API header files all have a disclaimer that they should not be #included directly, so there should be no confusion on the fact that lzma.h should be the only header used by applications. Additionally, the note "See ../lzma.h for information about liblzma as a whole." was removed since lzma.h is now the main page of the generated HTML and does not have its own page anymore. So it would be confusing in the HTML version and was only a "nice to have" when browsing the source files. 
src/liblzma/api/lzma.h | 1 + src/liblzma/api/lzma/base.h | 2 -- src/liblzma/api/lzma/bcj.h | 2 -- src/liblzma/api/lzma/block.h | 2 -- src/liblzma/api/lzma/check.h | 2 -- src/liblzma/api/lzma/container.h | 2 -- src/liblzma/api/lzma/delta.h | 2 -- src/liblzma/api/lzma/filter.h | 2 -- src/liblzma/api/lzma/hardware.h | 2 -- src/liblzma/api/lzma/index.h | 2 -- src/liblzma/api/lzma/index_hash.h | 4 +--- src/liblzma/api/lzma/lzma12.h | 2 -- src/liblzma/api/lzma/stream_flags.h | 2 -- src/liblzma/api/lzma/version.h | 2 -- src/liblzma/api/lzma/vli.h | 2 -- 15 files changed, 2 insertions(+), 29 deletions(-) commit 16f21255597f6a57e5692780f962cdc090f62b8c Author: Jia Tan Date: 2023-03-16 21:37:32 +0800 Build: Generate doxygen documentation in autogen.sh. Another command line option (--no-doxygen) was added to disable creating the doxygen documentation in cases where it is not wanted or if the doxygen tool is not installed. autogen.sh | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) commit 1321852a3be7196bd7fcfd146221a5669e46407c Author: Jia Tan Date: 2023-03-16 21:35:55 +0800 Build: Create doxygen/update-doxygen script. This is a helper script to generate the Doxygen documentation. It can be run in 'liblzma' or 'internal' mode by setting the first argument. It will default to 'liblzma' mode and only generate documentation for the liblzma API header files. The helper script will be run during the custom mydist hook when we create releases. This hook already alters the source directory, so it's fine to do it here too. This way, we can include the Doxygen generated files in the distribution and when installing. In 'liblzma' mode, the JavaScript is stripped from the .html files and the .js files are removed. This avoids license hassle from jQuery and other libraries that Doxygen 1.9.6 puts into jquery.js in minified form.
Makefile.am | 1 + doxygen/update-doxygen | 111 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 112 insertions(+) commit b1216a7772952d2fe7fe9c6acfcbd98d30abbc7b Author: Jia Tan Date: 2023-03-16 21:34:36 +0800 Build: Install Doxygen docs and include in distribution if generated. Added a install-data-local target to install the Doxygen documentation only when it has been generated. In order to correctly remove the docs, a corresponding uninstall-local target was added. If the doxygen docs exist in the source tree, they will also be included in the distribution now too. Makefile.am | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) commit c97d12f300b2a94c9f54a44c8931c8bc08cf0a73 Author: Lasse Collin Date: 2023-03-16 21:23:48 +0800 Doxygen: Refactor Doxyfile.in to doxygen/Doxyfile. Instead of having Doxyfile.in configured by Autoconf, the Doxyfile can have the tags that need to be configured piped into the doxygen command through stdin with the overrides after Doxyfile's contents. Going forward, the documentation should be generated in two different modes: liblzma or internal. liblzma is useful for most users. It is the documentation for just the liblzma API header files. This is the default. internal is for people who want to understand how xz and liblzma work. It might be useful for people who want to contribute to the project. .gitignore | 3 +- Makefile.am | 1 - configure.ac | 40 --- Doxyfile.in => doxygen/Doxyfile | 721 +++++++++++++++++++++++++--------------- 4 files changed, 456 insertions(+), 309 deletions(-) commit 1b7661faa4bbf4a54c6b75900b5059835c382a0f Author: Jia Tan Date: 2023-02-28 23:22:36 +0800 Tests: Remove unused macros and functions. tests/tests.h | 75 ----------------------------------------------------------- 1 file changed, 75 deletions(-) commit af55191102f01e76de658c881299f0909ca0feda Author: Jia Tan Date: 2022-12-29 21:52:15 +0800 liblzma: Defines masks for return values from lzma_index_checks(). 
src/liblzma/api/lzma/index.h | 23 +++++++++++++++++++++++ tests/test_index.c | 22 +++++++++++----------- 2 files changed, 34 insertions(+), 11 deletions(-) commit 8f38cdd9ab71e2a9d5a9787550222b7578243b73 Author: Jia Tan Date: 2023-01-12 22:29:07 +0800 Tests: Refactors existing lzma_index tests. Converts the existing lzma_index tests into tuktests and covers every API function from index.h except for lzma_file_info_decoder, which can be tested in the future. tests/test_index.c | 2036 ++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 1492 insertions(+), 544 deletions(-) commit 717aa3651ce582807f379d8654c2516e1594df77 Author: Lasse Collin Date: 2023-03-11 18:42:08 +0200 xz: Simplify the error-label in Capsicum sandbox code. Also remove unneeded "sandbox_allowed = false;" as this code will never be run more than once (making it work with multiple input files isn't trivial). src/xz/file_io.c | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-) commit a0eecc235d3ba8ad3453da98b46c7bc3e644de75 Author: Lasse Collin Date: 2023-03-07 19:59:23 +0200 xz: Make Capsicum sandbox more strict with stdin and stdout. src/xz/file_io.c | 8 ++++++++ 1 file changed, 8 insertions(+) commit 916448d624aaf55cef0fc3e53754affb8c4f309d Author: Jia Tan Date: 2023-03-08 23:08:46 +0800 Revert: "Add warning if Capsicum sandbox system calls are unsupported." The warning causes the exit status to be 2, so this will cause problems for many scripted use cases for xz. The sandbox usage is already very limited, so silently disabling this allows it to be more usable. src/xz/file_io.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) commit 01587dda2a8f13fef7e12fd624e6d05da5f9624f Author: Jia Tan Date: 2023-03-07 20:02:22 +0800 xz: Fix -Wunused-label in io_sandbox_enter(). Thanks to Xin Li for recommending the fix.
src/xz/file_io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 5fb936786601a1cd013a5d436adde65982b1e13c Author: Jia Tan Date: 2023-03-06 21:37:45 +0800 xz: Add warning if Capsicum sandbox system calls are unsupported. The warning is only used when errno == ENOSYS. Otherwise, xz still issues a fatal error. src/xz/file_io.c | 2 ++ 1 file changed, 2 insertions(+) commit 61ee82cb1232a402c82282bbae42821f2b952b0d Author: Jia Tan Date: 2023-03-06 21:27:53 +0800 xz: Skip Capsicum sandbox system calls when they are unsupported. If a system has the Capsicum header files but does not actually implement the system calls, then this would render xz unusable. Instead, we can check if errno == ENOSYS and not issue a fatal error. src/xz/file_io.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) commit f070722b57ba975a0dff36492d766f03026b1d21 Author: Jia Tan Date: 2023-03-06 21:08:26 +0800 xz: Reorder cap_enter() to beginning of capsicum sandbox code. cap_enter() puts the process into the sandbox. If later calls to cap_rights_limit() fail, then the process can still have some extra protections. src/xz/file_io.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit f1ab1f6b339d16a53ac53efeb97779ecd2bae70f Author: Jia Tan Date: 2023-02-24 23:46:23 +0800 liblzma: Clarify lzma_lzma_preset() documentation in lzma12.h. lzma_lzma_preset() does not guarantee that the lzma_options_lzma are usable in an encoder even if it returns false (success). If liblzma is built with default configurations, then the options will always be usable. However if the match finders hc3, hc4, or bt4 are disabled, then the options may not be usable depending on the preset level requested. The documentation was updated to reflect this complexity, since this behavior was unclear before.
src/liblzma/api/lzma/lzma12.h | 5 +++++ 1 file changed, 5 insertions(+) commit 4b7fb3bf41a0ca4c97fad3799949a2aa61b13b99 Author: Lasse Collin Date: 2023-02-27 18:38:35 +0200 CMake: Require that the C compiler supports C99 or a newer standard. Thanks to autoantwort for reporting the issue and suggesting a different patch: https://github.com/tukaani-project/xz/pull/42 CMakeLists.txt | 8 ++++++++ 1 file changed, 8 insertions(+) commit 9aa7fdeb04c486d2700967090956af88fdccab7e Author: Jia Tan Date: 2023-02-24 18:10:37 +0800 Tests: Small tweak to test-vli.c. The static global variables can be disabled if encoders and decoders are not built. If they are not disabled and -Werror is used, the resulting unused-variable warning is treated as an error. tests/test_vli.c | 2 ++ 1 file changed, 2 insertions(+) commit 3cf72c4bcba5370f07477c9b9b62ae33069ef9a9 Author: Jia Tan Date: 2023-02-06 21:46:43 +0800 liblzma: Replace '\n' -> newline in filter.h documentation. The '\n' renders as a newline when the comments are converted to html by Doxygen. src/liblzma/api/lzma/filter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 002006be62d77c706565fa6ec828bea64be302da Author: Jia Tan Date: 2023-02-06 21:45:37 +0800 liblzma: Shorten return description for two functions in filter.h. Shorten the description for lzma_raw_encoder_memusage() and lzma_raw_decoder_memusage(). src/liblzma/api/lzma/filter.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) commit 463d9359b8595f01d44ada1739d75aeb87f36524 Author: Jia Tan Date: 2023-02-06 21:44:45 +0800 liblzma: Reword a few lines in filter.h src/liblzma/api/lzma/filter.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) commit 01441df92c0fd6a6c02fe5ac27982a54ce887cc0 Author: Jia Tan Date: 2023-02-06 21:35:06 +0800 liblzma: Improve documentation in filter.h. All functions now explicitly specify parameter and return values. The notes and code annotations were moved before the parameter and return value descriptions for consistency.
Also, the description above lzma_filter_encoder_is_supported() about not being able to list available filters was removed since lzma_str_list_filters() will do this. src/liblzma/api/lzma/filter.h | 226 ++++++++++++++++++++++++++---------------- 1 file changed, 143 insertions(+), 83 deletions(-) commit 805b45cd60bfd5da3d3d89077de3789df179b324 Author: Lasse Collin Date: 2023-02-23 20:46:16 +0200 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 30e95bb44c36ae26b2ab12a94343b215fec285e7 Author: Lasse Collin Date: 2023-02-21 22:57:10 +0200 liblzma: Avoid null pointer + 0 (undefined behavior in C). In the C99 and C17 standards, section 6.5.6 paragraph 8 means that adding 0 to a null pointer is undefined behavior. As of writing, "clang -fsanitize=undefined" (Clang 15) diagnoses this. However, I'm not aware of any compiler that would take advantage of this when optimizing (Clang 15 included). It's good to avoid this anyway since compilers might some day infer that pointer arithmetic implies that the pointer is not NULL. That is, the following foo() would then unconditionally return 0, even for foo(NULL, 0): void bar(char *a, char *b); int foo(char *a, size_t n) { bar(a, a + n); return a == NULL; } In contrast to C, C++ explicitly allows null pointer + 0. So if the above is compiled as C++ then there is no undefined behavior in the foo(NULL, 0) call. To me it seems that changing the C standard would be the sane thing to do (just add one sentence) as it would ensure that a huge amount of old code won't break in the future. Based on web searches it seems that a large number of codebases (where null pointer + 0 occurs) are being fixed instead to be future-proof in case compilers will some day optimize based on it (like making the above foo(NULL, 0) return 0) which in the worst case will cause security bugs. Some projects don't plan to change it. 
For example, gnulib and thus many GNU tools currently require that null pointer + 0 is defined: https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html In XZ Utils null pointer + 0 issue should be fixed after this commit. This adds a few if-statements and thus branches to avoid null pointer + 0. These check for size > 0 instead of ptr != NULL because this way bugs where size > 0 && ptr == NULL will likely get caught quickly. None of them are in hot spots so it shouldn't matter for performance. A little less readable version would be replacing ptr + offset with offset != 0 ? ptr + offset : ptr or creating a macro for it: #define my_ptr_add(ptr, offset) \ ((offset) != 0 ? ((ptr) + (offset)) : (ptr)) Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1, Clang >= 7, and Clang-based ICX to optimize it to the very same code as ptr + offset. That is, it won't create a branch. So for hot code this could be a good solution to avoid null pointer + 0. Unfortunately other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a branch from my_ptr_add(). Thanks to Marcin Kowalczyk for reporting the problem: https://github.com/tukaani-project/xz/issues/36 src/liblzma/common/block_decoder.c | 5 ++++- src/liblzma/common/block_encoder.c | 7 +++++-- src/liblzma/common/common.c | 20 ++++++++++++++------ src/liblzma/common/index_decoder.c | 13 ++++++++++--- src/liblzma/common/index_encoder.c | 11 +++++++++-- src/liblzma/common/index_hash.c | 13 ++++++++++--- src/liblzma/common/lzip_decoder.c | 6 +++++- src/liblzma/delta/delta_decoder.c | 7 ++++++- src/liblzma/delta/delta_encoder.c | 12 ++++++++++-- src/liblzma/simple/simple_coder.c | 6 ++++-- 10 files changed, 77 insertions(+), 23 deletions(-) commit fa9065fac54194fe0407fc7f0cc9633fdce13c21 Author: Jia Tan Date: 2023-02-07 00:00:44 +0800 liblzma: Adjust container.h for consistency with filter.h. 
src/liblzma/api/lzma/container.h | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) commit 00a721b63d82dfb658dca8d8cb599d8a245c663f Author: Jia Tan Date: 2023-02-07 00:00:09 +0800 liblzma: Fix small typos and reword a few things in filter.h. src/liblzma/api/lzma/container.h | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) commit 5b1c171d4ffe89ef18fa31509bb0185d6fd11d39 Author: Jia Tan Date: 2023-02-06 23:42:08 +0800 liblzma: Convert list of flags in lzma_mt to bulleted list. src/liblzma/api/lzma/container.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) commit dbd47622eb99fefb3538a22baec3def002aa56f5 Author: Jia Tan Date: 2023-01-26 23:17:41 +0800 liblzma: Fix typo in documentation in container.h lzma_microlzma_decoder -> lzma_microlzma_encoder src/liblzma/api/lzma/container.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 14cd30806d69e55906073745bcce3ee50e0ec942 Author: Jia Tan Date: 2023-01-26 23:16:34 +0800 liblzma: Improve documentation for container.h Standardizing each function to always specify parameters and return values. Also moved the parameters and return values to the end of each function description. src/liblzma/api/lzma/container.h | 146 +++++++++++++++++++++++++-------------- 1 file changed, 93 insertions(+), 53 deletions(-) commit c9c8bfae3502842dcead85eeb2b951b437c2cd88 Author: Jia Tan Date: 2023-02-22 20:59:41 +0800 CMake: Add LZIP decoder test to list of tests. CMakeLists.txt | 1 + 1 file changed, 1 insertion(+) commit b9f171dd00a3cc32b6d41ea8e082cf545640ec2a Author: Lasse Collin Date: 2023-02-17 20:56:49 +0200 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 2ee86d20e49985b903b78ebcfa3fa672e73e93aa Author: Lasse Collin Date: 2023-02-17 20:48:28 +0200 Build: Use only the generic symbol versioning on MicroBlaze. 
On MicroBlaze, GCC 12 is broken in sense that __has_attribute(__symver__) returns true but it still doesn't support the __symver__ attribute even though the platform is ELF and symbol versioning is supported if using the traditional __asm__(".symver ...") method. Avoiding the traditional method is good because it breaks LTO (-flto) builds with GCC. See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101766 For now the only extra symbols in liblzma_linux.map are the compatibility symbols with the patch that spread from RHEL/CentOS 7. These require the use of __symver__ attribute or __asm__(".symver ...") in the C code. Compatibility with the patch from CentOS 7 doesn't seem valuable on MicroBlaze so use liblzma_generic.map on MicroBlaze instead. It doesn't require anything special in the C code and thus no LTO issues either. An alternative would be to detect support for __symver__ attribute in configure.ac and CMakeLists.txt and fall back to __asm__(".symver ...") but then LTO would be silently broken on MicroBlaze. It sounds likely that MicroBlaze is a special case so let's treat it as a such because that is simpler. If a similar issue exists on some other platform too then hopefully someone will report it and this can be reconsidered. (This doesn't do the same fix in CMakeLists.txt. Perhaps it should but perhaps CMake build of liblzma doesn't matter much on MicroBlaze. The problem breaks the build so it's easy to notice and can be fixed later.) Thanks to Vincent Fazio for reporting the problem and proposing a patch (in the end that solution wasn't used): https://github.com/tukaani-project/xz/pull/32 configure.ac | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) commit d831072cceca458d94d2d5da201862f6d43a417b Author: Lasse Collin Date: 2023-02-16 21:09:00 +0200 liblzma: Very minor API doc tweaks. Use "member" to refer to struct members as that's the term used by the C standard. 
Use lzma_options_delta.dist and such in docs so that in Doxygen's HTML output they will link to the doc of the struct member. Clean up a few trailing white spaces too. src/liblzma/api/lzma/block.h | 6 +++--- src/liblzma/api/lzma/delta.h | 6 +++--- src/liblzma/api/lzma/index.h | 10 +++++----- src/liblzma/api/lzma/stream_flags.h | 6 +++--- 4 files changed, 14 insertions(+), 14 deletions(-) commit f029daea39c215fd7d5cb6b6798818b055cf5b22 Author: Jia Tan Date: 2023-02-17 00:54:33 +0800 liblzma: Adjust spacing in doc headers in bcj.h. src/liblzma/api/lzma/bcj.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) commit a5de68bac2bb7e1b9119e6cea7d761a22ea73e9c Author: Jia Tan Date: 2023-02-17 00:44:44 +0800 liblzma: Adjust documentation in bcj.h for consistent style. src/liblzma/api/lzma/bcj.h | 43 ++++++++++++++++++++++--------------------- 1 file changed, 22 insertions(+), 21 deletions(-) commit efa498c13b883810497e0ea8a169efd6f48f5026 Author: Jia Tan Date: 2023-02-17 00:36:05 +0800 liblzma: Rename field => member in documentation. Also adjusted preset value => preset level. src/liblzma/api/lzma/base.h | 18 +++++++-------- src/liblzma/api/lzma/block.h | 44 ++++++++++++++++++------------------- src/liblzma/api/lzma/container.h | 26 +++++++++++----------- src/liblzma/api/lzma/delta.h | 12 +++++----- src/liblzma/api/lzma/index.h | 30 ++++++++++++------------- src/liblzma/api/lzma/lzma12.h | 28 +++++++++++------------ src/liblzma/api/lzma/stream_flags.h | 32 +++++++++++++-------------- 7 files changed, 95 insertions(+), 95 deletions(-) commit 718b22a6c5e3ee5de123323ea798872381f9320e Author: Lasse Collin Date: 2023-02-16 17:59:50 +0200 liblzma: Silence a warning from MSVC. It gives C4146 here since unary minus with unsigned integer is still unsigned (which is the intention here). Doing it with subtraction makes it clearer and avoids the warning. Thanks to Nathan Moinvaziri for reporting this.
src/liblzma/check/crc64_fast.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 87c53553fa7d50f777b4edfa99f2083628f590fe Author: Jia Tan Date: 2023-02-16 21:04:54 +0800 liblzma: Improve documentation for stream_flags.h Standardizing each function to always specify parameters and return values. Also moved the parameters and return values to the end of each function description. A few small things were reworded and long sentences broken up. src/liblzma/api/lzma/stream_flags.h | 76 ++++++++++++++++++++++--------------- 1 file changed, 46 insertions(+), 30 deletions(-) commit 13d99e75a543e9e5f8633cc241eae55b91a3b242 Author: Jia Tan Date: 2023-02-14 21:50:16 +0800 liblzma: Improve documentation in lzma12.h. All functions now explicitly specify parameter and return values. src/liblzma/api/lzma/lzma12.h | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-) commit 43ec344c868f930e96879eb9e49212cce92a9884 Author: Jia Tan Date: 2023-01-27 22:44:06 +0800 liblzma: Improve documentation in check.h. All functions now explicitly specify parameter and return values. Also moved the note about SHA-256 functions not being exported to the top of the file. src/liblzma/api/lzma/check.h | 41 ++++++++++++++++++++++++++++------------- 1 file changed, 28 insertions(+), 13 deletions(-) commit 9c71db4e884fd49aea3d1e711036bff45ca66487 Author: Jia Tan Date: 2023-02-08 21:33:52 +0800 liblzma: Improve documentation in index.h All functions now explicitly specify parameter and return values. src/liblzma/api/lzma/index.h | 177 ++++++++++++++++++++++++++++++------------- 1 file changed, 126 insertions(+), 51 deletions(-) commit 421f2f2e160720f6009e3b6a125cafe2feaa9419 Author: Jia Tan Date: 2023-02-08 20:35:32 +0800 liblzma: Reword a comment in index.h. 
src/liblzma/api/lzma/index.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit b67539484981351d501b68de5e925425e50c59b1 Author: Jia Tan Date: 2023-02-08 20:30:23 +0800 liblzma: Omit lzma_index_iter's internal field from Doxygen docs. Add \private above this field and its sub-fields since it is not meant to be modified by users. src/liblzma/api/lzma/index.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) commit 0c9e4fc2ad6d88d54f299240fcc5a2ce7d695d96 Author: Jia Tan Date: 2023-01-21 21:32:03 +0800 liblzma: Fix documentation for LZMA_MEMLIMIT_ERROR. LZMA_MEMLIMIT_ERROR was missing the "<" character needed to put documentation after a member. src/liblzma/api/lzma/base.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 816fec125aa74bcef46512c73acc6d9e5a700d15 Author: Jia Tan Date: 2023-01-21 00:29:38 +0800 liblzma: Improve documentation for base.h. Standardizing each function to always specify params and return values. Also fixed a small grammar mistake. src/liblzma/api/lzma/base.h | 30 +++++++++++++++++++++++++----- 1 file changed, 25 insertions(+), 5 deletions(-) commit 862dacef1a4e7e1b28d465956fa4244ed01df154 Author: Jia Tan Date: 2023-02-14 00:12:34 +0800 liblzma: Add one more missing [out] annotation in vli.h src/liblzma/api/lzma/vli.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 867b08ae4254bf55dd1f7fd502cc618231b92f75 Author: Jia Tan Date: 2023-02-14 00:08:33 +0800 liblzma: Minor improvements to vli.h. Added [out] annotations to parameters that are pointers and can have their value changed. Also added a clarification to lzma_vli_is_valid. src/liblzma/api/lzma/vli.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) commit 90d0e628ff11e5030bcc4fc000bca056adda6603 Author: Jia Tan Date: 2023-02-10 21:38:02 +0800 liblzma: Add comments for macros in delta.h. Document LZMA_DELTA_DIST_MIN and LZMA_DELTA_DIST_MAX for completeness and to avoid Doxygen warnings. 
src/liblzma/api/lzma/delta.h | 8 ++++++++ 1 file changed, 8 insertions(+) commit 9255fffdb13e59874bf7f95c370c410ad3a7e114 Author: Jia Tan Date: 2023-02-10 21:35:23 +0800 liblzma: Improve documentation in index_hash.h. All functions now explicitly specify parameter and return values. Also reworded the description of lzma_index_hash_init() for readability. src/liblzma/api/lzma/index_hash.h | 36 +++++++++++++++++++++++++++--------- 1 file changed, 27 insertions(+), 9 deletions(-) commit 1dbe12b90cff79bb51923733ac0840747b4b4131 Author: Lasse Collin Date: 2023-02-07 19:07:45 +0200 xz: Improve the comment about start_time in mytime.c. start_time is relative to an arbitrary point in time, it's not time of day, so using it for anything else than time differences wouldn't make sense. src/xz/mytime.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) commit 7673ef5aa80c1af7fb693360dd82f527b46c2c56 Author: Jia Tan Date: 2023-02-04 21:06:35 +0800 Build: Adjust CMake version search regex. Now, the LZMA_VERSION_MAJOR, LZMA_VERSION_MINOR, and LZMA_VERSION_PATCH macros do not need to be on consecutive lines in version.h. They can be separated by more whitespace, comments, or even other content, as long as they appear in the proper order (major, minor, patch). CMakeLists.txt | 2 ++ 1 file changed, 2 insertions(+) commit b8bce89be7fb5bffe5fef4a2782ca9b2b107eaac Author: Jia Tan Date: 2023-02-04 12:01:23 +0800 xz: Add a comment clarifying the use of start_time in mytime.c. src/xz/mytime.c | 5 +++++ 1 file changed, 5 insertions(+) commit 912af91b10a18fb9bb3167247ecaaefca8248ee9 Author: Jia Tan Date: 2023-01-26 09:50:21 +0800 liblzma: Improve documentation for version.h. Specified parameter and return values for API functions and documented a few more of the macros.
src/liblzma/api/lzma/version.h | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) commit 850adec171203cd22b57d016084d713f72ae5307 Author: Jia Tan Date: 2023-02-03 22:52:55 +0800 Docs: Omit SIGTSTP not handled from TODO. TODO | 4 ---- 1 file changed, 4 deletions(-) commit 2c78a83c6faec70154d9eb78022a618ed62cdcb3 Author: Jia Tan Date: 2023-02-03 00:33:32 +0800 liblzma: Fix bug in lzma_str_from_filters() not checking filters[] length. The bug is only a problem in applications that do not properly terminate the filters[] array with LZMA_VLI_UNKNOWN or have more than LZMA_FILTERS_MAX filters. This bug does not affect xz. src/liblzma/common/string_conversion.c | 7 +++++++ 1 file changed, 7 insertions(+) commit e01f01b9af1c074463b92694a16ecc16a31907c0 Author: Jia Tan Date: 2023-02-03 00:32:47 +0800 Tests: Create test_filter_str.c. Tests lzma_str_to_filters(), lzma_str_from_filters(), and lzma_str_list_filters() API functions. CMakeLists.txt | 1 + tests/Makefile.am | 2 + tests/test_filter_str.c | 593 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 596 insertions(+) commit 8dfc029e7a4ce45809c30313dc0e502f0d22be26 Author: Jia Tan Date: 2023-01-22 08:49:00 +0800 liblzma: Fix typos in comments in string_conversion.c. src/liblzma/common/string_conversion.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 54ad83c1ae2180dcc0cb2445b181dc1e9732a5d6 Author: Jia Tan Date: 2023-02-03 00:20:20 +0800 liblzma: Clarify block encoder and decoder documentation. Added a few sentences to the description for lzma_block_encoder() and lzma_block_decoder() to highlight that the Block Header must be coded before calling these functions. src/liblzma/api/lzma/block.h | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) commit f680e771b3eb2a46310fe85b3e000ac3a1a0640f Author: Jia Tan Date: 2023-02-03 00:12:24 +0800 Update lzma_block documentation for lzma_block_uncomp_encode(). 
src/liblzma/api/lzma/block.h | 3 +++ 1 file changed, 3 insertions(+) commit 504cf4af895fd45aad0c56eb3b49d90acd54465b Author: Jia Tan Date: 2023-02-03 00:11:37 +0800 liblzma: Minor edits to lzma_block header_size documentation. src/liblzma/api/lzma/block.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 115b720fb521f99aa832d06b2c12b7f8c6c50680 Author: Jia Tan Date: 2023-02-03 00:11:07 +0800 liblzma: Enumerate functions that read version in lzma_block. src/liblzma/api/lzma/block.h | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) commit 85ea0979adcf808a3830aefbe7a4ec884e542ea1 Author: Jia Tan Date: 2023-02-03 00:10:34 +0800 liblzma: Clarify comment in block.h. src/liblzma/api/lzma/block.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 1f7ab90d9ce224230a04de6b921ad6e2029023a8 Author: Jia Tan Date: 2023-02-03 00:07:23 +0800 liblzma: Improve documentation for block.h. Standardizing each function to always specify params and return values. Output pointer parameters are also marked with doxygen style [out] to make it clear. Any note sections were also moved above the parameter and return sections for consistency. src/liblzma/api/lzma/block.h | 96 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 75 insertions(+), 21 deletions(-) commit c563a4bc554a96bd0b6aab3c139715b7ec8f6ca3 Author: Jia Tan Date: 2023-02-01 23:38:30 +0800 liblzma: Clarify a comment about LZMA_STR_NO_VALIDATION. The flag description for LZMA_STR_NO_VALIDATION was previously confusing about the treatment for filters than cannot be used with .xz format (lzma1) without using LZMA_STR_ALL_FILTERS. Now, it is clear that LZMA_STR_NO_VALIDATION is not a super set of LZMA_STR_ALL_FILTERS. src/liblzma/api/lzma/filter.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) commit 315c64c7e18acc59a745b68148188a73e998252b Author: Jia Tan Date: 2023-02-01 21:43:33 +0800 CI: Update .gitignore for artifacts directory in build-aux. 
The workflow action for our CI pipeline can only reference artifacts in the source directory, so we should ignore these files if the ci_build.sh is run locally. .gitignore | 1 + 1 file changed, 1 insertion(+) commit 2c1341f4fa06e7f487d61142aa354c433e17ec7f Author: Jia Tan Date: 2023-02-01 21:36:46 +0800 CI: Add quotes around variables in a few places. build-aux/ci_build.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 3a401b0e0c7a2658af7801dd0690256ef24149e0 Author: Jia Tan Date: 2023-02-01 21:36:22 +0800 CI: Upload test logs as artifacts if a test fails. .github/workflows/ci.yml | 60 ++++++++++++++++++++++++++++++++++-------------- build-aux/ci_build.sh | 31 ++++++++++++++++++++----- 2 files changed, 68 insertions(+), 23 deletions(-) commit 610dde15a88f12cc540424eb3eb3ed61f3876f74 Author: Lasse Collin Date: 2023-01-27 20:02:49 +0200 xz: Use clock_gettime() even if CLOCK_MONOTONIC isn't available. mythread.h and thus liblzma already does it. src/xz/mytime.c | 11 ++++++++--- src/xz/private.h | 3 +-- 2 files changed, 9 insertions(+), 5 deletions(-) commit 2e02877288f6576cd4595e9ac7684f867cd47d68 Author: Lasse Collin Date: 2023-01-27 19:41:19 +0200 po4a/po4a.conf: Sort the language identifiers in alphabetical order. po4a/po4a.conf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit ff592c616eda274215b485cf1b8d34f060c9f3be Author: Lasse Collin Date: 2023-01-26 18:29:17 +0200 xz: Add SIGTSTP handler for progress indicator time keeping. This way, if xz is stopped the elapsed time and estimated time remaining won't get confused by the amount of time spent in the stopped state. This raises SIGSTOP. It's not clear to me if this is the correct way. POSIX and glibc docs say that SIGTSTP shouldn't stop the process if it is orphaned but this commit doesn't attempt to handle that. 
Search for SIGTSTP in section 2.4.3: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html src/xz/mytime.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++- src/xz/mytime.h | 6 ++++++ src/xz/private.h | 12 ++++++++++++ src/xz/signals.c | 17 ++++++++++++++++- 4 files changed, 89 insertions(+), 2 deletions(-) commit 3b1c8ac8d1d553cbb1fb22b545d2b1424c752b76 Author: Jia Tan Date: 2023-01-27 20:14:51 +0800 Translations: Add Brazilian Portuguese translation of man pages. Thanks to Rafael Fontenelle. po4a/po4a.conf | 2 +- po4a/pt_BR.po | 3677 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 3678 insertions(+), 1 deletion(-) commit a15a7552f9f67c4e402f5d2967324e0ccfd6fccc Author: Lasse Collin Date: 2023-01-26 17:51:06 +0200 Build: Avoid different quoting style in --enable-doxygen doc. configure.ac | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) commit af5a4bd5afc089d9697756dded38feafaa987ae4 Author: Lasse Collin Date: 2023-01-26 17:39:46 +0200 tuklib_physmem: Check for __has_warning before GCC version. Clang can be configured to fake a too high GCC version so this way it's more robust. src/common/tuklib_physmem.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit f35d98e20609e0be6a04ae2604bfb7cb9d5bd5e4 Author: Jia Tan Date: 2023-01-24 20:48:50 +0800 liblzma: Fix documentation in filter.h for lzma_str_to_filters() The previous documentation for lzma_str_to_filters() was technically correct, but misleading. lzma_str_to_filters() returns NULL on success, which is in practice always defined to 0. This is the same value as LZMA_OK, but lzma_str_to_filters() does not return lzma_ret so we should be more clear. src/liblzma/api/lzma/filter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 2f78ecc5939b3d97ddfc2a6bd31b50108a28d0a2 Author: Lasse Collin Date: 2023-01-23 23:44:58 +0200 Revert "tuklib_common: Define __has_warning if it is not defined." 
This reverts commit 82e3c968bfa10e3ff13333bd9cbbadb5988d6766. Macros in the reserved namespace (_foo or __foo) shouldn't be #defined without a very good reason. Here the alternative would have been to #define tuklib_has_warning(str) to an appropriate value. Also the tuklib_* files should stay namespace clean if possible. src/common/tuklib_common.h | 7 ------- 1 file changed, 7 deletions(-) commit 8366cf8738e8b7bb74c967d07bf0fd2a1878e575 Author: Lasse Collin Date: 2023-01-23 23:38:34 +0200 tuklib_physmem: Clean up the way -Wcast-function-type is silenced on Windows. __has_warning and other __has_foo macros are meant to become compiler-agnostic so it's not good to check for __clang__ with it. This also relied on tuklib_common.h for #defining __has_warning which was confusing as #defining reserved macros is generally not a good idea. src/common/tuklib_physmem.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) commit 683a3c7e2fcd922200c31078e5c9dd1348e90941 Author: Lasse Collin Date: 2023-01-24 00:05:38 +0200 xz: Flip the return value of suffix_is_set to match the documentation. Also edit style to match the existing coding style in the project. src/xz/args.c | 6 +++--- src/xz/suffix.c | 2 +- src/xz/suffix.h | 1 + 3 files changed, 5 insertions(+), 4 deletions(-) commit cc5aa9ab138beeecaee5a1e81197591893ee9ca0 Author: Jia Tan Date: 2023-01-07 21:55:06 +0800 xz: Refactor duplicated check for custom suffix when using --format=raw src/xz/args.c | 8 ++++++++ src/xz/suffix.c | 26 ++++++++------------------ src/xz/suffix.h | 7 +++++++ 3 files changed, 23 insertions(+), 18 deletions(-) commit 9663141274e01592a281a7f2df5d7a31a1dac8bf Author: Jia Tan Date: 2023-01-20 21:53:14 +0800 liblzma: Set documentation on all reserved fields to private. This prevents the reserved fields from being part of the generated Doxygen documentation.
src/liblzma/api/lzma/base.h | 17 +++++++++++++++ src/liblzma/api/lzma/block.h | 43 +++++++++++++++++++++++++++++++++++++ src/liblzma/api/lzma/container.h | 24 +++++++++++++++++++++ src/liblzma/api/lzma/delta.h | 12 +++++++++++ src/liblzma/api/lzma/index.h | 27 +++++++++++++++++++++++ src/liblzma/api/lzma/lzma12.h | 22 +++++++++++++++++++ src/liblzma/api/lzma/stream_flags.h | 28 ++++++++++++++++++++++++ 7 files changed, 173 insertions(+) commit 6327a045f34d48fc5afc58ba0d32a82c94403049 Author: Jia Tan Date: 2022-12-20 21:39:59 +0800 Doxygen: Update Doxyfile.in from 1.4.7 to 1.8.17. A few Doxygen tags were obsolete from 1.4.7. Version 1.8.17 released in 2019, so this should be compatible with reasonable modern distros. The purpose of Doxygen these days is for docs on the website, so it doesn't necessarily have to work for everyone. Just when the maintainers want to update the docs. Doxyfile.in | 2523 ++++++++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 1893 insertions(+), 630 deletions(-) commit bbf71b69ebf9d0d62a0af150a5c37d193b8159ad Author: Jia Tan Date: 2023-01-03 20:37:30 +0800 Doxygen: Make Doxygen only produce liblzma API documentation by default. Doxygen is now configurable in autotools only with --enable-doxygen=[api|all]. The default is "api", which will only generate HTML output for liblzma API functions. The LaTex documentation output was also disabled. Doxyfile.in | 18 +++++++++--------- configure.ac | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+), 9 deletions(-) commit 6fcf4671b6047113c583a0919fc850987a4ec5f4 Author: Jia Tan Date: 2022-12-21 23:59:43 +0800 liblzma: Highlight liblzma API headers should not be included directly. This improves the generated Doxygen HTML files to better highlight how to properly use the liblzma API header files.
src/liblzma/api/lzma/base.h | 5 +++-- src/liblzma/api/lzma/bcj.h | 5 +++-- src/liblzma/api/lzma/block.h | 5 +++-- src/liblzma/api/lzma/check.h | 5 +++-- src/liblzma/api/lzma/container.h | 5 +++-- src/liblzma/api/lzma/delta.h | 5 +++-- src/liblzma/api/lzma/filter.h | 5 +++-- src/liblzma/api/lzma/hardware.h | 5 +++-- src/liblzma/api/lzma/index.h | 5 +++-- src/liblzma/api/lzma/index_hash.h | 5 +++-- src/liblzma/api/lzma/lzma12.h | 5 +++-- src/liblzma/api/lzma/stream_flags.h | 5 +++-- src/liblzma/api/lzma/version.h | 5 +++-- src/liblzma/api/lzma/vli.h | 5 +++-- 14 files changed, 42 insertions(+), 28 deletions(-) commit b43ff180fb2e372adce876bfa155fc9bcf0c3db4 Author: Jia Tan Date: 2023-01-19 20:35:09 +0800 tuklib_physmem: Silence warning from -Wcast-function-type on MinGW-w64. tuklib_physmem depends on GetProcAddress() for both MSVC and MinGW-w64 to retrieve a function address. The proper way to do this is to cast the return value to the type of function pointer retrieved. Unfortunately, this causes a cast-function-type warning, so the best solution is to simply ignore the warning. src/common/tuklib_physmem.c | 9 +++++++++ 1 file changed, 9 insertions(+) commit 82e3c968bfa10e3ff13333bd9cbbadb5988d6766 Author: Jia Tan Date: 2023-01-19 20:32:40 +0800 tuklib_common: Define __has_warning if it is not defined. clang supports the __has_warning macro to determine if the version of clang compiling the code supports a given warning. If we do not define it for other compilers, it may cause a preprocessor error. src/common/tuklib_common.h | 7 +++++++ 1 file changed, 7 insertions(+) commit b2ba1a489df451cdcd93b2334e319dd06778de19 Author: Jia Tan Date: 2023-01-18 22:11:05 +0800 CI: Reorder 32-bit build first for Linux autotool builds. The 32-bit build needs to be first so the configure cache only needs to be reset one time. The 32-bit build sets the CFLAGS env variable, so any build using that flag after will fail unless the cache is reset. 
.github/workflows/ci.yml | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) commit dd1c1135741057c91e8d018be9ec4d43968b0e64 Author: Jia Tan Date: 2023-01-18 21:51:43 +0800 CI: Enable --config-cache in autotool builds. If CFLAGS are set in a build, the cache must be cleared with "make distclean", or by deleting the cache file. build-aux/ci_build.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit d3e11477053764c003eec2daa5198c747d70ff69 Author: Jia Tan Date: 2023-01-16 21:35:45 +0800 xz: Add missing comment for coder_set_compression_settings() src/xz/coder.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 123255b6ed15f4428b2aa92e4962015a5362f6bf Author: Jia Tan Date: 2023-01-16 20:55:10 +0800 xz: Do not set compression settings with raw format in list mode. Calling coder_set_compression_settings() in list mode with verbose mode on caused the filter chain and memory requirements to print. This was unnecessary since the command results in an error and not consistent with other formats like lzma and alone. src/xz/args.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 571919c47b9ff5171ede84378620ed0a9aeb98c0 Author: Jia Tan Date: 2023-01-13 20:37:06 +0800 Translations: Update the Brazilian Portuguese translation. po/pt_BR.po | 603 ++++++++++++++++++++++++++++++++++-------------------------- 1 file changed, 344 insertions(+), 259 deletions(-) commit 81cb02e2c22bbc036cdfaa2d2c4176f6bd60d3cf Author: Jia Tan Date: 2023-01-12 23:43:06 +0800 CI: Disable shared and nls from various jobs in autotool runners. Disabling shared library generation and linking should help speed up the runners. The shared library is still being tested in the 32 bit build and the full feature. Disabling nls is to check for any unexpected warnings or errors. 
.github/workflows/ci.yml | 56 ++++++++++++++++++++++++------------------------ 1 file changed, 28 insertions(+), 28 deletions(-) commit 58a052198a7bcaf6e958f87fad72e69e19a2579b Author: Jia Tan Date: 2023-01-12 23:39:19 +0800 CI: Reorder the 32-bit job in the Ubuntu runner. Run the 32 bit job sooner since this is a more interesting test than some of the later jobs. .github/workflows/ci.yml | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) commit 4110a998b83459fe2bc9bc1bec30ad68afa8f797 Author: Jia Tan Date: 2023-01-12 23:09:03 +0800 CI: Allow disabling Native Language Support. build-aux/ci_build.sh | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) commit 0dec634e705b5bf89a37c5d62d71e8511d480058 Author: Jia Tan Date: 2023-01-12 23:02:20 +0800 CI: Only run autogen.sh if it has not already run. build-aux/ci_build.sh | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) commit 32287dc8def94df4546e903495d14c132bd54cc4 Author: Jia Tan Date: 2023-01-12 22:58:36 +0800 CI: Allow disabling shared library in autotools builds. build-aux/ci_build.sh | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) commit 77d1ebcc99ddd82a300d1838f608150221931dcd Author: Jia Tan Date: 2023-01-12 22:44:18 +0800 CI: Improve Usage readability and add -h option. build-aux/ci_build.sh | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) commit a8bb8358d10b059274f3cf993d9b8f490bafb268 Author: Lasse Collin Date: 2023-01-12 13:04:05 +0200 Build: Omit -Wmissing-noreturn from the default warnings. It's not that important. It can be annoying in builds that disable many features since in those cases the tests programs will correctly trigger this warning with Clang. configure.ac | 1 - 1 file changed, 1 deletion(-) commit 52dc033d0bde0d19e3912303c6c74bae559d6498 Author: Lasse Collin Date: 2023-01-12 06:05:58 +0200 xz: Use ssize_t for the to-be-ignored return value from write(fd, ptr, 1). 
It makes no difference here as the return value fits into an int too and it then gets ignored but this looks better. src/xz/file_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit b1a6d180a363d57b2b1c89526ff3f0782bf863d3 Author: Lasse Collin Date: 2023-01-12 06:01:12 +0200 xz: Silence warnings from -Wsign-conversion in a 32-bit build. src/common/tuklib_mbstr_fw.c | 2 +- src/xz/list.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) commit 31c21c734b7c7d7428a3da7402a2cb7bc2587339 Author: Lasse Collin Date: 2023-01-12 05:38:48 +0200 liblzma: Silence another warning from -Wsign-conversion in a 32-bit build. It doesn't warn on a 64-bit system because truncating a ptrdiff_t (signed long) to uint32_t is diagnosed under -Wconversion by GCC and -Wshorten-64-to-32 by Clang. src/liblzma/lz/lz_encoder_mf.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) commit 37fbdfb7263522c11c7ad2685413d6295532581d Author: Lasse Collin Date: 2023-01-12 04:46:45 +0200 liblzma: Silence a warning from -Wsign-conversion in a 32-bit build. src/common/mythread.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 5ce6ddc221d0bfb57d810d845bb65fb0aac0b008 Author: Lasse Collin Date: 2023-01-12 04:17:24 +0200 Build: Make configure add more warning flags for GCC and Clang. -Wstrict-aliasing was removed from the list since it is enabled by -Wall already. A normal build is clean with these on GNU/Linux x86-64 with GCC 12.2.0 and Clang 14.0.6. configure.ac | 36 +++++++++++++++++++++++++++++++----- 1 file changed, 31 insertions(+), 5 deletions(-) commit bfc3a0a8ac16de90049c1b1ba1445a7626d0230c Author: Lasse Collin Date: 2023-01-12 04:14:18 +0200 Tests: Fix warnings from clang --Wassign-enum. Explicitly casting the integer to lzma_check silences the warning. Since such an invalid value is needed in multiple tests, a constant INVALID_LZMA_CHECK_ID was added to tests.h. 
The use of 0x1000 for lzma_block.check wasn't optimal as if the underlying type is a char then 0x1000 will be truncated to 0. However, in these test cases the value is ignored, thus even with such truncation the test would have passed. tests/test_block_header.c | 6 +++--- tests/test_check.c | 2 +- tests/test_stream_flags.c | 8 ++++---- tests/tests.h | 9 +++++++++ 4 files changed, 17 insertions(+), 8 deletions(-) commit 49245bb31e215ad455a1ab85e4ed6783152dc522 Author: Lasse Collin Date: 2023-01-12 03:51:07 +0200 Tests: Silence warnings from -Wsign-conversion. Note that assigning an unsigned int to lzma_check doesn't warn on GNU/Linux x86-64 since the enum type is unsigned on that platform. The enum can be signed on some other platform though so it's best to use enumeration type lzma_check in these situations. tests/test_check.c | 6 +++--- tests/test_stream_flags.c | 10 +++++----- 2 files changed, 8 insertions(+), 8 deletions(-) commit 3f13bf6b9e8624cbe6d6e3e82d6c98a3ed1ad571 Author: Lasse Collin Date: 2023-01-12 03:19:59 +0200 liblzma: Silence warnings from clang -Wconditional-uninitialized. This is similar to 2ce4f36f179a81d0c6e182a409f363df759d1ad0. The actual initialization of the variables is done inside mythread_sync() macro. Clang doesn't seem to see that the initialization code inside the macro is always executed. src/liblzma/common/stream_decoder_mt.c | 8 +++++--- src/liblzma/common/stream_encoder_mt.c | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-) commit 6c886cc5b3c90c6a75e6be8b1278ec2261e452a6 Author: Lasse Collin Date: 2023-01-12 03:11:40 +0200 Fix warnings from clang -Wdocumentation. src/liblzma/check/check.h | 4 ---- src/liblzma/lz/lz_encoder_mf.c | 4 ++-- src/xz/options.c | 4 ++-- 3 files changed, 4 insertions(+), 8 deletions(-) commit a0e7fb1c1ea658b67f30517f5d1975efd0226dba Author: Lasse Collin Date: 2023-01-12 03:04:28 +0200 Tests: test_lzip_decoder: Remove trailing white-space. 
tests/test_lzip_decoder.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit c0f8d6782f29e219fd496dd23f6a033270509d5c Author: Lasse Collin Date: 2023-01-12 03:03:55 +0200 Tests: test_lzip_decoder: Silence warnings from -Wsign-conversion. tests/test_lzip_decoder.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) commit 62efd48a825e8f439e84c85e165d8774ddc68fd2 Author: Jia Tan Date: 2023-01-11 23:58:16 +0800 Add NEWS for 5.4.1. NEWS | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) commit d1561c47ec8cd3844a785d3741dc932f9b9c5790 Author: Jia Tan Date: 2023-01-11 22:46:48 +0800 xz: Fix warning -Wformat-nonliteral on clang in message.c. clang and gcc differ in how they handle -Wformat-nonliteral. gcc will allow a non-literal format string as long as the function takes its format arguments as a va_list. src/xz/message.c | 9 +++++++++ 1 file changed, 9 insertions(+) commit 8c0f115cc489331c48df77beca92fe378039d919 Author: Jia Tan Date: 2023-01-11 20:58:31 +0800 Tests: Fix test_filter_flags copy/paste error. tests/test_filter_flags.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 25035813d1d596fde692addc33e7f715f1fe55eb Author: Jia Tan Date: 2023-01-11 20:42:29 +0800 Tests: Fix type-limits warning in test_filter_flags. This only occurs in test_filter_flags when the BCJ filters are not configured and built. In this case, ARRAY_SIZE() returns 0 and causes a type-limits warning with the loop variable since an unsigned number will always be >= 0. tests/test_filter_flags.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) commit 0b8fa310cf56fec55663f62340e49e8e1441594f Author: Lasse Collin Date: 2023-01-10 22:14:03 +0200 liblzma: CLMUL CRC64: Work around a bug in MSVC, second attempt. This affects only 32-bit x86 builds. x86-64 is OK as is. I still cannot easily test this myself. 
The reporter has tested this and it passes the tests included in the CMake build and performance is good: raw CRC64 is 2-3 times faster than the C version of the slice-by-four method. (Note that liblzma doesn't include a MSVC-compatible version of the 32-bit x86 assembly code for the slice-by-four method.) Thanks to Iouri Kharon for figuring out a fix, testing, and benchmarking. src/liblzma/check/crc64_fast.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) commit 765354b50c2886fc0d294d6be3b207f7ae2ada70 Author: Jia Tan Date: 2023-01-11 01:18:50 +0800 Tests: Fix unused function warning in test_block_header. One of the global arrays of filters was only used in a test that required both encoders and decoders to be configured in the build. tests/test_block_header.c | 4 ++++ 1 file changed, 4 insertions(+) commit 7c23c05befdcc73231c0d6632a7d943dbeaea1aa Author: Jia Tan Date: 2023-01-11 01:08:03 +0800 Tests: Fix unused function warning in test_index_hash. test_index_hash does not use fill_index_hash() unless both encoders and decoders are configured in the build. tests/test_index_hash.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) commit 57464bb4ebd6c00dc8b19803f05ea55ddd0826f6 Author: Jia Tan Date: 2023-01-11 00:54:45 +0800 CI/CD: Add 32-bit build and test steps to Ubuntu autotools runner. If all goes well, Mac autotools and Linux and Mac CMake will be added later for 32-bit builds. .github/workflows/ci.yml | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) commit 923eb689a4b863b6cca8df6360d4962aae994edf Author: Jia Tan Date: 2023-01-11 00:51:01 +0800 CI/CD: Enables warnings as errors in autotool build. This will help us catch warnings and potential bugs in builds that are not often tested by us. build-aux/ci_build.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit feae5528a30c006b6e2f96a95116e20b983703fc Author: Jia Tan Date: 2023-01-11 00:48:35 +0800 CI/CD: Add -f argument to set CFLAGS in ci_build.sh. 
For now, the suggested option is for -m32 only, but this can be updated later if other flags are deemed useful. build-aux/ci_build.sh | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) commit cfabb62a4874c146e7d6f30445637602545bc054 Author: Lasse Collin Date: 2023-01-10 12:47:16 +0200 Revert "liblzma: CLMUL CRC64: Workaround a bug in MSVC (VS2015-2022)." This reverts commit 36edc65ab4cf10a131f239acbd423b4510ba52d5. It was reported that it wasn't a good enough fix and MSVC still produced (different kind of) bad code when building for 32-bit x86 if optimizations are enabled. Thanks to Iouri Kharon. src/liblzma/check/crc64_fast.c | 6 ------ 1 file changed, 6 deletions(-) commit 0b64215170dd3562f207ef26f794755bcd600526 Author: Lasse Collin Date: 2023-01-10 11:56:11 +0200 sysdefs.h: Don't include strings.h anymore. On some platforms src/xz/suffix.c may need <strings.h> for strcasecmp() but suffix.c includes the header when it needs it. Unless there is an old system that otherwise supports enough C99 to build XZ Utils but doesn't have C89/C90-compatible <string.h>, there should be no need to include <strings.h> in sysdefs.h. src/common/sysdefs.h | 6 ------ 1 file changed, 6 deletions(-) commit ec2fc39fe4f4e6e242b3a669585049763968cdeb Author: Lasse Collin Date: 2023-01-10 11:23:41 +0200 xz: Include <strings.h> in suffix.c if needed for strcasecmp(). SUSv2 and POSIX.1‐2017 declare only a few functions in <strings.h>. Of these, strcasecmp() is used on some platforms in suffix.c. Nothing else in the project needs <strings.h> (at least if building on a modern system). sysdefs.h currently includes <strings.h> if HAVE_STRINGS_H is defined and suffix.c relied on this. Note that dos/config.h doesn't #define HAVE_STRINGS_H even though DJGPP does have strings.h. It isn't needed with DJGPP as strcasecmp() is also in <string.h> in DJGPP. src/xz/suffix.c | 3 +++ 1 file changed, 3 insertions(+) commit 7049c4a76c805ad27d6cf4ee119a2ef2a7add59f Author: Lasse Collin Date: 2023-01-10 10:05:13 +0200 sysdefs.h: Fix a comment.
src/common/sysdefs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 194a5fab69277d9e804a6113b5f676b8666b3a61 Author: Lasse Collin Date: 2023-01-10 10:04:06 +0200 sysdefs.h: Don't include memory.h anymore even if it were available. It quite probably was never needed, that is, any system where memory.h was required likely couldn't compile XZ Utils for other reasons anyway. XZ Utils 5.2.6 and later source packages were generated using Autoconf 2.71 which no longer defines HAVE_MEMORY_H. So the code being removed is no longer used anyway. src/common/sysdefs.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) commit 5e34774c31d1b7509b5cb77a3be9973adec59ea0 Author: Lasse Collin Date: 2023-01-10 08:29:32 +0200 CMake: Fix appending to CMAKE_RC_FLAGS. It's a string, not a list. It only worked when the variable was empty. Thanks to Iouri Kharon. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 6e652ceb18c615c578c869db300fa0756788b4e0 Author: Lasse Collin Date: 2023-01-10 00:33:14 +0200 Windows: Update INSTALL-MSVC.txt to recommend CMake over project files. windows/INSTALL-MSVC.txt | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) commit 6b117d3b1fe91eb26d533ab16a2e552f84148d47 Author: Lasse Collin Date: 2023-01-09 23:41:25 +0200 CMake: Fix windres issues again. At least on some systems, GNU windres needs --use-temp-file in addition to the \x20 hack to avoid spaces in the command line argument. However, that \x20 syntax is broken with llvm-windres version 15.0.0 (results in "XZx20Utils") but luckily it works with a regular space. Thus it is best to limit the workarounds to GNU toolchain on Windows. CMakeLists.txt | 35 +++++++++++++++++++++++------------ 1 file changed, 23 insertions(+), 12 deletions(-) commit 0c210ca7f489e971e94e1ddc72b0b0806e3c7935 Author: Lasse Collin Date: 2023-01-06 22:53:38 +0200 Tests: test_filter_flags: Clean up minor issues.
Here is the list of the most significant issues addressed: - Avoid using internal common.h header. It's not good to copy the constants like this but common.h cannot be included for use outside of liblzma. This is the quickest thing to do that could be fixed later. - Omit the INIT_FILTER macro. Initialization should be done with just regular designated initializers. - Use start_offset = 257 for BCJ tests. It demonstrates that Filter Flags encoder and decoder don't validate the options thoroughly. 257 is valid only for the x86 filter. This is a bit silly but not a significant problem in practice because the encoder and decoder initialization functions will catch bad alignment still. Perhaps this should be fixed but it's not urgent and doesn't need to be in 5.4.x. - Various tweaks to comments such as filter id -> Filter ID tests/test_filter_flags.c | 153 +++++++++++++++++++++++----------------------- 1 file changed, 78 insertions(+), 75 deletions(-) commit 5c9fdd3bf53a9655f5eb2807d662b3af0d5e1865 Author: Jia Tan Date: 2022-12-29 23:33:33 +0800 Tests: Refactors existing filter flags tests. Converts the existing filter flags tests into tuktests. tests/test_filter_flags.c | 655 ++++++++++++++++++++++++++++++++-------------- 1 file changed, 457 insertions(+), 198 deletions(-) commit 36edc65ab4cf10a131f239acbd423b4510ba52d5 Author: Lasse Collin Date: 2023-01-09 12:22:05 +0200 liblzma: CLMUL CRC64: Workaround a bug in MSVC (VS2015-2022). I haven't tested with MSVC myself and there doesn't seem to be information about the problem online, so I'm relying on the bug report. Thanks to Iouri Kharon for the bug report and the patch. src/liblzma/check/crc64_fast.c | 6 ++++++ 1 file changed, 6 insertions(+) commit 790a12a95a78ff82d8c6d4efe3b789851ca9470d Author: Lasse Collin Date: 2023-01-09 11:27:24 +0200 CMake: Fix a copypaste error in xzdec Windows resource file handling. It was my mistake. Thanks to Iouri Kharon for the bug report.
CMakeLists.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 0e1545fea39c0514c7b7032a0a3592a9a33d2848 Author: Lasse Collin Date: 2023-01-08 00:32:29 +0200 Tests: tuktest.h: Support tuktest_malloc(0). It's not needed in XZ Utils at least for now. It's good to support it still because if such use is needed later, it wouldn't be caught on GNU/Linux since malloc(0) from glibc returns non-NULL. tests/tuktest.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) commit 69d5d78c6904668eb09a131da86276beec3281f8 Author: Lasse Collin Date: 2023-01-08 00:24:23 +0200 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit dd38655f80c113c9db73b9ed370dc900e1c4dc41 Author: Lasse Collin Date: 2023-01-07 21:57:11 +0200 CMake: Update cmake_minimum_required from 3.13...3.16 to 3.13...3.25. The changes listed on cmake-policies(7) for versions 3.17 to 3.25 shouldn't affect this project. CMakeLists.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit a890a637bee9193d5b690aefa9a59eba5b8532ae Author: Lasse Collin Date: 2023-01-07 19:50:35 +0200 Update THANKS. THANKS | 1 + 1 file changed, 1 insertion(+) commit 6e38e595dd56ac1800478cef1f6f754d0eba0d2e Author: Lasse Collin Date: 2023-01-07 19:50:03 +0200 CMake/Windows: Add resource files to xz.exe and xzdec.exe. The command line tools cannot be built with MSVC for now but they can be built with MinGW-w64. Thanks to Iouri Kharon for the bug report and the original patch. CMakeLists.txt | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) commit 443dfebced041adc88f10d824188eeef5b5821a9 Author: Lasse Collin Date: 2023-01-07 19:48:52 +0200 CMake/Windows: Add a workaround for windres from GNU binutils. Thanks to Iouri Kharon for the bug report and the original patch. 
CMakeLists.txt | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) commit ceb805011747d04a915f3f39e4bed9eed151c634 Author: Lasse Collin Date: 2023-01-07 19:31:15 +0200 Build: Require that _mm_set_epi64x() is usable to enable CLMUL support. VS2013 doesn't have _mm_set_epi64x() so this way CLMUL gets disabled with VS2013. Thanks to Iouri Kharon for the bug report. CMakeLists.txt | 3 ++- configure.ac | 8 ++++++-- 2 files changed, 8 insertions(+), 3 deletions(-) commit 8d372bd94066b1a5b0570b2550f83c2868486adf Author: Jia Tan Date: 2023-01-07 21:05:15 +0800 CI/CD: Split CMake Linux and MacOS build phase to build and test. The phase split was only done for Autotools before, so should also apply to CMake. .github/workflows/ci.yml | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) commit 747c7f2b34bd498f6702c6875500a26b06201772 Author: Jia Tan Date: 2023-01-07 11:16:55 +0800 CI/CD: Reduce job runners to 4 instead of using matrix strategy. The old version used too many runners that resulted in unnecessary dependency downloads. Now, the runners are reused for the different configurations for each OS and build system. .github/workflows/ci.yml | 95 ++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 83 insertions(+), 12 deletions(-) commit 4de35fd6b58d46fc887c78faf163f6a37b790c45 Author: Jia Tan Date: 2023-01-07 10:07:20 +0800 CI/CD: Add new -p (PHASE) argument to ci_build.sh The new PHASE argument can be build, test, or all. all is the default. This way, the CI/CD script can differentiate between the build and test phases to make it easier to track down errors when they happen. 
build-aux/ci_build.sh | 140 +++++++++++++++++++++++++++----------------------- 1 file changed, 76 insertions(+), 64 deletions(-) commit 6fd39664de47801e670a16617863196bfbde4755 Merge: 78e0561d fc0c7884 Author: Jia Tan Date: 2023-01-07 00:10:50 +0800 Merge pull request #7 from tukaani-project/tuktest_index_hash Tuktest index hash commit fc0c788469159f634f09ff23c8cef6925c91da57 Author: Lasse Collin Date: 2023-01-06 17:58:48 +0200 Tests: test_index_hash: Add an assert_uint_eq(). tests/test_index_hash.c | 3 +++ 1 file changed, 3 insertions(+) commit d550304f5343b3a082da265107cd820e0d81dc71 Author: Lasse Collin Date: 2023-01-06 17:55:06 +0200 Tests: test_index_hash: Fix a memory leak. tests/test_index_hash.c | 2 ++ 1 file changed, 2 insertions(+) commit 02608f74ea1f2d2d56585711ff241c34b4ad0937 Author: Lasse Collin Date: 2023-01-06 17:53:03 +0200 Tests: test_index_hash: Don't treat pointers as booleans. tests/test_index_hash.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) commit 056766c8601a3808bea1761f6cc833197a35a3e0 Author: Lasse Collin Date: 2023-01-06 17:51:41 +0200 Tests: test_index_hash: Fix a typo in a comment. tests/test_index_hash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 873e684028ba9738f071c5236db7d452ed797b4c Author: Lasse Collin Date: 2023-01-06 17:44:29 +0200 Tests: test_index_hash: Avoid the variable name "index". It can trigger warnings from -Wshadow on some systems. tests/test_index_hash.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) commit d1f24c35874eeba8432d75aa77b06c50375ed937 Author: Lasse Collin Date: 2023-01-06 17:35:50 +0200 Tests: test_index_hash: Use the word "Record" instead of "entry". tests/test_index_hash.c | 102 ++++++++++++++++++++++++------------------------ 1 file changed, 51 insertions(+), 51 deletions(-) commit b93f7c5cbb02b42024ac866fc0af541de3d816e2 Author: Lasse Collin Date: 2023-01-06 17:35:05 +0200 Tests: test_index_hash: Tweak comments and style. 
The words defined in the .xz file format specification begin with a capital letter to emphasize that they have a specific meaning. tests/test_index_hash.c | 62 ++++++++++++++++++++++++++----------------------- 1 file changed, 33 insertions(+), 29 deletions(-) commit c48b24fc06d98569adb72f13c2e8e5ff30bb8036 Author: Lasse Collin Date: 2023-01-06 17:17:37 +0200 Tests: test_index_hash: Use INDEX_INDICATOR constant instead of 0. tests/test_index_hash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 78e0561dfebaa9d5e34558de537efcda890e0629 Author: Jia Tan Date: 2023-01-06 20:43:31 +0800 Style: Change #if !defined() to #ifndef in mythread.h. src/common/mythread.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit e834e1e934ed0af673598d8c0c34afb2af56bee0 Author: Jia Tan Date: 2023-01-06 20:35:55 +0800 Build: Add missing stream_decoder_mt.c to .vcxproj files. The line in the .vcxproj files for building with was missing in 5.4.0. Thanks to Hajin Jang for reporting the issue. windows/vs2013/liblzma.vcxproj | 1 + windows/vs2013/liblzma_dll.vcxproj | 1 + windows/vs2017/liblzma.vcxproj | 1 + windows/vs2017/liblzma_dll.vcxproj | 1 + windows/vs2019/liblzma.vcxproj | 1 + windows/vs2019/liblzma_dll.vcxproj | 1 + 6 files changed, 6 insertions(+) commit 84f9687cbae972c2c342e10bf69f8ec8f70ae111 Author: Jia Tan Date: 2023-01-05 20:57:25 +0800 liblzma: Remove common.h include from common/index.h. common/index.h is needed by liblzma internally and tests. common.h will include and define many things that are not needed by the tests. Also, this prevents include order problems because common.h will redefine LZMA_API resulting in a warning. src/liblzma/common/index.c | 1 + src/liblzma/common/index.h | 9 +++++++-- src/liblzma/common/index_decoder.h | 1 + src/liblzma/common/stream_buffer_encoder.c | 1 + 4 files changed, 10 insertions(+), 2 deletions(-) commit 7657ce1c3c4abff7560336a7b687d98e0e2bd14f Author: Lasse Collin Date: 2023-01-04 22:40:54 +0200 Update THANKS.
THANKS | 1 + 1 file changed, 1 insertion(+) commit aafd67fba045ab99683971263a5a26fb2a6e8ce2 Author: Lasse Collin Date: 2023-01-04 18:40:28 +0200 Tests: Adjust style in test_compress.sh. tests/test_compress.sh | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) commit 52380678f42364daa4510f92f6d3b18ec98c3638 Author: Jia Tan Date: 2023-01-04 23:58:58 +0800 Tests: Replace non portable shell parameter expansion The shell parameter expansion using # and ## is not supported in Solaris 10 Bourne shell (/bin/sh). Even though this is POSIX, it is not fully portable, so we should avoid it. tests/create_compress_files.c | 2 +- tests/test_compress.sh | 20 +++++++++++++------- tests/test_compress_prepared_bcj_sparc | 2 +- tests/test_compress_prepared_bcj_x86 | 2 +- 4 files changed, 16 insertions(+), 10 deletions(-) commit d0eb345bb7d148a62883ee299adec2b74a0f6f3b Author: Jia Tan Date: 2023-01-03 21:02:38 +0800 Translations: Add Korean translation of man pages. Thanks to Seong-ho Cho po4a/ko.po | 5552 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ po4a/po4a.conf | 2 +- 2 files changed, 5553 insertions(+), 1 deletion(-) commit c4145978d95ebf1690c778d354e15f7c2823d7a8 Author: Jia Tan Date: 2023-01-03 20:47:27 +0800 Translations: Update the Esperanto translation. po/eo.po | 620 ++++++++++++++++++++++++++++++++++----------------------------- 1 file changed, 332 insertions(+), 288 deletions(-) commit 4103a2e78ac60b00c888485cd967a5fe5d1b917c Author: Lasse Collin Date: 2023-01-02 17:20:47 +0200 Bump version and soname for 5.5.0alpha. 5.5.0alpha won't be released, it's just to mark that the branch is not for stable 5.4.x. Once again there is no API/ABI stability for new features in devel versions. The major soname won't be bumped even if API/ABI of new features breaks between devel releases. 
src/liblzma/Makefile.am | 2 +- src/liblzma/api/lzma/version.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) commit 73c9e6d6b970ccc3d5ad61dcaa21cba050e5df0a Author: Lasse Collin Date: 2023-01-02 17:05:07 +0200 Build: Fix config.h comments. configure.ac | 2 +- m4/tuklib_progname.m4 | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) commit bb740e3b117f1a3c65152d01e5755523a908ecb1 Author: Jia Tan Date: 2023-01-02 22:33:48 +0800 Build: Only define HAVE_PROGRAM_INVOCATION_NAME if it is set to 1. HAVE_DECL_PROGRAM_INVOCATION_NAME is renamed to HAVE_PROGRAM_INVOCATION_NAME. Previously, HAVE_DECL_PROGRAM_INVOCATION_NAME was always set when building with autotools. CMake would only set this when it was 1, and the dos/config.h did not define it. The new macro definition is consistent across build systems. cmake/tuklib_progname.cmake | 5 ++--- m4/tuklib_progname.m4 | 5 ++++- src/common/tuklib_progname.c | 2 +- src/common/tuklib_progname.h | 2 +- 4 files changed, 8 insertions(+), 6 deletions(-) commit 064cd385a716abc78d93a3612411a82d69ceb221 Author: Jia Tan Date: 2022-12-29 00:30:52 +0800 Adds test_index_hash to .gitignore. .gitignore | 1 + 1 file changed, 1 insertion(+) commit 3959162baec074511d83ba0fec1284c3ed724799 Author: Jia Tan Date: 2022-12-29 00:25:18 +0800 Tests: Creates test_index_hash.c Tests all API functions exported from index_hash.h. Does not have a dedicated test for lzma_index_hash_end. CMakeLists.txt | 2 + tests/Makefile.am | 3 + tests/test_index_hash.c | 379 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 384 insertions(+) commit f16e12d5e755d371247202fcccbcccd1ec16b2cf Author: Jia Tan Date: 2022-08-17 20:20:16 +0800 liblzma: Add NULL check to lzma_index_hash_append. This is for consistency with lzma_index_append. 
src/liblzma/common/index_hash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 203b008eb220208981902e0db541c02d1c1c9f5e Author: Jia Tan Date: 2022-08-17 17:59:51 +0800 liblzma: Replaced hardcoded 0x0 index indicator byte with macro src/liblzma/common/index.h | 3 +++ src/liblzma/common/index_decoder.c | 2 +- src/liblzma/common/index_encoder.c | 2 +- src/liblzma/common/index_hash.c | 2 +- src/liblzma/common/stream_decoder.c | 3 ++- src/liblzma/common/stream_decoder_mt.c | 2 +- 6 files changed, 9 insertions(+), 5 deletions(-) commit dfecda875211f737d0db92dc1d3c58a3a2afb0c0 Author: Lasse Collin Date: 2022-12-30 20:10:08 +0200 Tests: test_check: Test corner cases of CLMUL CRC64. tests/test_check.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) commit ce96bb20435212fe797d6d84738fb9fd4ea13cc7 Author: Lasse Collin Date: 2022-12-30 19:36:49 +0200 Tests: Clarify a comment in test_lzip_decoder.c. tests/test_lzip_decoder.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) commit 2fcba17fc4d7eda8fc60567169cf2a0e6fcfb2f8 Author: Jia Tan Date: 2022-12-29 01:55:19 +0800 xz: Includes <time.h> and <sys/time.h> conditionally in mytime.c. Previously, mytime.c depended on mythread.h for <time.h> to be included. src/xz/mytime.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) commit f82294c8318a7a0990583d51ac5c7de682ad36ef Author: Jia Tan Date: 2022-12-29 01:15:27 +0800 liblzma: Includes sys/time.h conditionally in mythread Previously, <sys/time.h> was always included, even if mythread only used clock_gettime. <time.h> is still needed even if clock_gettime is not used though because struct timespec is needed for mythread_condtime. src/common/mythread.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) commit 74dae7d30091e906d6a92a57952dea4354473f9b Author: Jia Tan Date: 2022-12-29 01:10:53 +0800 Build: No longer require HAVE_DECL_CLOCK_MONOTONIC to always be set. Previously, if threading was enabled HAVE_DECL_CLOCK_MONOTONIC would always be set to 0 or 1.
However, this macro was needed in xz so if xz was not built with threading and HAVE_DECL_CLOCK_MONOTONIC was not defined but HAVE_CLOCK_GETTIME was, it caused a warning during build. Now, HAVE_DECL_CLOCK_MONOTONIC has been renamed to HAVE_CLOCK_MONOTONIC and will only be set if it is 1. CMakeLists.txt | 8 +++----- configure.ac | 5 ++++- src/common/mythread.h | 4 ++-- src/xz/mytime.c | 5 ++--- 4 files changed, 11 insertions(+), 11 deletions(-) commit 7339e39dc060df6eda74a2c5b69961befc3d5d24 Author: Jia Tan Date: 2022-12-28 01:14:07 +0800 Translations: Add Ukrainian translations of man pages. Thanks to Yuri Chornoivan po4a/po4a.conf | 2 +- po4a/uk.po | 3676 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 3677 insertions(+), 1 deletion(-) commit 9f05c27a58ce8cd7803079aa295e41c24665ce6e Author: Jia Tan Date: 2022-12-23 00:34:48 +0800 CI/CD: Create initial version of CI/CD workflow. The CI/CD workflow will only execute on Ubuntu and MacOS latest version. The workflow will attempt to build with autotools and CMake and execute the tests. The workflow will run for all pull requests and pushes done to the master branch. .github/workflows/ci.yml | 72 ++++++++++++++++++++++++ build-aux/ci_build.sh | 141 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 213 insertions(+) commit 1275ebfba74230dbd028049141423c79c8b83b8f Author: Jia Tan Date: 2022-12-22 23:14:53 +0800 liblzma: Update documentation for lzma_filter_encoder. 
src/liblzma/common/filter_encoder.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) commit 7c9ff5f1667a16733163b75dfd4b509662c387f4 Author: Jia Tan Date: 2022-12-21 21:12:03 +0800 Tests: Adds lzip decoder tests .gitignore | 1 + tests/Makefile.am | 2 + tests/test_lzip_decoder.c | 471 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 474 insertions(+) commit 799ead162de63b8400733603d3abcd2e1977bdca Author: Jia Cheong Tan Date: 2022-12-20 22:05:21 +0800 Doxygen: Update .gitignore for generating docs for in source build. In source builds are not recommended, but we should still ignore the generated artifacts. .gitignore | 2 ++ 1 file changed, 2 insertions(+) commit 5f7ce42a16b1e86ca8408b5c670c25e2a12acc4e Author: Jia Tan Date: 2022-12-20 20:46:44 +0800 liblzma: Fix lzma_microlzma_encoder() return value. Using return_if_error on lzma_lzma_lclppb_encode was improper because return_if_error is expecting an lzma_ret value, but lzma_lzma_lclppb_encode returns a boolean. This could result in lzma_microlzma_encoder, which would be misleading for applications. src/liblzma/common/microlzma_encoder.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) commit 8ace358d65059152d9a1f43f4770170d29d35754 Author: Jia Tan Date: 2022-12-16 20:58:55 +0800 CMake: Update .gitignore for CMake artifacts from in source build. In source builds are not recommended, but we can make it easier by ignoring the generated artifacts from CMake. .gitignore | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) commit 8fd225a2c149f30aeac377e68eb5abf6b28300ad Author: Lasse Collin Date: 2022-12-16 18:30:02 +0200 liblzma: Update authors list in arm64.c. src/liblzma/simple/arm64.c | 1 + 1 file changed, 1 insertion(+) commit b69da6d4bb6bb11fc0cf066920791990d2b22a06 Author: Lasse Collin Date: 2022-12-13 20:37:17 +0200 Bump version to 5.4.0 and soname to 5.4.0. 
src/liblzma/Makefile.am | 2 +- src/liblzma/api/lzma/version.h | 6 +++--- src/liblzma/liblzma_generic.map | 2 +- src/liblzma/liblzma_linux.map | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/README b/README index 9d097deff371..41671676a516 100644 --- a/README +++ b/README @@ -1,310 +1,281 @@ XZ Utils ======== 0. Overview 1. Documentation 1.1. Overall documentation 1.2. Documentation for command-line tools 1.3. Documentation for liblzma 2. Version numbering 3. Reporting bugs 4. Translations + 4.1. Testing translations 5. Other implementations of the .xz format 6. Contact information 0. Overview ----------- XZ Utils provide a general-purpose data-compression library plus command-line tools. The native file format is the .xz format, but also the legacy .lzma format is supported. The .xz format supports multiple compression algorithms, which are called "filters" in the context of XZ Utils. The primary filter is currently LZMA2. With typical files, XZ Utils create about 30 % smaller files than gzip. To ease adapting support for the .xz format into existing applications and scripts, the API of liblzma is somewhat similar to the API of the popular zlib library. For the same reason, the command-line tool xz has a command-line syntax similar to that of gzip. When aiming for the highest compression ratio, the LZMA2 encoder uses a lot of CPU time and may use, depending on the settings, even hundreds of megabytes of RAM. However, in fast modes, the LZMA2 encoder competes with bzip2 in compression speed, RAM usage, and compression ratio. LZMA2 is reasonably fast to decompress. It is a little slower than gzip, but a lot faster than bzip2. Being fast to decompress means that the .xz format is especially nice when the same file will be decompressed very many times (usually on different computers), which is the case e.g. when distributing software packages. 
In such situations, it's not too bad if the compression takes some time, since that needs to be done only once to benefit many people. With some file types, combining (or "chaining") LZMA2 with an additional filter can improve the compression ratio. A filter chain may contain up to four filters, although usually only one or two are used. For example, putting a BCJ (Branch/Call/Jump) filter before LZMA2 in the filter chain can improve compression ratio of executable files. Since the .xz format allows adding new filter IDs, it is possible that some day there will be a filter that is, for example, much faster to compress than LZMA2 (but probably with worse compression ratio). Similarly, it is possible that some day there is a filter that will compress better than LZMA2. XZ Utils supports multithreaded compression. XZ Utils doesn't support multithreaded decompression yet. It has been planned though and taken into account when designing the .xz file format. In the future, files that were created in threaded mode can be decompressed in threaded mode too. 1. Documentation ---------------- 1.1. Overall documentation README This file INSTALL.generic Generic install instructions for those not familiar with packages using GNU Autotools INSTALL Installation instructions specific to XZ Utils PACKAGERS Information to packagers of XZ Utils COPYING XZ Utils copyright and license information COPYING.0BSD BSD Zero Clause License COPYING.GPLv2 GNU General Public License version 2 COPYING.GPLv3 GNU General Public License version 3 COPYING.LGPLv2.1 GNU Lesser General Public License version 2.1 AUTHORS The main authors of XZ Utils THANKS Incomplete list of people who have helped making this software NEWS User-visible changes between XZ Utils releases ChangeLog Detailed list of changes (commit log) TODO Known bugs and some sort of to-do list Note that only some of the above files are included in binary packages. 1.2. 
Documentation for command-line tools The command-line tools are documented as man pages. In source code releases (and possibly also in some binary packages), the man pages are also provided in plain text (ASCII only) format in the directory "doc/man" to make the man pages more accessible to those whose operating system doesn't provide an easy way to view man pages. 1.3. Documentation for liblzma The liblzma API headers include short docs about each function and data type as Doxygen tags. These docs should be quite OK as a quick reference. There are a few example/tutorial programs that should help in getting started with liblzma. In the source package the examples are in "doc/examples" and in binary packages they may be under "examples" in the same directory as this README. Since the liblzma API has similarities to the zlib API, some people may find it useful to read the zlib docs and tutorial too: https://zlib.net/manual.html https://zlib.net/zlib_how.html 2. Version numbering -------------------- The version number format of XZ Utils is X.Y.ZS: - X is the major version. When this is incremented, the library API and ABI break. - Y is the minor version. It is incremented when new features are added without breaking the existing API or ABI. An even Y indicates a stable release and an odd Y indicates unstable (alpha or beta version). - Z is the revision. This has a different meaning for stable and unstable releases: * Stable: Z is incremented when bugs get fixed without adding any new features. This is intended to be convenient for downstream distributors that want bug fixes but don't want any new features to minimize the risk of introducing new bugs. * Unstable: Z is just a counter. API or ABI of features added in earlier unstable releases having the same X.Y may break. - S indicates stability of the release. It is missing from the stable releases, where Y is an even number. 
When Y is odd, S is either "alpha" or "beta" to make it very clear that such versions are not stable releases. The same X.Y.Z combination is not used for more than one stability level, i.e. after X.Y.Zalpha, the next version can be X.Y.(Z+1)beta but not X.Y.Zbeta. 3. Reporting bugs ----------------- Naturally it is easiest for me if you already know what causes the unexpected behavior. Even better if you have a patch to propose. However, quite often the reason for unexpected behavior is unknown, so here are a few things to do before sending a bug report: 1. Try to create a small example how to reproduce the issue. 2. Compile XZ Utils with debugging code using configure switches --enable-debug and, if possible, --disable-shared. If you are using GCC, use CFLAGS='-O0 -ggdb3'. Don't strip the resulting binaries. 3. Turn on core dumps. The exact command depends on your shell; for example in GNU bash it is done with "ulimit -c unlimited", and in tcsh with "limit coredumpsize unlimited". 4. Try to reproduce the suspected bug. If you get "assertion failed" message, be sure to include the complete message in your bug report. If the application leaves a coredump, get a backtrace using gdb: $ gdb /path/to/app-binary # Load the app to the debugger. (gdb) core core # Open the coredump. (gdb) bt # Print the backtrace. Copy & paste to bug report. (gdb) quit # Quit gdb. Report your bug via email or IRC (see Contact information below). Don't send core dump files or any executables. If you have a small example file(s) (total size less than 256 KiB), please include it/them as an attachment. If you have bigger test files, put them online somewhere and include a URL to the file(s) in the bug report. Always include the exact version number of XZ Utils in the bug report. If you are using a snapshot from the git repository, use "git describe" to get the exact snapshot version. 
If you are using XZ Utils shipped in an operating system distribution, mention the distribution name, distribution version, and exact xz package version; if you cannot repeat the bug with the code compiled from unpatched source code, you probably need to report a bug to your distribution's bug tracking system. 4. Translations --------------- The xz command line tool and all man pages can be translated. The translations are handled via the Translation Project. If you wish to help translating xz, please join the Translation Project: https://translationproject.org/html/translators.html - Below are notes and testing instructions specific to xz - translations. + Updates to translations won't be accepted by methods that bypass + the Translation Project because there is a risk of duplicate work: + translation updates made in the xz repository aren't seen by the + translators in the Translation Project. If you have found bugs in + a translation, please report them to the Language-Team address + which can be found near the beginning of the PO file. - Testing can be done by installing xz into a temporary directory: + If you find language problems in the original English strings, + feel free to suggest improvements. Ask if something is unclear. + + +4.1. Testing translations + + Testing can be done by installing xz into a temporary directory. + + If building from Git repository (not tarball), generate the + Autotools files: + + ./autogen.sh + + Create a subdirectory for the build files. The tmp-build directory + can be deleted after testing. + + mkdir tmp-build + cd tmp-build + ../configure --disable-shared --enable-debug --prefix=$PWD/inst + + Edit the .po file in the po directory. Then build and install to + the "tmp-build/inst" directory, and use translations.bash to see + how some of the messages look. 
Repeat these steps if needed: - ./configure --disable-shared --prefix=/tmp/xz-test - # make -C po update-po - make install - bash debug/translation.bash | less - bash debug/translation.bash | less -S # For --list outputs - - Repeat the above as needed (no need to re-run configure though). - - Note especially the following: - - - The output of --help and --long-help must look nice on - an 80-column terminal. It's OK to add extra lines if needed. - - - In contrast, don't add extra lines to error messages and such. - They are often preceded with e.g. a filename on the same line, - so you have no way to predict where to put a \n. Let the terminal - do the wrapping even if it looks ugly. Adding new lines will be - even uglier in the generic case even if it looks nice in a few - limited examples. - - - Be careful with column alignment in tables and table-like output - (--list, --list --verbose --verbose, --info-memory, --help, and - --long-help): - - * All descriptions of options in --help should start in the - same column (but it doesn't need to be the same column as - in the English messages; just be consistent if you change it). - Check that both --help and --long-help look OK, since they - share several strings. - - * --list --verbose and --info-memory print lines that have - the format "Description: %s". If you need a longer - description, you can put extra space between the colon - and %s. Then you may need to add extra space to other - strings too so that the result as a whole looks good (all - values start at the same column). - - * The columns of the actual tables in --list --verbose --verbose - should be aligned properly. Abbreviate if necessary. It might - be good to keep at least 2 or 3 spaces between column headings - and avoid spaces in the headings so that the columns stand out - better, but this is a matter of opinion. Do what you think - looks best. 
- - - Be careful to put a period at the end of a sentence when the - original version has it, and don't put it when the original - doesn't have it. Similarly, be careful with \n characters - at the beginning and end of the strings. - - - Read the TRANSLATORS comments that have been extracted from the - source code and included in xz.pot. Some comments suggest - testing with a specific command which needs an .xz file. You - may use e.g. any tests/files/good-*.xz. However, these test - commands are included in translations.bash output, so reading - translations.bash output carefully can be enough. - - - If you find language problems in the original English strings, - feel free to suggest improvements. Ask if something is unclear. - - - The translated messages should be understandable (sometimes this - may be a problem with the original English messages too). Don't - make a direct word-by-word translation from English especially if - the result doesn't sound good in your language. - - Thanks for your help! + make -j"$(nproc)" install + bash ../debug/translation.bash | less + bash ../debug/translation.bash | less -S # For --list outputs + + To test other languages, set the LANGUAGE environment variable + before running translations.bash. The value should match the PO file + name without the .po suffix. Example: + + export LANGUAGE=fi 5. Other implementations of the .xz format ------------------------------------------ 7-Zip and the p7zip port of 7-Zip support the .xz format starting from the version 9.00alpha. https://7-zip.org/ https://p7zip.sourceforge.net/ XZ Embedded is a limited implementation written for use in the Linux kernel, but it is also suitable for other embedded use. https://tukaani.org/xz/embedded.html XZ for Java is a complete implementation written in pure Java. https://tukaani.org/xz/java.html 6. 
Contact information ---------------------- XZ Utils in general: - Home page: https://tukaani.org/xz/ - Email to maintainer(s): xz@tukaani.org - IRC: #tukaani on Libera Chat - GitHub: https://github.com/tukaani-project/xz Lead maintainer: - Email: Lasse Collin - IRC: Larhzu on Libera Chat diff --git a/THANKS b/THANKS index 5ed0743b50f0..a6a7a6721079 100644 --- a/THANKS +++ b/THANKS @@ -1,202 +1,239 @@ Thanks ====== Some people have helped more, some less, but nevertheless everyone's help has been important. :-) In alphabetical order: - Mark Adler - Kian-Meng Ang - H. Peter Anvin - Jeff Bastian - Nelson H. F. Beebe - Karl Beldan - Karl Berry - Anders F. Björklund - Emmanuel Blot - Melanie Blower - Alexander Bluhm - Martin Blumenstingl - Ben Boeckel - Jakub Bogusz - Adam Borowski - Maarten Bosmans + - Roel Bouckaert - Lukas Braune - Benjamin Buch - Trent W. Buck - Kevin R. Bulgrien - James Buren - David Burklund - Frank Busse - Daniel Mealha Cabrita - Milo Casagrande + - Cristiano Ceglia - Marek Černocký - Tomer Chachamu - Vitaly Chikunov - Antoine Cœur + - Elijah Almeida Coimbra - Felix Collin + - Ryan Colyer + - Marcus Comstedt + - Vincent Cruz - Gabi Davar + - Ron Desmond - İhsan Doğan - Chris Donawa - Andrew Dudman - Markus Duft - İsmail Dönmez + - Dexter Castor Döpping - Paul Eggert - Robert Elz - Gilles Espinasse - Denis Excoffier - Vincent Fazio - Michael Felt + - Sean Fenian - Michael Fox - Andres Freund - Mike Frysinger + - Collin Funk - Daniel Richard G. - Tomasz Gajc - Bjarni Ingi Gislason - John Paul Adrian Glaubitz - Bill Glessner - Matthew Good - Michał Górny - Jason Gorski + - Alexander M. 
Greenham - Juan Manuel Guerrero - Gabriela Gutierrez - Diederik de Haas + - Jan Terje Hansen + - Tobias Lahrmann Hansen - Joachim Henke + - Lizandro Heredia - Christian Hesse - Vincenzo Innocente - Peter Ivanov - Nicholas Jackson - Sam James - Hajin Jang - Hans Jansen - Jouk Jansen - Jun I Jin - Christoph Junghans - Kiyoshi Kanazawa - Joona Kannisto - Per Øyvind Karlsen - Firas Khalil Khana - Iouri Kharon + - Kim Jinyeong - Thomas Klausner - Richard Koch - Anton Kochkov + - Harri K. Koskinen - Ville Koskinen - Sergey Kosukhin - Marcin Kowalczyk - Jan Kratochvil - Christian Kujau - Stephan Kulow - Ilya Kurdyukov - Peter Lawler - James M Leddy - Kelvin Lee - Vincent Lefevre - Hin-Tak Leung - Andraž 'ruskie' Levstik - Cary Lewis - Wim Lewis - Xin Li - Yifeng Li - Eric Lindblad - Lorenzo De Liso - H.J. Lu - Bela Lubkin - Chenxi Mao - Gregory Margo - Julien Marrec + - Pierre-Yves Martin - Ed Maste - Martin Matuška + - Scott McAllister + - Chris McCrohan + - Derwin McGeary - Ivan A. Melnikov - Jim Meyering - Arkadiusz Miskiewicz - Nathan Moinvaziri - Étienne Mollier - Conley Moorhous + - Dirk Müller + - Rainer Müller - Andrew Murray - Rafał Mużyło - Adrien Nader - Evan Nemerson - Alexander Neumann - Hongbo Ni - Jonathan Nieder + - Asgeir Storesund Nilsen - Andre Noll + - Ruarí Ødegaard - Peter O'Gorman - Dimitri Papadopoulos Orfanos - Daniel Packard - Filip Palian - Peter Pallinger - Kai Pastor + - Keith Patton - Rui Paulo - Igor Pavlov - Diego Elio Pettenò - Elbert Pol + - Guiorgy Potskhishvili - Mikko Pouru - Frank Prochnow - Rich Prohaska - Trần Ngọc Quân - Pavel Raiskup + - Matthieu Rakotojaona - Ole André Vadla Ravnås - Eric S. 
Raymond - Robert Readman - Bernhard Reutner-Fischer - Markus Rickert - Cristian Rodríguez + - Jeroen Roovers - Christian von Roques - Boud Roukema - Torsten Rupp - Stephen Sachs - Jukka Salmi - Agostino Sarubbo - Vijay Sarvepalli - Alexandre Sauvé - Benno Schulenberg - Andreas Schwab - Eli Schwartz - Peter Seiderer - Bhargava Shastry - Dan Shechter - Stuart Shelton - Sebastian Andrzej Siewior + - Andrej Skenderija - Ville Skyttä - Brad Smith - Bruce Stark - Pippijn van Steenhoven - Tobias Stoeckmann - Martin Storsjö - Jonathan Stott - Dan Stromberg - Douglas Thor - Vincent Torri - Alexey Tourbin - Paul Townsend - Mohammed Adnène Trojette - Orange Tsai - Taiki Tsunekawa - Mathieu Vachon - Maksym Vatsyk - Loganaden Velvindron - Patrick J. Volkerding - Martin Väth - Adam Walling - Jeffrey Walton - Christian Weisgerber - Dan Weiss - Bert Wesarg + - Mark Wielaard - Fredrik Wikstrom - Jim Wilcoxson - Ralf Wildenhues - Charles Wilson - Lars Wirzenius + - Vincent Wixsom - Pilorz Wojciech - Chien Wong + - Xi Ruoyao - Ryan Young - Andreas Zieringer + - 榆柳松 (ZhengSen Wang) Companies: - Google - Sandfly Security +Other credits: + - cleemy desu wayo working with Trend Micro Zero Day Initiative + - Orange Tsai and splitline from DEVCORE Research Team + Also thanks to all the people who have participated in the Tukaani project. I have probably forgot to add some names to the above list. Sorry about that and thanks for your help. diff --git a/TODO b/TODO index ad37f3f559aa..7a0bf16ed86e 100644 --- a/TODO +++ b/TODO @@ -1,105 +1,88 @@ XZ Utils To-Do List =================== Known bugs ---------- - The test suite is too incomplete. - - If the memory usage limit is less than about 13 MiB, xz is unable to - automatically scale down the compression settings enough even though - it would be possible by switching from BT2/BT3/BT4 match finder to - HC3/HC4. + The test suite is incomplete. XZ Utils compress some files significantly worse than LZMA Utils. 
This is due to faster compression presets used by XZ Utils, and can often be worked around by using "xz --extreme". With some files --extreme isn't enough though: it's most likely with files that compress extremely well, so going from compression ratio of 0.003 to 0.004 means big relative increase in the compressed file size. - xz doesn't quote unprintable characters when it displays file names - given on the command line. - tuklib_exit() doesn't block signals => EINTR is possible. If liblzma has created threads and fork() gets called, liblzma code will break in the child process unless it calls exec() and doesn't touch liblzma. Missing features ---------------- Add support for storing metadata in .xz files. A preliminary idea is to create a new Stream type for metadata. When both metadata and data are wanted in the same .xz file, two or more Streams would be concatenated. The state stored in lzma_stream should be cloneable, which would be mostly useful when using a preset dictionary in LZMA2, but it may have other uses too. Compare to deflateCopy() in zlib. - Support LZMA_FINISH in raw decoder to indicate end of LZMA1 and - other streams that don't have an end of payload marker. - Adjust dictionary size when the input file size is known. Maybe do this only if an option is given. xz doesn't support copying extended attributes, access control lists etc. from source to target file. Multithreaded compression: - Reduce memory usage of the current method. - Implement threaded match finders. - Implement pigz-style threading in LZMA2. Buffer-to-buffer coding could use less RAM (especially when decompressing LZMA1 or LZMA2). I/O library is not implemented (similar to gzopen() in zlib). It will be a separate library that supports uncompressed, .gz, .bz2, .lzma, and .xz files. Support changing lzma_options_lzma.mode with lzma_filters_update(). Support LZMA_FULL_FLUSH for lzma_stream_decoder() to stop at Block and Stream boundaries. 
- lzma_strerror() to convert lzma_ret to human readable form? - This is tricky, because the same error codes are used with - slightly different meanings, and this cannot be fixed anymore. + Error codes from lzma_code() aren't very specific. A more detailed + error message (string) could be provided too. It could be returned + by a new function or use a currently-reserved member of lzma_stream. Make it possible to adjust LZMA2 options in the middle of a Block so that the encoding speed vs. compression ratio can be optimized when the compressed data is streamed over network. Improved BCJ filters. The current filters are small but they aren't so great when compressing binary packages that contain various file types. Specifically, they make things worse if there are static libraries or Linux kernel modules. The filtering could also be more effective (without getting overly complex), for example, streamable variant BCJ2 from 7-Zip could be implemented. Filter that autodetects specific data types in the input stream and applies appropriate filters for the corrects parts of the input. Perhaps combine this with the BCJ filter improvement point above. Long-range LZ77 method as a separate filter or as a new LZMA2 match finder. Documentation ------------- More tutorial programs are needed for liblzma. Document the LZMA1 and LZMA2 algorithms. - -Miscellaneous ------------- - - Try to get the media type for .xz registered at IANA. 
- diff --git a/src/common/my_landlock.h b/src/common/my_landlock.h new file mode 100644 index 000000000000..e135d08c858f --- /dev/null +++ b/src/common/my_landlock.h @@ -0,0 +1,141 @@ +// SPDX-License-Identifier: 0BSD + +/////////////////////////////////////////////////////////////////////////////// +// +/// \file my_landlock.h +/// \brief Linux Landlock sandbox helper functions +// +// Author: Lasse Collin +// +/////////////////////////////////////////////////////////////////////////////// + +#ifndef MY_LANDLOCK_H +#define MY_LANDLOCK_H + +#include "sysdefs.h" + +#include +#include +#include + + +/// \brief Initialize Landlock ruleset attributes to forbid everything +/// +/// The supported Landlock ABI is checked at runtime and only the supported +/// actions are forbidden in the attributes. Thus, if the attributes are +/// used with my_landlock_create_ruleset(), it shouldn't fail. +/// +/// \return On success, the Landlock ABI version is returned (a positive +/// integer). If Landlock isn't supported, -1 is returned. 
+static int +my_landlock_ruleset_attr_forbid_all(struct landlock_ruleset_attr *attr) +{ + memzero(attr, sizeof(*attr)); + + const int abi_version = syscall(SYS_landlock_create_ruleset, + (void *)NULL, 0, LANDLOCK_CREATE_RULESET_VERSION); + if (abi_version <= 0) + return -1; + + // ABI 1 except the few at the end + attr->handled_access_fs + = LANDLOCK_ACCESS_FS_EXECUTE + | LANDLOCK_ACCESS_FS_WRITE_FILE + | LANDLOCK_ACCESS_FS_READ_FILE + | LANDLOCK_ACCESS_FS_READ_DIR + | LANDLOCK_ACCESS_FS_REMOVE_DIR + | LANDLOCK_ACCESS_FS_REMOVE_FILE + | LANDLOCK_ACCESS_FS_MAKE_CHAR + | LANDLOCK_ACCESS_FS_MAKE_DIR + | LANDLOCK_ACCESS_FS_MAKE_REG + | LANDLOCK_ACCESS_FS_MAKE_SOCK + | LANDLOCK_ACCESS_FS_MAKE_FIFO + | LANDLOCK_ACCESS_FS_MAKE_BLOCK + | LANDLOCK_ACCESS_FS_MAKE_SYM +#ifdef LANDLOCK_ACCESS_FS_REFER + | LANDLOCK_ACCESS_FS_REFER // ABI 2 +#endif +#ifdef LANDLOCK_ACCESS_FS_TRUNCATE + | LANDLOCK_ACCESS_FS_TRUNCATE // ABI 3 +#endif +#ifdef LANDLOCK_ACCESS_FS_IOCTL_DEV + | LANDLOCK_ACCESS_FS_IOCTL_DEV // ABI 5 +#endif + ; + +#ifdef LANDLOCK_ACCESS_NET_BIND_TCP + // ABI 4 + attr->handled_access_net + = LANDLOCK_ACCESS_NET_BIND_TCP + | LANDLOCK_ACCESS_NET_CONNECT_TCP; +#endif + +#ifdef LANDLOCK_SCOPE_SIGNAL + // ABI 6 + attr->scoped + = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET + | LANDLOCK_SCOPE_SIGNAL; +#endif + + // Disable flags that require a new ABI version. 
+ switch (abi_version) { + case 1: +#ifdef LANDLOCK_ACCESS_FS_REFER + attr->handled_access_fs &= ~LANDLOCK_ACCESS_FS_REFER; +#endif + FALLTHROUGH; + + case 2: +#ifdef LANDLOCK_ACCESS_FS_TRUNCATE + attr->handled_access_fs &= ~LANDLOCK_ACCESS_FS_TRUNCATE; +#endif + FALLTHROUGH; + + case 3: +#ifdef LANDLOCK_ACCESS_NET_BIND_TCP + attr->handled_access_net = 0; +#endif + FALLTHROUGH; + + case 4: +#ifdef LANDLOCK_ACCESS_FS_IOCTL_DEV + attr->handled_access_fs &= ~LANDLOCK_ACCESS_FS_IOCTL_DEV; +#endif + FALLTHROUGH; + + case 5: +#ifdef LANDLOCK_SCOPE_SIGNAL + attr->scoped = 0; +#endif + FALLTHROUGH; + + default: + // We only know about the features of the ABIs 1-6. + break; + } + + return abi_version; +} + + +/// \brief Wrapper for the landlock_create_ruleset(2) syscall +/// +/// Syscall wrappers provide argument type checking. +/// +/// \note Remember to call `prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)` too! +static inline int +my_landlock_create_ruleset(const struct landlock_ruleset_attr *attr, + size_t size, uint32_t flags) +{ + return syscall(SYS_landlock_create_ruleset, attr, size, flags); +} + + +/// \brief Wrapper for the landlock_restrict_self(2) syscall +static inline int +my_landlock_restrict_self(int ruleset_fd, uint32_t flags) +{ + return syscall(SYS_landlock_restrict_self, ruleset_fd, flags); +} + +#endif diff --git a/src/common/sysdefs.h b/src/common/sysdefs.h index 5f3785b5137a..b10ffa7c3b18 100644 --- a/src/common/sysdefs.h +++ b/src/common/sysdefs.h @@ -1,199 +1,229 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file sysdefs.h /// \brief Common includes, definitions, system-specific things etc. /// /// This file is used also by the lzma command line tool, that's why this /// file is separate from common.h. 
// // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef LZMA_SYSDEFS_H #define LZMA_SYSDEFS_H ////////////// // Includes // ////////////// #ifdef HAVE_CONFIG_H # include #endif -// This #define ensures that C99 and POSIX compliant stdio functions are -// available with MinGW-w64 (both 32-bit and 64-bit). Modern MinGW-w64 adds -// this automatically, for example, when the compiler is in C99 (or later) -// mode when building against msvcrt.dll. It still doesn't hurt to be explicit -// that we always want this and #define this unconditionally. +// Choose if MinGW-w64's stdio replacement functions should be used. +// The default has varied slightly in the past so it's clearest to always +// set it explicitly. // -// With Universal CRT (UCRT) this is less important because UCRT contains -// C99-compatible stdio functions. It's still nice to #define this as UCRT -// doesn't support the POSIX thousand separator flag in printf (like "%'u"). -#ifdef __MINGW32__ +// Modern MinGW-w64 enables the replacement functions even with UCRT +// when _GNU_SOURCE is defined. That's good because UCRT doesn't support +// the POSIX thousand separator flag in printf (like "%'u"). Otherwise +// XZ Utils works with the UCRT stdio functions. +// +// The replacement functions add over 20 KiB to each executable. For +// size-optimized builds (HAVE_SMALL), disable the replacements. +// Then thousand separators aren't shown in xz's messages but this is +// a minor downside compare to the slower speed of the HAVE_SMALL builds. +// +// The legacy MSVCRT is pre-C99 and it's best to always use the stdio +// replacements functions from MinGW-w64. 
+#if defined(__MINGW32__) && !defined(__USE_MINGW_ANSI_STDIO) # define __USE_MINGW_ANSI_STDIO 1 +# include <_mingw.h> +# if defined(_UCRT) && defined(HAVE_SMALL) +# undef __USE_MINGW_ANSI_STDIO +# define __USE_MINGW_ANSI_STDIO 0 +# endif #endif // size_t and NULL #include #ifdef HAVE_INTTYPES_H # include #endif // C99 says that inttypes.h always includes stdint.h, but some systems // don't do that, and require including stdint.h separately. #ifdef HAVE_STDINT_H # include #endif // Some pre-C99 systems have SIZE_MAX in limits.h instead of stdint.h. The // limits are also used to figure out some macros missing from pre-C99 systems. #include // Be more compatible with systems that have non-conforming inttypes.h. // We assume that int is 32-bit and that long is either 32-bit or 64-bit. // Full Autoconf test could be more correct, but this should work well enough. // Note that this duplicates some code from lzma.h, but this is better since // we can work without inttypes.h thanks to Autoconf tests. #ifndef UINT32_C # if UINT_MAX != 4294967295U # error UINT32_C is not defined and unsigned int is not 32-bit. # endif # define UINT32_C(n) n ## U #endif #ifndef UINT32_MAX # define UINT32_MAX UINT32_C(4294967295) #endif #ifndef PRIu32 # define PRIu32 "u" #endif #ifndef PRIx32 # define PRIx32 "x" #endif #ifndef PRIX32 # define PRIX32 "X" #endif #if ULONG_MAX == 4294967295UL # ifndef UINT64_C # define UINT64_C(n) n ## ULL # endif # ifndef PRIu64 # define PRIu64 "llu" # endif # ifndef PRIx64 # define PRIx64 "llx" # endif # ifndef PRIX64 # define PRIX64 "llX" # endif #else # ifndef UINT64_C # define UINT64_C(n) n ## UL # endif # ifndef PRIu64 # define PRIu64 "lu" # endif # ifndef PRIx64 # define PRIx64 "lx" # endif # ifndef PRIX64 # define PRIX64 "lX" # endif #endif #ifndef UINT64_MAX # define UINT64_MAX UINT64_C(18446744073709551615) #endif // Incorrect(?) SIZE_MAX: // - Interix headers typedef size_t to unsigned long, // but a few lines later define SIZE_MAX to INT32_MAX. 
// - SCO OpenServer (x86) headers typedef size_t to unsigned int // but define SIZE_MAX to INT32_MAX. #if defined(__INTERIX) || defined(_SCO_DS) # undef SIZE_MAX #endif // The code currently assumes that size_t is either 32-bit or 64-bit. #ifndef SIZE_MAX # if SIZEOF_SIZE_T == 4 # define SIZE_MAX UINT32_MAX # elif SIZEOF_SIZE_T == 8 # define SIZE_MAX UINT64_MAX # else # error size_t is not 32-bit or 64-bit # endif #endif #if SIZE_MAX != UINT32_MAX && SIZE_MAX != UINT64_MAX # error size_t is not 32-bit or 64-bit #endif #include #include // Pre-C99 systems lack stdbool.h. All the code in XZ Utils must be written // so that it works with fake bool type, for example: // // bool foo = (flags & 0x100) != 0; // bool bar = !!(flags & 0x100); // // This works with the real C99 bool but breaks with fake bool: // // bool baz = (flags & 0x100); // #ifdef HAVE_STDBOOL_H # include #else # if ! HAVE__BOOL typedef unsigned char _Bool; # endif # define bool _Bool # define false 0 # define true 1 # define __bool_true_false_are_defined 1 #endif +// We may need alignas from C11/C17/C23. +#if __STDC_VERSION__ >= 202311 + // alignas is a keyword in C23. Do nothing. +#elif __STDC_VERSION__ >= 201112 + // Oracle Developer Studio 12.6 lacks . + // For simplicity, avoid the header with all C11/C17 compilers. +# define alignas _Alignas +#elif defined(__GNUC__) || defined(__clang__) +# define alignas(n) __attribute__((__aligned__(n))) +#else +# define alignas(n) +#endif + #include -// Visual Studio 2013 update 2 supports only __inline, not inline. -// MSVC v19.0 / VS 2015 and newer support both. +// MSVC v19.00 (VS 2015 version 14.0) and later should work. // // MSVC v19.27 (VS 2019 version 16.7) added support for restrict. // Older ones support only __restrict. 
#ifdef _MSC_VER -# if _MSC_VER < 1900 && !defined(inline) -# define inline __inline -# endif # if _MSC_VER < 1927 && !defined(restrict) # define restrict __restrict # endif #endif //////////// // Macros // //////////// #undef memzero #define memzero(s, n) memset(s, 0, n) // NOTE: Avoid using MIN() and MAX(), because even conditionally defining // those macros can cause some portability trouble, since on some systems // the system headers insist defining their own versions. #define my_min(x, y) ((x) < (y) ? (x) : (y)) #define my_max(x, y) ((x) > (y) ? (x) : (y)) #ifndef ARRAY_SIZE # define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0])) #endif #if defined(__GNUC__) \ && ((__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4) # define lzma_attr_alloc_size(x) __attribute__((__alloc_size__(x))) #else # define lzma_attr_alloc_size(x) #endif +#if __STDC_VERSION__ >= 202311 +# define FALLTHROUGH [[__fallthrough__]] +#elif (defined(__GNUC__) && __GNUC__ >= 7) \ + || (defined(__clang_major__) && __clang_major__ >= 10) +# define FALLTHROUGH __attribute__((__fallthrough__)) +#else +# define FALLTHROUGH ((void)0) +#endif + #endif diff --git a/src/common/tuklib_common.h b/src/common/tuklib_common.h index 7554dfc86fb6..d73f07255e4d 100644 --- a/src/common/tuklib_common.h +++ b/src/common/tuklib_common.h @@ -1,90 +1,95 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file tuklib_common.h /// \brief Common definitions for tuklib modules // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef TUKLIB_COMMON_H #define TUKLIB_COMMON_H // The config file may be replaced by a package-specific file. // It should include at least stddef.h, stdbool.h, inttypes.h, and limits.h. #include "tuklib_config.h" // TUKLIB_SYMBOL_PREFIX is prefixed to all symbols exported by // the tuklib modules. 
If you use a tuklib module in a library, // you should use TUKLIB_SYMBOL_PREFIX to make sure that there // are no symbol conflicts in case someone links your library // into application that also uses the same tuklib module. #ifndef TUKLIB_SYMBOL_PREFIX # define TUKLIB_SYMBOL_PREFIX #endif #define TUKLIB_CAT_X(a, b) a ## b #define TUKLIB_CAT(a, b) TUKLIB_CAT_X(a, b) #ifndef TUKLIB_SYMBOL # define TUKLIB_SYMBOL(sym) TUKLIB_CAT(TUKLIB_SYMBOL_PREFIX, sym) #endif #ifndef TUKLIB_DECLS_BEGIN # ifdef __cplusplus # define TUKLIB_DECLS_BEGIN extern "C" { # else # define TUKLIB_DECLS_BEGIN # endif #endif #ifndef TUKLIB_DECLS_END # ifdef __cplusplus # define TUKLIB_DECLS_END } # else # define TUKLIB_DECLS_END # endif #endif #if defined(__GNUC__) && defined(__GNUC_MINOR__) # define TUKLIB_GNUC_REQ(major, minor) \ ((__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)) \ || __GNUC__ > (major)) #else # define TUKLIB_GNUC_REQ(major, minor) 0 #endif +#if defined(__GNUC__) || defined(__clang__) +# define tuklib_attr_format_printf(fmt_index, args_index) \ + __attribute__((__format__(__printf__, fmt_index, args_index))) +#else +# define tuklib_attr_format_printf(fmt_index, args_index) +#endif + // tuklib_attr_noreturn attribute is used to mark functions as non-returning. // We cannot use "noreturn" as the macro name because then C23 code that // uses [[noreturn]] would break as it would expand to [[ [[noreturn]] ]]. // // tuklib_attr_noreturn must be used at the beginning of function declaration // to work in all cases. The [[noreturn]] syntax is the most limiting, it // must be even before any GNU C's __attribute__ keywords: // // tuklib_attr_noreturn // __attribute__((nonnull(1))) // extern void foo(const char *s); // -// FIXME: Update __STDC_VERSION__ for the final C23 version. 202000 is used -// by GCC 13 and Clang 15 with -std=c2x. 
-#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202000 +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311 # define tuklib_attr_noreturn [[noreturn]] #elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112 # define tuklib_attr_noreturn _Noreturn #elif TUKLIB_GNUC_REQ(2, 5) # define tuklib_attr_noreturn __attribute__((__noreturn__)) #elif defined(_MSC_VER) # define tuklib_attr_noreturn __declspec(noreturn) #else # define tuklib_attr_noreturn #endif #if (defined(_WIN32) && !defined(__CYGWIN__)) \ || defined(__OS2__) || defined(__MSDOS__) # define TUKLIB_DOSLIKE 1 #endif #endif diff --git a/src/common/tuklib_gettext.h b/src/common/tuklib_gettext.h index 3ef5cb7292b5..e5ad5e6f78a1 100644 --- a/src/common/tuklib_gettext.h +++ b/src/common/tuklib_gettext.h @@ -1,43 +1,54 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file tuklib_gettext.h /// \brief Wrapper for gettext and friends // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef TUKLIB_GETTEXT_H #define TUKLIB_GETTEXT_H #include "tuklib_common.h" #include #ifndef TUKLIB_GETTEXT # ifdef ENABLE_NLS # define TUKLIB_GETTEXT 1 # else # define TUKLIB_GETTEXT 0 # endif #endif #if TUKLIB_GETTEXT # include # define tuklib_gettext_init(package, localedir) \ do { \ setlocale(LC_ALL, ""); \ bindtextdomain(package, localedir); \ textdomain(package); \ } while (0) # define _(msgid) gettext(msgid) #else # define tuklib_gettext_init(package, localedir) \ setlocale(LC_ALL, "") # define _(msgid) (msgid) # define ngettext(msgid1, msgid2, n) ((n) == 1 ? (msgid1) : (msgid2)) #endif #define N_(msgid) msgid +// Optional: Strings that are word wrapped using tuklib_mbstr_wrap may be +// marked with W_("foo) in the source code. xgettext can then add a comment +// to all such strings to inform translators. 
The following option needs to +// be added to XGETTEXT_OPTIONS in po/Makevars or in an equivalent place: +// +// '--keyword=W_:1,"This is word wrapped at spaces. The Unicode character U+00A0 works as a non-breaking space. Tab (\t) is interpret as a zero-width space (the tab itself is not displayed); U+200B is NOT supported. Manual word wrapping with \n is supported but requires care."' +// +// NOTE: The double-quotes in the --keyword argument above must be passed to +// xgettext as is, thus one needs the single-quotes in Makevars. +#define W_(msgid) _(msgid) + #endif diff --git a/src/common/tuklib_mbstr.h b/src/common/tuklib_mbstr.h index 4c8eeb7e3700..5ac06eb35e88 100644 --- a/src/common/tuklib_mbstr.h +++ b/src/common/tuklib_mbstr.h @@ -1,65 +1,78 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file tuklib_mbstr.h /// \brief Utility functions for handling multibyte strings /// /// If not enough multibyte string support is available in the C library, /// these functions keep working with the assumption that all strings /// are in a single-byte character set without combining characters, e.g. /// US-ASCII or ISO-8859-*. // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef TUKLIB_MBSTR_H #define TUKLIB_MBSTR_H #include "tuklib_common.h" TUKLIB_DECLS_BEGIN #define tuklib_mbstr_width TUKLIB_SYMBOL(tuklib_mbstr_width) extern size_t tuklib_mbstr_width(const char *str, size_t *bytes); ///< /// \brief Get the number of columns needed for the multibyte string /// /// This is somewhat similar to wcswidth() but works on multibyte strings. /// -/// \param str String whose width is to be calculated. If the -/// current locale uses a multibyte character set -/// that has shift states, the string must begin -/// and end in the initial shift state. +/// \param str String whose width is to be calculated. 
/// \param bytes If this is not NULL, *bytes is set to the /// value returned by strlen(str) (even if an /// error occurs when calculating the width). /// /// \return On success, the number of columns needed to display the /// string e.g. in a terminal emulator is returned. On error, /// (size_t)-1 is returned. Possible errors include invalid, -/// partial, or non-printable multibyte character in str, or -/// that str doesn't end in the initial shift state. +/// partial, or non-printable multibyte character in str. + +#define tuklib_mbstr_width_mem TUKLIB_SYMBOL(tuklib_mbstr_width_mem) +extern size_t tuklib_mbstr_width_mem(const char *str, size_t len); +///< +/// \brief Get the number of columns needed for the multibyte buffer +/// +/// This is like tuklib_mbstr_width() except that this takes the buffer +/// length in bytes as the second argument. This allows using the function +/// for buffers that aren't terminated with '\0'. +/// +/// \param str String whose width is to be calculated. +/// \param len Number of bytes to read from str. +/// +/// \return On success, the number of columns needed to display the +/// string e.g. in a terminal emulator is returned. On error, +/// (size_t)-1 is returned. Possible errors include invalid, +/// partial, or non-printable multibyte character in str. #define tuklib_mbstr_fw TUKLIB_SYMBOL(tuklib_mbstr_fw) extern int tuklib_mbstr_fw(const char *str, int columns_min); ///< /// \brief Get the field width for printf() e.g. to align table columns /// /// Printing simple tables to a terminal can be done using the field field /// feature in the printf() format string, but it works only with single-byte /// character sets. To do the same with multibyte strings, tuklib_mbstr_fw() /// can be used to calculate appropriate field width. /// /// The behavior of this function is undefined, if /// - str is NULL or not terminated with '\0'; /// - columns_min <= 0; or /// - the calculated field width exceeds INT_MAX. 
/// /// \return If tuklib_mbstr_width(str, NULL) fails, -1 is returned. /// If str needs more columns than columns_min, zero is returned. /// Otherwise a positive integer is returned, which can be /// used as the field width, e.g. printf("%*s", fw, str). TUKLIB_DECLS_END #endif diff --git a/src/common/tuklib_mbstr_nonprint.c b/src/common/tuklib_mbstr_nonprint.c new file mode 100644 index 000000000000..dc778757b148 --- /dev/null +++ b/src/common/tuklib_mbstr_nonprint.c @@ -0,0 +1,162 @@ +// SPDX-License-Identifier: 0BSD + +/////////////////////////////////////////////////////////////////////////////// +// +/// \file tuklib_mbstr_nonprint.c +/// \brief Find and replace non-printable characters with question marks +// +// Author: Lasse Collin +// +/////////////////////////////////////////////////////////////////////////////// + +#include "tuklib_mbstr_nonprint.h" +#include +#include +#include + +#ifdef HAVE_MBRTOWC +# include +# include +#else +# include +#endif + + +static bool +is_next_printable(const char *str, size_t len, size_t *next_len) +{ +#ifdef HAVE_MBRTOWC + // This assumes that character sets with locking shift states aren't + // used, and thus mbsinit() is never needed. + mbstate_t ps; + memset(&ps, 0, sizeof(ps)); + + wchar_t wc; + *next_len = mbrtowc(&wc, str, len, &ps); + + if (*next_len == (size_t)-2) { + // Incomplete multibyte sequence: Treat the whole sequence + // as a single non-printable multibyte character that ends + // the string. + *next_len = len; + return false; + } + + // Check more broadly than just ret == (size_t)-1 to be safe + // in case mbrtowc() returns something weird. This check + // covers (size_t)-1 (that is, SIZE_MAX) too because len is from + // strlen() and the terminating '\0' isn't part of the length. + if (*next_len < 1 || *next_len > len) { + // Invalid multibyte sequence: Treat the first byte as + // a non-printable single-byte character. 
Decoding will + // be restarted from the next byte on the next call to + // this function. + *next_len = 1; + return false; + } + +# if defined(_WIN32) && !defined(__CYGWIN__) + // On Windows, wchar_t stores UTF-16 code units, thus characters + // outside the Basic Multilingual Plane (BMP) don't fit into + // a single wchar_t. In an UTF-8 locale, UCRT's mbrtowc() returns + // successfully when the input is a non-BMP character but the + // output is the replacement character U+FFFD. + // + // iswprint() returns 0 for U+FFFD on Windows for some reason. Treat + // U+FFFD as printable and thus also all non-BMP chars as printable. + if (wc == 0xFFFD) + return true; +# endif + + return iswprint((wint_t)wc) != 0; +#else + (void)len; + *next_len = 1; + return isprint((unsigned char)str[0]) != 0; +#endif +} + + +static bool +has_nonprint(const char *str, size_t len) +{ + for (size_t i = 0; i < len; ) { + size_t next_len; + if (!is_next_printable(str + i, len - i, &next_len)) + return true; + + i += next_len; + } + + return false; +} + + +extern bool +tuklib_has_nonprint(const char *str) +{ + const int saved_errno = errno; + const bool ret = has_nonprint(str, strlen(str)); + errno = saved_errno; + return ret; +} + + +extern const char * +tuklib_mask_nonprint_r(const char *str, char **mem) +{ + const int saved_errno = errno; + + // Free the old string, if any. + free(*mem); + *mem = NULL; + + // If the whole input string contains only printable characters, + // return the input string. + const size_t len = strlen(str); + if (!has_nonprint(str, len)) { + errno = saved_errno; + return str; + } + + // Allocate memory for the masked string. Since we use the single-byte + // character '?' to mask non-printable characters, it's possible that + // a few bytes less memory would be needed in reality if multibyte + // characters are masked. + // + // If allocation fails, return "???" because it should be safer than + // returning the unmasked string. 
+ *mem = malloc(len + 1); + if (*mem == NULL) { + errno = saved_errno; + return "???"; + } + + // Replace all non-printable characters with '?'. + char *dest = *mem; + + for (size_t i = 0; i < len; ) { + size_t next_len; + if (is_next_printable(str + i, len - i, &next_len)) { + memcpy(dest, str + i, next_len); + dest += next_len; + } else { + *dest++ = '?'; + } + + i += next_len; + } + + *dest = '\0'; + + errno = saved_errno; + return *mem; +} + + +extern const char * +tuklib_mask_nonprint(const char *str) +{ + static char *mem = NULL; + return tuklib_mask_nonprint_r(str, &mem); +} diff --git a/src/common/tuklib_mbstr_nonprint.h b/src/common/tuklib_mbstr_nonprint.h new file mode 100644 index 000000000000..6fc969109fe0 --- /dev/null +++ b/src/common/tuklib_mbstr_nonprint.h @@ -0,0 +1,71 @@ +// SPDX-License-Identifier: 0BSD + +/////////////////////////////////////////////////////////////////////////////// +// +/// \file tuklib_mbstr_nonprint.h +/// \brief Find and replace non-printable characters with question marks +/// +/// If mbrtowc(3) is available, it and iswprint(3) is used to check if all +/// characters are printable. Otherwise single-byte character set is assumed +/// and isprint(3) is used. +// +// Author: Lasse Collin +// +/////////////////////////////////////////////////////////////////////////////// + +#ifndef TUKLIB_MBSTR_NONPRINT_H +#define TUKLIB_MBSTR_NONPRINT_H + +#include "tuklib_common.h" +TUKLIB_DECLS_BEGIN + +#define tuklib_has_nonprint TUKLIB_SYMBOL(tuklib_has_nonprint) +extern bool tuklib_has_nonprint(const char *str); +///< +/// \brief Check if a string contains any non-printable characters +/// +/// \return false if str contains only valid multibyte characters and +/// iswprint(3) returns non-zero for all of them; true otherwise. +/// The value of errno is preserved. +/// +/// \note In case mbrtowc(3) isn't available, single-byte character set +/// is assumed and isprint(3) is used instead of iswprint(3). 
+ +#define tuklib_mask_nonprint_r TUKLIB_SYMBOL(tuklib_mask_nonprint_r) +extern const char *tuklib_mask_nonprint_r(const char *str, char **mem); +///< +/// \brief Replace non-printable characters with question marks +/// +/// \param str Untrusted string, for example, a filename +/// \param mem This function always calls free(*mem) to free the old +/// allocation and then sets *mem = NULL. Before the first +/// call, *mem should be initialized to NULL. If this +/// function needs to allocate memory for a modified +/// string, a pointer to the allocated memory will be +/// stored to *mem. Otherwise *mem will remain NULL. +/// +/// \return If tuklib_has_nonprint(str) returns false, this function +/// returns str. Otherwise memory is allocated to hold a modified +/// string and a pointer to that is returned. The pointer to the +/// allocated memory is also stored to *mem. A modified string +/// has the problematic characters replaced by '?'. If memory +/// allocation fails, "???" is returned and *mem is NULL. +/// The value of errno is preserved. + +#define tuklib_mask_nonprint TUKLIB_SYMBOL(tuklib_mask_nonprint) +extern const char *tuklib_mask_nonprint(const char *str); +///< +/// \brief Replace non-printable characters with question marks +/// +/// This is a convenience function for single-threaded use. This calls +/// tuklib_mask_nonprint_r() using an internal static variable to hold +/// the possible allocation. +/// +/// \param str Untrusted string, for example, a filename +/// +/// \return See tuklib_mask_nonprint_r(). +/// +/// \note This function is not thread safe! 
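Reviewer note: the memory-ownership contract documented above for tuklib_mask_nonprint_r() can be sketched in a few lines. This is a minimal, single-byte approximation, not the code from this patch. The name `sketch_mask_nonprint_r` is hypothetical, and the sketch uses only isprint(3), like the fallback path taken when mbrtowc(3) isn't available.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical single-byte stand-in for tuklib_mask_nonprint_r().
// It follows the documented contract but never decodes multibyte
// characters; every byte is checked with isprint(3).
static const char *
sketch_mask_nonprint_r(const char *str, char **mem)
{
	assert(str != NULL && mem != NULL);
	const int saved_errno = errno;

	// Always release the previous allocation first.
	free(*mem);
	*mem = NULL;

	const size_t len = strlen(str);

	// If every byte is printable, return the input string as is.
	size_t i = 0;
	while (i < len && isprint((unsigned char)str[i]))
		++i;

	if (i == len) {
		errno = saved_errno;
		return str;
	}

	// If allocation fails, "???" is safer than the unmasked string.
	*mem = malloc(len + 1);
	if (*mem == NULL) {
		errno = saved_errno;
		return "???";
	}

	// Copy printable bytes and replace the rest with '?'.
	for (i = 0; i < len; ++i)
		(*mem)[i] = isprint((unsigned char)str[i]) ? str[i] : '?';

	(*mem)[len] = '\0';

	errno = saved_errno;
	return *mem;
}
```

A caller would initialize `char *mem = NULL;` once, pass `&mem` on every call, and call `free(mem)` after the last returned string is no longer needed. The returned pointer must not be freed directly because it may be the input string or a string literal.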
+ +TUKLIB_DECLS_END +#endif diff --git a/src/common/tuklib_mbstr_width.c b/src/common/tuklib_mbstr_width.c index 7a8bf0707518..98c611d8f38d 100644 --- a/src/common/tuklib_mbstr_width.c +++ b/src/common/tuklib_mbstr_width.c @@ -1,64 +1,86 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file tuklib_mbstr_width.c /// \brief Calculate width of a multibyte string // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "tuklib_mbstr.h" #include -#if defined(HAVE_MBRTOWC) && defined(HAVE_WCWIDTH) +#ifdef HAVE_MBRTOWC # include #endif extern size_t tuklib_mbstr_width(const char *str, size_t *bytes) { const size_t len = strlen(str); if (bytes != NULL) *bytes = len; -#if !(defined(HAVE_MBRTOWC) && defined(HAVE_WCWIDTH)) + return tuklib_mbstr_width_mem(str, len); +} + + +extern size_t +tuklib_mbstr_width_mem(const char *str, size_t len) +{ +#ifndef HAVE_MBRTOWC // In single-byte mode, the width of the string is the same // as its length. + (void)str; return len; #else mbstate_t state; memset(&state, 0, sizeof(state)); size_t width = 0; size_t i = 0; // Convert one multibyte character at a time to wchar_t // and get its width using wcwidth(). while (i < len) { wchar_t wc; const size_t ret = mbrtowc(&wc, str + i, len - i, &state); - if (ret < 1 || ret > len) + if (ret < 1 || ret > len - i) return (size_t)-1; i += ret; +#ifdef HAVE_WCWIDTH const int wc_width = wcwidth(wc); if (wc_width < 0) return (size_t)-1; width += (size_t)wc_width; +#else + // Without wcwidth() (like in a native Windows build), + // assume that one multibyte char == one column. With + // UTF-8, this is less bad than one byte == one column. + // This way quite a few languages will be handled correctly + // in practice; CJK chars will be very wrong though. + ++width; +#endif } - // Require that the string ends in the initial shift state. 
- // This way the caller can be combine the string with other - // strings without needing to worry about the shift states. + // It's good to check that the string ended in the initial state. + // However, in practice this is redundant: + // + // - No one will use this code with character sets that have + // locking shift states. + // + // - We already checked that mbrtowc() didn't return (size_t)-2 + // which would indicate a partial multibyte character. if (!mbsinit(&state)) return (size_t)-1; return width; #endif } diff --git a/src/common/tuklib_mbstr_wrap.c b/src/common/tuklib_mbstr_wrap.c new file mode 100644 index 000000000000..8d906e004d75 --- /dev/null +++ b/src/common/tuklib_mbstr_wrap.c @@ -0,0 +1,294 @@ +// SPDX-License-Identifier: 0BSD + +/////////////////////////////////////////////////////////////////////////////// +// +/// \file tuklib_mbstr_wrap.c +/// \brief Word wraps a string and prints it to a FILE stream +/// +/// This depends on tuklib_mbstr_width.c. +// +// Author: Lasse Collin +// +/////////////////////////////////////////////////////////////////////////////// + +#include "tuklib_mbstr.h" +#include "tuklib_mbstr_wrap.h" +#include +#include +#include +#include + + +extern int +tuklib_wraps(FILE *outfile, const struct tuklib_wrap_opt *opt, const char *str) +{ + // left_cont may be less than left_margin. In that case, if the first + // word is extremely long, it will stay on the first line even if + // the line then gets overlong. + // + // On the other hand, left2_cont < left2_margin isn't allowed because + // it could result in inconsistent behavior when a very long word + // comes right after a \v. + // + // It is fine to have left2_margin < left_margin although it would be + // an odd use case. 
+ if (!(opt->left_margin < opt->right_margin + && opt->left_cont < opt->right_margin + && opt->left2_margin <= opt->left2_cont + && opt->left2_cont < opt->right_margin)) + return TUKLIB_WRAP_ERR_OPT; + + // This is set to TUKLIB_WRAP_WARN_OVERLONG if one or more + // output lines extend past opt->right_margin columns. + int warn_overlong = 0; + + // Indentation of the first output line after \n or \r. + // \v sets this to opt->left2_margin. + // \r resets this back to the original value. + size_t first_indent = opt->left_margin; + + // Indentation of the output lines that occur due to word wrapping. + // \v sets this to opt->left2_cont and \r back to the original value. + size_t cont_indent = opt->left_cont; + + // If word wrapping occurs, the newline isn't printed unless more + // text would be put on the continuation line. This is also used + // when \v needs to start on a new line. + bool pending_newline = false; + + // Spaces are printed only when there is something else to put + // after the spaces on the line. This avoids unwanted empty lines + // in the output and makes it possible to ignore possible spaces + // before a \v character. + size_t pending_spaces = first_indent; + + // Current output column. When cur_col == pending_spaces, nothing + // has been actually printed to the current output line. + size_t cur_col = pending_spaces; + + while (true) { + // Number of bytes until the *next* line-break opportunity. + size_t len = 0; + + // Number of columns until the *next* line-break opportunity. + size_t width = 0; + + // Text between a pair of \b characters is treated as + // an unbreakable block even if it contains spaces. + // It must not contain any control characters before + // the closing \b. + bool unbreakable = false; + + while (true) { + // Find the next character that we handle specially. + // In an unbreakable block, search only for the + // closing \b; if missing, the unbreakable block + // extends to the end of the string. 
+ const size_t n = strcspn(str + len, + unbreakable ? "\b" : " \t\n\r\v\b"); + + // Calculate how many columns the characters need. + const size_t w = tuklib_mbstr_width_mem(str + len, n); + if (w == (size_t)-1) + return TUKLIB_WRAP_ERR_STR; + + width += w; + len += n; + + // \b isn't a line-break opportunity so it has to + // be handled here. For simplicity, empty blocks + // are treated as zero-width characters. + if (str[len] == '\b') { + ++len; + unbreakable = !unbreakable; + continue; + } + + break; + } + + // Determine if adding this chunk of text would make the + // current output line exceed opt->right_margin columns. + const bool too_long = cur_col + width > opt->right_margin; + + // Wrap the line if needed. However: + // + // - Don't wrap if the current column is less than where + // the continuation line would begin. In that case + // the chunk wouldn't fit on the next line either so + // we just have to produce an overlong line. + // + // - Don't wrap if so far the line only contains spaces. + // Wrapping in that case would leave a weird empty line. + // NOTE: This "only contains spaces" condition is the + // reason why left2_margin > left2_cont isn't allowed. + if (too_long && cur_col > cont_indent + && cur_col > pending_spaces) { + // There might be trailing spaces or zero-width spaces + // which need to be ignored to keep the output pretty. + // + // Spaces need to be ignored because in some + // writing styles there are two spaces after + // a full stop. Example string: + // + // "Foo bar. Abc def." + // ^ + // If the first space after the first full stop + // triggers word wrapping, both spaces must be + // ignored. Otherwise the next line would be + // indented too much. + // + // Zero-width spaces are ignored the same way + // because they are meaningless if an adjacent + // character is a space. + while (*str == ' ' || *str == '\t') + ++str; + + // Don't print the newline here; only mark it as + // pending. 
This avoids an unwanted empty line if + // there is a \n or \r or \0 after the spaces have + // been ignored. + pending_newline = true; + pending_spaces = cont_indent; + cur_col = pending_spaces; + + // Since str may have been incremented due to the + // ignored spaces, the loop needs to be restarted. + continue; + } + + // Print the current chunk of text before the next + // line-break opportunity. If the chunk was empty, + // don't print anything so that the pending newline + // and pending spaces aren't printed on their own. + if (len > 0) { + if (pending_newline) { + pending_newline = false; + if (putc('\n', outfile) == EOF) + return TUKLIB_WRAP_ERR_IO; + } + + while (pending_spaces > 0) { + if (putc(' ', outfile) == EOF) + return TUKLIB_WRAP_ERR_IO; + + --pending_spaces; + } + + for (size_t i = 0; i < len; ++i) { + // Ignore unbreakable block characters (\b). + const int c = (unsigned char)str[i]; + if (c != '\b' && putc(c, outfile) == EOF) + return TUKLIB_WRAP_ERR_IO; + } + + str += len; + cur_col += width; + + // Remember if the line got overlong. If no other + // errors occur, we return warn_overlong. It might + // help in catching problematic strings. + if (too_long) + warn_overlong = TUKLIB_WRAP_WARN_OVERLONG; + } + + // Handle the special character after the chunk of text. + switch (*str) { + case ' ': + // Regular space. + ++cur_col; + ++pending_spaces; + break; + + case '\v': + // Set the alternative indentation settings. + first_indent = opt->left2_margin; + cont_indent = opt->left2_cont; + + if (first_indent > cur_col) { + // Add one or more spaces to reach + // the column specified in first_indent. + pending_spaces += first_indent - cur_col; + } else { + // There is no room to add even one space + // before reaching the column first_indent. + pending_newline = true; + pending_spaces = first_indent; + } + + cur_col = first_indent; + break; + + case '\0': // Implicit newline at the end of the string. 
+ case '\r': // Newline that also resets the effect of \v. + case '\n': // Newline without resetting the indentation mode. + if (putc('\n', outfile) == EOF) + return TUKLIB_WRAP_ERR_IO; + + if (*str == '\0') + return warn_overlong; + + if (*str == '\r') { + first_indent = opt->left_margin; + cont_indent = opt->left_cont; + } + + pending_newline = false; + pending_spaces = first_indent; + cur_col = first_indent; + break; + } + + // Skip the specially-handled character. + ++str; + } +} + + +extern int +tuklib_wrapf(FILE *stream, const struct tuklib_wrap_opt *opt, + const char *fmt, ...) +{ + va_list ap; + char *buf; + +#ifdef HAVE_VASPRINTF + va_start(ap, fmt); + +#ifdef __clang__ +# pragma GCC diagnostic push +# pragma GCC diagnostic ignored "-Wformat-nonliteral" +#endif + const int n = vasprintf(&buf, fmt, ap); +#ifdef __clang__ +# pragma GCC diagnostic pop +#endif + + va_end(ap); + if (n == -1) + return TUKLIB_WRAP_ERR_FORMAT; +#else + // Fixed buffer size is dumb but in practice one shouldn't need + // huge strings for *formatted* output. This simple method is safe + // with pre-C99 vsnprintf() implementations too which don't return + // the required buffer size (they return -1 or buf_size - 1) or + // which might not null-terminate the buffer in case it's too small. 
+ const size_t buf_size = 128 * 1024; + buf = malloc(buf_size); + if (buf == NULL) + return TUKLIB_WRAP_ERR_FORMAT; + + va_start(ap, fmt); + const int n = vsnprintf(buf, buf_size, fmt, ap); + va_end(ap); + + if (n <= 0 || n >= (int)(buf_size - 1)) { + free(buf); + return TUKLIB_WRAP_ERR_FORMAT; + } +#endif + + const int ret = tuklib_wraps(stream, opt, buf); + free(buf); + return ret; +} diff --git a/src/common/tuklib_mbstr_wrap.h b/src/common/tuklib_mbstr_wrap.h new file mode 100644 index 000000000000..4e2f297dabb4 --- /dev/null +++ b/src/common/tuklib_mbstr_wrap.h @@ -0,0 +1,204 @@ +// SPDX-License-Identifier: 0BSD + +/////////////////////////////////////////////////////////////////////////////// +// +/// \file tuklib_mbstr_wrap.h +/// \brief Word wrapping for multibyte strings +/// +/// The word wrapping functions are intended to be usable, for example, +/// for printing --help text in command line tools. While manually-wrapped +/// --help text allows precise formatting, such freedom requires translators +/// to count spaces and determine where line breaks should occur. It's +/// tedious and error prone, and experience has shown that only some +/// translators do it well. Automatic word wrapping is less flexible but +/// results in polished-enough look with less effort from everyone. +/// Right-to-left languages and languages that don't use spaces between +/// words will still need extra effort though. +// +// Author: Lasse Collin +// +/////////////////////////////////////////////////////////////////////////////// + +#ifndef TUKLIB_MBSTR_WRAP_H +#define TUKLIB_MBSTR_WRAP_H + +#include "tuklib_common.h" +#include + +TUKLIB_DECLS_BEGIN + +/// One or more output lines exceeded right_margin. +/// This is only a warning; everything was still printed successfully. +#define TUKLIB_WRAP_WARN_OVERLONG 0x01 + +/// Error writing to the output FILE. The error flag in the FILE +/// should have been set as well.
+#define TUKLIB_WRAP_ERR_IO 0x02 + +/// Invalid options in struct tuklib_wrap_opt. +/// Nothing was printed. +#define TUKLIB_WRAP_ERR_OPT 0x04 + +/// Invalid or unsupported multibyte character in the input string: +/// either mbrtowc() failed or wcwidth() returned a negative value. +#define TUKLIB_WRAP_ERR_STR 0x08 + +/// Only tuklib_wrapf(): Error in converting the format string. +/// It's either a memory allocation failure or something bad with the +/// format string or arguments. +#define TUKLIB_WRAP_ERR_FORMAT 0x10 + +/// Options for tuklib_wraps() and tuklib_wrapf() +struct tuklib_wrap_opt { + /// Indentation of the first output line after `\n` or `\r`. + /// This can be anything less than right_margin. + unsigned short left_margin; + + /// Column where word-wrapped continuation lines start. + /// This can be anything less than right_margin. + unsigned short left_cont; + + /// Column where the text after `\v` will start, either on the current + /// line (when there is room to add at least one space) or on a new + /// empty line. + unsigned short left2_margin; + + /// Like left_cont but for text after a `\v`. However, this must + /// be greater than or equal to left2_margin in addition to being + /// less than right_margin. + unsigned short left2_cont; + + /// For 80-column terminals, it is recommended to use 79 here for + /// maximum portability. 80 will work most of the time but it will + /// result in unwanted empty lines in the rare case where a terminal + /// moves the cursor to the beginning of the next line immediately + /// when the last column has been used. + unsigned short right_margin; +}; + +#define tuklib_wraps TUKLIB_SYMBOL(tuklib_wraps) +extern int tuklib_wraps(FILE *stream, const struct tuklib_wrap_opt *opt, + const char *str); +///< +/// \brief Word wrap a multibyte string and write it to a FILE +/// +/// Word wrapping is done only at spaces and at the special control characters +/// described below. 
Multiple consecutive spaces are handled properly: strings +/// that have two (or more) spaces after a full sentence will look good even +/// when the spaces occur at a word wrapping boundary. Trailing spaces are +/// ignored at the end of a line or at the end of a string. +/// +/// The following control characters have been repurposed: +/// +/// - `\t` = Zero-width space allows a line break without producing any +/// output by itself. This can be useful after hard hyphens as +/// hyphens aren't otherwise used for line breaking. This can also +/// be useful in languages that don't use spaces between words. +/// (The Unicode character U+200B isn't supported.) +/// - `\b` = Text between a pair of `\b` characters is treated as an +/// unbreakable block (not wrapped even if there are spaces). +/// For example, a non-breaking space can be done like +/// in `"123\b \bMiB"`. Control characters (like `\n` or `\t`) +/// aren't allowed before the closing `\b`. If closing `\b` is +/// missing, the block extends to the end of the string. Empty +/// blocks are treated as zero-width characters. If line breaks +/// are possible around an empty block (like in `"foo \b\b bar"` +/// or `"foo \b"`), it can result in weird output. +/// - `\v` = Change to alternative indentation (left2_margin). +/// - `\r` = Reset back to the initial indentation and add a newline. +/// The next line will be indented by left_margin. +/// - `\n` = Add a newline without resetting the effect of `\v`. The +/// next line will be indented by left_margin or left2_margin +/// (not left_cont or left2_cont). +/// +/// Only `\n` should appear in translatable strings. `\t` works too but +/// even that might confuse some translators even if there is a TRANSLATORS +/// comment explaining its meaning. +/// +/// To use the other control characters in messages, one should use +/// tuklib_wrapf() with appropriate printf format string to combine +/// translatable strings with non-translatable portions. 
For example: +/// +/// \code{.c} +/// static const struct tuklib_wrap_opt wrap2 = { 2, 2, 22, 22, 79 }; +/// int e = 0; +/// ... +/// e |= tuklib_wrapf(stdout, &wrap2, +/// "-h, --help\v%s\r" +/// " --version\v%s", +/// W_("display this help and exit"), +/// W_("display version information and exit")); +/// ... +/// if (e != 0) { +/// // Handle warning or error. +/// ... +/// } +/// \endcode +/// +/// Control characters other than `\n` and `\t` are unusable in +/// translatable strings: +/// +/// - Gettext tools show annoying warnings if C escape sequences other +/// than `\n` or `\t` are seen. (Otherwise they still work perfectly +/// fine though.) +/// +/// - While at least Poedit and Lokalize support all escapes, some +/// editors only support `\n` and `\t`. +/// +/// - They could confuse some translators, resulting in broken +/// translations. +/// +/// Using non-control characters would solve some issues but it wouldn't +/// help with the unfortunate real-world issue that some translators would +/// likely have trouble understanding a new syntax. The Gettext manual +/// specifically warns about this, see the subheading "No unusual markup" +/// in `info (gettext)Preparing Strings`. (While using `\t` for zero-width +/// space is such custom markup, most translators will never need it.) +/// +/// Translators can use the Unicode character U+00A0 (or U+202F) if they +/// need a non-breaking space. For example, in French a non-breaking space +/// may be needed before colons and question marks (U+00A0 is common in +/// real-world French PO files). +/// +/// Using a non-ASCII char in a string in the C code (like `"123\u00A0MiB"`) +/// can work if one tells xgettext that input encoding is UTF-8, one +/// ensures that the C compiler uses UTF-8 as the input charset, and one +/// is certain that the program is *always* run under an UTF-8 locale. 
+/// Unfortunately a portable program cannot make this kind of assumptions, +/// which means that there is no pretty way to have a non-breaking space in +/// a translatable string. +/// +/// Optional: To tell translators which strings are automatically word +/// wrapped, see the macro `W_` in tuklib_gettext.h. +/// +/// \param stream Output FILE stream. For decent performance, it +/// should be in buffered mode because this function +/// writes the output one byte at a time with fputc(). +/// \param opt Word wrapping options. +/// \param str Null-terminated multibyte string that is in +/// the encoding used by the current locale. +/// +/// \return Returns 0 on success. If an error or warning occurs, one of +/// TUKLIB_WRAP_* codes is returned. Those codes are powers +/// of two. When warning/error detection can be delayed, the +/// return values can be accumulated from multiple calls using +/// bitwise-or into a single variable which can be checked after +/// all strings have (hopefully) been printed. + +#define tuklib_wrapf TUKLIB_SYMBOL(tuklib_wrapf) +tuklib_attr_format_printf(3, 4) +extern int tuklib_wrapf(FILE *stream, const struct tuklib_wrap_opt *opt, + const char *fmt, ...); +///< +/// \brief Format and word-wrap a multibyte string and write it to a FILE +/// +/// This is like tuklib_wraps() except that this takes a printf +/// format string. +/// +/// \note On platforms that lack vasprintf(), the intermediate +/// result from vsnprintf() must fit into a 128 KiB buffer. +/// TUKLIB_WRAP_ERR_FORMAT is returned if it doesn't but +/// only on platforms that lack vasprintf(). 
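Reviewer note: the constraints documented for struct tuklib_wrap_opt can be checked before calling the wrapping functions. The sketch below restates the same four conditions that tuklib_wraps() tests before it returns TUKLIB_WRAP_ERR_OPT. The struct and function names (`sketch_wrap_opt`, `sketch_wrap_opt_is_valid`) are hypothetical and exist only so that the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical mirror of struct tuklib_wrap_opt for this sketch.
struct sketch_wrap_opt {
	unsigned short left_margin;
	unsigned short left_cont;
	unsigned short left2_margin;
	unsigned short left2_cont;
	unsigned short right_margin;
};

// Returns true if the options satisfy the documented constraints:
// every indentation must be less than right_margin, and
// left2_margin must not exceed left2_cont.
static bool
sketch_wrap_opt_is_valid(const struct sketch_wrap_opt *opt)
{
	assert(opt != NULL);

	return opt->left_margin < opt->right_margin
			&& opt->left_cont < opt->right_margin
			&& opt->left2_margin <= opt->left2_cont
			&& opt->left2_cont < opt->right_margin;
}
```

With the values from the documentation example, `{ 2, 2, 22, 22, 79 }`, the check passes. Options like `{ 2, 2, 30, 22, 79 }` fail because left2_margin is greater than left2_cont. Having left_cont or left2_margin smaller than left_margin is accepted.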
+ +TUKLIB_DECLS_END +#endif diff --git a/src/common/tuklib_physmem.c b/src/common/tuklib_physmem.c index 1009df14d9d1..5988ba77a284 100644 --- a/src/common/tuklib_physmem.c +++ b/src/common/tuklib_physmem.c @@ -1,231 +1,224 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file tuklib_physmem.c /// \brief Get the amount of physical memory // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "tuklib_physmem.h" // We want to use Windows-specific code on Cygwin, which also has memory // information available via sysconf(), but on Cygwin 1.5 and older it // gives wrong results (from our point of view). #if defined(_WIN32) || defined(__CYGWIN__) # ifndef _WIN32_WINNT # define _WIN32_WINNT 0x0500 # endif # include #elif defined(__OS2__) # define INCL_DOSMISC # include #elif defined(__DJGPP__) # include #elif defined(__VMS) # include # include # include #elif defined(AMIGA) || defined(__AROS__) # define __USE_INLINE__ # include #elif defined(__QNX__) # include # include #elif defined(TUKLIB_PHYSMEM_AIX) # include #elif defined(TUKLIB_PHYSMEM_SYSCONF) # include #elif defined(TUKLIB_PHYSMEM_SYSCTL) # ifdef HAVE_SYS_PARAM_H # include # endif # include // Tru64 #elif defined(TUKLIB_PHYSMEM_GETSYSINFO) # include # include // HP-UX #elif defined(TUKLIB_PHYSMEM_PSTAT_GETSTATIC) # include # include // IRIX #elif defined(TUKLIB_PHYSMEM_GETINVENT_R) # include // This sysinfo() is Linux-specific. #elif defined(TUKLIB_PHYSMEM_SYSINFO) # include #endif extern uint64_t tuklib_physmem(void) { uint64_t ret = 0; #if defined(_WIN32) || defined(__CYGWIN__) // This requires Windows 2000 or later. 
MEMORYSTATUSEX meminfo; meminfo.dwLength = sizeof(meminfo); if (GlobalMemoryStatusEx(&meminfo)) ret = meminfo.ullTotalPhys; /* // Old version that is compatible with even Win95: if ((GetVersion() & 0xFF) >= 5) { // Windows 2000 and later have GlobalMemoryStatusEx() which // supports reporting values greater than 4 GiB. To keep the // code working also on older Windows versions, use // GlobalMemoryStatusEx() conditionally. - HMODULE kernel32 = GetModuleHandle(TEXT("kernel32.dll")); + HMODULE kernel32 = GetModuleHandleA("kernel32.dll"); if (kernel32 != NULL) { typedef BOOL (WINAPI *gmse_type)(LPMEMORYSTATUSEX); -#ifdef CAN_DISABLE_WCAST_FUNCTION_TYPE -# pragma GCC diagnostic push -# pragma GCC diagnostic ignored "-Wcast-function-type" -#endif gmse_type gmse = (gmse_type)GetProcAddress( kernel32, "GlobalMemoryStatusEx"); -#ifdef CAN_DISABLE_WCAST_FUNCTION_TYPE -# pragma GCC diagnostic pop -#endif if (gmse != NULL) { MEMORYSTATUSEX meminfo; meminfo.dwLength = sizeof(meminfo); if (gmse(&meminfo)) ret = meminfo.ullTotalPhys; } } } if (ret == 0) { // GlobalMemoryStatus() is supported by Windows 95 and later, // so it is fine to link against it unconditionally. Note that // GlobalMemoryStatus() has no return value. 
MEMORYSTATUS meminfo; meminfo.dwLength = sizeof(meminfo); GlobalMemoryStatus(&meminfo); ret = meminfo.dwTotalPhys; } */ #elif defined(__OS2__) unsigned long mem; if (DosQuerySysInfo(QSV_TOTPHYSMEM, QSV_TOTPHYSMEM, &mem, sizeof(mem)) == 0) ret = mem; #elif defined(__DJGPP__) __dpmi_free_mem_info meminfo; if (__dpmi_get_free_memory_information(&meminfo) == 0 && meminfo.total_number_of_physical_pages != (unsigned long)-1) ret = (uint64_t)meminfo.total_number_of_physical_pages * 4096; #elif defined(__VMS) int vms_mem; int val = SYI$_MEMSIZE; if (LIB$GETSYI(&val, &vms_mem, 0, 0, 0, 0) == SS$_NORMAL) ret = (uint64_t)vms_mem * 8192; #elif defined(AMIGA) || defined(__AROS__) ret = AvailMem(MEMF_TOTAL); #elif defined(__QNX__) const struct asinfo_entry *entries = SYSPAGE_ENTRY(asinfo); size_t count = SYSPAGE_ENTRY_SIZE(asinfo) / sizeof(struct asinfo_entry); const char *strings = SYSPAGE_ENTRY(strings)->data; for (size_t i = 0; i < count; ++i) if (strcmp(strings + entries[i].name, "ram") == 0) ret += entries[i].end - entries[i].start + 1; #elif defined(TUKLIB_PHYSMEM_AIX) - ret = _system_configuration.physmem; + ret = (uint64_t)_system_configuration.physmem; #elif defined(TUKLIB_PHYSMEM_SYSCONF) const long pagesize = sysconf(_SC_PAGESIZE); const long pages = sysconf(_SC_PHYS_PAGES); if (pagesize != -1 && pages != -1) // According to docs, pagesize * pages can overflow. // Simple case is 32-bit box with 4 GiB or more RAM, // which may report exactly 4 GiB of RAM, and "long" // being 32-bit will overflow. Casting to uint64_t // hopefully avoids overflows in the near future. 
ret = (uint64_t)pagesize * (uint64_t)pages; #elif defined(TUKLIB_PHYSMEM_SYSCTL) int name[2] = { CTL_HW, #ifdef HW_PHYSMEM64 HW_PHYSMEM64 #else HW_PHYSMEM #endif }; union { uint32_t u32; uint64_t u64; } mem; size_t mem_ptr_size = sizeof(mem.u64); if (sysctl(name, 2, &mem.u64, &mem_ptr_size, NULL, 0) != -1) { // IIRC, 64-bit "return value" is possible on some 64-bit // BSD systems even with HW_PHYSMEM (instead of HW_PHYSMEM64), // so support both. if (mem_ptr_size == sizeof(mem.u64)) ret = mem.u64; else if (mem_ptr_size == sizeof(mem.u32)) ret = mem.u32; } #elif defined(TUKLIB_PHYSMEM_GETSYSINFO) // Docs are unclear if "start" is needed, but it doesn't hurt // much to have it. int memkb; int start = 0; if (getsysinfo(GSI_PHYSMEM, (caddr_t)&memkb, sizeof(memkb), &start) != -1) ret = (uint64_t)memkb * 1024; #elif defined(TUKLIB_PHYSMEM_PSTAT_GETSTATIC) struct pst_static pst; if (pstat_getstatic(&pst, sizeof(pst), 1, 0) != -1) ret = (uint64_t)pst.physical_memory * (uint64_t)pst.page_size; #elif defined(TUKLIB_PHYSMEM_GETINVENT_R) inv_state_t *st = NULL; if (setinvent_r(&st) != -1) { inventory_t *i; while ((i = getinvent_r(st)) != NULL) { if (i->inv_class == INV_MEMORY && i->inv_type == INV_MAIN_MB) { ret = (uint64_t)i->inv_state << 20; break; } } endinvent_r(st); } #elif defined(TUKLIB_PHYSMEM_SYSINFO) struct sysinfo si; if (sysinfo(&si) == 0) ret = (uint64_t)si.totalram * si.mem_unit; #endif return ret; } diff --git a/src/common/w32_application.manifest.comments.txt b/src/common/w32_application.manifest.comments.txt index ad0835ccb0b1..de5c2105acf9 100644 --- a/src/common/w32_application.manifest.comments.txt +++ b/src/common/w32_application.manifest.comments.txt @@ -1,178 +1,192 @@ Windows application manifest for UTF-8 and long paths ===================================================== The .manifest file is embedded as is in the executables, thus the comments are here in a separate file. 
These comments were written in context of XZ Utils but might be useful when porting other command line tools from POSIX environments to Windows. NOTE: On Cygwin and MSYS2, command line arguments and file system access aren't tied to a Windows code page. Cygwin and MSYS2 include a default application manifest. Replacing it doesn't seem useful and might even be harmful if Cygwin and MSYS2 some day change their default manifest. UTF-8 code page --------------- On Windows, command line applications can use main() or wmain(). With the Windows-specific wmain(), argv contains UTF-16 code units which is the native encoding on Windows. With main(), argv uses the system active code page by default. It typically is a legacy code page like Windows-1252. NOTE: On POSIX, argv for main() is constructed by the calling process. On Windows, argv is constructed by a new process itself: a program receives the command line as a single string, and the startup code splits it into individual arguments, including quote removal and wildcard expansion. Then main() or wmain() is called. This application manifest forces the process code page to UTF-8 when the application runs on Windows 10 version 1903 or later. This is useful for programs that use main(): * UTF-8 allows such programs to access files whose names contain characters that don't exist in the current legacy code page. However, filenames on Windows may contain unpaired surrogates (invalid UTF-16). Such files cannot be accessed even with the UTF-8 code page. * UTF-8 avoids a security issue in command line argument handling: If a command line contains Unicode characters (for example, filenames) that don't exist in the current legacy code page, the characters are converted to similar-looking characters with best-fit mapping. Some best-fit mappings result in ASCII characters that change the meaning of the command line, which can be exploited with malicious filenames.
For example: - Double quote (") breaks quoting and makes argument injection possible. - Question mark (?) is a wildcard character which may expand to one or more filenames. - Forward slash (/) makes a directory traversal attack possible. This character can appear in a dangerous way even from a wildcard expansion; a look-alike character doesn't need to be passed directly on the command line. UTF-8 avoids best-fit mappings. However, it's still not perfect. Unpaired surrogates (invalid UTF-16) on the command line (including those from wildcard expansion) are converted to the replacement character U+FFFD. Thus, filenames with different unpaired surrogates appear identical when converted to the UTF-8 code page and aren't distinguishable from filenames that contain the actual replacement character U+FFFD. + FindFirstFileA() and FindFirstFileExA() also suffer from the above + issue where unpaired surrogates become U+FFFD. Another issue is + that filenames may require more bytes in UTF-8 than in a legacy + code page. In UTF-8, a very long filename may exceed MAX_PATH bytes + and thus these APIs cannot list such filenames anymore because + WIN32_FIND_DATAA has a member "CHAR cFileName[MAX_PATH]". + If different programs use different code pages, compatibility issues are possible. For example, if one program produces a list of filenames and another program reads it, both programs should use the same code page because the code page affects filenames in the char-based file system APIs. If building with a MinGW-w64 toolchain, it is strongly recommended to use UCRT instead of the old MSVCRT. For example, with the UTF-8 code page, MSVCRT doesn't convert non-ASCII characters correctly when writing to console with printf(). With UCRT it works. Long path names --------------- -The manifest enables support for path names longer than 259 -characters if the feature has been enabled in the Windows registry. 
-Omit the longPathAware element from the manifest if the application -isn't compatible with it. For example, uses of MAX_PATH might be -a sign of incompatibility. +The manifest enables support for path names longer than 260 wide +characters (UTF-16 code units) if the feature has been enabled in +the Windows registry. Omit the longPathAware element from the manifest +if the application isn't compatible with it. For example, some uses +of MAX_PATH might be a sign of incompatibility. + +Note that UTF-8 encoded filenames can exceed MAX_PATH (260) bytes when +the UTF-16 form is still within MAX_PATH wide characters. In this +situation the application doesn't need to be long path aware: functions +like _open() work with UTF-8 names that exceed MAX_PATH bytes if the +wide character form stays within MAX_PATH wide characters. (MAX_PATH +includes the terminating null character.) Documentation of the registry setting: https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later Summary of the manifest contents -------------------------------- See also Microsoft's documentation: https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests assemblyIdentity (omitted) This is documented as mandatory but not all apps in the real world have it, and of those that do, not all put an up-to-date version number there. Things seem to work correctly without so let's keep this simpler and omit it. compatibility Declare the application compatible with different Windows versions. Without this, Windows versions newer than Vista will run the application using Vista as the Operating System Context. trustInfo Declare the application as UAC-compliant. This avoids file system and registry virtualization that Windows otherwise does with 32-bit executables to make some ancient applications work. 
UAC-compliancy also stops Windows from using heuristics based on the filename (like setup.exe) to guess when elevated privileges might be needed which would then bring up the UAC prompt. longPathAware Declare the application as long path aware. This way many file - system operations aren't limited by MAX_PATH (260 characters - including the terminating null character) if the feature has - also been enabled in the Windows registry. + system operations aren't limited to MAX_PATH (260) wide characters + (including the terminating null character). The feature has to be + enabled in the Windows registry too. activeCodePage Force the process code page to UTF-8 on Windows 10 version 1903 and later. For example: - main() gets the command line arguments in UTF-8 instead of in a legacy code page. - File system APIs that take char-based strings use UTF-8 instead of a legacy code page. - Text written to the console via stdio.h's stdout or stderr (like calling printf()) are expected to be in UTF-8. CMake notes ----------- As of CMake 3.30, one can add a .manifest file as a source file but it only works with MSVC; it's ignored with MinGW-w64 toolchains. Embedding the manifest with a resource file works with all toolchains. However, then the default manifest needs to be disabled with MSVC in CMakeLists.txt to avoid duplicate manifests which would break the build. w32_application.manifest.rc: #include CREATEPROCESS_MANIFEST_RESOURCE_ID RT_MANIFEST "w32_application.manifest" Or the same thing without the #include: 1 24 "w32_application.manifest" CMakeLists.txt: if(MSVC) set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} /MANIFEST:NO") endif() add_executable(foo foo.c) # WIN32 isn't set on Cygwin or MSYS2, thus if(WIN32) is correct here. 
if(WIN32) target_sources(foo PRIVATE w32_application.manifest.rc) set_source_files_properties(w32_application.manifest.rc PROPERTIES OBJECT_DEPENDS w32_application.manifest ) endif() diff --git a/src/liblzma/api/lzma/bcj.h b/src/liblzma/api/lzma/bcj.h index 7f6611feb325..fb737cbba49c 100644 --- a/src/liblzma/api/lzma/bcj.h +++ b/src/liblzma/api/lzma/bcj.h @@ -1,98 +1,195 @@ /* SPDX-License-Identifier: 0BSD */ /** * \file lzma/bcj.h * \brief Branch/Call/Jump conversion filters * \note Never include this file directly. Use <lzma.h> instead. */ /* * Author: Lasse Collin */ #ifndef LZMA_H_INTERNAL # error Never include this file directly. Use <lzma.h> instead. #endif /* Filter IDs for lzma_filter.id */ /** * \brief Filter for x86 binaries */ #define LZMA_FILTER_X86 LZMA_VLI_C(0x04) /** * \brief Filter for Big endian PowerPC binaries */ #define LZMA_FILTER_POWERPC LZMA_VLI_C(0x05) /** * \brief Filter for IA-64 (Itanium) binaries */ #define LZMA_FILTER_IA64 LZMA_VLI_C(0x06) /** * \brief Filter for ARM binaries */ #define LZMA_FILTER_ARM LZMA_VLI_C(0x07) /** * \brief Filter for ARM-Thumb binaries */ #define LZMA_FILTER_ARMTHUMB LZMA_VLI_C(0x08) /** * \brief Filter for SPARC binaries */ #define LZMA_FILTER_SPARC LZMA_VLI_C(0x09) /** * \brief Filter for ARM64 binaries */ #define LZMA_FILTER_ARM64 LZMA_VLI_C(0x0A) /** * \brief Filter for RISC-V binaries */ #define LZMA_FILTER_RISCV LZMA_VLI_C(0x0B) /** * \brief Options for BCJ filters * * The BCJ filters never change the size of the data. Specifying options * for them is optional: if pointer to options is NULL, default value is * used. You probably never need to specify options to BCJ filters, so just * set the options pointer to NULL and be happy. * * If options with non-default values have been specified when encoding, * the same options must also be specified when decoding. * * \note At the moment, none of the BCJ filters support * LZMA_SYNC_FLUSH. If LZMA_SYNC_FLUSH is specified, * LZMA_OPTIONS_ERROR will be returned.
If there is need, * partial support for LZMA_SYNC_FLUSH can be added in future. * Partial means that flushing would be possible only at * offsets that are multiple of 2, 4, or 16 depending on * the filter, except x86 which cannot be made to support * LZMA_SYNC_FLUSH predictably. */ typedef struct { /** * \brief Start offset for conversions * * This setting is useful only when the same filter is used * _separately_ for multiple sections of the same executable file, * and the sections contain cross-section branch/call/jump * instructions. In that case it is beneficial to set the start * offset of the non-first sections so that the relative addresses * of the cross-section branch/call/jump instructions will use the * same absolute addresses as in the first section. * * When the pointer to options is NULL, the default value (zero) * is used. */ uint32_t start_offset; } lzma_options_bcj; + + +/** + * \brief Raw ARM64 BCJ encoder + * + * This is for special use cases only. + * + * \param start_offset The lowest 32 bits of the offset in the + * executable being filtered. For the ARM64 + * filter, this must be a multiple of four. + * For the very best results, this should also + * be in sync with 4096-byte page boundaries + * in the executable due to how ARM64's ADRP + * instruction works. + * \param buf Buffer to be filtered in place + * \param size Size of the buffer + * + * \return Number of bytes that were processed in `buf`. This is at most + * `size`. With the ARM64 filter, the return value is always + * a multiple of 4, and at most 3 bytes are left unfiltered. + * + * \since 5.7.1alpha + */ +extern LZMA_API(size_t) lzma_bcj_arm64_encode( + uint32_t start_offset, uint8_t *buf, size_t size) lzma_nothrow; + +/** + * \brief Raw ARM64 BCJ decoder + * + * See lzma_bcj_arm64_encode(). 
+ * + * \since 5.7.1alpha + */ +extern LZMA_API(size_t) lzma_bcj_arm64_decode( + uint32_t start_offset, uint8_t *buf, size_t size) lzma_nothrow; + + +/** + * \brief Raw RISC-V BCJ encoder + * + * This is for special use cases only. + * + * \param start_offset The lowest 32 bits of the offset in the + * executable being filtered. For the RISC-V + * filter, this must be a multiple of 2. + * \param buf Buffer to be filtered in place + * \param size Size of the buffer + * + * \return Number of bytes that were processed in `buf`. This is at most + * `size`. With the RISC-V filter, the return value is always + * a multiple of 2, and at most 7 bytes are left unfiltered. + * + * \since 5.7.1alpha + */ +extern LZMA_API(size_t) lzma_bcj_riscv_encode( + uint32_t start_offset, uint8_t *buf, size_t size) lzma_nothrow; + +/** + * \brief Raw RISC-V BCJ decoder + * + * See lzma_bcj_riscv_encode(). + * + * \since 5.7.1alpha + */ +extern LZMA_API(size_t) lzma_bcj_riscv_decode( + uint32_t start_offset, uint8_t *buf, size_t size) lzma_nothrow; + + +/** + * \brief Raw x86 BCJ encoder + * + * This is for special use cases only. + * + * \param start_offset The lowest 32 bits of the offset in the + * executable being filtered. For the x86 + * filter, all values are valid. + * \param buf Buffer to be filtered in place + * \param size Size of the buffer + * + * \return Number of bytes that were processed in `buf`. This is at most + * `size`. For the x86 filter, the return value is always + * a multiple of 1, and at most 4 bytes are left unfiltered. + * + * \since 5.7.1alpha + */ +extern LZMA_API(size_t) lzma_bcj_x86_encode( + uint32_t start_offset, uint8_t *buf, size_t size) lzma_nothrow; + +/** + * \brief Raw x86 BCJ decoder + * + * See lzma_bcj_x86_encode(). 
+ * + * \since 5.7.1alpha + */ +extern LZMA_API(size_t) lzma_bcj_x86_decode( + uint32_t start_offset, uint8_t *buf, size_t size) lzma_nothrow; diff --git a/src/liblzma/api/lzma/container.h b/src/liblzma/api/lzma/container.h index ee5d77e4f1af..dbd414cbf8c0 100644 --- a/src/liblzma/api/lzma/container.h +++ b/src/liblzma/api/lzma/container.h @@ -1,995 +1,995 @@ /* SPDX-License-Identifier: 0BSD */ /** * \file lzma/container.h * \brief File formats * \note Never include this file directly. Use <lzma.h> instead. */ /* * Author: Lasse Collin */ #ifndef LZMA_H_INTERNAL # error Never include this file directly. Use <lzma.h> instead. #endif /************ * Encoding * ************/ /** * \brief Default compression preset * * It's not straightforward to recommend a default preset, because in some * cases keeping the resource usage relatively low is more important than * getting the maximum compression ratio. */ #define LZMA_PRESET_DEFAULT UINT32_C(6) /** * \brief Mask for preset level * * This is useful only if you need to extract the level from the preset * variable. That should be rare. */ #define LZMA_PRESET_LEVEL_MASK UINT32_C(0x1F) /* * Preset flags * * Currently only one flag is defined. */ /** * \brief Extreme compression preset * * This flag modifies the preset to make the encoding significantly slower * while improving the compression ratio only marginally. This is useful * when you don't mind spending time to get as small a result as possible. * * This flag doesn't affect the memory usage requirements of the decoder (at * least not significantly). The memory usage of the encoder may be increased * a little but only at the lowest preset levels (0-3). */ #define LZMA_PRESET_EXTREME (UINT32_C(1) << 31) /** * \brief Multithreading options */ typedef struct { /** * \brief Flags * * Set this to zero if no flags are wanted. * * Encoder: No flags are currently supported.
* * Decoder: Bitwise-or of zero or more of the decoder flags: * - LZMA_TELL_NO_CHECK * - LZMA_TELL_UNSUPPORTED_CHECK * - LZMA_TELL_ANY_CHECK * - LZMA_IGNORE_CHECK * - LZMA_CONCATENATED * - LZMA_FAIL_FAST */ uint32_t flags; /** * \brief Number of worker threads to use */ uint32_t threads; /** * \brief Encoder only: Maximum uncompressed size of a Block * * The encoder will start a new .xz Block every block_size bytes. * Using LZMA_FULL_FLUSH or LZMA_FULL_BARRIER with lzma_code() * the caller may tell liblzma to start a new Block earlier. * * With LZMA2, a recommended block size is 2-4 times the LZMA2 * dictionary size. With very small dictionaries, it is recommended * to use at least 1 MiB block size for good compression ratio, even * if this is more than four times the dictionary size. Note that * these are only recommendations for typical use cases; feel free * to use other values. Just keep in mind that using a block size * less than the LZMA2 dictionary size is waste of RAM. * * Set this to 0 to let liblzma choose the block size depending * on the compression options. For LZMA2 it will be 3*dict_size * or 1 MiB, whichever is more. * * For each thread, about 3 * block_size bytes of memory will be * allocated. This may change in later liblzma versions. If so, * the memory usage will probably be reduced, not increased. */ uint64_t block_size; /** * \brief Timeout to allow lzma_code() to return early * * Multithreading can make liblzma consume input and produce * output in a very bursty way: it may first read a lot of input * to fill internal buffers, then no input or output occurs for * a while. * * In single-threaded mode, lzma_code() won't return until it has * either consumed all the input or filled the output buffer. If * this is done in multithreaded mode, it may cause a call * lzma_code() to take even tens of seconds, which isn't acceptable * in all applications. 
* * To avoid very long blocking times in lzma_code(), a timeout * (in milliseconds) may be set here. If lzma_code() would block * longer than this number of milliseconds, it will return with * LZMA_OK. Reasonable values are 100 ms or more. The xz command * line tool uses 300 ms. * * If long blocking times are acceptable, set timeout to a special * value of 0. This will disable the timeout mechanism and will make * lzma_code() block until all the input is consumed or the output * buffer has been filled. * * \note Even with a timeout, lzma_code() might sometimes take * a long time to return. No timing guarantees are made. */ uint32_t timeout; /** * \brief Encoder only: Compression preset * * The preset is set just like with lzma_easy_encoder(). * The preset is ignored if filters below is non-NULL. */ uint32_t preset; /** * \brief Encoder only: Filter chain (alternative to a preset) * * If this is NULL, the preset above is used. Otherwise the preset * is ignored and the filter chain specified here is used. */ const lzma_filter *filters; /** * \brief Encoder only: Integrity check type * * See check.h for available checks. The xz command line tool * defaults to LZMA_CHECK_CRC64, which is a good choice if you * are unsure. */ lzma_check check; /* * Reserved space to allow possible future extensions without * breaking the ABI. You should not touch these, because the names * of these variables may change. These are and will never be used * with the currently supported options, so it is safe to leave these * uninitialized. */ /** \private Reserved member. */ lzma_reserved_enum reserved_enum1; /** \private Reserved member. */ lzma_reserved_enum reserved_enum2; /** \private Reserved member. */ lzma_reserved_enum reserved_enum3; /** \private Reserved member. */ uint32_t reserved_int1; /** \private Reserved member. */ uint32_t reserved_int2; /** \private Reserved member. */ uint32_t reserved_int3; /** \private Reserved member. 
*/ uint32_t reserved_int4; /** * \brief Memory usage limit to reduce the number of threads * * Encoder: Ignored. * * Decoder: * * If the number of threads has been set so high that more than * memlimit_threading bytes of memory would be needed, the number * of threads will be reduced so that the memory usage will not exceed * memlimit_threading bytes. However, if memlimit_threading cannot * be met even in single-threaded mode, then decoding will continue * in single-threaded mode and memlimit_threading may be exceeded * even by a large amount. That is, memlimit_threading will never make * lzma_code() return LZMA_MEMLIMIT_ERROR. To truly cap the memory * usage, see memlimit_stop below. * * Setting memlimit_threading to UINT64_MAX or a similar huge value * means that liblzma is allowed to keep the whole compressed file * and the whole uncompressed file in memory in addition to the memory * needed by the decompressor data structures used by each thread! * In other words, a reasonable value limit must be set here or it * will cause problems sooner or later. If you have no idea what * a reasonable value could be, try lzma_physmem() / 4 as a starting * point. Setting this limit will never prevent decompression of * a file; this will only reduce the number of threads. * * If memlimit_threading is greater than memlimit_stop, then the value * of memlimit_stop will be used for both. */ uint64_t memlimit_threading; /** * \brief Memory usage limit that should never be exceeded * * Encoder: Ignored. * * Decoder: If decompressing will need more than this amount of * memory even in the single-threaded mode, then lzma_code() will * return LZMA_MEMLIMIT_ERROR. */ uint64_t memlimit_stop; /** \private Reserved member. */ uint64_t reserved_int7; /** \private Reserved member. */ uint64_t reserved_int8; /** \private Reserved member. */ void *reserved_ptr1; /** \private Reserved member. */ void *reserved_ptr2; /** \private Reserved member. 
*/ void *reserved_ptr3; /** \private Reserved member. */ void *reserved_ptr4; } lzma_mt; /** * \brief Calculate approximate memory usage of easy encoder * * This function is a wrapper for lzma_raw_encoder_memusage(). * * \param preset Compression preset (level and possible flags) * * \return Number of bytes of memory required for the given * preset when encoding or UINT64_MAX on error. */ extern LZMA_API(uint64_t) lzma_easy_encoder_memusage(uint32_t preset) lzma_nothrow lzma_attr_pure; /** * \brief Calculate approximate decoder memory usage of a preset * * This function is a wrapper for lzma_raw_decoder_memusage(). * * \param preset Compression preset (level and possible flags) * * \return Number of bytes of memory required to decompress a file * that was compressed using the given preset or UINT64_MAX * on error. */ extern LZMA_API(uint64_t) lzma_easy_decoder_memusage(uint32_t preset) lzma_nothrow lzma_attr_pure; /** * \brief Initialize .xz Stream encoder using a preset number * * This function is intended for those who just want to use the basic features * of liblzma (that is, most developers out there). * * If initialization fails (return value is not LZMA_OK), all the memory * allocated for *strm by liblzma is always freed. Thus, there is no need * to call lzma_end() after failed initialization. * * If initialization succeeds, use lzma_code() to do the actual encoding. * Valid values for 'action' (the second argument of lzma_code()) are * LZMA_RUN, LZMA_SYNC_FLUSH, LZMA_FULL_FLUSH, and LZMA_FINISH. In future, * there may be compression levels or flags that don't support LZMA_SYNC_FLUSH. * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param preset Compression preset to use. A preset consist of level * number and zero or more flags. Usually flags aren't * used, so preset is simply a number [0, 9] which match * the options -0 ... -9 of the xz command line tool. 
* Additional flags can be set using bitwise-or with * the preset level number, e.g. 6 | LZMA_PRESET_EXTREME. * \param check Integrity check type to use. See check.h for available * checks. The xz command line tool defaults to * LZMA_CHECK_CRC64, which is a good choice if you are * unsure. LZMA_CHECK_CRC32 is good too as long as the * uncompressed file is not many gigabytes. * * \return Possible lzma_ret values: * - LZMA_OK: Initialization succeeded. Use lzma_code() to * encode your data. * - LZMA_MEM_ERROR: Memory allocation failed. * - LZMA_OPTIONS_ERROR: The given compression preset is not * supported by this build of liblzma. * - LZMA_UNSUPPORTED_CHECK: The given check type is not * supported by this liblzma build. * - LZMA_PROG_ERROR: One or more of the parameters have values * that will never be valid. For example, strm == NULL. */ extern LZMA_API(lzma_ret) lzma_easy_encoder( lzma_stream *strm, uint32_t preset, lzma_check check) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Single-call .xz Stream encoding using a preset number * * The maximum required output buffer size can be calculated with * lzma_stream_buffer_bound(). * * \param preset Compression preset to use. See the description * in lzma_easy_encoder(). * \param check Type of the integrity check to calculate from * uncompressed data. * \param allocator lzma_allocator for custom allocator functions. * Set to NULL to use malloc() and free(). * \param in Beginning of the input buffer * \param in_size Size of the input buffer * \param[out] out Beginning of the output buffer * \param[out] out_pos The next byte will be written to out[*out_pos]. * *out_pos is updated only if encoding succeeds. * \param out_size Size of the out buffer; the first byte into * which no data is written to is out[out_size]. * * \return Possible lzma_ret values: * - LZMA_OK: Encoding was successful. * - LZMA_BUF_ERROR: Not enough output buffer space. 
* - LZMA_UNSUPPORTED_CHECK * - LZMA_OPTIONS_ERROR * - LZMA_MEM_ERROR * - LZMA_DATA_ERROR * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_easy_buffer_encode( uint32_t preset, lzma_check check, const lzma_allocator *allocator, const uint8_t *in, size_t in_size, uint8_t *out, size_t *out_pos, size_t out_size) lzma_nothrow; /** * \brief Initialize .xz Stream encoder using a custom filter chain * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param filters Array of filters terminated with * .id == LZMA_VLI_UNKNOWN. See filters.h for more * information. * \param check Type of the integrity check to calculate from * uncompressed data. * * \return Possible lzma_ret values: * - LZMA_OK: Initialization was successful. * - LZMA_MEM_ERROR * - LZMA_UNSUPPORTED_CHECK * - LZMA_OPTIONS_ERROR * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_stream_encoder(lzma_stream *strm, const lzma_filter *filters, lzma_check check) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Calculate approximate memory usage of multithreaded .xz encoder * * Since doing the encoding in threaded mode doesn't affect the memory * requirements of single-threaded decompressor, you can use * lzma_easy_decoder_memusage(options->preset) or * lzma_raw_decoder_memusage(options->filters) to calculate * the decompressor memory requirements. * * \param options Compression options * * \return Number of bytes of memory required for encoding with the * given options. If an error occurs, for example due to * unsupported preset or filter chain, UINT64_MAX is returned. */ extern LZMA_API(uint64_t) lzma_stream_encoder_mt_memusage( const lzma_mt *options) lzma_nothrow lzma_attr_pure; /** * \brief Initialize multithreaded .xz Stream encoder * * This provides the functionality of lzma_easy_encoder() and * lzma_stream_encoder() as a single function for multithreaded use. 
* * The supported actions for lzma_code() are LZMA_RUN, LZMA_FULL_FLUSH, * LZMA_FULL_BARRIER, and LZMA_FINISH. Support for LZMA_SYNC_FLUSH might be * added in the future. * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param options Pointer to multithreaded compression options * * \return Possible lzma_ret values: * - LZMA_OK * - LZMA_MEM_ERROR * - LZMA_UNSUPPORTED_CHECK * - LZMA_OPTIONS_ERROR * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_stream_encoder_mt( lzma_stream *strm, const lzma_mt *options) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Calculate recommended Block size for multithreaded .xz encoder * * This calculates a recommended Block size for multithreaded encoding given * a filter chain. This is used internally by lzma_stream_encoder_mt() to * determine the Block size if the block_size member is not set to the * special value of 0 in the lzma_mt options struct. * * If one wishes to change the filters between Blocks, this function is * helpful to set the block_size member of the lzma_mt struct before calling * lzma_stream_encoder_mt(). Since the block_size member represents the * maximum possible Block size for the multithreaded .xz encoder, one can * use this function to find the maximum recommended Block size based on * all planned filter chains. Otherwise, the multithreaded encoder will * base its maximum Block size on the first filter chain used (if the * block_size member is not set), which may unnecessarily limit the Block * size for a later filter chain. * * \param filters Array of filters terminated with * .id == LZMA_VLI_UNKNOWN. * * \return Recommended Block size in bytes, or UINT64_MAX if * an error occurred. */ extern LZMA_API(uint64_t) lzma_mt_block_size(const lzma_filter *filters) lzma_nothrow; /** * \brief Initialize .lzma encoder (legacy file format) * * The .lzma format is sometimes called the LZMA_Alone format, which is the * reason for the name of this function. 
The .lzma format supports only the * LZMA1 filter. There is no support for integrity checks like CRC32. * * Use this function if and only if you need to create files readable by * legacy LZMA tools such as LZMA Utils 4.32.x. Moving to the .xz format * is strongly recommended. * * The valid action values for lzma_code() are LZMA_RUN and LZMA_FINISH. * No kind of flushing is supported, because the file format doesn't make * it possible. * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param options Pointer to encoder options * * \return Possible lzma_ret values: * - LZMA_OK * - LZMA_MEM_ERROR * - LZMA_OPTIONS_ERROR * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_alone_encoder( lzma_stream *strm, const lzma_options_lzma *options) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Calculate output buffer size for single-call Stream encoder * * When trying to compress incompressible data, the encoded size will be * slightly bigger than the input data. This function calculates how much * output buffer space is required to be sure that lzma_stream_buffer_encode() * doesn't return LZMA_BUF_ERROR. * * The calculated value is not exact, but it is guaranteed to be big enough. * The actual maximum output space required may be slightly smaller (up to * about 100 bytes). This should not be a problem in practice. * * If the calculated maximum size doesn't fit into size_t or would make the * Stream grow past LZMA_VLI_MAX (which should never happen in practice), * zero is returned to indicate the error. * * \note The limit calculated by this function applies only to * single-call encoding. Multi-call encoding may (and probably * will) have larger maximum expansion when encoding * incompressible data. Currently there is no function to * calculate the maximum expansion of multi-call encoding. 
* * \param uncompressed_size Size in bytes of the uncompressed * input data * * \return Maximum number of bytes needed to store the compressed data. */ extern LZMA_API(size_t) lzma_stream_buffer_bound(size_t uncompressed_size) lzma_nothrow; /** * \brief Single-call .xz Stream encoder * * \param filters Array of filters terminated with * .id == LZMA_VLI_UNKNOWN. See filters.h for more * information. * \param check Type of the integrity check to calculate from * uncompressed data. * \param allocator lzma_allocator for custom allocator functions. * Set to NULL to use malloc() and free(). * \param in Beginning of the input buffer * \param in_size Size of the input buffer * \param[out] out Beginning of the output buffer * \param[out] out_pos The next byte will be written to out[*out_pos]. * *out_pos is updated only if encoding succeeds. * \param out_size Size of the out buffer; the first byte into * which no data is written to is out[out_size]. * * \return Possible lzma_ret values: * - LZMA_OK: Encoding was successful. * - LZMA_BUF_ERROR: Not enough output buffer space. * - LZMA_UNSUPPORTED_CHECK * - LZMA_OPTIONS_ERROR * - LZMA_MEM_ERROR * - LZMA_DATA_ERROR * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode( lzma_filter *filters, lzma_check check, const lzma_allocator *allocator, const uint8_t *in, size_t in_size, uint8_t *out, size_t *out_pos, size_t out_size) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief MicroLZMA encoder * * The MicroLZMA format is a raw LZMA stream whose first byte (always 0x00) * has been replaced with bitwise-negation of the LZMA properties (lc/lp/pb). * This encoding ensures that the first byte of MicroLZMA stream is never * 0x00. There is no end of payload marker and thus the uncompressed size * must be stored separately. For the best error detection the dictionary * size should be stored separately as well but alternatively one may use * the uncompressed size as the dictionary size when decoding. 
* * With the MicroLZMA encoder, lzma_code() behaves slightly unusually. * The action argument must be LZMA_FINISH and the return value will never be * LZMA_OK. Thus the encoding is always done with a single lzma_code() after * the initialization. The benefit of the combination of initialization - * function and lzma_code() is that memory allocations can be re-used for + * function and lzma_code() is that memory allocations can be reused for * better performance. * * lzma_code() will try to encode as much input as is possible to fit into * the given output buffer. If not all input can be encoded, the stream will * be finished without encoding all the input. The caller must check both * input and output buffer usage after lzma_code() (total_in and total_out * in lzma_stream can be convenient). Often lzma_code() can fill the output * buffer completely if there is a lot of input, but sometimes a few bytes * may remain unused because the next LZMA symbol would require more space. * * lzma_stream.avail_out must be at least 6. Otherwise LZMA_PROG_ERROR * will be returned. * * The LZMA dictionary should be reasonably low to speed up the encoder * re-initialization. A good value is bigger than the resulting * uncompressed size of most of the output chunks. For example, if output * size is 4 KiB, dictionary size of 32 KiB or 64 KiB is good. If the * data compresses extremely well, even 128 KiB may be useful. * * The MicroLZMA format and this encoder variant were made with the EROFS * file system in mind. This format may be convenient in other embedded * uses too where many small streams are needed. XZ Embedded includes a * decoder for this format. * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param options Pointer to encoder options * * \return Possible lzma_ret values: * - LZMA_STREAM_END: All good. Check the amounts of input used * and output produced. 
Store the amount of input used * (uncompressed size) as it needs to be known to decompress * the data. * - LZMA_OPTIONS_ERROR * - LZMA_MEM_ERROR * - LZMA_PROG_ERROR: In addition to the generic reasons for this * error code, this may also be returned if there isn't enough * output space (6 bytes) to create a valid MicroLZMA stream. */ extern LZMA_API(lzma_ret) lzma_microlzma_encoder( lzma_stream *strm, const lzma_options_lzma *options) lzma_nothrow; /************ * Decoding * ************/ /** * This flag makes lzma_code() return LZMA_NO_CHECK if the input stream * being decoded has no integrity check. Note that when used with * lzma_auto_decoder(), all .lzma files will trigger LZMA_NO_CHECK * if LZMA_TELL_NO_CHECK is used. */ #define LZMA_TELL_NO_CHECK UINT32_C(0x01) /** * This flag makes lzma_code() return LZMA_UNSUPPORTED_CHECK if the input * stream has an integrity check, but the type of the integrity check is not * supported by this liblzma version or build. Such files can still be * decoded, but the integrity check cannot be verified. */ #define LZMA_TELL_UNSUPPORTED_CHECK UINT32_C(0x02) /** * This flag makes lzma_code() return LZMA_GET_CHECK as soon as the type * of the integrity check is known. The type can then be got with * lzma_get_check(). */ #define LZMA_TELL_ANY_CHECK UINT32_C(0x04) /** * This flag makes lzma_code() not calculate and verify the integrity check * of the compressed data in .xz files. This means that invalid integrity * check values won't be detected and LZMA_DATA_ERROR won't be returned in * such cases. * * This flag only affects the checks of the compressed data itself; the CRC32 * values in the .xz headers will still be verified normally. * * Don't use this flag unless you know what you are doing. Possible reasons * to use this flag: * * - Trying to recover data from a corrupt .xz file. * * - Speeding up decompression, which matters mostly with SHA-256 * or with files that have compressed extremely well. 
It's recommended * to not use this flag for this purpose unless the file integrity is * verified externally in some other way. * * Support for this flag was added in liblzma 5.1.4beta. */ #define LZMA_IGNORE_CHECK UINT32_C(0x10) /** * This flag enables decoding of concatenated files with file formats that * allow concatenating compressed files as is. From the formats currently * supported by liblzma, only the .xz and .lz formats allow concatenated * files. Concatenated files are not allowed with the legacy .lzma format. * * This flag also affects the usage of the 'action' argument for lzma_code(). * When LZMA_CONCATENATED is used, lzma_code() won't return LZMA_STREAM_END * unless LZMA_FINISH is used as 'action'. Thus, the application has to set * LZMA_FINISH in the same way as it does when encoding. * * If LZMA_CONCATENATED is not used, the decoders still accept LZMA_FINISH * as 'action' for lzma_code(), but the usage of LZMA_FINISH isn't required. */ #define LZMA_CONCATENATED UINT32_C(0x08) /** * This flag makes the threaded decoder report errors (like LZMA_DATA_ERROR) * as soon as they are detected. This saves time when the application has no * interest in a partially decompressed truncated or corrupt file. Note that * due to timing randomness, if the same truncated or corrupt input is * decompressed multiple times with this flag, a different amount of output * may be produced by different runs, and even the error code might vary. * * When using LZMA_FAIL_FAST, it is recommended to use LZMA_FINISH to tell * the decoder when no more input will be coming because it can help fast * detection and reporting of truncated files. Note that in this situation * truncated files might be diagnosed with LZMA_DATA_ERROR instead of * LZMA_OK or LZMA_BUF_ERROR! * * Without this flag the threaded decoder will provide as much output as * possible at first and then report the pending error. 
This default behavior * matches the single-threaded decoder and provides repeatable behavior * with truncated or corrupt input. There are a few special cases where the * behavior can still differ like memory allocation failures (LZMA_MEM_ERROR). * * Single-threaded decoders currently ignore this flag. * * Support for this flag was added in liblzma 5.3.3alpha. Note that in older * versions this flag isn't supported (LZMA_OPTIONS_ERROR) even by functions * that ignore this flag in newer liblzma versions. */ #define LZMA_FAIL_FAST UINT32_C(0x20) /** * \brief Initialize .xz Stream decoder * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param memlimit Memory usage limit as bytes. Use UINT64_MAX * to effectively disable the limiter. liblzma * 5.2.3 and earlier don't allow 0 here and return * LZMA_PROG_ERROR; later versions treat 0 as if 1 * had been specified. * \param flags Bitwise-or of zero or more of the decoder flags: * LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK, * LZMA_TELL_ANY_CHECK, LZMA_IGNORE_CHECK, * LZMA_CONCATENATED, LZMA_FAIL_FAST * * \return Possible lzma_ret values: * - LZMA_OK: Initialization was successful. * - LZMA_MEM_ERROR: Cannot allocate memory. * - LZMA_OPTIONS_ERROR: Unsupported flags * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_stream_decoder( lzma_stream *strm, uint64_t memlimit, uint32_t flags) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Initialize multithreaded .xz Stream decoder * * The decoder can decode multiple Blocks in parallel. This requires that each * Block Header contains the Compressed Size and Uncompressed size fields * which are added by the multi-threaded encoder, see lzma_stream_encoder_mt(). * * A Stream with one Block will only utilize one thread. A Stream with multiple * Blocks but without size information in Block Headers will be processed in * single-threaded mode in the same way as done by lzma_stream_decoder(). 
* Concatenated Streams are processed one Stream at a time; no inter-Stream * parallelization is done. * * This function behaves like lzma_stream_decoder() when options->threads == 1 * and options->memlimit_threading <= 1. * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param options Pointer to multithreaded compression options * * \return Possible lzma_ret values: * - LZMA_OK: Initialization was successful. * - LZMA_MEM_ERROR: Cannot allocate memory. * - LZMA_MEMLIMIT_ERROR: Memory usage limit was reached. * - LZMA_OPTIONS_ERROR: Unsupported flags. * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_stream_decoder_mt( lzma_stream *strm, const lzma_mt *options) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Decode .xz, .lzma, and .lz (lzip) files with autodetection * * This decoder autodetects between the .xz, .lzma, and .lz file formats, * and calls lzma_stream_decoder(), lzma_alone_decoder(), or * lzma_lzip_decoder() once the type of the input file has been detected. * * Support for .lz was added in 5.4.0. * * If the flag LZMA_CONCATENATED is used and the input is a .lzma file: * For historical reasons concatenated .lzma files aren't supported. * If there is trailing data after one .lzma stream, lzma_code() will * return LZMA_DATA_ERROR. (lzma_alone_decoder() doesn't have such a check * as it doesn't support any decoder flags. It will return LZMA_STREAM_END * after one .lzma stream.) * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param memlimit Memory usage limit as bytes. Use UINT64_MAX * to effectively disable the limiter. liblzma * 5.2.3 and earlier don't allow 0 here and return * LZMA_PROG_ERROR; later versions treat 0 as if 1 * had been specified. 
* \param flags Bitwise-or of zero or more of the decoder flags: * LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK, * LZMA_TELL_ANY_CHECK, LZMA_IGNORE_CHECK, * LZMA_CONCATENATED, LZMA_FAIL_FAST * * \return Possible lzma_ret values: * - LZMA_OK: Initialization was successful. * - LZMA_MEM_ERROR: Cannot allocate memory. * - LZMA_OPTIONS_ERROR: Unsupported flags * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_auto_decoder( lzma_stream *strm, uint64_t memlimit, uint32_t flags) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Initialize .lzma decoder (legacy file format) * * Valid 'action' arguments to lzma_code() are LZMA_RUN and LZMA_FINISH. * There is no need to use LZMA_FINISH, but it's allowed because it may * simplify certain types of applications. * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param memlimit Memory usage limit as bytes. Use UINT64_MAX * to effectively disable the limiter. liblzma * 5.2.3 and earlier don't allow 0 here and return * LZMA_PROG_ERROR; later versions treat 0 as if 1 * had been specified. * * \return Possible lzma_ret values: * - LZMA_OK * - LZMA_MEM_ERROR * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_alone_decoder( lzma_stream *strm, uint64_t memlimit) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Initialize .lz (lzip) decoder (a foreign file format) * * This decoder supports the .lz format version 0 and the unextended .lz * format version 1: * * - Files in the format version 0 were produced by lzip 1.3 and older. * Such files aren't common but may be found from file archives * as a few source packages were released in this format. People * might have old personal files in this format too. Decompression * support for the format version 0 was removed in lzip 1.18. * * - lzip 1.3 added decompression support for .lz format version 1 files. * Compression support was added in lzip 1.4. 
In lzip 1.6 the .lz format * version 1 was extended to support the Sync Flush marker. This extension * is not supported by liblzma. lzma_code() will return LZMA_DATA_ERROR * at the location of the Sync Flush marker. In practice files with * the Sync Flush marker are very rare and thus liblzma can decompress * almost all .lz files. * * Just like with lzma_stream_decoder() for .xz files, LZMA_CONCATENATED * should be used when decompressing normal standalone .lz files. * * The .lz format allows putting non-.lz data at the end of a file after at * least one valid .lz member. That is, one can append custom data at the end * of a .lz file and the decoder is required to ignore it. In liblzma this * is relevant only when LZMA_CONCATENATED is used. In that case lzma_code() * will return LZMA_STREAM_END and leave lzma_stream.next_in pointing to * the first byte of the non-.lz data. An exception to this is if the first * 1-3 bytes of the non-.lz data are identical to the .lz magic bytes * (0x4C, 0x5A, 0x49, 0x50; "LZIP" in US-ASCII). In such a case the 1-3 bytes * will have been ignored by lzma_code(). If one wishes to locate the non-.lz * data reliably, one must ensure that the first byte isn't 0x4C. Actually * one should ensure that none of the first four bytes of trailing data are * equal to the magic bytes because lzip >= 1.20 requires it by default. * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param memlimit Memory usage limit as bytes. Use UINT64_MAX * to effectively disable the limiter. * \param flags Bitwise-or of flags, or zero for no flags. * All decoder flags listed above are supported * although only LZMA_CONCATENATED and (in very rare * cases) LZMA_IGNORE_CHECK are actually useful. * LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK, * and LZMA_FAIL_FAST do nothing. LZMA_TELL_ANY_CHECK * is supported for consistency only as CRC32 is * always used in the .lz format. 
* * \return Possible lzma_ret values: * - LZMA_OK: Initialization was successful. * - LZMA_MEM_ERROR: Cannot allocate memory. * - LZMA_OPTIONS_ERROR: Unsupported flags * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_lzip_decoder( lzma_stream *strm, uint64_t memlimit, uint32_t flags) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief Single-call .xz Stream decoder * * \param memlimit Pointer to how much memory the decoder is allowed * to allocate. The value pointed by this pointer is * modified if and only if LZMA_MEMLIMIT_ERROR is * returned. * \param flags Bitwise-or of zero or more of the decoder flags: * LZMA_TELL_NO_CHECK, LZMA_TELL_UNSUPPORTED_CHECK, * LZMA_IGNORE_CHECK, LZMA_CONCATENATED, * LZMA_FAIL_FAST. Note that LZMA_TELL_ANY_CHECK * is not allowed and will return LZMA_PROG_ERROR. * \param allocator lzma_allocator for custom allocator functions. * Set to NULL to use malloc() and free(). * \param in Beginning of the input buffer * \param in_pos The next byte will be read from in[*in_pos]. * *in_pos is updated only if decoding succeeds. * \param in_size Size of the input buffer; the first byte that * won't be read is in[in_size]. * \param[out] out Beginning of the output buffer * \param[out] out_pos The next byte will be written to out[*out_pos]. * *out_pos is updated only if decoding succeeds. * \param out_size Size of the out buffer; the first byte into * which no data is written to is out[out_size]. * * \return Possible lzma_ret values: * - LZMA_OK: Decoding was successful. * - LZMA_FORMAT_ERROR * - LZMA_OPTIONS_ERROR * - LZMA_DATA_ERROR * - LZMA_NO_CHECK: This can be returned only if using * the LZMA_TELL_NO_CHECK flag. * - LZMA_UNSUPPORTED_CHECK: This can be returned only if using * the LZMA_TELL_UNSUPPORTED_CHECK flag. * - LZMA_MEM_ERROR * - LZMA_MEMLIMIT_ERROR: Memory usage limit was reached. * The minimum required memlimit value was stored to *memlimit. * - LZMA_BUF_ERROR: Output buffer was too small. 
* - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_stream_buffer_decode( uint64_t *memlimit, uint32_t flags, const lzma_allocator *allocator, const uint8_t *in, size_t *in_pos, size_t in_size, uint8_t *out, size_t *out_pos, size_t out_size) lzma_nothrow lzma_attr_warn_unused_result; /** * \brief MicroLZMA decoder * * See lzma_microlzma_encoder() for more information. * * The lzma_code() usage with this decoder is completely normal. The * special behavior of lzma_code() applies to lzma_microlzma_encoder() only. * * \param strm Pointer to lzma_stream that is at least initialized * with LZMA_STREAM_INIT. * \param comp_size Compressed size of the MicroLZMA stream. * The caller must somehow know this exactly. * \param uncomp_size Uncompressed size of the MicroLZMA stream. * If the exact uncompressed size isn't known, this * can be set to a value that is at most as big as * the exact uncompressed size would be, but then the * next argument uncomp_size_is_exact must be false. * \param uncomp_size_is_exact * If true, uncomp_size must be exactly correct. * This will improve error detection at the end of * the stream. If the exact uncompressed size isn't * known, this must be false. uncomp_size must still * be at most as big as the exact uncompressed size * is. Setting this to false when the exact size is * known will work but error detection at the end of * the stream will be weaker. * \param dict_size LZMA dictionary size that was used when * compressing the data. It is OK to use a bigger * value too but liblzma will then allocate more * memory than would actually be required and error * detection will be slightly worse. (Note that with * the implementation in XZ Embedded it doesn't * affect the memory usage if one specifies bigger * dictionary than actually required.) 
* * \return Possible lzma_ret values: * - LZMA_OK * - LZMA_MEM_ERROR * - LZMA_OPTIONS_ERROR * - LZMA_PROG_ERROR */ extern LZMA_API(lzma_ret) lzma_microlzma_decoder( lzma_stream *strm, uint64_t comp_size, uint64_t uncomp_size, lzma_bool uncomp_size_is_exact, uint32_t dict_size) lzma_nothrow; diff --git a/src/liblzma/api/lzma/lzma12.h b/src/liblzma/api/lzma/lzma12.h index 05f5b66eb56a..fec3e0dadb23 100644 --- a/src/liblzma/api/lzma/lzma12.h +++ b/src/liblzma/api/lzma/lzma12.h @@ -1,568 +1,568 @@ /* SPDX-License-Identifier: 0BSD */ /** * \file lzma/lzma12.h * \brief LZMA1 and LZMA2 filters * \note Never include this file directly. Use <lzma.h> instead. */ /* * Author: Lasse Collin */ #ifndef LZMA_H_INTERNAL # error Never include this file directly. Use <lzma.h> instead. #endif /** * \brief LZMA1 Filter ID (for raw encoder/decoder only, not in .xz) * * LZMA1 is the very same thing as what was called just LZMA in LZMA Utils, * 7-Zip, and LZMA SDK. It's called LZMA1 here to prevent developers from * accidentally using LZMA when they actually want LZMA2. */ #define LZMA_FILTER_LZMA1 LZMA_VLI_C(0x4000000000000001) /** * \brief LZMA1 Filter ID with extended options (for raw encoder/decoder) * * This is like LZMA_FILTER_LZMA1 but with this ID a few extra options * are supported in the lzma_options_lzma structure: * * - A flag to tell the encoder if the end of payload marker (EOPM) alias * end of stream (EOS) marker must be written at the end of the stream. * In contrast, LZMA_FILTER_LZMA1 always writes the end marker. * * - Decoder needs to be told the uncompressed size of the stream * or that it is unknown (using the special value UINT64_MAX). * If the size is known, a flag can be set to allow the presence of * the end marker anyway. In contrast, LZMA_FILTER_LZMA1 always * behaves as if the uncompressed size was unknown. * * This allows handling file formats where LZMA1 streams are used but where * the end marker isn't allowed or where it might not (always) be present.
* This extended LZMA1 functionality is provided as a Filter ID for raw * encoder and decoder instead of adding new encoder and decoder initialization * functions because this way it is possible to also use extra filters, * for example, LZMA_FILTER_X86 in a filter chain with LZMA_FILTER_LZMA1EXT, * which might be needed to handle some file formats. */ #define LZMA_FILTER_LZMA1EXT LZMA_VLI_C(0x4000000000000002) /** * \brief LZMA2 Filter ID * * Usually you want this instead of LZMA1. Compared to LZMA1, LZMA2 adds * support for LZMA_SYNC_FLUSH, uncompressed chunks (smaller expansion * when trying to compress incompressible data), possibility to change * lc/lp/pb in the middle of encoding, and some other internal improvements. */ #define LZMA_FILTER_LZMA2 LZMA_VLI_C(0x21) /** * \brief Match finders * * Match finder has major effect on both speed and compression ratio. * Usually hash chains are faster than binary trees. * * If you will use LZMA_SYNC_FLUSH often, the hash chains may be a better * choice, because binary trees get much higher compression ratio penalty * with LZMA_SYNC_FLUSH. * * The memory usage formulas are only rough estimates, which are closest to * reality when dict_size is a power of two. The formulas are more complex * in reality, and can also change a little between liblzma versions. Use * lzma_raw_encoder_memusage() to get more accurate estimate of memory usage. 
*/ typedef enum { LZMA_MF_HC3 = 0x03, /**< * \brief Hash Chain with 2- and 3-byte hashing * * Minimum nice_len: 3 * * Memory usage: * - dict_size <= 16 MiB: dict_size * 7.5 * - dict_size > 16 MiB: dict_size * 5.5 + 64 MiB */ LZMA_MF_HC4 = 0x04, /**< * \brief Hash Chain with 2-, 3-, and 4-byte hashing * * Minimum nice_len: 4 * * Memory usage: * - dict_size <= 32 MiB: dict_size * 7.5 * - dict_size > 32 MiB: dict_size * 6.5 */ LZMA_MF_BT2 = 0x12, /**< * \brief Binary Tree with 2-byte hashing * * Minimum nice_len: 2 * * Memory usage: dict_size * 9.5 */ LZMA_MF_BT3 = 0x13, /**< * \brief Binary Tree with 2- and 3-byte hashing * * Minimum nice_len: 3 * * Memory usage: * - dict_size <= 16 MiB: dict_size * 11.5 * - dict_size > 16 MiB: dict_size * 9.5 + 64 MiB */ LZMA_MF_BT4 = 0x14 /**< * \brief Binary Tree with 2-, 3-, and 4-byte hashing * * Minimum nice_len: 4 * * Memory usage: * - dict_size <= 32 MiB: dict_size * 11.5 * - dict_size > 32 MiB: dict_size * 10.5 */ } lzma_match_finder; /** * \brief Test if given match finder is supported * * It is safe to call this with a value that isn't listed in * lzma_match_finder enumeration; the return value will be false. * * There is no way to list which match finders are available in this * particular liblzma version and build. It would be useless, because * a new match finder, which the application developer wasn't aware of, * could require giving additional options to the encoder that the older * match finders don't need. * * \param match_finder Match finder ID * * \return lzma_bool: * - true if the match finder is supported by this liblzma build. * - false otherwise. */ extern LZMA_API(lzma_bool) lzma_mf_is_supported(lzma_match_finder match_finder) lzma_nothrow lzma_attr_const; /** * \brief Compression modes * * This selects the function used to analyze the data produced by the match * finder.
*/ typedef enum { LZMA_MODE_FAST = 1, /**< * \brief Fast compression * * Fast mode is usually at its best when combined with * a hash chain match finder. */ LZMA_MODE_NORMAL = 2 /**< * \brief Normal compression * * This is usually notably slower than fast mode. Use this * together with binary tree match finders to expose the * full potential of the LZMA1 or LZMA2 encoder. */ } lzma_mode; /** * \brief Test if given compression mode is supported * * It is safe to call this with a value that isn't listed in lzma_mode * enumeration; the return value will be false. * * There is no way to list which modes are available in this particular * liblzma version and build. It would be useless, because a new compression * mode, which the application developer wasn't aware of, could require giving * additional options to the encoder that the older modes don't need. * * \param mode Mode ID. * * \return lzma_bool: * - true if the compression mode is supported by this liblzma * build. * - false otherwise. */ extern LZMA_API(lzma_bool) lzma_mode_is_supported(lzma_mode mode) lzma_nothrow lzma_attr_const; /** * \brief Options specific to the LZMA1 and LZMA2 filters * * Since LZMA1 and LZMA2 share most of the code, it's simplest to share * the options structure too. For encoding, all but the reserved variables * need to be initialized unless specifically mentioned otherwise. * lzma_lzma_preset() can be used to get a good starting point. * * For raw decoding, both LZMA1 and LZMA2 need dict_size, preset_dict, and * preset_dict_size (if preset_dict != NULL). LZMA1 needs also lc, lp, and pb. */ typedef struct { /** * \brief Dictionary size in bytes * * Dictionary size indicates how many bytes of the recently processed * uncompressed data are kept in memory. One method to reduce size of * the uncompressed data is to store distance-length pairs, which * indicate what data to repeat from the dictionary buffer. Thus, * the bigger the dictionary, the better the compression ratio * usually is.
* * Maximum size of the dictionary depends on multiple things: * - Memory usage limit * - Available address space (not a problem on 64-bit systems) * - Selected match finder (encoder only) * * Currently the maximum dictionary size for encoding is 1.5 GiB * (i.e. (UINT32_C(1) << 30) + (UINT32_C(1) << 29)) even on 64-bit * systems for certain match finder implementation reasons. In the * future, there may be match finders that support bigger * dictionaries. * * Decoder already supports dictionaries up to 4 GiB - 1 B (i.e. * UINT32_MAX), so increasing the maximum dictionary size of the * encoder won't cause problems for old decoders. * * Because extremely small dictionary sizes would have unneeded * overhead in the decoder, the minimum dictionary size is 4096 bytes. * * \note When decoding, too big a dictionary does no other harm * than wasting memory. */ uint32_t dict_size; # define LZMA_DICT_SIZE_MIN UINT32_C(4096) # define LZMA_DICT_SIZE_DEFAULT (UINT32_C(1) << 23) /** * \brief Pointer to an initial dictionary * * It is possible to initialize the LZ77 history window using * a preset dictionary. It is useful when compressing many * similar, relatively small chunks of data independently from * each other. The preset dictionary should contain typical * strings that occur in the files being compressed. The most * probable strings should be near the end of the preset dictionary. * * This feature should be used only in special situations. For * now, it works correctly only with raw encoding and decoding. * Currently none of the container formats supported by * liblzma allow preset dictionary when decoding, thus if * you create a .xz or .lzma file with preset dictionary, it * cannot be decoded with the regular decoder functions. In the * future, the .xz format will likely get support for preset * dictionary though. */ const uint8_t *preset_dict; /** * \brief Size of the preset dictionary * * Specifies the size of the preset dictionary.
If the size is * bigger than dict_size, only the last dict_size bytes are * processed. * * This variable is read only when preset_dict is not NULL. * If preset_dict is not NULL but preset_dict_size is zero, * no preset dictionary is used (identical to only setting * preset_dict to NULL). */ uint32_t preset_dict_size; /** * \brief Number of literal context bits * * How many of the highest bits of the previous uncompressed * eight-bit byte (also known as 'literal') are taken into * account when predicting the bits of the next literal. * * E.g. in typical English text, an upper-case letter is * often followed by a lower-case letter, and a lower-case * letter is usually followed by another lower-case letter. * In the US-ASCII character set, the highest three bits are 010 * for upper-case letters and 011 for lower-case letters. * When lc is at least 3, the literal coding can take advantage of * this property in the uncompressed data. * * There is a limit that applies to literal context bits and literal * position bits together: lc + lp <= 4. Without this limit the * decoding could become very slow, which could have security related * results in some cases like email servers doing virus scanning. * This limit also simplifies the internal implementation in liblzma. * * There may be LZMA1 streams that have lc + lp > 4 (maximum possible * lc would be 8). It is not possible to decode such streams with * liblzma. */ uint32_t lc; # define LZMA_LCLP_MIN 0 # define LZMA_LCLP_MAX 4 # define LZMA_LC_DEFAULT 3 /** * \brief Number of literal position bits * * lp affects what kind of alignment in the uncompressed data is * assumed when encoding literals. A literal is a single 8-bit byte. * See pb below for more information about alignment. */ uint32_t lp; # define LZMA_LP_DEFAULT 0 /** * \brief Number of position bits * * pb affects what kind of alignment in the uncompressed data is * assumed in general. 
The default means four-byte alignment * (2^ pb =2^2=4), which is often a good choice when there's * no better guess. * * When the alignment is known, setting pb accordingly may reduce * the file size a little. E.g. with text files having one-byte * alignment (US-ASCII, ISO-8859-*, UTF-8), setting pb=0 can * improve compression slightly. For UTF-16 text, pb=1 is a good * choice. If the alignment is an odd number like 3 bytes, pb=0 * might be the best choice. * * Even though the assumed alignment can be adjusted with pb and * lp, LZMA1 and LZMA2 still slightly favor 16-byte alignment. * It might be worth taking into account when designing file formats * that are likely to be often compressed with LZMA1 or LZMA2. */ uint32_t pb; # define LZMA_PB_MIN 0 # define LZMA_PB_MAX 4 # define LZMA_PB_DEFAULT 2 /** Compression mode */ lzma_mode mode; /** * \brief Nice length of a match * * This determines how many bytes the encoder compares from the match * candidates when looking for the best match. Once a match of at * least nice_len bytes long is found, the encoder stops looking for * better candidates and encodes the match. (Naturally, if the found * match is actually longer than nice_len, the actual length is * encoded; it's not truncated to nice_len.) * * Bigger values usually increase the compression ratio and * compression time. For most files, 32 to 128 is a good value, * which gives very good compression ratio at good speed. * * The exact minimum value depends on the match finder. The maximum * is 273, which is the maximum length of a match that LZMA1 and * LZMA2 can encode. */ uint32_t nice_len; /** Match finder ID */ lzma_match_finder mf; /** * \brief Maximum search depth in the match finder * * For every input byte, match finder searches through the hash chain * or binary tree in a loop, each iteration going one step deeper in * the chain or tree. 
The searching stops if * - a match of at least nice_len bytes long is found; * - all match candidates from the hash chain or binary tree have * been checked; or * - maximum search depth is reached. * * Maximum search depth is needed to prevent the match finder from * wasting too much time in case there are lots of short match * candidates. On the other hand, stopping the search before all * candidates have been checked can reduce compression ratio. * * Setting depth to zero tells liblzma to use an automatic default * value, that depends on the selected match finder and nice_len. * The default is in the range [4, 200] or so (it may vary between * liblzma versions). * * Using a bigger depth value than the default can increase * compression ratio in some cases. There is no strict maximum value, * but high values (thousands or millions) should be used with care: * the encoder could remain fast enough with typical input, but * malicious input could cause the match finder to slow down * dramatically, possibly creating a denial of service attack. */ uint32_t depth; /** * \brief For LZMA_FILTER_LZMA1EXT: Extended flags * * This is used only with LZMA_FILTER_LZMA1EXT. * * Currently only one flag is supported, LZMA_LZMA1EXT_ALLOW_EOPM: * * - Encoder: If the flag is set, then end marker is written just * like it is with LZMA_FILTER_LZMA1. Without this flag the * end marker isn't written and the application has to store * the uncompressed size somewhere outside the compressed stream. * To decompress streams without the end marker, the application * has to set the correct uncompressed size in ext_size_low and * ext_size_high. * * - Decoder: If the uncompressed size in ext_size_low and * ext_size_high is set to the special value UINT64_MAX * (indicating unknown uncompressed size) then this flag is * ignored and the end marker must always be present, that is, * the behavior is identical to LZMA_FILTER_LZMA1. 
* * Otherwise, if this flag isn't set, then the input stream * must not have the end marker; if the end marker is detected * then it will result in LZMA_DATA_ERROR. This is useful when * it is known that the stream must not have the end marker and * strict validation is wanted. * * If this flag is set, then it is autodetected if the end marker * is present after the specified number of uncompressed bytes * has been decompressed (ext_size_low and ext_size_high). The * end marker isn't allowed in any other position. This behavior * is useful when uncompressed size is known but the end marker * may or may not be present. This is the case, for example, * in .7z files (valid .7z files that have the end marker in * LZMA1 streams are rare but they do exist). */ uint32_t ext_flags; # define LZMA_LZMA1EXT_ALLOW_EOPM UINT32_C(0x01) /** * \brief For LZMA_FILTER_LZMA1EXT: Uncompressed size (low bits) * * The 64-bit uncompressed size is needed for decompression with * LZMA_FILTER_LZMA1EXT. The size is ignored by the encoder. * * The special value UINT64_MAX indicates that the uncompressed size * is unknown and that the end of payload marker (also known as * end of stream marker) must be present to indicate the end of * the LZMA1 stream. Any other value indicates the expected * uncompressed size of the LZMA1 stream. (If LZMA1 was used together * with filters that change the size of the data then the uncompressed * size of the LZMA1 stream could be different than the final * uncompressed size of the filtered stream.) * * ext_size_low holds the least significant 32 bits of the * uncompressed size. The most significant 32 bits must be set - * in ext_size_high. The macro lzma_ext_size_set(opt_lzma, u64size) + * in ext_size_high. The macro lzma_set_ext_size(opt_lzma, u64size) * can be used to set these members. 
* * The 64-bit uncompressed size is split into two uint32_t variables * because there were no reserved uint64_t members and using the * same options structure for LZMA_FILTER_LZMA1, LZMA_FILTER_LZMA1EXT, * and LZMA_FILTER_LZMA2 was otherwise more convenient than having * a new options structure for LZMA_FILTER_LZMA1EXT. (Replacing two * uint32_t members with one uint64_t changes the ABI on some systems * as the alignment of this struct can increase from 4 bytes to 8.) */ uint32_t ext_size_low; /** * \brief For LZMA_FILTER_LZMA1EXT: Uncompressed size (high bits) * * This holds the most significant 32 bits of the uncompressed size. */ uint32_t ext_size_high; /* * Reserved space to allow possible future extensions without * breaking the ABI. You should not touch these, because the names * of these variables may change. These are and will never be used * with the currently supported options, so it is safe to leave these * uninitialized. */ /** \private Reserved member. */ uint32_t reserved_int4; /** \private Reserved member. */ uint32_t reserved_int5; /** \private Reserved member. */ uint32_t reserved_int6; /** \private Reserved member. */ uint32_t reserved_int7; /** \private Reserved member. */ uint32_t reserved_int8; /** \private Reserved member. */ lzma_reserved_enum reserved_enum1; /** \private Reserved member. */ lzma_reserved_enum reserved_enum2; /** \private Reserved member. */ lzma_reserved_enum reserved_enum3; /** \private Reserved member. */ lzma_reserved_enum reserved_enum4; /** \private Reserved member. */ void *reserved_ptr1; /** \private Reserved member. */ void *reserved_ptr2; } lzma_options_lzma; /** * \brief Macro to set the 64-bit uncompressed size in ext_size_* * * This might be convenient when decoding using LZMA_FILTER_LZMA1EXT. * This isn't used with LZMA_FILTER_LZMA1 or LZMA_FILTER_LZMA2. 
*/ #define lzma_set_ext_size(opt_lzma2, u64size) \ do { \ (opt_lzma2).ext_size_low = (uint32_t)(u64size); \ (opt_lzma2).ext_size_high = (uint32_t)((uint64_t)(u64size) >> 32); \ } while (0) /** * \brief Set a compression preset to lzma_options_lzma structure * * 0 is the fastest and 9 is the slowest. These match the switches -0 .. -9 * of the xz command line tool. In addition, it is possible to bitwise-or * flags to the preset. Currently only LZMA_PRESET_EXTREME is supported. * The flags are defined in container.h, because the flags are used also * with lzma_easy_encoder(). * * The preset levels are subject to changes between liblzma versions. * * This function is available only if LZMA1 or LZMA2 encoder has been enabled * when building liblzma. * * If features (like certain match finders) have been disabled at build time, * then the function may return success (false) even though the resulting * LZMA1/LZMA2 options may not be usable for encoder initialization * (LZMA_OPTIONS_ERROR). * * \param[out] options Pointer to LZMA1 or LZMA2 options to be filled * \param preset Preset level bitwise-ORed with preset flags * * \return lzma_bool: * - true if the preset is not supported (failure). * - false otherwise (success). */ extern LZMA_API(lzma_bool) lzma_lzma_preset( lzma_options_lzma *options, uint32_t preset) lzma_nothrow; diff --git a/src/liblzma/api/lzma/version.h b/src/liblzma/api/lzma/version.h index e86c0ea4c3d1..86b355635961 100644 --- a/src/liblzma/api/lzma/version.h +++ b/src/liblzma/api/lzma/version.h @@ -1,134 +1,134 @@ /* SPDX-License-Identifier: 0BSD */ /** * \file lzma/version.h * \brief Version number * \note Never include this file directly. Use <lzma.h> instead. */ /* * Author: Lasse Collin */ #ifndef LZMA_H_INTERNAL # error Never include this file directly. Use <lzma.h> instead. #endif /** \brief Major version number of the liblzma release. */ #define LZMA_VERSION_MAJOR 5 /** \brief Minor version number of the liblzma release.
*/ -#define LZMA_VERSION_MINOR 6 +#define LZMA_VERSION_MINOR 8 /** \brief Patch version number of the liblzma release. */ -#define LZMA_VERSION_PATCH 3 +#define LZMA_VERSION_PATCH 1 /** * \brief Version stability marker * * This will always be one of three values: * - LZMA_VERSION_STABILITY_ALPHA * - LZMA_VERSION_STABILITY_BETA * - LZMA_VERSION_STABILITY_STABLE */ #define LZMA_VERSION_STABILITY LZMA_VERSION_STABILITY_STABLE /** \brief Commit version number of the liblzma release */ #ifndef LZMA_VERSION_COMMIT # define LZMA_VERSION_COMMIT "" #endif /* * Map symbolic stability levels to integers. */ #define LZMA_VERSION_STABILITY_ALPHA 0 #define LZMA_VERSION_STABILITY_BETA 1 #define LZMA_VERSION_STABILITY_STABLE 2 /** * \brief Compile-time version number * * The version number is of format xyyyzzzs where * - x = major * - yyy = minor * - zzz = revision * - s indicates stability: 0 = alpha, 1 = beta, 2 = stable * * The same xyyyzzz triplet is never reused with different stability levels. * For example, if 5.1.0alpha has been released, there will never be 5.1.0beta * or 5.1.0 stable. * * \note The version number of liblzma has nothing to with * the version number of Igor Pavlov's LZMA SDK. */ #define LZMA_VERSION (LZMA_VERSION_MAJOR * UINT32_C(10000000) \ + LZMA_VERSION_MINOR * UINT32_C(10000) \ + LZMA_VERSION_PATCH * UINT32_C(10) \ + LZMA_VERSION_STABILITY) /* * Macros to construct the compile-time version string */ #if LZMA_VERSION_STABILITY == LZMA_VERSION_STABILITY_ALPHA # define LZMA_VERSION_STABILITY_STRING "alpha" #elif LZMA_VERSION_STABILITY == LZMA_VERSION_STABILITY_BETA # define LZMA_VERSION_STABILITY_STRING "beta" #elif LZMA_VERSION_STABILITY == LZMA_VERSION_STABILITY_STABLE # define LZMA_VERSION_STABILITY_STRING "" #else # error Incorrect LZMA_VERSION_STABILITY #endif #define LZMA_VERSION_STRING_C_(major, minor, patch, stability, commit) \ #major "." #minor "." 
#patch stability commit #define LZMA_VERSION_STRING_C(major, minor, patch, stability, commit) \ LZMA_VERSION_STRING_C_(major, minor, patch, stability, commit) /** * \brief Compile-time version as a string * * This can be for example "4.999.5alpha", "4.999.8beta", or "5.0.0" (stable * versions don't have any "stable" suffix). In future, a snapshot built * from source code repository may include an additional suffix, for example * "4.999.8beta-21-g1d92". The commit ID won't be available in numeric form * in LZMA_VERSION macro. */ #define LZMA_VERSION_STRING LZMA_VERSION_STRING_C( \ LZMA_VERSION_MAJOR, LZMA_VERSION_MINOR, \ LZMA_VERSION_PATCH, LZMA_VERSION_STABILITY_STRING, \ LZMA_VERSION_COMMIT) /* #ifndef is needed for use with windres (MinGW-w64 or Cygwin). */ #ifndef LZMA_H_INTERNAL_RC /** * \brief Run-time version number as an integer * * This allows an application to compare if it was built against the same, * older, or newer version of liblzma that is currently running. * * \return The value of LZMA_VERSION macro at the compile time of liblzma */ extern LZMA_API(uint32_t) lzma_version_number(void) lzma_nothrow lzma_attr_const; /** * \brief Run-time version as a string * * This function may be useful to display which version of liblzma an * application is currently using. 
 * * \return Run-time version of liblzma */ extern LZMA_API(const char *) lzma_version_string(void) lzma_nothrow lzma_attr_const; #endif diff --git a/src/liblzma/check/check.h b/src/liblzma/check/check.h index f0eb1172d907..16a56334211a 100644 --- a/src/liblzma/check/check.h +++ b/src/liblzma/check/check.h @@ -1,174 +1,156 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file check.h /// \brief Internal API to different integrity check functions // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef LZMA_CHECK_H #define LZMA_CHECK_H #include "common.h" // If the function for external SHA-256 is missing, use the internal SHA-256 // code. Due to how configure works, these defines can only get defined when // both a usable header and a type have already been found. #if !(defined(HAVE_CC_SHA256_INIT) \ || defined(HAVE_SHA256_INIT) \ || defined(HAVE_SHA256INIT)) # define HAVE_INTERNAL_SHA256 1 #endif #if defined(HAVE_INTERNAL_SHA256) // Nothing #elif defined(HAVE_COMMONCRYPTO_COMMONDIGEST_H) # include <CommonCrypto/CommonDigest.h> #elif defined(HAVE_SHA256_H) # include <sys/types.h> # include <sha256.h> #elif defined(HAVE_SHA2_H) # include <sys/types.h> # include <sha2.h> #endif #if defined(HAVE_INTERNAL_SHA256) /// State for the internal SHA-256 implementation typedef struct { /// Internal state uint32_t state[8]; /// Size of the message excluding padding uint64_t size; } lzma_sha256_state; #elif defined(HAVE_CC_SHA256_CTX) typedef CC_SHA256_CTX lzma_sha256_state; #elif defined(HAVE_SHA256_CTX) typedef SHA256_CTX lzma_sha256_state; #elif defined(HAVE_SHA2_CTX) typedef SHA2_CTX lzma_sha256_state; #endif #if defined(HAVE_INTERNAL_SHA256) // Nothing #elif defined(HAVE_CC_SHA256_INIT) # define LZMA_SHA256FUNC(x) CC_SHA256_ ## x #elif defined(HAVE_SHA256_INIT) # define LZMA_SHA256FUNC(x) SHA256_ ## x #elif defined(HAVE_SHA256INIT) # define LZMA_SHA256FUNC(x) SHA256 ## x #endif // Index hashing needs the best
possible hash function (preferably // a cryptographic hash) for maximum reliability. #if defined(HAVE_CHECK_SHA256) # define LZMA_CHECK_BEST LZMA_CHECK_SHA256 #elif defined(HAVE_CHECK_CRC64) # define LZMA_CHECK_BEST LZMA_CHECK_CRC64 #else # define LZMA_CHECK_BEST LZMA_CHECK_CRC32 #endif /// \brief Structure to hold internal state of the check being calculated /// /// \note This is not in the public API because this structure may /// change in future if new integrity check algorithms are added. typedef struct { /// Buffer to hold the final result and a temporary buffer for SHA256. union { uint8_t u8[64]; uint32_t u32[16]; uint64_t u64[8]; } buffer; /// Check-specific data union { uint32_t crc32; uint64_t crc64; lzma_sha256_state sha256; } state; } lzma_check_state; -/// lzma_crc32_table[0] is needed by LZ encoder so we need to keep -/// the array two-dimensional. -#ifdef HAVE_SMALL -lzma_attr_visibility_hidden -extern uint32_t lzma_crc32_table[1][256]; - -extern void lzma_crc32_init(void); - -#else - -lzma_attr_visibility_hidden -extern const uint32_t lzma_crc32_table[8][256]; - -lzma_attr_visibility_hidden -extern const uint64_t lzma_crc64_table[4][256]; -#endif - - /// \brief Initialize *check depending on type extern void lzma_check_init(lzma_check_state *check, lzma_check type); /// Update the check state extern void lzma_check_update(lzma_check_state *check, lzma_check type, const uint8_t *buf, size_t size); /// Finish the check calculation and store the result to check->buffer.u8. extern void lzma_check_finish(lzma_check_state *check, lzma_check type); #ifndef LZMA_SHA256FUNC /// Prepare SHA-256 state for new input. extern void lzma_sha256_init(lzma_check_state *check); /// Update the SHA-256 hash state extern void lzma_sha256_update( const uint8_t *buf, size_t size, lzma_check_state *check); /// Finish the SHA-256 calculation and store the result to check->buffer.u8. 
extern void lzma_sha256_finish(lzma_check_state *check); #else static inline void lzma_sha256_init(lzma_check_state *check) { LZMA_SHA256FUNC(Init)(&check->state.sha256); } static inline void lzma_sha256_update(const uint8_t *buf, size_t size, lzma_check_state *check) { #if defined(HAVE_CC_SHA256_INIT) && SIZE_MAX > UINT32_MAX // Darwin's CC_SHA256_Update takes uint32_t as the buffer size, // so use a loop to support size_t. while (size > UINT32_MAX) { LZMA_SHA256FUNC(Update)(&check->state.sha256, buf, UINT32_MAX); buf += UINT32_MAX; size -= UINT32_MAX; } #endif LZMA_SHA256FUNC(Update)(&check->state.sha256, buf, size); } static inline void lzma_sha256_finish(lzma_check_state *check) { LZMA_SHA256FUNC(Final)(check->buffer.u8, &check->state.sha256); } #endif #endif diff --git a/src/liblzma/check/crc32_arm64.h b/src/liblzma/check/crc32_arm64.h index 39c1c63ec0ec..fb0e8f0105a9 100644 --- a/src/liblzma/check/crc32_arm64.h +++ b/src/liblzma/check/crc32_arm64.h @@ -1,122 +1,147 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file crc32_arm64.h /// \brief CRC32 calculation with ARM64 optimization // // Authors: Chenxi Mao // Jia Tan -// Hans Jansen +// Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef LZMA_CRC32_ARM64_H #define LZMA_CRC32_ARM64_H // MSVC always has the CRC intrinsics available when building for ARM64 // there is no need to include any header files. #ifndef _MSC_VER # include <arm_acle.h> #endif // If both versions are going to be built, we need runtime detection // to check if the instructions are supported.
#if defined(CRC32_GENERIC) && defined(CRC32_ARCH_OPTIMIZED) # if defined(HAVE_GETAUXVAL) || defined(HAVE_ELF_AUX_INFO) # include <sys/auxv.h> # elif defined(_WIN32) # include <processthreadsapi.h> # elif defined(__APPLE__) && defined(HAVE_SYSCTLBYNAME) # include <sys/sysctl.h> # endif #endif // Some EDG-based compilers support ARM64 and define __GNUC__ // (such as Nvidia's nvcc), but do not support function attributes. // // NOTE: Build systems check for this too, keep them in sync with this. #if (defined(__GNUC__) || defined(__clang__)) && !defined(__EDG__) # define crc_attr_target __attribute__((__target__("+crc"))) #else # define crc_attr_target #endif crc_attr_target static uint32_t crc32_arch_optimized(const uint8_t *buf, size_t size, uint32_t crc) { crc = ~crc; - // Align the input buffer because this was shown to be - // significantly faster than unaligned accesses. - const size_t align_amount = my_min(size, (0U - (uintptr_t)buf) & 7); + if (size >= 8) { + // Align the input buffer because this was shown to be + // significantly faster than unaligned accesses. + const size_t align = (0 - (uintptr_t)buf) & 7; - for (const uint8_t *limit = buf + align_amount; buf < limit; ++buf) - crc = __crc32b(crc, *buf); + if (align & 1) + crc = __crc32b(crc, *buf++); + + if (align & 2) { + crc = __crc32h(crc, aligned_read16le(buf)); + buf += 2; + } + + if (align & 4) { + crc = __crc32w(crc, aligned_read32le(buf)); + buf += 4; + } - size -= align_amount; + size -= align; - // Process 8 bytes at a time. The end point is determined by - // ignoring the least significant three bits of size to ensure - // we do not process past the bounds of the buffer. This guarantees - // that limit is a multiple of 8 and is strictly less than size. - for (const uint8_t *limit = buf + (size & ~(size_t)7); - buf < limit; buf += 8) - crc = __crc32d(crc, aligned_read64le(buf)); + // Process 8 bytes at a time.
The end point is determined by + // ignoring the least significant three bits of size to + // ensure we do not process past the bounds of the buffer. + // This guarantees that limit is a multiple of 8 and is + // strictly less than size. + for (const uint8_t *limit = buf + (size & ~(size_t)7); + buf < limit; buf += 8) + crc = __crc32d(crc, aligned_read64le(buf)); + + size &= 7; + } // Process the remaining bytes that are not 8 byte aligned. - for (const uint8_t *limit = buf + (size & 7); buf < limit; ++buf) + if (size & 4) { + crc = __crc32w(crc, aligned_read32le(buf)); + buf += 4; + } + + if (size & 2) { + crc = __crc32h(crc, aligned_read16le(buf)); + buf += 2; + } + + if (size & 1) crc = __crc32b(crc, *buf); return ~crc; } #if defined(CRC32_GENERIC) && defined(CRC32_ARCH_OPTIMIZED) static inline bool is_arch_extension_supported(void) { #if defined(HAVE_GETAUXVAL) return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0; #elif defined(HAVE_ELF_AUX_INFO) unsigned long feature_flags; if (elf_aux_info(AT_HWCAP, &feature_flags, sizeof(feature_flags)) != 0) return false; return (feature_flags & HWCAP_CRC32) != 0; #elif defined(_WIN32) return IsProcessorFeaturePresent( PF_ARM_V8_CRC32_INSTRUCTIONS_AVAILABLE); #elif defined(__APPLE__) && defined(HAVE_SYSCTLBYNAME) int has_crc32 = 0; size_t size = sizeof(has_crc32); // The sysctlbyname() function requires a string identifier for the // CPU feature it tests. The Apple documentation lists the string // "hw.optional.armv8_crc32", which can be found here: // https://developer.apple.com/documentation/kernel/1387446-sysctlbyname/determining_instruction_set_characteristics#3915619 if (sysctlbyname("hw.optional.armv8_crc32", &has_crc32, &size, NULL, 0) != 0) return false; return has_crc32; #else // If a runtime detection method cannot be found, then this must // be a compile time error. The checks in crc_common.h should ensure // a runtime detection method is always found if this function is // built. 
It would be possible to just return false here, but this // is inefficient for binary size and runtime since only the generic // method could ever be used. # error Runtime detection method unavailable. #endif } #endif #endif // LZMA_CRC32_ARM64_H diff --git a/src/liblzma/check/crc32_fast.c b/src/liblzma/check/crc32_fast.c index 16dbb7467513..3c7cb95f57b7 100644 --- a/src/liblzma/check/crc32_fast.c +++ b/src/liblzma/check/crc32_fast.c @@ -1,204 +1,196 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file crc32.c /// \brief CRC32 calculation // // Authors: Lasse Collin // Ilya Kurdyukov -// Hans Jansen // /////////////////////////////////////////////////////////////////////////////// #include "check.h" #include "crc_common.h" #if defined(CRC_X86_CLMUL) -# define BUILDING_CRC32_CLMUL +# define BUILDING_CRC_CLMUL 32 # include "crc_x86_clmul.h" #elif defined(CRC32_ARM64) # include "crc32_arm64.h" +#elif defined(CRC32_LOONGARCH) +# include "crc32_loongarch.h" #endif #ifdef CRC32_GENERIC /////////////////// // Generic CRC32 // /////////////////// +#ifdef WORDS_BIGENDIAN +# include "crc32_table_be.h" +#else +# include "crc32_table_le.h" +#endif + + +#ifdef HAVE_CRC_X86_ASM +extern uint32_t lzma_crc32_generic( + const uint8_t *buf, size_t size, uint32_t crc); +#else static uint32_t -crc32_generic(const uint8_t *buf, size_t size, uint32_t crc) +lzma_crc32_generic(const uint8_t *buf, size_t size, uint32_t crc) { crc = ~crc; #ifdef WORDS_BIGENDIAN crc = byteswap32(crc); #endif if (size > 8) { // Fix the alignment, if needed. The if statement above // ensures that this won't read past the end of buf[]. while ((uintptr_t)(buf) & 7) { crc = lzma_crc32_table[0][*buf++ ^ A(crc)] ^ S8(crc); --size; } // Calculate the position where to stop. const uint8_t *const limit = buf + (size & ~(size_t)(7)); // Calculate how many bytes must be calculated separately // before returning the result. 
size &= (size_t)(7); // Calculate the CRC32 using the slice-by-eight algorithm. while (buf < limit) { crc ^= aligned_read32ne(buf); buf += 4; crc = lzma_crc32_table[7][A(crc)] ^ lzma_crc32_table[6][B(crc)] ^ lzma_crc32_table[5][C(crc)] ^ lzma_crc32_table[4][D(crc)]; const uint32_t tmp = aligned_read32ne(buf); buf += 4; // At least with some compilers, it is critical for // performance, that the crc variable is XORed // between the two table-lookup pairs. crc = lzma_crc32_table[3][A(tmp)] ^ lzma_crc32_table[2][B(tmp)] ^ crc ^ lzma_crc32_table[1][C(tmp)] ^ lzma_crc32_table[0][D(tmp)]; } } while (size-- != 0) crc = lzma_crc32_table[0][*buf++ ^ A(crc)] ^ S8(crc); #ifdef WORDS_BIGENDIAN crc = byteswap32(crc); #endif return ~crc; } -#endif +#endif // HAVE_CRC_X86_ASM +#endif // CRC32_GENERIC #if defined(CRC32_GENERIC) && defined(CRC32_ARCH_OPTIMIZED) ////////////////////////// // Function dispatching // ////////////////////////// // If both the generic and arch-optimized implementations are built, then // the function to use is selected at runtime because the system running // the binary might not have the arch-specific instruction set extension(s) // available. The dispatch methods in order of priority: // // 1. Constructor. This method uses __attribute__((__constructor__)) to // set crc32_func at load time. This avoids extra computation (and any // unlikely threading bugs) on the first call to lzma_crc32() to decide // which implementation should be used. // // 2. First Call Resolution. On the very first call to lzma_crc32(), the // call will be directed to crc32_dispatch() instead. This will set the // appropriate implementation function and will not be called again. // This method does not use any kind of locking but is safe because if // multiple threads run the dispatcher simultaneously then they will all // set crc32_func to the same value. 
typedef uint32_t (*crc32_func_type)( const uint8_t *buf, size_t size, uint32_t crc); // This resolver is shared between all dispatch methods. static crc32_func_type crc32_resolve(void) { return is_arch_extension_supported() - ? &crc32_arch_optimized : &crc32_generic; + ? &crc32_arch_optimized : &lzma_crc32_generic; } #ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR // Constructor method. # define CRC32_SET_FUNC_ATTR __attribute__((__constructor__)) static crc32_func_type crc32_func; #else // First Call Resolution method. # define CRC32_SET_FUNC_ATTR static uint32_t crc32_dispatch(const uint8_t *buf, size_t size, uint32_t crc); static crc32_func_type crc32_func = &crc32_dispatch; #endif CRC32_SET_FUNC_ATTR static void crc32_set_func(void) { crc32_func = crc32_resolve(); return; } #ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR static uint32_t crc32_dispatch(const uint8_t *buf, size_t size, uint32_t crc) { // When __attribute__((__constructor__)) isn't supported, set the // function pointer without any locking. If multiple threads run // the detection code in parallel, they will all end up setting // the pointer to the same value. This avoids the use of // mythread_once() on every call to lzma_crc32() but this likely // isn't strictly standards compliant. Let's change it if it breaks. crc32_set_func(); return crc32_func(buf, size, crc); } #endif #endif extern LZMA_API(uint32_t) lzma_crc32(const uint8_t *buf, size_t size, uint32_t crc) { #if defined(CRC32_GENERIC) && defined(CRC32_ARCH_OPTIMIZED) - // On x86-64, if CLMUL is available, it is the best for non-tiny - // inputs, being over twice as fast as the generic slice-by-four - // version. However, for size <= 16 it's different. In the extreme - // case of size == 1 the generic version can be five times faster. - // At size >= 8 the CLMUL starts to become reasonable. It - // varies depending on the alignment of buf too. - // - // The above doesn't include the overhead of mythread_once(). 
 - // At least on x86-64 GNU/Linux, pthread_once() is very fast but - // it still makes lzma_crc32(buf, 1, crc) 50-100 % slower. When - // size reaches 12-16 bytes the overhead becomes negligible. - // - // So using the generic version for size <= 16 may give better - // performance with tiny inputs but if such inputs happen rarely - // it's not so obvious because then the lookup table of the - // generic version may not be in the processor cache. -#ifdef CRC_USE_GENERIC_FOR_SMALL_INPUTS - if (size <= 16) - return crc32_generic(buf, size, crc); -#endif - /* #ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR // See crc32_dispatch(). This would be the alternative which uses // locking and doesn't use crc32_dispatch(). Note that on Windows // this method needs Vista threads. mythread_once(crc64_set_func); #endif */ return crc32_func(buf, size, crc); #elif defined(CRC32_ARCH_OPTIMIZED) return crc32_arch_optimized(buf, size, crc); #else - return crc32_generic(buf, size, crc); + return lzma_crc32_generic(buf, size, crc); #endif } diff --git a/src/liblzma/check/crc32_loongarch.h b/src/liblzma/check/crc32_loongarch.h new file mode 100644 index 000000000000..ec738b83d70a --- /dev/null +++ b/src/liblzma/check/crc32_loongarch.h @@ -0,0 +1,65 @@ +// SPDX-License-Identifier: 0BSD + +/////////////////////////////////////////////////////////////////////////////// +// +/// \file crc32_loongarch.h +/// \brief CRC32 calculation with LoongArch optimization +// +// Authors: Xi Ruoyao +// Lasse Collin +// +/////////////////////////////////////////////////////////////////////////////// + +#ifndef LZMA_CRC32_LOONGARCH_H +#define LZMA_CRC32_LOONGARCH_H + +#include <larchintrin.h> + + +static uint32_t +crc32_arch_optimized(const uint8_t *buf, size_t size, uint32_t crc_unsigned) +{ + int32_t crc = (int32_t)~crc_unsigned; + + if (size >= 8) { + const size_t align = (0 - (uintptr_t)buf) & 7; + + if (align & 1) + crc = __crc_w_b_w((int8_t)*buf++, crc); + + if (align & 2) { + crc =
__crc_w_h_w((int16_t)aligned_read16le(buf), crc); + buf += 2; + } + + if (align & 4) { + crc = __crc_w_w_w((int32_t)aligned_read32le(buf), crc); + buf += 4; + } + + size -= align; + + for (const uint8_t *limit = buf + (size & ~(size_t)7); + buf < limit; buf += 8) + crc = __crc_w_d_w((int64_t)aligned_read64le(buf), crc); + + size &= 7; + } + + if (size & 4) { + crc = __crc_w_w_w((int32_t)aligned_read32le(buf), crc); + buf += 4; + } + + if (size & 2) { + crc = __crc_w_h_w((int16_t)aligned_read16le(buf), crc); + buf += 2; + } + + if (size & 1) + crc = __crc_w_b_w((int8_t)*buf, crc); + + return (uint32_t)~crc; +} + +#endif // LZMA_CRC32_LOONGARCH_H diff --git a/src/liblzma/check/crc32_small.c b/src/liblzma/check/crc32_small.c index 6a1bd66185ea..4a62830c807a 100644 --- a/src/liblzma/check/crc32_small.c +++ b/src/liblzma/check/crc32_small.c @@ -1,67 +1,70 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file crc32_small.c /// \brief CRC32 calculation (size-optimized) // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "check.h" +#include "crc_common.h" +// The table is used by the LZ encoder too, thus it's not static like +// in crc64_small.c. 
uint32_t lzma_crc32_table[1][256]; #ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR __attribute__((__constructor__)) #endif static void crc32_init(void) { static const uint32_t poly32 = UINT32_C(0xEDB88320); for (size_t b = 0; b < 256; ++b) { uint32_t r = b; for (size_t i = 0; i < 8; ++i) { if (r & 1) r = (r >> 1) ^ poly32; else r >>= 1; } lzma_crc32_table[0][b] = r; } return; } #ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR extern void lzma_crc32_init(void) { mythread_once(crc32_init); return; } #endif extern LZMA_API(uint32_t) lzma_crc32(const uint8_t *buf, size_t size, uint32_t crc) { #ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR lzma_crc32_init(); #endif crc = ~crc; while (size != 0) { crc = lzma_crc32_table[0][*buf++ ^ (crc & 0xFF)] ^ (crc >> 8); --size; } return ~crc; } diff --git a/src/liblzma/check/crc32_table.c b/src/liblzma/check/crc32_table.c deleted file mode 100644 index 56413eec336e..000000000000 --- a/src/liblzma/check/crc32_table.c +++ /dev/null @@ -1,42 +0,0 @@ -// SPDX-License-Identifier: 0BSD - -/////////////////////////////////////////////////////////////////////////////// -// -/// \file crc32_table.c -/// \brief Precalculated CRC32 table with correct endianness -// -// Author: Lasse Collin -// -/////////////////////////////////////////////////////////////////////////////// - -#include "common.h" - - -// FIXME: Compared to crc_common.h this has to check for __x86_64__ too -// so that in 32-bit builds crc32_x86.S won't break due to a missing table. -#if defined(HAVE_USABLE_CLMUL) && ((defined(__x86_64__) && defined(__SSSE3__) \ - && defined(__SSE4_1__) && defined(__PCLMUL__)) \ - || (defined(__e2k__) && __iset__ >= 6)) -# define NO_CRC32_TABLE - -#elif defined(HAVE_ARM64_CRC32) \ - && !defined(WORDS_BIGENDIAN) \ - && defined(__ARM_FEATURE_CRC32) -# define NO_CRC32_TABLE -#endif - - -#if !defined(HAVE_ENCODERS) && defined(NO_CRC32_TABLE) -// No table needed. Use a typedef to avoid an empty translation unit. 
-typedef void lzma_crc32_dummy; - -#else -// Having the declaration here silences clang -Wmissing-variable-declarations. -extern const uint32_t lzma_crc32_table[8][256]; - -# ifdef WORDS_BIGENDIAN -# include "crc32_table_be.h" -# else -# include "crc32_table_le.h" -# endif -#endif diff --git a/src/liblzma/check/crc32_x86.S b/src/liblzma/check/crc32_x86.S index ddc3cee6ea5b..37ee063d1068 100644 --- a/src/liblzma/check/crc32_x86.S +++ b/src/liblzma/check/crc32_x86.S @@ -1,312 +1,308 @@ /* SPDX-License-Identifier: 0BSD */ /* * Speed-optimized CRC32 using slicing-by-eight algorithm * * This uses only i386 instructions, but it is optimized for i686 and later * (including e.g. Pentium II/III/IV, Athlon XP, and Core 2). For i586 * (e.g. Pentium), slicing-by-four would be better, and even the C version * of slicing-by-eight built with gcc -march=i586 tends to be a little bit * better than this. Very few probably run this code on i586 or older x86 * so this shouldn't be a problem in practice. * * Authors: Igor Pavlov (original version) * Lasse Collin (AT&T syntax, PIC support, better portability) * * This code needs lzma_crc32_table, which can be created using the * following C code: uint32_t lzma_crc32_table[8][256]; void init_table(void) { // IEEE-802.3 static const uint32_t poly32 = UINT32_C(0xEDB88320); // Castagnoli // static const uint32_t poly32 = UINT32_C(0x82F63B78); // Koopman // static const uint32_t poly32 = UINT32_C(0xEB31D82E); for (size_t s = 0; s < 8; ++s) { for (size_t b = 0; b < 256; ++b) { uint32_t r = s == 0 ? b : lzma_crc32_table[s - 1][b]; for (size_t i = 0; i < 8; ++i) { if (r & 1) r = (r >> 1) ^ poly32; else r >>= 1; } lzma_crc32_table[s][b] = r; } } } * The prototype of the CRC32 function: * extern uint32_t lzma_crc32(const uint8_t *buf, size_t size, uint32_t crc); */ /* When Intel CET is enabled, include <cet.h> in assembly code to mark Intel CET support.
 */ #ifdef __CET__ # include <cet.h> #else # define _CET_ENDBR #endif /* * On some systems, the functions need to be prefixed. The prefix is * usually an underscore. */ #ifndef __USER_LABEL_PREFIX__ # define __USER_LABEL_PREFIX__ #endif #define MAKE_SYM_CAT(prefix, sym) prefix ## sym #define MAKE_SYM(prefix, sym) MAKE_SYM_CAT(prefix, sym) -#define LZMA_CRC32 MAKE_SYM(__USER_LABEL_PREFIX__, lzma_crc32) +#define LZMA_CRC32 MAKE_SYM(__USER_LABEL_PREFIX__, lzma_crc32_generic) #define LZMA_CRC32_TABLE MAKE_SYM(__USER_LABEL_PREFIX__, lzma_crc32_table) /* * Solaris assembler doesn't have .p2align, and Darwin uses .align * differently than GNU/Linux and Solaris. */ #if defined(__APPLE__) || defined(__MSDOS__) # define ALIGN(pow2, abs) .align pow2 #else # define ALIGN(pow2, abs) .align abs #endif .text .globl LZMA_CRC32 +#ifdef __ELF__ + .hidden LZMA_CRC32 +#endif #if !defined(__APPLE__) && !defined(_WIN32) && !defined(__CYGWIN__) \ && !defined(__MSDOS__) .type LZMA_CRC32, @function #endif ALIGN(4, 16) LZMA_CRC32: _CET_ENDBR /* * Register usage: * %eax crc * %esi buf * %edi size or buf + size * %ebx lzma_crc32_table * %ebp Table index * %ecx Temporary * %edx Temporary */ pushl %ebx pushl %esi pushl %edi pushl %ebp movl 0x14(%esp), %esi /* buf */ movl 0x18(%esp), %edi /* size */ movl 0x1C(%esp), %eax /* crc */ /* * Store the address of lzma_crc32_table to %ebx. This is needed to * get position-independent code (PIC). * * The PIC macro is defined by libtool, while __PIC__ is defined * by GCC but only on some systems. Testing for both makes it simpler * to test this code without libtool, and keeps the code working also * when built with libtool but using something else than GCC. * * I understood that libtool may define PIC on Windows even though * the code in Windows DLLs is not PIC in sense that it is in ELF * binaries, so we need a separate check to always use the non-PIC * code on Windows.
*/ #if (!defined(PIC) && !defined(__PIC__)) \ || (defined(_WIN32) || defined(__CYGWIN__)) /* Not PIC */ movl $ LZMA_CRC32_TABLE, %ebx #elif defined(__APPLE__) /* Mach-O */ call .L_get_pc .L_pic: leal .L_lzma_crc32_table$non_lazy_ptr-.L_pic(%ebx), %ebx movl (%ebx), %ebx #else /* ELF */ call .L_get_pc addl $_GLOBAL_OFFSET_TABLE_, %ebx movl LZMA_CRC32_TABLE@GOT(%ebx), %ebx #endif /* Complement the initial value. */ notl %eax ALIGN(4, 16) .L_align: /* * Check if there is enough input to use slicing-by-eight. * We need 16 bytes, because the loop pre-reads eight bytes. */ cmpl $16, %edi jb .L_rest /* Check if we have reached alignment of eight bytes. */ testl $7, %esi jz .L_slice /* Calculate CRC of the next input byte. */ movzbl (%esi), %ebp incl %esi movzbl %al, %ecx xorl %ecx, %ebp shrl $8, %eax xorl (%ebx, %ebp, 4), %eax decl %edi jmp .L_align ALIGN(2, 4) .L_slice: /* * If we get here, there's at least 16 bytes of aligned input * available. Make %edi multiple of eight bytes. Store the possible * remainder over the "size" variable in the argument stack. */ movl %edi, 0x18(%esp) andl $-8, %edi subl %edi, 0x18(%esp) /* * Let %edi be buf + size - 8 while running the main loop. This way * we can compare for equality to determine when exit the loop. */ addl %esi, %edi subl $8, %edi /* Read in the first eight aligned bytes. */ xorl (%esi), %eax movl 4(%esi), %ecx movzbl %cl, %ebp .L_loop: movl 0x0C00(%ebx, %ebp, 4), %edx movzbl %ch, %ebp xorl 0x0800(%ebx, %ebp, 4), %edx shrl $16, %ecx xorl 8(%esi), %edx movzbl %cl, %ebp xorl 0x0400(%ebx, %ebp, 4), %edx movzbl %ch, %ebp xorl (%ebx, %ebp, 4), %edx movzbl %al, %ebp /* * Read the next four bytes, for which the CRC is calculated * on the next iteration of the loop. 
*/ movl 12(%esi), %ecx xorl 0x1C00(%ebx, %ebp, 4), %edx movzbl %ah, %ebp shrl $16, %eax xorl 0x1800(%ebx, %ebp, 4), %edx movzbl %ah, %ebp movzbl %al, %eax movl 0x1400(%ebx, %eax, 4), %eax addl $8, %esi xorl %edx, %eax xorl 0x1000(%ebx, %ebp, 4), %eax /* Check for end of aligned input. */ cmpl %edi, %esi movzbl %cl, %ebp jne .L_loop /* * Process the remaining eight bytes, which we have already * copied to %ecx and %edx. */ movl 0x0C00(%ebx, %ebp, 4), %edx movzbl %ch, %ebp xorl 0x0800(%ebx, %ebp, 4), %edx shrl $16, %ecx movzbl %cl, %ebp xorl 0x0400(%ebx, %ebp, 4), %edx movzbl %ch, %ebp xorl (%ebx, %ebp, 4), %edx movzbl %al, %ebp xorl 0x1C00(%ebx, %ebp, 4), %edx movzbl %ah, %ebp shrl $16, %eax xorl 0x1800(%ebx, %ebp, 4), %edx movzbl %ah, %ebp movzbl %al, %eax movl 0x1400(%ebx, %eax, 4), %eax addl $8, %esi xorl %edx, %eax xorl 0x1000(%ebx, %ebp, 4), %eax /* Copy the number of remaining bytes to %edi. */ movl 0x18(%esp), %edi .L_rest: /* Check for end of input. */ testl %edi, %edi jz .L_return /* Calculate CRC of the next input byte. */ movzbl (%esi), %ebp incl %esi movzbl %al, %ecx xorl %ecx, %ebp shrl $8, %eax xorl (%ebx, %ebp, 4), %eax decl %edi jmp .L_rest .L_return: /* Complement the final value. */ notl %eax popl %ebp popl %edi popl %esi popl %ebx ret #if defined(PIC) || defined(__PIC__) ALIGN(4, 16) .L_get_pc: movl (%esp), %ebx ret #endif #if defined(__APPLE__) && (defined(PIC) || defined(__PIC__)) /* Mach-O PIC */ .section __IMPORT,__pointers,non_lazy_symbol_pointers .L_lzma_crc32_table$non_lazy_ptr: .indirect_symbol LZMA_CRC32_TABLE .long 0 -#elif defined(_WIN32) || defined(__CYGWIN__) -# ifdef DLL_EXPORT - /* This is equivalent of __declspec(dllexport). */ - .section .drectve - .ascii " -export:lzma_crc32" -# endif - -#elif !defined(__MSDOS__) +#elif !defined(_WIN32) && !defined(__CYGWIN__) && !defined(__MSDOS__) /* ELF */ .size LZMA_CRC32, .-LZMA_CRC32 #endif /* * This is needed to support non-executable stack. 
It's ugly to * use __FreeBSD__ and __linux__ here, but I don't know a way to detect when * we are using GNU assembler. */ #if defined(__ELF__) && (defined(__FreeBSD__) || defined(__linux__)) .section .note.GNU-stack,"",@progbits #endif diff --git a/src/liblzma/check/crc64_fast.c b/src/liblzma/check/crc64_fast.c index 0ce83fe4ad36..8a6770a431e8 100644 --- a/src/liblzma/check/crc64_fast.c +++ b/src/liblzma/check/crc64_fast.c @@ -1,156 +1,169 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file crc64.c /// \brief CRC64 calculation // // Authors: Lasse Collin // Ilya Kurdyukov // /////////////////////////////////////////////////////////////////////////////// #include "check.h" #include "crc_common.h" #if defined(CRC_X86_CLMUL) -# define BUILDING_CRC64_CLMUL +# define BUILDING_CRC_CLMUL 64 # include "crc_x86_clmul.h" #endif #ifdef CRC64_GENERIC ///////////////////////////////// // Generic slice-by-four CRC64 // ///////////////////////////////// +#if defined(WORDS_BIGENDIAN) +# include "crc64_table_be.h" +#else +# include "crc64_table_le.h" +#endif + + +#ifdef HAVE_CRC_X86_ASM +extern uint64_t lzma_crc64_generic( + const uint8_t *buf, size_t size, uint64_t crc); +#else + #ifdef WORDS_BIGENDIAN # define A1(x) ((x) >> 56) #else # define A1 A #endif // See the comments in crc32_fast.c. They aren't duplicated here. 
static uint64_t -crc64_generic(const uint8_t *buf, size_t size, uint64_t crc) +lzma_crc64_generic(const uint8_t *buf, size_t size, uint64_t crc) { crc = ~crc; #ifdef WORDS_BIGENDIAN crc = byteswap64(crc); #endif if (size > 4) { while ((uintptr_t)(buf) & 3) { crc = lzma_crc64_table[0][*buf++ ^ A1(crc)] ^ S8(crc); --size; } const uint8_t *const limit = buf + (size & ~(size_t)(3)); size &= (size_t)(3); while (buf < limit) { #ifdef WORDS_BIGENDIAN const uint32_t tmp = (uint32_t)(crc >> 32) ^ aligned_read32ne(buf); #else const uint32_t tmp = (uint32_t)crc ^ aligned_read32ne(buf); #endif buf += 4; crc = lzma_crc64_table[3][A(tmp)] ^ lzma_crc64_table[2][B(tmp)] ^ S32(crc) ^ lzma_crc64_table[1][C(tmp)] ^ lzma_crc64_table[0][D(tmp)]; } } while (size-- != 0) crc = lzma_crc64_table[0][*buf++ ^ A1(crc)] ^ S8(crc); #ifdef WORDS_BIGENDIAN crc = byteswap64(crc); #endif return ~crc; } -#endif +#endif // HAVE_CRC_X86_ASM +#endif // CRC64_GENERIC #if defined(CRC64_GENERIC) && defined(CRC64_ARCH_OPTIMIZED) ////////////////////////// // Function dispatching // ////////////////////////// // If both the generic and arch-optimized implementations are usable, then // the function that is used is selected at runtime. See crc32_fast.c. typedef uint64_t (*crc64_func_type)( const uint8_t *buf, size_t size, uint64_t crc); static crc64_func_type crc64_resolve(void) { return is_arch_extension_supported() - ? &crc64_arch_optimized : &crc64_generic; + ? 
&crc64_arch_optimized : &lzma_crc64_generic; } #ifdef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR # define CRC64_SET_FUNC_ATTR __attribute__((__constructor__)) static crc64_func_type crc64_func; #else # define CRC64_SET_FUNC_ATTR static uint64_t crc64_dispatch(const uint8_t *buf, size_t size, uint64_t crc); static crc64_func_type crc64_func = &crc64_dispatch; #endif CRC64_SET_FUNC_ATTR static void crc64_set_func(void) { crc64_func = crc64_resolve(); return; } #ifndef HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR static uint64_t crc64_dispatch(const uint8_t *buf, size_t size, uint64_t crc) { crc64_set_func(); return crc64_func(buf, size, crc); } #endif #endif extern LZMA_API(uint64_t) lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc) { -#if defined(CRC64_GENERIC) && defined(CRC64_ARCH_OPTIMIZED) - -#ifdef CRC_USE_GENERIC_FOR_SMALL_INPUTS - if (size <= 16) - return crc64_generic(buf, size, crc); +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) && !defined(__clang__) \ + && defined(_M_IX86) && defined(CRC64_ARCH_OPTIMIZED) + // VS2015-2022 might corrupt the ebx register on 32-bit x86 when + // the CLMUL code is enabled. This hack forces MSVC to store and + // restore ebx. This is only needed here, not in lzma_crc32(). + __asm mov ebx, ebx #endif + +#if defined(CRC64_GENERIC) && defined(CRC64_ARCH_OPTIMIZED) return crc64_func(buf, size, crc); #elif defined(CRC64_ARCH_OPTIMIZED) // If arch-optimized version is used unconditionally without runtime // CPU detection then omitting the generic version and its 8 KiB // lookup table makes the library smaller. - // - // FIXME: Lookup table isn't currently omitted on 32-bit x86, - // see crc64_table.c. 
return crc64_arch_optimized(buf, size, crc); #else - return crc64_generic(buf, size, crc); + return lzma_crc64_generic(buf, size, crc); #endif } diff --git a/src/liblzma/check/crc64_table.c b/src/liblzma/check/crc64_table.c deleted file mode 100644 index 78e427597ce6..000000000000 --- a/src/liblzma/check/crc64_table.c +++ /dev/null @@ -1,37 +0,0 @@ -// SPDX-License-Identifier: 0BSD - -/////////////////////////////////////////////////////////////////////////////// -// -/// \file crc64_table.c -/// \brief Precalculated CRC64 table with correct endianness -// -// Author: Lasse Collin -// -/////////////////////////////////////////////////////////////////////////////// - -#include "common.h" - - -// FIXME: Compared to crc_common.h this has to check for __x86_64__ too -// so that in 32-bit builds crc64_x86.S won't break due to a missing table. -#if defined(HAVE_USABLE_CLMUL) && ((defined(__x86_64__) && defined(__SSSE3__) \ - && defined(__SSE4_1__) && defined(__PCLMUL__)) \ - || (defined(__e2k__) && __iset__ >= 6)) -# define NO_CRC64_TABLE -#endif - - -#ifdef NO_CRC64_TABLE -// No table needed. Use a typedef to avoid an empty translation unit. -typedef void lzma_crc64_dummy; - -#else -// Having the declaration here silences clang -Wmissing-variable-declarations. -extern const uint64_t lzma_crc64_table[4][256]; - -# if defined(WORDS_BIGENDIAN) -# include "crc64_table_be.h" -# else -# include "crc64_table_le.h" -# endif -#endif diff --git a/src/liblzma/check/crc64_x86.S b/src/liblzma/check/crc64_x86.S index 47f608181ea8..df50018653b4 100644 --- a/src/liblzma/check/crc64_x86.S +++ b/src/liblzma/check/crc64_x86.S @@ -1,295 +1,291 @@ /* SPDX-License-Identifier: 0BSD */ /* * Speed-optimized CRC64 using slicing-by-four algorithm * * This uses only i386 instructions, but it is optimized for i686 and later * (including e.g. Pentium II/III/IV, Athlon XP, and Core 2). 
* * Authors: Igor Pavlov (original CRC32 assembly code) * Lasse Collin (CRC64 adaptation of the modified CRC32 code) * * This code needs lzma_crc64_table, which can be created using the * following C code: uint64_t lzma_crc64_table[4][256]; void init_table(void) { // ECMA-182 static const uint64_t poly64 = UINT64_C(0xC96C5795D7870F42); for (size_t s = 0; s < 4; ++s) { for (size_t b = 0; b < 256; ++b) { uint64_t r = s == 0 ? b : lzma_crc64_table[s - 1][b]; for (size_t i = 0; i < 8; ++i) { if (r & 1) r = (r >> 1) ^ poly64; else r >>= 1; } lzma_crc64_table[s][b] = r; } } } * The prototype of the CRC64 function: * extern uint64_t lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc); */ /* When Intel CET is enabled, include <cet.h> in assembly code to mark Intel CET support. */ #ifdef __CET__ # include <cet.h> #else # define _CET_ENDBR #endif /* * On some systems, the functions need to be prefixed. The prefix is * usually an underscore. */ #ifndef __USER_LABEL_PREFIX__ # define __USER_LABEL_PREFIX__ #endif #define MAKE_SYM_CAT(prefix, sym) prefix ## sym #define MAKE_SYM(prefix, sym) MAKE_SYM_CAT(prefix, sym) -#define LZMA_CRC64 MAKE_SYM(__USER_LABEL_PREFIX__, lzma_crc64) +#define LZMA_CRC64 MAKE_SYM(__USER_LABEL_PREFIX__, lzma_crc64_generic) #define LZMA_CRC64_TABLE MAKE_SYM(__USER_LABEL_PREFIX__, lzma_crc64_table) /* * Solaris assembler doesn't have .p2align, and Darwin uses .align * differently than GNU/Linux and Solaris.
*/ #if defined(__APPLE__) || defined(__MSDOS__) # define ALIGN(pow2, abs) .align pow2 #else # define ALIGN(pow2, abs) .align abs #endif .text .globl LZMA_CRC64 +#ifdef __ELF__ + .hidden LZMA_CRC64 +#endif #if !defined(__APPLE__) && !defined(_WIN32) && !defined(__CYGWIN__) \ && !defined(__MSDOS__) .type LZMA_CRC64, @function #endif ALIGN(4, 16) LZMA_CRC64: _CET_ENDBR /* * Register usage: * %eax crc LSB * %edx crc MSB * %esi buf * %edi size or buf + size * %ebx lzma_crc64_table * %ebp Table index * %ecx Temporary */ pushl %ebx pushl %esi pushl %edi pushl %ebp movl 0x14(%esp), %esi /* buf */ movl 0x18(%esp), %edi /* size */ movl 0x1C(%esp), %eax /* crc LSB */ movl 0x20(%esp), %edx /* crc MSB */ /* * Store the address of lzma_crc64_table to %ebx. This is needed to * get position-independent code (PIC). * * The PIC macro is defined by libtool, while __PIC__ is defined * by GCC but only on some systems. Testing for both makes it simpler * to test this code without libtool, and keeps the code working also * when built with libtool but using something else than GCC. * * I understood that libtool may define PIC on Windows even though * the code in Windows DLLs is not PIC in sense that it is in ELF * binaries, so we need a separate check to always use the non-PIC * code on Windows. */ #if (!defined(PIC) && !defined(__PIC__)) \ || (defined(_WIN32) || defined(__CYGWIN__)) /* Not PIC */ movl $ LZMA_CRC64_TABLE, %ebx #elif defined(__APPLE__) /* Mach-O */ call .L_get_pc .L_pic: leal .L_lzma_crc64_table$non_lazy_ptr-.L_pic(%ebx), %ebx movl (%ebx), %ebx #else /* ELF */ call .L_get_pc addl $_GLOBAL_OFFSET_TABLE_, %ebx movl LZMA_CRC64_TABLE@GOT(%ebx), %ebx #endif /* Complement the initial value. */ notl %eax notl %edx .L_align: /* * Check if there is enough input to use slicing-by-four. * We need eight bytes, because the loop pre-reads four bytes. */ cmpl $8, %edi jb .L_rest /* Check if we have reached alignment of four bytes. 
*/ testl $3, %esi jz .L_slice /* Calculate CRC of the next input byte. */ movzbl (%esi), %ebp incl %esi movzbl %al, %ecx xorl %ecx, %ebp shrdl $8, %edx, %eax xorl (%ebx, %ebp, 8), %eax shrl $8, %edx xorl 4(%ebx, %ebp, 8), %edx decl %edi jmp .L_align .L_slice: /* * If we get here, there's at least eight bytes of aligned input * available. Make %edi multiple of four bytes. Store the possible * remainder over the "size" variable in the argument stack. */ movl %edi, 0x18(%esp) andl $-4, %edi subl %edi, 0x18(%esp) /* * Let %edi be buf + size - 4 while running the main loop. This way * we can compare for equality to determine when exit the loop. */ addl %esi, %edi subl $4, %edi /* Read in the first four aligned bytes. */ movl (%esi), %ecx .L_loop: xorl %eax, %ecx movzbl %cl, %ebp movl 0x1800(%ebx, %ebp, 8), %eax xorl %edx, %eax movl 0x1804(%ebx, %ebp, 8), %edx movzbl %ch, %ebp xorl 0x1000(%ebx, %ebp, 8), %eax xorl 0x1004(%ebx, %ebp, 8), %edx shrl $16, %ecx movzbl %cl, %ebp xorl 0x0800(%ebx, %ebp, 8), %eax xorl 0x0804(%ebx, %ebp, 8), %edx movzbl %ch, %ebp addl $4, %esi xorl (%ebx, %ebp, 8), %eax xorl 4(%ebx, %ebp, 8), %edx /* Check for end of aligned input. */ cmpl %edi, %esi /* * Copy the next input byte to %ecx. It is slightly faster to * read it here than at the top of the loop. */ movl (%esi), %ecx jb .L_loop /* * Process the remaining four bytes, which we have already * copied to %ecx. */ xorl %eax, %ecx movzbl %cl, %ebp movl 0x1800(%ebx, %ebp, 8), %eax xorl %edx, %eax movl 0x1804(%ebx, %ebp, 8), %edx movzbl %ch, %ebp xorl 0x1000(%ebx, %ebp, 8), %eax xorl 0x1004(%ebx, %ebp, 8), %edx shrl $16, %ecx movzbl %cl, %ebp xorl 0x0800(%ebx, %ebp, 8), %eax xorl 0x0804(%ebx, %ebp, 8), %edx movzbl %ch, %ebp addl $4, %esi xorl (%ebx, %ebp, 8), %eax xorl 4(%ebx, %ebp, 8), %edx /* Copy the number of remaining bytes to %edi. */ movl 0x18(%esp), %edi .L_rest: /* Check for end of input. */ testl %edi, %edi jz .L_return /* Calculate CRC of the next input byte. 
*/ movzbl (%esi), %ebp incl %esi movzbl %al, %ecx xorl %ecx, %ebp shrdl $8, %edx, %eax xorl (%ebx, %ebp, 8), %eax shrl $8, %edx xorl 4(%ebx, %ebp, 8), %edx decl %edi jmp .L_rest .L_return: /* Complement the final value. */ notl %eax notl %edx popl %ebp popl %edi popl %esi popl %ebx ret #if defined(PIC) || defined(__PIC__) ALIGN(4, 16) .L_get_pc: movl (%esp), %ebx ret #endif #if defined(__APPLE__) && (defined(PIC) || defined(__PIC__)) /* Mach-O PIC */ .section __IMPORT,__pointers,non_lazy_symbol_pointers .L_lzma_crc64_table$non_lazy_ptr: .indirect_symbol LZMA_CRC64_TABLE .long 0 -#elif defined(_WIN32) || defined(__CYGWIN__) -# ifdef DLL_EXPORT - /* This is equivalent of __declspec(dllexport). */ - .section .drectve - .ascii " -export:lzma_crc64" -# endif - -#elif !defined(__MSDOS__) +#elif !defined(_WIN32) && !defined(__CYGWIN__) && !defined(__MSDOS__) /* ELF */ .size LZMA_CRC64, .-LZMA_CRC64 #endif /* * This is needed to support non-executable stack. It's ugly to * use __FreeBSD__ and __linux__ here, but I don't know a way to detect when * we are using GNU assembler. */ #if defined(__ELF__) && (defined(__FreeBSD__) || defined(__linux__)) .section .note.GNU-stack,"",@progbits #endif diff --git a/src/liblzma/check/crc_clmul_consts_gen.c b/src/liblzma/check/crc_clmul_consts_gen.c new file mode 100644 index 000000000000..5fe14bd6f042 --- /dev/null +++ b/src/liblzma/check/crc_clmul_consts_gen.c @@ -0,0 +1,160 @@ +// SPDX-License-Identifier: 0BSD + +/////////////////////////////////////////////////////////////////////////////// +// +/// \file crc_clmul_consts_gen.c +/// \brief Generate constants for CLMUL CRC code +/// +/// Compiling: gcc -std=c99 -o crc_clmul_consts_gen crc_clmul_consts_gen.c +/// +/// This is for CRCs that use reversed bit order (bit reflection). 
+/// The same CLMUL CRC code can be used with CRC64 and smaller ones like +/// CRC32 apart from one special case: CRC64 needs an extra step in the +/// Barrett reduction to handle the 65th bit; the smaller ones don't. +/// Otherwise it's enough to just change the polynomial and the derived +/// constants and use the same code. +/// +/// See the Intel white paper "Fast CRC Computation for Generic Polynomials +/// Using PCLMULQDQ Instruction" from 2009. +// +// Author: Lasse Collin +// +/////////////////////////////////////////////////////////////////////////////// + +#include <inttypes.h> +#include <stdio.h> + + +/// CRC32 (Ethernet) polynomial in reversed representation +static const uint64_t p32 = 0xedb88320; + +// CRC64 (ECMA-182) polynomial in reversed representation +static const uint64_t p64 = 0xc96c5795d7870f42; + + +/// Calculates floor(x^128 / p) where p is a CRC64 polynomial in +/// reversed representation. The result is in reversed representation too. +static uint64_t +calc_cldiv(uint64_t p) +{ + // Quotient + uint64_t q = 0; + + // Align the x^64 term with the x^128 (the implied high bits of the + // divisor and the dividend) and do the first step of polynomial long + // division, calculating the first remainder. The variable q remains + // zero because the highest bit of the quotient is an implied bit 1 + // (we kind of set q = 1 << -1). + uint64_t r = p; + + // Then process the remaining 64 terms. Note that r has no implied + // high bit, only q and p do. (And remember that a high bit in the + // polynomial is stored at a low bit in the variable due to the + // reversed bit order.) + for (unsigned i = 0; i < 64; ++i) { + q |= (r & 1) << i; + r = (r >> 1) ^ (r & 1 ? p : 0); + } + + return q; +} + + +/// Calculate the remainder of carryless division: +/// +/// x^(bits + n - 1) % p, where n=64 (for CRC64) +/// +/// p must be in reversed representation which omits the bit of +/// the highest term of the polynomial.
Instead, it is an implied bit +/// at a kind of "1 << -1" position, as if it had just been shifted out. +/// +/// The return value is in the reversed bit order. (There are no implied bits.) +static uint64_t +calc_clrem(uint64_t p, unsigned bits) +{ + // Do the first step of polynomial long division. + uint64_t r = p; + + // Then process the remaining terms. Start with i = 1 instead of i = 0 + // to account for the -1 in x^(bits + n - 1). This -1 is convenient + // with the reversed bit order. See the "Bit-Reflection" section in + // the Intel white paper. + for (unsigned i = 1; i < bits; ++i) + r = (r >> 1) ^ (r & 1 ? p : 0); + + return r; +} + + +extern int +main(void) +{ + puts("// CRC64"); + + // The order of the two 64-bit constants in a vector doesn't matter. + // It feels logical to put them in this order as it matches the + // order in which the input bytes are read. + printf("const __m128i fold512 = _mm_set_epi64x(" + "0x%016" PRIx64 ", 0x%016" PRIx64 ");\n", + calc_clrem(p64, 4 * 128 - 64), + calc_clrem(p64, 4 * 128)); + + printf("const __m128i fold128 = _mm_set_epi64x(" + "0x%016" PRIx64 ", 0x%016" PRIx64 ");\n", + calc_clrem(p64, 128 - 64), + calc_clrem(p64, 128)); + + // When we multiply by mu, we care about the high bits of the result + // (in reversed bit order!). It doesn't matter that the low bit gets + // shifted out because the affected output bits will be ignored. + // Below we add the implied high bit with "| 1" after the shifting + // so that the high bits of the multiplication will be correct. + // + // p64 is shifted left by one so that the final multiplication + // in Barrett reduction won't be misaligned by one bit. We could + // use "(p64 << 1) | 1" instead of "p64 << 1" too but it makes + // no difference as that bit won't affect the relevant output bits + // (we only care about the lowest 64 bits of the result, that is, + // lowest in the reversed bit order). + // + // NOTE: The 65th bit of p64 gets shifted out.
It needs to be + // compensated with 64-bit shift and xor in the CRC64 code. + printf("const __m128i mu_p = _mm_set_epi64x(" + "0x%016" PRIx64 ", 0x%016" PRIx64 ");\n", + (calc_cldiv(p64) << 1) | 1, + p64 << 1); + + puts(""); + + puts("// CRC32"); + + printf("const __m128i fold512 = _mm_set_epi64x(" + "0x%08" PRIx64 ", 0x%08" PRIx64 ");\n", + calc_clrem(p32, 4 * 128 - 64), + calc_clrem(p32, 4 * 128)); + + printf("const __m128i fold128 = _mm_set_epi64x(" + "0x%08" PRIx64 ", 0x%08" PRIx64 ");\n", + calc_clrem(p32, 128 - 64), + calc_clrem(p32, 128)); + + // CRC32 calculation is done by modulus scaling it to a CRC64. + // Since the CRC is in reversed representation, only the mu + // constant changes with the modulus scaling. This method avoids + // one additional constant and one additional clmul in the final + // reduction steps, making the code both simpler and faster. + // + // p32 is shifted left by one so that the final multiplication + // in Barrett reduction won't be misaligned by one bit. We could + // use "(p32 << 1) | 1" instead of "p32 << 1" too but it makes + // no difference as that bit won't affect the relevant output bits. + // + // NOTE: The 33-bit value fits in 64 bits so, unlike with CRC64, + // there is no need to compensate for any missing bits in the code. 
+ printf("const __m128i mu_p = _mm_set_epi64x(" + "0x%016" PRIx64 ", 0x%" PRIx64 ");\n", + (calc_cldiv(p32) << 1) | 1, + p32 << 1); + + return 0; +} diff --git a/src/liblzma/check/crc_common.h b/src/liblzma/check/crc_common.h index c15d4c675c8f..7ea1e60b043b 100644 --- a/src/liblzma/check/crc_common.h +++ b/src/liblzma/check/crc_common.h @@ -1,137 +1,170 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file crc_common.h -/// \brief Some functions and macros for CRC32 and CRC64 +/// \brief Macros and declarations for CRC32 and CRC64 // // Authors: Lasse Collin // Ilya Kurdyukov -// Hans Jansen // Jia Tan // /////////////////////////////////////////////////////////////////////////////// #ifndef LZMA_CRC_COMMON_H #define LZMA_CRC_COMMON_H #include "common.h" +///////////// +// Generic // +///////////// + #ifdef WORDS_BIGENDIAN # define A(x) ((x) >> 24) # define B(x) (((x) >> 16) & 0xFF) # define C(x) (((x) >> 8) & 0xFF) # define D(x) ((x) & 0xFF) # define S8(x) ((x) << 8) # define S32(x) ((x) << 32) #else # define A(x) ((x) & 0xFF) # define B(x) (((x) >> 8) & 0xFF) # define C(x) (((x) >> 16) & 0xFF) # define D(x) ((x) >> 24) # define S8(x) ((x) >> 8) # define S32(x) ((x) >> 32) #endif -// CRC CLMUL code needs this because accessing input buffers that aren't -// aligned to the vector size will inherently trip the address sanitizer. -#if lzma_has_attribute(__no_sanitize_address__) -# define crc_attr_no_sanitize_address \ - __attribute__((__no_sanitize_address__)) +/// lzma_crc32_table[0] is needed by LZ encoder so we need to keep +/// the array two-dimensional. 
+#ifdef HAVE_SMALL +lzma_attr_visibility_hidden +extern uint32_t lzma_crc32_table[1][256]; + +extern void lzma_crc32_init(void); + #else -# define crc_attr_no_sanitize_address -#endif -// Keep this in sync with changes to crc32_arm64.h -#if defined(_WIN32) || defined(HAVE_GETAUXVAL) \ - || defined(HAVE_ELF_AUX_INFO) \ - || (defined(__APPLE__) && defined(HAVE_SYSCTLBYNAME)) -# define ARM64_RUNTIME_DETECTION 1 +lzma_attr_visibility_hidden +extern const uint32_t lzma_crc32_table[8][256]; + +lzma_attr_visibility_hidden +extern const uint64_t lzma_crc64_table[4][256]; #endif +/////////////////// +// Configuration // +/////////////////// + +// NOTE: This config isn't used if HAVE_SMALL is defined! + +// These are defined if the generic slicing-by-n implementations and their +// lookup tables are built. #undef CRC32_GENERIC #undef CRC64_GENERIC +// These are defined if an arch-specific version is built. If both this +// and matching _GENERIC is defined then runtime detection must be used. #undef CRC32_ARCH_OPTIMIZED #undef CRC64_ARCH_OPTIMIZED // The x86 CLMUL is used for both CRC32 and CRC64. #undef CRC_X86_CLMUL +// Many ARM64 processors have CRC32 instructions. +// CRC64 could be done with CLMUL but it's not implemented yet. #undef CRC32_ARM64 -#undef CRC64_ARM64_CLMUL -#undef CRC_USE_GENERIC_FOR_SMALL_INPUTS +// 64-bit LoongArch has CRC32 instructions. +#undef CRC32_LOONGARCH + + +// ARM64 +// +// Keep this in sync with changes to crc32_arm64.h +#if defined(_WIN32) || defined(HAVE_GETAUXVAL) \ + || defined(HAVE_ELF_AUX_INFO) \ + || (defined(__APPLE__) && defined(HAVE_SYSCTLBYNAME)) +# define CRC_ARM64_RUNTIME_DETECTION 1 +#endif // ARM64 CRC32 instruction is only useful for CRC32. Currently, only // little endian is supported since we were unable to test on a big // endian machine.
-// -// NOTE: Keep this and the next check in sync with the macro -// NO_CRC32_TABLE in crc32_table.c #if defined(HAVE_ARM64_CRC32) && !defined(WORDS_BIGENDIAN) // Allow ARM64 CRC32 instruction without a runtime check if // __ARM_FEATURE_CRC32 is defined. GCC and Clang only define // this if the proper compiler options are used. # if defined(__ARM_FEATURE_CRC32) # define CRC32_ARCH_OPTIMIZED 1 # define CRC32_ARM64 1 -# elif defined(ARM64_RUNTIME_DETECTION) +# elif defined(CRC_ARM64_RUNTIME_DETECTION) # define CRC32_ARCH_OPTIMIZED 1 # define CRC32_ARM64 1 # define CRC32_GENERIC 1 # endif #endif -#if defined(HAVE_USABLE_CLMUL) -// If CLMUL is allowed unconditionally in the compiler options then the -// generic version can be omitted. Note that this doesn't work with MSVC -// as I don't know how to detect the features here. + +// LoongArch // -// NOTE: Keep this in sync with the NO_CRC32_TABLE macro in crc32_table.c -// and NO_CRC64_TABLE in crc64_table.c. -# if (defined(__SSSE3__) && defined(__SSE4_1__) && defined(__PCLMUL__)) \ +// Only 64-bit LoongArch is supported for now. No runtime detection +// is needed because the LoongArch specification says that the CRC32 +// instructions are a part of the Basic Integer Instructions and +// they shall be implemented by 64-bit LoongArch implementations. +#ifdef HAVE_LOONGARCH_CRC32 +# define CRC32_ARCH_OPTIMIZED 1 +# define CRC32_LOONGARCH 1 +#endif + + +// x86 and E2K +#if defined(HAVE_USABLE_CLMUL) + // If CLMUL is allowed unconditionally in the compiler options then + // the generic version and the tables can be omitted. Exceptions: + // + // - If 32-bit x86 assembly files are enabled then those are always + // built and runtime detection is used even if compiler flags + // were set to allow CLMUL unconditionally. + // + // - This doesn't work with MSVC as I don't know how to detect + // the features here. 
+ // +# if (defined(__SSSE3__) && defined(__SSE4_1__) && defined(__PCLMUL__) \ + && !defined(HAVE_CRC_X86_ASM)) \ || (defined(__e2k__) && __iset__ >= 6) # define CRC32_ARCH_OPTIMIZED 1 # define CRC64_ARCH_OPTIMIZED 1 # define CRC_X86_CLMUL 1 # else # define CRC32_GENERIC 1 # define CRC64_GENERIC 1 # define CRC32_ARCH_OPTIMIZED 1 # define CRC64_ARCH_OPTIMIZED 1 # define CRC_X86_CLMUL 1 - -/* - // The generic code is much faster with 1-8-byte inputs and - // has similar performance up to 16 bytes at least in - // microbenchmarks (it depends on input buffer alignment - // too). If both versions are built, this #define will use - // the generic version for inputs up to 16 bytes and CLMUL - // for bigger inputs. It saves a little in code size since - // the special cases for 0-16-byte inputs will be omitted - // from the CLMUL code. -# define CRC_USE_GENERIC_FOR_SMALL_INPUTS 1 -*/ # endif #endif + +// Fallback configuration +// // For CRC32 use the generic slice-by-eight implementation if no optimized // version is available. #if !defined(CRC32_ARCH_OPTIMIZED) && !defined(CRC32_GENERIC) # define CRC32_GENERIC 1 #endif // For CRC64 use the generic slice-by-four implementation if no optimized // version is available. #if !defined(CRC64_ARCH_OPTIMIZED) && !defined(CRC64_GENERIC) # define CRC64_GENERIC 1 #endif #endif diff --git a/src/liblzma/check/crc_x86_clmul.h b/src/liblzma/check/crc_x86_clmul.h index 50306e49a72a..b302d6cf7f51 100644 --- a/src/liblzma/check/crc_x86_clmul.h +++ b/src/liblzma/check/crc_x86_clmul.h @@ -1,432 +1,377 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file crc_x86_clmul.h /// \brief CRC32 and CRC64 implementations using CLMUL instructions. /// /// The CRC32 and CRC64 implementations use 32/64-bit x86 SSSE3, SSE4.1, and /// CLMUL instructions. This is compatible with Elbrus 2000 (E2K) too. 
/// -/// They were derived from +/// See the Intel white paper "Fast CRC Computation for Generic Polynomials +/// Using PCLMULQDQ Instruction" from 2009. The original file seems to be +/// gone from Intel's website but a version is available here: /// https://www.researchgate.net/publication/263424619_Fast_CRC_computation -/// and the public domain code from https://github.com/rawrunprotected/crc -/// (URLs were checked on 2023-10-14). +/// (The link was checked on 2024-06-11.) /// /// While this file has both CRC32 and CRC64 implementations, only one -/// should be built at a time to ensure that crc_simd_body() is inlined -/// even with compilers with which lzma_always_inline expands to plain inline. -/// The version to build is selected by defining BUILDING_CRC32_CLMUL or -/// BUILDING_CRC64_CLMUL before including this file. +/// can be built at a time. The version to build is selected by defining +/// BUILDING_CRC_CLMUL to 32 or 64 before including this file. /// -/// FIXME: Builds for 32-bit x86 use the assembly .S files by default -/// unless configured with --disable-assembler. Even then the lookup table -/// isn't omitted in crc64_table.c since it doesn't know that assembly -/// code has been disabled. +/// NOTE: The x86 CLMUL CRC implementation was rewritten for XZ Utils 5.8.0. // -// Authors: Ilya Kurdyukov -// Hans Jansen -// Lasse Collin -// Jia Tan +// Authors: Lasse Collin +// Ilya Kurdyukov // /////////////////////////////////////////////////////////////////////////////// // This file must not be included more than once. #ifdef LZMA_CRC_X86_CLMUL_H # error crc_x86_clmul.h was included twice. 
#endif #define LZMA_CRC_X86_CLMUL_H +#if BUILDING_CRC_CLMUL != 32 && BUILDING_CRC_CLMUL != 64 +# error BUILDING_CRC_CLMUL is undefined or has an invalid value +#endif + #include <immintrin.h> #if defined(_MSC_VER) # include <intrin.h> #elif defined(HAVE_CPUID_H) # include <cpuid.h> #endif // EDG-based compilers (Intel's classic compiler and compiler for E2K) can // define __GNUC__ but the attribute must not be used with them. // The new Clang-based ICX needs the attribute. // // NOTE: Build systems check for this too, keep them in sync with this. #if (defined(__GNUC__) || defined(__clang__)) && !defined(__EDG__) # define crc_attr_target \ __attribute__((__target__("ssse3,sse4.1,pclmul"))) #else # define crc_attr_target #endif -#define MASK_L(in, mask, r) r = _mm_shuffle_epi8(in, mask) +// GCC and Clang would produce good code with _mm_set_epi64x +// but MSVC needs _mm_cvtsi64_si128 on x86-64. +#if defined(__i386__) || defined(_M_IX86) +# define my_set_low64(a) _mm_set_epi64x(0, (a)) +#else +# define my_set_low64(a) _mm_cvtsi64_si128(a) +#endif -#define MASK_H(in, mask, r) \ - r = _mm_shuffle_epi8(in, _mm_xor_si128(mask, vsign)) -#define MASK_LH(in, mask, low, high) \ - MASK_L(in, mask, low); \ - MASK_H(in, mask, high) +// Align it so that the whole array is within the same cache line. +// More than one unaligned load can be done from this during the +// same CRC function call. +// +// The bytes [0] to [31] are used with AND to clear the low bytes. (With ANDN +// those could be used to clear the high bytes too but it's not needed here.) +// +// The bytes [16] to [47] are for left shifts. +// The bytes [32] to [63] are for right shifts.
+alignas(64) +static uint8_t vmasks[64] = { + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, + 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, +}; + + +// *Unaligned* 128-bit load +crc_attr_target +static inline __m128i +my_load128(const uint8_t *p) +{ + return _mm_loadu_si128((const __m128i *)p); +} +// Keep the highest "count" bytes as is and clear the remaining low bytes. crc_attr_target -crc_attr_no_sanitize_address -static lzma_always_inline void -crc_simd_body(const uint8_t *buf, const size_t size, __m128i *v0, __m128i *v1, - const __m128i vfold16, const __m128i initial_crc) +static inline __m128i +keep_high_bytes(__m128i v, size_t count) { - // Create a vector with 8-bit values 0 to 15. This is used to - // construct control masks for _mm_blendv_epi8 and _mm_shuffle_epi8. - const __m128i vramp = _mm_setr_epi32( - 0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c); - - // This is used to inverse the control mask of _mm_shuffle_epi8 - // so that bytes that wouldn't be picked with the original mask - // will be picked and vice versa. - const __m128i vsign = _mm_set1_epi8(-0x80); + return _mm_and_si128(my_load128((vmasks + count)), v); +} - // Memory addresses A to D and the distances between them: - // - // A B C D - // [skip_start][size][skip_end] - // [ size2 ] - // - // A and D are 16-byte aligned. B and C are 1-byte aligned. - // skip_start and skip_end are 0-15 bytes. size is at least 1 byte. - // - // A = aligned_buf will initially point to this address. - // B = The address pointed by the caller-supplied buf. 
- // C = buf + size == aligned_buf + size2 - // D = buf + size + skip_end == aligned_buf + size2 + skip_end - const size_t skip_start = (size_t)((uintptr_t)buf & 15); - const size_t skip_end = (size_t)((0U - (uintptr_t)(buf + size)) & 15); - const __m128i *aligned_buf = (const __m128i *)( - (uintptr_t)buf & ~(uintptr_t)15); - - // If size2 <= 16 then the whole input fits into a single 16-byte - // vector. If size2 > 16 then at least two 16-byte vectors must - // be processed. If size2 > 16 && size <= 16 then there is only - // one 16-byte vector's worth of input but it is unaligned in memory. - // - // NOTE: There is no integer overflow here if the arguments - // are valid. If this overflowed, buf + size would too. - const size_t size2 = skip_start + size; - - // Masks to be used with _mm_blendv_epi8 and _mm_shuffle_epi8: - // The first skip_start or skip_end bytes in the vectors will have - // the high bit (0x80) set. _mm_blendv_epi8 and _mm_shuffle_epi8 - // will produce zeros for these positions. (Bitwise-xor of these - // masks with vsign will produce the opposite behavior.) - const __m128i mask_start - = _mm_sub_epi8(vramp, _mm_set1_epi8((char)skip_start)); - const __m128i mask_end - = _mm_sub_epi8(vramp, _mm_set1_epi8((char)skip_end)); - - // Get the first 1-16 bytes into data0. If loading less than 16 - // bytes, the bytes are loaded to the high bits of the vector and - // the least significant positions are filled with zeros. - const __m128i data0 = _mm_blendv_epi8(_mm_load_si128(aligned_buf), - _mm_setzero_si128(), mask_start); - aligned_buf++; - - __m128i v2, v3; - -#ifndef CRC_USE_GENERIC_FOR_SMALL_INPUTS - if (size <= 16) { - // Right-shift initial_crc by 1-16 bytes based on "size" - // and store the result in v1 (high bytes) and v0 (low bytes). - // - // NOTE: The highest 8 bytes of initial_crc are zeros so - // v1 will be filled with zeros if size >= 8. The highest - // 8 bytes of v1 will always become zeros. 
- // - // [ v1 ][ v0 ] - // [ initial_crc ] size == 1 - // [ initial_crc ] size == 2 - // [ initial_crc ] size == 15 - // [ initial_crc ] size == 16 (all in v0) - const __m128i mask_low = _mm_add_epi8( - vramp, _mm_set1_epi8((char)(size - 16))); - MASK_LH(initial_crc, mask_low, *v0, *v1); - - if (size2 <= 16) { - // There are 1-16 bytes of input and it is all - // in data0. Copy the input bytes to v3. If there - // are fewer than 16 bytes, the low bytes in v3 - // will be filled with zeros. That is, the input - // bytes are stored to the same position as - // (part of) initial_crc is in v0. - MASK_L(data0, mask_end, v3); - } else { - // There are 2-16 bytes of input but not all bytes - // are in data0. - const __m128i data1 = _mm_load_si128(aligned_buf); - - // Collect the 2-16 input bytes from data0 and data1 - // to v2 and v3, and bitwise-xor them with the - // low bits of initial_crc in v0. Note that the - // the second xor is below this else-block as it - // is shared with the other branch. - MASK_H(data0, mask_end, v2); - MASK_L(data1, mask_end, v3); - *v0 = _mm_xor_si128(*v0, v2); - } - *v0 = _mm_xor_si128(*v0, v3); - *v1 = _mm_alignr_epi8(*v1, *v0, 8); - } else -#endif - { - // There is more than 16 bytes of input. - const __m128i data1 = _mm_load_si128(aligned_buf); - const __m128i *end = (const __m128i*)( - (const char *)aligned_buf - 16 + size2); - aligned_buf++; - - MASK_LH(initial_crc, mask_start, *v0, *v1); - *v0 = _mm_xor_si128(*v0, data0); - *v1 = _mm_xor_si128(*v1, data1); - - while (aligned_buf < end) { - *v1 = _mm_xor_si128(*v1, _mm_clmulepi64_si128( - *v0, vfold16, 0x00)); - *v0 = _mm_xor_si128(*v1, _mm_clmulepi64_si128( - *v0, vfold16, 0x11)); - *v1 = _mm_load_si128(aligned_buf++); - } +// Shift the 128-bit value left by "amount" bytes (not bits). 
+crc_attr_target +static inline __m128i +shift_left(__m128i v, size_t amount) +{ + return _mm_shuffle_epi8(v, my_load128((vmasks + 32 - amount))); +} - if (aligned_buf != end) { - MASK_H(*v0, mask_end, v2); - MASK_L(*v0, mask_end, *v0); - MASK_L(*v1, mask_end, v3); - *v1 = _mm_or_si128(v2, v3); - } - *v1 = _mm_xor_si128(*v1, _mm_clmulepi64_si128( - *v0, vfold16, 0x00)); - *v0 = _mm_xor_si128(*v1, _mm_clmulepi64_si128( - *v0, vfold16, 0x11)); - *v1 = _mm_srli_si128(*v0, 8); - } +// Shift the 128-bit value right by "amount" bytes (not bits). +crc_attr_target +static inline __m128i +shift_right(__m128i v, size_t amount) +{ + return _mm_shuffle_epi8(v, my_load128((vmasks + 32 + amount))); } -///////////////////// -// x86 CLMUL CRC32 // -///////////////////// - -/* -// These functions were used to generate the constants -// at the top of crc32_arch_optimized(). -static uint64_t -calc_lo(uint64_t p, uint64_t a, int n) +crc_attr_target +static inline __m128i +fold(__m128i v, __m128i k) { - uint64_t b = 0; int i; - for (i = 0; i < n; i++) { - b = b >> 1 | (a & 1) << (n - 1); - a = (a >> 1) ^ ((0 - (a & 1)) & p); - } - return b; + __m128i a = _mm_clmulepi64_si128(v, k, 0x00); + __m128i b = _mm_clmulepi64_si128(v, k, 0x11); + return _mm_xor_si128(a, b); } -// same as ~crc(&a, sizeof(a), ~0) -static uint64_t -calc_hi(uint64_t p, uint64_t a, int n) + +crc_attr_target +static inline __m128i +fold_xor(__m128i v, __m128i k, const uint8_t *buf) { - int i; - for (i = 0; i < n; i++) - a = (a >> 1) ^ ((0 - (a & 1)) & p); - return a; + return _mm_xor_si128(my_load128(buf), fold(v, k)); } -*/ -#ifdef BUILDING_CRC32_CLMUL +#if BUILDING_CRC_CLMUL == 32 crc_attr_target -crc_attr_no_sanitize_address static uint32_t crc32_arch_optimized(const uint8_t *buf, size_t size, uint32_t crc) +#else +crc_attr_target +static uint64_t +crc64_arch_optimized(const uint8_t *buf, size_t size, uint64_t crc) +#endif { -#ifndef CRC_USE_GENERIC_FOR_SMALL_INPUTS - // The code assumes that there is at least one 
byte of input. + // We will assume that there is at least one byte of input. if (size == 0) return crc; -#endif - // uint32_t poly = 0xedb88320; - const int64_t p = 0x1db710640; // p << 1 - const int64_t mu = 0x1f7011641; // calc_lo(p, p, 32) << 1 | 1 - const int64_t k5 = 0x163cd6124; // calc_hi(p, p, 32) << 1 - const int64_t k4 = 0x0ccaa009e; // calc_hi(p, p, 64) << 1 - const int64_t k3 = 0x1751997d0; // calc_hi(p, p, 128) << 1 - - const __m128i vfold4 = _mm_set_epi64x(mu, p); - const __m128i vfold8 = _mm_set_epi64x(0, k5); - const __m128i vfold16 = _mm_set_epi64x(k4, k3); - - __m128i v0, v1, v2; - - crc_simd_body(buf, size, &v0, &v1, vfold16, - _mm_cvtsi32_si128((int32_t)~crc)); - - v1 = _mm_xor_si128( - _mm_clmulepi64_si128(v0, vfold16, 0x10), v1); // xxx0 - v2 = _mm_shuffle_epi32(v1, 0xe7); // 0xx0 - v0 = _mm_slli_epi64(v1, 32); // [0] - v0 = _mm_clmulepi64_si128(v0, vfold8, 0x00); - v0 = _mm_xor_si128(v0, v2); // [1] [2] - v2 = _mm_clmulepi64_si128(v0, vfold4, 0x10); - v2 = _mm_clmulepi64_si128(v2, vfold4, 0x00); - v0 = _mm_xor_si128(v0, v2); // [2] - return ~(uint32_t)_mm_extract_epi32(v0, 2); -} -#endif // BUILDING_CRC32_CLMUL + // See crc_clmul_consts_gen.c. +#if BUILDING_CRC_CLMUL == 32 + const __m128i fold512 = _mm_set_epi64x(0x1d9513d7, 0x8f352d95); + const __m128i fold128 = _mm_set_epi64x(0xccaa009e, 0xae689191); + const __m128i mu_p = _mm_set_epi64x( + (int64_t)0xb4e5b025f7011641, 0x1db710640); +#else + const __m128i fold512 = _mm_set_epi64x( + (int64_t)0x081f6054a7842df4, (int64_t)0x6ae3efbb9dd441f3); + const __m128i fold128 = _mm_set_epi64x( + (int64_t)0xdabe95afc7875f40, (int64_t)0xe05dd497ca393ae4); -///////////////////// -// x86 CLMUL CRC64 // -///////////////////// + const __m128i mu_p = _mm_set_epi64x( + (int64_t)0x9c3e466c172963d5, (int64_t)0x92d8af2baf0e1e84); +#endif -/* -// These functions were used to generate the constants -// at the top of crc64_arch_optimized(). 
-static uint64_t -calc_lo(uint64_t poly) -{ - uint64_t a = poly; - uint64_t b = 0; + __m128i v0, v1, v2, v3; - for (unsigned i = 0; i < 64; ++i) { - b = (b >> 1) | (a << 63); - a = (a >> 1) ^ (a & 1 ? poly : 0); - } + crc = ~crc; - return b; -} + if (size < 8) { + uint64_t x = crc; + size_t i = 0; -static uint64_t -calc_hi(uint64_t poly, uint64_t a) -{ - for (unsigned i = 0; i < 64; ++i) - a = (a >> 1) ^ (a & 1 ? poly : 0); + // Checking the bit instead of comparing the size means + // that we don't need to update the size between the steps. + if (size & 4) { + x ^= read32le(buf); + buf += 4; + i = 32; + } - return a; -} -*/ + if (size & 2) { + x ^= (uint64_t)read16le(buf) << i; + buf += 2; + i += 16; + } -#ifdef BUILDING_CRC64_CLMUL + if (size & 1) + x ^= (uint64_t)*buf << i; -// MSVC (VS2015 - VS2022) produces bad 32-bit x86 code from the CLMUL CRC -// code when optimizations are enabled (release build). According to the bug -// report, the ebx register is corrupted and the calculated result is wrong. -// Trying to workaround the problem with "__asm mov ebx, ebx" didn't help. -// The following pragma works and performance is still good. x86-64 builds -// and CRC32 CLMUL aren't affected by this problem. The problem does not -// happen in crc_simd_body() either (which is shared with CRC32 CLMUL anyway). -// -// NOTE: Another pragma after crc64_arch_optimized() restores -// the optimizations. If the #if condition here is updated, -// the other one must be updated too. -#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) && !defined(__clang__) \ - && defined(_M_IX86) -# pragma optimize("g", off) -#endif + v0 = my_set_low64((int64_t)x); + v0 = shift_left(v0, 8 - size); -crc_attr_target -crc_attr_no_sanitize_address -static uint64_t -crc64_arch_optimized(const uint8_t *buf, size_t size, uint64_t crc) -{ -#ifndef CRC_USE_GENERIC_FOR_SMALL_INPUTS - // The code assumes that there is at least one byte of input. 
- if (size == 0) - return crc; -#endif - - // const uint64_t poly = 0xc96c5795d7870f42; // CRC polynomial - const uint64_t p = 0x92d8af2baf0e1e85; // (poly << 1) | 1 - const uint64_t mu = 0x9c3e466c172963d5; // (calc_lo(poly) << 1) | 1 - const uint64_t k2 = 0xdabe95afc7875f40; // calc_hi(poly, 1) - const uint64_t k1 = 0xe05dd497ca393ae4; // calc_hi(poly, k2) + } else if (size < 16) { + v0 = my_set_low64((int64_t)(crc ^ read64le(buf))); - const __m128i vfold8 = _mm_set_epi64x((int64_t)p, (int64_t)mu); - const __m128i vfold16 = _mm_set_epi64x((int64_t)k2, (int64_t)k1); + // NOTE: buf is intentionally left 8 bytes behind so that + // we can read the last 1-7 bytes with read64le(buf + size). + size -= 8; - __m128i v0, v1, v2; + // Handling 8-byte input specially is a speed optimization + // as the clmul can be skipped. A branch is also needed to + // avoid a too high shift amount. + if (size > 0) { + const size_t padding = 8 - size; + uint64_t high = read64le(buf + size) >> (padding * 8); #if defined(__i386__) || defined(_M_IX86) - crc_simd_body(buf, size, &v0, &v1, vfold16, - _mm_set_epi64x(0, (int64_t)~crc)); + // Simple but likely not the best code for 32-bit x86. + v0 = _mm_insert_epi32(v0, (int32_t)high, 2); + v0 = _mm_insert_epi32(v0, (int32_t)(high >> 32), 3); #else - // GCC and Clang would produce good code with _mm_set_epi64x - // but MSVC needs _mm_cvtsi64_si128 on x86-64. 
- crc_simd_body(buf, size, &v0, &v1, vfold16, - _mm_cvtsi64_si128((int64_t)~crc)); + v0 = _mm_insert_epi64(v0, (int64_t)high, 1); #endif - v1 = _mm_xor_si128(_mm_clmulepi64_si128(v0, vfold16, 0x10), v1); - v0 = _mm_clmulepi64_si128(v1, vfold8, 0x00); - v2 = _mm_clmulepi64_si128(v0, vfold8, 0x10); - v0 = _mm_xor_si128(_mm_xor_si128(v1, _mm_slli_si128(v0, 8)), v2); + v0 = shift_left(v0, padding); + + v1 = _mm_srli_si128(v0, 8); + v0 = _mm_clmulepi64_si128(v0, fold128, 0x10); + v0 = _mm_xor_si128(v0, v1); + } + } else { + v0 = my_set_low64((int64_t)crc); + + // To align or not to align the buf pointer? If the end of + // the buffer isn't aligned, aligning the pointer here would + // make us do an extra folding step with the associated byte + // shuffling overhead. The cost of that would need to be + // lower than the benefit of aligned reads. Testing on an old + // Intel Ivy Bridge processor suggested that aligning isn't + // worth the cost but it likely depends on the processor and + // buffer size. Unaligned loads (MOVDQU) should be fast on + // x86 processors that support PCLMULQDQ, so we don't align + // the buf pointer here. + + // Read the first (and possibly the only) full 16 bytes. + v0 = _mm_xor_si128(v0, my_load128(buf)); + buf += 16; + size -= 16; + + if (size >= 48) { + v1 = my_load128(buf); + v2 = my_load128(buf + 16); + v3 = my_load128(buf + 32); + buf += 48; + size -= 48; + + while (size >= 64) { + v0 = fold_xor(v0, fold512, buf); + v1 = fold_xor(v1, fold512, buf + 16); + v2 = fold_xor(v2, fold512, buf + 32); + v3 = fold_xor(v3, fold512, buf + 48); + buf += 64; + size -= 64; + } + + v0 = _mm_xor_si128(v1, fold(v0, fold128)); + v0 = _mm_xor_si128(v2, fold(v0, fold128)); + v0 = _mm_xor_si128(v3, fold(v0, fold128)); + } + + while (size >= 16) { + v0 = fold_xor(v0, fold128, buf); + buf += 16; + size -= 16; + } + + if (size > 0) { + // We want the last "size" number of input bytes to + // be at the high bits of v1. 
First do a full 16-byte + // load and then mask the low bytes to zeros. + v1 = my_load128(buf + size - 16); + v1 = keep_high_bytes(v1, size); + + // Shift high bytes from v0 to the low bytes of v1. + // + // Alternatively we could replace the combination + // keep_high_bytes + shift_right + _mm_or_si128 with + // _mm_shuffle_epi8 + _mm_blendv_epi8 but that would + // require larger tables for the masks. Now there are + // three loads (instead of two) from the mask tables + // but they all are from the same cache line. + v1 = _mm_or_si128(v1, shift_right(v0, size)); + + // Shift high bytes of v0 away, padding the + // low bytes with zeros. + v0 = shift_left(v0, 16 - size); + + v0 = _mm_xor_si128(v1, fold(v0, fold128)); + } + v1 = _mm_srli_si128(v0, 8); + v0 = _mm_clmulepi64_si128(v0, fold128, 0x10); + v0 = _mm_xor_si128(v0, v1); + } + + // Barrett reduction + +#if BUILDING_CRC_CLMUL == 32 + v1 = _mm_clmulepi64_si128(v0, mu_p, 0x10); // v0 * mu + v1 = _mm_clmulepi64_si128(v1, mu_p, 0x00); // v1 * p + v0 = _mm_xor_si128(v0, v1); + return ~(uint32_t)_mm_extract_epi32(v0, 2); +#else + // Because p is 65 bits but one bit doesn't fit into the 64-bit + // half of __m128i, finish the second clmul by shifting v1 left + // by 64 bits and xorring it to the final result. + v1 = _mm_clmulepi64_si128(v0, mu_p, 0x10); // v0 * mu + v2 = _mm_slli_si128(v1, 8); + v1 = _mm_clmulepi64_si128(v1, mu_p, 0x00); // v1 * p + v0 = _mm_xor_si128(v0, v2); + v0 = _mm_xor_si128(v0, v1); #if defined(__i386__) || defined(_M_IX86) return ~(((uint64_t)(uint32_t)_mm_extract_epi32(v0, 3) << 32) | (uint64_t)(uint32_t)_mm_extract_epi32(v0, 2)); #else return ~(uint64_t)_mm_extract_epi64(v0, 1); #endif -} - -#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) && !defined(__clang__) \ - && defined(_M_IX86) -# pragma optimize("", on) #endif - -#endif // BUILDING_CRC64_CLMUL +} // Even though this is an inline function, compile it only when needed. // This way it won't appear in E2K builds at all. 
#if defined(CRC32_GENERIC) || defined(CRC64_GENERIC) // Inlining this function duplicates the function body in crc32_resolve() and // crc64_resolve(), but this is acceptable because this is a tiny function. static inline bool is_arch_extension_supported(void) { int success = 1; uint32_t r[4]; // eax, ebx, ecx, edx #if defined(_MSC_VER) // This needs <intrin.h> with MSVC. ICC has it as a built-in // on all platforms. __cpuid(r, 1); #elif defined(HAVE_CPUID_H) // Compared to just using __asm__ to run CPUID, this also checks // that CPUID is supported and saves and restores ebx as that is // needed with GCC < 5 with position-independent code (PIC). success = __get_cpuid(1, &r[0], &r[1], &r[2], &r[3]); #else // Just a fallback that shouldn't be needed. __asm__("cpuid\n\t" : "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3]) : "a"(1), "c"(0)); #endif // Returns true if these are supported: // CLMUL (bit 1 in ecx) // SSSE3 (bit 9 in ecx) // SSE4.1 (bit 19 in ecx) const uint32_t ecx_mask = (1 << 1) | (1 << 9) | (1 << 19); return success && (r[2] & ecx_mask) == ecx_mask; // Alternative methods that weren't used: // - ICC's _may_i_use_cpu_feature: the other methods should work too. // - GCC >= 6 / Clang / ICX __builtin_cpu_supports("pclmul") // // CPUID decoding is needed with MSVC anyway and older GCC. This keeps // the feature checks in the build system simpler too. The nice thing // about __builtin_cpu_supports would be that it generates very short // code as it only reads a variable set at startup but a few bytes // don't matter here.
} #endif diff --git a/src/liblzma/common/alone_decoder.c b/src/liblzma/common/alone_decoder.c index 78af651578fc..e2b58e1f3758 100644 --- a/src/liblzma/common/alone_decoder.c +++ b/src/liblzma/common/alone_decoder.c @@ -1,248 +1,247 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file alone_decoder.c /// \brief Decoder for LZMA_Alone files // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "alone_decoder.h" #include "lzma_decoder.h" #include "lz_decoder.h" typedef struct { lzma_next_coder next; enum { SEQ_PROPERTIES, SEQ_DICTIONARY_SIZE, SEQ_UNCOMPRESSED_SIZE, SEQ_CODER_INIT, SEQ_CODE, } sequence; /// If true, reject files that are unlikely to be .lzma files. /// If false, more non-.lzma files get accepted and will give /// LZMA_DATA_ERROR either immediately or after a few output bytes. bool picky; /// Position in the header fields size_t pos; /// Uncompressed size decoded from the header lzma_vli uncompressed_size; /// Memory usage limit uint64_t memlimit; /// Amount of memory actually needed (only an estimate) uint64_t memusage; /// Options decoded from the header needed to initialize /// the LZMA decoder lzma_options_lzma options; } lzma_alone_coder; static lzma_ret alone_decode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_alone_coder *coder = coder_ptr; while (*out_pos < out_size && (coder->sequence == SEQ_CODE || *in_pos < in_size)) switch (coder->sequence) { case SEQ_PROPERTIES: if (lzma_lzma_lclppb_decode(&coder->options, in[*in_pos])) return LZMA_FORMAT_ERROR; coder->sequence = SEQ_DICTIONARY_SIZE; ++*in_pos; break; case SEQ_DICTIONARY_SIZE: coder->options.dict_size |= (size_t)(in[*in_pos]) << (coder->pos * 8); if (++coder->pos == 4) { if (coder->picky 
&& coder->options.dict_size != UINT32_MAX) { // A hack to ditch tons of false positives: // We allow only dictionary sizes that are // 2^n or 2^n + 2^(n-1). LZMA_Alone created // only files with 2^n, but accepts any // dictionary size. uint32_t d = coder->options.dict_size - 1; d |= d >> 2; d |= d >> 3; d |= d >> 4; d |= d >> 8; d |= d >> 16; ++d; if (d != coder->options.dict_size) return LZMA_FORMAT_ERROR; } coder->pos = 0; coder->sequence = SEQ_UNCOMPRESSED_SIZE; } ++*in_pos; break; case SEQ_UNCOMPRESSED_SIZE: coder->uncompressed_size |= (lzma_vli)(in[*in_pos]) << (coder->pos * 8); ++*in_pos; if (++coder->pos < 8) break; // Another hack to ditch false positives: Assume that // if the uncompressed size is known, it must be less // than 256 GiB. // // FIXME? Without picky we allow > LZMA_VLI_MAX which doesn't // really matter in this specific situation (> LZMA_VLI_MAX is // safe in the LZMA decoder) but it's somewhat weird still. if (coder->picky && coder->uncompressed_size != LZMA_VLI_UNKNOWN && coder->uncompressed_size >= (LZMA_VLI_C(1) << 38)) return LZMA_FORMAT_ERROR; // Use LZMA_FILTER_LZMA1EXT features to specify the // uncompressed size and that the end marker is allowed // even when the uncompressed size is known. Both .lzma // header and LZMA1EXT use UINT64_MAX to indicate that size // is unknown. coder->options.ext_flags = LZMA_LZMA1EXT_ALLOW_EOPM; lzma_set_ext_size(coder->options, coder->uncompressed_size); // Calculate the memory usage so that it is ready // for SEQ_CODER_INIT.
coder->memusage = lzma_lzma_decoder_memusage(&coder->options) + LZMA_MEMUSAGE_BASE; coder->pos = 0; coder->sequence = SEQ_CODER_INIT; - - // Fall through + FALLTHROUGH; case SEQ_CODER_INIT: { if (coder->memusage > coder->memlimit) return LZMA_MEMLIMIT_ERROR; lzma_filter_info filters[2] = { { .id = LZMA_FILTER_LZMA1EXT, .init = &lzma_lzma_decoder_init, .options = &coder->options, }, { .init = NULL, } }; return_if_error(lzma_next_filter_init(&coder->next, allocator, filters)); coder->sequence = SEQ_CODE; break; } case SEQ_CODE: { return coder->next.code(coder->next.coder, allocator, in, in_pos, in_size, out, out_pos, out_size, action); } default: return LZMA_PROG_ERROR; } return LZMA_OK; } static void alone_decoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_alone_coder *coder = coder_ptr; lzma_next_end(&coder->next, allocator); lzma_free(coder, allocator); return; } static lzma_ret alone_decoder_memconfig(void *coder_ptr, uint64_t *memusage, uint64_t *old_memlimit, uint64_t new_memlimit) { lzma_alone_coder *coder = coder_ptr; *memusage = coder->memusage; *old_memlimit = coder->memlimit; if (new_memlimit != 0) { if (new_memlimit < coder->memusage) return LZMA_MEMLIMIT_ERROR; coder->memlimit = new_memlimit; } return LZMA_OK; } extern lzma_ret lzma_alone_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, uint64_t memlimit, bool picky) { lzma_next_coder_init(&lzma_alone_decoder_init, next, allocator); lzma_alone_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_alone_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &alone_decode; next->end = &alone_decoder_end; next->memconfig = &alone_decoder_memconfig; coder->next = LZMA_NEXT_CODER_INIT; } coder->sequence = SEQ_PROPERTIES; coder->picky = picky; coder->pos = 0; coder->options.dict_size = 0; coder->options.preset_dict = NULL; coder->options.preset_dict_size = 0; coder->uncompressed_size = 0; coder->memlimit = 
my_max(1, memlimit); coder->memusage = LZMA_MEMUSAGE_BASE; return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_alone_decoder(lzma_stream *strm, uint64_t memlimit) { lzma_next_strm_init(lzma_alone_decoder_init, strm, memlimit, false); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } diff --git a/src/liblzma/common/auto_decoder.c b/src/liblzma/common/auto_decoder.c index fdd520f905c5..da49345f909d 100644 --- a/src/liblzma/common/auto_decoder.c +++ b/src/liblzma/common/auto_decoder.c @@ -1,205 +1,204 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file auto_decoder.c /// \brief Autodetect between .xz, .lzma (LZMA_Alone), and .lz (lzip) // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "stream_decoder.h" #include "alone_decoder.h" #ifdef HAVE_LZIP_DECODER # include "lzip_decoder.h" #endif typedef struct { /// .xz Stream decoder, LZMA_Alone decoder, or lzip decoder lzma_next_coder next; uint64_t memlimit; uint32_t flags; enum { SEQ_INIT, SEQ_CODE, SEQ_FINISH, } sequence; } lzma_auto_coder; static lzma_ret auto_decode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_auto_coder *coder = coder_ptr; switch (coder->sequence) { case SEQ_INIT: if (*in_pos >= in_size) return LZMA_OK; // Update the sequence now, because we want to continue from // SEQ_CODE even if we return some LZMA_*_CHECK. coder->sequence = SEQ_CODE; // Detect the file format. .xz files start with 0xFD which // cannot be the first byte of .lzma (LZMA_Alone) format. 
// The .lz format starts with 0x4C which could be the // first byte of a .lzma file but luckily it would mean // lc/lp/pb being 4/3/1 which liblzma doesn't support because // lc + lp > 4. So using just 0x4C to detect .lz is OK here. if (in[*in_pos] == 0xFD) { return_if_error(lzma_stream_decoder_init( &coder->next, allocator, coder->memlimit, coder->flags)); #ifdef HAVE_LZIP_DECODER } else if (in[*in_pos] == 0x4C) { return_if_error(lzma_lzip_decoder_init( &coder->next, allocator, coder->memlimit, coder->flags)); #endif } else { return_if_error(lzma_alone_decoder_init(&coder->next, allocator, coder->memlimit, true)); // If the application wants to know about missing // integrity check or about the check in general, we // need to handle it here, because LZMA_Alone decoder // doesn't accept any flags. if (coder->flags & LZMA_TELL_NO_CHECK) return LZMA_NO_CHECK; if (coder->flags & LZMA_TELL_ANY_CHECK) return LZMA_GET_CHECK; } - // Fall through + FALLTHROUGH; case SEQ_CODE: { const lzma_ret ret = coder->next.code( coder->next.coder, allocator, in, in_pos, in_size, out, out_pos, out_size, action); if (ret != LZMA_STREAM_END || (coder->flags & LZMA_CONCATENATED) == 0) return ret; coder->sequence = SEQ_FINISH; + FALLTHROUGH; } - // Fall through - case SEQ_FINISH: // When LZMA_CONCATENATED was used and we were decoding // a LZMA_Alone file, we need to check that there is no // trailing garbage and wait for LZMA_FINISH. if (*in_pos < in_size) return LZMA_DATA_ERROR; return action == LZMA_FINISH ? LZMA_STREAM_END : LZMA_OK; default: assert(0); return LZMA_PROG_ERROR; } } static void auto_decoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_auto_coder *coder = coder_ptr; lzma_next_end(&coder->next, allocator); lzma_free(coder, allocator); return; } static lzma_check auto_decoder_get_check(const void *coder_ptr) { const lzma_auto_coder *coder = coder_ptr; // It is LZMA_Alone if get_check is NULL. return coder->next.get_check == NULL ? 
LZMA_CHECK_NONE : coder->next.get_check(coder->next.coder); } static lzma_ret auto_decoder_memconfig(void *coder_ptr, uint64_t *memusage, uint64_t *old_memlimit, uint64_t new_memlimit) { lzma_auto_coder *coder = coder_ptr; lzma_ret ret; if (coder->next.memconfig != NULL) { ret = coder->next.memconfig(coder->next.coder, memusage, old_memlimit, new_memlimit); assert(*old_memlimit == coder->memlimit); } else { // No coder is configured yet. Use the base value as // the current memory usage. *memusage = LZMA_MEMUSAGE_BASE; *old_memlimit = coder->memlimit; ret = LZMA_OK; if (new_memlimit != 0 && new_memlimit < *memusage) ret = LZMA_MEMLIMIT_ERROR; } if (ret == LZMA_OK && new_memlimit != 0) coder->memlimit = new_memlimit; return ret; } static lzma_ret auto_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, uint64_t memlimit, uint32_t flags) { lzma_next_coder_init(&auto_decoder_init, next, allocator); if (flags & ~LZMA_SUPPORTED_FLAGS) return LZMA_OPTIONS_ERROR; lzma_auto_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_auto_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &auto_decode; next->end = &auto_decoder_end; next->get_check = &auto_decoder_get_check; next->memconfig = &auto_decoder_memconfig; coder->next = LZMA_NEXT_CODER_INIT; } coder->memlimit = my_max(1, memlimit); coder->flags = flags; coder->sequence = SEQ_INIT; return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_auto_decoder(lzma_stream *strm, uint64_t memlimit, uint32_t flags) { lzma_next_strm_init(auto_decoder_init, strm, memlimit, flags); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } diff --git a/src/liblzma/common/block_decoder.c b/src/liblzma/common/block_decoder.c index 2e369d316bdf..bbc9f5566c8b 100644 --- a/src/liblzma/common/block_decoder.c +++ b/src/liblzma/common/block_decoder.c @@ -1,288 +1,286 @@ // SPDX-License-Identifier: 
0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file block_decoder.c /// \brief Decodes .xz Blocks // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "block_decoder.h" #include "filter_decoder.h" #include "check.h" typedef struct { enum { SEQ_CODE, SEQ_PADDING, SEQ_CHECK, } sequence; /// The filters in the chain; initialized with lzma_raw_decoder_init(). lzma_next_coder next; /// Decoding options; we also write Compressed Size and Uncompressed /// Size back to this structure when the decoding has been finished. lzma_block *block; /// Compressed Size calculated while decoding lzma_vli compressed_size; /// Uncompressed Size calculated while decoding lzma_vli uncompressed_size; /// Maximum allowed Compressed Size; this takes into account the /// size of the Block Header and Check fields when Compressed Size /// is unknown. lzma_vli compressed_limit; /// Maximum allowed Uncompressed Size. lzma_vli uncompressed_limit; /// Position when reading the Check field size_t check_pos; /// Check of the uncompressed data lzma_check_state check; /// True if the integrity check won't be calculated and verified. bool ignore_check; } lzma_block_coder; static inline bool is_size_valid(lzma_vli size, lzma_vli reference) { return reference == LZMA_VLI_UNKNOWN || reference == size; } static lzma_ret block_decode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_block_coder *coder = coder_ptr; switch (coder->sequence) { case SEQ_CODE: { const size_t in_start = *in_pos; const size_t out_start = *out_pos; // Limit the amount of input and output space that we give // to the raw decoder based on the information we have // (or don't have) from Block Header. 
const size_t in_stop = *in_pos + (size_t)my_min( in_size - *in_pos, coder->compressed_limit - coder->compressed_size); const size_t out_stop = *out_pos + (size_t)my_min( out_size - *out_pos, coder->uncompressed_limit - coder->uncompressed_size); const lzma_ret ret = coder->next.code(coder->next.coder, allocator, in, in_pos, in_stop, out, out_pos, out_stop, action); const size_t in_used = *in_pos - in_start; const size_t out_used = *out_pos - out_start; // Because we have limited the input and output sizes, // we know that these cannot grow too big or overflow. coder->compressed_size += in_used; coder->uncompressed_size += out_used; if (ret == LZMA_OK) { const bool comp_done = coder->compressed_size == coder->block->compressed_size; const bool uncomp_done = coder->uncompressed_size == coder->block->uncompressed_size; // If both input and output amounts match the sizes // in Block Header but we still got LZMA_OK instead // of LZMA_STREAM_END, the file is broken. if (comp_done && uncomp_done) return LZMA_DATA_ERROR; // If the decoder has consumed all the input that it // needs but it still couldn't fill the output buffer // or return LZMA_STREAM_END, the file is broken. if (comp_done && *out_pos < out_size) return LZMA_DATA_ERROR; // If the decoder has produced all the output but // it still didn't return LZMA_STREAM_END or consume // more input (for example, detecting an end of // payload marker may need more input but produce // no output) the file is broken. if (uncomp_done && *in_pos < in_size) return LZMA_DATA_ERROR; } // Don't waste time updating the integrity check if it will be // ignored. Also skip it if no new output was produced. This // avoids null pointer + 0 (undefined behavior) when out == 0. if (!coder->ignore_check && out_used > 0) lzma_check_update(&coder->check, coder->block->check, out + out_start, out_used); if (ret != LZMA_STREAM_END) return ret; // Compressed and Uncompressed Sizes are now at their final // values. 
Verify that they match the values given to us. if (!is_size_valid(coder->compressed_size, coder->block->compressed_size) || !is_size_valid(coder->uncompressed_size, coder->block->uncompressed_size)) return LZMA_DATA_ERROR; // Copy the values into coder->block. The caller // may use this information to construct Index. coder->block->compressed_size = coder->compressed_size; coder->block->uncompressed_size = coder->uncompressed_size; coder->sequence = SEQ_PADDING; + FALLTHROUGH; } - // Fall through - case SEQ_PADDING: // Compressed Data is padded to a multiple of four bytes. while (coder->compressed_size & 3) { if (*in_pos >= in_size) return LZMA_OK; // We use compressed_size here just to get the Padding // right. The actual Compressed Size was stored to // coder->block already, and won't be modified by // us anymore. ++coder->compressed_size; if (in[(*in_pos)++] != 0x00) return LZMA_DATA_ERROR; } if (coder->block->check == LZMA_CHECK_NONE) return LZMA_STREAM_END; if (!coder->ignore_check) lzma_check_finish(&coder->check, coder->block->check); coder->sequence = SEQ_CHECK; - - // Fall through + FALLTHROUGH; case SEQ_CHECK: { const size_t check_size = lzma_check_size(coder->block->check); lzma_bufcpy(in, in_pos, in_size, coder->block->raw_check, &coder->check_pos, check_size); if (coder->check_pos < check_size) return LZMA_OK; // Validate the Check only if we support it. // coder->check.buffer may be uninitialized // when the Check ID is not supported.
if (!coder->ignore_check && lzma_check_is_supported(coder->block->check) && memcmp(coder->block->raw_check, coder->check.buffer.u8, check_size) != 0) return LZMA_DATA_ERROR; return LZMA_STREAM_END; } } return LZMA_PROG_ERROR; } static void block_decoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_block_coder *coder = coder_ptr; lzma_next_end(&coder->next, allocator); lzma_free(coder, allocator); return; } extern lzma_ret lzma_block_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, lzma_block *block) { lzma_next_coder_init(&lzma_block_decoder_init, next, allocator); // Validate the options. lzma_block_unpadded_size() does that for us // except for Uncompressed Size and filters. Filters are validated // by the raw decoder. if (lzma_block_unpadded_size(block) == 0 || !lzma_vli_is_valid(block->uncompressed_size)) return LZMA_PROG_ERROR; // Allocate *next->coder if needed. lzma_block_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_block_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &block_decode; next->end = &block_decoder_end; coder->next = LZMA_NEXT_CODER_INIT; } // Basic initializations coder->sequence = SEQ_CODE; coder->block = block; coder->compressed_size = 0; coder->uncompressed_size = 0; // If Compressed Size is not known, we calculate the maximum allowed // value so that encoded size of the Block (including Block Padding) // is still a valid VLI and a multiple of four. coder->compressed_limit = block->compressed_size == LZMA_VLI_UNKNOWN ? (LZMA_VLI_MAX & ~LZMA_VLI_C(3)) - block->header_size - lzma_check_size(block->check) : block->compressed_size; // With Uncompressed Size this is simpler. If Block Header lacks // the size info, then LZMA_VLI_MAX is the maximum possible // Uncompressed Size. coder->uncompressed_limit = block->uncompressed_size == LZMA_VLI_UNKNOWN ? LZMA_VLI_MAX : block->uncompressed_size; // Initialize the check. 
It's the caller's problem if the Check ID is not // supported, and the Block decoder cannot verify the Check field. // Caller can test lzma_check_is_supported(block->check). coder->check_pos = 0; lzma_check_init(&coder->check, block->check); coder->ignore_check = block->version >= 1 ? block->ignore_check : false; // Initialize the filter chain. return lzma_raw_decoder_init(&coder->next, allocator, block->filters); } extern LZMA_API(lzma_ret) lzma_block_decoder(lzma_stream *strm, lzma_block *block) { lzma_next_strm_init(lzma_block_decoder_init, strm, block); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } diff --git a/src/liblzma/common/block_encoder.c b/src/liblzma/common/block_encoder.c index ce8c1de69442..eb7997a72aeb 100644 --- a/src/liblzma/common/block_encoder.c +++ b/src/liblzma/common/block_encoder.c @@ -1,226 +1,224 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file block_encoder.c /// \brief Encodes .xz Blocks // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "block_encoder.h" #include "filter_encoder.h" #include "check.h" typedef struct { /// The filters in the chain; initialized with lzma_raw_encoder_init(). lzma_next_coder next; /// Encoding options; we also write Unpadded Size, Compressed Size, /// and Uncompressed Size back to this structure when the encoding /// has been finished.
lzma_block *block; enum { SEQ_CODE, SEQ_PADDING, SEQ_CHECK, } sequence; /// Compressed Size calculated while encoding lzma_vli compressed_size; /// Uncompressed Size calculated while encoding lzma_vli uncompressed_size; /// Position in the Check field size_t pos; /// Check of the uncompressed data lzma_check_state check; } lzma_block_coder; static lzma_ret block_encode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_block_coder *coder = coder_ptr; // Check that our amount of input stays in proper limits. if (LZMA_VLI_MAX - coder->uncompressed_size < in_size - *in_pos) return LZMA_DATA_ERROR; switch (coder->sequence) { case SEQ_CODE: { const size_t in_start = *in_pos; const size_t out_start = *out_pos; const lzma_ret ret = coder->next.code(coder->next.coder, allocator, in, in_pos, in_size, out, out_pos, out_size, action); const size_t in_used = *in_pos - in_start; const size_t out_used = *out_pos - out_start; if (COMPRESSED_SIZE_MAX - coder->compressed_size < out_used) return LZMA_DATA_ERROR; coder->compressed_size += out_used; // No need to check for overflow because we have already // checked it at the beginning of this function. coder->uncompressed_size += in_used; // Call lzma_check_update() only if input was consumed. This // avoids null pointer + 0 (undefined behavior) when in == 0. if (in_used > 0) lzma_check_update(&coder->check, coder->block->check, in + in_start, in_used); if (ret != LZMA_STREAM_END || action == LZMA_SYNC_FLUSH) return ret; assert(*in_pos == in_size); assert(action == LZMA_FINISH); // Copy the values into coder->block. The caller // may use this information to construct Index. 
coder->block->compressed_size = coder->compressed_size; coder->block->uncompressed_size = coder->uncompressed_size; coder->sequence = SEQ_PADDING; + FALLTHROUGH; } - // Fall through - case SEQ_PADDING: // Pad Compressed Data to a multiple of four bytes. We can // use coder->compressed_size for this since we don't need // it for anything else anymore. while (coder->compressed_size & 3) { if (*out_pos >= out_size) return LZMA_OK; out[*out_pos] = 0x00; ++*out_pos; ++coder->compressed_size; } if (coder->block->check == LZMA_CHECK_NONE) return LZMA_STREAM_END; lzma_check_finish(&coder->check, coder->block->check); coder->sequence = SEQ_CHECK; - - // Fall through + FALLTHROUGH; case SEQ_CHECK: { const size_t check_size = lzma_check_size(coder->block->check); lzma_bufcpy(coder->check.buffer.u8, &coder->pos, check_size, out, out_pos, out_size); if (coder->pos < check_size) return LZMA_OK; memcpy(coder->block->raw_check, coder->check.buffer.u8, check_size); return LZMA_STREAM_END; } } return LZMA_PROG_ERROR; } static void block_encoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_block_coder *coder = coder_ptr; lzma_next_end(&coder->next, allocator); lzma_free(coder, allocator); return; } static lzma_ret block_encoder_update(void *coder_ptr, const lzma_allocator *allocator, const lzma_filter *filters lzma_attribute((__unused__)), const lzma_filter *reversed_filters) { lzma_block_coder *coder = coder_ptr; if (coder->sequence != SEQ_CODE) return LZMA_PROG_ERROR; return lzma_next_filter_update( &coder->next, allocator, reversed_filters); } extern lzma_ret lzma_block_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, lzma_block *block) { lzma_next_coder_init(&lzma_block_encoder_init, next, allocator); if (block == NULL) return LZMA_PROG_ERROR; // The contents of the structure may depend on the version so // check the version first. 
if (block->version > 1) return LZMA_OPTIONS_ERROR; // If the Check ID is not supported, we cannot calculate the check and // thus not create a proper Block. if ((unsigned int)(block->check) > LZMA_CHECK_ID_MAX) return LZMA_PROG_ERROR; if (!lzma_check_is_supported(block->check)) return LZMA_UNSUPPORTED_CHECK; // Allocate and initialize *next->coder if needed. lzma_block_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_block_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &block_encode; next->end = &block_encoder_end; next->update = &block_encoder_update; coder->next = LZMA_NEXT_CODER_INIT; } // Basic initializations coder->sequence = SEQ_CODE; coder->block = block; coder->compressed_size = 0; coder->uncompressed_size = 0; coder->pos = 0; // Initialize the check lzma_check_init(&coder->check, block->check); // Initialize the requested filters. return lzma_raw_encoder_init(&coder->next, allocator, block->filters); } extern LZMA_API(lzma_ret) lzma_block_encoder(lzma_stream *strm, lzma_block *block) { lzma_next_strm_init(lzma_block_encoder_init, strm, block); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_SYNC_FLUSH] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } diff --git a/src/liblzma/common/common.c b/src/liblzma/common/common.c index cc0e06a51bee..6e031a56c888 100644 --- a/src/liblzma/common/common.c +++ b/src/liblzma/common/common.c @@ -1,480 +1,486 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file common.c /// \brief Common functions needed in many places in liblzma // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "common.h" ///////////// // Version // ///////////// extern LZMA_API(uint32_t) lzma_version_number(void) { return LZMA_VERSION; } extern LZMA_API(const 
char *) lzma_version_string(void) { return LZMA_VERSION_STRING; } /////////////////////// // Memory allocation // /////////////////////// lzma_attr_alloc_size(1) extern void * lzma_alloc(size_t size, const lzma_allocator *allocator) { // Some malloc() variants return NULL if called with size == 0. if (size == 0) size = 1; void *ptr; if (allocator != NULL && allocator->alloc != NULL) ptr = allocator->alloc(allocator->opaque, 1, size); else ptr = malloc(size); return ptr; } lzma_attr_alloc_size(1) extern void * lzma_alloc_zero(size_t size, const lzma_allocator *allocator) { // Some calloc() variants return NULL if called with size == 0. if (size == 0) size = 1; void *ptr; if (allocator != NULL && allocator->alloc != NULL) { ptr = allocator->alloc(allocator->opaque, 1, size); if (ptr != NULL) memzero(ptr, size); } else { ptr = calloc(1, size); } return ptr; } extern void lzma_free(void *ptr, const lzma_allocator *allocator) { if (allocator != NULL && allocator->free != NULL) allocator->free(allocator->opaque, ptr); else free(ptr); return; } ////////// // Misc // ////////// extern size_t lzma_bufcpy(const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size) { + assert(in != NULL || *in_pos == in_size); + assert(out != NULL || *out_pos == out_size); + + assert(*in_pos <= in_size); + assert(*out_pos <= out_size); + const size_t in_avail = in_size - *in_pos; const size_t out_avail = out_size - *out_pos; const size_t copy_size = my_min(in_avail, out_avail); // Call memcpy() only if there is something to copy. If there is // nothing to copy, in or out might be NULL and then the memcpy() // call would trigger undefined behavior. 
if (copy_size > 0) memcpy(out + *out_pos, in + *in_pos, copy_size); *in_pos += copy_size; *out_pos += copy_size; return copy_size; } extern lzma_ret lzma_next_filter_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { lzma_next_coder_init(filters[0].init, next, allocator); next->id = filters[0].id; return filters[0].init == NULL ? LZMA_OK : filters[0].init(next, allocator, filters); } extern lzma_ret lzma_next_filter_update(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter *reversed_filters) { // Check that the application isn't trying to change the Filter ID. // End of filters is indicated with LZMA_VLI_UNKNOWN in both // reversed_filters[0].id and next->id. if (reversed_filters[0].id != next->id) return LZMA_PROG_ERROR; if (reversed_filters[0].id == LZMA_VLI_UNKNOWN) return LZMA_OK; assert(next->update != NULL); return next->update(next->coder, allocator, NULL, reversed_filters); } extern void lzma_next_end(lzma_next_coder *next, const lzma_allocator *allocator) { if (next->init != (uintptr_t)(NULL)) { // To avoid tiny end functions that simply call // lzma_free(coder, allocator), we allow leaving next->end // NULL and call lzma_free() here. if (next->end != NULL) next->end(next->coder, allocator); else lzma_free(next->coder, allocator); // Reset the variables so that we don't accidentally think // that it is an already initialized coder.
*next = LZMA_NEXT_CODER_INIT; } return; } ////////////////////////////////////// // External to internal API wrapper // ////////////////////////////////////// extern lzma_ret lzma_strm_init(lzma_stream *strm) { if (strm == NULL) return LZMA_PROG_ERROR; if (strm->internal == NULL) { strm->internal = lzma_alloc(sizeof(lzma_internal), strm->allocator); if (strm->internal == NULL) return LZMA_MEM_ERROR; strm->internal->next = LZMA_NEXT_CODER_INIT; } memzero(strm->internal->supported_actions, sizeof(strm->internal->supported_actions)); strm->internal->sequence = ISEQ_RUN; strm->internal->allow_buf_error = false; strm->total_in = 0; strm->total_out = 0; return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_code(lzma_stream *strm, lzma_action action) { // Sanity checks if ((strm->next_in == NULL && strm->avail_in != 0) || (strm->next_out == NULL && strm->avail_out != 0) || strm->internal == NULL || strm->internal->next.code == NULL || (unsigned int)(action) > LZMA_ACTION_MAX || !strm->internal->supported_actions[action]) return LZMA_PROG_ERROR; // Check if unsupported members have been set to non-zero or non-NULL, // which would indicate that some new feature is wanted. 
if (strm->reserved_ptr1 != NULL || strm->reserved_ptr2 != NULL || strm->reserved_ptr3 != NULL || strm->reserved_ptr4 != NULL || strm->reserved_int2 != 0 || strm->reserved_int3 != 0 || strm->reserved_int4 != 0 || strm->reserved_enum1 != LZMA_RESERVED_ENUM || strm->reserved_enum2 != LZMA_RESERVED_ENUM) return LZMA_OPTIONS_ERROR; switch (strm->internal->sequence) { case ISEQ_RUN: switch (action) { case LZMA_RUN: break; case LZMA_SYNC_FLUSH: strm->internal->sequence = ISEQ_SYNC_FLUSH; break; case LZMA_FULL_FLUSH: strm->internal->sequence = ISEQ_FULL_FLUSH; break; case LZMA_FINISH: strm->internal->sequence = ISEQ_FINISH; break; case LZMA_FULL_BARRIER: strm->internal->sequence = ISEQ_FULL_BARRIER; break; } break; case ISEQ_SYNC_FLUSH: // The same action must be used until we return // LZMA_STREAM_END, and the amount of input must not change. if (action != LZMA_SYNC_FLUSH || strm->internal->avail_in != strm->avail_in) return LZMA_PROG_ERROR; break; case ISEQ_FULL_FLUSH: if (action != LZMA_FULL_FLUSH || strm->internal->avail_in != strm->avail_in) return LZMA_PROG_ERROR; break; case ISEQ_FINISH: if (action != LZMA_FINISH || strm->internal->avail_in != strm->avail_in) return LZMA_PROG_ERROR; break; case ISEQ_FULL_BARRIER: if (action != LZMA_FULL_BARRIER || strm->internal->avail_in != strm->avail_in) return LZMA_PROG_ERROR; break; case ISEQ_END: return LZMA_STREAM_END; case ISEQ_ERROR: default: return LZMA_PROG_ERROR; } size_t in_pos = 0; size_t out_pos = 0; lzma_ret ret = strm->internal->next.code( strm->internal->next.coder, strm->allocator, strm->next_in, &in_pos, strm->avail_in, strm->next_out, &out_pos, strm->avail_out, action); // Updating next_in and next_out has to be skipped when they are NULL // to avoid null pointer + 0 (undefined behavior). Do this by checking // in_pos > 0 and out_pos > 0 because this way NULL + non-zero (a bug) // will get caught one way or other. 
if (in_pos > 0) { strm->next_in += in_pos; strm->avail_in -= in_pos; strm->total_in += in_pos; } if (out_pos > 0) { strm->next_out += out_pos; strm->avail_out -= out_pos; strm->total_out += out_pos; } strm->internal->avail_in = strm->avail_in; switch (ret) { case LZMA_OK: // Don't return LZMA_BUF_ERROR when it happens the first time. // This is to avoid returning LZMA_BUF_ERROR when avail_out // was zero but still there was no more data left to be written // to next_out. if (out_pos == 0 && in_pos == 0) { if (strm->internal->allow_buf_error) ret = LZMA_BUF_ERROR; else strm->internal->allow_buf_error = true; } else { strm->internal->allow_buf_error = false; } break; case LZMA_TIMED_OUT: strm->internal->allow_buf_error = false; ret = LZMA_OK; break; case LZMA_SEEK_NEEDED: strm->internal->allow_buf_error = false; // If LZMA_FINISH was used, reset it back to the // LZMA_RUN-based state so that new input can be supplied // by the application. if (strm->internal->sequence == ISEQ_FINISH) strm->internal->sequence = ISEQ_RUN; break; case LZMA_STREAM_END: if (strm->internal->sequence == ISEQ_SYNC_FLUSH || strm->internal->sequence == ISEQ_FULL_FLUSH || strm->internal->sequence == ISEQ_FULL_BARRIER) strm->internal->sequence = ISEQ_RUN; else strm->internal->sequence = ISEQ_END; - // Fall through + FALLTHROUGH; case LZMA_NO_CHECK: case LZMA_UNSUPPORTED_CHECK: case LZMA_GET_CHECK: case LZMA_MEMLIMIT_ERROR: // Something else than LZMA_OK, but not a fatal error, // that is, coding may be continued (except if ISEQ_END). strm->internal->allow_buf_error = false; break; default: // All the other errors are fatal; coding cannot be continued.
assert(ret != LZMA_BUF_ERROR); strm->internal->sequence = ISEQ_ERROR; break; } return ret; } extern LZMA_API(void) lzma_end(lzma_stream *strm) { if (strm != NULL && strm->internal != NULL) { lzma_next_end(&strm->internal->next, strm->allocator); lzma_free(strm->internal, strm->allocator); strm->internal = NULL; } return; } #ifdef HAVE_SYMBOL_VERSIONS_LINUX // This is for compatibility with binaries linked against liblzma that // has been patched with xz-5.2.2-compat-libs.patch from RHEL/CentOS 7. LZMA_SYMVER_API("lzma_get_progress@XZ_5.2.2", void, lzma_get_progress_522)(lzma_stream *strm, uint64_t *progress_in, uint64_t *progress_out) lzma_nothrow __attribute__((__alias__("lzma_get_progress_52"))); LZMA_SYMVER_API("lzma_get_progress@@XZ_5.2", void, lzma_get_progress_52)(lzma_stream *strm, uint64_t *progress_in, uint64_t *progress_out) lzma_nothrow; #define lzma_get_progress lzma_get_progress_52 #endif extern LZMA_API(void) lzma_get_progress(lzma_stream *strm, uint64_t *progress_in, uint64_t *progress_out) { if (strm->internal->next.get_progress != NULL) { strm->internal->next.get_progress(strm->internal->next.coder, progress_in, progress_out); } else { *progress_in = strm->total_in; *progress_out = strm->total_out; } return; } extern LZMA_API(lzma_check) lzma_get_check(const lzma_stream *strm) { // Return LZMA_CHECK_NONE if we cannot know the check type. // It's a bug in the application if this happens. 
if (strm->internal->next.get_check == NULL) return LZMA_CHECK_NONE; return strm->internal->next.get_check(strm->internal->next.coder); } extern LZMA_API(uint64_t) lzma_memusage(const lzma_stream *strm) { uint64_t memusage; uint64_t old_memlimit; if (strm == NULL || strm->internal == NULL || strm->internal->next.memconfig == NULL || strm->internal->next.memconfig( strm->internal->next.coder, &memusage, &old_memlimit, 0) != LZMA_OK) return 0; return memusage; } extern LZMA_API(uint64_t) lzma_memlimit_get(const lzma_stream *strm) { uint64_t old_memlimit; uint64_t memusage; if (strm == NULL || strm->internal == NULL || strm->internal->next.memconfig == NULL || strm->internal->next.memconfig( strm->internal->next.coder, &memusage, &old_memlimit, 0) != LZMA_OK) return 0; return old_memlimit; } extern LZMA_API(lzma_ret) lzma_memlimit_set(lzma_stream *strm, uint64_t new_memlimit) { // Dummy variables to simplify memconfig functions uint64_t old_memlimit; uint64_t memusage; if (strm == NULL || strm->internal == NULL || strm->internal->next.memconfig == NULL) return LZMA_PROG_ERROR; // Zero is a special value that cannot be used as an actual limit. // If 0 was specified, use 1 instead. 
if (new_memlimit == 0) new_memlimit = 1; return strm->internal->next.memconfig(strm->internal->next.coder, &memusage, &old_memlimit, new_memlimit); } diff --git a/src/liblzma/common/file_info.c b/src/liblzma/common/file_info.c index 7c85084a706e..4b2eb5d0400b 100644 --- a/src/liblzma/common/file_info.c +++ b/src/liblzma/common/file_info.c @@ -1,854 +1,850 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file file_info.c /// \brief Decode .xz file information into a lzma_index structure // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "index_decoder.h" typedef struct { enum { SEQ_MAGIC_BYTES, SEQ_PADDING_SEEK, SEQ_PADDING_DECODE, SEQ_FOOTER, SEQ_INDEX_INIT, SEQ_INDEX_DECODE, SEQ_HEADER_DECODE, SEQ_HEADER_COMPARE, } sequence; /// Absolute position of in[*in_pos] in the file. All code that /// modifies *in_pos also updates this. seek_to_pos() needs this /// to determine if we need to request the application to seek for /// us or if we can do the seeking internally by adjusting *in_pos. uint64_t file_cur_pos; /// This refers to absolute positions of interesting parts of the /// input file. Sometimes it points to the *beginning* of a specific /// field and sometimes to the *end* of a field. The current target /// position at each moment is explained in the comments. uint64_t file_target_pos; /// Size of the .xz file (from the application). uint64_t file_size; /// Index decoder lzma_next_coder index_decoder; /// Number of bytes remaining in the Index field that is currently /// being decoded. lzma_vli index_remaining; /// The Index decoder will store the decoded Index in this pointer. lzma_index *this_index; /// Amount of Stream Padding in the current Stream. lzma_vli stream_padding; /// The final combined index is collected here. 
lzma_index *combined_index; /// Pointer from the application where to store the index information /// after successful decoding. lzma_index **dest_index; /// Pointer to lzma_stream.seek_pos to be used when returning /// LZMA_SEEK_NEEDED. This is set by seek_to_pos() when needed. uint64_t *external_seek_pos; /// Memory usage limit uint64_t memlimit; /// Stream Flags from the very beginning of the file. lzma_stream_flags first_header_flags; /// Stream Flags from Stream Header of the current Stream. lzma_stream_flags header_flags; /// Stream Flags from Stream Footer of the current Stream. lzma_stream_flags footer_flags; size_t temp_pos; size_t temp_size; uint8_t temp[8192]; } lzma_file_info_coder; /// Copies data from in[*in_pos] into coder->temp until /// coder->temp_pos == coder->temp_size. This also keeps coder->file_cur_pos /// in sync with *in_pos. Returns true if more input is needed. static bool fill_temp(lzma_file_info_coder *coder, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size) { coder->file_cur_pos += lzma_bufcpy(in, in_pos, in_size, coder->temp, &coder->temp_pos, coder->temp_size); return coder->temp_pos < coder->temp_size; } /// Seeks to the absolute file position specified by target_pos. /// This tries to do the seeking by only modifying *in_pos, if possible. /// The main benefit of this is that if one passes the whole file at once /// to lzma_code(), the decoder will never need to return LZMA_SEEK_NEEDED /// as all the seeking can be done by adjusting *in_pos in this function. /// /// Returns true if an external seek is needed and the caller must return /// LZMA_SEEK_NEEDED. static bool seek_to_pos(lzma_file_info_coder *coder, uint64_t target_pos, size_t in_start, size_t *in_pos, size_t in_size) { // The input buffer doesn't extend beyond the end of the file. // This has been checked by file_info_decode() already. 
assert(coder->file_size - coder->file_cur_pos >= in_size - *in_pos); const uint64_t pos_min = coder->file_cur_pos - (*in_pos - in_start); const uint64_t pos_max = coder->file_cur_pos + (in_size - *in_pos); bool external_seek_needed; if (target_pos >= pos_min && target_pos <= pos_max) { // The requested position is available in the current input // buffer or right after it. That is, in a corner case we // end up setting *in_pos == in_size and thus will immediately // need new input bytes from the application. *in_pos += (size_t)(target_pos - coder->file_cur_pos); external_seek_needed = false; } else { // Ask the application to seek the input file. *coder->external_seek_pos = target_pos; external_seek_needed = true; // Mark the whole input buffer as used. This way // lzma_stream.total_in will have a better estimate // of the amount of data read. It still won't be perfect // as the value will depend on the input buffer size that // the application uses, but it should be good enough for // those few who want an estimate. *in_pos = in_size; } // After seeking (internal or external) the current position // will match the requested target position. coder->file_cur_pos = target_pos; return external_seek_needed; } /// The caller sets coder->file_target_pos so that it points to the *end* /// of the desired file position. This function then determines how far /// backwards from that position we can seek. After seeking fill_temp() /// can be used to read data into coder->temp. When fill_temp() has finished, /// coder->temp[coder->temp_size] will match coder->file_target_pos. /// /// This also validates that coder->file_target_pos is sane in the sense that /// we aren't trying to seek too far backwards (too close or beyond the /// beginning of the file). static lzma_ret reverse_seek(lzma_file_info_coder *coder, size_t in_start, size_t *in_pos, size_t in_size) { // Check that there is enough data before the target position // to contain at least Stream Header and Stream Footer.
If there // isn't, the file cannot be valid. if (coder->file_target_pos < 2 * LZMA_STREAM_HEADER_SIZE) return LZMA_DATA_ERROR; coder->temp_pos = 0; // The Stream Header at the very beginning of the file gets handled // specially in SEQ_MAGIC_BYTES and thus we will never need to seek // there. By not seeking to the first LZMA_STREAM_HEADER_SIZE bytes // we avoid a useless external seek after SEQ_MAGIC_BYTES if the // application uses an extremely small input buffer and the input // file is very small. if (coder->file_target_pos - LZMA_STREAM_HEADER_SIZE < sizeof(coder->temp)) coder->temp_size = (size_t)(coder->file_target_pos - LZMA_STREAM_HEADER_SIZE); else coder->temp_size = sizeof(coder->temp); // The above if-statements guarantee this. This is important because // the Stream Header/Footer decoders assume that there's at least // LZMA_STREAM_HEADER_SIZE bytes in coder->temp. assert(coder->temp_size >= LZMA_STREAM_HEADER_SIZE); if (seek_to_pos(coder, coder->file_target_pos - coder->temp_size, in_start, in_pos, in_size)) return LZMA_SEEK_NEEDED; return LZMA_OK; } /// Gets the number of zero-bytes at the end of the buffer. static size_t get_padding_size(const uint8_t *buf, size_t buf_size) { size_t padding = 0; while (buf_size > 0 && buf[--buf_size] == 0x00) ++padding; return padding; } /// With the Stream Header at the very beginning of the file, LZMA_FORMAT_ERROR /// is used to tell the application that Magic Bytes didn't match. In other /// Stream Header/Footer fields (in the middle/end of the file) it could be /// a bit confusing to return LZMA_FORMAT_ERROR as we already know that there /// is a valid Stream Header at the beginning of the file. For those cases /// this function is used to convert LZMA_FORMAT_ERROR to LZMA_DATA_ERROR. static lzma_ret hide_format_error(lzma_ret ret) { if (ret == LZMA_FORMAT_ERROR) ret = LZMA_DATA_ERROR; return ret; } /// Calls the Index decoder and updates coder->index_remaining. 
/// This is a separate function because the input can be either directly /// from the application or from coder->temp. static lzma_ret decode_index(lzma_file_info_coder *coder, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, bool update_file_cur_pos) { const size_t in_start = *in_pos; const lzma_ret ret = coder->index_decoder.code( coder->index_decoder.coder, allocator, in, in_pos, in_size, NULL, NULL, 0, LZMA_RUN); coder->index_remaining -= *in_pos - in_start; if (update_file_cur_pos) coder->file_cur_pos += *in_pos - in_start; return ret; } static lzma_ret file_info_decode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out lzma_attribute((__unused__)), size_t *restrict out_pos lzma_attribute((__unused__)), size_t out_size lzma_attribute((__unused__)), lzma_action action lzma_attribute((__unused__))) { lzma_file_info_coder *coder = coder_ptr; const size_t in_start = *in_pos; // If the caller provides input past the end of the file, trim // the extra bytes from the buffer so that we won't read too far. assert(coder->file_size >= coder->file_cur_pos); if (coder->file_size - coder->file_cur_pos < in_size - in_start) in_size = in_start + (size_t)(coder->file_size - coder->file_cur_pos); while (true) switch (coder->sequence) { case SEQ_MAGIC_BYTES: // Decode the Stream Header at the beginning of the file // first to check if the Magic Bytes match. The flags // are stored in coder->first_header_flags so that we // don't need to seek to it again. // // Check that the file is big enough to contain at least // Stream Header. if (coder->file_size < LZMA_STREAM_HEADER_SIZE) return LZMA_FORMAT_ERROR; // Read the Stream Header field into coder->temp. if (fill_temp(coder, in, in_pos, in_size)) return LZMA_OK; // This is the only Stream Header/Footer decoding where we // want to return LZMA_FORMAT_ERROR if the Magic Bytes don't // match. 
Elsewhere it will be converted to LZMA_DATA_ERROR. return_if_error(lzma_stream_header_decode( &coder->first_header_flags, coder->temp)); // Now that we know that the Magic Bytes match, check the // file size. It's better to do this here after checking the // Magic Bytes since this way we can give LZMA_FORMAT_ERROR // instead of LZMA_DATA_ERROR when the Magic Bytes don't // match in a file that is too big or isn't a multiple of // four bytes. if (coder->file_size > LZMA_VLI_MAX || (coder->file_size & 3)) return LZMA_DATA_ERROR; // Start looking for Stream Padding and Stream Footer // at the end of the file. coder->file_target_pos = coder->file_size; - - // Fall through + FALLTHROUGH; case SEQ_PADDING_SEEK: coder->sequence = SEQ_PADDING_DECODE; return_if_error(reverse_seek( coder, in_start, in_pos, in_size)); - - // Fall through + FALLTHROUGH; case SEQ_PADDING_DECODE: { // Copy to coder->temp first. This keeps the code simpler if // the application only provides input a few bytes at a time. if (fill_temp(coder, in, in_pos, in_size)) return LZMA_OK; // Scan the buffer backwards to get the size of the // Stream Padding field (if any). const size_t new_padding = get_padding_size( coder->temp, coder->temp_size); coder->stream_padding += new_padding; // Set the target position to the beginning of Stream Padding // that has been observed so far. If all Stream Padding has // been seen, then the target position will be at the end // of the Stream Footer field. coder->file_target_pos -= new_padding; if (new_padding == coder->temp_size) { // The whole buffer was padding. Seek backwards in // the file to get more input. coder->sequence = SEQ_PADDING_SEEK; break; } // Size of Stream Padding must be a multiple of 4 bytes. if (coder->stream_padding & 3) return LZMA_DATA_ERROR; coder->sequence = SEQ_FOOTER; // Calculate the amount of non-padding data in coder->temp. 
coder->temp_size -= new_padding; coder->temp_pos = coder->temp_size; // We can avoid an external seek if the whole Stream Footer // is already in coder->temp. In that case SEQ_FOOTER won't // read more input and will find the Stream Footer from // coder->temp[coder->temp_size - LZMA_STREAM_HEADER_SIZE]. // // Otherwise we will need to seek. The seeking is done so // that Stream Footer will be at the end of coder->temp. // This way it's likely that we also get a complete Index // field into coder->temp without needing a separate seek // for that (unless the Index field is big). if (coder->temp_size < LZMA_STREAM_HEADER_SIZE) return_if_error(reverse_seek( coder, in_start, in_pos, in_size)); - } - // Fall through + FALLTHROUGH; + } case SEQ_FOOTER: // Copy the Stream Footer field into coder->temp. // If Stream Footer was already available in coder->temp // in SEQ_PADDING_DECODE, then this does nothing. if (fill_temp(coder, in, in_pos, in_size)) return LZMA_OK; // Make coder->file_target_pos and coder->temp_size point // to the beginning of Stream Footer and thus to the end // of the Index field. coder->temp_pos will be updated // a bit later. coder->file_target_pos -= LZMA_STREAM_HEADER_SIZE; coder->temp_size -= LZMA_STREAM_HEADER_SIZE; // Decode Stream Footer. return_if_error(hide_format_error(lzma_stream_footer_decode( &coder->footer_flags, coder->temp + coder->temp_size))); // Check that we won't seek past the beginning of the file. // // LZMA_STREAM_HEADER_SIZE is added because there must be // space for Stream Header too even though we won't seek // there before decoding the Index field. // // There's no risk of integer overflow here because // Backward Size cannot be greater than 2^34. if (coder->file_target_pos < coder->footer_flags.backward_size + LZMA_STREAM_HEADER_SIZE) return LZMA_DATA_ERROR; // Set the target position to the beginning of the Index field. 
coder->file_target_pos -= coder->footer_flags.backward_size; coder->sequence = SEQ_INDEX_INIT; // We can avoid an external seek if the whole Index field is // already available in coder->temp. if (coder->temp_size >= coder->footer_flags.backward_size) { // Set coder->temp_pos to point to the beginning // of the Index. coder->temp_pos = coder->temp_size - coder->footer_flags.backward_size; } else { // These are set to zero to indicate that there's no // useful data (Index or anything else) in coder->temp. coder->temp_pos = 0; coder->temp_size = 0; // Seek to the beginning of the Index field. if (seek_to_pos(coder, coder->file_target_pos, in_start, in_pos, in_size)) return LZMA_SEEK_NEEDED; } - // Fall through + FALLTHROUGH; case SEQ_INDEX_INIT: { // Calculate the amount of memory already used by the earlier // Indexes so that we know how big memory limit to pass to // the Index decoder. // // NOTE: When there are multiple Streams, the separate // lzma_index structures can use more RAM (as measured by // lzma_index_memused()) than the final combined lzma_index. // Thus memlimit may need to be slightly higher than the final // calculated memory usage will be. This is perhaps a bit // confusing to the application, but I think it shouldn't // cause problems in practice. uint64_t memused = 0; if (coder->combined_index != NULL) { memused = lzma_index_memused(coder->combined_index); assert(memused <= coder->memlimit); if (memused > coder->memlimit) // Extra sanity check return LZMA_PROG_ERROR; } // Initialize the Index decoder. return_if_error(lzma_index_decoder_init( &coder->index_decoder, allocator, &coder->this_index, coder->memlimit - memused)); coder->index_remaining = coder->footer_flags.backward_size; coder->sequence = SEQ_INDEX_DECODE; + FALLTHROUGH; } - // Fall through - case SEQ_INDEX_DECODE: { // Decode (a part of) the Index. If the whole Index is already // in coder->temp, read it from there. Otherwise read from // in[*in_pos] onwards. 
Note that decode_index() updates // coder->index_remaining and optionally coder->file_cur_pos. lzma_ret ret; if (coder->temp_size != 0) { assert(coder->temp_size - coder->temp_pos == coder->index_remaining); ret = decode_index(coder, allocator, coder->temp, &coder->temp_pos, coder->temp_size, false); } else { // Don't give the decoder more input than the known // remaining size of the Index field. size_t in_stop = in_size; if (in_size - *in_pos > coder->index_remaining) in_stop = *in_pos + (size_t)(coder->index_remaining); ret = decode_index(coder, allocator, in, in_pos, in_stop, true); } switch (ret) { case LZMA_OK: // If the Index decoder asks for more input when we // have already given it as much input as Backward Size // indicated, the file is invalid. if (coder->index_remaining == 0) return LZMA_DATA_ERROR; // We cannot get here if we were reading Index from // coder->temp because when reading from coder->temp // we give the Index decoder exactly // coder->index_remaining bytes of input. assert(coder->temp_size == 0); return LZMA_OK; case LZMA_STREAM_END: // If the decoding seems to be successful, check also // that the Index decoder consumed as much input as // indicated by the Backward Size field. if (coder->index_remaining != 0) return LZMA_DATA_ERROR; break; default: return ret; } // Calculate how much the Index tells us to seek backwards // (relative to the beginning of the Index): Total size of // all Blocks plus the size of the Stream Header field. // No integer overflow here because lzma_index_total_size() // cannot return a value greater than LZMA_VLI_MAX. const uint64_t seek_amount = lzma_index_total_size(coder->this_index) + LZMA_STREAM_HEADER_SIZE; // Check that Index is sane in sense that seek_amount won't // make us seek past the beginning of the file when locating // the Stream Header. // // coder->file_target_pos still points to the beginning of // the Index field.
if (coder->file_target_pos < seek_amount) return LZMA_DATA_ERROR; // Set the target to the beginning of Stream Header. coder->file_target_pos -= seek_amount; if (coder->file_target_pos == 0) { // We would seek to the beginning of the file, but // since we already decoded that Stream Header in // SEQ_MAGIC_BYTES, we can use the cached value from // coder->first_header_flags to avoid the seek. coder->header_flags = coder->first_header_flags; coder->sequence = SEQ_HEADER_COMPARE; break; } coder->sequence = SEQ_HEADER_DECODE; // Make coder->file_target_pos point to the end of // the Stream Header field. coder->file_target_pos += LZMA_STREAM_HEADER_SIZE; // If coder->temp_size is non-zero, it points to the end // of the Index field. Then the beginning of the Index // field is at coder->temp[coder->temp_size // - coder->footer_flags.backward_size]. assert(coder->temp_size == 0 || coder->temp_size >= coder->footer_flags.backward_size); // If coder->temp contained the whole Index, see if it has // enough data to contain also the Stream Header. If so, // we avoid an external seek. // // NOTE: This can happen only with small .xz files and only // for the non-first Stream as the Stream Flags of the first // Stream are cached and already handled a few lines above. // So this isn't as useful as the other seek-avoidance cases. if (coder->temp_size != 0 && coder->temp_size - coder->footer_flags.backward_size >= seek_amount) { // Make temp_pos and temp_size point to the *end* of // Stream Header so that SEQ_HEADER_DECODE will find // the start of Stream Header from coder->temp[ // coder->temp_size - LZMA_STREAM_HEADER_SIZE]. coder->temp_pos = coder->temp_size - coder->footer_flags.backward_size - seek_amount + LZMA_STREAM_HEADER_SIZE; coder->temp_size = coder->temp_pos; } else { // Seek so that Stream Header will be at the end of // coder->temp. 
With typical multi-Stream files we // will usually also get the Stream Footer and Index // of the *previous* Stream in coder->temp and thus // won't need a separate seek for them. return_if_error(reverse_seek(coder, in_start, in_pos, in_size)); } - } - // Fall through + FALLTHROUGH; + } case SEQ_HEADER_DECODE: // Copy the Stream Header field into coder->temp. // If Stream Header was already available in coder->temp // in SEQ_INDEX_DECODE, then this does nothing. if (fill_temp(coder, in, in_pos, in_size)) return LZMA_OK; // Make all these point to the beginning of Stream Header. coder->file_target_pos -= LZMA_STREAM_HEADER_SIZE; coder->temp_size -= LZMA_STREAM_HEADER_SIZE; coder->temp_pos = coder->temp_size; // Decode the Stream Header. return_if_error(hide_format_error(lzma_stream_header_decode( &coder->header_flags, coder->temp + coder->temp_size))); coder->sequence = SEQ_HEADER_COMPARE; - - // Fall through + FALLTHROUGH; case SEQ_HEADER_COMPARE: // Compare Stream Header against Stream Footer. They must // match. return_if_error(lzma_stream_flags_compare( &coder->header_flags, &coder->footer_flags)); // Store the decoded Stream Flags into the Index. Use the // Footer Flags because it contains Backward Size, although // it shouldn't matter in practice. if (lzma_index_stream_flags(coder->this_index, &coder->footer_flags) != LZMA_OK) return LZMA_PROG_ERROR; // Store also the size of the Stream Padding field. It is // needed to calculate the offsets of the Streams correctly. if (lzma_index_stream_padding(coder->this_index, coder->stream_padding) != LZMA_OK) return LZMA_PROG_ERROR; // Reset it so that it's ready for the next Stream. coder->stream_padding = 0; // Append the earlier decoded Indexes after this_index. 
if (coder->combined_index != NULL) return_if_error(lzma_index_cat(coder->this_index, coder->combined_index, allocator)); coder->combined_index = coder->this_index; coder->this_index = NULL; // If the whole file was decoded, tell the caller that we // are finished. if (coder->file_target_pos == 0) { // The combined index must indicate the same file // size as was told to us at initialization. assert(lzma_index_file_size(coder->combined_index) == coder->file_size); // Make the combined index available to // the application. *coder->dest_index = coder->combined_index; coder->combined_index = NULL; // Mark the input buffer as used since we may have // done internal seeking and thus don't know how // many input bytes were actually used. This way // lzma_stream.total_in gets a slightly better // estimate of the amount of input used. *in_pos = in_size; return LZMA_STREAM_END; } // We didn't hit the beginning of the file yet, so continue // reading backwards in the file. If we have unprocessed // data in coder->temp, use it before requesting more data // from the application. // // coder->file_target_pos, coder->temp_size, and // coder->temp_pos all point to the beginning of Stream Header // and thus the end of the previous Stream in the file. coder->sequence = coder->temp_size > 0 ? SEQ_PADDING_DECODE : SEQ_PADDING_SEEK; break; default: assert(0); return LZMA_PROG_ERROR; } } static lzma_ret file_info_decoder_memconfig(void *coder_ptr, uint64_t *memusage, uint64_t *old_memlimit, uint64_t new_memlimit) { lzma_file_info_coder *coder = coder_ptr; // The memory usage calculation comes from three things: // // (1) The Indexes that have already been decoded and processed into // coder->combined_index. // // (2) The latest Index in coder->this_index that has been decoded but // not yet put into coder->combined_index. // // (3) The latest Index that we have started decoding but haven't // finished and thus isn't available in coder->this_index yet. 
// Memory usage and limit information needs to be communicated // from/to coder->index_decoder. // // Care has to be taken to not do both (2) and (3) when calculating // the memory usage. uint64_t combined_index_memusage = 0; uint64_t this_index_memusage = 0; // (1) If we have already successfully decoded one or more Indexes, // get their memory usage. if (coder->combined_index != NULL) combined_index_memusage = lzma_index_memused( coder->combined_index); // Choose between (2), (3), or neither. if (coder->this_index != NULL) { // (2) The latest Index is available. Use its memory usage. this_index_memusage = lzma_index_memused(coder->this_index); } else if (coder->sequence == SEQ_INDEX_DECODE) { // (3) The Index decoder is active and hasn't yet stored // the new index in coder->this_index. Get the memory usage // information from the Index decoder. // // NOTE: If the Index decoder doesn't yet know how much memory // it will eventually need, it will return a tiny value here. uint64_t dummy; if (coder->index_decoder.memconfig(coder->index_decoder.coder, &this_index_memusage, &dummy, 0) != LZMA_OK) { assert(0); return LZMA_PROG_ERROR; } } // Now we know the total memory usage/requirement. If we had neither // old Indexes nor a new Index, this will be zero which isn't // acceptable as lzma_memusage() has to return non-zero on success // and even with an empty .xz file we will end up with a lzma_index // that takes some memory. *memusage = combined_index_memusage + this_index_memusage; if (*memusage == 0) *memusage = lzma_index_memusage(1, 0); *old_memlimit = coder->memlimit; // If requested, set a new memory usage limit. if (new_memlimit != 0) { if (new_memlimit < *memusage) return LZMA_MEMLIMIT_ERROR; // In the condition (3) we need to tell the Index decoder // its new memory usage limit.
if (coder->this_index == NULL && coder->sequence == SEQ_INDEX_DECODE) { const uint64_t idec_new_memlimit = new_memlimit - combined_index_memusage; assert(this_index_memusage > 0); assert(idec_new_memlimit > 0); uint64_t dummy1; uint64_t dummy2; if (coder->index_decoder.memconfig( coder->index_decoder.coder, &dummy1, &dummy2, idec_new_memlimit) != LZMA_OK) { assert(0); return LZMA_PROG_ERROR; } } coder->memlimit = new_memlimit; } return LZMA_OK; } static void file_info_decoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_file_info_coder *coder = coder_ptr; lzma_next_end(&coder->index_decoder, allocator); lzma_index_end(coder->this_index, allocator); lzma_index_end(coder->combined_index, allocator); lzma_free(coder, allocator); return; } static lzma_ret lzma_file_info_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, uint64_t *seek_pos, lzma_index **dest_index, uint64_t memlimit, uint64_t file_size) { lzma_next_coder_init(&lzma_file_info_decoder_init, next, allocator); if (dest_index == NULL) return LZMA_PROG_ERROR; lzma_file_info_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_file_info_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &file_info_decode; next->end = &file_info_decoder_end; next->memconfig = &file_info_decoder_memconfig; coder->index_decoder = LZMA_NEXT_CODER_INIT; coder->this_index = NULL; coder->combined_index = NULL; } coder->sequence = SEQ_MAGIC_BYTES; coder->file_cur_pos = 0; coder->file_target_pos = 0; coder->file_size = file_size; lzma_index_end(coder->this_index, allocator); coder->this_index = NULL; lzma_index_end(coder->combined_index, allocator); coder->combined_index = NULL; coder->stream_padding = 0; coder->dest_index = dest_index; coder->external_seek_pos = seek_pos; // If memlimit is 0, make it 1 to ensure that lzma_memlimit_get() // won't return 0 (which would indicate an error). 
coder->memlimit = my_max(1, memlimit); // Prepare these for reading the first Stream Header into coder->temp. coder->temp_pos = 0; coder->temp_size = LZMA_STREAM_HEADER_SIZE; return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_file_info_decoder(lzma_stream *strm, lzma_index **dest_index, uint64_t memlimit, uint64_t file_size) { lzma_next_strm_init(lzma_file_info_decoder_init, strm, &strm->seek_pos, dest_index, memlimit, file_size); // We allow LZMA_FINISH in addition to LZMA_RUN for convenience. // lzma_code() is able to handle the LZMA_FINISH + LZMA_SEEK_NEEDED // combination in a sane way. Applications still need to be careful // if they use LZMA_FINISH so that they remember to reset it back // to LZMA_RUN after seeking if needed. strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } diff --git a/src/liblzma/common/index_decoder.c b/src/liblzma/common/index_decoder.c index 4bcb30692115..4eab56d942e1 100644 --- a/src/liblzma/common/index_decoder.c +++ b/src/liblzma/common/index_decoder.c @@ -1,372 +1,369 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file index_decoder.c /// \brief Decodes the Index field // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "index_decoder.h" #include "check.h" typedef struct { enum { SEQ_INDICATOR, SEQ_COUNT, SEQ_MEMUSAGE, SEQ_UNPADDED, SEQ_UNCOMPRESSED, SEQ_PADDING_INIT, SEQ_PADDING, SEQ_CRC32, } sequence; /// Memory usage limit uint64_t memlimit; /// Target Index lzma_index *index; /// Pointer given by the application, which is set after /// successful decoding. lzma_index **index_ptr; /// Number of Records left to decode.
lzma_vli count; /// The most recent Unpadded Size field lzma_vli unpadded_size; /// The most recent Uncompressed Size field lzma_vli uncompressed_size; /// Position in integers size_t pos; /// CRC32 of the List of Records field uint32_t crc32; } lzma_index_coder; static lzma_ret index_decode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out lzma_attribute((__unused__)), size_t *restrict out_pos lzma_attribute((__unused__)), size_t out_size lzma_attribute((__unused__)), lzma_action action lzma_attribute((__unused__))) { lzma_index_coder *coder = coder_ptr; // Similar optimization as in index_encoder.c const size_t in_start = *in_pos; lzma_ret ret = LZMA_OK; while (*in_pos < in_size) switch (coder->sequence) { case SEQ_INDICATOR: // Return LZMA_DATA_ERROR instead of e.g. LZMA_PROG_ERROR or // LZMA_FORMAT_ERROR, because a typical usage case for Index // decoder is when parsing the Stream backwards. If seeking // backward from the Stream Footer gives us something that // doesn't begin with Index Indicator, the file is considered // corrupt, not "programming error" or "unrecognized file // format". One could argue that the application should // verify the Index Indicator before trying to decode the // Index, but well, I suppose it is simpler this way. if (in[(*in_pos)++] != INDEX_INDICATOR) return LZMA_DATA_ERROR; coder->sequence = SEQ_COUNT; break; case SEQ_COUNT: ret = lzma_vli_decode(&coder->count, &coder->pos, in, in_pos, in_size); if (ret != LZMA_STREAM_END) goto out; coder->pos = 0; coder->sequence = SEQ_MEMUSAGE; - - // Fall through + FALLTHROUGH; case SEQ_MEMUSAGE: if (lzma_index_memusage(1, coder->count) > coder->memlimit) { ret = LZMA_MEMLIMIT_ERROR; goto out; } // Tell the Index handling code how many Records this // Index has to allow it to allocate memory more efficiently. 
lzma_index_prealloc(coder->index, coder->count); ret = LZMA_OK; coder->sequence = coder->count == 0 ? SEQ_PADDING_INIT : SEQ_UNPADDED; break; case SEQ_UNPADDED: case SEQ_UNCOMPRESSED: { lzma_vli *size = coder->sequence == SEQ_UNPADDED ? &coder->unpadded_size : &coder->uncompressed_size; ret = lzma_vli_decode(size, &coder->pos, in, in_pos, in_size); if (ret != LZMA_STREAM_END) goto out; ret = LZMA_OK; coder->pos = 0; if (coder->sequence == SEQ_UNPADDED) { // Validate that encoded Unpadded Size isn't too small // or too big. if (coder->unpadded_size < UNPADDED_SIZE_MIN || coder->unpadded_size > UNPADDED_SIZE_MAX) return LZMA_DATA_ERROR; coder->sequence = SEQ_UNCOMPRESSED; } else { // Add the decoded Record to the Index. return_if_error(lzma_index_append( coder->index, allocator, coder->unpadded_size, coder->uncompressed_size)); // Check if this was the last Record. coder->sequence = --coder->count == 0 ? SEQ_PADDING_INIT : SEQ_UNPADDED; } break; } case SEQ_PADDING_INIT: coder->pos = lzma_index_padding_size(coder->index); coder->sequence = SEQ_PADDING; - - // Fall through + FALLTHROUGH; case SEQ_PADDING: if (coder->pos > 0) { --coder->pos; if (in[(*in_pos)++] != 0x00) return LZMA_DATA_ERROR; break; } // Finish the CRC32 calculation. coder->crc32 = lzma_crc32(in + in_start, *in_pos - in_start, coder->crc32); coder->sequence = SEQ_CRC32; - - // Fall through + FALLTHROUGH; case SEQ_CRC32: do { if (*in_pos == in_size) return LZMA_OK; if (((coder->crc32 >> (coder->pos * 8)) & 0xFF) != in[(*in_pos)++]) { #ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION return LZMA_DATA_ERROR; #endif } } while (++coder->pos < 4); // Decoding was successful, now we can let the application // see the decoded Index. *coder->index_ptr = coder->index; // Make index NULL so we don't free it unintentionally. coder->index = NULL; return LZMA_STREAM_END; default: assert(0); return LZMA_PROG_ERROR; } out: // Update the CRC32. // // Avoid null pointer + 0 (undefined behavior) in "in + in_start". 
// In such a case we had no input and thus in_used == 0. { const size_t in_used = *in_pos - in_start; if (in_used > 0) coder->crc32 = lzma_crc32(in + in_start, in_used, coder->crc32); } return ret; } static void index_decoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_index_coder *coder = coder_ptr; lzma_index_end(coder->index, allocator); lzma_free(coder, allocator); return; } static lzma_ret index_decoder_memconfig(void *coder_ptr, uint64_t *memusage, uint64_t *old_memlimit, uint64_t new_memlimit) { lzma_index_coder *coder = coder_ptr; *memusage = lzma_index_memusage(1, coder->count); *old_memlimit = coder->memlimit; if (new_memlimit != 0) { if (new_memlimit < *memusage) return LZMA_MEMLIMIT_ERROR; coder->memlimit = new_memlimit; } return LZMA_OK; } static lzma_ret index_decoder_reset(lzma_index_coder *coder, const lzma_allocator *allocator, lzma_index **i, uint64_t memlimit) { // Remember the pointer given by the application. We will set it // to point to the decoded Index only if decoding is successful. // Before that, keep it NULL so that applications can always safely // pass it to lzma_index_end() no matter did decoding succeed or not. coder->index_ptr = i; *i = NULL; // We always allocate a new lzma_index. coder->index = lzma_index_init(allocator); if (coder->index == NULL) return LZMA_MEM_ERROR; // Initialize the rest. coder->sequence = SEQ_INDICATOR; coder->memlimit = my_max(1, memlimit); coder->count = 0; // Needs to be initialized due to _memconfig(). 
coder->pos = 0; coder->crc32 = 0; return LZMA_OK; } extern lzma_ret lzma_index_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, lzma_index **i, uint64_t memlimit) { lzma_next_coder_init(&lzma_index_decoder_init, next, allocator); if (i == NULL) return LZMA_PROG_ERROR; lzma_index_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_index_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &index_decode; next->end = &index_decoder_end; next->memconfig = &index_decoder_memconfig; coder->index = NULL; } else { lzma_index_end(coder->index, allocator); } return index_decoder_reset(coder, allocator, i, memlimit); } extern LZMA_API(lzma_ret) lzma_index_decoder(lzma_stream *strm, lzma_index **i, uint64_t memlimit) { // If i isn't NULL, *i must always be initialized due to // the wording in the API docs. This way it is initialized // if we return LZMA_PROG_ERROR due to strm == NULL. if (i != NULL) *i = NULL; lzma_next_strm_init(lzma_index_decoder_init, strm, i, memlimit); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_index_buffer_decode(lzma_index **i, uint64_t *memlimit, const lzma_allocator *allocator, const uint8_t *in, size_t *in_pos, size_t in_size) { // If i isn't NULL, *i must always be initialized due to // the wording in the API docs. if (i != NULL) *i = NULL; // Sanity checks if (i == NULL || memlimit == NULL || in == NULL || in_pos == NULL || *in_pos > in_size) return LZMA_PROG_ERROR; // Initialize the decoder. lzma_index_coder coder; return_if_error(index_decoder_reset(&coder, allocator, i, *memlimit)); // Store the input start position so that we can restore it in case // of an error. const size_t in_start = *in_pos; // Do the actual decoding. 
lzma_ret ret = index_decode(&coder, allocator, in, in_pos, in_size, NULL, NULL, 0, LZMA_RUN); if (ret == LZMA_STREAM_END) { ret = LZMA_OK; } else { // Something went wrong, free the Index structure and restore // the input position. lzma_index_end(coder.index, allocator); *in_pos = in_start; if (ret == LZMA_OK) { // The input is truncated or otherwise corrupt. // Use LZMA_DATA_ERROR instead of LZMA_BUF_ERROR // like lzma_vli_decode() does in single-call mode. ret = LZMA_DATA_ERROR; } else if (ret == LZMA_MEMLIMIT_ERROR) { // Tell the caller how much memory would have // been needed. *memlimit = lzma_index_memusage(1, coder.count); } } return ret; } diff --git a/src/liblzma/common/index_encoder.c b/src/liblzma/common/index_encoder.c index ecc299c0159f..80f1be1e3aea 100644 --- a/src/liblzma/common/index_encoder.c +++ b/src/liblzma/common/index_encoder.c @@ -1,262 +1,260 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file index_encoder.c /// \brief Encodes the Index field // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "index_encoder.h" #include "index.h" #include "check.h" typedef struct { enum { SEQ_INDICATOR, SEQ_COUNT, SEQ_UNPADDED, SEQ_UNCOMPRESSED, SEQ_NEXT, SEQ_PADDING, SEQ_CRC32, } sequence; /// Index being encoded const lzma_index *index; /// Iterator for the Index being encoded lzma_index_iter iter; /// Position in integers size_t pos; /// CRC32 of the List of Records field uint32_t crc32; } lzma_index_coder; static lzma_ret index_encode(void *coder_ptr, const lzma_allocator *allocator lzma_attribute((__unused__)), const uint8_t *restrict in lzma_attribute((__unused__)), size_t *restrict in_pos lzma_attribute((__unused__)), size_t in_size lzma_attribute((__unused__)), uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action lzma_attribute((__unused__))) { lzma_index_coder *coder = 
coder_ptr; // Position where to start calculating CRC32. The idea is that we // need to call lzma_crc32() only once per call to index_encode(). const size_t out_start = *out_pos; // Return value to use if we return at the end of this function. // We use "goto out" to jump out of the while-switch construct // instead of returning directly, because that way we don't need // to copypaste the lzma_crc32() call to many places. lzma_ret ret = LZMA_OK; while (*out_pos < out_size) switch (coder->sequence) { case SEQ_INDICATOR: out[*out_pos] = INDEX_INDICATOR; ++*out_pos; coder->sequence = SEQ_COUNT; break; case SEQ_COUNT: { const lzma_vli count = lzma_index_block_count(coder->index); ret = lzma_vli_encode(count, &coder->pos, out, out_pos, out_size); if (ret != LZMA_STREAM_END) goto out; ret = LZMA_OK; coder->pos = 0; coder->sequence = SEQ_NEXT; break; } case SEQ_NEXT: if (lzma_index_iter_next( &coder->iter, LZMA_INDEX_ITER_BLOCK)) { // Get the size of the Index Padding field. coder->pos = lzma_index_padding_size(coder->index); assert(coder->pos <= 3); coder->sequence = SEQ_PADDING; break; } coder->sequence = SEQ_UNPADDED; - - // Fall through + FALLTHROUGH; case SEQ_UNPADDED: case SEQ_UNCOMPRESSED: { const lzma_vli size = coder->sequence == SEQ_UNPADDED ? coder->iter.block.unpadded_size : coder->iter.block.uncompressed_size; ret = lzma_vli_encode(size, &coder->pos, out, out_pos, out_size); if (ret != LZMA_STREAM_END) goto out; ret = LZMA_OK; coder->pos = 0; // Advance to SEQ_UNCOMPRESSED or SEQ_NEXT. ++coder->sequence; break; } case SEQ_PADDING: if (coder->pos > 0) { --coder->pos; out[(*out_pos)++] = 0x00; break; } // Finish the CRC32 calculation. coder->crc32 = lzma_crc32(out + out_start, *out_pos - out_start, coder->crc32); coder->sequence = SEQ_CRC32; - - // Fall through + FALLTHROUGH; case SEQ_CRC32: // We don't use the main loop, because we don't want // coder->crc32 to be touched anymore. 
do { if (*out_pos == out_size) return LZMA_OK; out[*out_pos] = (coder->crc32 >> (coder->pos * 8)) & 0xFF; ++*out_pos; } while (++coder->pos < 4); return LZMA_STREAM_END; default: assert(0); return LZMA_PROG_ERROR; } out: // Update the CRC32. // // Avoid null pointer + 0 (undefined behavior) in "out + out_start". // In such a case we had no input and thus out_used == 0. { const size_t out_used = *out_pos - out_start; if (out_used > 0) coder->crc32 = lzma_crc32(out + out_start, out_used, coder->crc32); } return ret; } static void index_encoder_end(void *coder, const lzma_allocator *allocator) { lzma_free(coder, allocator); return; } static void index_encoder_reset(lzma_index_coder *coder, const lzma_index *i) { lzma_index_iter_init(&coder->iter, i); coder->sequence = SEQ_INDICATOR; coder->index = i; coder->pos = 0; coder->crc32 = 0; return; } extern lzma_ret lzma_index_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_index *i) { lzma_next_coder_init(&lzma_index_encoder_init, next, allocator); if (i == NULL) return LZMA_PROG_ERROR; if (next->coder == NULL) { next->coder = lzma_alloc(sizeof(lzma_index_coder), allocator); if (next->coder == NULL) return LZMA_MEM_ERROR; next->code = &index_encode; next->end = &index_encoder_end; } index_encoder_reset(next->coder, i); return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_index_encoder(lzma_stream *strm, const lzma_index *i) { lzma_next_strm_init(lzma_index_encoder_init, strm, i); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_index_buffer_encode(const lzma_index *i, uint8_t *out, size_t *out_pos, size_t out_size) { // Validate the arguments. if (i == NULL || out == NULL || out_pos == NULL || *out_pos > out_size) return LZMA_PROG_ERROR; // Don't try to encode if there's not enough output space. 
if (out_size - *out_pos < lzma_index_size(i)) return LZMA_BUF_ERROR; // The Index encoder needs just one small data structure so we can // allocate it on stack. lzma_index_coder coder; index_encoder_reset(&coder, i); // Do the actual encoding. This should never fail, but store // the original *out_pos just in case. const size_t out_start = *out_pos; lzma_ret ret = index_encode(&coder, NULL, NULL, NULL, 0, out, out_pos, out_size, LZMA_RUN); if (ret == LZMA_STREAM_END) { ret = LZMA_OK; } else { // We should never get here, but just in case, restore the // output position and set the error accordingly if something // goes wrong and debugging isn't enabled. assert(0); *out_pos = out_start; ret = LZMA_PROG_ERROR; } return ret; } diff --git a/src/liblzma/common/index_hash.c b/src/liblzma/common/index_hash.c index caa5967ca496..b7f1b6b58d1a 100644 --- a/src/liblzma/common/index_hash.c +++ b/src/liblzma/common/index_hash.c @@ -1,342 +1,341 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file index_hash.c /// \brief Validates Index by using a hash function // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "common.h" #include "index.h" #include "check.h" typedef struct { /// Sum of the Block sizes (including Block Padding) lzma_vli blocks_size; /// Sum of the Uncompressed Size fields lzma_vli uncompressed_size; /// Number of Records lzma_vli count; /// Size of the List of Index Records as bytes lzma_vli index_list_size; /// Check calculated from Unpadded Sizes and Uncompressed Sizes. lzma_check_state check; } lzma_index_hash_info; struct lzma_index_hash_s { enum { SEQ_BLOCK, SEQ_COUNT, SEQ_UNPADDED, SEQ_UNCOMPRESSED, SEQ_PADDING_INIT, SEQ_PADDING, SEQ_CRC32, } sequence; /// Information collected while decoding the actual Blocks. lzma_index_hash_info blocks; /// Information collected from the Index field. 
lzma_index_hash_info records; /// Number of Records not fully decoded lzma_vli remaining; /// Unpadded Size currently being read from an Index Record. lzma_vli unpadded_size; /// Uncompressed Size currently being read from an Index Record. lzma_vli uncompressed_size; /// Position in variable-length integers when decoding them from /// the List of Records. size_t pos; /// CRC32 of the Index uint32_t crc32; }; extern LZMA_API(lzma_index_hash *) lzma_index_hash_init(lzma_index_hash *index_hash, const lzma_allocator *allocator) { if (index_hash == NULL) { index_hash = lzma_alloc(sizeof(lzma_index_hash), allocator); if (index_hash == NULL) return NULL; } index_hash->sequence = SEQ_BLOCK; index_hash->blocks.blocks_size = 0; index_hash->blocks.uncompressed_size = 0; index_hash->blocks.count = 0; index_hash->blocks.index_list_size = 0; index_hash->records.blocks_size = 0; index_hash->records.uncompressed_size = 0; index_hash->records.count = 0; index_hash->records.index_list_size = 0; index_hash->unpadded_size = 0; index_hash->uncompressed_size = 0; index_hash->pos = 0; index_hash->crc32 = 0; // These cannot fail because LZMA_CHECK_BEST is known to be supported. (void)lzma_check_init(&index_hash->blocks.check, LZMA_CHECK_BEST); (void)lzma_check_init(&index_hash->records.check, LZMA_CHECK_BEST); return index_hash; } extern LZMA_API(void) lzma_index_hash_end(lzma_index_hash *index_hash, const lzma_allocator *allocator) { lzma_free(index_hash, allocator); return; } extern LZMA_API(lzma_vli) lzma_index_hash_size(const lzma_index_hash *index_hash) { // Get the size of the Index from ->blocks instead of ->records for // cases where application wants to know the Index Size before // decoding the Index. return index_size(index_hash->blocks.count, index_hash->blocks.index_list_size); } /// Updates the sizes and the hash without any validation. 
static void hash_append(lzma_index_hash_info *info, lzma_vli unpadded_size, lzma_vli uncompressed_size) { info->blocks_size += vli_ceil4(unpadded_size); info->uncompressed_size += uncompressed_size; info->index_list_size += lzma_vli_size(unpadded_size) + lzma_vli_size(uncompressed_size); ++info->count; const lzma_vli sizes[2] = { unpadded_size, uncompressed_size }; lzma_check_update(&info->check, LZMA_CHECK_BEST, (const uint8_t *)(sizes), sizeof(sizes)); return; } extern LZMA_API(lzma_ret) lzma_index_hash_append(lzma_index_hash *index_hash, lzma_vli unpadded_size, lzma_vli uncompressed_size) { // Validate the arguments. if (index_hash == NULL || index_hash->sequence != SEQ_BLOCK || unpadded_size < UNPADDED_SIZE_MIN || unpadded_size > UNPADDED_SIZE_MAX || uncompressed_size > LZMA_VLI_MAX) return LZMA_PROG_ERROR; // Update the hash. hash_append(&index_hash->blocks, unpadded_size, uncompressed_size); // Validate the properties of *info are still in allowed limits. if (index_hash->blocks.blocks_size > LZMA_VLI_MAX || index_hash->blocks.uncompressed_size > LZMA_VLI_MAX || index_size(index_hash->blocks.count, index_hash->blocks.index_list_size) > LZMA_BACKWARD_SIZE_MAX || index_stream_size(index_hash->blocks.blocks_size, index_hash->blocks.count, index_hash->blocks.index_list_size) > LZMA_VLI_MAX) return LZMA_DATA_ERROR; return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_index_hash_decode(lzma_index_hash *index_hash, const uint8_t *in, size_t *in_pos, size_t in_size) { // Catch zero input buffer here, because in contrast to Index encoder // and decoder functions, applications call this function directly // instead of via lzma_code(), which does the buffer checking. if (*in_pos >= in_size) return LZMA_BUF_ERROR; // NOTE: This function has many similarities to index_encode() and // index_decode() functions found from index_encoder.c and // index_decoder.c. See the comments especially in index_encoder.c. 
const size_t in_start = *in_pos; lzma_ret ret = LZMA_OK; while (*in_pos < in_size) switch (index_hash->sequence) { case SEQ_BLOCK: // Check the Index Indicator is present. if (in[(*in_pos)++] != INDEX_INDICATOR) return LZMA_DATA_ERROR; index_hash->sequence = SEQ_COUNT; break; case SEQ_COUNT: { ret = lzma_vli_decode(&index_hash->remaining, &index_hash->pos, in, in_pos, in_size); if (ret != LZMA_STREAM_END) goto out; // The count must match the count of the Blocks decoded. if (index_hash->remaining != index_hash->blocks.count) return LZMA_DATA_ERROR; ret = LZMA_OK; index_hash->pos = 0; // Handle the special case when there are no Blocks. index_hash->sequence = index_hash->remaining == 0 ? SEQ_PADDING_INIT : SEQ_UNPADDED; break; } case SEQ_UNPADDED: case SEQ_UNCOMPRESSED: { lzma_vli *size = index_hash->sequence == SEQ_UNPADDED ? &index_hash->unpadded_size : &index_hash->uncompressed_size; ret = lzma_vli_decode(size, &index_hash->pos, in, in_pos, in_size); if (ret != LZMA_STREAM_END) goto out; ret = LZMA_OK; index_hash->pos = 0; if (index_hash->sequence == SEQ_UNPADDED) { if (index_hash->unpadded_size < UNPADDED_SIZE_MIN || index_hash->unpadded_size > UNPADDED_SIZE_MAX) return LZMA_DATA_ERROR; index_hash->sequence = SEQ_UNCOMPRESSED; } else { // Update the hash. hash_append(&index_hash->records, index_hash->unpadded_size, index_hash->uncompressed_size); // Verify that we don't go over the known sizes. Note // that this validation is simpler than the one used // in lzma_index_hash_append(), because here we know // that values in index_hash->blocks are already // validated and we are fine as long as we don't // exceed them in index_hash->records. if (index_hash->blocks.blocks_size < index_hash->records.blocks_size || index_hash->blocks.uncompressed_size < index_hash->records.uncompressed_size || index_hash->blocks.index_list_size < index_hash->records.index_list_size) return LZMA_DATA_ERROR; // Check if this was the last Record. 
index_hash->sequence = --index_hash->remaining == 0 ? SEQ_PADDING_INIT : SEQ_UNPADDED; } break; } case SEQ_PADDING_INIT: index_hash->pos = (LZMA_VLI_C(4) - index_size_unpadded( index_hash->records.count, index_hash->records.index_list_size)) & 3; - index_hash->sequence = SEQ_PADDING; - // Fall through + index_hash->sequence = SEQ_PADDING; + FALLTHROUGH; case SEQ_PADDING: if (index_hash->pos > 0) { --index_hash->pos; if (in[(*in_pos)++] != 0x00) return LZMA_DATA_ERROR; break; } // Compare the sizes. if (index_hash->blocks.blocks_size != index_hash->records.blocks_size || index_hash->blocks.uncompressed_size != index_hash->records.uncompressed_size || index_hash->blocks.index_list_size != index_hash->records.index_list_size) return LZMA_DATA_ERROR; // Finish the hashes and compare them. lzma_check_finish(&index_hash->blocks.check, LZMA_CHECK_BEST); lzma_check_finish(&index_hash->records.check, LZMA_CHECK_BEST); if (memcmp(index_hash->blocks.check.buffer.u8, index_hash->records.check.buffer.u8, lzma_check_size(LZMA_CHECK_BEST)) != 0) return LZMA_DATA_ERROR; // Finish the CRC32 calculation. index_hash->crc32 = lzma_crc32(in + in_start, *in_pos - in_start, index_hash->crc32); index_hash->sequence = SEQ_CRC32; - - // Fall through + FALLTHROUGH; case SEQ_CRC32: do { if (*in_pos == in_size) return LZMA_OK; if (((index_hash->crc32 >> (index_hash->pos * 8)) & 0xFF) != in[(*in_pos)++]) { #ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION return LZMA_DATA_ERROR; #endif } } while (++index_hash->pos < 4); return LZMA_STREAM_END; default: assert(0); return LZMA_PROG_ERROR; } out: // Update the CRC32. // // Avoid null pointer + 0 (undefined behavior) in "in + in_start". // In such a case we had no input and thus in_used == 0. 
{ const size_t in_used = *in_pos - in_start; if (in_used > 0) index_hash->crc32 = lzma_crc32(in + in_start, in_used, index_hash->crc32); } return ret; } diff --git a/src/liblzma/common/lzip_decoder.c b/src/liblzma/common/lzip_decoder.c index 651a0ae712c8..4dff2d5889ea 100644 --- a/src/liblzma/common/lzip_decoder.c +++ b/src/liblzma/common/lzip_decoder.c @@ -1,417 +1,413 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file lzip_decoder.c /// \brief Decodes .lz (lzip) files // // Author: Michał Górny // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "lzip_decoder.h" #include "lzma_decoder.h" #include "check.h" // .lz format version 0 lacks the 64-bit Member size field in the footer. #define LZIP_V0_FOOTER_SIZE 12 #define LZIP_V1_FOOTER_SIZE 20 #define LZIP_FOOTER_SIZE_MAX LZIP_V1_FOOTER_SIZE // lc/lp/pb are hardcoded in the .lz format. #define LZIP_LC 3 #define LZIP_LP 0 #define LZIP_PB 2 typedef struct { enum { SEQ_ID_STRING, SEQ_VERSION, SEQ_DICT_SIZE, SEQ_CODER_INIT, SEQ_LZMA_STREAM, SEQ_MEMBER_FOOTER, } sequence; /// .lz member format version uint32_t version; /// CRC32 of the uncompressed data in the .lz member uint32_t crc32; /// Uncompressed size of the .lz member uint64_t uncompressed_size; /// Compressed size of the .lz member uint64_t member_size; /// Memory usage limit uint64_t memlimit; /// Amount of memory actually needed uint64_t memusage; /// If true, LZMA_GET_CHECK is returned after decoding the header /// fields. As all files use CRC32 this is redundant but it's /// implemented anyway since the initialization functions supports /// all other flags in addition to LZMA_TELL_ANY_CHECK. bool tell_any_check; /// If true, we won't calculate or verify the CRC32 of /// the uncompressed data. 
bool ignore_check; /// If true, we will decode concatenated .lz members and stop if /// non-.lz data is seen after at least one member has been /// successfully decoded. bool concatenated; /// When decoding concatenated .lz members, this is true as long as /// we are decoding the first .lz member. This is needed to avoid /// incorrect LZMA_FORMAT_ERROR in case there is non-.lz data at /// the end of the file. bool first_member; /// Reading position in the header and footer fields size_t pos; /// Buffer to hold the .lz footer fields uint8_t buffer[LZIP_FOOTER_SIZE_MAX]; /// Options decoded from the .lz header that needed to initialize /// the LZMA1 decoder. lzma_options_lzma options; /// LZMA1 decoder lzma_next_coder lzma_decoder; } lzma_lzip_coder; static lzma_ret lzip_decode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_lzip_coder *coder = coder_ptr; while (true) switch (coder->sequence) { case SEQ_ID_STRING: { // The "ID string" or magic bytes are "LZIP" in US-ASCII. const uint8_t lzip_id_string[4] = { 0x4C, 0x5A, 0x49, 0x50 }; while (coder->pos < sizeof(lzip_id_string)) { if (*in_pos >= in_size) { // If we are on the 2nd+ concatenated member // and the input ends before we can read // the magic bytes, we discard the bytes that // were already read (up to 3) and finish. // See the reasoning below. return !coder->first_member && action == LZMA_FINISH ? LZMA_STREAM_END : LZMA_OK; } if (in[*in_pos] != lzip_id_string[coder->pos]) { // The .lz format allows putting non-.lz data // at the end of the file. If we have seen // at least one valid .lz member already, // then we won't consume the byte at *in_pos // and will return LZMA_STREAM_END. This way // apps can easily locate and read the non-.lz // data after the .lz member(s). 
// // NOTE: If the first 1-3 bytes of the non-.lz // data match the .lz ID string then the first // 1-3 bytes of the junk will get ignored by // us. If apps want to properly locate the // trailing data they must ensure that the // first byte of their custom data isn't the // same as the first byte of .lz ID string. // With the liblzma API we cannot rewind the // input position across calls to lzma_code(). return !coder->first_member ? LZMA_STREAM_END : LZMA_FORMAT_ERROR; } ++*in_pos; ++coder->pos; } coder->pos = 0; coder->crc32 = 0; coder->uncompressed_size = 0; coder->member_size = sizeof(lzip_id_string); coder->sequence = SEQ_VERSION; + FALLTHROUGH; } - // Fall through - case SEQ_VERSION: if (*in_pos >= in_size) return LZMA_OK; coder->version = in[(*in_pos)++]; // We support version 0 and unextended version 1. if (coder->version > 1) return LZMA_OPTIONS_ERROR; ++coder->member_size; coder->sequence = SEQ_DICT_SIZE; // .lz versions 0 and 1 use CRC32 as the integrity check // so if the application wanted to know that // (LZMA_TELL_ANY_CHECK) we can tell it now. if (coder->tell_any_check) return LZMA_GET_CHECK; - // Fall through + FALLTHROUGH; case SEQ_DICT_SIZE: { if (*in_pos >= in_size) return LZMA_OK; const uint32_t ds = in[(*in_pos)++]; ++coder->member_size; // The five lowest bits are for the base-2 logarithm of // the dictionary size and the highest three bits are // the fractional part (0/16 to 7/16) that will be // subtracted to get the final value. // // For example, with 0xB5: // b2log = 21 // fracnum = 5 // dict_size = 2^21 - 2^21 * 5 / 16 = 1408 KiB const uint32_t b2log = ds & 0x1F; const uint32_t fracnum = ds >> 5; // The format versions 0 and 1 allow dictionary size in the // range [4 KiB, 512 MiB]. 
if (b2log < 12 || b2log > 29 || (b2log == 12 && fracnum > 0)) return LZMA_DATA_ERROR; // 2^[b2log] - 2^[b2log] * [fracnum] / 16 // = 2^[b2log] - [fracnum] * 2^([b2log] - 4) coder->options.dict_size = (UINT32_C(1) << b2log) - (fracnum << (b2log - 4)); assert(coder->options.dict_size >= 4096); assert(coder->options.dict_size <= (UINT32_C(512) << 20)); coder->options.preset_dict = NULL; coder->options.lc = LZIP_LC; coder->options.lp = LZIP_LP; coder->options.pb = LZIP_PB; // Calculate the memory usage. coder->memusage = lzma_lzma_decoder_memusage(&coder->options) + LZMA_MEMUSAGE_BASE; // Initialization is a separate step because if we return // LZMA_MEMLIMIT_ERROR we need to be able to restart after // the memlimit has been increased. coder->sequence = SEQ_CODER_INIT; + FALLTHROUGH; } - // Fall through - case SEQ_CODER_INIT: { if (coder->memusage > coder->memlimit) return LZMA_MEMLIMIT_ERROR; const lzma_filter_info filters[2] = { { .id = LZMA_FILTER_LZMA1, .init = &lzma_lzma_decoder_init, .options = &coder->options, }, { .init = NULL, } }; return_if_error(lzma_next_filter_init(&coder->lzma_decoder, allocator, filters)); coder->crc32 = 0; coder->sequence = SEQ_LZMA_STREAM; + FALLTHROUGH; } - // Fall through - case SEQ_LZMA_STREAM: { const size_t in_start = *in_pos; const size_t out_start = *out_pos; const lzma_ret ret = coder->lzma_decoder.code( coder->lzma_decoder.coder, allocator, in, in_pos, in_size, out, out_pos, out_size, action); const size_t out_used = *out_pos - out_start; coder->member_size += *in_pos - in_start; coder->uncompressed_size += out_used; // Don't update the CRC32 if the integrity check will be // ignored or if there was no new output. The latter is // important in case out == NULL to avoid null pointer + 0 // which is undefined behavior. 
if (!coder->ignore_check && out_used > 0) coder->crc32 = lzma_crc32(out + out_start, out_used, coder->crc32); if (ret != LZMA_STREAM_END) return ret; coder->sequence = SEQ_MEMBER_FOOTER; + FALLTHROUGH; } - // Fall through - case SEQ_MEMBER_FOOTER: { // The footer of .lz version 0 lacks the Member size field. // This is the only difference between version 0 and // unextended version 1 formats. const size_t footer_size = coder->version == 0 ? LZIP_V0_FOOTER_SIZE : LZIP_V1_FOOTER_SIZE; // Copy the CRC32, Data size, and Member size fields to // the internal buffer. lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos, footer_size); // Return if we didn't get the whole footer yet. if (coder->pos < footer_size) return LZMA_OK; coder->pos = 0; coder->member_size += footer_size; // Check that the footer fields match the observed data. if (!coder->ignore_check && coder->crc32 != read32le(&coder->buffer[0])) return LZMA_DATA_ERROR; if (coder->uncompressed_size != read64le(&coder->buffer[4])) return LZMA_DATA_ERROR; if (coder->version > 0) { // .lz version 0 has no Member size field. if (coder->member_size != read64le(&coder->buffer[12])) return LZMA_DATA_ERROR; } // Decoding is finished if we weren't requested to decode // more than one .lz member. 
if (!coder->concatenated) return LZMA_STREAM_END; coder->first_member = false; coder->sequence = SEQ_ID_STRING; break; } default: assert(0); return LZMA_PROG_ERROR; } // Never reached } static void lzip_decoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_lzip_coder *coder = coder_ptr; lzma_next_end(&coder->lzma_decoder, allocator); lzma_free(coder, allocator); return; } static lzma_check lzip_decoder_get_check(const void *coder_ptr lzma_attribute((__unused__))) { return LZMA_CHECK_CRC32; } static lzma_ret lzip_decoder_memconfig(void *coder_ptr, uint64_t *memusage, uint64_t *old_memlimit, uint64_t new_memlimit) { lzma_lzip_coder *coder = coder_ptr; *memusage = coder->memusage; *old_memlimit = coder->memlimit; if (new_memlimit != 0) { if (new_memlimit < coder->memusage) return LZMA_MEMLIMIT_ERROR; coder->memlimit = new_memlimit; } return LZMA_OK; } extern lzma_ret lzma_lzip_decoder_init( lzma_next_coder *next, const lzma_allocator *allocator, uint64_t memlimit, uint32_t flags) { lzma_next_coder_init(&lzma_lzip_decoder_init, next, allocator); if (flags & ~LZMA_SUPPORTED_FLAGS) return LZMA_OPTIONS_ERROR; lzma_lzip_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_lzip_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &lzip_decode; next->end = &lzip_decoder_end; next->get_check = &lzip_decoder_get_check; next->memconfig = &lzip_decoder_memconfig; coder->lzma_decoder = LZMA_NEXT_CODER_INIT; } coder->sequence = SEQ_ID_STRING; coder->memlimit = my_max(1, memlimit); coder->memusage = LZMA_MEMUSAGE_BASE; coder->tell_any_check = (flags & LZMA_TELL_ANY_CHECK) != 0; coder->ignore_check = (flags & LZMA_IGNORE_CHECK) != 0; coder->concatenated = (flags & LZMA_CONCATENATED) != 0; coder->first_member = true; coder->pos = 0; return LZMA_OK; } extern LZMA_API(lzma_ret) lzma_lzip_decoder(lzma_stream *strm, uint64_t memlimit, uint32_t flags) { lzma_next_strm_init(lzma_lzip_decoder_init, strm, 
memlimit, flags); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } diff --git a/src/liblzma/common/memcmplen.h b/src/liblzma/common/memcmplen.h index 394a4856dd6a..82e908542295 100644 --- a/src/liblzma/common/memcmplen.h +++ b/src/liblzma/common/memcmplen.h @@ -1,188 +1,192 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file memcmplen.h /// \brief Optimized comparison of two buffers // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef LZMA_MEMCMPLEN_H #define LZMA_MEMCMPLEN_H #include "common.h" #ifdef HAVE_IMMINTRIN_H # include <immintrin.h> #endif // Only include <intrin.h> if it is needed. The header is only needed // on Windows when using an MSVC compatible compiler. The Intel compiler // can use the intrinsics without the header file. #if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \ && defined(_MSC_VER) \ && (defined(_M_X64) \ || defined(_M_ARM64) || defined(_M_ARM64EC)) \ && !defined(__INTEL_COMPILER) # include <intrin.h> #endif /// Find out how many equal bytes the two buffers have. /// /// \param buf1 First buffer /// \param buf2 Second buffer /// \param len How many bytes have already been compared and will /// be assumed to match /// \param limit How many bytes to compare at most, including the /// already-compared bytes. This must be significantly /// smaller than UINT32_MAX to avoid integer overflows. /// Up to LZMA_MEMCMPLEN_EXTRA bytes may be read past /// the specified limit from both buf1 and buf2. /// /// \return Number of equal bytes in the buffers is returned. /// This is always at least len and at most limit. /// /// \note LZMA_MEMCMPLEN_EXTRA defines how many extra bytes may be read. /// It's rounded up to 2^n. This extra amount needs to be /// allocated in the buffers being used. It needs to be /// initialized too to keep Valgrind quiet.
static lzma_always_inline uint32_t lzma_memcmplen(const uint8_t *buf1, const uint8_t *buf2, uint32_t len, uint32_t limit) { assert(len <= limit); assert(limit <= UINT32_MAX / 2); #if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \ && (((TUKLIB_GNUC_REQ(3, 4) || defined(__clang__)) \ - && (defined(__x86_64__) \ - || defined(__aarch64__))) \ + && SIZE_MAX == UINT64_MAX) \ || (defined(__INTEL_COMPILER) && defined(__x86_64__)) \ || (defined(__INTEL_COMPILER) && defined(_M_X64)) \ || (defined(_MSC_VER) && (defined(_M_X64) \ || defined(_M_ARM64) || defined(_M_ARM64EC)))) // This is only for x86-64 and ARM64 for now. This might be fine on - // other 64-bit processors too. On big endian one should use xor - // instead of subtraction and switch to __builtin_clzll(). + // other 64-bit processors too. // // Reasons to use subtraction instead of xor: // // - On some x86-64 processors (Intel Sandy Bridge to Tiger Lake), // sub+jz and sub+jnz can be fused but xor+jz or xor+jnz cannot. // Thus using subtraction has potential to be a tiny amount faster // since the code checks if the quotient is non-zero. // // - Some processors (Intel Pentium 4) used to have more ALU // resources for add/sub instructions than and/or/xor. // // The processor info is based on Agner Fog's microarchitecture.pdf // version 2023-05-26. 
https://www.agner.org/optimize/ #define LZMA_MEMCMPLEN_EXTRA 8 while (len < limit) { +# ifdef WORDS_BIGENDIAN + const uint64_t x = read64ne(buf1 + len) ^ read64ne(buf2 + len); +# else const uint64_t x = read64ne(buf1 + len) - read64ne(buf2 + len); +# endif if (x != 0) { // MSVC or Intel C compiler on Windows # if defined(_MSC_VER) || defined(__INTEL_COMPILER) unsigned long tmp; _BitScanForward64(&tmp, x); len += (uint32_t)tmp >> 3; // GCC, Clang, or Intel C compiler +# elif defined(WORDS_BIGENDIAN) + len += (uint32_t)__builtin_clzll(x) >> 3; # else len += (uint32_t)__builtin_ctzll(x) >> 3; # endif return my_min(len, limit); } len += 8; } return limit; #elif defined(TUKLIB_FAST_UNALIGNED_ACCESS) \ && defined(HAVE__MM_MOVEMASK_EPI8) \ && (defined(__SSE2__) \ || (defined(_MSC_VER) && defined(_M_IX86_FP) \ && _M_IX86_FP >= 2)) // NOTE: This will use 128-bit unaligned access which // TUKLIB_FAST_UNALIGNED_ACCESS wasn't meant to permit, // but it's convenient here since this is x86-only. // // SSE2 version for 32-bit and 64-bit x86. On x86-64 the above // version is sometimes significantly faster and sometimes // slightly slower than this SSE2 version, so this SSE2 // version isn't used on x86-64. 
# define LZMA_MEMCMPLEN_EXTRA 16 while (len < limit) { const uint32_t x = 0xFFFF ^ (uint32_t)_mm_movemask_epi8( _mm_cmpeq_epi8( _mm_loadu_si128((const __m128i *)(buf1 + len)), _mm_loadu_si128((const __m128i *)(buf2 + len)))); if (x != 0) { len += ctz32(x); return my_min(len, limit); } len += 16; } return limit; #elif defined(TUKLIB_FAST_UNALIGNED_ACCESS) && !defined(WORDS_BIGENDIAN) // Generic 32-bit little endian method # define LZMA_MEMCMPLEN_EXTRA 4 while (len < limit) { uint32_t x = read32ne(buf1 + len) - read32ne(buf2 + len); if (x != 0) { if ((x & 0xFFFF) == 0) { len += 2; x >>= 16; } if ((x & 0xFF) == 0) ++len; return my_min(len, limit); } len += 4; } return limit; #elif defined(TUKLIB_FAST_UNALIGNED_ACCESS) && defined(WORDS_BIGENDIAN) // Generic 32-bit big endian method # define LZMA_MEMCMPLEN_EXTRA 4 while (len < limit) { uint32_t x = read32ne(buf1 + len) ^ read32ne(buf2 + len); if (x != 0) { if ((x & 0xFFFF0000) == 0) { len += 2; x <<= 16; } if ((x & 0xFF000000) == 0) ++len; return my_min(len, limit); } len += 4; } return limit; #else // Simple portable version that doesn't use unaligned access. 
# define LZMA_MEMCMPLEN_EXTRA 0 while (len < limit && buf1[len] == buf2[len]) ++len; return len; #endif } #endif diff --git a/src/liblzma/common/stream_decoder.c b/src/liblzma/common/stream_decoder.c index 7f426841366a..94004b74a165 100644 --- a/src/liblzma/common/stream_decoder.c +++ b/src/liblzma/common/stream_decoder.c @@ -1,473 +1,469 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file stream_decoder.c /// \brief Decodes .xz Streams // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "stream_decoder.h" #include "block_decoder.h" #include "index.h" typedef struct { enum { SEQ_STREAM_HEADER, SEQ_BLOCK_HEADER, SEQ_BLOCK_INIT, SEQ_BLOCK_RUN, SEQ_INDEX, SEQ_STREAM_FOOTER, SEQ_STREAM_PADDING, } sequence; /// Block decoder lzma_next_coder block_decoder; /// Block options decoded by the Block Header decoder and used by /// the Block decoder. lzma_block block_options; /// Stream Flags from Stream Header lzma_stream_flags stream_flags; /// Index is hashed so that it can be compared to the sizes of Blocks /// with O(1) memory usage. lzma_index_hash *index_hash; /// Memory usage limit uint64_t memlimit; /// Amount of memory actually needed (only an estimate) uint64_t memusage; /// If true, LZMA_NO_CHECK is returned if the Stream has /// no integrity check. bool tell_no_check; /// If true, LZMA_UNSUPPORTED_CHECK is returned if the Stream has /// an integrity check that isn't supported by this liblzma build. bool tell_unsupported_check; /// If true, LZMA_GET_CHECK is returned after decoding Stream Header. bool tell_any_check; /// If true, we will tell the Block decoder to skip calculating /// and verifying the integrity check. bool ignore_check; /// If true, we will decode concatenated Streams that possibly have /// Stream Padding between or after them. 
LZMA_STREAM_END is returned /// once the application isn't giving us any new input (LZMA_FINISH), /// and we aren't in the middle of a Stream, and possible /// Stream Padding is a multiple of four bytes. bool concatenated; /// When decoding concatenated Streams, this is true as long as we /// are decoding the first Stream. This is needed to avoid misleading /// LZMA_FORMAT_ERROR in case the later Streams don't have valid magic /// bytes. bool first_stream; /// Write position in buffer[] and position in Stream Padding size_t pos; /// Buffer to hold Stream Header, Block Header, and Stream Footer. /// Block Header has biggest maximum size. uint8_t buffer[LZMA_BLOCK_HEADER_SIZE_MAX]; } lzma_stream_coder; static lzma_ret stream_decoder_reset(lzma_stream_coder *coder, const lzma_allocator *allocator) { // Initialize the Index hash used to verify the Index. coder->index_hash = lzma_index_hash_init(coder->index_hash, allocator); if (coder->index_hash == NULL) return LZMA_MEM_ERROR; // Reset the rest of the variables. coder->sequence = SEQ_STREAM_HEADER; coder->pos = 0; return LZMA_OK; } static lzma_ret stream_decode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_stream_coder *coder = coder_ptr; // When decoding the actual Block, it may be able to produce more // output even if we don't give it any new input. while (true) switch (coder->sequence) { case SEQ_STREAM_HEADER: { // Copy the Stream Header to the internal buffer. lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos, LZMA_STREAM_HEADER_SIZE); // Return if we didn't get the whole Stream Header yet. if (coder->pos < LZMA_STREAM_HEADER_SIZE) return LZMA_OK; coder->pos = 0; // Decode the Stream Header. 
const lzma_ret ret = lzma_stream_header_decode( &coder->stream_flags, coder->buffer); if (ret != LZMA_OK) return ret == LZMA_FORMAT_ERROR && !coder->first_stream ? LZMA_DATA_ERROR : ret; // If we are decoding concatenated Streams, and the later // Streams have invalid Header Magic Bytes, we give // LZMA_DATA_ERROR instead of LZMA_FORMAT_ERROR. coder->first_stream = false; // Copy the type of the Check so that Block Header and Block // decoders see it. coder->block_options.check = coder->stream_flags.check; // Even if we return LZMA_*_CHECK below, we want // to continue from Block Header decoding. coder->sequence = SEQ_BLOCK_HEADER; // Detect if there's no integrity check or if it is // unsupported if those were requested by the application. if (coder->tell_no_check && coder->stream_flags.check == LZMA_CHECK_NONE) return LZMA_NO_CHECK; if (coder->tell_unsupported_check && !lzma_check_is_supported( coder->stream_flags.check)) return LZMA_UNSUPPORTED_CHECK; if (coder->tell_any_check) return LZMA_GET_CHECK; - } - // Fall through + FALLTHROUGH; + } case SEQ_BLOCK_HEADER: { if (*in_pos >= in_size) return LZMA_OK; if (coder->pos == 0) { // Detect if it's Index. if (in[*in_pos] == INDEX_INDICATOR) { coder->sequence = SEQ_INDEX; break; } // Calculate the size of the Block Header. Note that // Block Header decoder wants to see this byte too // so don't advance *in_pos. coder->block_options.header_size = lzma_block_header_size_decode( in[*in_pos]); } // Copy the Block Header to the internal buffer. lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos, coder->block_options.header_size); // Return if we didn't get the whole Block Header yet. if (coder->pos < coder->block_options.header_size) return LZMA_OK; coder->pos = 0; coder->sequence = SEQ_BLOCK_INIT; + FALLTHROUGH; } - // Fall through - case SEQ_BLOCK_INIT: { // Checking memusage and doing the initialization needs // its own sequence point because we need to be able to // retry if we return LZMA_MEMLIMIT_ERROR. 
// Version 1 is needed to support the .ignore_check option. coder->block_options.version = 1; // Set up a buffer to hold the filter chain. Block Header // decoder will initialize all members of this array so // we don't need to do it here. lzma_filter filters[LZMA_FILTERS_MAX + 1]; coder->block_options.filters = filters; // Decode the Block Header. return_if_error(lzma_block_header_decode(&coder->block_options, allocator, coder->buffer)); // If LZMA_IGNORE_CHECK was used, this flag needs to be set. // It has to be set after lzma_block_header_decode() because // it always resets this to false. coder->block_options.ignore_check = coder->ignore_check; // Check the memory usage limit. const uint64_t memusage = lzma_raw_decoder_memusage(filters); lzma_ret ret; if (memusage == UINT64_MAX) { // One or more unknown Filter IDs. ret = LZMA_OPTIONS_ERROR; } else { // Now we can set coder->memusage since we know that // the filter chain is valid. We don't want // lzma_memusage() to return UINT64_MAX in case of // invalid filter chain. coder->memusage = memusage; if (memusage > coder->memlimit) { // The chain would need too much memory. ret = LZMA_MEMLIMIT_ERROR; } else { // Memory usage is OK. // Initialize the Block decoder. ret = lzma_block_decoder_init( &coder->block_decoder, allocator, &coder->block_options); } } // Free the allocated filter options since they are needed // only to initialize the Block decoder. lzma_filters_free(filters, allocator); coder->block_options.filters = NULL; // Check if memory usage calculation and Block decoder // initialization succeeded. if (ret != LZMA_OK) return ret; coder->sequence = SEQ_BLOCK_RUN; + FALLTHROUGH; } - // Fall through - case SEQ_BLOCK_RUN: { const lzma_ret ret = coder->block_decoder.code( coder->block_decoder.coder, allocator, in, in_pos, in_size, out, out_pos, out_size, action); if (ret != LZMA_STREAM_END) return ret; // Block decoded successfully. Add the new size pair to // the Index hash. 
return_if_error(lzma_index_hash_append(coder->index_hash, lzma_block_unpadded_size( &coder->block_options), coder->block_options.uncompressed_size)); coder->sequence = SEQ_BLOCK_HEADER; break; } case SEQ_INDEX: { // If we don't have any input, don't call // lzma_index_hash_decode() since it would return // LZMA_BUF_ERROR, which we must not do here. if (*in_pos >= in_size) return LZMA_OK; // Decode the Index and compare it to the hash calculated // from the sizes of the Blocks (if any). const lzma_ret ret = lzma_index_hash_decode(coder->index_hash, in, in_pos, in_size); if (ret != LZMA_STREAM_END) return ret; coder->sequence = SEQ_STREAM_FOOTER; + FALLTHROUGH; } - // Fall through - case SEQ_STREAM_FOOTER: { // Copy the Stream Footer to the internal buffer. lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos, LZMA_STREAM_HEADER_SIZE); // Return if we didn't get the whole Stream Footer yet. if (coder->pos < LZMA_STREAM_HEADER_SIZE) return LZMA_OK; coder->pos = 0; // Decode the Stream Footer. The decoder gives // LZMA_FORMAT_ERROR if the magic bytes don't match, // so convert that return code to LZMA_DATA_ERROR. lzma_stream_flags footer_flags; const lzma_ret ret = lzma_stream_footer_decode( &footer_flags, coder->buffer); if (ret != LZMA_OK) return ret == LZMA_FORMAT_ERROR ? LZMA_DATA_ERROR : ret; // Check that Index Size stored in the Stream Footer matches // the real size of the Index field. if (lzma_index_hash_size(coder->index_hash) != footer_flags.backward_size) return LZMA_DATA_ERROR; // Compare that the Stream Flags fields are identical in // both Stream Header and Stream Footer. return_if_error(lzma_stream_flags_compare( &coder->stream_flags, &footer_flags)); if (!coder->concatenated) return LZMA_STREAM_END; coder->sequence = SEQ_STREAM_PADDING; + FALLTHROUGH; } - // Fall through - case SEQ_STREAM_PADDING: assert(coder->concatenated); // Skip over possible Stream Padding. 
while (true) { if (*in_pos >= in_size) { // Unless LZMA_FINISH was used, we cannot // know if there's more input coming later. if (action != LZMA_FINISH) return LZMA_OK; // Stream Padding must be a multiple of // four bytes. return coder->pos == 0 ? LZMA_STREAM_END : LZMA_DATA_ERROR; } // If the byte is not zero, it probably indicates // beginning of a new Stream (or the file is corrupt). if (in[*in_pos] != 0x00) break; ++*in_pos; coder->pos = (coder->pos + 1) & 3; } // Stream Padding must be a multiple of four bytes (empty // Stream Padding is OK). if (coder->pos != 0) { ++*in_pos; return LZMA_DATA_ERROR; } // Prepare to decode the next Stream. return_if_error(stream_decoder_reset(coder, allocator)); break; default: assert(0); return LZMA_PROG_ERROR; } // Never reached } static void stream_decoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_stream_coder *coder = coder_ptr; lzma_next_end(&coder->block_decoder, allocator); lzma_index_hash_end(coder->index_hash, allocator); lzma_free(coder, allocator); return; } static lzma_check stream_decoder_get_check(const void *coder_ptr) { const lzma_stream_coder *coder = coder_ptr; return coder->stream_flags.check; } static lzma_ret stream_decoder_memconfig(void *coder_ptr, uint64_t *memusage, uint64_t *old_memlimit, uint64_t new_memlimit) { lzma_stream_coder *coder = coder_ptr; *memusage = coder->memusage; *old_memlimit = coder->memlimit; if (new_memlimit != 0) { if (new_memlimit < coder->memusage) return LZMA_MEMLIMIT_ERROR; coder->memlimit = new_memlimit; } return LZMA_OK; } extern lzma_ret lzma_stream_decoder_init( lzma_next_coder *next, const lzma_allocator *allocator, uint64_t memlimit, uint32_t flags) { lzma_next_coder_init(&lzma_stream_decoder_init, next, allocator); if (flags & ~LZMA_SUPPORTED_FLAGS) return LZMA_OPTIONS_ERROR; lzma_stream_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_stream_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = 
coder; next->code = &stream_decode; next->end = &stream_decoder_end; next->get_check = &stream_decoder_get_check; next->memconfig = &stream_decoder_memconfig; coder->block_decoder = LZMA_NEXT_CODER_INIT; coder->index_hash = NULL; } coder->memlimit = my_max(1, memlimit); coder->memusage = LZMA_MEMUSAGE_BASE; coder->tell_no_check = (flags & LZMA_TELL_NO_CHECK) != 0; coder->tell_unsupported_check = (flags & LZMA_TELL_UNSUPPORTED_CHECK) != 0; coder->tell_any_check = (flags & LZMA_TELL_ANY_CHECK) != 0; coder->ignore_check = (flags & LZMA_IGNORE_CHECK) != 0; coder->concatenated = (flags & LZMA_CONCATENATED) != 0; coder->first_stream = true; return stream_decoder_reset(coder, allocator); } extern LZMA_API(lzma_ret) lzma_stream_decoder(lzma_stream *strm, uint64_t memlimit, uint32_t flags) { lzma_next_strm_init(lzma_stream_decoder_init, strm, memlimit, flags); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } diff --git a/src/liblzma/common/stream_decoder_mt.c b/src/liblzma/common/stream_decoder_mt.c index 244624a47900..271f9b07c4b8 100644 --- a/src/liblzma/common/stream_decoder_mt.c +++ b/src/liblzma/common/stream_decoder_mt.c @@ -1,2017 +1,2015 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file stream_decoder_mt.c /// \brief Multithreaded .xz Stream decoder // // Authors: Sebastian Andrzej Siewior // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "common.h" #include "block_decoder.h" #include "stream_decoder.h" #include "index.h" #include "outqueue.h" typedef enum { /// Waiting for work. /// Main thread may change this to THR_RUN or THR_EXIT. THR_IDLE, /// Decoding is in progress. - /// Main thread may change this to THR_STOP or THR_EXIT. + /// Main thread may change this to THR_IDLE or THR_EXIT. /// The worker thread may change this to THR_IDLE. 
THR_RUN, - /// The main thread wants the thread to stop whatever it was doing - /// but not exit. Main thread may change this to THR_EXIT. - /// The worker thread may change this to THR_IDLE. - THR_STOP, - /// The main thread wants the thread to exit. THR_EXIT, } worker_state; typedef enum { /// Partial updates (storing of worker thread progress /// to lzma_outbuf) are disabled. PARTIAL_DISABLED, /// Main thread requests partial updates to be enabled but /// no partial update has been done by the worker thread yet. /// /// Changing from PARTIAL_DISABLED to PARTIAL_START requires /// use of the worker-thread mutex. Other transitions don't /// need a mutex. PARTIAL_START, /// Partial updates are enabled and the worker thread has done /// at least one partial update. PARTIAL_ENABLED, } partial_update_mode; struct worker_thread { /// Worker state is protected with our mutex. worker_state state; /// Input buffer that will contain the whole Block except Block Header. uint8_t *in; /// Amount of memory allocated for "in" size_t in_size; /// Number of bytes written to "in" by the main thread size_t in_filled; /// Number of bytes consumed from "in" by the worker thread. size_t in_pos; /// Amount of uncompressed data that has been decoded. This local /// copy is needed because updating outbuf->pos requires locking /// the main mutex (coder->mutex). size_t out_pos; /// Pointer to the main structure is needed to (1) lock the main /// mutex (coder->mutex) when updating outbuf->pos and (2) when /// putting this thread back to the stack of free threads. struct lzma_stream_coder *coder; /// The allocator is set by the main thread. Since a copy of the /// pointer is kept here, the application must not change the /// allocator before calling lzma_end(). const lzma_allocator *allocator; /// Output queue buffer to which the uncompressed data is written. lzma_outbuf *outbuf; /// Amount of compressed data that has already been decompressed. 
/// This is updated from in_pos when our mutex is locked. /// This is size_t, not uint64_t, because per-thread progress /// is limited to sizes of allocated buffers. size_t progress_in; /// Like progress_in but for uncompressed data. size_t progress_out; /// Updating outbuf->pos requires locking the main mutex /// (coder->mutex). Since the main thread will only read output /// from the oldest outbuf in the queue, only the worker thread /// that is associated with the oldest outbuf needs to update its /// outbuf->pos. This avoids useless mutex contention that would /// happen if all worker threads were frequently locking the main /// mutex to update their outbuf->pos. /// /// Only when partial_update is something else than PARTIAL_DISABLED, /// this worker thread will update outbuf->pos after each call to /// the Block decoder. partial_update_mode partial_update; /// Block decoder lzma_next_coder block_decoder; /// Thread-specific Block options are needed because the Block /// decoder modifies the struct given to it at initialization. lzma_block block_options; /// Filter chain memory usage uint64_t mem_filters; /// Next structure in the stack of free worker threads. struct worker_thread *next; mythread_mutex mutex; mythread_cond cond; /// The ID of this thread is used to join the thread /// when it's not needed anymore. mythread thread_id; }; struct lzma_stream_coder { enum { SEQ_STREAM_HEADER, SEQ_BLOCK_HEADER, SEQ_BLOCK_INIT, SEQ_BLOCK_THR_INIT, SEQ_BLOCK_THR_RUN, SEQ_BLOCK_DIRECT_INIT, SEQ_BLOCK_DIRECT_RUN, SEQ_INDEX_WAIT_OUTPUT, SEQ_INDEX_DECODE, SEQ_STREAM_FOOTER, SEQ_STREAM_PADDING, SEQ_ERROR, } sequence; /// Block decoder lzma_next_coder block_decoder; /// Every Block Header will be decoded into this structure. /// This is also used to initialize a Block decoder when in /// direct mode. In threaded mode, a thread-specific copy will /// be made for decoder initialization because the Block decoder /// will modify the structure given to it. 
lzma_block block_options; /// Buffer to hold a filter chain for Block Header decoding and /// initialization. These are freed after successful Block decoder /// initialization or at stream_decoder_mt_end(). The thread-specific /// copy of block_options won't hold a pointer to filters[] after /// initialization. lzma_filter filters[LZMA_FILTERS_MAX + 1]; /// Stream Flags from Stream Header lzma_stream_flags stream_flags; /// Index is hashed so that it can be compared to the sizes of Blocks /// with O(1) memory usage. lzma_index_hash *index_hash; /// Maximum wait time if cannot use all the input and cannot /// fill the output buffer. This is in milliseconds. uint32_t timeout; /// Error code from a worker thread. /// /// \note Use mutex. lzma_ret thread_error; /// Error code to return after pending output has been copied out. If /// set in read_output_and_wait(), this is a mirror of thread_error. /// If set in stream_decode_mt() then it's, for example, error that /// occurred when decoding Block Header. lzma_ret pending_error; /// Number of threads that will be created at maximum. uint32_t threads_max; /// Number of thread structures that have been initialized from /// "threads", and thus the number of worker threads actually /// created so far. uint32_t threads_initialized; /// Array of allocated thread-specific structures. When no threads /// are in use (direct mode) this is NULL. In threaded mode this /// points to an array of threads_max number of worker_thread structs. struct worker_thread *threads; /// Stack of free threads. When a thread finishes, it puts itself /// back into this stack. This starts as empty because threads /// are created only when actually needed. /// /// \note Use mutex. struct worker_thread *threads_free; /// The most recent worker thread to which the main thread writes /// the new input from the application. 
struct worker_thread *thr; /// Output buffer queue for decompressed data from the worker threads /// /// \note Use mutex with operations that need it. lzma_outq outq; mythread_mutex mutex; mythread_cond cond; /// Memory usage that will not be exceeded in multi-threaded mode. /// Single-threaded mode can exceed this even by a large amount. uint64_t memlimit_threading; /// Memory usage limit that should never be exceeded. /// LZMA_MEMLIMIT_ERROR will be returned if decoding isn't possible /// even in single-threaded mode without exceeding this limit. uint64_t memlimit_stop; /// Amount of memory in use by the direct mode decoder /// (coder->block_decoder). In threaded mode this is 0. uint64_t mem_direct_mode; /// Amount of memory needed by the running worker threads. /// This doesn't include the memory needed by the output buffer. /// /// \note Use mutex. uint64_t mem_in_use; /// Amount of memory used by the idle (cached) threads. /// /// \note Use mutex. uint64_t mem_cached; /// Amount of memory needed for the filter chain of the next Block. uint64_t mem_next_filters; /// Amount of memory needed for the thread-specific input buffer /// for the next Block. uint64_t mem_next_in; /// Amount of memory actually needed to decode the next Block /// in threaded mode. This is /// mem_next_filters + mem_next_in + memory needed for lzma_outbuf. uint64_t mem_next_block; /// Amount of compressed data in Stream Header + Blocks that have /// already been finished. /// /// \note Use mutex. uint64_t progress_in; /// Amount of uncompressed data in Blocks that have already /// been finished. /// /// \note Use mutex. uint64_t progress_out; /// If true, LZMA_NO_CHECK is returned if the Stream has /// no integrity check. bool tell_no_check; /// If true, LZMA_UNSUPPORTED_CHECK is returned if the Stream has /// an integrity check that isn't supported by this liblzma build. bool tell_unsupported_check; /// If true, LZMA_GET_CHECK is returned after decoding Stream Header. 
bool tell_any_check; /// If true, we will tell the Block decoder to skip calculating /// and verifying the integrity check. bool ignore_check; /// If true, we will decode concatenated Streams that possibly have /// Stream Padding between or after them. LZMA_STREAM_END is returned /// once the application isn't giving us any new input (LZMA_FINISH), /// and we aren't in the middle of a Stream, and possible /// Stream Padding is a multiple of four bytes. bool concatenated; /// If true, we will return any errors immediately instead of first /// producing all output before the location of the error. bool fail_fast; /// When decoding concatenated Streams, this is true as long as we /// are decoding the first Stream. This is needed to avoid misleading /// LZMA_FORMAT_ERROR in case the later Streams don't have valid magic /// bytes. bool first_stream; /// This is used to track if the previous call to stream_decode_mt() /// had output space (*out_pos < out_size) and managed to fill the /// output buffer (*out_pos == out_size). This may be set to true /// in read_output_and_wait(). This is read and then reset to false /// at the beginning of stream_decode_mt(). /// /// This is needed to support applications that call lzma_code() in /// such a way that more input is provided only when lzma_code() /// didn't fill the output buffer completely. Basically, this makes /// it easier to convert such applications from single-threaded /// decoder to multi-threaded decoder. bool out_was_filled; /// Write position in buffer[] and position in Stream Padding size_t pos; /// Buffer to hold Stream Header, Block Header, and Stream Footer. /// Block Header has biggest maximum size. uint8_t buffer[LZMA_BLOCK_HEADER_SIZE_MAX]; }; /// Enables updating of outbuf->pos. This is a callback function that is /// used with lzma_outq_enable_partial_output(). 
static void worker_enable_partial_update(void *thr_ptr) { struct worker_thread *thr = thr_ptr; mythread_sync(thr->mutex) { thr->partial_update = PARTIAL_START; mythread_cond_signal(&thr->cond); } } -/// Things do to at THR_STOP or when finishing a Block. -/// This is called with thr->mutex locked. -static void -worker_stop(struct worker_thread *thr) -{ - // Update memory usage counters. - thr->coder->mem_in_use -= thr->in_size; - thr->in_size = 0; // thr->in was freed above. - - thr->coder->mem_in_use -= thr->mem_filters; - thr->coder->mem_cached += thr->mem_filters; - - // Put this thread to the stack of free threads. - thr->next = thr->coder->threads_free; - thr->coder->threads_free = thr; - - mythread_cond_signal(&thr->coder->cond); - return; -} - - static MYTHREAD_RET_TYPE worker_decoder(void *thr_ptr) { struct worker_thread *thr = thr_ptr; size_t in_filled; partial_update_mode partial_update; lzma_ret ret; next_loop_lock: mythread_mutex_lock(&thr->mutex); next_loop_unlocked: if (thr->state == THR_IDLE) { mythread_cond_wait(&thr->cond, &thr->mutex); goto next_loop_unlocked; } if (thr->state == THR_EXIT) { mythread_mutex_unlock(&thr->mutex); lzma_free(thr->in, thr->allocator); lzma_next_end(&thr->block_decoder, thr->allocator); mythread_mutex_destroy(&thr->mutex); mythread_cond_destroy(&thr->cond); return MYTHREAD_RET_VALUE; } - if (thr->state == THR_STOP) { - thr->state = THR_IDLE; - mythread_mutex_unlock(&thr->mutex); - - mythread_sync(thr->coder->mutex) { - worker_stop(thr); - } - - goto next_loop_lock; - } - assert(thr->state == THR_RUN); // Update progress info for get_progress(). thr->progress_in = thr->in_pos; thr->progress_out = thr->out_pos; // If we don't have any new input, wait for a signal from the main // thread except if partial output has just been enabled. In that // case we will do one normal run so that the partial output info // gets passed to the main thread. 
The call to block_decoder.code() // is useless but harmless as it can occur only once per Block. in_filled = thr->in_filled; partial_update = thr->partial_update; if (in_filled == thr->in_pos && partial_update != PARTIAL_START) { mythread_cond_wait(&thr->cond, &thr->mutex); goto next_loop_unlocked; } mythread_mutex_unlock(&thr->mutex); // Pass the input in small chunks to the Block decoder. // This way we react reasonably fast if we are told to stop/exit, // and (when partial update is enabled) we tell about our progress // to the main thread frequently enough. const size_t chunk_size = 16384; if ((in_filled - thr->in_pos) > chunk_size) in_filled = thr->in_pos + chunk_size; ret = thr->block_decoder.code( thr->block_decoder.coder, thr->allocator, thr->in, &thr->in_pos, in_filled, thr->outbuf->buf, &thr->out_pos, thr->outbuf->allocated, LZMA_RUN); if (ret == LZMA_OK) { if (partial_update != PARTIAL_DISABLED) { // The main thread uses thr->mutex to change from // PARTIAL_DISABLED to PARTIAL_START. The main thread // doesn't care about this variable after that so we // can safely change it here to PARTIAL_ENABLED // without a mutex. thr->partial_update = PARTIAL_ENABLED; // The main thread is reading decompressed data // from thr->outbuf. Tell the main thread about // our progress. // // NOTE: It's possible that we consumed input without // producing any new output so it's possible that // only in_pos has changed. In case of PARTIAL_START // it is possible that neither in_pos nor out_pos has // changed. mythread_sync(thr->coder->mutex) { thr->outbuf->pos = thr->out_pos; thr->outbuf->decoder_in_pos = thr->in_pos; mythread_cond_signal(&thr->coder->cond); } } goto next_loop_lock; } // Either we finished successfully (LZMA_STREAM_END) or an error - // occurred. Both cases are handled almost identically. The error - // case requires updating thr->coder->thread_error. + // occurred. 
// // The sizes are in the Block Header and the Block decoder // checks that they match, thus we know these: assert(ret != LZMA_STREAM_END || thr->in_pos == thr->in_size); assert(ret != LZMA_STREAM_END || thr->out_pos == thr->block_options.uncompressed_size); - // Free the input buffer. Don't update in_size as we need - // it later to update thr->coder->mem_in_use. - lzma_free(thr->in, thr->allocator); - thr->in = NULL; - mythread_sync(thr->mutex) { + // Block decoder ensures this, but do a sanity check anyway + // because thr->in_filled < thr->in_size means that the main + // thread is still writing to thr->in. + if (ret == LZMA_STREAM_END && thr->in_filled != thr->in_size) { + assert(0); + ret = LZMA_PROG_ERROR; + } + if (thr->state != THR_EXIT) thr->state = THR_IDLE; } + // Free the input buffer. Don't update in_size as we need + // it later to update thr->coder->mem_in_use. + // + // This step is skipped if an error occurred because the main thread + // might still be writing to thr->in. The memory will be freed after + // threads_end() sets thr->state = THR_EXIT. + if (ret == LZMA_STREAM_END) { + lzma_free(thr->in, thr->allocator); + thr->in = NULL; + } + mythread_sync(thr->coder->mutex) { // Move our progress info to the main thread. thr->coder->progress_in += thr->in_pos; thr->coder->progress_out += thr->out_pos; thr->progress_in = 0; thr->progress_out = 0; // Mark the outbuf as finished. thr->outbuf->pos = thr->out_pos; thr->outbuf->decoder_in_pos = thr->in_pos; thr->outbuf->finished = true; thr->outbuf->finish_ret = ret; thr->outbuf = NULL; // If an error occurred, tell it to the main thread. if (ret != LZMA_STREAM_END && thr->coder->thread_error == LZMA_OK) thr->coder->thread_error = ret; - worker_stop(thr); + // Return the worker thread to the stack of available + // threads only if no errors occurred. + if (ret == LZMA_STREAM_END) { + // Update memory usage counters. 
+ thr->coder->mem_in_use -= thr->in_size; + thr->coder->mem_in_use -= thr->mem_filters; + thr->coder->mem_cached += thr->mem_filters; + + // Put this thread to the stack of free threads. + thr->next = thr->coder->threads_free; + thr->coder->threads_free = thr; + } + + mythread_cond_signal(&thr->coder->cond); } goto next_loop_lock; } /// Tells the worker threads to exit and waits for them to terminate. static void threads_end(struct lzma_stream_coder *coder, const lzma_allocator *allocator) { for (uint32_t i = 0; i < coder->threads_initialized; ++i) { mythread_sync(coder->threads[i].mutex) { coder->threads[i].state = THR_EXIT; mythread_cond_signal(&coder->threads[i].cond); } } for (uint32_t i = 0; i < coder->threads_initialized; ++i) mythread_join(coder->threads[i].thread_id); lzma_free(coder->threads, allocator); coder->threads_initialized = 0; coder->threads = NULL; coder->threads_free = NULL; // The threads don't update these when they exit. Do it here. coder->mem_in_use = 0; coder->mem_cached = 0; return; } +/// Tell worker threads to stop without doing any cleaning up. +/// The clean up will be done when threads_exit() is called; +/// it's not possible to reuse the threads after threads_stop(). +/// +/// This is called before returning an unrecoverable error code +/// to the application. It would be waste of processor time +/// to keep the threads running in such a situation. static void threads_stop(struct lzma_stream_coder *coder) { for (uint32_t i = 0; i < coder->threads_initialized; ++i) { + // The threads that are in the THR_RUN state will stop + // when they check the state the next time. There's no + // need to signal coder->threads[i].cond. mythread_sync(coder->threads[i].mutex) { - // The state must be changed conditionally because - // THR_IDLE -> THR_STOP is not a valid state change. 
- if (coder->threads[i].state != THR_IDLE) { - coder->threads[i].state = THR_STOP; - mythread_cond_signal(&coder->threads[i].cond); - } + coder->threads[i].state = THR_IDLE; } } return; } /// Initialize a new worker_thread structure and create a new thread. static lzma_ret initialize_new_thread(struct lzma_stream_coder *coder, const lzma_allocator *allocator) { // Allocate the coder->threads array if needed. It's done here instead // of when initializing the decoder because we don't need this if we // use the direct mode (we may even free coder->threads in the middle // of the file if we switch from threaded to direct mode). if (coder->threads == NULL) { coder->threads = lzma_alloc( coder->threads_max * sizeof(struct worker_thread), allocator); if (coder->threads == NULL) return LZMA_MEM_ERROR; } // Pick a free structure. assert(coder->threads_initialized < coder->threads_max); struct worker_thread *thr = &coder->threads[coder->threads_initialized]; if (mythread_mutex_init(&thr->mutex)) goto error_mutex; if (mythread_cond_init(&thr->cond)) goto error_cond; thr->state = THR_IDLE; thr->in = NULL; thr->in_size = 0; thr->allocator = allocator; thr->coder = coder; thr->outbuf = NULL; thr->block_decoder = LZMA_NEXT_CODER_INIT; thr->mem_filters = 0; if (mythread_create(&thr->thread_id, worker_decoder, thr)) goto error_thread; ++coder->threads_initialized; coder->thr = thr; return LZMA_OK; error_thread: mythread_cond_destroy(&thr->cond); error_cond: mythread_mutex_destroy(&thr->mutex); error_mutex: return LZMA_MEM_ERROR; } static lzma_ret get_thread(struct lzma_stream_coder *coder, const lzma_allocator *allocator) { // If there is a free structure on the stack, use it. mythread_sync(coder->mutex) { if (coder->threads_free != NULL) { coder->thr = coder->threads_free; coder->threads_free = coder->threads_free->next; // The thread is no longer in the cache so subtract // it from the cached memory usage. 
Don't add it // to mem_in_use though; the caller will handle it // since it knows how much memory it will actually // use (the filter chain might change). coder->mem_cached -= coder->thr->mem_filters; } } if (coder->thr == NULL) { assert(coder->threads_initialized < coder->threads_max); // Initialize a new thread. return_if_error(initialize_new_thread(coder, allocator)); } coder->thr->in_filled = 0; coder->thr->in_pos = 0; coder->thr->out_pos = 0; coder->thr->progress_in = 0; coder->thr->progress_out = 0; coder->thr->partial_update = PARTIAL_DISABLED; return LZMA_OK; } static lzma_ret read_output_and_wait(struct lzma_stream_coder *coder, const lzma_allocator *allocator, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, bool *input_is_possible, bool waiting_allowed, mythread_condtime *wait_abs, bool *has_blocked) { lzma_ret ret = LZMA_OK; mythread_sync(coder->mutex) { do { // Get as much output from the queue as is possible // without blocking. const size_t out_start = *out_pos; do { ret = lzma_outq_read(&coder->outq, allocator, out, out_pos, out_size, NULL, NULL); // If a Block was finished, tell the worker // thread of the next Block (if it is still // running) to start telling the main thread // when new output is available. if (ret == LZMA_STREAM_END) lzma_outq_enable_partial_output( &coder->outq, &worker_enable_partial_update); // Loop until a Block wasn't finished. // It's important to loop around even if // *out_pos == out_size because there could // be an empty Block that will return // LZMA_STREAM_END without needing any // output space. } while (ret == LZMA_STREAM_END); // Check if lzma_outq_read reported an error from // the Block decoder. if (ret != LZMA_OK) break; // If the output buffer is now full but it wasn't full // when this function was called, set out_was_filled. // This way the next call to stream_decode_mt() knows // that some output was produced and no output space // remained in the previous call to stream_decode_mt(). 
if (*out_pos == out_size && *out_pos != out_start) coder->out_was_filled = true; // Check if any thread has indicated an error. if (coder->thread_error != LZMA_OK) { // If LZMA_FAIL_FAST was used, report errors // from worker threads immediately. if (coder->fail_fast) { ret = coder->thread_error; break; } // Otherwise set pending_error. The value we // set here will not actually get used other // than working as a flag that an error has // occurred. This is because in SEQ_ERROR // all output before the error will be read // first by calling this function, and once we // reach the location of the (first) error the // error code from the above lzma_outq_read() // will be returned to the application. // // Use LZMA_PROG_ERROR since the value should // never leak to the application. It's // possible that pending_error has already // been set but that doesn't matter: if we get // here, pending_error only works as a flag. coder->pending_error = LZMA_PROG_ERROR; } // Check if decoding of the next Block can be started. // The memusage of the active threads must be low // enough, there must be a free buffer slot in the // output queue, and there must be a free thread // (that can be either created or an existing one // reused). // // NOTE: This is checked after reading the output // above because reading the output can free a slot in // the output queue and also reduce active memusage. // // NOTE: If output queue is empty, then input will // always be possible. if (input_is_possible != NULL && coder->memlimit_threading - coder->mem_in_use - coder->outq.mem_in_use >= coder->mem_next_block && lzma_outq_has_buf(&coder->outq) && (coder->threads_initialized < coder->threads_max || coder->threads_free != NULL)) { *input_is_possible = true; break; } // If the caller doesn't want us to block, return now. if (!waiting_allowed) break; // This check is needed only when input_is_possible // is NULL. 
We must return if we aren't waiting for // input to become possible and there is no more // output coming from the queue. if (lzma_outq_is_empty(&coder->outq)) { assert(input_is_possible == NULL); break; } // If there is more data available from the queue, // our out buffer must be full and we need to return // so that the application can provide more output // space. // // NOTE: In general lzma_outq_is_readable() can return // true also when there are no more bytes available. // This can happen when a Block has finished without // providing any new output. We know that this is not // the case because in the beginning of this loop we // tried to read as much as possible even when we had // no output space left and the mutex has been locked // all the time (so worker threads cannot have changed // anything). Thus there must be actual pending output // in the queue. if (lzma_outq_is_readable(&coder->outq)) { assert(*out_pos == out_size); break; } // If the application stops providing more input // in the middle of a Block, there will eventually // be one worker thread left that is stuck waiting for // more input (that might never arrive) and a matching // outbuf which the worker thread cannot finish due // to lack of input. We must detect this situation, // otherwise we would end up waiting indefinitely // (if no timeout is in use) or keep returning // LZMA_TIMED_OUT while making no progress. Thus, the // application would never get LZMA_BUF_ERROR from // lzma_code() which would tell the application that // no more progress is possible. No LZMA_BUF_ERROR // means that, for example, truncated .xz files could // cause an infinite loop. // // A worker thread doing partial updates will // store not only the output position in outbuf->pos // but also the matching input position in // outbuf->decoder_in_pos. Here we check if that // input position matches the amount of input that // the worker thread has been given (in_filled). 
// If so, we must return and not wait as no more // output will be coming without first getting more // input to the worker thread. If the application // keeps calling lzma_code() without providing more // input, it will eventually get LZMA_BUF_ERROR. // // NOTE: We can read partial_update and in_filled // without thr->mutex as only the main thread // modifies these variables. decoder_in_pos requires // coder->mutex which we are already holding. if (coder->thr != NULL && coder->thr->partial_update != PARTIAL_DISABLED) { // There is exactly one outbuf in the queue. assert(coder->thr->outbuf == coder->outq.head); assert(coder->thr->outbuf == coder->outq.tail); if (coder->thr->outbuf->decoder_in_pos == coder->thr->in_filled) break; } // Wait for input or output to become possible. if (coder->timeout != 0) { // See the comment in stream_encoder_mt.c // about why mythread_condtime_set() is used // like this. // // FIXME? // In contrast to the encoder, this calls // _condtime_set while the mutex is locked. if (!*has_blocked) { *has_blocked = true; mythread_condtime_set(wait_abs, &coder->cond, coder->timeout); } if (mythread_cond_timedwait(&coder->cond, &coder->mutex, wait_abs) != 0) { ret = LZMA_TIMED_OUT; break; } } else { mythread_cond_wait(&coder->cond, &coder->mutex); } } while (ret == LZMA_OK); } // If we are returning an error, then the application cannot get // more output from us and thus keeping the threads running is // useless and waste of CPU time. if (ret != LZMA_OK && ret != LZMA_TIMED_OUT) threads_stop(coder); return ret; } static lzma_ret decode_block_header(struct lzma_stream_coder *coder, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size) { if (*in_pos >= in_size) return LZMA_OK; if (coder->pos == 0) { // Detect if it's Index. if (in[*in_pos] == INDEX_INDICATOR) return LZMA_INDEX_DETECTED; // Calculate the size of the Block Header. 
Note that // Block Header decoder wants to see this byte too // so don't advance *in_pos. coder->block_options.header_size = lzma_block_header_size_decode( in[*in_pos]); } // Copy the Block Header to the internal buffer. lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos, coder->block_options.header_size); // Return if we didn't get the whole Block Header yet. if (coder->pos < coder->block_options.header_size) return LZMA_OK; coder->pos = 0; // Version 1 is needed to support the .ignore_check option. coder->block_options.version = 1; // Block Header decoder will initialize all members of this array // so we don't need to do it here. coder->block_options.filters = coder->filters; // Decode the Block Header. return_if_error(lzma_block_header_decode(&coder->block_options, allocator, coder->buffer)); // If LZMA_IGNORE_CHECK was used, this flag needs to be set. // It has to be set after lzma_block_header_decode() because // it always resets this to false. coder->block_options.ignore_check = coder->ignore_check; // coder->block_options is ready now. return LZMA_STREAM_END; } /// Get the size of the Compressed Data + Block Padding + Check. static size_t comp_blk_size(const struct lzma_stream_coder *coder) { return vli_ceil4(coder->block_options.compressed_size) + lzma_check_size(coder->stream_flags.check); } /// Returns true if the size (compressed or uncompressed) is such that /// threaded decompression cannot be used. Sizes that are too big compared /// to SIZE_MAX must be rejected to avoid integer overflows and truncations /// when lzma_vli is assigned to a size_t. static bool is_direct_mode_needed(lzma_vli size) { return size == LZMA_VLI_UNKNOWN || size > SIZE_MAX / 3; } static lzma_ret stream_decoder_reset(struct lzma_stream_coder *coder, const lzma_allocator *allocator) { // Initialize the Index hash used to verify the Index. 
coder->index_hash = lzma_index_hash_init(coder->index_hash, allocator); if (coder->index_hash == NULL) return LZMA_MEM_ERROR; // Reset the rest of the variables. coder->sequence = SEQ_STREAM_HEADER; coder->pos = 0; return LZMA_OK; } static lzma_ret stream_decode_mt(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { struct lzma_stream_coder *coder = coder_ptr; mythread_condtime wait_abs; bool has_blocked = false; // Determine if in SEQ_BLOCK_HEADER and SEQ_BLOCK_THR_RUN we should // tell read_output_and_wait() to wait until it can fill the output // buffer (or a timeout occurs). Two conditions must be met: // // (1) If the caller provided no new input. The reason for this // can be, for example, the end of the file or that there is // a pause in the input stream and more input is available // a little later. In this situation we should wait for output // because otherwise we would end up in a busy-waiting loop where // we make no progress and the application just calls us again // without providing any new input. This would then result in // LZMA_BUF_ERROR even though more output would be available // once the worker threads decode more data. // // (2) Even if (1) is true, we will not wait if the previous call to // this function managed to produce some output and the output // buffer became full. This is for compatibility with applications // that call lzma_code() in such a way that new input is provided // only when the output buffer didn't become full. Without this // trick such applications would have bad performance (bad // parallelization due to decoder not getting input fast enough). // // NOTE: Such loops might require that timeout is disabled (0) // if they assume that output-not-full implies that all input has // been consumed. 
If and only if timeout is enabled, we may return // when output isn't full *and* not all input has been consumed. // // However, if LZMA_FINISH is used, the above is ignored and we always // wait (timeout can still cause us to return) because we know that // we won't get any more input. This matters if the input file is // truncated and we are doing single-shot decoding, that is, // timeout = 0 and LZMA_FINISH is used on the first call to // lzma_code() and the output buffer is known to be big enough // to hold all uncompressed data: // // - If LZMA_FINISH wasn't handled specially, we could return // LZMA_OK before providing all output that is possible with the // truncated input. The rest would be available if lzma_code() was // called again but then it's not single-shot decoding anymore. // // - By handling LZMA_FINISH specially here, the first call will // produce all the output, matching the behavior of the // single-threaded decoder. // // So it's a very specific corner case but also easy to avoid. Note // that this special handling of LZMA_FINISH has no effect for // single-shot decoding when the input file is valid (not truncated); // premature LZMA_OK wouldn't be possible as long as timeout = 0. const bool waiting_allowed = action == LZMA_FINISH || (*in_pos == in_size && !coder->out_was_filled); coder->out_was_filled = false; while (true) switch (coder->sequence) { case SEQ_STREAM_HEADER: { // Copy the Stream Header to the internal buffer. const size_t in_old = *in_pos; lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos, LZMA_STREAM_HEADER_SIZE); coder->progress_in += *in_pos - in_old; // Return if we didn't get the whole Stream Header yet. if (coder->pos < LZMA_STREAM_HEADER_SIZE) return LZMA_OK; coder->pos = 0; // Decode the Stream Header. const lzma_ret ret = lzma_stream_header_decode( &coder->stream_flags, coder->buffer); if (ret != LZMA_OK) return ret == LZMA_FORMAT_ERROR && !coder->first_stream ? 
LZMA_DATA_ERROR : ret; // If we are decoding concatenated Streams, and the later // Streams have invalid Header Magic Bytes, we give // LZMA_DATA_ERROR instead of LZMA_FORMAT_ERROR. coder->first_stream = false; // Copy the type of the Check so that Block Header and Block // decoders see it. coder->block_options.check = coder->stream_flags.check; // Even if we return LZMA_*_CHECK below, we want // to continue from Block Header decoding. coder->sequence = SEQ_BLOCK_HEADER; // Detect if there's no integrity check or if it is // unsupported if those were requested by the application. if (coder->tell_no_check && coder->stream_flags.check == LZMA_CHECK_NONE) return LZMA_NO_CHECK; if (coder->tell_unsupported_check && !lzma_check_is_supported( coder->stream_flags.check)) return LZMA_UNSUPPORTED_CHECK; if (coder->tell_any_check) return LZMA_GET_CHECK; - } - // Fall through + FALLTHROUGH; + } case SEQ_BLOCK_HEADER: { const size_t in_old = *in_pos; const lzma_ret ret = decode_block_header(coder, allocator, in, in_pos, in_size); coder->progress_in += *in_pos - in_old; if (ret == LZMA_OK) { // We didn't decode the whole Block Header yet. // // Read output from the queue before returning. This // is important because it is possible that the // application doesn't have any new input available // immediately. If we didn't try to copy output from // the output queue here, lzma_code() could end up // returning LZMA_BUF_ERROR even though queued output // is available. // // If the lzma_code() call provided at least one input // byte, only copy as much data from the output queue // as is available immediately. This way the // application will be able to provide more input // without a delay. // // On the other hand, if lzma_code() was called with // an empty input buffer(*), treat it specially: try // to fill the output buffer even if it requires // waiting for the worker threads to provide output // (timeout, if specified, can still cause us to // return). 
// // - This way the application will be able to get all // data that can be decoded from the input provided // so far. // // - We avoid both premature LZMA_BUF_ERROR and // busy-waiting where the application repeatedly // calls lzma_code() which immediately returns // LZMA_OK without providing new data. // // - If the queue becomes empty, we won't wait // anything and will return LZMA_OK immediately // (coder->timeout is completely ignored). // // (*) See the comment at the beginning of this // function how waiting_allowed is determined // and why there is an exception to the rule // of "called with an empty input buffer". assert(*in_pos == in_size); // If LZMA_FINISH was used we know that we won't get // more input, so the file must be truncated if we // get here. If worker threads don't detect any // errors, eventually there will be no more output // while we keep returning LZMA_OK which gets // converted to LZMA_BUF_ERROR in lzma_code(). // // If fail-fast is enabled then we will return // immediately using LZMA_DATA_ERROR instead of // LZMA_OK or LZMA_BUF_ERROR. Rationale for the // error code: // // - Worker threads may have a large amount of // not-yet-decoded input data and we don't // know for sure if all data is valid. Bad // data there would result in LZMA_DATA_ERROR // when fail-fast isn't used. // // - Immediate LZMA_BUF_ERROR would be a bit weird // considering the older liblzma code. lzma_code() // even has an assertion to prevent coders from // returning LZMA_BUF_ERROR directly. // // The downside of this is that with fail-fast apps // cannot always distinguish between corrupt and // truncated files. if (action == LZMA_FINISH && coder->fail_fast) { // We won't produce any more output. Stop // the unfinished worker threads so they // won't waste CPU time. threads_stop(coder); return LZMA_DATA_ERROR; } // read_output_and_wait() will call threads_stop() // if needed so with that we can use return_if_error. 
return_if_error(read_output_and_wait(coder, allocator, out, out_pos, out_size, NULL, waiting_allowed, &wait_abs, &has_blocked)); if (coder->pending_error != LZMA_OK) { coder->sequence = SEQ_ERROR; break; } return LZMA_OK; } if (ret == LZMA_INDEX_DETECTED) { coder->sequence = SEQ_INDEX_WAIT_OUTPUT; break; } // See if an error occurred. if (ret != LZMA_STREAM_END) { // NOTE: Here and in all other places where // pending_error is set, it may overwrite the value // (LZMA_PROG_ERROR) set by read_output_and_wait(). // That function might overwrite value set here too. // These are fine because when read_output_and_wait() // sets pending_error, it actually works as a flag // variable only ("some error has occurred") and the // actual value of pending_error is not used in // SEQ_ERROR. In such cases SEQ_ERROR will eventually // get the correct error code from the return value of // a later read_output_and_wait() call. coder->pending_error = ret; coder->sequence = SEQ_ERROR; break; } // Calculate the memory usage of the filters / Block decoder. coder->mem_next_filters = lzma_raw_decoder_memusage( coder->filters); if (coder->mem_next_filters == UINT64_MAX) { // One or more unknown Filter IDs. coder->pending_error = LZMA_OPTIONS_ERROR; coder->sequence = SEQ_ERROR; break; } coder->sequence = SEQ_BLOCK_INIT; + FALLTHROUGH; } - // Fall through - case SEQ_BLOCK_INIT: { // Check if decoding is possible at all with the current // memlimit_stop which we must never exceed. // // This needs to be the first thing in SEQ_BLOCK_INIT // to make it possible to restart decoding after increasing // memlimit_stop with lzma_memlimit_set(). if (coder->mem_next_filters > coder->memlimit_stop) { // Flush pending output before returning // LZMA_MEMLIMIT_ERROR. If the application doesn't // want to increase the limit, at least it will get // all the output possible so far. 
return_if_error(read_output_and_wait(coder, allocator, out, out_pos, out_size, NULL, true, &wait_abs, &has_blocked)); if (!lzma_outq_is_empty(&coder->outq)) return LZMA_OK; return LZMA_MEMLIMIT_ERROR; } // Check if the size information is available in Block Header. // If it is, check if the sizes are small enough that we don't // need to worry *too* much about integer overflows later in // the code. If these conditions are not met, we must use the // single-threaded direct mode. if (is_direct_mode_needed(coder->block_options.compressed_size) || is_direct_mode_needed( coder->block_options.uncompressed_size)) { coder->sequence = SEQ_BLOCK_DIRECT_INIT; break; } // Calculate the amount of memory needed for the input and // output buffers in threaded mode. // // These cannot overflow because we already checked that // the sizes are small enough using is_direct_mode_needed(). coder->mem_next_in = comp_blk_size(coder); const uint64_t mem_buffers = coder->mem_next_in + lzma_outq_outbuf_memusage( coder->block_options.uncompressed_size); // Add the amount needed by the filters. // Avoid integer overflows. if (UINT64_MAX - mem_buffers < coder->mem_next_filters) { // Use direct mode if the memusage would overflow. // This is a theoretical case that shouldn't happen // in practice unless the input file is weird (broken // or malicious). coder->sequence = SEQ_BLOCK_DIRECT_INIT; break; } // Amount of memory needed to decode this Block in // threaded mode: coder->mem_next_block = coder->mem_next_filters + mem_buffers; // If this alone would exceed memlimit_threading, then we must // use the single-threaded direct mode. if (coder->mem_next_block > coder->memlimit_threading) { coder->sequence = SEQ_BLOCK_DIRECT_INIT; break; } // Use the threaded mode. Free the direct mode decoder in // case it has been initialized. 
lzma_next_end(&coder->block_decoder, allocator); coder->mem_direct_mode = 0; // Since we already know what the sizes are supposed to be, // we can already add them to the Index hash. The Block // decoder will verify the values while decoding. const lzma_ret ret = lzma_index_hash_append(coder->index_hash, lzma_block_unpadded_size( &coder->block_options), coder->block_options.uncompressed_size); if (ret != LZMA_OK) { coder->pending_error = ret; coder->sequence = SEQ_ERROR; break; } coder->sequence = SEQ_BLOCK_THR_INIT; + FALLTHROUGH; } - // Fall through - case SEQ_BLOCK_THR_INIT: { // We need to wait for a multiple conditions to become true // until we can initialize the Block decoder and let a worker // thread decode it: // // - Wait for the memory usage of the active threads to drop // so that starting the decoding of this Block won't make // us go over memlimit_threading. // // - Wait for at least one free output queue slot. // // - Wait for a free worker thread. // // While we wait, we must copy decompressed data to the out // buffer and catch possible decoder errors. // // read_output_and_wait() does all the above. bool block_can_start = false; return_if_error(read_output_and_wait(coder, allocator, out, out_pos, out_size, &block_can_start, true, &wait_abs, &has_blocked)); if (coder->pending_error != LZMA_OK) { coder->sequence = SEQ_ERROR; break; } if (!block_can_start) { // It's not a timeout because return_if_error handles // it already. Output queue cannot be empty either // because in that case block_can_start would have // been true. Thus the output buffer must be full and // the queue isn't empty. assert(*out_pos == out_size); assert(!lzma_outq_is_empty(&coder->outq)); return LZMA_OK; } // We know that we can start decoding this Block without // exceeding memlimit_threading. However, to stay below // memlimit_threading may require freeing some of the // cached memory. // // Get a local copy of variables that require locking the // mutex. 
It is fine if the worker threads modify the real // values after we read these as those changes can only be // towards more favorable conditions (less memory in use, // more in cache). // // These are initialized to silence warnings. uint64_t mem_in_use = 0; uint64_t mem_cached = 0; struct worker_thread *thr = NULL; mythread_sync(coder->mutex) { mem_in_use = coder->mem_in_use; mem_cached = coder->mem_cached; thr = coder->threads_free; } // The maximum amount of memory that can be held by other // threads and cached buffers while allowing us to start // decoding the next Block. const uint64_t mem_max = coder->memlimit_threading - coder->mem_next_block; // If the existing allocations are so large that starting // to decode this Block might exceed memlimit_threads, // try to free memory from the output queue cache first. // // NOTE: This math assumes the worst case. It's possible // that the limit wouldn't be exceeded if the existing cached // allocations are reused. if (mem_in_use + mem_cached + coder->outq.mem_allocated > mem_max) { // Clear the outq cache except leave one buffer in // the cache if its size is correct. That way we // don't free and almost immediately reallocate // an identical buffer. lzma_outq_clear_cache2(&coder->outq, allocator, coder->block_options.uncompressed_size); } // If there is at least one worker_thread in the cache and // the existing allocations are so large that starting to // decode this Block might exceed memlimit_threads, free // memory by freeing cached Block decoders. // // NOTE: The comparison is different here than above. // Here we don't care about cached buffers in outq anymore // and only look at memory actually in use. This is because // if there is something in outq cache, it's a single buffer // that can be used as is. We ensured this in the above // if-block. 
uint64_t mem_freed = 0; if (thr != NULL && mem_in_use + mem_cached + coder->outq.mem_in_use > mem_max) { // Don't free the first Block decoder if its memory // usage isn't greater than what this Block will need. // Typically the same filter chain is used for all // Blocks so this way the allocations can be reused // when get_thread() picks the first worker_thread // from the cache. if (thr->mem_filters <= coder->mem_next_filters) thr = thr->next; while (thr != NULL) { lzma_next_end(&thr->block_decoder, allocator); mem_freed += thr->mem_filters; thr->mem_filters = 0; thr = thr->next; } } // Update the memory usage counters. Note that coder->mem_* // may have changed since we read them so we must subtract // or add the changes. mythread_sync(coder->mutex) { coder->mem_cached -= mem_freed; // Memory needed for the filters and the input buffer. // The output queue takes care of its own counter so // we don't touch it here. // // NOTE: After this, coder->mem_in_use + // coder->mem_cached might count the same thing twice. // If so, this will get corrected in get_thread() when // a worker_thread is picked from coder->free_threads // and its memory usage is subtracted from mem_cached. coder->mem_in_use += coder->mem_next_in + coder->mem_next_filters; } // Allocate memory for the output buffer in the output queue. lzma_ret ret = lzma_outq_prealloc_buf( &coder->outq, allocator, coder->block_options.uncompressed_size); if (ret != LZMA_OK) { threads_stop(coder); return ret; } // Set up coder->thr. ret = get_thread(coder, allocator); if (ret != LZMA_OK) { threads_stop(coder); return ret; } // The new Block decoder memory usage is already counted in // coder->mem_in_use. Store it in the thread too. coder->thr->mem_filters = coder->mem_next_filters; // Initialize the Block decoder. 
coder->thr->block_options = coder->block_options; ret = lzma_block_decoder_init( &coder->thr->block_decoder, allocator, &coder->thr->block_options); // Free the allocated filter options since they are needed // only to initialize the Block decoder. lzma_filters_free(coder->filters, allocator); coder->thr->block_options.filters = NULL; // Check if memory usage calculation and Block encoder // initialization succeeded. if (ret != LZMA_OK) { coder->pending_error = ret; coder->sequence = SEQ_ERROR; break; } // Allocate the input buffer. coder->thr->in_size = coder->mem_next_in; coder->thr->in = lzma_alloc(coder->thr->in_size, allocator); if (coder->thr->in == NULL) { threads_stop(coder); return LZMA_MEM_ERROR; } // Get the preallocated output buffer. coder->thr->outbuf = lzma_outq_get_buf( &coder->outq, coder->thr); // Start the decoder. mythread_sync(coder->thr->mutex) { assert(coder->thr->state == THR_IDLE); coder->thr->state = THR_RUN; mythread_cond_signal(&coder->thr->cond); } // Enable output from the thread that holds the oldest output // buffer in the output queue (if such a thread exists). mythread_sync(coder->mutex) { lzma_outq_enable_partial_output(&coder->outq, &worker_enable_partial_update); } coder->sequence = SEQ_BLOCK_THR_RUN; + FALLTHROUGH; } - // Fall through - case SEQ_BLOCK_THR_RUN: { if (action == LZMA_FINISH && coder->fail_fast) { // We know that we won't get more input and that // the caller wants fail-fast behavior. If we see // that we don't have enough input to finish this // Block, return LZMA_DATA_ERROR immediately. // See SEQ_BLOCK_HEADER for the error code rationale. const size_t in_avail = in_size - *in_pos; const size_t in_needed = coder->thr->in_size - coder->thr->in_filled; if (in_avail < in_needed) { threads_stop(coder); return LZMA_DATA_ERROR; } } // Copy input to the worker thread. 
size_t cur_in_filled = coder->thr->in_filled; lzma_bufcpy(in, in_pos, in_size, coder->thr->in, &cur_in_filled, coder->thr->in_size); // Tell the thread how much we copied. mythread_sync(coder->thr->mutex) { coder->thr->in_filled = cur_in_filled; // NOTE: Most of the time we are copying input faster // than the thread can decode so most of the time // calling mythread_cond_signal() is useless but // we cannot make it conditional because thr->in_pos // is updated without a mutex. And the overhead should // be very much negligible anyway. mythread_cond_signal(&coder->thr->cond); } // Read output from the output queue. Just like in // SEQ_BLOCK_HEADER, we wait to fill the output buffer // only if waiting_allowed was set to true in the beginning - // of this function (see the comment there). + // of this function (see the comment there) and there is + // no input available. In SEQ_BLOCK_HEADER, there is never + // input available when read_output_and_wait() is called, + // but here there can be when LZMA_FINISH is used, thus we + // need to check if *in_pos == in_size. Otherwise we would + // wait here instead of using the available input to start + // a new thread. return_if_error(read_output_and_wait(coder, allocator, out, out_pos, out_size, - NULL, waiting_allowed, + NULL, + waiting_allowed && *in_pos == in_size, &wait_abs, &has_blocked)); if (coder->pending_error != LZMA_OK) { coder->sequence = SEQ_ERROR; break; } // Return if the input didn't contain the whole Block. + // + // NOTE: When we updated coder->thr->in_filled a few lines + // above, the worker thread might by now have finished its + // work and returned itself back to the stack of free threads. if (coder->thr->in_filled < coder->thr->in_size) { assert(*in_pos == in_size); return LZMA_OK; } // The whole Block has been copied to the thread-specific // buffer. Continue from the next Block Header or Index. 
coder->thr = NULL; coder->sequence = SEQ_BLOCK_HEADER; break; } case SEQ_BLOCK_DIRECT_INIT: { // Wait for the threads to finish and that all decoded data // has been copied to the output. That is, wait until the // output queue becomes empty. // // NOTE: No need to check for coder->pending_error as // we aren't consuming any input until the queue is empty // and if there is a pending error, read_output_and_wait() // will eventually return it before the queue is empty. return_if_error(read_output_and_wait(coder, allocator, out, out_pos, out_size, NULL, true, &wait_abs, &has_blocked)); if (!lzma_outq_is_empty(&coder->outq)) return LZMA_OK; // Free the cached output buffers. lzma_outq_clear_cache(&coder->outq, allocator); // Get rid of the worker threads, including the coder->threads // array. threads_end(coder, allocator); // Initialize the Block decoder. const lzma_ret ret = lzma_block_decoder_init( &coder->block_decoder, allocator, &coder->block_options); // Free the allocated filter options since they are needed // only to initialize the Block decoder. lzma_filters_free(coder->filters, allocator); coder->block_options.filters = NULL; // Check if Block decoder initialization succeeded. if (ret != LZMA_OK) return ret; // Make the memory usage visible to _memconfig(). coder->mem_direct_mode = coder->mem_next_filters; coder->sequence = SEQ_BLOCK_DIRECT_RUN; + FALLTHROUGH; } - // Fall through - case SEQ_BLOCK_DIRECT_RUN: { const size_t in_old = *in_pos; const size_t out_old = *out_pos; const lzma_ret ret = coder->block_decoder.code( coder->block_decoder.coder, allocator, in, in_pos, in_size, out, out_pos, out_size, action); coder->progress_in += *in_pos - in_old; coder->progress_out += *out_pos - out_old; if (ret != LZMA_STREAM_END) return ret; // Block decoded successfully. Add the new size pair to // the Index hash. 
return_if_error(lzma_index_hash_append(coder->index_hash, lzma_block_unpadded_size( &coder->block_options), coder->block_options.uncompressed_size)); coder->sequence = SEQ_BLOCK_HEADER; break; } case SEQ_INDEX_WAIT_OUTPUT: // Flush the output from all worker threads so that we can // decode the Index without thinking about threading. return_if_error(read_output_and_wait(coder, allocator, out, out_pos, out_size, NULL, true, &wait_abs, &has_blocked)); if (!lzma_outq_is_empty(&coder->outq)) return LZMA_OK; coder->sequence = SEQ_INDEX_DECODE; - - // Fall through + FALLTHROUGH; case SEQ_INDEX_DECODE: { // If we don't have any input, don't call // lzma_index_hash_decode() since it would return // LZMA_BUF_ERROR, which we must not do here. if (*in_pos >= in_size) return LZMA_OK; // Decode the Index and compare it to the hash calculated // from the sizes of the Blocks (if any). const size_t in_old = *in_pos; const lzma_ret ret = lzma_index_hash_decode(coder->index_hash, in, in_pos, in_size); coder->progress_in += *in_pos - in_old; if (ret != LZMA_STREAM_END) return ret; coder->sequence = SEQ_STREAM_FOOTER; + FALLTHROUGH; } - // Fall through - case SEQ_STREAM_FOOTER: { // Copy the Stream Footer to the internal buffer. const size_t in_old = *in_pos; lzma_bufcpy(in, in_pos, in_size, coder->buffer, &coder->pos, LZMA_STREAM_HEADER_SIZE); coder->progress_in += *in_pos - in_old; // Return if we didn't get the whole Stream Footer yet. if (coder->pos < LZMA_STREAM_HEADER_SIZE) return LZMA_OK; coder->pos = 0; // Decode the Stream Footer. The decoder gives // LZMA_FORMAT_ERROR if the magic bytes don't match, // so convert that return code to LZMA_DATA_ERROR. lzma_stream_flags footer_flags; const lzma_ret ret = lzma_stream_footer_decode( &footer_flags, coder->buffer); if (ret != LZMA_OK) return ret == LZMA_FORMAT_ERROR ? LZMA_DATA_ERROR : ret; // Check that Index Size stored in the Stream Footer matches // the real size of the Index field. 
if (lzma_index_hash_size(coder->index_hash) != footer_flags.backward_size) return LZMA_DATA_ERROR; // Compare that the Stream Flags fields are identical in // both Stream Header and Stream Footer. return_if_error(lzma_stream_flags_compare( &coder->stream_flags, &footer_flags)); if (!coder->concatenated) return LZMA_STREAM_END; coder->sequence = SEQ_STREAM_PADDING; + FALLTHROUGH; } - // Fall through - case SEQ_STREAM_PADDING: assert(coder->concatenated); // Skip over possible Stream Padding. while (true) { if (*in_pos >= in_size) { // Unless LZMA_FINISH was used, we cannot // know if there's more input coming later. if (action != LZMA_FINISH) return LZMA_OK; // Stream Padding must be a multiple of // four bytes. return coder->pos == 0 ? LZMA_STREAM_END : LZMA_DATA_ERROR; } // If the byte is not zero, it probably indicates // beginning of a new Stream (or the file is corrupt). if (in[*in_pos] != 0x00) break; ++*in_pos; ++coder->progress_in; coder->pos = (coder->pos + 1) & 3; } // Stream Padding must be a multiple of four bytes (empty // Stream Padding is OK). if (coder->pos != 0) { ++*in_pos; ++coder->progress_in; return LZMA_DATA_ERROR; } // Prepare to decode the next Stream. return_if_error(stream_decoder_reset(coder, allocator)); break; case SEQ_ERROR: if (!coder->fail_fast) { // Let the application get all data before the point // where the error was detected. This matches the // behavior of single-threaded use. // // FIXME? Some errors (LZMA_MEM_ERROR) don't get here, // they are returned immediately. Thus in rare cases // the output will be less than in the single-threaded // mode. Maybe this doesn't matter much in practice. return_if_error(read_output_and_wait(coder, allocator, out, out_pos, out_size, NULL, true, &wait_abs, &has_blocked)); // We get here only if the error happened in the main // thread, for example, unsupported Block Header. 
if (!lzma_outq_is_empty(&coder->outq)) return LZMA_OK; } // We only get here if no errors were detected by the worker // threads. Errors from worker threads would have already been // returned by the call to read_output_and_wait() above. return coder->pending_error; default: assert(0); return LZMA_PROG_ERROR; } // Never reached } static void stream_decoder_mt_end(void *coder_ptr, const lzma_allocator *allocator) { struct lzma_stream_coder *coder = coder_ptr; threads_end(coder, allocator); lzma_outq_end(&coder->outq, allocator); lzma_next_end(&coder->block_decoder, allocator); lzma_filters_free(coder->filters, allocator); lzma_index_hash_end(coder->index_hash, allocator); lzma_free(coder, allocator); return; } static lzma_check stream_decoder_mt_get_check(const void *coder_ptr) { const struct lzma_stream_coder *coder = coder_ptr; return coder->stream_flags.check; } static lzma_ret stream_decoder_mt_memconfig(void *coder_ptr, uint64_t *memusage, uint64_t *old_memlimit, uint64_t new_memlimit) { // NOTE: This function gets/sets memlimit_stop. For now, // memlimit_threading cannot be modified after initialization. // // *memusage will include cached memory too. Excluding cached memory // would be misleading and it wouldn't help the applications to // know how much memory is actually needed to decompress the file // because the higher the number of threads and the memlimits are // the more memory the decoder may use. // // Setting a new limit includes the cached memory too and too low // limits will be rejected. Alternative could be to free the cached // memory immediately if that helps to bring the limit down but // the current way is the simplest. It's unlikely that limit needs // to be lowered in the middle of a file anyway; the typical reason // to want a new limit is to increase after LZMA_MEMLIMIT_ERROR // and even such use isn't common. 
struct lzma_stream_coder *coder = coder_ptr; mythread_sync(coder->mutex) { *memusage = coder->mem_direct_mode + coder->mem_in_use + coder->mem_cached + coder->outq.mem_allocated; } // If no filter chains are allocated, *memusage may be zero. // Always return at least LZMA_MEMUSAGE_BASE. if (*memusage < LZMA_MEMUSAGE_BASE) *memusage = LZMA_MEMUSAGE_BASE; *old_memlimit = coder->memlimit_stop; if (new_memlimit != 0) { if (new_memlimit < *memusage) return LZMA_MEMLIMIT_ERROR; coder->memlimit_stop = new_memlimit; } return LZMA_OK; } static void stream_decoder_mt_get_progress(void *coder_ptr, uint64_t *progress_in, uint64_t *progress_out) { struct lzma_stream_coder *coder = coder_ptr; // Lock coder->mutex to prevent finishing threads from moving their // progress info from the worker_thread structure to lzma_stream_coder. mythread_sync(coder->mutex) { *progress_in = coder->progress_in; *progress_out = coder->progress_out; for (size_t i = 0; i < coder->threads_initialized; ++i) { mythread_sync(coder->threads[i].mutex) { *progress_in += coder->threads[i].progress_in; *progress_out += coder->threads[i] .progress_out; } } } return; } static lzma_ret stream_decoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_mt *options) { struct lzma_stream_coder *coder; if (options->threads == 0 || options->threads > LZMA_THREADS_MAX) return LZMA_OPTIONS_ERROR; if (options->flags & ~LZMA_SUPPORTED_FLAGS) return LZMA_OPTIONS_ERROR; lzma_next_coder_init(&stream_decoder_mt_init, next, allocator); coder = next->coder; if (!coder) { coder = lzma_alloc(sizeof(struct lzma_stream_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; if (mythread_mutex_init(&coder->mutex)) { lzma_free(coder, allocator); return LZMA_MEM_ERROR; } if (mythread_cond_init(&coder->cond)) { mythread_mutex_destroy(&coder->mutex); lzma_free(coder, allocator); return LZMA_MEM_ERROR; } next->code = &stream_decode_mt; next->end = &stream_decoder_mt_end; next->get_check 
= &stream_decoder_mt_get_check; next->memconfig = &stream_decoder_mt_memconfig; next->get_progress = &stream_decoder_mt_get_progress; coder->filters[0].id = LZMA_VLI_UNKNOWN; memzero(&coder->outq, sizeof(coder->outq)); coder->block_decoder = LZMA_NEXT_CODER_INIT; coder->mem_direct_mode = 0; coder->index_hash = NULL; coder->threads = NULL; coder->threads_free = NULL; coder->threads_initialized = 0; } // Cleanup old filter chain if one remains after unfinished decoding // of a previous Stream. lzma_filters_free(coder->filters, allocator); // By allocating threads from scratch we can start memory-usage // accounting from scratch, too. Changes in filter and block sizes may // affect number of threads. // - // FIXME? Reusing should be easy but unlike the single-threaded + // Reusing threads doesn't seem worth it. Unlike the single-threaded // decoder, with some types of input file combinations reusing // could leave quite a lot of memory allocated but unused (first // file could allocate a lot, the next files could use fewer // threads and some of the allocations from the first file would not // get freed unless memlimit_threading forces us to clear caches). // // NOTE: The direct mode decoder isn't freed here if one exists. // It will be reused or freed as needed in the main loop. threads_end(coder, allocator); // All memusage counters start at 0 (including mem_direct_mode). // The little extra that is needed for the structs in this file // get accounted well enough by the filter chain memory usage // which adds LZMA_MEMUSAGE_BASE for each chain. However, // stream_decoder_mt_memconfig() has to handle this specially so that // it will never return less than LZMA_MEMUSAGE_BASE as memory usage. 
coder->mem_in_use = 0; coder->mem_cached = 0; coder->mem_next_block = 0; coder->progress_in = 0; coder->progress_out = 0; coder->sequence = SEQ_STREAM_HEADER; coder->thread_error = LZMA_OK; coder->pending_error = LZMA_OK; coder->thr = NULL; coder->timeout = options->timeout; coder->memlimit_threading = my_max(1, options->memlimit_threading); coder->memlimit_stop = my_max(1, options->memlimit_stop); if (coder->memlimit_threading > coder->memlimit_stop) coder->memlimit_threading = coder->memlimit_stop; coder->tell_no_check = (options->flags & LZMA_TELL_NO_CHECK) != 0; coder->tell_unsupported_check = (options->flags & LZMA_TELL_UNSUPPORTED_CHECK) != 0; coder->tell_any_check = (options->flags & LZMA_TELL_ANY_CHECK) != 0; coder->ignore_check = (options->flags & LZMA_IGNORE_CHECK) != 0; coder->concatenated = (options->flags & LZMA_CONCATENATED) != 0; coder->fail_fast = (options->flags & LZMA_FAIL_FAST) != 0; coder->first_stream = true; coder->out_was_filled = false; coder->pos = 0; coder->threads_max = options->threads; return_if_error(lzma_outq_init(&coder->outq, allocator, coder->threads_max)); return stream_decoder_reset(coder, allocator); } extern LZMA_API(lzma_ret) lzma_stream_decoder_mt(lzma_stream *strm, const lzma_mt *options) { lzma_next_strm_init(stream_decoder_mt_init, strm, options); strm->internal->supported_actions[LZMA_RUN] = true; strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } diff --git a/src/liblzma/common/stream_encoder_mt.c b/src/liblzma/common/stream_encoder_mt.c index f0fef1523318..fd0eb98df682 100644 --- a/src/liblzma/common/stream_encoder_mt.c +++ b/src/liblzma/common/stream_encoder_mt.c @@ -1,1280 +1,1278 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file stream_encoder_mt.c /// \brief Multithreaded .xz Stream encoder // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include 
"filter_encoder.h" #include "easy_preset.h" #include "block_encoder.h" #include "block_buffer_encoder.h" #include "index_encoder.h" #include "outqueue.h" /// Maximum supported block size. This makes it simpler to prevent integer /// overflows if we are given unusually large block size. #define BLOCK_SIZE_MAX (UINT64_MAX / LZMA_THREADS_MAX) typedef enum { /// Waiting for work. THR_IDLE, /// Encoding is in progress. THR_RUN, /// Encoding is in progress but no more input data will /// be read. THR_FINISH, /// The main thread wants the thread to stop whatever it was doing /// but not exit. THR_STOP, /// The main thread wants the thread to exit. We could use /// cancellation but since there's stopped anyway, this is lazier. THR_EXIT, } worker_state; typedef struct lzma_stream_coder_s lzma_stream_coder; typedef struct worker_thread_s worker_thread; struct worker_thread_s { worker_state state; /// Input buffer of coder->block_size bytes. The main thread will /// put new input into this and update in_size accordingly. Once /// no more input is coming, state will be set to THR_FINISH. uint8_t *in; /// Amount of data available in the input buffer. This is modified /// only by the main thread. size_t in_size; /// Output buffer for this thread. This is set by the main /// thread every time a new Block is started with this thread /// structure. lzma_outbuf *outbuf; /// Pointer to the main structure is needed when putting this /// thread back to the stack of free threads. lzma_stream_coder *coder; /// The allocator is set by the main thread. Since a copy of the /// pointer is kept here, the application must not change the /// allocator before calling lzma_end(). const lzma_allocator *allocator; /// Amount of uncompressed data that has already been compressed. uint64_t progress_in; /// Amount of compressed data that is ready. 
uint64_t progress_out; /// Block encoder lzma_next_coder block_encoder; /// Compression options for this Block lzma_block block_options; /// Filter chain for this thread. By copying the filters array /// to each thread it is possible to change the filter chain /// between Blocks using lzma_filters_update(). lzma_filter filters[LZMA_FILTERS_MAX + 1]; /// Next structure in the stack of free worker threads. worker_thread *next; mythread_mutex mutex; mythread_cond cond; /// The ID of this thread is used to join the thread /// when it's not needed anymore. mythread thread_id; }; struct lzma_stream_coder_s { enum { SEQ_STREAM_HEADER, SEQ_BLOCK, SEQ_INDEX, SEQ_STREAM_FOOTER, } sequence; /// Start a new Block every block_size bytes of input unless /// LZMA_FULL_FLUSH or LZMA_FULL_BARRIER is used earlier. size_t block_size; /// The filter chain to use for the next Block. /// This can be updated using lzma_filters_update() /// after LZMA_FULL_BARRIER or LZMA_FULL_FLUSH. lzma_filter filters[LZMA_FILTERS_MAX + 1]; /// A copy of filters[] will be put here when attempting to get /// a new worker thread. This will be copied to a worker thread /// when a thread becomes free and then this cache is marked as /// empty by setting [0].id = LZMA_VLI_UNKNOWN. Without this cache /// the filter options from filters[] would get uselessly copied /// multiple times (allocated and freed) when waiting for a new free /// worker thread. /// /// This is freed if filters[] is updated via lzma_filters_update(). lzma_filter filters_cache[LZMA_FILTERS_MAX + 1]; /// Index to hold sizes of the Blocks lzma_index *index; /// Index encoder lzma_next_coder index_encoder; /// Stream Flags for encoding the Stream Header and Stream Footer. lzma_stream_flags stream_flags; /// Buffer to hold Stream Header and Stream Footer. 
uint8_t header[LZMA_STREAM_HEADER_SIZE]; /// Read position in header[] size_t header_pos; /// Output buffer queue for compressed data lzma_outq outq; /// How much memory to allocate for each lzma_outbuf.buf size_t outbuf_alloc_size; /// Maximum wait time if cannot use all the input and cannot /// fill the output buffer. This is in milliseconds. uint32_t timeout; /// Error code from a worker thread lzma_ret thread_error; /// Array of allocated thread-specific structures worker_thread *threads; /// Number of structures in "threads" above. This is also the /// number of threads that will be created at maximum. uint32_t threads_max; /// Number of thread structures that have been initialized, and /// thus the number of worker threads actually created so far. uint32_t threads_initialized; /// Stack of free threads. When a thread finishes, it puts itself /// back into this stack. This starts as empty because threads /// are created only when actually needed. worker_thread *threads_free; /// The most recent worker thread to which the main thread writes /// the new input from the application. worker_thread *thr; /// Amount of uncompressed data in Blocks that have already /// been finished. uint64_t progress_in; /// Amount of compressed data in Stream Header + Blocks that /// have already been finished. uint64_t progress_out; mythread_mutex mutex; mythread_cond cond; }; /// Tell the main thread that something has gone wrong. static void worker_error(worker_thread *thr, lzma_ret ret) { assert(ret != LZMA_OK); assert(ret != LZMA_STREAM_END); mythread_sync(thr->coder->mutex) { if (thr->coder->thread_error == LZMA_OK) thr->coder->thread_error = ret; mythread_cond_signal(&thr->coder->cond); } return; } static worker_state worker_encode(worker_thread *thr, size_t *out_pos, worker_state state) { assert(thr->progress_in == 0); assert(thr->progress_out == 0); // Set the Block options. 
thr->block_options = (lzma_block){ .version = 0, .check = thr->coder->stream_flags.check, .compressed_size = thr->outbuf->allocated, .uncompressed_size = thr->coder->block_size, .filters = thr->filters, }; // Calculate maximum size of the Block Header. This amount is // reserved in the beginning of the buffer so that Block Header // along with Compressed Size and Uncompressed Size can be // written there. lzma_ret ret = lzma_block_header_size(&thr->block_options); if (ret != LZMA_OK) { worker_error(thr, ret); return THR_STOP; } // Initialize the Block encoder. ret = lzma_block_encoder_init(&thr->block_encoder, thr->allocator, &thr->block_options); if (ret != LZMA_OK) { worker_error(thr, ret); return THR_STOP; } size_t in_pos = 0; size_t in_size = 0; *out_pos = thr->block_options.header_size; const size_t out_size = thr->outbuf->allocated; do { mythread_sync(thr->mutex) { // Store in_pos and *out_pos into *thr so that // an application may read them via // lzma_get_progress() to get progress information. // // NOTE: These aren't updated when the encoding // finishes. Instead, the final values are taken // later from thr->outbuf. thr->progress_in = in_pos; thr->progress_out = *out_pos; while (in_size == thr->in_size && thr->state == THR_RUN) mythread_cond_wait(&thr->cond, &thr->mutex); state = thr->state; in_size = thr->in_size; } // Return if we were asked to stop or exit. if (state >= THR_STOP) return state; lzma_action action = state == THR_FINISH ? LZMA_FINISH : LZMA_RUN; // Limit the amount of input given to the Block encoder // at once. This way this thread can react fairly quickly // if the main thread wants us to stop or exit. 
static const size_t in_chunk_max = 16384; size_t in_limit = in_size; if (in_size - in_pos > in_chunk_max) { in_limit = in_pos + in_chunk_max; action = LZMA_RUN; } ret = thr->block_encoder.code( thr->block_encoder.coder, thr->allocator, thr->in, &in_pos, in_limit, thr->outbuf->buf, out_pos, out_size, action); } while (ret == LZMA_OK && *out_pos < out_size); switch (ret) { case LZMA_STREAM_END: assert(state == THR_FINISH); // Encode the Block Header. By doing it after // the compression, we can store the Compressed Size // and Uncompressed Size fields. ret = lzma_block_header_encode(&thr->block_options, thr->outbuf->buf); if (ret != LZMA_OK) { worker_error(thr, ret); return THR_STOP; } break; case LZMA_OK: // The data was incompressible. Encode it using uncompressed // LZMA2 chunks. // // First wait that we have gotten all the input. mythread_sync(thr->mutex) { while (thr->state == THR_RUN) mythread_cond_wait(&thr->cond, &thr->mutex); state = thr->state; in_size = thr->in_size; } if (state >= THR_STOP) return state; // Do the encoding. This takes care of the Block Header too. *out_pos = 0; ret = lzma_block_uncomp_encode(&thr->block_options, thr->in, in_size, thr->outbuf->buf, out_pos, out_size); // It shouldn't fail. if (ret != LZMA_OK) { worker_error(thr, LZMA_PROG_ERROR); return THR_STOP; } break; default: worker_error(thr, ret); return THR_STOP; } // Set the size information that will be read by the main thread // to write the Index field. thr->outbuf->unpadded_size = lzma_block_unpadded_size(&thr->block_options); assert(thr->outbuf->unpadded_size != 0); thr->outbuf->uncompressed_size = thr->block_options.uncompressed_size; return THR_FINISH; } static MYTHREAD_RET_TYPE worker_start(void *thr_ptr) { worker_thread *thr = thr_ptr; worker_state state = THR_IDLE; // Init to silence a warning while (true) { // Wait for work. mythread_sync(thr->mutex) { while (true) { // The thread is already idle so if we are // requested to stop, just set the state. 
if (thr->state == THR_STOP) { thr->state = THR_IDLE; mythread_cond_signal(&thr->cond); } state = thr->state; if (state != THR_IDLE) break; mythread_cond_wait(&thr->cond, &thr->mutex); } } size_t out_pos = 0; assert(state != THR_IDLE); assert(state != THR_STOP); if (state <= THR_FINISH) state = worker_encode(thr, &out_pos, state); if (state == THR_EXIT) break; // Mark the thread as idle unless the main thread has // told us to exit. Signal is needed for the case // where the main thread is waiting for the threads to stop. mythread_sync(thr->mutex) { if (thr->state != THR_EXIT) { thr->state = THR_IDLE; mythread_cond_signal(&thr->cond); } } mythread_sync(thr->coder->mutex) { // If no errors occurred, make the encoded data // available to be copied out. if (state == THR_FINISH) { thr->outbuf->pos = out_pos; thr->outbuf->finished = true; } // Update the main progress info. thr->coder->progress_in += thr->outbuf->uncompressed_size; thr->coder->progress_out += out_pos; thr->progress_in = 0; thr->progress_out = 0; // Return this thread to the stack of free threads. thr->next = thr->coder->threads_free; thr->coder->threads_free = thr; mythread_cond_signal(&thr->coder->cond); } } // Exiting, free the resources. lzma_filters_free(thr->filters, thr->allocator); mythread_mutex_destroy(&thr->mutex); mythread_cond_destroy(&thr->cond); lzma_next_end(&thr->block_encoder, thr->allocator); lzma_free(thr->in, thr->allocator); return MYTHREAD_RET_VALUE; } /// Make the threads stop but not exit. Optionally wait for them to stop. static void threads_stop(lzma_stream_coder *coder, bool wait_for_threads) { // Tell the threads to stop. for (uint32_t i = 0; i < coder->threads_initialized; ++i) { mythread_sync(coder->threads[i].mutex) { coder->threads[i].state = THR_STOP; mythread_cond_signal(&coder->threads[i].cond); } } if (!wait_for_threads) return; // Wait for the threads to settle in the idle state. 
for (uint32_t i = 0; i < coder->threads_initialized; ++i) { mythread_sync(coder->threads[i].mutex) { while (coder->threads[i].state != THR_IDLE) mythread_cond_wait(&coder->threads[i].cond, &coder->threads[i].mutex); } } return; } /// Stop the threads and free the resources associated with them. /// Wait until the threads have exited. static void threads_end(lzma_stream_coder *coder, const lzma_allocator *allocator) { for (uint32_t i = 0; i < coder->threads_initialized; ++i) { mythread_sync(coder->threads[i].mutex) { coder->threads[i].state = THR_EXIT; mythread_cond_signal(&coder->threads[i].cond); } } for (uint32_t i = 0; i < coder->threads_initialized; ++i) { int ret = mythread_join(coder->threads[i].thread_id); assert(ret == 0); (void)ret; } lzma_free(coder->threads, allocator); return; } /// Initialize a new worker_thread structure and create a new thread. static lzma_ret initialize_new_thread(lzma_stream_coder *coder, const lzma_allocator *allocator) { worker_thread *thr = &coder->threads[coder->threads_initialized]; thr->in = lzma_alloc(coder->block_size, allocator); if (thr->in == NULL) return LZMA_MEM_ERROR; if (mythread_mutex_init(&thr->mutex)) goto error_mutex; if (mythread_cond_init(&thr->cond)) goto error_cond; thr->state = THR_IDLE; thr->allocator = allocator; thr->coder = coder; thr->progress_in = 0; thr->progress_out = 0; thr->block_encoder = LZMA_NEXT_CODER_INIT; thr->filters[0].id = LZMA_VLI_UNKNOWN; if (mythread_create(&thr->thread_id, &worker_start, thr)) goto error_thread; ++coder->threads_initialized; coder->thr = thr; return LZMA_OK; error_thread: mythread_cond_destroy(&thr->cond); error_cond: mythread_mutex_destroy(&thr->mutex); error_mutex: lzma_free(thr->in, allocator); return LZMA_MEM_ERROR; } static lzma_ret get_thread(lzma_stream_coder *coder, const lzma_allocator *allocator) { // If there are no free output subqueues, there is no // point to try getting a thread. 
if (!lzma_outq_has_buf(&coder->outq)) return LZMA_OK; // That's also true if we cannot allocate memory for the output // buffer in the output queue. return_if_error(lzma_outq_prealloc_buf(&coder->outq, allocator, coder->outbuf_alloc_size)); // Make a thread-specific copy of the filter chain. Put it in // the cache array first so that if we cannot get a new thread yet, // the allocation is ready when we try again. if (coder->filters_cache[0].id == LZMA_VLI_UNKNOWN) return_if_error(lzma_filters_copy( coder->filters, coder->filters_cache, allocator)); // If there is a free structure on the stack, use it. mythread_sync(coder->mutex) { if (coder->threads_free != NULL) { coder->thr = coder->threads_free; coder->threads_free = coder->threads_free->next; } } if (coder->thr == NULL) { // If there are no uninitialized structures left, return. if (coder->threads_initialized == coder->threads_max) return LZMA_OK; // Initialize a new thread. return_if_error(initialize_new_thread(coder, allocator)); } // Reset the parts of the thread state that have to be done // in the main thread. mythread_sync(coder->thr->mutex) { coder->thr->state = THR_RUN; coder->thr->in_size = 0; coder->thr->outbuf = lzma_outq_get_buf(&coder->outq, NULL); // Free the old thread-specific filter options and replace // them with the already-allocated new options from // coder->filters_cache[]. Then mark the cache as empty. lzma_filters_free(coder->thr->filters, allocator); memcpy(coder->thr->filters, coder->filters_cache, sizeof(coder->filters_cache)); coder->filters_cache[0].id = LZMA_VLI_UNKNOWN; mythread_cond_signal(&coder->thr->cond); } return LZMA_OK; } static lzma_ret stream_encode_in(lzma_stream_coder *coder, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, lzma_action action) { while (*in_pos < in_size || (coder->thr != NULL && action != LZMA_RUN)) { if (coder->thr == NULL) { // Get a new thread. 
const lzma_ret ret = get_thread(coder, allocator); if (coder->thr == NULL) return ret; } // Copy the input data to thread's buffer. size_t thr_in_size = coder->thr->in_size; lzma_bufcpy(in, in_pos, in_size, coder->thr->in, &thr_in_size, coder->block_size); // Tell the Block encoder to finish if // - it has got block_size bytes of input; or // - all input was used and LZMA_FINISH, LZMA_FULL_FLUSH, // or LZMA_FULL_BARRIER was used. // // TODO: LZMA_SYNC_FLUSH and LZMA_SYNC_BARRIER. const bool finish = thr_in_size == coder->block_size || (*in_pos == in_size && action != LZMA_RUN); bool block_error = false; mythread_sync(coder->thr->mutex) { if (coder->thr->state == THR_IDLE) { // Something has gone wrong with the Block // encoder. It has set coder->thread_error // which we will read a few lines later. block_error = true; } else { // Tell the Block encoder its new amount // of input and update the state if needed. coder->thr->in_size = thr_in_size; if (finish) coder->thr->state = THR_FINISH; mythread_cond_signal(&coder->thr->cond); } } if (block_error) { lzma_ret ret = LZMA_OK; // Init to silence a warning. mythread_sync(coder->mutex) { ret = coder->thread_error; } return ret; } if (finish) coder->thr = NULL; } return LZMA_OK; } /// Wait until more input can be consumed, more output can be read, or /// an optional timeout is reached. static bool wait_for_work(lzma_stream_coder *coder, mythread_condtime *wait_abs, bool *has_blocked, bool has_input) { if (coder->timeout != 0 && !*has_blocked) { // Every time when stream_encode_mt() is called via // lzma_code(), *has_blocked starts as false. We set it // to true here and calculate the absolute time when // we must return if there's nothing to do. // // This way if we block multiple times for short moments // less than "timeout" milliseconds, we will return once // "timeout" amount of time has passed since the *first* // blocking occurred. 
If the absolute time was calculated // again every time we block, "timeout" would effectively // be meaningless if we never consecutively block longer // than "timeout" ms. *has_blocked = true; mythread_condtime_set(wait_abs, &coder->cond, coder->timeout); } bool timed_out = false; mythread_sync(coder->mutex) { // There are four things that we wait. If one of them // becomes possible, we return. // - If there is input left, we need to get a free // worker thread and an output buffer for it. // - Data ready to be read from the output queue. // - A worker thread indicates an error. // - Time out occurs. while ((!has_input || coder->threads_free == NULL || !lzma_outq_has_buf(&coder->outq)) && !lzma_outq_is_readable(&coder->outq) && coder->thread_error == LZMA_OK && !timed_out) { if (coder->timeout != 0) timed_out = mythread_cond_timedwait( &coder->cond, &coder->mutex, wait_abs) != 0; else mythread_cond_wait(&coder->cond, &coder->mutex); } } return timed_out; } static lzma_ret stream_encode_mt(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_stream_coder *coder = coder_ptr; switch (coder->sequence) { case SEQ_STREAM_HEADER: lzma_bufcpy(coder->header, &coder->header_pos, sizeof(coder->header), out, out_pos, out_size); if (coder->header_pos < sizeof(coder->header)) return LZMA_OK; coder->header_pos = 0; coder->sequence = SEQ_BLOCK; - - // Fall through + FALLTHROUGH; case SEQ_BLOCK: { // Initialized to silence warnings. lzma_vli unpadded_size = 0; lzma_vli uncompressed_size = 0; lzma_ret ret = LZMA_OK; // These are for wait_for_work(). bool has_blocked = false; mythread_condtime wait_abs = { 0 }; while (true) { mythread_sync(coder->mutex) { // Check for Block encoder errors. ret = coder->thread_error; if (ret != LZMA_OK) { assert(ret != LZMA_STREAM_END); break; // Break out of mythread_sync. 
} // Try to read compressed data to out[]. ret = lzma_outq_read(&coder->outq, allocator, out, out_pos, out_size, &unpadded_size, &uncompressed_size); } if (ret == LZMA_STREAM_END) { // End of Block. Add it to the Index. ret = lzma_index_append(coder->index, allocator, unpadded_size, uncompressed_size); if (ret != LZMA_OK) { threads_stop(coder, false); return ret; } // If we didn't fill the output buffer yet, // try to read more data. Maybe the next // outbuf has been finished already too. if (*out_pos < out_size) continue; } if (ret != LZMA_OK) { // coder->thread_error was set. threads_stop(coder, false); return ret; } // Try to give uncompressed data to a worker thread. ret = stream_encode_in(coder, allocator, in, in_pos, in_size, action); if (ret != LZMA_OK) { threads_stop(coder, false); return ret; } // See if we should wait or return. // // TODO: LZMA_SYNC_FLUSH and LZMA_SYNC_BARRIER. if (*in_pos == in_size) { // LZMA_RUN: More data is probably coming // so return to let the caller fill the // input buffer. if (action == LZMA_RUN) return LZMA_OK; // LZMA_FULL_BARRIER: The same as with // LZMA_RUN but tell the caller that the // barrier was completed. if (action == LZMA_FULL_BARRIER) return LZMA_STREAM_END; // Finishing or flushing isn't completed until // all input data has been encoded and copied // to the output buffer. if (lzma_outq_is_empty(&coder->outq)) { // LZMA_FINISH: Continue to encode // the Index field. if (action == LZMA_FINISH) break; // LZMA_FULL_FLUSH: Return to tell // the caller that flushing was // completed. if (action == LZMA_FULL_FLUSH) return LZMA_STREAM_END; } } // Return if there is no output space left. // This check must be done after testing the input // buffer, because we might want to use a different // return code. if (*out_pos == out_size) return LZMA_OK; // Neither in nor out has been used completely. // Wait until there's something we can do. 
if (wait_for_work(coder, &wait_abs, &has_blocked, *in_pos < in_size)) return LZMA_TIMED_OUT; } // All Blocks have been encoded and the threads have stopped. // Prepare to encode the Index field. return_if_error(lzma_index_encoder_init( &coder->index_encoder, allocator, coder->index)); coder->sequence = SEQ_INDEX; // Update the progress info to take the Index and // Stream Footer into account. Those are very fast to encode // so in terms of progress information they can be thought // to be ready to be copied out. coder->progress_out += lzma_index_size(coder->index) + LZMA_STREAM_HEADER_SIZE; - } - // Fall through + FALLTHROUGH; + } case SEQ_INDEX: { // Call the Index encoder. It doesn't take any input, so // those pointers can be NULL. const lzma_ret ret = coder->index_encoder.code( coder->index_encoder.coder, allocator, NULL, NULL, 0, out, out_pos, out_size, LZMA_RUN); if (ret != LZMA_STREAM_END) return ret; // Encode the Stream Footer into coder->buffer. coder->stream_flags.backward_size = lzma_index_size(coder->index); if (lzma_stream_footer_encode(&coder->stream_flags, coder->header) != LZMA_OK) return LZMA_PROG_ERROR; coder->sequence = SEQ_STREAM_FOOTER; + FALLTHROUGH; } - // Fall through - case SEQ_STREAM_FOOTER: lzma_bufcpy(coder->header, &coder->header_pos, sizeof(coder->header), out, out_pos, out_size); return coder->header_pos < sizeof(coder->header) ? LZMA_OK : LZMA_STREAM_END; } assert(0); return LZMA_PROG_ERROR; } static void stream_encoder_mt_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_stream_coder *coder = coder_ptr; // Threads must be killed before the output queue can be freed. 
threads_end(coder, allocator); lzma_outq_end(&coder->outq, allocator); lzma_filters_free(coder->filters, allocator); lzma_filters_free(coder->filters_cache, allocator); lzma_next_end(&coder->index_encoder, allocator); lzma_index_end(coder->index, allocator); mythread_cond_destroy(&coder->cond); mythread_mutex_destroy(&coder->mutex); lzma_free(coder, allocator); return; } static lzma_ret stream_encoder_mt_update(void *coder_ptr, const lzma_allocator *allocator, const lzma_filter *filters, const lzma_filter *reversed_filters lzma_attribute((__unused__))) { lzma_stream_coder *coder = coder_ptr; // Applications shouldn't attempt to change the options when // we are already encoding the Index or Stream Footer. if (coder->sequence > SEQ_BLOCK) return LZMA_PROG_ERROR; // For now the threaded encoder doesn't support changing // the options in the middle of a Block. if (coder->thr != NULL) return LZMA_PROG_ERROR; // Check if the filter chain seems mostly valid. See the comment // in stream_encoder_mt_init(). if (lzma_raw_encoder_memusage(filters) == UINT64_MAX) return LZMA_OPTIONS_ERROR; // Make a copy to a temporary buffer first. This way the encoder // state stays unchanged if an error occurs in lzma_filters_copy(). lzma_filter temp[LZMA_FILTERS_MAX + 1]; return_if_error(lzma_filters_copy(filters, temp, allocator)); // Free the options of the old chain as well as the cache. lzma_filters_free(coder->filters, allocator); lzma_filters_free(coder->filters_cache, allocator); // Copy the new filter chain in place. memcpy(coder->filters, temp, sizeof(temp)); return LZMA_OK; } /// Options handling for lzma_stream_encoder_mt_init() and /// lzma_stream_encoder_mt_memusage() static lzma_ret get_options(const lzma_mt *options, lzma_options_easy *opt_easy, const lzma_filter **filters, uint64_t *block_size, uint64_t *outbuf_size_max) { // Validate some of the options. 
if (options == NULL) return LZMA_PROG_ERROR; if (options->flags != 0 || options->threads == 0 || options->threads > LZMA_THREADS_MAX) return LZMA_OPTIONS_ERROR; if (options->filters != NULL) { // Filter chain was given, use it as is. *filters = options->filters; } else { // Use a preset. if (lzma_easy_preset(opt_easy, options->preset)) return LZMA_OPTIONS_ERROR; *filters = opt_easy->filters; } // If the Block size is not set, determine it from the filter chain. if (options->block_size > 0) *block_size = options->block_size; else *block_size = lzma_mt_block_size(*filters); // UINT64_MAX > BLOCK_SIZE_MAX, so the second condition // should be optimized out by any reasonable compiler. // The second condition should be there in the unlikely event that // the macros change and UINT64_MAX < BLOCK_SIZE_MAX. if (*block_size > BLOCK_SIZE_MAX || *block_size == UINT64_MAX) return LZMA_OPTIONS_ERROR; // Calculate the maximum amount of output that a single output buffer // may need to hold. This is the same as the maximum total size of // a Block. *outbuf_size_max = lzma_block_buffer_bound64(*block_size); if (*outbuf_size_max == 0) return LZMA_MEM_ERROR; return LZMA_OK; } static void get_progress(void *coder_ptr, uint64_t *progress_in, uint64_t *progress_out) { lzma_stream_coder *coder = coder_ptr; // Lock coder->mutex to prevent finishing threads from moving their // progress info from the worker_thread structure to lzma_stream_coder. mythread_sync(coder->mutex) { *progress_in = coder->progress_in; *progress_out = coder->progress_out; for (size_t i = 0; i < coder->threads_initialized; ++i) { mythread_sync(coder->threads[i].mutex) { *progress_in += coder->threads[i].progress_in; *progress_out += coder->threads[i] .progress_out; } } } return; } static lzma_ret stream_encoder_mt_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_mt *options) { lzma_next_coder_init(&stream_encoder_mt_init, next, allocator); // Get the filter chain.
lzma_options_easy easy; const lzma_filter *filters; uint64_t block_size; uint64_t outbuf_size_max; return_if_error(get_options(options, &easy, &filters, &block_size, &outbuf_size_max)); #if SIZE_MAX < UINT64_MAX if (block_size > SIZE_MAX || outbuf_size_max > SIZE_MAX) return LZMA_MEM_ERROR; #endif // Validate the filter chain so that we can give an error in this // function instead of delaying it to the first call to lzma_code(). // The memory usage calculation verifies the filter chain as // a side effect so we take advantage of that. It's not a perfect // check though as raw encoder allows LZMA1 too but such problems // will be caught eventually with Block Header encoder. if (lzma_raw_encoder_memusage(filters) == UINT64_MAX) return LZMA_OPTIONS_ERROR; // Validate the Check ID. if ((unsigned int)(options->check) > LZMA_CHECK_ID_MAX) return LZMA_PROG_ERROR; if (!lzma_check_is_supported(options->check)) return LZMA_UNSUPPORTED_CHECK; // Allocate and initialize the base structure if needed. lzma_stream_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_stream_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; // For the mutex and condition variable initializations // the error handling has to be done here because // stream_encoder_mt_end() doesn't know if they have // already been initialized or not. 
if (mythread_mutex_init(&coder->mutex)) { lzma_free(coder, allocator); next->coder = NULL; return LZMA_MEM_ERROR; } if (mythread_cond_init(&coder->cond)) { mythread_mutex_destroy(&coder->mutex); lzma_free(coder, allocator); next->coder = NULL; return LZMA_MEM_ERROR; } next->code = &stream_encode_mt; next->end = &stream_encoder_mt_end; next->get_progress = &get_progress; next->update = &stream_encoder_mt_update; coder->filters[0].id = LZMA_VLI_UNKNOWN; coder->filters_cache[0].id = LZMA_VLI_UNKNOWN; coder->index_encoder = LZMA_NEXT_CODER_INIT; coder->index = NULL; memzero(&coder->outq, sizeof(coder->outq)); coder->threads = NULL; coder->threads_max = 0; coder->threads_initialized = 0; } // Basic initializations coder->sequence = SEQ_STREAM_HEADER; coder->block_size = (size_t)(block_size); coder->outbuf_alloc_size = (size_t)(outbuf_size_max); coder->thread_error = LZMA_OK; coder->thr = NULL; // Allocate the thread-specific base structures. assert(options->threads > 0); if (coder->threads_max != options->threads) { threads_end(coder, allocator); coder->threads = NULL; coder->threads_max = 0; coder->threads_initialized = 0; coder->threads_free = NULL; coder->threads = lzma_alloc( options->threads * sizeof(worker_thread), allocator); if (coder->threads == NULL) return LZMA_MEM_ERROR; coder->threads_max = options->threads; } else { // Reuse the old structures and threads. Tell the running // threads to stop and wait until they have stopped. threads_stop(coder, true); } // Output queue return_if_error(lzma_outq_init(&coder->outq, allocator, options->threads)); // Timeout coder->timeout = options->timeout; // Free the old filter chain and the cache. lzma_filters_free(coder->filters, allocator); lzma_filters_free(coder->filters_cache, allocator); // Copy the new filter chain. 
return_if_error(lzma_filters_copy( filters, coder->filters, allocator)); // Index lzma_index_end(coder->index, allocator); coder->index = lzma_index_init(allocator); if (coder->index == NULL) return LZMA_MEM_ERROR; // Stream Header coder->stream_flags.version = 0; coder->stream_flags.check = options->check; return_if_error(lzma_stream_header_encode( &coder->stream_flags, coder->header)); coder->header_pos = 0; // Progress info coder->progress_in = 0; coder->progress_out = LZMA_STREAM_HEADER_SIZE; return LZMA_OK; } #ifdef HAVE_SYMBOL_VERSIONS_LINUX // These are for compatibility with binaries linked against liblzma that // has been patched with xz-5.2.2-compat-libs.patch from RHEL/CentOS 7. // Actually that patch didn't create lzma_stream_encoder_mt@XZ_5.2.2 // but it has been added here anyway since someone might misread the // RHEL patch and think both @XZ_5.1.2alpha and @XZ_5.2.2 exist. LZMA_SYMVER_API("lzma_stream_encoder_mt@XZ_5.1.2alpha", lzma_ret, lzma_stream_encoder_mt_512a)( lzma_stream *strm, const lzma_mt *options) lzma_nothrow lzma_attr_warn_unused_result __attribute__((__alias__("lzma_stream_encoder_mt_52"))); LZMA_SYMVER_API("lzma_stream_encoder_mt@XZ_5.2.2", lzma_ret, lzma_stream_encoder_mt_522)( lzma_stream *strm, const lzma_mt *options) lzma_nothrow lzma_attr_warn_unused_result __attribute__((__alias__("lzma_stream_encoder_mt_52"))); LZMA_SYMVER_API("lzma_stream_encoder_mt@@XZ_5.2", lzma_ret, lzma_stream_encoder_mt_52)( lzma_stream *strm, const lzma_mt *options) lzma_nothrow lzma_attr_warn_unused_result; #define lzma_stream_encoder_mt lzma_stream_encoder_mt_52 #endif extern LZMA_API(lzma_ret) lzma_stream_encoder_mt(lzma_stream *strm, const lzma_mt *options) { lzma_next_strm_init(stream_encoder_mt_init, strm, options); strm->internal->supported_actions[LZMA_RUN] = true; // strm->internal->supported_actions[LZMA_SYNC_FLUSH] = true; strm->internal->supported_actions[LZMA_FULL_FLUSH] = true; strm->internal->supported_actions[LZMA_FULL_BARRIER] = true; 
strm->internal->supported_actions[LZMA_FINISH] = true; return LZMA_OK; } #ifdef HAVE_SYMBOL_VERSIONS_LINUX LZMA_SYMVER_API("lzma_stream_encoder_mt_memusage@XZ_5.1.2alpha", uint64_t, lzma_stream_encoder_mt_memusage_512a)( const lzma_mt *options) lzma_nothrow lzma_attr_pure __attribute__((__alias__("lzma_stream_encoder_mt_memusage_52"))); LZMA_SYMVER_API("lzma_stream_encoder_mt_memusage@XZ_5.2.2", uint64_t, lzma_stream_encoder_mt_memusage_522)( const lzma_mt *options) lzma_nothrow lzma_attr_pure __attribute__((__alias__("lzma_stream_encoder_mt_memusage_52"))); LZMA_SYMVER_API("lzma_stream_encoder_mt_memusage@@XZ_5.2", uint64_t, lzma_stream_encoder_mt_memusage_52)( const lzma_mt *options) lzma_nothrow lzma_attr_pure; #define lzma_stream_encoder_mt_memusage lzma_stream_encoder_mt_memusage_52 #endif // This function name is a monster but it's consistent with the older // monster names. :-( 31 chars is the max that C99 requires so in that // sense it's not too long. ;-) extern LZMA_API(uint64_t) lzma_stream_encoder_mt_memusage(const lzma_mt *options) { lzma_options_easy easy; const lzma_filter *filters; uint64_t block_size; uint64_t outbuf_size_max; if (get_options(options, &easy, &filters, &block_size, &outbuf_size_max) != LZMA_OK) return UINT64_MAX; // Memory usage of the input buffers const uint64_t inbuf_memusage = options->threads * block_size; // Memory usage of the filter encoders uint64_t filters_memusage = lzma_raw_encoder_memusage(filters); if (filters_memusage == UINT64_MAX) return UINT64_MAX; filters_memusage *= options->threads; // Memory usage of the output queue const uint64_t outq_memusage = lzma_outq_memusage( outbuf_size_max, options->threads); if (outq_memusage == UINT64_MAX) return UINT64_MAX; // Sum them with overflow checking. 
uint64_t total_memusage = LZMA_MEMUSAGE_BASE + sizeof(lzma_stream_coder) + options->threads * sizeof(worker_thread); if (UINT64_MAX - total_memusage < inbuf_memusage) return UINT64_MAX; total_memusage += inbuf_memusage; if (UINT64_MAX - total_memusage < filters_memusage) return UINT64_MAX; total_memusage += filters_memusage; if (UINT64_MAX - total_memusage < outq_memusage) return UINT64_MAX; return total_memusage + outq_memusage; } diff --git a/src/liblzma/common/string_conversion.c b/src/liblzma/common/string_conversion.c index c899783c642a..015acf225856 100644 --- a/src/liblzma/common/string_conversion.c +++ b/src/liblzma/common/string_conversion.c @@ -1,1338 +1,1363 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file string_conversion.c /// \brief Conversion of strings to filter chain and vice versa // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "filter_common.h" +// liblzma itself doesn't use gettext to translate messages. +// Mark the strings still so that xz can translate them. +#define N_(msgid) msgid + + ///////////////////// // String building // ///////////////////// /// How much memory to allocate for strings. For now, no realloc is used /// so this needs to be big enough even though there of course is /// an overflow check still. /// /// FIXME? Using a fixed size is wasteful if the application doesn't free /// the string fairly quickly but this can be improved later if needed. 
#define STR_ALLOC_SIZE 800 typedef struct { char *buf; size_t pos; } lzma_str; static lzma_ret str_init(lzma_str *str, const lzma_allocator *allocator) { str->buf = lzma_alloc(STR_ALLOC_SIZE, allocator); if (str->buf == NULL) return LZMA_MEM_ERROR; str->pos = 0; return LZMA_OK; } static void str_free(lzma_str *str, const lzma_allocator *allocator) { lzma_free(str->buf, allocator); return; } static bool str_is_full(const lzma_str *str) { return str->pos == STR_ALLOC_SIZE - 1; } static lzma_ret str_finish(char **dest, lzma_str *str, const lzma_allocator *allocator) { if (str_is_full(str)) { // The preallocated buffer was too small. // This shouldn't happen as STR_ALLOC_SIZE should // be adjusted if new filters are added. lzma_free(str->buf, allocator); *dest = NULL; assert(0); return LZMA_PROG_ERROR; } str->buf[str->pos] = '\0'; *dest = str->buf; return LZMA_OK; } static void str_append_str(lzma_str *str, const char *s) { const size_t len = strlen(s); const size_t limit = STR_ALLOC_SIZE - 1 - str->pos; const size_t copy_size = my_min(len, limit); memcpy(str->buf + str->pos, s, copy_size); str->pos += copy_size; return; } static void str_append_u32(lzma_str *str, uint32_t v, bool use_byte_suffix) { if (v == 0) { str_append_str(str, "0"); } else { // NOTE: Don't use plain "B" because xz and the parser in this // file don't support it and at a glance it may look like 8 // (there cannot be a space before the suffix). static const char suffixes[4][4] = { "", "KiB", "MiB", "GiB" }; size_t suf = 0; if (use_byte_suffix) { while ((v & 1023) == 0 && suf < ARRAY_SIZE(suffixes) - 1) { v >>= 10; ++suf; } } // UINT32_MAX in base 10 would need 10 + 1 bytes. Remember // that initializing to "" initializes all elements to // zero so '\0'-termination gets handled by this.
char buf[16] = ""; size_t pos = sizeof(buf) - 1; do { buf[--pos] = '0' + (v % 10); v /= 10; } while (v != 0); str_append_str(str, buf + pos); str_append_str(str, suffixes[suf]); } return; } ////////////////////////////////////////////// // Parsing and stringification declarations // ////////////////////////////////////////////// /// Maximum length for filter and option names. /// 11 chars + terminating '\0' + sizeof(uint32_t) = 16 bytes #define NAME_LEN_MAX 11 /// For option_map.flags: Use .u.map to convert the input value /// to an integer. Without this flag, .u.range.{min,max} are used /// as the allowed range for the integer. #define OPTMAP_USE_NAME_VALUE_MAP 0x01 /// For option_map.flags: Allow KiB/MiB/GiB in input string and use them in /// the stringified output if the value is an exact multiple of these. /// This is used e.g. for LZMA1/2 dictionary size. #define OPTMAP_USE_BYTE_SUFFIX 0x02 /// For option_map.flags: If the integer value is zero then this option /// won't be included in the stringified output. It's used e.g. for /// BCJ filter start offset which usually is zero. #define OPTMAP_NO_STRFY_ZERO 0x04 /// Possible values for option_map.type. Since OPTMAP_TYPE_UINT32 is 0, /// it doesn't need to be specified in the initializers as it is /// the implicit value. enum { OPTMAP_TYPE_UINT32, OPTMAP_TYPE_LZMA_MODE, OPTMAP_TYPE_LZMA_MATCH_FINDER, OPTMAP_TYPE_LZMA_PRESET, }; /// This is for mapping string values in options to integers. /// The last element of an array must have "" as the name. /// It's used e.g. for match finder names in LZMA1/2. typedef struct { const char name[NAME_LEN_MAX + 1]; const uint32_t value; } name_value_map; /// Each filter that has options needs an array of option_map structures. /// The array doesn't need to be terminated as the functions take the /// length of the array as an argument.
/// /// When converting a string to filter options structure, option values /// will be handled in a few different ways: /// /// (1) If .type equals OPTMAP_TYPE_LZMA_PRESET then LZMA1/2 preset string /// is handled specially. /// /// (2) If .flags has OPTMAP_USE_NAME_VALUE_MAP set then the string is /// converted to an integer using the name_value_map pointed by .u.map. /// The last element in .u.map must have .name = "" as the terminator. /// /// (3) Otherwise the string is treated as a non-negative unsigned decimal /// integer which must be in the range set in .u.range. If .flags has /// OPTMAP_USE_BYTE_SUFFIX then KiB, MiB, and GiB suffixes are allowed. /// /// The integer value from (2) or (3) is then stored to filter_options /// at the offset specified in .offset using the type specified in .type /// (default is uint32_t). /// /// Stringifying a filter is done by processing a given number of options /// in order from the beginning of an option_map array. The integer is /// read from filter_options at .offset using the type from .type. /// /// If the integer is zero and .flags has OPTMAP_NO_STRFY_ZERO then the /// option is skipped. /// /// If .flags has OPTMAP_USE_NAME_VALUE_MAP set then .u.map will be used /// to convert the option to a string. If the map doesn't contain a string /// for the integer value then "UNKNOWN" is used. /// /// If .flags doesn't have OPTMAP_USE_NAME_VALUE_MAP set then the integer is /// converted to a decimal value. If OPTMAP_USE_BYTE_SUFFIX is used then KiB, /// MiB, or GiB suffix is used if the value is an exact multiple of these. /// Plain "B" suffix is never used. typedef struct { char name[NAME_LEN_MAX + 1]; uint8_t type; uint8_t flags; uint16_t offset; union { // NVHPC has problems with unions that contain pointers that // are not the first members, so keep "map" at the top. 
const name_value_map *map; struct { uint32_t min; uint32_t max; } range; } u; } option_map; static const char *parse_options(const char **const str, const char *str_end, void *filter_options, const option_map *const optmap, const size_t optmap_size); ///////// // BCJ // ///////// #if defined(HAVE_ENCODER_X86) \ || defined(HAVE_DECODER_X86) \ || defined(HAVE_ENCODER_ARM) \ || defined(HAVE_DECODER_ARM) \ || defined(HAVE_ENCODER_ARMTHUMB) \ || defined(HAVE_DECODER_ARMTHUMB) \ || defined(HAVE_ENCODER_ARM64) \ || defined(HAVE_DECODER_ARM64) \ || defined(HAVE_ENCODER_POWERPC) \ || defined(HAVE_DECODER_POWERPC) \ || defined(HAVE_ENCODER_IA64) \ || defined(HAVE_DECODER_IA64) \ || defined(HAVE_ENCODER_SPARC) \ || defined(HAVE_DECODER_SPARC) \ || defined(HAVE_ENCODER_RISCV) \ || defined(HAVE_DECODER_RISCV) static const option_map bcj_optmap[] = { { .name = "start", .flags = OPTMAP_NO_STRFY_ZERO | OPTMAP_USE_BYTE_SUFFIX, .offset = offsetof(lzma_options_bcj, start_offset), .u.range.min = 0, .u.range.max = UINT32_MAX, } }; static const char * parse_bcj(const char **const str, const char *str_end, void *filter_options) { // filter_options was zeroed on allocation and that is enough // for the default value. 
return parse_options(str, str_end, filter_options, bcj_optmap, ARRAY_SIZE(bcj_optmap)); } #endif /////////// // Delta // /////////// #if defined(HAVE_ENCODER_DELTA) || defined(HAVE_DECODER_DELTA) static const option_map delta_optmap[] = { { .name = "dist", .offset = offsetof(lzma_options_delta, dist), .u.range.min = LZMA_DELTA_DIST_MIN, .u.range.max = LZMA_DELTA_DIST_MAX, } }; static const char * parse_delta(const char **const str, const char *str_end, void *filter_options) { lzma_options_delta *opts = filter_options; opts->type = LZMA_DELTA_TYPE_BYTE; opts->dist = LZMA_DELTA_DIST_MIN; return parse_options(str, str_end, filter_options, delta_optmap, ARRAY_SIZE(delta_optmap)); } #endif /////////////////// // LZMA1 & LZMA2 // /////////////////// /// Help string for presets #define LZMA12_PRESET_STR "0-9[e]" static const char * parse_lzma12_preset(const char **const str, const char *str_end, uint32_t *preset) { assert(*str < str_end); + + if (!(**str >= '0' && **str <= '9')) + return N_("Unsupported preset"); + *preset = (uint32_t)(**str - '0'); // NOTE: Remember to update LZMA12_PRESET_STR if this is modified! 
while (++*str < str_end) { switch (**str) { case 'e': *preset |= LZMA_PRESET_EXTREME; break; default: - return "Unsupported preset flag"; + return N_("Unsupported flag in the preset"); } } return NULL; } static const char * set_lzma12_preset(const char **const str, const char *str_end, void *filter_options) { uint32_t preset; const char *errmsg = parse_lzma12_preset(str, str_end, &preset); if (errmsg != NULL) return errmsg; lzma_options_lzma *opts = filter_options; if (lzma_lzma_preset(opts, preset)) - return "Unsupported preset"; + return N_("Unsupported preset"); return NULL; } static const name_value_map lzma12_mode_map[] = { { "fast", LZMA_MODE_FAST }, { "normal", LZMA_MODE_NORMAL }, { "", 0 } }; static const name_value_map lzma12_mf_map[] = { { "hc3", LZMA_MF_HC3 }, { "hc4", LZMA_MF_HC4 }, { "bt2", LZMA_MF_BT2 }, { "bt3", LZMA_MF_BT3 }, { "bt4", LZMA_MF_BT4 }, { "", 0 } }; static const option_map lzma12_optmap[] = { { .name = "preset", .type = OPTMAP_TYPE_LZMA_PRESET, }, { .name = "dict", .flags = OPTMAP_USE_BYTE_SUFFIX, .offset = offsetof(lzma_options_lzma, dict_size), .u.range.min = LZMA_DICT_SIZE_MIN, // FIXME? The max is really max for encoding but decoding // would allow 4 GiB - 1 B. 
.u.range.max = (UINT32_C(1) << 30) + (UINT32_C(1) << 29), }, { .name = "lc", .offset = offsetof(lzma_options_lzma, lc), .u.range.min = LZMA_LCLP_MIN, .u.range.max = LZMA_LCLP_MAX, }, { .name = "lp", .offset = offsetof(lzma_options_lzma, lp), .u.range.min = LZMA_LCLP_MIN, .u.range.max = LZMA_LCLP_MAX, }, { .name = "pb", .offset = offsetof(lzma_options_lzma, pb), .u.range.min = LZMA_PB_MIN, .u.range.max = LZMA_PB_MAX, }, { .name = "mode", .type = OPTMAP_TYPE_LZMA_MODE, .flags = OPTMAP_USE_NAME_VALUE_MAP, .offset = offsetof(lzma_options_lzma, mode), .u.map = lzma12_mode_map, }, { .name = "nice", .offset = offsetof(lzma_options_lzma, nice_len), .u.range.min = 2, .u.range.max = 273, }, { .name = "mf", .type = OPTMAP_TYPE_LZMA_MATCH_FINDER, .flags = OPTMAP_USE_NAME_VALUE_MAP, .offset = offsetof(lzma_options_lzma, mf), .u.map = lzma12_mf_map, }, { .name = "depth", .offset = offsetof(lzma_options_lzma, depth), .u.range.min = 0, .u.range.max = UINT32_MAX, } }; static const char * parse_lzma12(const char **const str, const char *str_end, void *filter_options) { lzma_options_lzma *opts = filter_options; // It cannot fail. const bool preset_ret = lzma_lzma_preset(opts, LZMA_PRESET_DEFAULT); assert(!preset_ret); (void)preset_ret; const char *errmsg = parse_options(str, str_end, filter_options, lzma12_optmap, ARRAY_SIZE(lzma12_optmap)); if (errmsg != NULL) return errmsg; if (opts->lc + opts->lp > LZMA_LCLP_MAX) - return "The sum of lc and lp must not exceed 4"; + return N_("The sum of lc and lp must not exceed 4"); return NULL; } ///////////////////////////////////////// // Generic parsing and stringification // ///////////////////////////////////////// static const struct { /// Name of the filter char name[NAME_LEN_MAX + 1]; /// For lzma_str_to_filters: /// Size of the filter-specific options structure. uint32_t opts_size; /// Filter ID lzma_vli id; /// For lzma_str_to_filters: /// Function to parse the filter-specific options. 
The filter_options /// will already have been allocated using lzma_alloc_zero(). const char *(*parse)(const char **str, const char *str_end, void *filter_options); /// For lzma_str_from_filters: /// If the flag LZMA_STR_ENCODER is used then the first /// strfy_encoder elements of optmap are stringified. /// With LZMA_STR_DECODER strfy_decoder is used. /// Currently encoders use all options that decoders do but if /// that changes then this needs to be changed too, for example, /// add a new OPTMAP flag to skip printing some decoder-only options. const option_map *optmap; uint8_t strfy_encoder; uint8_t strfy_decoder; /// For lzma_str_from_filters: /// If true, lzma_filter.options is allowed to be NULL. In that case, /// only the filter name is printed without any options. bool allow_null; } filter_name_map[] = { #if defined (HAVE_ENCODER_LZMA1) || defined(HAVE_DECODER_LZMA1) { "lzma1", sizeof(lzma_options_lzma), LZMA_FILTER_LZMA1, &parse_lzma12, lzma12_optmap, 9, 5, false }, #endif #if defined(HAVE_ENCODER_LZMA2) || defined(HAVE_DECODER_LZMA2) { "lzma2", sizeof(lzma_options_lzma), LZMA_FILTER_LZMA2, &parse_lzma12, lzma12_optmap, 9, 2, false }, #endif #if defined(HAVE_ENCODER_X86) || defined(HAVE_DECODER_X86) { "x86", sizeof(lzma_options_bcj), LZMA_FILTER_X86, &parse_bcj, bcj_optmap, 1, 1, true }, #endif #if defined(HAVE_ENCODER_ARM) || defined(HAVE_DECODER_ARM) { "arm", sizeof(lzma_options_bcj), LZMA_FILTER_ARM, &parse_bcj, bcj_optmap, 1, 1, true }, #endif #if defined(HAVE_ENCODER_ARMTHUMB) || defined(HAVE_DECODER_ARMTHUMB) { "armthumb", sizeof(lzma_options_bcj), LZMA_FILTER_ARMTHUMB, &parse_bcj, bcj_optmap, 1, 1, true }, #endif #if defined(HAVE_ENCODER_ARM64) || defined(HAVE_DECODER_ARM64) { "arm64", sizeof(lzma_options_bcj), LZMA_FILTER_ARM64, &parse_bcj, bcj_optmap, 1, 1, true }, #endif #if defined(HAVE_ENCODER_RISCV) || defined(HAVE_DECODER_RISCV) { "riscv", sizeof(lzma_options_bcj), LZMA_FILTER_RISCV, &parse_bcj, bcj_optmap, 1, 1, true }, #endif #if 
defined(HAVE_ENCODER_POWERPC) || defined(HAVE_DECODER_POWERPC) { "powerpc", sizeof(lzma_options_bcj), LZMA_FILTER_POWERPC, &parse_bcj, bcj_optmap, 1, 1, true }, #endif #if defined(HAVE_ENCODER_IA64) || defined(HAVE_DECODER_IA64) { "ia64", sizeof(lzma_options_bcj), LZMA_FILTER_IA64, &parse_bcj, bcj_optmap, 1, 1, true }, #endif #if defined(HAVE_ENCODER_SPARC) || defined(HAVE_DECODER_SPARC) { "sparc", sizeof(lzma_options_bcj), LZMA_FILTER_SPARC, &parse_bcj, bcj_optmap, 1, 1, true }, #endif #if defined(HAVE_ENCODER_DELTA) || defined(HAVE_DECODER_DELTA) { "delta", sizeof(lzma_options_delta), LZMA_FILTER_DELTA, &parse_delta, delta_optmap, 1, 1, false }, #endif }; /// Decodes options from a string for one filter (name1=value1,name2=value2). /// Caller must have allocated memory for filter_options already and set /// the initial default values. This is called from the filter-specific /// parse_* functions. /// /// The input string starts at *str and the address in str_end is the first /// char that is not part of the string anymore. So no '\0' terminator is /// used. *str is advanced every time something has been decoded successfully. static const char * parse_options(const char **const str, const char *str_end, void *filter_options, const option_map *const optmap, const size_t optmap_size) { while (*str < str_end && **str != '\0') { // Each option is of the form name=value. // Commas (',') separate options. Extra commas are ignored. // Ignoring extra commas makes it simpler if an optional // option stored in a shell variable which can be empty. if (**str == ',') { ++*str; continue; } // Find where the next name=value ends. const size_t str_len = (size_t)(str_end - *str); const char *name_eq_value_end = memchr(*str, ',', str_len); if (name_eq_value_end == NULL) name_eq_value_end = str_end; const char *equals_sign = memchr(*str, '=', (size_t)(name_eq_value_end - *str)); // Fail if the '=' wasn't found or the option name is missing // (the first char is '='). 
if (equals_sign == NULL || **str == '=') - return "Options must be 'name=value' pairs separated " - "with commas"; + return N_("Options must be 'name=value' pairs " + "separated with commas"); // Reject a too long option name so that the memcmp() // in the loop below won't read past the end of the // string in optmap[i].name. const size_t name_len = (size_t)(equals_sign - *str); if (name_len > NAME_LEN_MAX) - return "Unknown option name"; + return N_("Unknown option name"); // Find the option name from optmap[]. size_t i = 0; while (true) { if (i == optmap_size) - return "Unknown option name"; + return N_("Unknown option name"); if (memcmp(*str, optmap[i].name, name_len) == 0 && optmap[i].name[name_len] == '\0') break; ++i; } // The input string is good at least until the start of // the option value. *str = equals_sign + 1; // The code assumes that the option value isn't an empty // string so check it here. const size_t value_len = (size_t)(name_eq_value_end - *str); if (value_len == 0) - return "Option value cannot be empty"; + return N_("Option value cannot be empty"); // LZMA1/2 preset has its own parsing function. if (optmap[i].type == OPTMAP_TYPE_LZMA_PRESET) { const char *errmsg = set_lzma12_preset(str, name_eq_value_end, filter_options); if (errmsg != NULL) return errmsg; continue; } // It's an integer value. uint32_t v; if (optmap[i].flags & OPTMAP_USE_NAME_VALUE_MAP) { // The integer is picked from a string-to-integer map. // // Reject a too long value string so that the memcmp() // in the loop below won't read past the end of the // string in optmap[i].u.map[j].name. if (value_len > NAME_LEN_MAX) - return "Invalid option value"; + return N_("Invalid option value"); const name_value_map *map = optmap[i].u.map; size_t j = 0; while (true) { // The array is terminated with an empty name. 
if (map[j].name[0] == '\0') - return "Invalid option value"; + return N_("Invalid option value"); if (memcmp(*str, map[j].name, value_len) == 0 && map[j].name[value_len] == '\0') { v = map[j].value; break; } ++j; } } else if (**str < '0' || **str > '9') { // Note that "max" isn't supported while it is // supported in xz. It's not useful here. - return "Value is not a non-negative decimal integer"; + return N_("Value is not a non-negative " + "decimal integer"); } else { // strtoul() has locale-specific behavior so it cannot // be relied on to get reproducible results since we // cannot change the locate in a thread-safe library. // It also needs '\0'-termination. // // Use a temporary pointer so that *str will point // to the beginning of the value string in case // an error occurs. const char *p = *str; v = 0; do { if (v > UINT32_MAX / 10) - return "Value out of range"; + return N_("Value out of range"); v *= 10; const uint32_t add = (uint32_t)(*p - '0'); if (UINT32_MAX - add < v) - return "Value out of range"; + return N_("Value out of range"); v += add; ++p; } while (p < name_eq_value_end && *p >= '0' && *p <= '9'); if (p < name_eq_value_end) { // Remember this position so that it can be // used for error messages that are // specifically about the suffix. (Out of // range values are about the whole value // and those error messages point to the // beginning of the number part, // not to the suffix.) const char *multiplier_start = p; // If multiplier suffix shouldn't be used // then don't allow them even if the value // would stay within limits. This is a somewhat // unnecessary check but it rejects silly // things like lzma2:pb=0MiB which xz allows. 
if ((optmap[i].flags & OPTMAP_USE_BYTE_SUFFIX) == 0) { *str = multiplier_start; - return "This option does not support " - "any integer suffixes"; + return N_("This option does not " + "support any multiplier " + "suffixes"); } uint32_t shift; switch (*p) { case 'k': case 'K': shift = 10; break; case 'm': case 'M': shift = 20; break; case 'g': case 'G': shift = 30; break; default: *str = multiplier_start; - return "Invalid multiplier suffix " - "(KiB, MiB, or GiB)"; + + // TRANSLATORS: Don't translate the + // suffixes "KiB", "MiB", or "GiB" + // because a user can only specify + // untranslated suffixes. + return N_("Invalid multiplier suffix " + "(KiB, MiB, or GiB)"); } ++p; // Allow "M", "Mi", "MB", "MiB" and the same // for the other five characters from the // switch-statement above. All are handled // as base-2 (perhaps a mistake, perhaps not). // Note that 'i' and 'B' are case sensitive. if (p < name_eq_value_end && *p == 'i') ++p; if (p < name_eq_value_end && *p == 'B') ++p; // Now we must have no chars remaining. if (p < name_eq_value_end) { *str = multiplier_start; - return "Invalid multiplier suffix " - "(KiB, MiB, or GiB)"; + return N_("Invalid multiplier suffix " + "(KiB, MiB, or GiB)"); } if (v > (UINT32_MAX >> shift)) - return "Value out of range"; + return N_("Value out of range"); v <<= shift; } if (v < optmap[i].u.range.min || v > optmap[i].u.range.max) - return "Value out of range"; + return N_("Value out of range"); } // Set the value in filter_options. Enums are handled // specially since the underlying type isn't the same // as uint32_t on all systems. void *ptr = (char *)filter_options + optmap[i].offset; switch (optmap[i].type) { case OPTMAP_TYPE_LZMA_MODE: *(lzma_mode *)ptr = (lzma_mode)v; break; case OPTMAP_TYPE_LZMA_MATCH_FINDER: *(lzma_match_finder *)ptr = (lzma_match_finder)v; break; default: *(uint32_t *)ptr = v; break; } // This option has been successfully handled. *str = name_eq_value_end; } // No errors. 
return NULL; } /// Finds the name of the filter at the beginning of the string and /// calls filter_name_map[i].parse() to decode the filter-specific options. /// The caller must have set str_end so that exactly one filter and its /// options are present without any trailing characters. static const char * parse_filter(const char **const str, const char *str_end, lzma_filter *filter, const lzma_allocator *allocator, bool only_xz) { // Search for a colon or equals sign that would separate the filter // name from filter options. If neither is found, then the input // string only contains a filter name and there are no options. // // First assume that a colon or equals sign won't be found: const char *name_end = str_end; const char *opts_start = str_end; for (const char *p = *str; p < str_end; ++p) { if (*p == ':' || *p == '=') { name_end = p; // Filter options (name1=value1,name2=value2,...) // begin after the colon or equals sign. opts_start = p + 1; break; } } // Reject a too long filter name so that the memcmp() // in the loop below won't read past the end of the // string in filter_name_map[i].name. const size_t name_len = (size_t)(name_end - *str); if (name_len > NAME_LEN_MAX) - return "Unknown filter name"; + return N_("Unknown filter name"); for (size_t i = 0; i < ARRAY_SIZE(filter_name_map); ++i) { if (memcmp(*str, filter_name_map[i].name, name_len) == 0 && filter_name_map[i].name[name_len] == '\0') { if (only_xz && filter_name_map[i].id >= LZMA_FILTER_RESERVED_START) - return "This filter cannot be used in " - "the .xz format"; + return N_("This filter cannot be used in " + "the .xz format"); // Allocate the filter-specific options and // initialize the memory with zeros. void *options = lzma_alloc_zero( filter_name_map[i].opts_size, allocator); if (options == NULL) - return "Memory allocation failed"; + return N_("Memory allocation failed"); // Filter name was found so the input string is good // at least this far. 
*str = opts_start; const char *errmsg = filter_name_map[i].parse( str, str_end, options); if (errmsg != NULL) { lzma_free(options, allocator); return errmsg; } // *filter is modified only when parsing is successful. filter->id = filter_name_map[i].id; filter->options = options; return NULL; } } - return "Unknown filter name"; + return N_("Unknown filter name"); } /// Converts the string to a filter chain (array of lzma_filter structures). /// /// *str is advanced every time something has been decoded successfully. /// This way the caller knows where in the string a possible error occurred. static const char * str_to_filters(const char **const str, lzma_filter *filters, uint32_t flags, const lzma_allocator *allocator) { const char *errmsg; // Skip leading spaces. while (**str == ' ') ++*str; if (**str == '\0') - return "Empty string is not allowed, " - "try \"6\" if a default value is needed"; + return N_("Empty string is not allowed, " + "try '6' if a default value is needed"); // Detect the type of the string. // // A string beginning with a digit or a string beginning with // one dash and a digit are treated as presets. Trailing spaces // will be ignored too (leading spaces were already ignored above). // // For example, "6", "7 ", "-9e", or " -3 " are treated as presets. // Strings like "-" or "- " aren't preset. #define MY_IS_DIGIT(c) ((c) >= '0' && (c) <= '9') if (MY_IS_DIGIT(**str) || (**str == '-' && MY_IS_DIGIT((*str)[1]))) { if (**str == '-') ++*str; // Ignore trailing spaces. const size_t str_len = strlen(*str); const char *str_end = memchr(*str, ' ', str_len); if (str_end != NULL) { // There is at least one trailing space. Check that // there are no chars other than spaces. for (size_t i = 1; str_end[i] != '\0'; ++i) if (str_end[i] != ' ') - return "Unsupported preset"; + return N_("Unsupported preset"); } else { // There are no trailing spaces. Use the whole string. 
str_end = *str + str_len; } uint32_t preset; errmsg = parse_lzma12_preset(str, str_end, &preset); if (errmsg != NULL) return errmsg; lzma_options_lzma *opts = lzma_alloc(sizeof(*opts), allocator); if (opts == NULL) - return "Memory allocation failed"; + return N_("Memory allocation failed"); if (lzma_lzma_preset(opts, preset)) { lzma_free(opts, allocator); - return "Unsupported preset"; + return N_("Unsupported preset"); } filters[0].id = LZMA_FILTER_LZMA2; filters[0].options = opts; filters[1].id = LZMA_VLI_UNKNOWN; filters[1].options = NULL; return NULL; } // Not a preset so it must be a filter chain. // // If LZMA_STR_ALL_FILTERS isn't used we allow only filters that // can be used in .xz. const bool only_xz = (flags & LZMA_STR_ALL_FILTERS) == 0; // Use a temporary array so that we don't modify the caller-supplied // one until we know that no errors occurred. lzma_filter temp_filters[LZMA_FILTERS_MAX + 1]; size_t i = 0; do { if (i == LZMA_FILTERS_MAX) { - errmsg = "The maximum number of filters is four"; + errmsg = N_("The maximum number of filters is four"); goto error; } // Skip "--" if present. if ((*str)[0] == '-' && (*str)[1] == '-') *str += 2; // Locate the end of "filter:name1=value1,name2=value2", // stopping at the first "--" or a single space. const char *filter_end = *str; while (filter_end[0] != '\0') { if ((filter_end[0] == '-' && filter_end[1] == '-') || filter_end[0] == ' ') break; ++filter_end; } // Inputs that have "--" at the end or "-- " in the middle // will result in an empty filter name. if (filter_end == *str) { - errmsg = "Filter name is missing"; + errmsg = N_("Filter name is missing"); goto error; } errmsg = parse_filter(str, filter_end, &temp_filters[i], allocator, only_xz); if (errmsg != NULL) goto error; // Skip trailing spaces. while (**str == ' ') ++*str; ++i; } while (**str != '\0'); // Seems to be good, terminate the array so that // basic validation can be done. 
temp_filters[i].id = LZMA_VLI_UNKNOWN; temp_filters[i].options = NULL; // Do basic validation if the application didn't prohibit it. if ((flags & LZMA_STR_NO_VALIDATION) == 0) { size_t dummy; const lzma_ret ret = lzma_validate_chain(temp_filters, &dummy); assert(ret == LZMA_OK || ret == LZMA_OPTIONS_ERROR); if (ret != LZMA_OK) { - errmsg = "Invalid filter chain " - "('lzma2' missing at the end?)"; + errmsg = N_("Invalid filter chain " + "('lzma2' missing at the end?)"); goto error; } } // All good. Copy the filters to the application supplied array. memcpy(filters, temp_filters, (i + 1) * sizeof(lzma_filter)); return NULL; error: // Free the filter options that were successfully decoded. while (i-- > 0) lzma_free(temp_filters[i].options, allocator); return errmsg; } extern LZMA_API(const char *) lzma_str_to_filters(const char *str, int *error_pos, lzma_filter *filters, uint32_t flags, const lzma_allocator *allocator) { // If error_pos isn't NULL, *error_pos must always be set. // liblzma <= 5.4.6 and <= 5.6.1 have a bug and don't do this // when str == NULL or filters == NULL or flags are unsupported. if (error_pos != NULL) *error_pos = 0; - if (str == NULL || filters == NULL) + if (str == NULL || filters == NULL) { + // Don't translate this because it's only shown in case of + // a programming error. return "Unexpected NULL pointer argument(s) " "to lzma_str_to_filters()"; + } // Validate the flags. const uint32_t supported_flags = LZMA_STR_ALL_FILTERS | LZMA_STR_NO_VALIDATION; - if (flags & ~supported_flags) + if (flags & ~supported_flags) { + // This message is possible only if the caller uses flags + // that are only supported in a newer liblzma version (or + // the flags are simply buggy). Don't translate this at least + // when liblzma itself doesn't use gettext; xz and liblzma + // are usually upgraded at the same time. 
return "Unsupported flags to lzma_str_to_filters()"; + } const char *used = str; const char *errmsg = str_to_filters(&used, filters, flags, allocator); if (error_pos != NULL) { const size_t n = (size_t)(used - str); *error_pos = n > INT_MAX ? INT_MAX : (int)n; } return errmsg; } /// Converts options of one filter to a string. /// /// The caller must have already put the filter name in the destination /// string. Since it is possible that no options will be needed, the caller /// won't have put a delimiter character (':' or '=') in the string yet. /// We will add it if at least one option will be added to the string. static void strfy_filter(lzma_str *dest, const char *delimiter, const option_map *optmap, size_t optmap_count, const void *filter_options) { for (size_t i = 0; i < optmap_count; ++i) { // No attempt is made to reverse LZMA1/2 preset. if (optmap[i].type == OPTMAP_TYPE_LZMA_PRESET) continue; // All options have integer values, some just are mapped // to a string with a name_value_map. LZMA1/2 preset // isn't reversed back to preset=PRESET form. uint32_t v; const void *ptr = (const char *)filter_options + optmap[i].offset; switch (optmap[i].type) { case OPTMAP_TYPE_LZMA_MODE: v = *(const lzma_mode *)ptr; break; case OPTMAP_TYPE_LZMA_MATCH_FINDER: v = *(const lzma_match_finder *)ptr; break; default: v = *(const uint32_t *)ptr; break; } // Skip this if this option should be omitted from // the string when the value is zero. if (v == 0 && (optmap[i].flags & OPTMAP_NO_STRFY_ZERO)) continue; // Before the first option we add whatever delimiter // the caller gave us. For later options a comma is used. str_append_str(dest, delimiter); delimiter = ","; // Add the option name and equals sign. 
str_append_str(dest, optmap[i].name); str_append_str(dest, "="); if (optmap[i].flags & OPTMAP_USE_NAME_VALUE_MAP) { const name_value_map *map = optmap[i].u.map; size_t j = 0; while (true) { if (map[j].name[0] == '\0') { str_append_str(dest, "UNKNOWN"); break; } if (map[j].value == v) { str_append_str(dest, map[j].name); break; } ++j; } } else { str_append_u32(dest, v, optmap[i].flags & OPTMAP_USE_BYTE_SUFFIX); } } return; } extern LZMA_API(lzma_ret) lzma_str_from_filters(char **output_str, const lzma_filter *filters, uint32_t flags, const lzma_allocator *allocator) { // On error *output_str is always set to NULL. // Do it as the very first step. if (output_str == NULL) return LZMA_PROG_ERROR; *output_str = NULL; if (filters == NULL) return LZMA_PROG_ERROR; // Validate the flags. const uint32_t supported_flags = LZMA_STR_ENCODER | LZMA_STR_DECODER | LZMA_STR_GETOPT_LONG | LZMA_STR_NO_SPACES; if (flags & ~supported_flags) return LZMA_OPTIONS_ERROR; // There must be at least one filter. if (filters[0].id == LZMA_VLI_UNKNOWN) return LZMA_OPTIONS_ERROR; // Allocate memory for the output string. lzma_str dest; return_if_error(str_init(&dest, allocator)); const bool show_opts = (flags & (LZMA_STR_ENCODER | LZMA_STR_DECODER)); const char *opt_delim = (flags & LZMA_STR_GETOPT_LONG) ? "=" : ":"; for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) { // If we reach LZMA_FILTERS_MAX, then the filters array // is too large since the ID cannot be LZMA_VLI_UNKNOWN here. if (i == LZMA_FILTERS_MAX) { str_free(&dest, allocator); return LZMA_OPTIONS_ERROR; } // Don't add a space between filters if the caller // doesn't want them. if (i > 0 && !(flags & LZMA_STR_NO_SPACES)) str_append_str(&dest, " "); // Use dashes for xz getopt_long() compatible syntax but also // use dashes to separate filters when spaces weren't wanted. 
if ((flags & LZMA_STR_GETOPT_LONG) || (i > 0 && (flags & LZMA_STR_NO_SPACES))) str_append_str(&dest, "--"); size_t j = 0; while (true) { if (j == ARRAY_SIZE(filter_name_map)) { // Filter ID in filters[i].id isn't supported. str_free(&dest, allocator); return LZMA_OPTIONS_ERROR; } if (filter_name_map[j].id == filters[i].id) { // Add the filter name. str_append_str(&dest, filter_name_map[j].name); // If only the filter names were wanted then // skip to the next filter. In this case // .options is ignored and may be NULL even // when the filter doesn't allow NULL options. if (!show_opts) break; if (filters[i].options == NULL) { if (!filter_name_map[j].allow_null) { // Filter-specific options // are missing but with // this filter the options // structure is mandatory. str_free(&dest, allocator); return LZMA_OPTIONS_ERROR; } // .options is allowed to be NULL. // There is no need to add any // options to the string. break; } // Options structure is available. Add // the filter options to the string. const size_t optmap_count = (flags & LZMA_STR_ENCODER) ? filter_name_map[j].strfy_encoder : filter_name_map[j].strfy_decoder; strfy_filter(&dest, opt_delim, filter_name_map[j].optmap, optmap_count, filters[i].options); break; } ++j; } } return str_finish(output_str, &dest, allocator); } extern LZMA_API(lzma_ret) lzma_str_list_filters(char **output_str, lzma_vli filter_id, uint32_t flags, const lzma_allocator *allocator) { // On error *output_str is always set to NULL. // Do it as the very first step. if (output_str == NULL) return LZMA_PROG_ERROR; *output_str = NULL; // Validate the flags. const uint32_t supported_flags = LZMA_STR_ALL_FILTERS | LZMA_STR_ENCODER | LZMA_STR_DECODER | LZMA_STR_GETOPT_LONG; if (flags & ~supported_flags) return LZMA_OPTIONS_ERROR; // Allocate memory for the output string. lzma_str dest; return_if_error(str_init(&dest, allocator)); // If only listing the filter names then separate them with spaces. // Otherwise use newlines. 
const bool show_opts = (flags & (LZMA_STR_ENCODER | LZMA_STR_DECODER)); const char *filter_delim = show_opts ? "\n" : " "; const char *opt_delim = (flags & LZMA_STR_GETOPT_LONG) ? "=" : ":"; bool first_filter_printed = false; for (size_t i = 0; i < ARRAY_SIZE(filter_name_map); ++i) { // If we are printing only one filter then skip others. if (filter_id != LZMA_VLI_UNKNOWN && filter_id != filter_name_map[i].id) continue; // If we are printing only .xz filters then skip the others. if (filter_name_map[i].id >= LZMA_FILTER_RESERVED_START && (flags & LZMA_STR_ALL_FILTERS) == 0 && filter_id == LZMA_VLI_UNKNOWN) continue; // Add a new line if this isn't the first filter being // written to the string. if (first_filter_printed) str_append_str(&dest, filter_delim); first_filter_printed = true; if (flags & LZMA_STR_GETOPT_LONG) str_append_str(&dest, "--"); str_append_str(&dest, filter_name_map[i].name); // If only the filter names were wanted then continue // to the next filter. if (!show_opts) continue; const option_map *optmap = filter_name_map[i].optmap; const char *d = opt_delim; const size_t end = (flags & LZMA_STR_ENCODER) ? filter_name_map[i].strfy_encoder : filter_name_map[i].strfy_decoder; for (size_t j = 0; j < end; ++j) { // The first option is delimited from the filter // name using "=" or ":" and the rest of the options // are separated with ",". str_append_str(&dest, d); d = ","; // optname= str_append_str(&dest, optmap[j].name); str_append_str(&dest, "=<"); if (optmap[j].type == OPTMAP_TYPE_LZMA_PRESET) { // LZMA1/2 preset has its custom help string. str_append_str(&dest, LZMA12_PRESET_STR); } else if (optmap[j].flags & OPTMAP_USE_NAME_VALUE_MAP) { // Separate the possible option values by "|". const name_value_map *m = optmap[j].u.map; for (size_t k = 0; m[k].name[0] != '\0'; ++k) { if (k > 0) str_append_str(&dest, "|"); str_append_str(&dest, m[k].name); } } else { // Integer range is shown as min-max. 
const bool use_byte_suffix = optmap[j].flags & OPTMAP_USE_BYTE_SUFFIX; str_append_u32(&dest, optmap[j].u.range.min, use_byte_suffix); str_append_str(&dest, "-"); str_append_u32(&dest, optmap[j].u.range.max, use_byte_suffix); } str_append_str(&dest, ">"); } } // If no filters were added to the string then it must be because // the caller provided an unsupported Filter ID. if (!first_filter_printed) { str_free(&dest, allocator); return LZMA_OPTIONS_ERROR; } return str_finish(output_str, &dest, allocator); } diff --git a/src/liblzma/liblzma_generic.map b/src/liblzma/liblzma_generic.map index f74c15484559..2bef27a8f7d7 100644 --- a/src/liblzma/liblzma_generic.map +++ b/src/liblzma/liblzma_generic.map @@ -1,128 +1,138 @@ /* SPDX-License-Identifier: 0BSD */ XZ_5.0 { global: lzma_alone_decoder; lzma_alone_encoder; lzma_auto_decoder; lzma_block_buffer_bound; lzma_block_buffer_decode; lzma_block_buffer_encode; lzma_block_compressed_size; lzma_block_decoder; lzma_block_encoder; lzma_block_header_decode; lzma_block_header_encode; lzma_block_header_size; lzma_block_total_size; lzma_block_unpadded_size; lzma_check_is_supported; lzma_check_size; lzma_code; lzma_crc32; lzma_crc64; lzma_easy_buffer_encode; lzma_easy_decoder_memusage; lzma_easy_encoder; lzma_easy_encoder_memusage; lzma_end; lzma_filter_decoder_is_supported; lzma_filter_encoder_is_supported; lzma_filter_flags_decode; lzma_filter_flags_encode; lzma_filter_flags_size; lzma_filters_copy; lzma_filters_update; lzma_get_check; lzma_index_append; lzma_index_block_count; lzma_index_buffer_decode; lzma_index_buffer_encode; lzma_index_cat; lzma_index_checks; lzma_index_decoder; lzma_index_dup; lzma_index_encoder; lzma_index_end; lzma_index_file_size; lzma_index_hash_append; lzma_index_hash_decode; lzma_index_hash_end; lzma_index_hash_init; lzma_index_hash_size; lzma_index_init; lzma_index_iter_init; lzma_index_iter_locate; lzma_index_iter_next; lzma_index_iter_rewind; lzma_index_memusage; lzma_index_memused; lzma_index_size; 
lzma_index_stream_count; lzma_index_stream_flags; lzma_index_stream_padding; lzma_index_stream_size; lzma_index_total_size; lzma_index_uncompressed_size; lzma_lzma_preset; lzma_memlimit_get; lzma_memlimit_set; lzma_memusage; lzma_mf_is_supported; lzma_mode_is_supported; lzma_physmem; lzma_properties_decode; lzma_properties_encode; lzma_properties_size; lzma_raw_buffer_decode; lzma_raw_buffer_encode; lzma_raw_decoder; lzma_raw_decoder_memusage; lzma_raw_encoder; lzma_raw_encoder_memusage; lzma_stream_buffer_bound; lzma_stream_buffer_decode; lzma_stream_buffer_encode; lzma_stream_decoder; lzma_stream_encoder; lzma_stream_flags_compare; lzma_stream_footer_decode; lzma_stream_footer_encode; lzma_stream_header_decode; lzma_stream_header_encode; lzma_version_number; lzma_version_string; lzma_vli_decode; lzma_vli_encode; lzma_vli_size; local: *; }; XZ_5.2 { global: lzma_block_uncomp_encode; lzma_cputhreads; lzma_get_progress; lzma_stream_encoder_mt; lzma_stream_encoder_mt_memusage; } XZ_5.0; XZ_5.4 { global: lzma_file_info_decoder; lzma_filters_free; lzma_lzip_decoder; lzma_microlzma_decoder; lzma_microlzma_encoder; lzma_stream_decoder_mt; lzma_str_from_filters; lzma_str_list_filters; lzma_str_to_filters; } XZ_5.2; XZ_5.6.0 { global: lzma_mt_block_size; } XZ_5.4; + +XZ_5.8 { +global: + lzma_bcj_arm64_encode; + lzma_bcj_arm64_decode; + lzma_bcj_riscv_encode; + lzma_bcj_riscv_decode; + lzma_bcj_x86_encode; + lzma_bcj_x86_decode; +} XZ_5.6.0; diff --git a/src/liblzma/liblzma_linux.map b/src/liblzma/liblzma_linux.map index 7e4b25e17620..50f1571de219 100644 --- a/src/liblzma/liblzma_linux.map +++ b/src/liblzma/liblzma_linux.map @@ -1,143 +1,153 @@ /* SPDX-License-Identifier: 0BSD */ XZ_5.0 { global: lzma_alone_decoder; lzma_alone_encoder; lzma_auto_decoder; lzma_block_buffer_bound; lzma_block_buffer_decode; lzma_block_buffer_encode; lzma_block_compressed_size; lzma_block_decoder; lzma_block_encoder; lzma_block_header_decode; lzma_block_header_encode; lzma_block_header_size; 
lzma_block_total_size; lzma_block_unpadded_size; lzma_check_is_supported; lzma_check_size; lzma_code; lzma_crc32; lzma_crc64; lzma_easy_buffer_encode; lzma_easy_decoder_memusage; lzma_easy_encoder; lzma_easy_encoder_memusage; lzma_end; lzma_filter_decoder_is_supported; lzma_filter_encoder_is_supported; lzma_filter_flags_decode; lzma_filter_flags_encode; lzma_filter_flags_size; lzma_filters_copy; lzma_filters_update; lzma_get_check; lzma_index_append; lzma_index_block_count; lzma_index_buffer_decode; lzma_index_buffer_encode; lzma_index_cat; lzma_index_checks; lzma_index_decoder; lzma_index_dup; lzma_index_encoder; lzma_index_end; lzma_index_file_size; lzma_index_hash_append; lzma_index_hash_decode; lzma_index_hash_end; lzma_index_hash_init; lzma_index_hash_size; lzma_index_init; lzma_index_iter_init; lzma_index_iter_locate; lzma_index_iter_next; lzma_index_iter_rewind; lzma_index_memusage; lzma_index_memused; lzma_index_size; lzma_index_stream_count; lzma_index_stream_flags; lzma_index_stream_padding; lzma_index_stream_size; lzma_index_total_size; lzma_index_uncompressed_size; lzma_lzma_preset; lzma_memlimit_get; lzma_memlimit_set; lzma_memusage; lzma_mf_is_supported; lzma_mode_is_supported; lzma_physmem; lzma_properties_decode; lzma_properties_encode; lzma_properties_size; lzma_raw_buffer_decode; lzma_raw_buffer_encode; lzma_raw_decoder; lzma_raw_decoder_memusage; lzma_raw_encoder; lzma_raw_encoder_memusage; lzma_stream_buffer_bound; lzma_stream_buffer_decode; lzma_stream_buffer_encode; lzma_stream_decoder; lzma_stream_encoder; lzma_stream_flags_compare; lzma_stream_footer_decode; lzma_stream_footer_encode; lzma_stream_header_decode; lzma_stream_header_encode; lzma_version_number; lzma_version_string; lzma_vli_decode; lzma_vli_encode; lzma_vli_size; local: *; }; XZ_5.2 { global: lzma_block_uncomp_encode; lzma_cputhreads; lzma_get_progress; lzma_stream_encoder_mt; lzma_stream_encoder_mt_memusage; } XZ_5.0; XZ_5.1.2alpha { global: lzma_stream_encoder_mt; 
lzma_stream_encoder_mt_memusage; } XZ_5.0; XZ_5.2.2 { global: lzma_block_uncomp_encode; lzma_cputhreads; lzma_get_progress; lzma_stream_encoder_mt; lzma_stream_encoder_mt_memusage; } XZ_5.1.2alpha; XZ_5.4 { global: lzma_file_info_decoder; lzma_filters_free; lzma_lzip_decoder; lzma_microlzma_decoder; lzma_microlzma_encoder; lzma_stream_decoder_mt; lzma_str_from_filters; lzma_str_list_filters; lzma_str_to_filters; } XZ_5.2; XZ_5.6.0 { global: lzma_mt_block_size; } XZ_5.4; + +XZ_5.8 { +global: + lzma_bcj_arm64_encode; + lzma_bcj_arm64_decode; + lzma_bcj_riscv_encode; + lzma_bcj_riscv_decode; + lzma_bcj_x86_encode; + lzma_bcj_x86_decode; +} XZ_5.6.0; diff --git a/src/liblzma/lz/lz_decoder.c b/src/liblzma/lz/lz_decoder.c index 92913f225a0d..1cb120ab3b09 100644 --- a/src/liblzma/lz/lz_decoder.c +++ b/src/liblzma/lz/lz_decoder.c @@ -1,324 +1,333 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file lz_decoder.c /// \brief LZ out window /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// // liblzma supports multiple LZ77-based filters. The LZ part is shared // between these filters. The LZ code takes care of dictionary handling // and passing the data between filters in the chain. The filter-specific // part decodes from the input buffer to the dictionary. #include "lz_decoder.h" typedef struct { /// Dictionary (history buffer) lzma_dict dict; /// The actual LZ-based decoder e.g. LZMA lzma_lz_decoder lz; /// Next filter in the chain, if any. Note that LZMA and LZMA2 are /// only allowed as the last filter, but the long-range filter in /// future can be in the middle of the chain. lzma_next_coder next; /// True if the next filter in the chain has returned LZMA_STREAM_END. bool next_finished; /// True if the LZ decoder (e.g. LZMA) has detected end of payload /// marker. This may become true before next_finished becomes true. 
bool this_finished; /// Temporary buffer needed when the LZ-based filter is not the last /// filter in the chain. The output of the next filter is first /// decoded into buffer[], which is then used as input for the actual /// LZ-based decoder. struct { size_t pos; size_t size; uint8_t buffer[LZMA_BUFFER_SIZE]; } temp; } lzma_coder; static void lz_decoder_reset(lzma_coder *coder) { - coder->dict.pos = 2 * LZ_DICT_REPEAT_MAX; + coder->dict.pos = LZ_DICT_INIT_POS; coder->dict.full = 0; - coder->dict.buf[2 * LZ_DICT_REPEAT_MAX - 1] = '\0'; + coder->dict.buf[LZ_DICT_INIT_POS - 1] = '\0'; coder->dict.has_wrapped = false; coder->dict.need_reset = false; return; } static lzma_ret decode_buffer(lzma_coder *coder, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size) { while (true) { // Wrap the dictionary if needed. if (coder->dict.pos == coder->dict.size) { // See the comment of #define LZ_DICT_REPEAT_MAX. coder->dict.pos = LZ_DICT_REPEAT_MAX; coder->dict.has_wrapped = true; memcpy(coder->dict.buf, coder->dict.buf + coder->dict.size - LZ_DICT_REPEAT_MAX, LZ_DICT_REPEAT_MAX); } // Store the current dictionary position. It is needed to know // where to start copying to the out[] buffer. const size_t dict_start = coder->dict.pos; // Calculate how much we allow coder->lz.code() to decode. // It must not decode past the end of the dictionary // buffer, and we don't want it to decode more than is // actually needed to fill the out[] buffer. coder->dict.limit = coder->dict.pos + my_min(out_size - *out_pos, coder->dict.size - coder->dict.pos); // Call the coder->lz.code() to do the actual decoding. const lzma_ret ret = coder->lz.code( coder->lz.coder, &coder->dict, in, in_pos, in_size); // Copy the decoded data from the dictionary to the out[] // buffer. Do it conditionally because out can be NULL // (in which case copy_size is always 0). 
Calling memcpy() // with a null-pointer is undefined even if the third // argument is 0. const size_t copy_size = coder->dict.pos - dict_start; assert(copy_size <= out_size - *out_pos); if (copy_size > 0) memcpy(out + *out_pos, coder->dict.buf + dict_start, copy_size); *out_pos += copy_size; // Reset the dictionary if so requested by coder->lz.code(). if (coder->dict.need_reset) { lz_decoder_reset(coder); // Since we reset dictionary, we don't check if // dictionary became full. if (ret != LZMA_OK || *out_pos == out_size) return ret; } else { // Return if everything got decoded or an error // occurred, or if there's no more data to decode. // // Note that detecting if there's something to decode // is done by looking if dictionary become full // instead of looking if *in_pos == in_size. This // is because it is possible that all the input was // consumed already but some data is pending to be // written to the dictionary. if (ret != LZMA_OK || *out_pos == out_size || coder->dict.pos < coder->dict.size) return ret; } } } static lzma_ret lz_decode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_coder *coder = coder_ptr; if (coder->next.code == NULL) return decode_buffer(coder, in, in_pos, in_size, out, out_pos, out_size); // We aren't the last coder in the chain, we need to decode // our input to a temporary buffer. while (*out_pos < out_size) { // Fill the temporary buffer if it is empty. 
if (!coder->next_finished && coder->temp.pos == coder->temp.size) { coder->temp.pos = 0; coder->temp.size = 0; const lzma_ret ret = coder->next.code( coder->next.coder, allocator, in, in_pos, in_size, coder->temp.buffer, &coder->temp.size, LZMA_BUFFER_SIZE, action); if (ret == LZMA_STREAM_END) coder->next_finished = true; else if (ret != LZMA_OK || coder->temp.size == 0) return ret; } if (coder->this_finished) { if (coder->temp.size != 0) return LZMA_DATA_ERROR; if (coder->next_finished) return LZMA_STREAM_END; return LZMA_OK; } const lzma_ret ret = decode_buffer(coder, coder->temp.buffer, &coder->temp.pos, coder->temp.size, out, out_pos, out_size); if (ret == LZMA_STREAM_END) coder->this_finished = true; else if (ret != LZMA_OK) return ret; else if (coder->next_finished && *out_pos < out_size) return LZMA_DATA_ERROR; } return LZMA_OK; } static void lz_decoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_coder *coder = coder_ptr; lzma_next_end(&coder->next, allocator); lzma_free(coder->dict.buf, allocator); if (coder->lz.end != NULL) coder->lz.end(coder->lz.coder, allocator); else lzma_free(coder->lz.coder, allocator); lzma_free(coder, allocator); return; } extern lzma_ret lzma_lz_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, lzma_ret (*lz_init)(lzma_lz_decoder *lz, const lzma_allocator *allocator, lzma_vli id, const void *options, lzma_lz_options *lz_options)) { // Allocate the base structure if it isn't already allocated. lzma_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &lz_decode; next->end = &lz_decoder_end; coder->dict.buf = NULL; coder->dict.size = 0; coder->lz = LZMA_LZ_DECODER_INIT; coder->next = LZMA_NEXT_CODER_INIT; } // Allocate and initialize the LZ-based decoder. It will also give // us the dictionary size. 
lzma_lz_options lz_options; return_if_error(lz_init(&coder->lz, allocator, filters[0].id, filters[0].options, &lz_options)); // If the dictionary size is very small, increase it to 4096 bytes. // This is to prevent constant wrapping of the dictionary, which // would slow things down. The downside is that since we don't check // separately for the real dictionary size, we may happily accept // corrupt files. if (lz_options.dict_size < 4096) lz_options.dict_size = 4096; // Make dictionary size a multiple of 16. Some LZ-based decoders like // LZMA use the lowest bits lzma_dict.pos to know the alignment of the // data. Aligned buffer is also good when memcpying from the // dictionary to the output buffer, since applications are // recommended to give aligned buffers to liblzma. // // Reserve 2 * LZ_DICT_REPEAT_MAX bytes of extra space which is - // needed for alloc_size. + // needed for alloc_size. Reserve also LZ_DICT_EXTRA bytes of extra + // space which is *not* counted in alloc_size or coder->dict.size. // // Avoid integer overflow. - if (lz_options.dict_size > SIZE_MAX - 15 - 2 * LZ_DICT_REPEAT_MAX) + if (lz_options.dict_size > SIZE_MAX - 15 - 2 * LZ_DICT_REPEAT_MAX + - LZ_DICT_EXTRA) return LZMA_MEM_ERROR; lz_options.dict_size = (lz_options.dict_size + 15) & ~((size_t)(15)); // Reserve extra space as explained in the comment // of #define LZ_DICT_REPEAT_MAX. const size_t alloc_size = lz_options.dict_size + 2 * LZ_DICT_REPEAT_MAX; // Allocate and initialize the dictionary. if (coder->dict.size != alloc_size) { lzma_free(coder->dict.buf, allocator); - coder->dict.buf = lzma_alloc(alloc_size, allocator); + + // The LZ_DICT_EXTRA bytes at the end of the buffer aren't + // included in alloc_size. These extra bytes allow + // dict_repeat() to read and write more data than requested. + // Otherwise this extra space is ignored. 
+ coder->dict.buf = lzma_alloc(alloc_size + LZ_DICT_EXTRA, + allocator); if (coder->dict.buf == NULL) return LZMA_MEM_ERROR; // NOTE: Yes, alloc_size, not lz_options.dict_size. The way // coder->dict.full is updated will take care that we will // still reject distances larger than lz_options.dict_size. coder->dict.size = alloc_size; } lz_decoder_reset(next->coder); // Use the preset dictionary if it was given to us. if (lz_options.preset_dict != NULL && lz_options.preset_dict_size > 0) { // If the preset dictionary is bigger than the actual // dictionary, copy only the tail. const size_t copy_size = my_min(lz_options.preset_dict_size, lz_options.dict_size); const size_t offset = lz_options.preset_dict_size - copy_size; memcpy(coder->dict.buf + coder->dict.pos, lz_options.preset_dict + offset, copy_size); // dict.pos isn't zero after lz_decoder_reset(). coder->dict.pos += copy_size; coder->dict.full = copy_size; } // Miscellaneous initializations coder->next_finished = false; coder->this_finished = false; coder->temp.pos = 0; coder->temp.size = 0; // Initialize the next filter in the chain, if any. 
return lzma_next_filter_init(&coder->next, allocator, filters + 1); } extern uint64_t lzma_lz_decoder_memusage(size_t dictionary_size) { - return sizeof(lzma_coder) + (uint64_t)(dictionary_size); + return sizeof(lzma_coder) + (uint64_t)(dictionary_size) + + 2 * LZ_DICT_REPEAT_MAX + LZ_DICT_EXTRA; } diff --git a/src/liblzma/lz/lz_decoder.h b/src/liblzma/lz/lz_decoder.h index cb61b6e24c78..2698e0167fcc 100644 --- a/src/liblzma/lz/lz_decoder.h +++ b/src/liblzma/lz/lz_decoder.h @@ -1,250 +1,325 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file lz_decoder.h /// \brief LZ out window /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef LZMA_LZ_DECODER_H #define LZMA_LZ_DECODER_H #include "common.h" +#ifdef HAVE_IMMINTRIN_H +# include <immintrin.h> +#endif + + +// dict_repeat() implementation variant: +// 0 = Byte-by-byte copying only. +// 1 = Use memcpy() for non-overlapping copies. +// 2 = Use x86 SSE2 for non-overlapping copies. +#ifndef LZMA_LZ_DECODER_CONFIG +# if defined(TUKLIB_FAST_UNALIGNED_ACCESS) \ + && defined(HAVE_IMMINTRIN_H) \ + && (defined(__SSE2__) || defined(_M_X64) \ + || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)) +# define LZMA_LZ_DECODER_CONFIG 2 +# else +# define LZMA_LZ_DECODER_CONFIG 1 +# endif +#endif -/// Maximum length of a match rounded up to a nice power of 2 which is -/// a good size for aligned memcpy(). The allocated dictionary buffer will -/// be 2 * LZ_DICT_REPEAT_MAX bytes larger than the actual dictionary size: +/// Byte-by-byte and memcpy() copy exactly the amount needed. Other methods +/// can copy up to LZ_DICT_EXTRA bytes more than requested, and this amount +/// of extra space is needed at the end of the allocated dictionary buffer. +/// +/// NOTE: If this is increased, update LZMA_DICT_REPEAT_MAX too.
+#if LZMA_LZ_DECODER_CONFIG >= 2 +# define LZ_DICT_EXTRA 32 +#else +# define LZ_DICT_EXTRA 0 +#endif + +/// Maximum number of bytes that dict_repeat() may copy. The allocated +/// dictionary buffer will be 2 * LZ_DICT_REPEAT_MAX + LZMA_DICT_EXTRA bytes +/// larger than the actual dictionary size: /// /// (1) Every time the decoder reaches the end of the dictionary buffer, /// the last LZ_DICT_REPEAT_MAX bytes will be copied to the beginning. /// This way dict_repeat() will only need to copy from one place, /// never from both the end and beginning of the buffer. /// /// (2) The other LZ_DICT_REPEAT_MAX bytes is kept as a buffer between /// the oldest byte still in the dictionary and the current write -/// position. This way dict_repeat(dict, dict->size - 1, &len) +/// position. This way dict_repeat() with the maximum valid distance /// won't need memmove() as the copying cannot overlap. /// +/// (3) LZ_DICT_EXTRA bytes are required at the end of the dictionary buffer +/// so that extra copying done by dict_repeat() won't write or read past +/// the end of the allocated buffer. This amount is *not* counted as part +/// of lzma_dict.size. +/// /// Note that memcpy() still cannot be used if distance < len. /// -/// LZMA's longest match length is 273 so pick a multiple of 16 above that. +/// LZMA's longest match length is 273 bytes. The LZMA decoder looks at +/// the lowest four bits of the dictionary position, thus 273 must be +/// rounded up to the next multiple of 16 (288). In addition, optimized +/// dict_repeat() copies 32 bytes at a time, thus this must also be +/// a multiple of 32. #define LZ_DICT_REPEAT_MAX 288 +/// Initial position in lzma_dict.buf when the dictionary is empty. +#define LZ_DICT_INIT_POS (2 * LZ_DICT_REPEAT_MAX) + typedef struct { /// Pointer to the dictionary buffer. uint8_t *buf; /// Write position in dictionary. The next byte will be written to /// buf[pos]. size_t pos; /// Indicates how full the dictionary is. 
This is used by /// dict_is_distance_valid() to detect corrupt files that would /// read beyond the beginning of the dictionary. size_t full; /// Write limit size_t limit; /// Allocated size of buf. This is 2 * LZ_DICT_REPEAT_MAX bytes /// larger than the actual dictionary size. This is enforced by /// how the value for "full" is set; it can be at most /// "size - 2 * LZ_DICT_REPEAT_MAX". size_t size; /// True once the dictionary has become full and the writing position /// has been wrapped in decode_buffer() in lz_decoder.c. bool has_wrapped; /// True when dictionary should be reset before decoding more data. bool need_reset; } lzma_dict; typedef struct { size_t dict_size; const uint8_t *preset_dict; size_t preset_dict_size; } lzma_lz_options; typedef struct { /// Data specific to the LZ-based decoder void *coder; /// Function to decode from in[] to *dict lzma_ret (*code)(void *coder, lzma_dict *restrict dict, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size); void (*reset)(void *coder, const void *options); /// Set the uncompressed size. If uncompressed_size == LZMA_VLI_UNKNOWN /// then allow_eopm will always be true. void (*set_uncompressed)(void *coder, lzma_vli uncompressed_size, bool allow_eopm); /// Free allocated resources void (*end)(void *coder, const lzma_allocator *allocator); } lzma_lz_decoder; #define LZMA_LZ_DECODER_INIT \ (lzma_lz_decoder){ \ .coder = NULL, \ .code = NULL, \ .reset = NULL, \ .set_uncompressed = NULL, \ .end = NULL, \ } extern lzma_ret lzma_lz_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, lzma_ret (*lz_init)(lzma_lz_decoder *lz, const lzma_allocator *allocator, lzma_vli id, const void *options, lzma_lz_options *lz_options)); extern uint64_t lzma_lz_decoder_memusage(size_t dictionary_size); ////////////////////// // Inline functions // ////////////////////// /// Get a byte from the history buffer. 
static inline uint8_t dict_get(const lzma_dict *const dict, const uint32_t distance) { return dict->buf[dict->pos - distance - 1 + (distance < dict->pos ? 0 : dict->size - LZ_DICT_REPEAT_MAX)]; } /// Optimized version of dict_get(dict, 0) static inline uint8_t dict_get0(const lzma_dict *const dict) { return dict->buf[dict->pos - 1]; } /// Test if dictionary is empty. static inline bool dict_is_empty(const lzma_dict *const dict) { return dict->full == 0; } /// Validate the match distance static inline bool dict_is_distance_valid(const lzma_dict *const dict, const size_t distance) { return dict->full > distance; } /// Repeat *len bytes at distance. static inline bool -dict_repeat(lzma_dict *dict, uint32_t distance, uint32_t *len) +dict_repeat(lzma_dict *restrict dict, + uint32_t distance, uint32_t *restrict len) { // Don't write past the end of the dictionary. const size_t dict_avail = dict->limit - dict->pos; uint32_t left = my_min(dict_avail, *len); *len -= left; size_t back = dict->pos - distance - 1; if (distance >= dict->pos) back += dict->size - LZ_DICT_REPEAT_MAX; - // Repeat a block of data from the history. Because memcpy() is faster - // than copying byte by byte in a loop, the copying process gets split - // into two cases. +#if LZMA_LZ_DECODER_CONFIG == 0 + // Minimal byte-by-byte method. This might be the least bad choice + // if memcpy() isn't fast and there's no replacement for it below. + while (left-- > 0) { + dict->buf[dict->pos++] = dict->buf[back++]; + } + +#else + // Because memcpy() or a similar method can be faster than copying + // byte by byte in a loop, the copying process is split into + // two cases. if (distance < left) { // Source and target areas overlap, thus we can't use // memcpy() nor even memmove() safely. 
do { dict->buf[dict->pos++] = dict->buf[back++]; } while (--left > 0); } else { +# if LZMA_LZ_DECODER_CONFIG == 1 memcpy(dict->buf + dict->pos, dict->buf + back, left); dict->pos += left; + +# elif LZMA_LZ_DECODER_CONFIG == 2 + // This can copy up to 32 bytes more than required. + // (If left == 0, we still copy 32 bytes.) + size_t pos = dict->pos; + dict->pos += left; + do { + const __m128i x0 = _mm_loadu_si128( + (__m128i *)(dict->buf + back)); + const __m128i x1 = _mm_loadu_si128( + (__m128i *)(dict->buf + back + 16)); + back += 32; + _mm_storeu_si128( + (__m128i *)(dict->buf + pos), x0); + _mm_storeu_si128( + (__m128i *)(dict->buf + pos + 16), x1); + pos += 32; + } while (pos < dict->pos); + +# else +# error "Invalid LZMA_LZ_DECODER_CONFIG value" +# endif } +#endif // Update how full the dictionary is. if (!dict->has_wrapped) - dict->full = dict->pos - 2 * LZ_DICT_REPEAT_MAX; + dict->full = dict->pos - LZ_DICT_INIT_POS; return *len != 0; } static inline void -dict_put(lzma_dict *dict, uint8_t byte) +dict_put(lzma_dict *restrict dict, uint8_t byte) { dict->buf[dict->pos++] = byte; if (!dict->has_wrapped) - dict->full = dict->pos - 2 * LZ_DICT_REPEAT_MAX; + dict->full = dict->pos - LZ_DICT_INIT_POS; } /// Puts one byte into the dictionary. Returns true if the dictionary was /// already full and the byte couldn't be added. static inline bool -dict_put_safe(lzma_dict *dict, uint8_t byte) +dict_put_safe(lzma_dict *restrict dict, uint8_t byte) { if (unlikely(dict->pos == dict->limit)) return true; dict_put(dict, byte); return false; } /// Copies arbitrary amount of data into the dictionary. static inline void dict_write(lzma_dict *restrict dict, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, size_t *restrict left) { // NOTE: If we are being given more data than the size of the // dictionary, it could be possible to optimize the LZ decoder // so that not everything needs to go through the dictionary. 
// This shouldn't be very common thing in practice though, and // the slowdown of one extra memcpy() isn't bad compared to how // much time it would have taken if the data were compressed. if (in_size - *in_pos > *left) in_size = *in_pos + *left; *left -= lzma_bufcpy(in, in_pos, in_size, dict->buf, &dict->pos, dict->limit); if (!dict->has_wrapped) - dict->full = dict->pos - 2 * LZ_DICT_REPEAT_MAX; + dict->full = dict->pos - LZ_DICT_INIT_POS; return; } static inline void dict_reset(lzma_dict *dict) { dict->need_reset = true; return; } #endif diff --git a/src/liblzma/lz/lz_encoder.c b/src/liblzma/lz/lz_encoder.c index 4af23e14c423..e5c4057dca53 100644 --- a/src/liblzma/lz/lz_encoder.c +++ b/src/liblzma/lz/lz_encoder.c @@ -1,632 +1,632 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file lz_encoder.c /// \brief LZ in window /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "lz_encoder.h" #include "lz_encoder_hash.h" // See lz_encoder_hash.h. This is a bit hackish but avoids making // endianness a conditional in makefiles. -#if defined(WORDS_BIGENDIAN) && !defined(HAVE_SMALL) +#ifdef LZMA_LZ_HASH_TABLE_IS_NEEDED # include "lz_encoder_hash_table.h" #endif #include "memcmplen.h" typedef struct { /// LZ-based encoder e.g. LZMA lzma_lz_encoder lz; /// History buffer and match finder lzma_mf mf; /// Next coder in the chain lzma_next_coder next; } lzma_coder; /// \brief Moves the data in the input window to free space for new data /// /// mf->buffer is a sliding input window, which keeps mf->keep_size_before /// bytes of input history available all the time. Now and then we need to /// "slide" the buffer to make space for the new data to the end of the /// buffer. At the same time, data older than keep_size_before is dropped. /// static void move_window(lzma_mf *mf) { // Align the move to a multiple of 16 bytes. 
Some LZ-based encoders // like LZMA use the lowest bits of mf->read_pos to know the // alignment of the uncompressed data. We also get better speed // for memmove() with aligned buffers. assert(mf->read_pos > mf->keep_size_before); const uint32_t move_offset = (mf->read_pos - mf->keep_size_before) & ~UINT32_C(15); assert(mf->write_pos > move_offset); const size_t move_size = mf->write_pos - move_offset; assert(move_offset + move_size <= mf->size); memmove(mf->buffer, mf->buffer + move_offset, move_size); mf->offset += move_offset; mf->read_pos -= move_offset; mf->read_limit -= move_offset; mf->write_pos -= move_offset; return; } /// \brief Tries to fill the input window (mf->buffer) /// /// If we are the last encoder in the chain, our input data is in in[]. /// Otherwise we call the next filter in the chain to process in[] and /// write its output to mf->buffer. /// /// This function must not be called once it has returned LZMA_STREAM_END. /// static lzma_ret fill_window(lzma_coder *coder, const lzma_allocator *allocator, const uint8_t *in, size_t *in_pos, size_t in_size, lzma_action action) { assert(coder->mf.read_pos <= coder->mf.write_pos); // Move the sliding window if needed. if (coder->mf.read_pos >= coder->mf.size - coder->mf.keep_size_after) move_window(&coder->mf); // Maybe this is ugly, but lzma_mf uses uint32_t for most things // (which I find cleanest), but we need size_t here when filling // the history window. size_t write_pos = coder->mf.write_pos; lzma_ret ret; if (coder->next.code == NULL) { // Not using a filter, simply memcpy() as much as possible. lzma_bufcpy(in, in_pos, in_size, coder->mf.buffer, &write_pos, coder->mf.size); ret = action != LZMA_RUN && *in_pos == in_size ? LZMA_STREAM_END : LZMA_OK; } else { ret = coder->next.code(coder->next.coder, allocator, in, in_pos, in_size, coder->mf.buffer, &write_pos, coder->mf.size, action); } coder->mf.write_pos = write_pos; // Silence Valgrind. 
lzma_memcmplen() can read extra bytes // and Valgrind will give warnings if those bytes are uninitialized // because Valgrind cannot see that the values of the uninitialized // bytes are eventually ignored. memzero(coder->mf.buffer + write_pos, LZMA_MEMCMPLEN_EXTRA); // If end of stream has been reached or flushing completed, we allow // the encoder to process all the input (that is, read_pos is allowed // to reach write_pos). Otherwise we keep keep_size_after bytes // available as prebuffer. if (ret == LZMA_STREAM_END) { assert(*in_pos == in_size); ret = LZMA_OK; coder->mf.action = action; coder->mf.read_limit = coder->mf.write_pos; } else if (coder->mf.write_pos > coder->mf.keep_size_after) { // This needs to be done conditionally, because if we got // only little new input, there may be too little input // to do any encoding yet. coder->mf.read_limit = coder->mf.write_pos - coder->mf.keep_size_after; } // Restart the match finder after finished LZMA_SYNC_FLUSH. if (coder->mf.pending > 0 && coder->mf.read_pos < coder->mf.read_limit) { // Match finder may update coder->pending and expects it to // start from zero, so use a temporary variable. const uint32_t pending = coder->mf.pending; coder->mf.pending = 0; // Rewind read_pos so that the match finder can hash // the pending bytes. assert(coder->mf.read_pos >= pending); coder->mf.read_pos -= pending; // Call the skip function directly instead of using // mf_skip(), since we don't want to touch mf->read_ahead. coder->mf.skip(&coder->mf, pending); } return ret; } static lzma_ret lz_encode(void *coder_ptr, const lzma_allocator *allocator, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size, lzma_action action) { lzma_coder *coder = coder_ptr; while (*out_pos < out_size && (*in_pos < in_size || action != LZMA_RUN)) { // Read more data to coder->mf.buffer if needed. 
if (coder->mf.action == LZMA_RUN && coder->mf.read_pos >= coder->mf.read_limit) return_if_error(fill_window(coder, allocator, in, in_pos, in_size, action)); // Encode const lzma_ret ret = coder->lz.code(coder->lz.coder, &coder->mf, out, out_pos, out_size); if (ret != LZMA_OK) { // Setting this to LZMA_RUN for cases when we are // flushing. It doesn't matter when finishing or if // an error occurred. coder->mf.action = LZMA_RUN; return ret; } } return LZMA_OK; } static bool lz_encoder_prepare(lzma_mf *mf, const lzma_allocator *allocator, const lzma_lz_options *lz_options) { // For now, the dictionary size is limited to 1.5 GiB. This may grow // in the future if needed, but it needs a little more work than just // changing this check. if (!IS_ENC_DICT_SIZE_VALID(lz_options->dict_size) || lz_options->nice_len > lz_options->match_len_max) return true; mf->keep_size_before = lz_options->before_size + lz_options->dict_size; mf->keep_size_after = lz_options->after_size + lz_options->match_len_max; // To avoid constant memmove()s, allocate some extra space. Since // memmove()s become more expensive when the size of the buffer // increases, we reserve more space when a large dictionary is // used to make the memmove() calls rarer. // // This works with dictionaries up to about 3 GiB. If bigger // dictionary is wanted, some extra work is needed: // - Several variables in lzma_mf have to be changed from uint32_t // to size_t. // - Memory usage calculation needs something too, e.g. use uint64_t // for mf->size. uint32_t reserve = lz_options->dict_size / 2; if (reserve > (UINT32_C(1) << 30)) reserve /= 2; reserve += (lz_options->before_size + lz_options->match_len_max + lz_options->after_size) / 2 + (UINT32_C(1) << 19); const uint32_t old_size = mf->size; mf->size = mf->keep_size_before + reserve + mf->keep_size_after; // Deallocate the old history buffer if it exists but has different // size than what is needed now. 
if (mf->buffer != NULL && old_size != mf->size) { lzma_free(mf->buffer, allocator); mf->buffer = NULL; } // Match finder options mf->match_len_max = lz_options->match_len_max; mf->nice_len = lz_options->nice_len; // cyclic_size has to stay smaller than 2 Gi. Note that this doesn't // mean limiting dictionary size to less than 2 GiB. With a match // finder that uses multibyte resolution (hashes start at e.g. every // fourth byte), cyclic_size would stay below 2 Gi even when // dictionary size is greater than 2 GiB. // // It would be possible to allow cyclic_size >= 2 Gi, but then we // would need to be careful to use 64-bit types in various places // (size_t could do since we would need bigger than 32-bit address // space anyway). It would also require either zeroing a multigigabyte // buffer at initialization (waste of time and RAM) or allow // normalization in lz_encoder_mf.c to access uninitialized // memory to keep the code simpler. The current way is simple and // still allows pretty big dictionaries, so I don't expect these // limits to change. mf->cyclic_size = lz_options->dict_size + 1; // Validate the match finder ID and setup the function pointers. switch (lz_options->match_finder) { #ifdef HAVE_MF_HC3 case LZMA_MF_HC3: mf->find = &lzma_mf_hc3_find; mf->skip = &lzma_mf_hc3_skip; break; #endif #ifdef HAVE_MF_HC4 case LZMA_MF_HC4: mf->find = &lzma_mf_hc4_find; mf->skip = &lzma_mf_hc4_skip; break; #endif #ifdef HAVE_MF_BT2 case LZMA_MF_BT2: mf->find = &lzma_mf_bt2_find; mf->skip = &lzma_mf_bt2_skip; break; #endif #ifdef HAVE_MF_BT3 case LZMA_MF_BT3: mf->find = &lzma_mf_bt3_find; mf->skip = &lzma_mf_bt3_skip; break; #endif #ifdef HAVE_MF_BT4 case LZMA_MF_BT4: mf->find = &lzma_mf_bt4_find; mf->skip = &lzma_mf_bt4_skip; break; #endif default: return true; } // Calculate the sizes of mf->hash and mf->son. // // NOTE: Since 5.3.5beta the LZMA encoder ensures that nice_len // is big enough for the selected match finder. 
This makes it // easier for applications as nice_len = 2 will always be accepted // even though the effective value can be slightly bigger. const uint32_t hash_bytes = mf_get_hash_bytes(lz_options->match_finder); assert(hash_bytes <= mf->nice_len); const bool is_bt = (lz_options->match_finder & 0x10) != 0; uint32_t hs; if (hash_bytes == 2) { hs = 0xFFFF; } else { // Round dictionary size up to the next 2^n - 1 so it can // be used as a hash mask. hs = lz_options->dict_size - 1; hs |= hs >> 1; hs |= hs >> 2; hs |= hs >> 4; hs |= hs >> 8; hs >>= 1; hs |= 0xFFFF; if (hs > (UINT32_C(1) << 24)) { if (hash_bytes == 3) hs = (UINT32_C(1) << 24) - 1; else hs >>= 1; } } mf->hash_mask = hs; ++hs; if (hash_bytes > 2) hs += HASH_2_SIZE; if (hash_bytes > 3) hs += HASH_3_SIZE; /* No match finder uses this at the moment. if (mf->hash_bytes > 4) hs += HASH_4_SIZE; */ const uint32_t old_hash_count = mf->hash_count; const uint32_t old_sons_count = mf->sons_count; mf->hash_count = hs; mf->sons_count = mf->cyclic_size; if (is_bt) mf->sons_count *= 2; // Deallocate the old hash array if it exists and has different size // than what is needed now. if (old_hash_count != mf->hash_count || old_sons_count != mf->sons_count) { lzma_free(mf->hash, allocator); mf->hash = NULL; lzma_free(mf->son, allocator); mf->son = NULL; } // Maximum number of match finder cycles mf->depth = lz_options->depth; if (mf->depth == 0) { if (is_bt) mf->depth = 16 + mf->nice_len / 2; else mf->depth = 4 + mf->nice_len / 4; } return false; } static bool lz_encoder_init(lzma_mf *mf, const lzma_allocator *allocator, const lzma_lz_options *lz_options) { // Allocate the history buffer. if (mf->buffer == NULL) { // lzma_memcmplen() is used for the dictionary buffer // so we need to allocate a few extra bytes to prevent // it from reading past the end of the buffer. 
mf->buffer = lzma_alloc(mf->size + LZMA_MEMCMPLEN_EXTRA, allocator); if (mf->buffer == NULL) return true; // Keep Valgrind happy with lzma_memcmplen() and initialize // the extra bytes whose value may get read but which will // effectively get ignored. memzero(mf->buffer + mf->size, LZMA_MEMCMPLEN_EXTRA); } // Use cyclic_size as initial mf->offset. This allows // avoiding a few branches in the match finders. The downside is // that match finder needs to be normalized more often, which may // hurt performance with huge dictionaries. mf->offset = mf->cyclic_size; mf->read_pos = 0; mf->read_ahead = 0; mf->read_limit = 0; mf->write_pos = 0; mf->pending = 0; #if UINT32_MAX >= SIZE_MAX / 4 // Check for integer overflow. (Huge dictionaries are not // possible on 32-bit CPU.) if (mf->hash_count > SIZE_MAX / sizeof(uint32_t) || mf->sons_count > SIZE_MAX / sizeof(uint32_t)) return true; #endif // Allocate and initialize the hash table. Since EMPTY_HASH_VALUE // is zero, we can use lzma_alloc_zero() or memzero() for mf->hash. // // We don't need to initialize mf->son, but not doing that may // make Valgrind complain in normalization (see normalize() in // lz_encoder_mf.c). Skipping the initialization is *very* good // when big dictionary is used but only small amount of data gets // actually compressed: most of the mf->son won't get actually // allocated by the kernel, so we avoid wasting RAM and improve // initialization speed a lot. if (mf->hash == NULL) { mf->hash = lzma_alloc_zero(mf->hash_count * sizeof(uint32_t), allocator); mf->son = lzma_alloc(mf->sons_count * sizeof(uint32_t), allocator); if (mf->hash == NULL || mf->son == NULL) { lzma_free(mf->hash, allocator); mf->hash = NULL; lzma_free(mf->son, allocator); mf->son = NULL; return true; } } else { /* for (uint32_t i = 0; i < mf->hash_count; ++i) mf->hash[i] = EMPTY_HASH_VALUE; */ memzero(mf->hash, mf->hash_count * sizeof(uint32_t)); } mf->cyclic_pos = 0; // Handle preset dictionary. 
if (lz_options->preset_dict != NULL && lz_options->preset_dict_size > 0) { // If the preset dictionary is bigger than the actual // dictionary, use only the tail. mf->write_pos = my_min(lz_options->preset_dict_size, mf->size); memcpy(mf->buffer, lz_options->preset_dict + lz_options->preset_dict_size - mf->write_pos, mf->write_pos); mf->action = LZMA_SYNC_FLUSH; mf->skip(mf, mf->write_pos); } mf->action = LZMA_RUN; return false; } extern uint64_t lzma_lz_encoder_memusage(const lzma_lz_options *lz_options) { // Old buffers must not exist when calling lz_encoder_prepare(). lzma_mf mf = { .buffer = NULL, .hash = NULL, .son = NULL, .hash_count = 0, .sons_count = 0, }; // Setup the size information into mf. if (lz_encoder_prepare(&mf, NULL, lz_options)) return UINT64_MAX; // Calculate the memory usage. return ((uint64_t)(mf.hash_count) + mf.sons_count) * sizeof(uint32_t) + mf.size + sizeof(lzma_coder); } static void lz_encoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_coder *coder = coder_ptr; lzma_next_end(&coder->next, allocator); lzma_free(coder->mf.son, allocator); lzma_free(coder->mf.hash, allocator); lzma_free(coder->mf.buffer, allocator); if (coder->lz.end != NULL) coder->lz.end(coder->lz.coder, allocator); else lzma_free(coder->lz.coder, allocator); lzma_free(coder, allocator); return; } static lzma_ret lz_encoder_update(void *coder_ptr, const lzma_allocator *allocator, const lzma_filter *filters_null lzma_attribute((__unused__)), const lzma_filter *reversed_filters) { lzma_coder *coder = coder_ptr; if (coder->lz.options_update == NULL) return LZMA_PROG_ERROR; return_if_error(coder->lz.options_update( coder->lz.coder, reversed_filters)); return lzma_next_filter_update( &coder->next, allocator, reversed_filters + 1); } static lzma_ret lz_encoder_set_out_limit(void *coder_ptr, uint64_t *uncomp_size, uint64_t out_limit) { lzma_coder *coder = coder_ptr; // This is supported only if there are no other filters chained. 
if (coder->next.code == NULL && coder->lz.set_out_limit != NULL) return coder->lz.set_out_limit( coder->lz.coder, uncomp_size, out_limit); return LZMA_OPTIONS_ERROR; } extern lzma_ret lzma_lz_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, lzma_ret (*lz_init)(lzma_lz_encoder *lz, const lzma_allocator *allocator, lzma_vli id, const void *options, lzma_lz_options *lz_options)) { #if defined(HAVE_SMALL) && !defined(HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR) // The CRC32 table must be initialized. lzma_crc32_init(); #endif // Allocate and initialize the base data structure. lzma_coder *coder = next->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; next->coder = coder; next->code = &lz_encode; next->end = &lz_encoder_end; next->update = &lz_encoder_update; next->set_out_limit = &lz_encoder_set_out_limit; coder->lz.coder = NULL; coder->lz.code = NULL; coder->lz.end = NULL; coder->lz.options_update = NULL; coder->lz.set_out_limit = NULL; // mf.size is initialized to silence Valgrind // when used on optimized binaries (GCC may reorder // code in a way that Valgrind gets unhappy). coder->mf.buffer = NULL; coder->mf.size = 0; coder->mf.hash = NULL; coder->mf.son = NULL; coder->mf.hash_count = 0; coder->mf.sons_count = 0; coder->next = LZMA_NEXT_CODER_INIT; } // Initialize the LZ-based encoder. lzma_lz_options lz_options; return_if_error(lz_init(&coder->lz, allocator, filters[0].id, filters[0].options, &lz_options)); // Setup the size information into coder->mf and deallocate // old buffers if they have wrong size. if (lz_encoder_prepare(&coder->mf, allocator, &lz_options)) return LZMA_OPTIONS_ERROR; // Allocate new buffers if needed, and do the rest of // the initialization. if (lz_encoder_init(&coder->mf, allocator, &lz_options)) return LZMA_MEM_ERROR; // Initialize the next filter in the chain, if any. 
return lzma_next_filter_init(&coder->next, allocator, filters + 1); } extern LZMA_API(lzma_bool) lzma_mf_is_supported(lzma_match_finder mf) { switch (mf) { #ifdef HAVE_MF_HC3 case LZMA_MF_HC3: return true; #endif #ifdef HAVE_MF_HC4 case LZMA_MF_HC4: return true; #endif #ifdef HAVE_MF_BT2 case LZMA_MF_BT2: return true; #endif #ifdef HAVE_MF_BT3 case LZMA_MF_BT3: return true; #endif #ifdef HAVE_MF_BT4 case LZMA_MF_BT4: return true; #endif default: return false; } } diff --git a/src/liblzma/lz/lz_encoder_hash.h b/src/liblzma/lz/lz_encoder_hash.h index 8ace82b04c51..6d4bf837fd16 100644 --- a/src/liblzma/lz/lz_encoder_hash.h +++ b/src/liblzma/lz/lz_encoder_hash.h @@ -1,108 +1,122 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file lz_encoder_hash.h /// \brief Hash macros for match finders // -// Author: Igor Pavlov +// Authors: Igor Pavlov +// Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #ifndef LZMA_LZ_ENCODER_HASH_H #define LZMA_LZ_ENCODER_HASH_H -#if defined(WORDS_BIGENDIAN) && !defined(HAVE_SMALL) - // This is to make liblzma produce the same output on big endian - // systems that it does on little endian systems. lz_encoder.c - // takes care of including the actual table. +// We need to know if CRC32_GENERIC is defined and we may need the declaration +// of lzma_crc32_table[][]. +#include "crc_common.h" + +// If HAVE_SMALL is defined, then lzma_crc32_table[][] exists and +// it's little endian even on big endian systems. +// +// If HAVE_SMALL isn't defined, lzma_crc32_table[][] is in native endian +// but we want a little endian one so that the compressed output won't +// depend on the processor endianness. Big endian systems are less common +// so those get the burden of an extra 1 KiB table. +// +// If HAVE_SMALL isn't defined and CRC32_GENERIC isn't defined either, +// then lzma_crc32_table[][] doesn't exist. 
+#if defined(HAVE_SMALL) \ + || (defined(CRC32_GENERIC) && !defined(WORDS_BIGENDIAN)) +# define hash_table lzma_crc32_table[0] +#else + // lz_encoder.c takes care of including the actual table. lzma_attr_visibility_hidden extern const uint32_t lzma_lz_hash_table[256]; # define hash_table lzma_lz_hash_table -#else -# include "check.h" -# define hash_table lzma_crc32_table[0] +# define LZMA_LZ_HASH_TABLE_IS_NEEDED 1 #endif #define HASH_2_SIZE (UINT32_C(1) << 10) #define HASH_3_SIZE (UINT32_C(1) << 16) #define HASH_4_SIZE (UINT32_C(1) << 20) #define HASH_2_MASK (HASH_2_SIZE - 1) #define HASH_3_MASK (HASH_3_SIZE - 1) #define HASH_4_MASK (HASH_4_SIZE - 1) #define FIX_3_HASH_SIZE (HASH_2_SIZE) #define FIX_4_HASH_SIZE (HASH_2_SIZE + HASH_3_SIZE) #define FIX_5_HASH_SIZE (HASH_2_SIZE + HASH_3_SIZE + HASH_4_SIZE) // Endianness doesn't matter in hash_2_calc() (no effect on the output). #ifdef TUKLIB_FAST_UNALIGNED_ACCESS # define hash_2_calc() \ const uint32_t hash_value = read16ne(cur) #else # define hash_2_calc() \ const uint32_t hash_value \ = (uint32_t)(cur[0]) | ((uint32_t)(cur[1]) << 8) #endif #define hash_3_calc() \ const uint32_t temp = hash_table[cur[0]] ^ cur[1]; \ const uint32_t hash_2_value = temp & HASH_2_MASK; \ const uint32_t hash_value \ = (temp ^ ((uint32_t)(cur[2]) << 8)) & mf->hash_mask #define hash_4_calc() \ const uint32_t temp = hash_table[cur[0]] ^ cur[1]; \ const uint32_t hash_2_value = temp & HASH_2_MASK; \ const uint32_t hash_3_value \ = (temp ^ ((uint32_t)(cur[2]) << 8)) & HASH_3_MASK; \ const uint32_t hash_value = (temp ^ ((uint32_t)(cur[2]) << 8) \ ^ (hash_table[cur[3]] << 5)) & mf->hash_mask // The following are not currently used. 
#define hash_5_calc() \ const uint32_t temp = hash_table[cur[0]] ^ cur[1]; \ const uint32_t hash_2_value = temp & HASH_2_MASK; \ const uint32_t hash_3_value \ = (temp ^ ((uint32_t)(cur[2]) << 8)) & HASH_3_MASK; \ uint32_t hash_4_value = (temp ^ ((uint32_t)(cur[2]) << 8) ^ \ ^ hash_table[cur[3]] << 5); \ const uint32_t hash_value \ = (hash_4_value ^ (hash_table[cur[4]] << 3)) \ & mf->hash_mask; \ hash_4_value &= HASH_4_MASK /* #define hash_zip_calc() \ const uint32_t hash_value \ = (((uint32_t)(cur[0]) | ((uint32_t)(cur[1]) << 8)) \ ^ hash_table[cur[2]]) & 0xFFFF */ #define hash_zip_calc() \ const uint32_t hash_value \ = (((uint32_t)(cur[2]) | ((uint32_t)(cur[0]) << 8)) \ ^ hash_table[cur[1]]) & 0xFFFF #define mt_hash_2_calc() \ const uint32_t hash_2_value \ = (hash_table[cur[0]] ^ cur[1]) & HASH_2_MASK #define mt_hash_3_calc() \ const uint32_t temp = hash_table[cur[0]] ^ cur[1]; \ const uint32_t hash_2_value = temp & HASH_2_MASK; \ const uint32_t hash_3_value \ = (temp ^ ((uint32_t)(cur[2]) << 8)) & HASH_3_MASK #define mt_hash_4_calc() \ const uint32_t temp = hash_table[cur[0]] ^ cur[1]; \ const uint32_t hash_2_value = temp & HASH_2_MASK; \ const uint32_t hash_3_value \ = (temp ^ ((uint32_t)(cur[2]) << 8)) & HASH_3_MASK; \ const uint32_t hash_4_value = (temp ^ ((uint32_t)(cur[2]) << 8) ^ \ (hash_table[cur[3]] << 5)) & HASH_4_MASK #endif diff --git a/src/liblzma/lzma/lzma2_encoder.c b/src/liblzma/lzma/lzma2_encoder.c index e20b75b30037..71cfd9b4114e 100644 --- a/src/liblzma/lzma/lzma2_encoder.c +++ b/src/liblzma/lzma/lzma2_encoder.c @@ -1,416 +1,413 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file lzma2_encoder.c /// \brief LZMA2 encoder /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "lz_encoder.h" #include "lzma_encoder.h" #include "fastpos.h" #include "lzma2_encoder.h" typedef struct { enum { 
SEQ_INIT, SEQ_LZMA_ENCODE, SEQ_LZMA_COPY, SEQ_UNCOMPRESSED_HEADER, SEQ_UNCOMPRESSED_COPY, } sequence; /// LZMA encoder void *lzma; /// LZMA options currently in use. lzma_options_lzma opt_cur; bool need_properties; bool need_state_reset; bool need_dictionary_reset; /// Uncompressed size of a chunk size_t uncompressed_size; /// Compressed size of a chunk (excluding headers); this is also used /// to indicate the end of buf[] in SEQ_LZMA_COPY. size_t compressed_size; /// Read position in buf[] size_t buf_pos; /// Buffer to hold the chunk header and LZMA compressed data uint8_t buf[LZMA2_HEADER_MAX + LZMA2_CHUNK_MAX]; } lzma_lzma2_coder; static void lzma2_header_lzma(lzma_lzma2_coder *coder) { assert(coder->uncompressed_size > 0); assert(coder->uncompressed_size <= LZMA2_UNCOMPRESSED_MAX); assert(coder->compressed_size > 0); assert(coder->compressed_size <= LZMA2_CHUNK_MAX); size_t pos; if (coder->need_properties) { pos = 0; if (coder->need_dictionary_reset) coder->buf[pos] = 0x80 + (3 << 5); else coder->buf[pos] = 0x80 + (2 << 5); } else { pos = 1; if (coder->need_state_reset) coder->buf[pos] = 0x80 + (1 << 5); else coder->buf[pos] = 0x80; } // Set the start position for copying. coder->buf_pos = pos; // Uncompressed size size_t size = coder->uncompressed_size - 1; coder->buf[pos++] += size >> 16; coder->buf[pos++] = (size >> 8) & 0xFF; coder->buf[pos++] = size & 0xFF; // Compressed size size = coder->compressed_size - 1; coder->buf[pos++] = size >> 8; coder->buf[pos++] = size & 0xFF; // Properties, if needed if (coder->need_properties) lzma_lzma_lclppb_encode(&coder->opt_cur, coder->buf + pos); coder->need_properties = false; coder->need_state_reset = false; coder->need_dictionary_reset = false; // The copying code uses coder->compressed_size to indicate the end // of coder->buf[], so we need add the maximum size of the header here. 
coder->compressed_size += LZMA2_HEADER_MAX; return; } static void lzma2_header_uncompressed(lzma_lzma2_coder *coder) { assert(coder->uncompressed_size > 0); assert(coder->uncompressed_size <= LZMA2_CHUNK_MAX); // If this is the first chunk, we need to include dictionary // reset indicator. if (coder->need_dictionary_reset) coder->buf[0] = 1; else coder->buf[0] = 2; coder->need_dictionary_reset = false; // "Compressed" size coder->buf[1] = (coder->uncompressed_size - 1) >> 8; coder->buf[2] = (coder->uncompressed_size - 1) & 0xFF; // Set the start position for copying. coder->buf_pos = 0; return; } static lzma_ret lzma2_encode(void *coder_ptr, lzma_mf *restrict mf, uint8_t *restrict out, size_t *restrict out_pos, size_t out_size) { lzma_lzma2_coder *restrict coder = coder_ptr; while (*out_pos < out_size) switch (coder->sequence) { case SEQ_INIT: // If there's no input left and we are flushing or finishing, // don't start a new chunk. if (mf_unencoded(mf) == 0) { // Write end of payload marker if finishing. if (mf->action == LZMA_FINISH) out[(*out_pos)++] = 0; return mf->action == LZMA_RUN ? LZMA_OK : LZMA_STREAM_END; } if (coder->need_state_reset) return_if_error(lzma_lzma_encoder_reset( coder->lzma, &coder->opt_cur)); coder->uncompressed_size = 0; coder->compressed_size = 0; coder->sequence = SEQ_LZMA_ENCODE; - - // Fall through + FALLTHROUGH; case SEQ_LZMA_ENCODE: { // Calculate how much more uncompressed data this chunk // could accept. const uint32_t left = LZMA2_UNCOMPRESSED_MAX - coder->uncompressed_size; uint32_t limit; if (left < mf->match_len_max) { // Must flush immediately since the next LZMA symbol // could make the uncompressed size of the chunk too // big. limit = 0; } else { // Calculate maximum read_limit that is OK from point // of view of LZMA2 chunk size. limit = mf->read_pos - mf->read_ahead + left - mf->match_len_max; } // Save the start position so that we can update // coder->uncompressed_size. 
const uint32_t read_start = mf->read_pos - mf->read_ahead; // Call the LZMA encoder until the chunk is finished. const lzma_ret ret = lzma_lzma_encode(coder->lzma, mf, coder->buf + LZMA2_HEADER_MAX, &coder->compressed_size, LZMA2_CHUNK_MAX, limit); coder->uncompressed_size += mf->read_pos - mf->read_ahead - read_start; assert(coder->compressed_size <= LZMA2_CHUNK_MAX); assert(coder->uncompressed_size <= LZMA2_UNCOMPRESSED_MAX); if (ret != LZMA_STREAM_END) return LZMA_OK; // See if the chunk compressed. If it didn't, we encode it // as uncompressed chunk. This saves a few bytes of space // and makes decoding faster. if (coder->compressed_size >= coder->uncompressed_size) { coder->uncompressed_size += mf->read_ahead; assert(coder->uncompressed_size <= LZMA2_UNCOMPRESSED_MAX); mf->read_ahead = 0; lzma2_header_uncompressed(coder); coder->need_state_reset = true; coder->sequence = SEQ_UNCOMPRESSED_HEADER; break; } // The chunk did compress at least by one byte, so we store // the chunk as LZMA. lzma2_header_lzma(coder); coder->sequence = SEQ_LZMA_COPY; + FALLTHROUGH; } - // Fall through - case SEQ_LZMA_COPY: // Copy the compressed chunk along its headers to the // output buffer. lzma_bufcpy(coder->buf, &coder->buf_pos, coder->compressed_size, out, out_pos, out_size); if (coder->buf_pos != coder->compressed_size) return LZMA_OK; coder->sequence = SEQ_INIT; break; case SEQ_UNCOMPRESSED_HEADER: // Copy the three-byte header to indicate uncompressed chunk. lzma_bufcpy(coder->buf, &coder->buf_pos, LZMA2_HEADER_UNCOMPRESSED, out, out_pos, out_size); if (coder->buf_pos != LZMA2_HEADER_UNCOMPRESSED) return LZMA_OK; coder->sequence = SEQ_UNCOMPRESSED_COPY; - - // Fall through + FALLTHROUGH; case SEQ_UNCOMPRESSED_COPY: // Copy the uncompressed data as is from the dictionary // to the output buffer. 
mf_read(mf, out, out_pos, out_size, &coder->uncompressed_size); if (coder->uncompressed_size != 0) return LZMA_OK; coder->sequence = SEQ_INIT; break; } return LZMA_OK; } static void lzma2_encoder_end(void *coder_ptr, const lzma_allocator *allocator) { lzma_lzma2_coder *coder = coder_ptr; lzma_free(coder->lzma, allocator); lzma_free(coder, allocator); return; } static lzma_ret lzma2_encoder_options_update(void *coder_ptr, const lzma_filter *filter) { lzma_lzma2_coder *coder = coder_ptr; // New options can be set only when there is no incomplete chunk. // This is the case at the beginning of the raw stream and right // after LZMA_SYNC_FLUSH. if (filter->options == NULL || coder->sequence != SEQ_INIT) return LZMA_PROG_ERROR; // Look if there are new options. At least for now, // only lc/lp/pb can be changed. const lzma_options_lzma *opt = filter->options; if (coder->opt_cur.lc != opt->lc || coder->opt_cur.lp != opt->lp || coder->opt_cur.pb != opt->pb) { // Validate the options. if (opt->lc > LZMA_LCLP_MAX || opt->lp > LZMA_LCLP_MAX || opt->lc + opt->lp > LZMA_LCLP_MAX || opt->pb > LZMA_PB_MAX) return LZMA_OPTIONS_ERROR; // The new options will be used when the encoder starts // a new LZMA2 chunk. 
coder->opt_cur.lc = opt->lc; coder->opt_cur.lp = opt->lp; coder->opt_cur.pb = opt->pb; coder->need_properties = true; coder->need_state_reset = true; } return LZMA_OK; } static lzma_ret lzma2_encoder_init(lzma_lz_encoder *lz, const lzma_allocator *allocator, lzma_vli id lzma_attribute((__unused__)), const void *options, lzma_lz_options *lz_options) { if (options == NULL) return LZMA_PROG_ERROR; lzma_lzma2_coder *coder = lz->coder; if (coder == NULL) { coder = lzma_alloc(sizeof(lzma_lzma2_coder), allocator); if (coder == NULL) return LZMA_MEM_ERROR; lz->coder = coder; lz->code = &lzma2_encode; lz->end = &lzma2_encoder_end; lz->options_update = &lzma2_encoder_options_update; coder->lzma = NULL; } coder->opt_cur = *(const lzma_options_lzma *)(options); coder->sequence = SEQ_INIT; coder->need_properties = true; coder->need_state_reset = false; coder->need_dictionary_reset = coder->opt_cur.preset_dict == NULL || coder->opt_cur.preset_dict_size == 0; // Initialize LZMA encoder return_if_error(lzma_lzma_encoder_create(&coder->lzma, allocator, LZMA_FILTER_LZMA2, &coder->opt_cur, lz_options)); // Make sure that we will always have enough history available in // case we need to use uncompressed chunks. They are used when the // compressed size of a chunk is not smaller than the uncompressed // size, so we need to have at least LZMA2_COMPRESSED_MAX bytes // history available. 
if (lz_options->before_size + lz_options->dict_size < LZMA2_CHUNK_MAX) lz_options->before_size = LZMA2_CHUNK_MAX - lz_options->dict_size; return LZMA_OK; } extern lzma_ret lzma_lzma2_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return lzma_lz_encoder_init( next, allocator, filters, &lzma2_encoder_init); } extern uint64_t lzma_lzma2_encoder_memusage(const void *options) { const uint64_t lzma_mem = lzma_lzma_encoder_memusage(options); if (lzma_mem == UINT64_MAX) return UINT64_MAX; return sizeof(lzma_lzma2_coder) + lzma_mem; } extern lzma_ret lzma_lzma2_props_encode(const void *options, uint8_t *out) { if (options == NULL) return LZMA_PROG_ERROR; const lzma_options_lzma *const opt = options; uint32_t d = my_max(opt->dict_size, LZMA_DICT_SIZE_MIN); // Round up to the next 2^n - 1 or 2^n + 2^(n - 1) - 1 depending // on which one is the next: --d; d |= d >> 2; d |= d >> 3; d |= d >> 4; d |= d >> 8; d |= d >> 16; // Get the highest two bits using the proper encoding: if (d == UINT32_MAX) out[0] = 40; else out[0] = get_dist_slot(d + 1) - 24; return LZMA_OK; } extern uint64_t lzma_lzma2_block_size(const void *options) { const lzma_options_lzma *const opt = options; if (!IS_ENC_DICT_SIZE_VALID(opt->dict_size)) return UINT64_MAX; // Use at least 1 MiB to keep compression ratio better. 
return my_max((uint64_t)(opt->dict_size) * 3, UINT64_C(1) << 20); } diff --git a/src/liblzma/lzma/lzma_decoder.c b/src/liblzma/lzma/lzma_decoder.c index 0abed02b8154..2088a2faa54e 100644 --- a/src/liblzma/lzma/lzma_decoder.c +++ b/src/liblzma/lzma/lzma_decoder.c @@ -1,1263 +1,1263 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file lzma_decoder.c /// \brief LZMA decoder /// // Authors: Igor Pavlov // Lasse Collin // Jia Tan // /////////////////////////////////////////////////////////////////////////////// #include "lz_decoder.h" #include "lzma_common.h" #include "lzma_decoder.h" #include "range_decoder.h" // The macros unroll loops with switch statements. // Silence warnings about missing fall-through comments. -#if TUKLIB_GNUC_REQ(7, 0) +#if TUKLIB_GNUC_REQ(7, 0) || defined(__clang__) # pragma GCC diagnostic ignored "-Wimplicit-fallthrough" #endif // Minimum number of input bytes to safely decode one LZMA symbol. // The worst case is that we decode 22 bits using probabilities and 26 // direct bits. This may decode at maximum 20 bytes of input. #define LZMA_IN_REQUIRED 20 // Macros for (somewhat) size-optimized code. // This is used to decode the match length (how many bytes must be repeated // from the dictionary). This version is used in the Resumable mode and // does not unroll any loops. 
#define len_decode(target, ld, pos_state, seq) \ do { \ case seq ## _CHOICE: \ rc_if_0_safe(ld.choice, seq ## _CHOICE) { \ rc_update_0(ld.choice); \ probs = ld.low[pos_state];\ limit = LEN_LOW_SYMBOLS; \ target = MATCH_LEN_MIN; \ } else { \ rc_update_1(ld.choice); \ case seq ## _CHOICE2: \ rc_if_0_safe(ld.choice2, seq ## _CHOICE2) { \ rc_update_0(ld.choice2); \ probs = ld.mid[pos_state]; \ limit = LEN_MID_SYMBOLS; \ target = MATCH_LEN_MIN + LEN_LOW_SYMBOLS; \ } else { \ rc_update_1(ld.choice2); \ probs = ld.high; \ limit = LEN_HIGH_SYMBOLS; \ target = MATCH_LEN_MIN + LEN_LOW_SYMBOLS \ + LEN_MID_SYMBOLS; \ } \ } \ symbol = 1; \ case seq ## _BITTREE: \ do { \ rc_bit_safe(probs[symbol], , , seq ## _BITTREE); \ } while (symbol < limit); \ target += symbol - limit; \ } while (0) // This is the faster version of the match length decoder that does not // worry about being resumable. It unrolls the bittree decoding loop. #define len_decode_fast(target, ld, pos_state) \ do { \ symbol = 1; \ rc_if_0(ld.choice) { \ rc_update_0(ld.choice); \ rc_bittree3(ld.low[pos_state], \ -LEN_LOW_SYMBOLS + MATCH_LEN_MIN); \ target = symbol; \ } else { \ rc_update_1(ld.choice); \ rc_if_0(ld.choice2) { \ rc_update_0(ld.choice2); \ rc_bittree3(ld.mid[pos_state], -LEN_MID_SYMBOLS \ + MATCH_LEN_MIN + LEN_LOW_SYMBOLS); \ target = symbol; \ } else { \ rc_update_1(ld.choice2); \ rc_bittree8(ld.high, -LEN_HIGH_SYMBOLS \ + MATCH_LEN_MIN \ + LEN_LOW_SYMBOLS + LEN_MID_SYMBOLS); \ target = symbol; \ } \ } \ } while (0) /// Length decoder probabilities; see comments in lzma_common.h. typedef struct { probability choice; probability choice2; probability low[POS_STATES_MAX][LEN_LOW_SYMBOLS]; probability mid[POS_STATES_MAX][LEN_MID_SYMBOLS]; probability high[LEN_HIGH_SYMBOLS]; } lzma_length_decoder; typedef struct { /////////////////// // Probabilities // /////////////////// /// Literals; see comments in lzma_common.h. probability literal[LITERAL_CODERS_MAX * LITERAL_CODER_SIZE]; /// If 1, it's a match. 
Otherwise it's a single 8-bit literal. probability is_match[STATES][POS_STATES_MAX]; /// If 1, it's a repeated match. The distance is one of rep0 .. rep3. probability is_rep[STATES]; /// If 0, distance of a repeated match is rep0. /// Otherwise check is_rep1. probability is_rep0[STATES]; /// If 0, distance of a repeated match is rep1. /// Otherwise check is_rep2. probability is_rep1[STATES]; /// If 0, distance of a repeated match is rep2. Otherwise it is rep3. probability is_rep2[STATES]; /// If 1, the repeated match has length of one byte. Otherwise /// the length is decoded from rep_len_decoder. probability is_rep0_long[STATES][POS_STATES_MAX]; /// Probability tree for the highest two bits of the match distance. /// There is a separate probability tree for match lengths of /// 2 (i.e. MATCH_LEN_MIN), 3, 4, and [5, 273]. probability dist_slot[DIST_STATES][DIST_SLOTS]; /// Probability trees for additional bits for match distance when the /// distance is in the range [4, 127]. probability pos_special[FULL_DISTANCES - DIST_MODEL_END]; /// Probability tree for the lowest four bits of a match distance /// that is equal to or greater than 128. probability pos_align[ALIGN_SIZE]; /// Length of a normal match lzma_length_decoder match_len_decoder; /// Length of a repeated match lzma_length_decoder rep_len_decoder; /////////////////// // Decoder state // /////////////////// // Range coder lzma_range_decoder rc; // Types of the most recently seen LZMA symbols lzma_lzma_state state; uint32_t rep0; ///< Distance of the latest match uint32_t rep1; ///< Distance of second latest match uint32_t rep2; ///< Distance of third latest match uint32_t rep3; ///< Distance of fourth latest match uint32_t pos_mask; // (1U << pb) - 1 uint32_t literal_context_bits; uint32_t literal_mask; /// Uncompressed size as bytes, or LZMA_VLI_UNKNOWN if end of /// payload marker is expected. 
lzma_vli uncompressed_size; /// True if end of payload marker (EOPM) is allowed even when /// uncompressed_size is known; false if EOPM must not be present. /// This is ignored if uncompressed_size == LZMA_VLI_UNKNOWN. bool allow_eopm; //////////////////////////////// // State of incomplete symbol // //////////////////////////////// /// Position where to continue the decoder loop enum { SEQ_NORMALIZE, SEQ_IS_MATCH, SEQ_LITERAL, SEQ_LITERAL_MATCHED, SEQ_LITERAL_WRITE, SEQ_IS_REP, SEQ_MATCH_LEN_CHOICE, SEQ_MATCH_LEN_CHOICE2, SEQ_MATCH_LEN_BITTREE, SEQ_DIST_SLOT, SEQ_DIST_MODEL, SEQ_DIRECT, SEQ_ALIGN, SEQ_EOPM, SEQ_IS_REP0, SEQ_SHORTREP, SEQ_IS_REP0_LONG, SEQ_IS_REP1, SEQ_IS_REP2, SEQ_REP_LEN_CHOICE, SEQ_REP_LEN_CHOICE2, SEQ_REP_LEN_BITTREE, SEQ_COPY, } sequence; /// Base of the current probability tree probability *probs; /// Symbol being decoded. This is also used as an index variable in /// bittree decoders: probs[symbol] uint32_t symbol; /// Used as a loop termination condition on bittree decoders and /// direct bits decoder. uint32_t limit; /// Matched literal decoder: 0x100 or 0 to help avoiding branches. /// Bittree reverse decoders: Offset of the next bit: 1 << offset uint32_t offset; /// If decoding a literal: match byte. /// If decoding a match: length of the match. uint32_t len; } lzma_lzma1_decoder; static lzma_ret lzma_decode(void *coder_ptr, lzma_dict *restrict dictptr, const uint8_t *restrict in, size_t *restrict in_pos, size_t in_size) { lzma_lzma1_decoder *restrict coder = coder_ptr; //////////////////// // Initialization // //////////////////// { const lzma_ret ret = rc_read_init( &coder->rc, in, in_pos, in_size); if (ret != LZMA_STREAM_END) return ret; } /////////////// // Variables // /////////////// // Making local copies of often-used variables improves both // speed and readability. 
lzma_dict dict = *dictptr; const size_t dict_start = dict.pos; // Range decoder rc_to_local(coder->rc, *in_pos, LZMA_IN_REQUIRED); // State uint32_t state = coder->state; uint32_t rep0 = coder->rep0; uint32_t rep1 = coder->rep1; uint32_t rep2 = coder->rep2; uint32_t rep3 = coder->rep3; const uint32_t pos_mask = coder->pos_mask; // These variables are actually needed only if we last time ran // out of input in the middle of the decoder loop. probability *probs = coder->probs; uint32_t symbol = coder->symbol; uint32_t limit = coder->limit; uint32_t offset = coder->offset; uint32_t len = coder->len; const uint32_t literal_mask = coder->literal_mask; const uint32_t literal_context_bits = coder->literal_context_bits; // Temporary variables uint32_t pos_state = dict.pos & pos_mask; lzma_ret ret = LZMA_OK; // This is true when the next LZMA symbol is allowed to be EOPM. // That is, if this is false, then EOPM is considered // an invalid symbol and we will return LZMA_DATA_ERROR. // // EOPM is always required (not just allowed) when // the uncompressed size isn't known. When uncompressed size // is known, eopm_is_valid may be set to true later. bool eopm_is_valid = coder->uncompressed_size == LZMA_VLI_UNKNOWN; // If uncompressed size is known and there is enough output space // to decode all the data, limit the available buffer space so that // the main loop won't try to decode past the end of the stream. bool might_finish_without_eopm = false; if (coder->uncompressed_size != LZMA_VLI_UNKNOWN && coder->uncompressed_size <= dict.limit - dict.pos) { dict.limit = dict.pos + (size_t)(coder->uncompressed_size); might_finish_without_eopm = true; } // The main decoder loop. The "switch" is used to resume the decoder at // the correct location. Once resumed, the "switch" is no longer used. // The decoder loop is split into two modes: // // 1 - Non-resumable mode (fast). This is used when it is guaranteed // there is enough input to decode the next symbol.
If the output // limit is reached, then the decoder loop will save the place // for the resumable mode to continue. This mode is not used if // HAVE_SMALL is defined. This is faster than Resumable mode // because it reduces the number of branches needed and allows // for more compiler optimizations. // // 2 - Resumable mode (slow). This is used when a previous decoder // loop did not have enough space in the input or output buffers // to complete. It stores a sequence enum value in // coder->sequence to record where to resume in the decoder loop. This // is the only mode used when HAVE_SMALL is defined. switch (coder->sequence) while (true) { // Calculate new pos_state. This is skipped on the first loop // since we already calculated it when setting up the local // variables. pos_state = dict.pos & pos_mask; #ifndef HAVE_SMALL /////////////////////////////// // Non-resumable Mode (fast) // /////////////////////////////// // Go to Resumable mode (1) if there is not enough input to // safely decode any possible LZMA symbol or (2) if the // dictionary is full, which may need special checks that // are only done in the Resumable mode. if (unlikely(!rc_is_fast_allowed() || dict.pos == dict.limit)) goto slow; // Decode the first bit from the next LZMA symbol. // If the bit is a 0, then we handle it as a literal. // If the bit is a 1, then it is a match of previously // decoded data. rc_if_0(coder->is_match[state][pos_state]) { ///////////////////// // Decode literal. // ///////////////////// // Update the RC that we have decoded a 0. rc_update_0(coder->is_match[state][pos_state]); // Get the correct probability array from lp and // lc params. probs = literal_subcoder(coder->literal, literal_context_bits, literal_mask, dict.pos, dict_get0(&dict)); if (is_literal_state(state)) { update_literal_normal(state); // Decode literal without match byte. rc_bittree8(probs, 0); } else { update_literal_matched(state); // Decode literal with match byte.
rc_matched_literal(probs, dict_get(&dict, rep0)); } // Write decoded literal to dictionary dict_put(&dict, symbol); continue; } /////////////////// // Decode match. // /////////////////// // Instead of a new byte we are going to decode a // distance-length pair. The distance represents how far // back in the dictionary to begin copying. The length // represents how many bytes to copy. rc_update_1(coder->is_match[state][pos_state]); rc_if_0(coder->is_rep[state]) { /////////////////// // Simple match. // /////////////////// // Not a repeated match. In this case, // the length (how many bytes to copy) must be // decoded first. Then, the distance (where to // start copying) is decoded. // // This is also how we know when we are done // decoding. If the distance decodes to UINT32_MAX, // then we know to stop decoding (end of payload // marker). rc_update_0(coder->is_rep[state]); update_match(state); // The latest three match distances are kept in // memory in case there are repeated matches. rep3 = rep2; rep2 = rep1; rep1 = rep0; // Decode the length of the match. len_decode_fast(len, coder->match_len_decoder, pos_state); // Next, decode the distance into rep0. // The next 6 bits determine how to decode the // rest of the distance. probs = coder->dist_slot[get_dist_state(len)]; rc_bittree6(probs, -DIST_SLOTS); assert(symbol <= 63); if (symbol < DIST_MODEL_START) { // If the decoded symbol is < DIST_MODEL_START // then we use its value directly as the // match distance. No other bits are needed. // The only possible distance values // are [0, 3]. rep0 = symbol; } else { // Use the first two bits of symbol as the // highest bits of the match distance. // "limit" represents the number of low bits // to decode. limit = (symbol >> 1) - 1; assert(limit >= 1 && limit <= 30); rep0 = 2 + (symbol & 1); if (symbol < DIST_MODEL_END) { // When symbol is > DIST_MODEL_START, // but symbol < DIST_MODEL_END, then // it can decode distances between // [4, 127]. 
assert(limit <= 5); rep0 <<= limit; assert(rep0 <= 96); // -1 is fine, because we start // decoding at probs[1], not probs[0]. // NOTE: This violates the C standard, // since we are doing pointer // arithmetic past the beginning of // the array. assert((int32_t)(rep0 - symbol - 1) >= -1); assert((int32_t)(rep0 - symbol - 1) <= 82); probs = coder->pos_special + rep0 - symbol - 1; symbol = 1; offset = 1; // Variable number (1-5) of bits // from a reverse bittree. This // isn't worth manual unrolling. // // NOTE: Making one or many of the // variables (probs, symbol, offset, // or limit) local here (instead of // using those declared outside the // main loop) can affect code size // and performance which isn't a // surprise but it's not so clear // what is the best. do { rc_bit_add_if_1(probs, rep0, offset); offset <<= 1; } while (--limit > 0); } else { // The distance is >= 128. Decode the // lower bits without probabilities // except the lowest four bits. assert(symbol >= 14); assert(limit >= 6); limit -= ALIGN_BITS; assert(limit >= 2); rc_direct(rep0, limit); // Decode the lowest four bits using // probabilities. rep0 <<= ALIGN_BITS; rc_bittree_rev4(coder->pos_align); rep0 += symbol; // If the end of payload marker (EOPM) // is detected, jump to the safe code. // The EOPM handling isn't speed // critical at all. // // A final normalization is needed // after the EOPM (there can be a // dummy byte to read in some cases). // If the normalization was done here // in the fast code, it would need to // be taken into account in the value // of LZMA_IN_REQUIRED. Using the // safe code allows keeping // LZMA_IN_REQUIRED as 20 instead of // 21. if (rep0 == UINT32_MAX) goto eopm; } } // Validate the distance we just decoded. if (unlikely(!dict_is_distance_valid(&dict, rep0))) { ret = LZMA_DATA_ERROR; goto out; } } else { rc_update_1(coder->is_rep[state]); ///////////////////// // Repeated match. 
// ///////////////////// // The match distance is a value that we have decoded // recently. The latest four match distances are // available as rep0, rep1, rep2 and rep3. We will // now decode which of them is the new distance. // // There cannot be a match if we haven't produced // any output, so check that first. if (unlikely(!dict_is_distance_valid(&dict, 0))) { ret = LZMA_DATA_ERROR; goto out; } rc_if_0(coder->is_rep0[state]) { rc_update_0(coder->is_rep0[state]); // The distance is rep0. // Decode the next bit to determine if 1 byte // should be copied from rep0 distance or // if the number of bytes needs to be decoded. // If the next bit is 0, then it is a // "Short Rep Match" and only 1 byte is copied. // Otherwise, the length of the match is // decoded after the "else" statement. rc_if_0(coder->is_rep0_long[state][pos_state]) { rc_update_0(coder->is_rep0_long[ state][pos_state]); update_short_rep(state); dict_put(&dict, dict_get(&dict, rep0)); continue; } // Repeating more than one byte at // distance of rep0. rc_update_1(coder->is_rep0_long[ state][pos_state]); } else { rc_update_1(coder->is_rep0[state]); // The distance is rep1, rep2 or rep3. Once // we find out which one of these three, it // is stored to rep0 and rep1, rep2 and rep3 // are updated accordingly. There is no // "Short Rep Match" option, so the length // of the match must always be decoded next. rc_if_0(coder->is_rep1[state]) { // The distance is rep1. rc_update_0(coder->is_rep1[state]); const uint32_t distance = rep1; rep1 = rep0; rep0 = distance; } else { rc_update_1(coder->is_rep1[state]); rc_if_0(coder->is_rep2[state]) { // The distance is rep2. rc_update_0(coder->is_rep2[ state]); const uint32_t distance = rep2; rep2 = rep1; rep1 = rep0; rep0 = distance; } else { // The distance is rep3. rc_update_1(coder->is_rep2[ state]); const uint32_t distance = rep3; rep3 = rep2; rep2 = rep1; rep1 = rep0; rep0 = distance; } } } update_long_rep(state); // Decode the length of the repeated match.
len_decode_fast(len, coder->rep_len_decoder, pos_state); } ///////////////////////////////// // Repeat from history buffer. // ///////////////////////////////// // The length is always between these limits. There is no way // to trigger the algorithm to set len outside this range. assert(len >= MATCH_LEN_MIN); assert(len <= MATCH_LEN_MAX); // Repeat len bytes from distance of rep0. if (unlikely(dict_repeat(&dict, rep0, &len))) { coder->sequence = SEQ_COPY; goto out; } continue; slow: #endif /////////////////////////// // Resumable Mode (slow) // /////////////////////////// // This is very similar to Non-resumable Mode, so most of the // comments are not repeated. The main differences are: // - case labels are used to resume at the correct location. // - Loops are not unrolled. // - Range coder macros take an extra sequence argument // so they can save to coder->sequence the location to // resume in case there is not enough input. case SEQ_NORMALIZE: case SEQ_IS_MATCH: if (unlikely(might_finish_without_eopm && dict.pos == dict.limit)) { // In rare cases there is a useless byte that needs // to be read anyway. rc_normalize_safe(SEQ_NORMALIZE); // If the range decoder state is such that we can // be at the end of the LZMA stream, then the // decoding is finished. if (rc_is_finished(rc)) { ret = LZMA_STREAM_END; goto out; } // If the caller hasn't allowed EOPM to be present // together with known uncompressed size, then the // LZMA stream is corrupt. if (!coder->allow_eopm) { ret = LZMA_DATA_ERROR; goto out; } // Otherwise continue decoding with the expectation // that the next LZMA symbol is EOPM. eopm_is_valid = true; } rc_if_0_safe(coder->is_match[state][pos_state], SEQ_IS_MATCH) { ///////////////////// // Decode literal. 
// ///////////////////// rc_update_0(coder->is_match[state][pos_state]); probs = literal_subcoder(coder->literal, literal_context_bits, literal_mask, dict.pos, dict_get0(&dict)); symbol = 1; if (is_literal_state(state)) { update_literal_normal(state); // Decode literal without match byte. // The "slow" version does not unroll // the loop. case SEQ_LITERAL: do { rc_bit_safe(probs[symbol], , , SEQ_LITERAL); } while (symbol < (1 << 8)); } else { update_literal_matched(state); // Decode literal with match byte. len = (uint32_t)(dict_get(&dict, rep0)) << 1; offset = 0x100; case SEQ_LITERAL_MATCHED: do { const uint32_t match_bit = len & offset; const uint32_t subcoder_index = offset + match_bit + symbol; rc_bit_safe(probs[subcoder_index], offset &= ~match_bit, offset &= match_bit, SEQ_LITERAL_MATCHED); // It seems to be faster to do this // here instead of putting it to the // beginning of the loop and then // putting the "case" in the middle // of the loop. len <<= 1; } while (symbol < (1 << 8)); } case SEQ_LITERAL_WRITE: if (dict_put_safe(&dict, symbol)) { coder->sequence = SEQ_LITERAL_WRITE; goto out; } continue; } /////////////////// // Decode match. // /////////////////// rc_update_1(coder->is_match[state][pos_state]); case SEQ_IS_REP: rc_if_0_safe(coder->is_rep[state], SEQ_IS_REP) { /////////////////// // Simple match. 
// /////////////////// rc_update_0(coder->is_rep[state]); update_match(state); rep3 = rep2; rep2 = rep1; rep1 = rep0; len_decode(len, coder->match_len_decoder, pos_state, SEQ_MATCH_LEN); probs = coder->dist_slot[get_dist_state(len)]; symbol = 1; case SEQ_DIST_SLOT: do { rc_bit_safe(probs[symbol], , , SEQ_DIST_SLOT); } while (symbol < DIST_SLOTS); symbol -= DIST_SLOTS; assert(symbol <= 63); if (symbol < DIST_MODEL_START) { rep0 = symbol; } else { limit = (symbol >> 1) - 1; assert(limit >= 1 && limit <= 30); rep0 = 2 + (symbol & 1); if (symbol < DIST_MODEL_END) { assert(limit <= 5); rep0 <<= limit; assert(rep0 <= 96); // -1 is fine, because we start // decoding at probs[1], not probs[0]. // NOTE: This violates the C standard, // since we are doing pointer // arithmetic past the beginning of // the array. assert((int32_t)(rep0 - symbol - 1) >= -1); assert((int32_t)(rep0 - symbol - 1) <= 82); probs = coder->pos_special + rep0 - symbol - 1; symbol = 1; offset = 0; case SEQ_DIST_MODEL: do { rc_bit_safe(probs[symbol], , rep0 += 1U << offset, SEQ_DIST_MODEL); } while (++offset < limit); } else { assert(symbol >= 14); assert(limit >= 6); limit -= ALIGN_BITS; assert(limit >= 2); case SEQ_DIRECT: rc_direct_safe(rep0, limit, SEQ_DIRECT); rep0 <<= ALIGN_BITS; symbol = 0; offset = 1; case SEQ_ALIGN: do { rc_bit_last_safe( coder->pos_align[ offset + symbol], , symbol += offset, SEQ_ALIGN); offset <<= 1; } while (offset < ALIGN_SIZE); rep0 += symbol; if (rep0 == UINT32_MAX) { // End of payload marker was // found. It may only be // present if // - uncompressed size is // unknown or // - after known uncompressed // size amount of bytes has // been decompressed and // caller has indicated // that EOPM might be used // (it's not allowed in // LZMA2). #ifndef HAVE_SMALL eopm: #endif if (!eopm_is_valid) { ret = LZMA_DATA_ERROR; goto out; } case SEQ_EOPM: // LZMA1 stream with // end-of-payload marker. rc_normalize_safe(SEQ_EOPM); ret = rc_is_finished(rc) ? 
LZMA_STREAM_END : LZMA_DATA_ERROR; goto out; } } } if (unlikely(!dict_is_distance_valid(&dict, rep0))) { ret = LZMA_DATA_ERROR; goto out; } } else { ///////////////////// // Repeated match. // ///////////////////// rc_update_1(coder->is_rep[state]); if (unlikely(!dict_is_distance_valid(&dict, 0))) { ret = LZMA_DATA_ERROR; goto out; } case SEQ_IS_REP0: rc_if_0_safe(coder->is_rep0[state], SEQ_IS_REP0) { rc_update_0(coder->is_rep0[state]); case SEQ_IS_REP0_LONG: rc_if_0_safe(coder->is_rep0_long [state][pos_state], SEQ_IS_REP0_LONG) { rc_update_0(coder->is_rep0_long[ state][pos_state]); update_short_rep(state); case SEQ_SHORTREP: if (dict_put_safe(&dict, dict_get(&dict, rep0))) { coder->sequence = SEQ_SHORTREP; goto out; } continue; } rc_update_1(coder->is_rep0_long[ state][pos_state]); } else { rc_update_1(coder->is_rep0[state]); case SEQ_IS_REP1: rc_if_0_safe(coder->is_rep1[state], SEQ_IS_REP1) { rc_update_0(coder->is_rep1[state]); const uint32_t distance = rep1; rep1 = rep0; rep0 = distance; } else { rc_update_1(coder->is_rep1[state]); case SEQ_IS_REP2: rc_if_0_safe(coder->is_rep2[state], SEQ_IS_REP2) { rc_update_0(coder->is_rep2[ state]); const uint32_t distance = rep2; rep2 = rep1; rep1 = rep0; rep0 = distance; } else { rc_update_1(coder->is_rep2[ state]); const uint32_t distance = rep3; rep3 = rep2; rep2 = rep1; rep1 = rep0; rep0 = distance; } } } update_long_rep(state); len_decode(len, coder->rep_len_decoder, pos_state, SEQ_REP_LEN); } ///////////////////////////////// // Repeat from history buffer. // ///////////////////////////////// assert(len >= MATCH_LEN_MIN); assert(len <= MATCH_LEN_MAX); case SEQ_COPY: if (unlikely(dict_repeat(&dict, rep0, &len))) { coder->sequence = SEQ_COPY; goto out; } } out: // Save state // NOTE: Must not copy dict.limit. 
dictptr->pos = dict.pos; dictptr->full = dict.full; rc_from_local(coder->rc, *in_pos); coder->state = state; coder->rep0 = rep0; coder->rep1 = rep1; coder->rep2 = rep2; coder->rep3 = rep3; coder->probs = probs; coder->symbol = symbol; coder->limit = limit; coder->offset = offset; coder->len = len; // Update the remaining amount of uncompressed data if uncompressed // size was known. if (coder->uncompressed_size != LZMA_VLI_UNKNOWN) { coder->uncompressed_size -= dict.pos - dict_start; // If we have gotten all the output but the decoder wants // to write more output, the file is corrupt. There are // three SEQ values where output is produced. if (coder->uncompressed_size == 0 && ret == LZMA_OK && (coder->sequence == SEQ_LITERAL_WRITE || coder->sequence == SEQ_SHORTREP || coder->sequence == SEQ_COPY)) ret = LZMA_DATA_ERROR; } if (ret == LZMA_STREAM_END) { // Reset the range decoder so that it is ready to reinitialize // for a new LZMA2 chunk. rc_reset(coder->rc); coder->sequence = SEQ_IS_MATCH; } return ret; } static void lzma_decoder_uncompressed(void *coder_ptr, lzma_vli uncompressed_size, bool allow_eopm) { lzma_lzma1_decoder *coder = coder_ptr; coder->uncompressed_size = uncompressed_size; coder->allow_eopm = allow_eopm; } static void lzma_decoder_reset(void *coder_ptr, const void *opt) { lzma_lzma1_decoder *coder = coder_ptr; const lzma_options_lzma *options = opt; // NOTE: We assume that lc/lp/pb are valid since they were // successfully decoded with lzma_lzma_decode_properties(). // Calculate pos_mask. We don't need pos_bits as is for anything. coder->pos_mask = (1U << options->pb) - 1; // Initialize the literal decoder. 
literal_init(coder->literal, options->lc, options->lp); coder->literal_context_bits = options->lc; coder->literal_mask = literal_mask_calc(options->lc, options->lp); // State coder->state = STATE_LIT_LIT; coder->rep0 = 0; coder->rep1 = 0; coder->rep2 = 0; coder->rep3 = 0; coder->pos_mask = (1U << options->pb) - 1; // Range decoder rc_reset(coder->rc); // Bit and bittree decoders for (uint32_t i = 0; i < STATES; ++i) { for (uint32_t j = 0; j <= coder->pos_mask; ++j) { bit_reset(coder->is_match[i][j]); bit_reset(coder->is_rep0_long[i][j]); } bit_reset(coder->is_rep[i]); bit_reset(coder->is_rep0[i]); bit_reset(coder->is_rep1[i]); bit_reset(coder->is_rep2[i]); } for (uint32_t i = 0; i < DIST_STATES; ++i) bittree_reset(coder->dist_slot[i], DIST_SLOT_BITS); for (uint32_t i = 0; i < FULL_DISTANCES - DIST_MODEL_END; ++i) bit_reset(coder->pos_special[i]); bittree_reset(coder->pos_align, ALIGN_BITS); // Len decoders (also bit/bittree) const uint32_t num_pos_states = 1U << options->pb; bit_reset(coder->match_len_decoder.choice); bit_reset(coder->match_len_decoder.choice2); bit_reset(coder->rep_len_decoder.choice); bit_reset(coder->rep_len_decoder.choice2); for (uint32_t pos_state = 0; pos_state < num_pos_states; ++pos_state) { bittree_reset(coder->match_len_decoder.low[pos_state], LEN_LOW_BITS); bittree_reset(coder->match_len_decoder.mid[pos_state], LEN_MID_BITS); bittree_reset(coder->rep_len_decoder.low[pos_state], LEN_LOW_BITS); bittree_reset(coder->rep_len_decoder.mid[pos_state], LEN_MID_BITS); } bittree_reset(coder->match_len_decoder.high, LEN_HIGH_BITS); bittree_reset(coder->rep_len_decoder.high, LEN_HIGH_BITS); coder->sequence = SEQ_IS_MATCH; coder->probs = NULL; coder->symbol = 0; coder->limit = 0; coder->offset = 0; coder->len = 0; return; } extern lzma_ret lzma_lzma_decoder_create(lzma_lz_decoder *lz, const lzma_allocator *allocator, const lzma_options_lzma *options, lzma_lz_options *lz_options) { if (lz->coder == NULL) { lz->coder = 
lzma_alloc(sizeof(lzma_lzma1_decoder), allocator); if (lz->coder == NULL) return LZMA_MEM_ERROR; lz->code = &lzma_decode; lz->reset = &lzma_decoder_reset; lz->set_uncompressed = &lzma_decoder_uncompressed; } // All dictionary sizes are OK here. LZ decoder will take care of // the special cases. lz_options->dict_size = options->dict_size; lz_options->preset_dict = options->preset_dict; lz_options->preset_dict_size = options->preset_dict_size; return LZMA_OK; } /// Allocate and initialize LZMA decoder. This is used only via LZ /// initialization (lzma_lzma_decoder_init() passes function pointer to /// the LZ initialization). static lzma_ret lzma_decoder_init(lzma_lz_decoder *lz, const lzma_allocator *allocator, lzma_vli id, const void *options, lzma_lz_options *lz_options) { if (!is_lclppb_valid(options)) return LZMA_PROG_ERROR; lzma_vli uncomp_size = LZMA_VLI_UNKNOWN; bool allow_eopm = true; if (id == LZMA_FILTER_LZMA1EXT) { const lzma_options_lzma *opt = options; // Only one flag is supported. if (opt->ext_flags & ~LZMA_LZMA1EXT_ALLOW_EOPM) return LZMA_OPTIONS_ERROR; // FIXME? Using lzma_vli instead of uint64_t is weird because // this has nothing to do with .xz headers and variable-length // integer encoding. On the other hand, using LZMA_VLI_UNKNOWN // instead of UINT64_MAX is clearer when unknown size is // meant. A problem with using lzma_vli is that now we // allow > LZMA_VLI_MAX which is fine in this file but // it's still confusing. Note that alone_decoder.c also // allows > LZMA_VLI_MAX when setting uncompressed size. 
uncomp_size = opt->ext_size_low + ((uint64_t)(opt->ext_size_high) << 32); allow_eopm = (opt->ext_flags & LZMA_LZMA1EXT_ALLOW_EOPM) != 0 || uncomp_size == LZMA_VLI_UNKNOWN; } return_if_error(lzma_lzma_decoder_create( lz, allocator, options, lz_options)); lzma_decoder_reset(lz->coder, options); lzma_decoder_uncompressed(lz->coder, uncomp_size, allow_eopm); return LZMA_OK; } extern lzma_ret lzma_lzma_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { // LZMA can only be the last filter in the chain. This is enforced // by the raw_decoder initialization. assert(filters[1].init == NULL); return lzma_lz_decoder_init(next, allocator, filters, &lzma_decoder_init); } extern bool lzma_lzma_lclppb_decode(lzma_options_lzma *options, uint8_t byte) { if (byte > (4 * 5 + 4) * 9 + 8) return true; // See the file format specification to understand this. options->pb = byte / (9 * 5); byte -= options->pb * 9 * 5; options->lp = byte / 9; options->lc = byte - options->lp * 9; return options->lc + options->lp > LZMA_LCLP_MAX; } extern uint64_t lzma_lzma_decoder_memusage_nocheck(const void *options) { const lzma_options_lzma *const opt = options; return sizeof(lzma_lzma1_decoder) + lzma_lz_decoder_memusage(opt->dict_size); } extern uint64_t lzma_lzma_decoder_memusage(const void *options) { if (!is_lclppb_valid(options)) return UINT64_MAX; return lzma_lzma_decoder_memusage_nocheck(options); } extern lzma_ret lzma_lzma_props_decode(void **options, const lzma_allocator *allocator, const uint8_t *props, size_t props_size) { if (props_size != 5) return LZMA_OPTIONS_ERROR; lzma_options_lzma *opt = lzma_alloc(sizeof(lzma_options_lzma), allocator); if (opt == NULL) return LZMA_MEM_ERROR; if (lzma_lzma_lclppb_decode(opt, props[0])) goto error; // All dictionary sizes are accepted, including zero. LZ decoder // will automatically use a dictionary at least a few KiB even if // a smaller dictionary is requested. 
opt->dict_size = read32le(props + 1); opt->preset_dict = NULL; opt->preset_dict_size = 0; *options = opt; return LZMA_OK; error: lzma_free(opt, allocator); return LZMA_OPTIONS_ERROR; } diff --git a/src/liblzma/simple/arm.c b/src/liblzma/simple/arm.c index 58acb2d11adf..f9d9c08b3c42 100644 --- a/src/liblzma/simple/arm.c +++ b/src/liblzma/simple/arm.c @@ -1,74 +1,76 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file arm.c /// \brief Filter for ARM binaries /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "simple_private.h" static size_t arm_code(void *simple lzma_attribute((__unused__)), uint32_t now_pos, bool is_encoder, uint8_t *buffer, size_t size) { + size &= ~(size_t)3; + size_t i; - for (i = 0; i + 4 <= size; i += 4) { + for (i = 0; i < size; i += 4) { if (buffer[i + 3] == 0xEB) { uint32_t src = ((uint32_t)(buffer[i + 2]) << 16) | ((uint32_t)(buffer[i + 1]) << 8) | (uint32_t)(buffer[i + 0]); src <<= 2; uint32_t dest; if (is_encoder) dest = now_pos + (uint32_t)(i) + 8 + src; else dest = src - (now_pos + (uint32_t)(i) + 8); dest >>= 2; buffer[i + 2] = (dest >> 16); buffer[i + 1] = (dest >> 8); buffer[i + 0] = dest; } } return i; } static lzma_ret arm_coder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, bool is_encoder) { return lzma_simple_coder_init(next, allocator, filters, &arm_code, 0, 4, 4, is_encoder); } #ifdef HAVE_ENCODER_ARM extern lzma_ret lzma_simple_arm_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return arm_coder_init(next, allocator, filters, true); } #endif #ifdef HAVE_DECODER_ARM extern lzma_ret lzma_simple_arm_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return arm_coder_init(next, allocator, filters, false); } #endif diff 
--git a/src/liblzma/simple/arm64.c b/src/liblzma/simple/arm64.c index 16c2f565f73d..2ec10d937fbd 100644 --- a/src/liblzma/simple/arm64.c +++ b/src/liblzma/simple/arm64.c @@ -1,136 +1,156 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file arm64.c /// \brief Filter for ARM64 binaries /// /// This converts ARM64 relative addresses in the BL and ADRP immediates /// to absolute values to increase redundancy of ARM64 code. /// /// Converting B or ADR instructions was also tested but it's not useful. /// A majority of the jumps for the B instruction are very small (+/- 0xFF). /// These are typical for loops and if-statements. Encoding them to their /// absolute address reduces redundancy since many of the small relative /// jump values are repeated, but very few of the absolute addresses are. // // Authors: Lasse Collin // Jia Tan // Igor Pavlov // /////////////////////////////////////////////////////////////////////////////// #include "simple_private.h" static size_t arm64_code(void *simple lzma_attribute((__unused__)), uint32_t now_pos, bool is_encoder, uint8_t *buffer, size_t size) { + size &= ~(size_t)3; + size_t i; // Clang 14.0.6 on x86-64 makes this four times bigger and 40 % slower // with auto-vectorization that is enabled by default with -O2. // Such vectorization bloat happens with -O2 when targeting ARM64 too // but performance hasn't been tested. #ifdef __clang__ # pragma clang loop vectorize(disable) #endif - for (i = 0; i + 4 <= size; i += 4) { + for (i = 0; i < size; i += 4) { uint32_t pc = (uint32_t)(now_pos + i); uint32_t instr = read32le(buffer + i); if ((instr >> 26) == 0x25) { // BL instruction: // The full 26-bit immediate is converted. // The range is +/-128 MiB. // // Using the full range helps quite a lot with // big executables. 
Smaller range would reduce false // positives in non-code sections of the input though // so this is a compromise that slightly favors big // files. With the full range, only six bits of the 32 // need to match to trigger a conversion. const uint32_t src = instr; instr = 0x94000000; pc >>= 2; if (!is_encoder) pc = 0U - pc; instr |= (src + pc) & 0x03FFFFFF; write32le(buffer + i, instr); } else if ((instr & 0x9F000000) == 0x90000000) { // ADRP instruction: // Only values in the range +/-512 MiB are converted. // // Using less than the full +/-4 GiB range reduces // false positives on non-code sections of the input // while being excellent for executables up to 512 MiB. // The positive effect of ADRP conversion is smaller // than that of BL but it also doesn't hurt so much in // non-code sections of input because, with +/-512 MiB // range, nine bits of 32 need to match to trigger a // conversion (two 10-bit match choices = 9 bits). const uint32_t src = ((instr >> 29) & 3) | ((instr >> 3) & 0x001FFFFC); // With the addition only one branch is needed to // check the +/- range. This is usually false when // processing ARM64 code so branch prediction will // handle it well in terms of performance. 
// //if ((src & 0x001E0000) != 0 // && (src & 0x001E0000) != 0x001E0000) if ((src + 0x00020000) & 0x001C0000) continue; instr &= 0x9000001F; pc >>= 12; if (!is_encoder) pc = 0U - pc; const uint32_t dest = src + pc; instr |= (dest & 3) << 29; instr |= (dest & 0x0003FFFC) << 3; instr |= (0U - (dest & 0x00020000)) & 0x00E00000; write32le(buffer + i, instr); } } return i; } static lzma_ret arm64_coder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, bool is_encoder) { return lzma_simple_coder_init(next, allocator, filters, &arm64_code, 0, 4, 4, is_encoder); } #ifdef HAVE_ENCODER_ARM64 extern lzma_ret lzma_simple_arm64_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return arm64_coder_init(next, allocator, filters, true); } + + +extern LZMA_API(size_t) +lzma_bcj_arm64_encode(uint32_t start_offset, uint8_t *buf, size_t size) +{ + // start_offset must be a multiple of four. + start_offset &= ~UINT32_C(3); + return arm64_code(NULL, start_offset, true, buf, size); +} #endif #ifdef HAVE_DECODER_ARM64 extern lzma_ret lzma_simple_arm64_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return arm64_coder_init(next, allocator, filters, false); } + + +extern LZMA_API(size_t) +lzma_bcj_arm64_decode(uint32_t start_offset, uint8_t *buf, size_t size) +{ + // start_offset must be a multiple of four. 
+ start_offset &= ~UINT32_C(3); + return arm64_code(NULL, start_offset, false, buf, size); +} #endif diff --git a/src/liblzma/simple/armthumb.c b/src/liblzma/simple/armthumb.c index f1eeca9b80f1..368b51c7fea9 100644 --- a/src/liblzma/simple/armthumb.c +++ b/src/liblzma/simple/armthumb.c @@ -1,79 +1,84 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file armthumb.c /// \brief Filter for ARM-Thumb binaries /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "simple_private.h" static size_t armthumb_code(void *simple lzma_attribute((__unused__)), uint32_t now_pos, bool is_encoder, uint8_t *buffer, size_t size) { + if (size < 4) + return 0; + + size -= 4; + size_t i; - for (i = 0; i + 4 <= size; i += 2) { + for (i = 0; i <= size; i += 2) { if ((buffer[i + 1] & 0xF8) == 0xF0 && (buffer[i + 3] & 0xF8) == 0xF8) { uint32_t src = (((uint32_t)(buffer[i + 1]) & 7) << 19) | ((uint32_t)(buffer[i + 0]) << 11) | (((uint32_t)(buffer[i + 3]) & 7) << 8) | (uint32_t)(buffer[i + 2]); src <<= 1; uint32_t dest; if (is_encoder) dest = now_pos + (uint32_t)(i) + 4 + src; else dest = src - (now_pos + (uint32_t)(i) + 4); dest >>= 1; buffer[i + 1] = 0xF0 | ((dest >> 19) & 0x7); buffer[i + 0] = (dest >> 11); buffer[i + 3] = 0xF8 | ((dest >> 8) & 0x7); buffer[i + 2] = (dest); i += 2; } } return i; } static lzma_ret armthumb_coder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, bool is_encoder) { return lzma_simple_coder_init(next, allocator, filters, &armthumb_code, 0, 4, 2, is_encoder); } #ifdef HAVE_ENCODER_ARMTHUMB extern lzma_ret lzma_simple_armthumb_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return armthumb_coder_init(next, allocator, filters, true); } #endif #ifdef HAVE_DECODER_ARMTHUMB extern lzma_ret 
lzma_simple_armthumb_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return armthumb_coder_init(next, allocator, filters, false); } #endif diff --git a/src/liblzma/simple/ia64.c b/src/liblzma/simple/ia64.c index 502501409977..2a4aaebb4720 100644 --- a/src/liblzma/simple/ia64.c +++ b/src/liblzma/simple/ia64.c @@ -1,115 +1,117 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file ia64.c /// \brief Filter for IA64 (Itanium) binaries /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "simple_private.h" static size_t ia64_code(void *simple lzma_attribute((__unused__)), uint32_t now_pos, bool is_encoder, uint8_t *buffer, size_t size) { static const uint32_t BRANCH_TABLE[32] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 6, 6, 0, 0, 7, 7, 4, 4, 0, 0, 4, 4, 0, 0 }; + size &= ~(size_t)15; + size_t i; - for (i = 0; i + 16 <= size; i += 16) { + for (i = 0; i < size; i += 16) { const uint32_t instr_template = buffer[i] & 0x1F; const uint32_t mask = BRANCH_TABLE[instr_template]; uint32_t bit_pos = 5; for (size_t slot = 0; slot < 3; ++slot, bit_pos += 41) { if (((mask >> slot) & 1) == 0) continue; const size_t byte_pos = (bit_pos >> 3); const uint32_t bit_res = bit_pos & 0x7; uint64_t instruction = 0; for (size_t j = 0; j < 6; ++j) instruction += (uint64_t)( buffer[i + j + byte_pos]) << (8 * j); uint64_t inst_norm = instruction >> bit_res; if (((inst_norm >> 37) & 0xF) == 0x5 && ((inst_norm >> 9) & 0x7) == 0 /* && (inst_norm & 0x3F)== 0 */ ) { uint32_t src = (uint32_t)( (inst_norm >> 13) & 0xFFFFF); src |= ((inst_norm >> 36) & 1) << 20; src <<= 4; uint32_t dest; if (is_encoder) dest = now_pos + (uint32_t)(i) + src; else dest = src - (now_pos + (uint32_t)(i)); dest >>= 4; inst_norm &= ~((uint64_t)(0x8FFFFF) << 13); inst_norm |= (uint64_t)(dest & 0xFFFFF) << 
13; inst_norm |= (uint64_t)(dest & 0x100000) << (36 - 20); instruction &= (1U << bit_res) - 1; instruction |= (inst_norm << bit_res); for (size_t j = 0; j < 6; j++) buffer[i + j + byte_pos] = (uint8_t)( instruction >> (8 * j)); } } } return i; } static lzma_ret ia64_coder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, bool is_encoder) { return lzma_simple_coder_init(next, allocator, filters, &ia64_code, 0, 16, 16, is_encoder); } #ifdef HAVE_ENCODER_IA64 extern lzma_ret lzma_simple_ia64_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return ia64_coder_init(next, allocator, filters, true); } #endif #ifdef HAVE_DECODER_IA64 extern lzma_ret lzma_simple_ia64_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return ia64_coder_init(next, allocator, filters, false); } #endif diff --git a/src/liblzma/simple/powerpc.c b/src/liblzma/simple/powerpc.c index ba6cfbef3ab6..ea47d14d4c3f 100644 --- a/src/liblzma/simple/powerpc.c +++ b/src/liblzma/simple/powerpc.c @@ -1,79 +1,81 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file powerpc.c /// \brief Filter for PowerPC (big endian) binaries /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "simple_private.h" static size_t powerpc_code(void *simple lzma_attribute((__unused__)), uint32_t now_pos, bool is_encoder, uint8_t *buffer, size_t size) { + size &= ~(size_t)3; + size_t i; - for (i = 0; i + 4 <= size; i += 4) { + for (i = 0; i < size; i += 4) { // PowerPC branch 6(48) 24(Offset) 1(Abs) 1(Link) if ((buffer[i] >> 2) == 0x12 && ((buffer[i + 3] & 3) == 1)) { const uint32_t src = (((uint32_t)(buffer[i + 0]) & 3) << 24) | ((uint32_t)(buffer[i + 1]) << 16) | ((uint32_t)(buffer[i + 2]) << 8) | ((uint32_t)(buffer[i + 3]) & 
~UINT32_C(3)); uint32_t dest; if (is_encoder) dest = now_pos + (uint32_t)(i) + src; else dest = src - (now_pos + (uint32_t)(i)); buffer[i + 0] = 0x48 | ((dest >> 24) & 0x03); buffer[i + 1] = (dest >> 16); buffer[i + 2] = (dest >> 8); buffer[i + 3] &= 0x03; buffer[i + 3] |= dest; } } return i; } static lzma_ret powerpc_coder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, bool is_encoder) { return lzma_simple_coder_init(next, allocator, filters, &powerpc_code, 0, 4, 4, is_encoder); } #ifdef HAVE_ENCODER_POWERPC extern lzma_ret lzma_simple_powerpc_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return powerpc_coder_init(next, allocator, filters, true); } #endif #ifdef HAVE_DECODER_POWERPC extern lzma_ret lzma_simple_powerpc_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return powerpc_coder_init(next, allocator, filters, false); } #endif diff --git a/src/liblzma/simple/riscv.c b/src/liblzma/simple/riscv.c index b18df8b637d0..bc97ebdbb0fb 100644 --- a/src/liblzma/simple/riscv.c +++ b/src/liblzma/simple/riscv.c @@ -1,755 +1,773 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file riscv.c /// \brief Filter for 32-bit/64-bit little/big endian RISC-V binaries /// /// This converts program counter relative addresses in function calls /// (JAL, AUIPC+JALR), address calculation of functions and global /// variables (AUIPC+ADDI), loads (AUIPC+load), and stores (AUIPC+store). /// /// For AUIPC+inst2 pairs, the paired instruction checking is fairly relaxed. /// The paired instruction opcode must only have its lowest two bits set, /// meaning it will convert any paired instruction that is not a 16-bit /// compressed instruction. This was shown to be enough to keep the number /// of false matches low while improving code size and speed. 
// // Authors: Lasse Collin // Jia Tan // // Special thanks: // // - Chien Wong provided a few early versions of RISC-V // filter variants along with test files and benchmark results. // // - Igor Pavlov helped a lot in the filter design, getting it both // faster and smaller. The implementation here is still independently // written, not based on LZMA SDK. // /////////////////////////////////////////////////////////////////////////////// /* RISC-V filtering ================ RV32I and RV64I, possibly combined with extensions C, Zfh, F, D, and Q, are identical enough that the same filter works for both. The instruction encoding is always little endian, even on systems with big endian data access. Thus the same filter works for both endiannesses. The following instructions have program counter relative (pc-relative) behavior: JAL --- JAL is used for function calls (including tail calls) and unconditional jumps within functions. Jumps within functions aren't useful to filter because the absolute addresses often appear only once or at most a few times. Tail calls and jumps within functions look the same to a simple filter so neither are filtered, that is, JAL x0 is ignored (the ABI name of the register x0 is "zero"). Almost all calls store the return address to register x1 (ra) or x5 (t0). To reduce false matches when the filter is applied to non-code data, only the JAL instructions that use x1 or x5 are converted. JAL has pc-relative range of +/-1 MiB so longer calls and jumps need another method (AUIPC+JALR). C.J and C.JAL ------------- C.J and C.JAL have pc-relative range of +/-2 KiB. C.J is for tail calls and jumps within functions and isn't filtered for the reasons mentioned for JAL x0. C.JAL is an RV32C-only instruction. Its encoding overlaps with RV64C-only C.ADDIW which is a common instruction. So if filtering C.JAL was useful (it wasn't tested) then a separate filter would be needed for RV32 and RV64. 
Also, false positives would be a significant problem when the filter is applied to non-code data because C.JAL needs only five bits to match. Thus, this filter doesn't modify C.JAL instructions. BEQ, BNE, BLT, BGE, BLTU, BGEU, C.BEQZ, and C.BNEZ -------------------------------------------------- These are conditional branches with pc-relative range of +/-4 KiB (+/-256 B for C.*). The absolute addresses often appear only once and very short distances are the most common, so filtering these instructions would make compression worse. AUIPC with rd != x0 ------------------- AUIPC is paired with a second instruction (inst2) to do pc-relative jumps, calls, loads, stores, and for taking an address of a symbol. AUIPC has a 20-bit immediate and the possible inst2 choices have a 12-bit immediate. AUIPC stores pc + 20-bit signed immediate to a register. The immediate encodes a multiple of 4 KiB so AUIPC itself has a pc-relative range of +/-2 GiB. AUIPC does *NOT* set the lowest 12 bits of the result to zero! This means that the 12-bit immediate in inst2 cannot just include the lowest 12 bits of the absolute address as is; the immediate has to compensate for the lowest 12 bits that AUIPC copies from the program counter. This means that a good filter has to convert not only AUIPC but also the paired inst2. A strict filter would focus on filtering the following AUIPC+inst2 pairs: - AUIPC+JALR: Function calls, including tail calls. - AUIPC+ADDI: Calculating the address of a function or a global variable. - AUIPC+load/store from the base instruction sets (RV32I, RV64I) or from the floating point extensions Zfh, F, D, and Q: * RV32I: LB, LH, LW, LBU, LHU, SB, SH, SW * RV64I has also: LD, LWU, SD * Zfh: FLH, FSH * F: FLW, FSW * D: FLD, FSD * Q: FLQ, FSQ NOTE: AUIPC+inst2 can only be a pair if AUIPC's rd specifies the same register as inst2's rs1. 
Instead of strictly accepting only the above instructions as inst2, this filter uses a much simpler condition: the lowest two bits of inst2 must be set, that is, inst2 must not be a 16-bit compressed instruction. So this will accept all 32-bit and possible future extended instructions as a pair to AUIPC if the bits in AUIPC's rd [11:7] match the bits [19:15] in inst2 (the bits that I-type and S-type instructions use for rs1). Testing showed that this relaxed condition for inst2 did not consistently or significantly affect compression ratio but it reduced code size and improved speed. Additionally, the paired instruction is always treated as an I-type instruction. The S-type instructions used by stores (SB, SH, SW, etc.) place the lowest 5 bits of the immediate in a different location than I-type instructions. AUIPC+store pairs are less common than other pairs, and testing showed that the extra code required to handle S-type instructions was not worth the compression ratio gained. AUIPC+inst2 don't necessarily appear sequentially next to each other although very often they do. Especially AUIPC+JALR are sequential as that may allow instruction fusion in processors (and perhaps help branch prediction as a fused AUIPC+JALR is a direct branch while JALR alone is an indirect branch). Clang 16 can generate code where AUIPC+inst2 is split: - AUIPC is outside a loop and inst2 (load/store) is inside the loop. This way the AUIPC instruction needs to be executed only once. - Load-modify-store may have AUIPC for the load and the same AUIPC-result is used for the store too. This may get combined with AUIPC being outside the loop. - AUIPC is before a conditional branch and inst2 is hundreds of bytes away at the branch target. 
- Inner and outer pair: auipc a1,0x2f auipc a2,0x3d ld a2,-500(a2) addi a1,a1,-233 - Many split pairs with an untaken conditional branch between: auipc s9,0x1613 # Pair 1 auipc s4,0x1613 # Pair 2 auipc s6,0x1613 # Pair 3 auipc s10,0x1613 # Pair 4 beqz a5,a3baae ld a0,0(a6) ld a6,246(s9) # Pair 1 ld a1,250(s4) # Pair 2 ld a3,254(s6) # Pair 3 ld a4,258(s10) # Pair 4 It's not possible to find all split pairs in a filter like this. At least in 2024, simple sequential pairs are 99 % of AUIPC uses so filtering only such pairs gives good results and makes the filter simpler. However, it's possible that future compilers will produce different code where sequential pairs aren't as common. This filter doesn't convert AUIPC instructions alone because: (1) The conversion would be off-by-one (or off-by-4096) half the time because the lowest 12 bits from inst2 (inst2_imm12) aren't known. We only know that the absolute address is pc + AUIPC_imm20 + [-2048, +2047] but there is no way to know the exact 4096-byte multiple (or 4096 * n + 2048): there are always two possibilities because AUIPC copies the 12 lowest bits from pc instead of zeroing them. NOTE: The sign-extension of inst2_imm12 adds a tiny bit of extra complexity to AUIPC math in general but it's not the reason for this problem. The sign-extension only changes the relative position of the pc-relative 4096-byte window. (2) Matching AUIPC instruction alone requires only seven bits. When the filter is applied to non-code data, that leads to many false positives which make compression worse. As long as most AUIPC+inst2 pairs appear as two consecutive instructions, converting only such pairs gives better results. In assembly, AUIPC+inst2 tend to look like this: # Call: auipc ra, 0x12345 jalr ra, -42(ra) # Tail call: auipc t1, 0x12345 jalr zero, -42(t1) # Getting the absolute address: auipc a0, 0x12345 addi a0, a0, -42 # rd of inst2 isn't necessarily the same as rs1 even # in cases where there is no reason to preserve rs1. 
auipc a0, 0x12345 addi a1, a0, -42 As of 2024, 16-bit instructions from the C extension don't appear as inst2. The RISC-V psABI doesn't list AUIPC+C.* as a linker relaxation type explicitly but it's not disallowed either. Usefulness is limited as most of the time the lowest 12 bits won't fit in a C instruction. This filter doesn't support AUIPC+C.* combinations because this makes the filter simpler, there are no test files, and it hopefully will never be needed anyway. (Compare AUIPC to ARM64 where ADRP does set the lowest 12 bits to zero. The paired instruction has the lowest 12 bits of the absolute address as is in a zero-extended immediate. Thus the ARM64 filter doesn't need to care about the instructions that are paired with ADRP. An off-by-4096 issue can still occur if the code section isn't aligned with the filter's start offset. It's not a problem with standalone ELF files but Windows PE files need start_offset=3072 for best results. Also, a .tar stores files with 512-byte alignment so most of the time it won't be the best for ARM64.) AUIPC with rd == x0 ------------------- AUIPC instructions with rd=x0 are reserved for HINTs in the base instruction set. Such AUIPC instructions are never filtered. As of January 2024, it seems likely that AUIPC with rd=x0 will be used for landing pads (pseudoinstruction LPAD). LPAD is used to mark valid targets for indirect jumps (for JALR), for example, beginnings of functions. The 20-bit immediate in LPAD instruction is a label, not a pc-relative address. Thus it would be counterproductive to convert AUIPC instructions with rd=x0. Often the next instruction after LPAD won't have rs1=x0 and thus the filtering would be skipped for that reason alone. However, it's not good to rely on this. For example, consider a function that begins like this: int foo(int i) { if (i <= 234) { ... 
} A compiler may generate something like this: lpad 0x54321 li a5, 234 bgt a0, a5, .L2 Converting the pseudoinstructions to raw instructions: auipc x0, 0x54321 addi x15, x0, 234 blt x15, x10, .L2 In this case the filter would undesirably convert the AUIPC+ADDI pair if the filter didn't explicitly skip AUIPC instructions that have rd=x0. */ #include "simple_private.h" // This checks two conditions at once: // - AUIPC rd == inst2 rs1. // - inst2 opcode has the lowest two bits set. // // The 8 bit left shift aligns the rd of AUIPC with the rs1 of inst2. // By XORing the registers, any non-zero value in those bits indicates the // registers are not equal and thus not an AUIPC pair. Subtracting 3 from // inst2 will zero out the first two opcode bits only when they are set. // The mask tests if any of the register or opcode bits are set (and thus // not an AUIPC pair). // // Alternative expression: (((((auipc) << 8) ^ (inst2)) & 0xF8003) != 3) #define NOT_AUIPC_PAIR(auipc, inst2) \ ((((auipc) << 8) ^ ((inst2) - 3)) & 0xF8003) // This macro checks multiple conditions: // (1) AUIPC rd [11:7] == x2 (special rd value). // (2) AUIPC bits 12 and 13 set (the lowest two opcode bits of packed inst2). // (3) inst2_rs1 doesn't equal x0 or x2 because the opposite // conversion is only done when // auipc_rd != x0 && // auipc_rd != x2 && // auipc_rd == inst2_rs1. // // The left-hand side takes care of (1) and (2). // (a) The lowest 7 bits are already known to be AUIPC so subtracting 0x17 // makes those bits zeros. // (b) If AUIPC rd equals x2, subtracting 0x100 makes bits [11:7] zeros. // If rd doesn't equal x2, then there will be at least one non-zero bit // and the next step (c) is irrelevant. // (c) If the lowest two opcode bits of the packed inst2 are set in [13:12], // then subtracting 0x3000 will make those bits zeros. Otherwise there // will be at least one non-zero bit. 
// // The shift by 18 removes the high bits from the final '>=' comparison and // ensures that any non-zero result will be larger than any possible result // from the right-hand side of the comparison. The cast ensures that the // left-hand side didn't get promoted to a larger type than uint32_t. // // On the right-hand side, inst2_rs1 & 0x1D will be non-zero as long as // inst2_rs1 is not x0 or x2. // // The final '>=' comparison will make the expression true if: // - The subtraction caused any bits to be set (special AUIPC rd value not // used or inst2 opcode bits not set). (non-zero >= non-zero or 0) // - The subtraction did not cause any bits to be set but inst2_rs1 was // x0 or x2. (0 >= 0) #define NOT_SPECIAL_AUIPC(auipc, inst2_rs1) \ ((uint32_t)(((auipc) - 0x3117) << 18) >= ((inst2_rs1) & 0x1D)) // The encode and decode functions are split for this filter because of the // AUIPC+inst2 filtering. This filter design allows a decoder-only // implementation to be smaller than alternative designs. #ifdef HAVE_ENCODER_RISCV static size_t riscv_encode(void *simple lzma_attribute((__unused__)), uint32_t now_pos, bool is_encoder lzma_attribute((__unused__)), uint8_t *buffer, size_t size) { // Avoid using i + 8 <= size in the loop condition. // // NOTE: If there is a JAL in the last six bytes of the stream, it // won't be converted. This is intentional to keep the code simpler. if (size < 8) return 0; size -= 8; size_t i; // The loop is advanced by 2 bytes every iteration since the // instruction stream may include 16-bit instructions (C extension). for (i = 0; i <= size; i += 2) { uint32_t inst = buffer[i]; if (inst == 0xEF) { // JAL const uint32_t b1 = buffer[i + 1]; // Only filter rd=x1(ra) and rd=x5(t0). if ((b1 & 0x0D) != 0) continue; // The 20-bit immediate is in four pieces. // The encoder stores it in big endian form // since it improves compression slightly. 
const uint32_t b2 = buffer[i + 2]; const uint32_t b3 = buffer[i + 3]; const uint32_t pc = now_pos + (uint32_t)i; // The following chart shows the highest three bytes of JAL, focusing on // the 20-bit immediate field [31:12]. The first row of numbers is the // bit position in a 32-bit little endian instruction. The second row of // numbers shows the order of the immediate field in a J-type instruction. // The last row is the bit number in each byte. // // To determine the amount to shift each bit, subtract the value in // the last row from the value in the second last row. If the number // is positive, shift left. If negative, shift right. // // For example, at the rightmost side of the chart, the bit 4 in b1 is // the bit 12 of the address. Thus that bit needs to be shifted left // by 12 - 4 = 8 bits to put it in the right place in the addr variable. // // NOTE: The immediate of a J-type instruction holds bits [20:1] of // the address. The bit [0] is always 0 and not part of the immediate. // // | b3 | b2 | b1 | // | 31 30 29 28 27 26 25 24 | 23 22 21 20 19 18 17 16 | 15 14 13 12 x x x x | // | 20 10 9 8 7 6 5 4 | 3 2 1 11 19 18 17 16 | 15 14 13 12 x x x x | // | 7 6 5 4 3 2 1 0 | 7 6 5 4 3 2 1 0 | 7 6 5 4 x x x x | uint32_t addr = ((b1 & 0xF0) << 8) | ((b2 & 0x0F) << 16) | ((b2 & 0x10) << 7) | ((b2 & 0xE0) >> 4) | ((b3 & 0x7F) << 4) | ((b3 & 0x80) << 13); addr += pc; buffer[i + 1] = (uint8_t)((b1 & 0x0F) | ((addr >> 13) & 0xF0)); buffer[i + 2] = (uint8_t)(addr >> 9); buffer[i + 3] = (uint8_t)(addr >> 1); // The "-2" is included because the for-loop will // always increment by 2. In this case, we want to // skip an extra 2 bytes since we used 4 bytes // of input. i += 4 - 2; } else if ((inst & 0x7F) == 0x17) { // AUIPC inst |= (uint32_t)buffer[i + 1] << 8; inst |= (uint32_t)buffer[i + 2] << 16; inst |= (uint32_t)buffer[i + 3] << 24; // Branch based on AUIPC's rd. 
The bitmask test does // the same thing as this: // // const uint32_t auipc_rd = (inst >> 7) & 0x1F; // if (auipc_rd != 0 && auipc_rd != 2) { if (inst & 0xE80) { // AUIPC's rd doesn't equal x0 or x2. // Check if AUIPC+inst2 are a pair. uint32_t inst2 = read32le(buffer + i + 4); if (NOT_AUIPC_PAIR(inst, inst2)) { // The NOT_AUIPC_PAIR macro allows // a false AUIPC+AUIPC pair if the // bits [19:15] (where rs1 would be) // in the second AUIPC match the rd // of the first AUIPC. // // We must skip enough forward so // that the first two bytes of the // second AUIPC cannot get converted. // Such a conversion could make the // current pair become a valid pair // which would desync the decoder. // // Skipping six bytes is enough even // though the above condition looks // at the lowest four bits of the // buffer[i + 6] too. This is safe // because this filter never changes // those bits if a conversion at // that position is done. i += 6 - 2; continue; } // Convert AUIPC+inst2 to a special format: // // - The lowest 7 bits [6:0] retain the // AUIPC opcode. // // - The rd [11:7] is set to x2(sp). x2 is // used as the stack pointer so AUIPC with // rd=x2 should be very rare in real-world // executables. // // - The remaining 20 bits [31:12] (that // normally hold the pc-relative immediate) // are used to store the lowest 20 bits of // inst2. That is, the 12-bit immediate of // inst2 is not included. // // - The location of the original inst2 is // used to store the 32-bit absolute // address in big endian format. Compared // to the 20+12-bit split encoding, this // results in a longer uninterrupted // sequence of identical common bytes // when the same address is referred // with different instruction pairs // (like AUIPC+LD vs. AUIPC+ADDI) or // when the occurrences of the same // pair use different registers. 
When // referring to adjacent memory locations // (like function calls that go via the // ELF PLT), in big endian order only the // last 1-2 bytes differ; in little endian // the differing 1-2 bytes would be in the // middle of the 8-byte sequence. // // When reversing the transformation, the // original rd of AUIPC can be restored // from inst2's rs1 as they are required to // be the same. // Arithmetic right shift makes sign extension // trivial but (1) it's implementation-defined // behavior (C99/C11/C23 6.5.7-p5) and so is // (2) casting unsigned to signed (6.3.1.3-p3). // // One can check for (1) with // // if ((-1 >> 1) == -1) ... // // but (2) has to be checked from the // compiler docs. GCC promises that (1) // and (2) behave in the common expected // way and thus // // addr += (uint32_t)( // (int32_t)inst2 >> 20); // // does the same as the code below. But since // the 100 % portable way is only a few bytes // bigger code and there is no real speed // difference, let's just use that, especially // since the decoder doesn't need this at all. uint32_t addr = inst & 0xFFFFF000; addr += (inst2 >> 20) - ((inst2 >> 19) & 0x1000); addr += now_pos + (uint32_t)i; // Construct the first 32 bits: // [6:0] AUIPC opcode // [11:7] Special AUIPC rd = x2 // [31:12] The lowest 20 bits of inst2 inst = 0x17 | (2 << 7) | (inst2 << 12); write32le(buffer + i, inst); // The second 32 bits store the absolute // address in big endian order. write32be(buffer + i + 4, addr); } else { // AUIPC's rd equals x0 or x2. // // x0 indicates a landing pad (LPAD). // It's always skipped. // // AUIPC with rd == x2 is used for the special // format as explained above. When the input // contains a byte sequence that matches the // special format, "fake" decoding must be // done to keep the filter bijective (that // is, safe to apply on arbitrary data). // // See the "x0 or x2" section in riscv_decode() // for how the "real" decoding is done. 
The // "fake" decoding is a simplified version // of "real" decoding with the following // differences (these reduce code size of // the decoder): // (1) The lowest 12 bits aren't sign-extended. // (2) No address conversion is done. // (3) Big endian format isn't used (the fake // address is in little endian order). // Check if inst matches the special format. const uint32_t fake_rs1 = inst >> 27; if (NOT_SPECIAL_AUIPC(inst, fake_rs1)) { i += 4 - 2; continue; } const uint32_t fake_addr = read32le(buffer + i + 4); // Construct the second 32 bits: // [19:0] Upper 20 bits from AUIPC // [31:20] The lowest 12 bits of fake_addr const uint32_t fake_inst2 = (inst >> 12) | (fake_addr << 20); // Construct new first 32 bits from: // [6:0] AUIPC opcode // [11:7] Fake AUIPC rd = fake_rs1 // [31:12] The highest 20 bits of fake_addr inst = 0x17 | (fake_rs1 << 7) | (fake_addr & 0xFFFFF000); write32le(buffer + i, inst); write32le(buffer + i + 4, fake_inst2); } i += 8 - 2; } } return i; } extern lzma_ret lzma_simple_riscv_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return lzma_simple_coder_init(next, allocator, filters, &riscv_encode, 0, 8, 2, true); } + + +extern LZMA_API(size_t) +lzma_bcj_riscv_encode(uint32_t start_offset, uint8_t *buf, size_t size) +{ + // start_offset must be a multiple of two. + start_offset &= ~UINT32_C(1); + return riscv_encode(NULL, start_offset, true, buf, size); +} #endif #ifdef HAVE_DECODER_RISCV static size_t riscv_decode(void *simple lzma_attribute((__unused__)), uint32_t now_pos, bool is_encoder lzma_attribute((__unused__)), uint8_t *buffer, size_t size) { if (size < 8) return 0; size -= 8; size_t i; for (i = 0; i <= size; i += 2) { uint32_t inst = buffer[i]; if (inst == 0xEF) { // JAL const uint32_t b1 = buffer[i + 1]; // Only filter rd=x1(ra) and rd=x5(t0). 
if ((b1 & 0x0D) != 0) continue; const uint32_t b2 = buffer[i + 2]; const uint32_t b3 = buffer[i + 3]; const uint32_t pc = now_pos + (uint32_t)i; // | b3 | b2 | b1 | // | 31 30 29 28 27 26 25 24 | 23 22 21 20 19 18 17 16 | 15 14 13 12 x x x x | // | 20 10 9 8 7 6 5 4 | 3 2 1 11 19 18 17 16 | 15 14 13 12 x x x x | // | 7 6 5 4 3 2 1 0 | 7 6 5 4 3 2 1 0 | 7 6 5 4 x x x x | uint32_t addr = ((b1 & 0xF0) << 13) | (b2 << 9) | (b3 << 1); addr -= pc; buffer[i + 1] = (uint8_t)((b1 & 0x0F) | ((addr >> 8) & 0xF0)); buffer[i + 2] = (uint8_t)(((addr >> 16) & 0x0F) | ((addr >> 7) & 0x10) | ((addr << 4) & 0xE0)); buffer[i + 3] = (uint8_t)(((addr >> 4) & 0x7F) | ((addr >> 13) & 0x80)); i += 4 - 2; } else if ((inst & 0x7F) == 0x17) { // AUIPC uint32_t inst2; inst |= (uint32_t)buffer[i + 1] << 8; inst |= (uint32_t)buffer[i + 2] << 16; inst |= (uint32_t)buffer[i + 3] << 24; if (inst & 0xE80) { // AUIPC's rd doesn't equal x0 or x2. // Check if it is a "fake" AUIPC+inst2 pair. inst2 = read32le(buffer + i + 4); if (NOT_AUIPC_PAIR(inst, inst2)) { i += 6 - 2; continue; } // Decode (or more like re-encode) the "fake" // pair. The "fake" format doesn't do // sign-extension, address conversion, or // use big endian. (The use of little endian // allows sharing the write32le() calls in // the decoder to reduce code size when // unaligned access isn't supported.) uint32_t addr = inst & 0xFFFFF000; addr += inst2 >> 20; inst = 0x17 | (2 << 7) | (inst2 << 12); inst2 = addr; } else { // AUIPC's rd equals x0 or x2. // Check if inst matches the special format // used by the encoder. const uint32_t inst2_rs1 = inst >> 27; if (NOT_SPECIAL_AUIPC(inst, inst2_rs1)) { i += 4 - 2; continue; } // Decode the "real" pair. uint32_t addr = read32be(buffer + i + 4); addr -= now_pos + (uint32_t)i; // The second instruction: // - Get the lowest 20 bits from inst. // - Add the lowest 12 bits of the address // as the immediate field. inst2 = (inst >> 12) | (addr << 20); // AUIPC: // - rd is the same as inst2_rs1. 
// - The sign extension of the lowest 12 bits // must be taken into account. inst = 0x17 | (inst2_rs1 << 7) | ((addr + 0x800) & 0xFFFFF000); } // Both decoder branches write in little endian order. write32le(buffer + i, inst); write32le(buffer + i + 4, inst2); i += 8 - 2; } } return i; } extern lzma_ret lzma_simple_riscv_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return lzma_simple_coder_init(next, allocator, filters, &riscv_decode, 0, 8, 2, false); } + + +extern LZMA_API(size_t) +lzma_bcj_riscv_decode(uint32_t start_offset, uint8_t *buf, size_t size) +{ + // start_offset must be a multiple of two. + start_offset &= ~UINT32_C(1); + return riscv_decode(NULL, start_offset, false, buf, size); +} #endif diff --git a/src/liblzma/simple/sparc.c b/src/liblzma/simple/sparc.c index e8ad285a1927..1fa4850458e8 100644 --- a/src/liblzma/simple/sparc.c +++ b/src/liblzma/simple/sparc.c @@ -1,86 +1,87 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file sparc.c /// \brief Filter for SPARC binaries /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "simple_private.h" static size_t sparc_code(void *simple lzma_attribute((__unused__)), uint32_t now_pos, bool is_encoder, uint8_t *buffer, size_t size) { - size_t i; - for (i = 0; i + 4 <= size; i += 4) { + size &= ~(size_t)3; + size_t i; + for (i = 0; i < size; i += 4) { if ((buffer[i] == 0x40 && (buffer[i + 1] & 0xC0) == 0x00) || (buffer[i] == 0x7F && (buffer[i + 1] & 0xC0) == 0xC0)) { uint32_t src = ((uint32_t)buffer[i + 0] << 24) | ((uint32_t)buffer[i + 1] << 16) | ((uint32_t)buffer[i + 2] << 8) | ((uint32_t)buffer[i + 3]); src <<= 2; uint32_t dest; if (is_encoder) dest = now_pos + (uint32_t)(i) + src; else dest = src - (now_pos + (uint32_t)(i)); dest >>= 2; dest = (((0 - ((dest >> 22) & 1)) << 22) & 0x3FFFFFFF) | 
(dest & 0x3FFFFF) | 0x40000000; buffer[i + 0] = (uint8_t)(dest >> 24); buffer[i + 1] = (uint8_t)(dest >> 16); buffer[i + 2] = (uint8_t)(dest >> 8); buffer[i + 3] = (uint8_t)(dest); } } return i; } static lzma_ret sparc_coder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, bool is_encoder) { return lzma_simple_coder_init(next, allocator, filters, &sparc_code, 0, 4, 4, is_encoder); } #ifdef HAVE_ENCODER_SPARC extern lzma_ret lzma_simple_sparc_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return sparc_coder_init(next, allocator, filters, true); } #endif #ifdef HAVE_DECODER_SPARC extern lzma_ret lzma_simple_sparc_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return sparc_coder_init(next, allocator, filters, false); } #endif diff --git a/src/liblzma/simple/x86.c b/src/liblzma/simple/x86.c index f216231f2d12..dffa7863131a 100644 --- a/src/liblzma/simple/x86.c +++ b/src/liblzma/simple/x86.c @@ -1,157 +1,181 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file x86.c /// \brief Filter for x86 binaries (BCJ filter) /// // Authors: Igor Pavlov // Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "simple_private.h" #define Test86MSByte(b) ((b) == 0 || (b) == 0xFF) typedef struct { uint32_t prev_mask; uint32_t prev_pos; } lzma_simple_x86; static size_t x86_code(void *simple_ptr, uint32_t now_pos, bool is_encoder, uint8_t *buffer, size_t size) { static const uint32_t MASK_TO_BIT_NUMBER[5] = { 0, 1, 2, 2, 3 }; lzma_simple_x86 *simple = simple_ptr; uint32_t prev_mask = simple->prev_mask; uint32_t prev_pos = simple->prev_pos; if (size < 5) return 0; if (now_pos - prev_pos > 5) prev_pos = now_pos - 5; const size_t limit = size - 5; size_t buffer_pos = 0; while (buffer_pos <= limit) { uint8_t b = 
buffer[buffer_pos]; if (b != 0xE8 && b != 0xE9) { ++buffer_pos; continue; } const uint32_t offset = now_pos + (uint32_t)(buffer_pos) - prev_pos; prev_pos = now_pos + (uint32_t)(buffer_pos); if (offset > 5) { prev_mask = 0; } else { for (uint32_t i = 0; i < offset; ++i) { prev_mask &= 0x77; prev_mask <<= 1; } } b = buffer[buffer_pos + 4]; if (Test86MSByte(b) && (prev_mask >> 1) <= 4 && (prev_mask >> 1) != 3) { uint32_t src = ((uint32_t)(b) << 24) | ((uint32_t)(buffer[buffer_pos + 3]) << 16) | ((uint32_t)(buffer[buffer_pos + 2]) << 8) | (buffer[buffer_pos + 1]); uint32_t dest; while (true) { if (is_encoder) dest = src + (now_pos + (uint32_t)( buffer_pos) + 5); else dest = src - (now_pos + (uint32_t)( buffer_pos) + 5); if (prev_mask == 0) break; const uint32_t i = MASK_TO_BIT_NUMBER[ prev_mask >> 1]; b = (uint8_t)(dest >> (24 - i * 8)); if (!Test86MSByte(b)) break; src = dest ^ ((1U << (32 - i * 8)) - 1); } buffer[buffer_pos + 4] = (uint8_t)(~(((dest >> 24) & 1) - 1)); buffer[buffer_pos + 3] = (uint8_t)(dest >> 16); buffer[buffer_pos + 2] = (uint8_t)(dest >> 8); buffer[buffer_pos + 1] = (uint8_t)(dest); buffer_pos += 5; prev_mask = 0; } else { ++buffer_pos; prev_mask |= 1; if (Test86MSByte(b)) prev_mask |= 0x10; } } simple->prev_mask = prev_mask; simple->prev_pos = prev_pos; return buffer_pos; } static lzma_ret x86_coder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters, bool is_encoder) { const lzma_ret ret = lzma_simple_coder_init(next, allocator, filters, &x86_code, sizeof(lzma_simple_x86), 5, 1, is_encoder); if (ret == LZMA_OK) { lzma_simple_coder *coder = next->coder; lzma_simple_x86 *simple = coder->simple; simple->prev_mask = 0; simple->prev_pos = (uint32_t)(-5); } return ret; } #ifdef HAVE_ENCODER_X86 extern lzma_ret lzma_simple_x86_encoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return x86_coder_init(next, allocator, filters, true); } + + +extern 
LZMA_API(size_t) +lzma_bcj_x86_encode(uint32_t start_offset, uint8_t *buf, size_t size) +{ + lzma_simple_x86 simple = { + .prev_mask = 0, + .prev_pos = (uint32_t)(-5), + }; + + return x86_code(&simple, start_offset, true, buf, size); +} #endif #ifdef HAVE_DECODER_X86 extern lzma_ret lzma_simple_x86_decoder_init(lzma_next_coder *next, const lzma_allocator *allocator, const lzma_filter_info *filters) { return x86_coder_init(next, allocator, filters, false); } + + +extern LZMA_API(size_t) +lzma_bcj_x86_decode(uint32_t start_offset, uint8_t *buf, size_t size) +{ + lzma_simple_x86 simple = { + .prev_mask = 0, + .prev_pos = (uint32_t)(-5), + }; + + return x86_code(&simple, start_offset, false, buf, size); +} #endif diff --git a/src/lzmainfo/lzmainfo.c b/src/lzmainfo/lzmainfo.c index d917f371c3ba..0b0b0d3d09a4 100644 --- a/src/lzmainfo/lzmainfo.c +++ b/src/lzmainfo/lzmainfo.c @@ -1,219 +1,243 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file lzmainfo.c /// \brief lzmainfo tool for compatibility with LZMA Utils // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "sysdefs.h" #include #include #include "lzma.h" #include "getopt.h" #include "tuklib_gettext.h" #include "tuklib_progname.h" +#include "tuklib_mbstr_nonprint.h" +#include "tuklib_mbstr_wrap.h" #include "tuklib_exit.h" #ifdef TUKLIB_DOSLIKE # include # include #endif tuklib_attr_noreturn static void help(void) { - printf( -_("Usage: %s [--help] [--version] [FILE]...\n" -"Show information stored in the .lzma file header"), progname); + // A few languages use so long strings that we need automatic + // wrapping. A few strings are the same as in xz/message.c and + // should be kept in sync. 
+ static const struct tuklib_wrap_opt wrap0 = { 0, 0, 0, 0, 79 }; + int e = 0; - printf(_( -"\nWith no FILE, or when FILE is -, read standard input.\n")); - printf("\n"); + printf(_("Usage: %s [--help] [--version] [FILE]...\n"), progname); - printf(_("Report bugs to <%s> (in English or Finnish).\n"), + e |= tuklib_wraps(stdout, &wrap0, + W_("Show information stored in the .lzma file header.")); + e |= tuklib_wraps(stdout, &wrap0, + W_("With no FILE, or when FILE is -, read standard input.")); + + putchar('\n'); + + e |= tuklib_wrapf(stdout, &wrap0, + W_("Report bugs to <%s> (in English or Finnish)."), PACKAGE_BUGREPORT); - printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL); + + e |= tuklib_wrapf(stdout, &wrap0, + W_("%s home page: <%s>"), PACKAGE_NAME, PACKAGE_URL); + + if (e != 0) { + // Avoid new translatable strings by printing the message + // in pieces. + fprintf(stderr, _("%s: "), progname); + fprintf(stderr, _("Error printing the help text " + "(error code %d)"), e); + fprintf(stderr, "\n"); + } tuklib_exit(EXIT_SUCCESS, EXIT_FAILURE, true); } tuklib_attr_noreturn static void version(void) { puts("lzmainfo (" PACKAGE_NAME ") " LZMA_VERSION_STRING); tuklib_exit(EXIT_SUCCESS, EXIT_FAILURE, true); } /// Parse command line options. static void parse_args(int argc, char **argv) { enum { OPT_HELP, OPT_VERSION, }; static const struct option long_opts[] = { { "help", no_argument, NULL, OPT_HELP }, { "version", no_argument, NULL, OPT_VERSION }, { NULL, 0, NULL, 0 } }; int c; while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) { switch (c) { case OPT_HELP: help(); case OPT_VERSION: version(); default: exit(EXIT_FAILURE); } } return; } /// Primitive base-2 logarithm for integers static uint32_t my_log2(uint32_t n) { uint32_t e; for (e = 0; n > 1; ++e, n /= 2) ; return e; } /// Parse the .lzma header and display information about it. 
static bool lzmainfo(const char *name, FILE *f) { uint8_t buf[13]; const size_t size = fread(buf, 1, sizeof(buf), f); if (size != 13) { - fprintf(stderr, "%s: %s: %s\n", progname, name, + fprintf(stderr, "%s: %s: %s\n", progname, + tuklib_mask_nonprint(name), ferror(f) ? strerror(errno) : _("File is too small to be a .lzma file")); return true; } lzma_filter filter = { .id = LZMA_FILTER_LZMA1 }; // Parse the first five bytes. switch (lzma_properties_decode(&filter, NULL, buf, 5)) { case LZMA_OK: break; case LZMA_OPTIONS_ERROR: - fprintf(stderr, "%s: %s: %s\n", progname, name, + fprintf(stderr, "%s: %s: %s\n", progname, + tuklib_mask_nonprint(name), _("Not a .lzma file")); return true; case LZMA_MEM_ERROR: fprintf(stderr, "%s: %s\n", progname, strerror(ENOMEM)); exit(EXIT_FAILURE); default: fprintf(stderr, "%s: %s\n", progname, _("Internal error (bug)")); exit(EXIT_FAILURE); } // Uncompressed size uint64_t uncompressed_size = 0; for (size_t i = 0; i < 8; ++i) uncompressed_size |= (uint64_t)(buf[5 + i]) << (i * 8); // Display the results. We don't want to translate these and also // will use MB instead of MiB, because someone could be parsing // this output and we don't want to break that when people move // from LZMA Utils to XZ Utils. 
if (f != stdin) - printf("%s\n", name); + printf("%s\n", tuklib_mask_nonprint(name)); printf("Uncompressed size: "); if (uncompressed_size == UINT64_MAX) printf("Unknown"); else printf("%" PRIu64 " MB (%" PRIu64 " bytes)", (uncompressed_size / 1024 + 512) / 1024, uncompressed_size); lzma_options_lzma *opt = filter.options; printf("\nDictionary size: " "%" PRIu32 " MB (2^%" PRIu32 " bytes)\n" "Literal context bits (lc): %" PRIu32 "\n" "Literal pos bits (lp): %" PRIu32 "\n" "Number of pos bits (pb): %" PRIu32 "\n", (opt->dict_size / 1024 + 512) / 1024, my_log2(opt->dict_size), opt->lc, opt->lp, opt->pb); free(opt); return false; } extern int main(int argc, char **argv) { tuklib_progname_init(argv); tuklib_gettext_init(PACKAGE, LOCALEDIR); parse_args(argc, argv); #ifdef TUKLIB_DOSLIKE setmode(fileno(stdin), O_BINARY); #endif int ret = EXIT_SUCCESS; // We print empty lines around the output only when reading from // files specified on the command line. This is due to how // LZMA Utils did it. if (optind == argc) { if (lzmainfo("(stdin)", stdin)) ret = EXIT_FAILURE; } else { printf("\n"); do { if (strcmp(argv[optind], "-") == 0) { if (lzmainfo("(stdin)", stdin)) ret = EXIT_FAILURE; } else { FILE *f = fopen(argv[optind], "r"); if (f == NULL) { ret = EXIT_FAILURE; fprintf(stderr, "%s: %s: %s\n", - progname, - argv[optind], - strerror(errno)); + progname, + tuklib_mask_nonprint( + argv[optind]), + strerror(errno)); continue; } if (lzmainfo(argv[optind], f)) ret = EXIT_FAILURE; printf("\n"); fclose(f); } } while (++optind < argc); } tuklib_exit(ret, EXIT_FAILURE, true); } diff --git a/src/xz/args.c b/src/xz/args.c index b3743ceaf205..8043c98e21c1 100644 --- a/src/xz/args.c +++ b/src/xz/args.c @@ -1,898 +1,915 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file args.c /// \brief Argument parsing /// /// \note Filter-specific options parsing is in options.c. 
// // Authors: Lasse Collin // Jia Tan // /////////////////////////////////////////////////////////////////////////////// #include "private.h" #include "getopt.h" #include bool opt_stdout = false; bool opt_force = false; bool opt_keep_original = false; +bool opt_synchronous = true; bool opt_robot = false; bool opt_ignore_check = false; // We don't modify or free() this, but we need to assign it in some // non-const pointers. const char stdin_filename[] = "(stdin)"; /// Parse and set the memory usage limit for compression, decompression, /// and/or multithreaded decompression. static void parse_memlimit(const char *name, const char *name_percentage, const char *str, bool set_compress, bool set_decompress, bool set_mtdec) { bool is_percentage = false; uint64_t value; const size_t len = strlen(str); if (len > 0 && str[len - 1] == '%') { // Make a copy so that we can get rid of %. // // In the past str wasn't const and we modified it directly // but that modified argv[] and thus affected what was visible // in "ps auxf" or similar tools which was confusing. For // example, --memlimit=50% would show up as --memlimit=50 // since the percent sign was overwritten here. char *s = xstrdup(str); s[len - 1] = '\0'; is_percentage = true; value = str_to_uint64(name_percentage, s, 1, 100); free(s); } else { // On 32-bit systems, SIZE_MAX would make more sense than // UINT64_MAX. But use UINT64_MAX still so that scripts // that assume > 4 GiB values don't break. value = str_to_uint64(name, str, 0, UINT64_MAX); } hardware_memlimit_set(value, set_compress, set_decompress, set_mtdec, is_percentage); return; } static void parse_block_list(const char *str_const) { // We need a modifiable string in the for-loop. char *str_start = xstrdup(str_const); char *str = str_start; // It must be non-empty and not begin with a comma. if (str[0] == '\0' || str[0] == ',') message_fatal(_("%s: Invalid argument to --block-list"), str); // Count the number of comma-separated strings. 
size_t count = 1; for (size_t i = 0; str[i] != '\0'; ++i) if (str[i] == ',') ++count; // Prevent an unlikely integer overflow. if (count > SIZE_MAX / sizeof(block_list_entry) - 1) message_fatal(_("%s: Too many arguments to --block-list"), str); // Allocate memory to hold all the sizes specified. // If --block-list was specified already, its value is forgotten. free(opt_block_list); opt_block_list = xmalloc((count + 1) * sizeof(block_list_entry)); // Clear the bitmask of filter chains in use. block_list_chain_mask = 0; // Reset the largest Block size found in --block-list. block_list_largest = 0; for (size_t i = 0; i < count; ++i) { // Locate the next comma and replace it with \0. char *p = strchr(str, ','); if (p != NULL) *p = '\0'; // Use the default filter chain unless overridden. opt_block_list[i].chain_num = 0; // To specify a filter chain, the block list entry may be // prepended with "[filter-chain-number]:". The size is // still required for every block. // For instance: // --block-list=2:10MiB,1:5MiB,,8MiB,0:0 // // Translates to: // 1. Block of 10 MiB using filter chain 2 // 2. Block of 5 MiB using filter chain 1 // 3. Block of 5 MiB using filter chain 1 // 4. Block of 8 MiB using the default filter chain // 5. The last block uses the default filter chain // // The block list: // --block-list=2:MiB,1:,0 // // Is not allowed because the second block does not specify // the block size, only the filter chain. if (str[0] >= '0' && str[0] <= '9' && str[1] == ':') { if (str[2] == '\0') message_fatal(_("In --block-list, block " "size is missing after " "filter chain number '%c:'"), str[0]); const unsigned chain_num = (unsigned)(str[0] - '0'); opt_block_list[i].chain_num = chain_num; block_list_chain_mask |= 1U << chain_num; str += 2; } else { // This Block uses the default filter chain. block_list_chain_mask |= 1U << 0; } if (str[0] == '\0') { // There is no string, that is, a comma follows // another comma. Use the previous value. 
// // NOTE: We checked earlier that the first char // of the whole list cannot be a comma. assert(i > 0); opt_block_list[i] = opt_block_list[i - 1]; } else { opt_block_list[i].size = str_to_uint64("block-list", str, 0, UINT64_MAX); // Zero indicates no more new Blocks. if (opt_block_list[i].size == 0) { if (i + 1 != count) message_fatal(_("0 can only be used " "as the last element " "in --block-list")); opt_block_list[i].size = UINT64_MAX; } // Remember the largest Block size in the list. // // NOTE: Do this after handling the special value 0 // because when 0 is used, we don't want to reduce // the Block size of the multithreaded encoder. if (block_list_largest < opt_block_list[i].size) block_list_largest = opt_block_list[i].size; } // Be standards compliant: p + 1 is undefined behavior // if p == NULL. That occurs on the last iteration of // the loop when we won't care about the value of str // anymore anyway. That is, this is done conditionally // solely for standard conformance reasons. if (p != NULL) str = p + 1; } // Terminate the array. 
opt_block_list[count].size = 0; free(str_start); return; } static void parse_real(args_info *args, int argc, char **argv) { enum { OPT_FILTERS = INT_MIN, OPT_FILTERS1, OPT_FILTERS2, OPT_FILTERS3, OPT_FILTERS4, OPT_FILTERS5, OPT_FILTERS6, OPT_FILTERS7, OPT_FILTERS8, OPT_FILTERS9, OPT_FILTERS_HELP, OPT_X86, OPT_POWERPC, OPT_IA64, OPT_ARM, OPT_ARMTHUMB, OPT_ARM64, OPT_SPARC, OPT_RISCV, OPT_DELTA, OPT_LZMA1, OPT_LZMA2, + OPT_NO_SYNC, OPT_SINGLE_STREAM, OPT_NO_SPARSE, OPT_FILES, OPT_FILES0, OPT_BLOCK_SIZE, OPT_BLOCK_LIST, OPT_MEM_COMPRESS, OPT_MEM_DECOMPRESS, OPT_MEM_MT_DECOMPRESS, OPT_NO_ADJUST, OPT_INFO_MEMORY, OPT_ROBOT, OPT_FLUSH_TIMEOUT, OPT_IGNORE_CHECK, }; static const char short_opts[] = "cC:defF:hHlkM:qQrS:tT:vVz0123456789"; static const struct option long_opts[] = { // Operation mode { "compress", no_argument, NULL, 'z' }, { "decompress", no_argument, NULL, 'd' }, { "uncompress", no_argument, NULL, 'd' }, { "test", no_argument, NULL, 't' }, { "list", no_argument, NULL, 'l' }, // Operation modifiers { "keep", no_argument, NULL, 'k' }, { "force", no_argument, NULL, 'f' }, { "stdout", no_argument, NULL, 'c' }, { "to-stdout", no_argument, NULL, 'c' }, + { "no-sync", no_argument, NULL, OPT_NO_SYNC }, { "single-stream", no_argument, NULL, OPT_SINGLE_STREAM }, { "no-sparse", no_argument, NULL, OPT_NO_SPARSE }, { "suffix", required_argument, NULL, 'S' }, { "files", optional_argument, NULL, OPT_FILES }, { "files0", optional_argument, NULL, OPT_FILES0 }, // Basic compression settings { "format", required_argument, NULL, 'F' }, { "check", required_argument, NULL, 'C' }, { "ignore-check", no_argument, NULL, OPT_IGNORE_CHECK }, { "block-size", required_argument, NULL, OPT_BLOCK_SIZE }, { "block-list", required_argument, NULL, OPT_BLOCK_LIST }, { "memlimit-compress", required_argument, NULL, OPT_MEM_COMPRESS }, { "memlimit-decompress", required_argument, NULL, OPT_MEM_DECOMPRESS }, { "memlimit-mt-decompress", required_argument, NULL, OPT_MEM_MT_DECOMPRESS }, { "memlimit", 
required_argument, NULL, 'M' }, { "memory", required_argument, NULL, 'M' }, // Old alias { "no-adjust", no_argument, NULL, OPT_NO_ADJUST }, { "threads", required_argument, NULL, 'T' }, { "flush-timeout", required_argument, NULL, OPT_FLUSH_TIMEOUT }, { "extreme", no_argument, NULL, 'e' }, { "fast", no_argument, NULL, '0' }, { "best", no_argument, NULL, '9' }, // Filters - { "filters", optional_argument, NULL, OPT_FILTERS}, - { "filters1", optional_argument, NULL, OPT_FILTERS1}, - { "filters2", optional_argument, NULL, OPT_FILTERS2}, - { "filters3", optional_argument, NULL, OPT_FILTERS3}, - { "filters4", optional_argument, NULL, OPT_FILTERS4}, - { "filters5", optional_argument, NULL, OPT_FILTERS5}, - { "filters6", optional_argument, NULL, OPT_FILTERS6}, - { "filters7", optional_argument, NULL, OPT_FILTERS7}, - { "filters8", optional_argument, NULL, OPT_FILTERS8}, - { "filters9", optional_argument, NULL, OPT_FILTERS9}, - { "filters-help", optional_argument, NULL, OPT_FILTERS_HELP}, + { "filters", required_argument, NULL, OPT_FILTERS}, + { "filters1", required_argument, NULL, OPT_FILTERS1}, + { "filters2", required_argument, NULL, OPT_FILTERS2}, + { "filters3", required_argument, NULL, OPT_FILTERS3}, + { "filters4", required_argument, NULL, OPT_FILTERS4}, + { "filters5", required_argument, NULL, OPT_FILTERS5}, + { "filters6", required_argument, NULL, OPT_FILTERS6}, + { "filters7", required_argument, NULL, OPT_FILTERS7}, + { "filters8", required_argument, NULL, OPT_FILTERS8}, + { "filters9", required_argument, NULL, OPT_FILTERS9}, + { "filters-help", no_argument, NULL, OPT_FILTERS_HELP}, { "lzma1", optional_argument, NULL, OPT_LZMA1 }, { "lzma2", optional_argument, NULL, OPT_LZMA2 }, { "x86", optional_argument, NULL, OPT_X86 }, { "powerpc", optional_argument, NULL, OPT_POWERPC }, { "ia64", optional_argument, NULL, OPT_IA64 }, { "arm", optional_argument, NULL, OPT_ARM }, { "armthumb", optional_argument, NULL, OPT_ARMTHUMB }, { "arm64", optional_argument, NULL, OPT_ARM64 
}, { "sparc", optional_argument, NULL, OPT_SPARC }, { "riscv", optional_argument, NULL, OPT_RISCV }, { "delta", optional_argument, NULL, OPT_DELTA }, // Other options { "quiet", no_argument, NULL, 'q' }, { "verbose", no_argument, NULL, 'v' }, { "no-warn", no_argument, NULL, 'Q' }, { "robot", no_argument, NULL, OPT_ROBOT }, { "info-memory", no_argument, NULL, OPT_INFO_MEMORY }, { "help", no_argument, NULL, 'h' }, { "long-help", no_argument, NULL, 'H' }, { "version", no_argument, NULL, 'V' }, { NULL, 0, NULL, 0 } }; int c; while ((c = getopt_long(argc, argv, short_opts, long_opts, NULL)) != -1) { switch (c) { // Compression preset (also for decompression if --format=raw) case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': coder_set_preset((uint32_t)(c - '0')); break; // --memlimit-compress case OPT_MEM_COMPRESS: parse_memlimit("memlimit-compress", "memlimit-compress%", optarg, true, false, false); break; // --memlimit-decompress case OPT_MEM_DECOMPRESS: parse_memlimit("memlimit-decompress", "memlimit-decompress%", optarg, false, true, false); break; // --memlimit-mt-decompress case OPT_MEM_MT_DECOMPRESS: parse_memlimit("memlimit-mt-decompress", "memlimit-mt-decompress%", optarg, false, false, true); break; // --memlimit case 'M': parse_memlimit("memlimit", "memlimit%", optarg, true, true, true); break; // --suffix case 'S': suffix_set(optarg); break; case 'T': { // Since xz 5.4.0: Ignore leading '+' first. const char *s = optarg; if (optarg[0] == '+') ++s; // The max is from src/liblzma/common/common.h. uint32_t t = str_to_uint64("threads", s, 0, 16384); // If leading '+' was used then use multi-threaded // mode even if exactly one thread was specified. if (t == 1 && optarg[0] == '+') t = UINT32_MAX; hardware_threads_set(t); break; } // --version case 'V': // This doesn't return. 
message_version(); // --stdout case 'c': opt_stdout = true; break; // --decompress case 'd': opt_mode = MODE_DECOMPRESS; break; // --extreme case 'e': coder_set_extreme(); break; // --force case 'f': opt_force = true; break; // --info-memory case OPT_INFO_MEMORY: // This doesn't return. hardware_memlimit_show(); // --help case 'h': // This doesn't return. message_help(false); // --long-help case 'H': // This doesn't return. message_help(true); // --list case 'l': opt_mode = MODE_LIST; break; // --keep case 'k': opt_keep_original = true; break; // --quiet case 'q': message_verbosity_decrease(); break; case 'Q': set_exit_no_warn(); break; case 't': opt_mode = MODE_TEST; break; // --verbose case 'v': message_verbosity_increase(); break; // --robot case OPT_ROBOT: opt_robot = true; // This is to make sure that floating point numbers // always have a dot as decimal separator. setlocale(LC_NUMERIC, "C"); break; case 'z': opt_mode = MODE_COMPRESS; break; // --filters case OPT_FILTERS: coder_add_filters_from_str(optarg); break; // --filters1...--filters9 case OPT_FILTERS1: case OPT_FILTERS2: case OPT_FILTERS3: case OPT_FILTERS4: case OPT_FILTERS5: case OPT_FILTERS6: case OPT_FILTERS7: case OPT_FILTERS8: case OPT_FILTERS9: coder_add_block_filters(optarg, (size_t)(c - OPT_FILTERS)); break; // --filters-help case OPT_FILTERS_HELP: // This doesn't return. 
message_filters_help(); break; case OPT_X86: coder_add_filter(LZMA_FILTER_X86, options_bcj(optarg)); break; case OPT_POWERPC: coder_add_filter(LZMA_FILTER_POWERPC, options_bcj(optarg)); break; case OPT_IA64: coder_add_filter(LZMA_FILTER_IA64, options_bcj(optarg)); break; case OPT_ARM: coder_add_filter(LZMA_FILTER_ARM, options_bcj(optarg)); break; case OPT_ARMTHUMB: coder_add_filter(LZMA_FILTER_ARMTHUMB, options_bcj(optarg)); break; case OPT_ARM64: coder_add_filter(LZMA_FILTER_ARM64, options_bcj(optarg)); break; case OPT_SPARC: coder_add_filter(LZMA_FILTER_SPARC, options_bcj(optarg)); break; case OPT_RISCV: coder_add_filter(LZMA_FILTER_RISCV, options_bcj(optarg)); break; case OPT_DELTA: coder_add_filter(LZMA_FILTER_DELTA, options_delta(optarg)); break; case OPT_LZMA1: coder_add_filter(LZMA_FILTER_LZMA1, options_lzma(optarg)); break; case OPT_LZMA2: coder_add_filter(LZMA_FILTER_LZMA2, options_lzma(optarg)); break; // Other // --format case 'F': { // Just in case, support both "lzma" and "alone" since // the latter was used for forward compatibility in // LZMA Utils 4.32.x. 
static const struct { char str[8]; enum format_type format; } types[] = { { "auto", FORMAT_AUTO }, { "xz", FORMAT_XZ }, { "lzma", FORMAT_LZMA }, { "alone", FORMAT_LZMA }, #ifdef HAVE_LZIP_DECODER { "lzip", FORMAT_LZIP }, #endif { "raw", FORMAT_RAW }, }; size_t i = 0; while (strcmp(types[i].str, optarg) != 0) if (++i == ARRAY_SIZE(types)) message_fatal(_("%s: Unknown file " "format type"), optarg); opt_format = types[i].format; break; } // --check case 'C': { static const struct { char str[8]; lzma_check check; } types[] = { { "none", LZMA_CHECK_NONE }, { "crc32", LZMA_CHECK_CRC32 }, { "crc64", LZMA_CHECK_CRC64 }, { "sha256", LZMA_CHECK_SHA256 }, }; size_t i = 0; while (strcmp(types[i].str, optarg) != 0) { if (++i == ARRAY_SIZE(types)) message_fatal(_("%s: Unsupported " "integrity " "check type"), optarg); } // Use a separate check in case we are using different // liblzma than what was used to compile us. if (!lzma_check_is_supported(types[i].check)) message_fatal(_("%s: Unsupported integrity " "check type"), optarg); coder_set_check(types[i].check); break; } case OPT_IGNORE_CHECK: opt_ignore_check = true; break; case OPT_BLOCK_SIZE: opt_block_size = str_to_uint64("block-size", optarg, 0, LZMA_VLI_MAX); break; case OPT_BLOCK_LIST: { parse_block_list(optarg); break; } case OPT_SINGLE_STREAM: opt_single_stream = true; + + // Since 5.7.1alpha --single-stream implies --keep. + opt_keep_original = true; break; case OPT_NO_SPARSE: io_no_sparse(); break; case OPT_FILES: args->files_delim = '\n'; - // Fall through + FALLTHROUGH; case OPT_FILES0: if (args->files_name != NULL) message_fatal(_("Only one file can be " "specified with '--files' " "or '--files0'.")); if (optarg == NULL) { args->files_name = stdin_filename; args->files_file = stdin; } else { args->files_name = optarg; args->files_file = fopen(optarg, c == OPT_FILES ? 
"r" : "rb"); if (args->files_file == NULL) // TRANSLATORS: This is a translatable // string because French needs a space // before the colon ("%s : %s"). message_fatal(_("%s: %s"), optarg, strerror(errno)); } break; case OPT_NO_ADJUST: opt_auto_adjust = false; break; case OPT_FLUSH_TIMEOUT: opt_flush_timeout = str_to_uint64("flush-timeout", optarg, 0, UINT64_MAX); break; + case OPT_NO_SYNC: + opt_synchronous = false; + break; + default: message_try_help(); tuklib_exit(E_ERROR, E_ERROR, false); } } return; } static void parse_environment(args_info *args, char *argv0, const char *varname) { char *env = getenv(varname); if (env == NULL) return; // We modify the string, so make a copy of it. env = xstrdup(env); // Calculate the number of arguments in env. argc stats at one // to include space for the program name. int argc = 1; bool prev_was_space = true; for (size_t i = 0; env[i] != '\0'; ++i) { // NOTE: Cast to unsigned char is needed so that correct // value gets passed to isspace(), which expects // unsigned char cast to int. Casting to int is done // automatically due to integer promotion, but we need to // force char to unsigned char manually. Otherwise 8-bit // characters would get promoted to wrong value if // char is signed. if (isspace((unsigned char)env[i])) { prev_was_space = true; } else if (prev_was_space) { prev_was_space = false; // Keep argc small enough to fit into a signed int // and to keep it usable for memory allocation. if (++argc == my_min( INT_MAX, SIZE_MAX / sizeof(char *))) message_fatal(_("The environment variable " "%s contains too many " "arguments"), varname); } } // Allocate memory to hold pointers to the arguments. Add one to get // space for the terminating NULL (if some systems happen to need it). char **argv = xmalloc(((size_t)(argc) + 1) * sizeof(char *)); argv[0] = argv0; argv[argc] = NULL; // Go through the string again. Split the arguments using '\0' // characters and add pointers to the resulting strings to argv. 
argc = 1; prev_was_space = true; for (size_t i = 0; env[i] != '\0'; ++i) { if (isspace((unsigned char)env[i])) { prev_was_space = true; env[i] = '\0'; } else if (prev_was_space) { prev_was_space = false; argv[argc++] = env + i; } } // Parse the argument list we got from the environment. All non-option // arguments i.e. filenames are ignored. parse_real(args, argc, argv); // Reset the state of the getopt_long() so that we can parse the // command line options too. There are two incompatible ways to // do it. #ifdef HAVE_OPTRESET // BSD optind = 1; optreset = 1; #else // GNU, Solaris optind = 0; #endif // We don't need the argument list from environment anymore. free(argv); free(env); return; } extern void args_parse(args_info *args, int argc, char **argv) { // Initialize those parts of *args that we need later. args->files_name = NULL; args->files_file = NULL; args->files_delim = '\0'; // Check how we were called. { // Remove the leading path name, if any. const char *name = strrchr(argv[0], '/'); if (name == NULL) name = argv[0]; else ++name; // NOTE: It's possible that name[0] is now '\0' if argv[0] // is weird, but it doesn't matter here. // Look for full command names instead of substrings like // "un", "cat", and "lz" to reduce possibility of false // positives when the programs have been renamed. 
if (strstr(name, "xzcat") != NULL) { opt_mode = MODE_DECOMPRESS; opt_stdout = true; } else if (strstr(name, "unxz") != NULL) { opt_mode = MODE_DECOMPRESS; } else if (strstr(name, "lzcat") != NULL) { opt_format = FORMAT_LZMA; opt_mode = MODE_DECOMPRESS; opt_stdout = true; } else if (strstr(name, "unlzma") != NULL) { opt_format = FORMAT_LZMA; opt_mode = MODE_DECOMPRESS; } else if (strstr(name, "lzma") != NULL) { opt_format = FORMAT_LZMA; } } // First the flags from the environment parse_environment(args, argv[0], "XZ_DEFAULTS"); parse_environment(args, argv[0], "XZ_OPT"); // Then from the command line parse_real(args, argc, argv); // If encoder or decoder support was omitted at build time, // show an error now so that the rest of the code can rely on // that whatever is in opt_mode is also supported. #ifndef HAVE_ENCODERS if (opt_mode == MODE_COMPRESS) message_fatal(_("Compression support was disabled " "at build time")); #endif #ifndef HAVE_DECODERS // Even MODE_LIST cannot work without decoder support so MODE_COMPRESS // is the only valid choice. if (opt_mode != MODE_COMPRESS) message_fatal(_("Decompression support was disabled " "at build time")); #endif #ifdef HAVE_LZIP_DECODER if (opt_mode == MODE_COMPRESS && opt_format == FORMAT_LZIP) message_fatal(_("Compression of lzip files (.lz) " "is not supported")); #endif // Never remove the source file when the destination is not on disk. // In test mode the data is written nowhere, but setting opt_stdout // will make the rest of the code behave well. if (opt_stdout || opt_mode == MODE_TEST) { opt_keep_original = true; opt_stdout = true; } + // Don't use fsync() if --keep is specified or implied. + // However, don't document this as "--keep implies --no-sync" + // because if syncing support was added to --flush-timeout, + // it would sync even if --keep was specified. 
+ if (opt_keep_original) + opt_synchronous = false; + // When compressing, if no --format flag was used, or it // was --format=auto, we compress to the .xz format. if (opt_mode == MODE_COMPRESS && opt_format == FORMAT_AUTO) opt_format = FORMAT_XZ; // Set opt_block_list to NULL if we are not compressing to the .xz // format. This option cannot be used outside of this case, and // simplifies the implementation later. if ((opt_mode != MODE_COMPRESS || opt_format != FORMAT_XZ) && opt_block_list != NULL) { message(V_WARNING, _("--block-list is ignored unless " "compressing to the .xz format")); free(opt_block_list); opt_block_list = NULL; } // If raw format is used and a custom suffix is not provided, // then only stdout mode can be used when compressing or // decompressing. if (opt_format == FORMAT_RAW && !suffix_is_set() && !opt_stdout && (opt_mode == MODE_COMPRESS || opt_mode == MODE_DECOMPRESS)) { if (args->files_name != NULL) message_fatal(_("With --format=raw, " "--suffix=.SUF is required " "unless writing to stdout")); // If all of the filenames provided are "-" (more than one // "-" could be specified) or no filenames are provided, // then we are only going to be writing to standard out. for (int i = optind; i < argc; i++) { if (strcmp(argv[i], "-") != 0) message_fatal(_("With --format=raw, " "--suffix=.SUF is required " "unless writing to stdout")); } } // Compression settings need to be validated (options themselves and // their memory usage) when compressing to any file format. It has to // be done also when uncompressing raw data, since for raw decoding // the options given on the command line are used to know what kind // of raw data we are supposed to decode. if (opt_mode == MODE_COMPRESS || (opt_format == FORMAT_RAW && opt_mode != MODE_LIST)) coder_set_compression_settings(); // If no filenames are given, use stdin. if (argv[optind] == NULL && args->files_name == NULL) { // We don't modify or free() the "-" constant. 
The caller // modifies this so don't make the struct itself const. static char *names_stdin[2] = { (char *)"-", NULL }; args->arg_names = names_stdin; args->arg_count = 1; } else { // We got at least one filename from the command line, or // --files or --files0 was specified. args->arg_names = argv + optind; args->arg_count = (unsigned int)(argc - optind); } return; } #ifndef NDEBUG extern void args_free(void) { free(opt_block_list); return; } #endif diff --git a/src/xz/args.h b/src/xz/args.h index e693ecd62280..7fdf37f1420f 100644 --- a/src/xz/args.h +++ b/src/xz/args.h @@ -1,44 +1,44 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file args.h /// \brief Argument parsing // // Authors: Lasse Collin // Jia Tan // /////////////////////////////////////////////////////////////////////////////// typedef struct { /// Filenames from command line char **arg_names; /// Number of filenames from command line unsigned int arg_count; /// Name of the file from which to read filenames. This is NULL /// if --files or --files0 was not used. const char *files_name; /// File opened for reading from which filenames are read. This is /// non-NULL only if files_name is non-NULL. 
FILE *files_file; /// Delimiter for filenames read from files_file char files_delim; } args_info; extern bool opt_stdout; extern bool opt_force; extern bool opt_keep_original; -// extern bool opt_recursive; +extern bool opt_synchronous; extern bool opt_robot; extern bool opt_ignore_check; extern const char stdin_filename[]; extern void args_parse(args_info *args, int argc, char **argv); extern void args_free(void); diff --git a/src/xz/coder.c b/src/xz/coder.c index 5e41f0df6802..c28f874a25f7 100644 --- a/src/xz/coder.c +++ b/src/xz/coder.c @@ -1,1474 +1,1476 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file coder.c /// \brief Compresses or uncompresses a file // // Authors: Lasse Collin // Jia Tan // /////////////////////////////////////////////////////////////////////////////// #include "private.h" #include "tuklib_integer.h" /// Return value type for coder_init(). enum coder_init_ret { CODER_INIT_NORMAL, CODER_INIT_PASSTHRU, CODER_INIT_ERROR, }; enum operation_mode opt_mode = MODE_COMPRESS; enum format_type opt_format = FORMAT_AUTO; bool opt_auto_adjust = true; bool opt_single_stream = false; uint64_t opt_block_size = 0; block_list_entry *opt_block_list = NULL; uint64_t block_list_largest; uint32_t block_list_chain_mask; /// Stream used to communicate with liblzma static lzma_stream strm = LZMA_STREAM_INIT; /// Maximum number of filter chains. The first filter chain is the default, /// and 9 other filter chains can be specified with --filtersX. #define NUM_FILTER_CHAIN_MAX 10 /// The default filter chain is in chains[0]. It is used for encoding /// in all supported formats and also for decdoing raw streams. The other /// filter chains are set by --filtersX to support changing filters with /// the --block-list option. static lzma_filter chains[NUM_FILTER_CHAIN_MAX][LZMA_FILTERS_MAX + 1]; /// Bitmask indicating which filter chains are actually used when encoding /// in the .xz format. 
This is needed since the filter chains specified using /// --filtersX (or the default filter chain) might in reality be unneeded /// if they are never used in --block-list. When --block-list isn't /// specified, only the default filter chain is used, thus the initial /// value of this variable is 1U << 0 (the number of the default chain is 0). static uint32_t chains_used_mask = 1U << 0; /// Input and output buffers static io_buf in_buf; static io_buf out_buf; /// Number of filters in the default filter chain. Zero indicates that /// we are using a preset. static uint32_t filters_count = 0; /// Number of the preset (0-9) static uint32_t preset_number = LZMA_PRESET_DEFAULT; /// True if the current default filter chain was set using the --filters /// option. The filter chain is reset if a preset option (like -9) or an /// old-style filter option (like --lzma2) is used after a --filters option. static bool string_to_filter_used = false; /// Integrity check type static lzma_check check; /// This becomes false if the --check=CHECK option is used. static bool check_default = true; /// Indicates if unconsumed input is allowed to remain after /// decoding has successfully finished. This is set for each file /// in coder_init(). static bool allow_trailing_input; #ifdef MYTHREAD_ENABLED static lzma_mt mt_options = { .flags = 0, .timeout = 300, }; #endif extern void coder_set_check(lzma_check new_check) { check = new_check; check_default = false; return; } static void forget_filter_chain(void) { // Setting a preset or using --filters makes us forget // the earlier custom filter chain (if any). 
if (filters_count > 0) { lzma_filters_free(chains[0], NULL); filters_count = 0; } string_to_filter_used = false; return; } extern void coder_set_preset(uint32_t new_preset) { preset_number &= ~LZMA_PRESET_LEVEL_MASK; preset_number |= new_preset; forget_filter_chain(); return; } extern void coder_set_extreme(void) { preset_number |= LZMA_PRESET_EXTREME; forget_filter_chain(); return; } extern void coder_add_filter(lzma_vli id, void *options) { if (filters_count == LZMA_FILTERS_MAX) message_fatal(_("Maximum number of filters is four")); if (string_to_filter_used) forget_filter_chain(); chains[0][filters_count].id = id; chains[0][filters_count].options = options; // Terminate the filter chain with LZMA_VLI_UNKNOWN to simplify // implementation of forget_filter_chain(). chains[0][++filters_count].id = LZMA_VLI_UNKNOWN; // Setting a custom filter chain makes us forget the preset options. // This makes a difference if one specifies e.g. "xz -9 --lzma2 -e" // where the custom filter chain resets the preset level back to // the default 6, making the example equivalent to "xz -6e". preset_number = LZMA_PRESET_DEFAULT; return; } static void str_to_filters(const char *str, uint32_t index, uint32_t flags) { int error_pos; const char *err = lzma_str_to_filters(str, &error_pos, chains[index], flags, NULL); if (err != NULL) { char filter_num[2] = ""; if (index > 0) filter_num[0] = '0' + index; - // FIXME? The message in err isn't translated. - // Including the translations in the xz translations is - // slightly ugly but possible. Creating a new domain for - // liblzma might not be worth it especially since on some - // OSes it adds extra dependencies to translation libraries. + // liblzma doesn't translate the error messages but + // the messages are included in xz's translations. 
message(V_ERROR, _("Error in --filters%s=FILTERS option:"), filter_num); message(V_ERROR, "%s", str); message(V_ERROR, "%*s^", error_pos, ""); - message_fatal("%s", err); + message_fatal("%s", _(err)); } } extern void coder_add_filters_from_str(const char *filter_str) { // Forget presets and previously defined filter chain. See // coder_add_filter() above for why preset_number must be reset too. forget_filter_chain(); preset_number = LZMA_PRESET_DEFAULT; string_to_filter_used = true; // Include LZMA_STR_ALL_FILTERS so this can be used with --format=raw. str_to_filters(filter_str, 0, LZMA_STR_ALL_FILTERS); // Set the filters_count to be the number of filters converted from // the string. for (filters_count = 0; chains[0][filters_count].id != LZMA_VLI_UNKNOWN; ++filters_count) ; assert(filters_count > 0); return; } extern void coder_add_block_filters(const char *str, size_t slot) { // Free old filters first, if they were previously allocated. if (chains_used_mask & (1U << slot)) lzma_filters_free(chains[slot], NULL); str_to_filters(str, slot, 0); chains_used_mask |= 1U << slot; } tuklib_attr_noreturn static void memlimit_too_small(uint64_t memory_usage) { message(V_ERROR, _("Memory usage limit is too low for the given " "filter setup.")); message_mem_needed(V_ERROR, memory_usage); tuklib_exit(E_ERROR, E_ERROR, false); } #ifdef HAVE_ENCODERS /// \brief Calculate the memory usage of each filter chain. /// /// \param chains_memusages If non-NULL, the memusage of the encoder /// or decoder for each chain is stored in /// this array. /// \param mt If non-NULL, calculate memory usage of /// multithreaded encoder. /// \param encode Whether to calculate encoder or decoder /// memory usage. This must be true if /// mt != NULL. /// /// \return Return the highest memory usage of all of the filter chains. 
static uint64_t get_chains_memusage(uint64_t *chains_memusages, const lzma_mt *mt, bool encode) { uint64_t max_memusage = 0; #ifdef MYTHREAD_ENABLED // Copy multithreading options to a temporary struct since the // "filters" member needs to be changed. lzma_mt mt_local; if (mt != NULL) mt_local = *mt; #else (void)mt; #endif for (uint32_t i = 0; i < ARRAY_SIZE(chains); i++) { if (!(chains_used_mask & (1U << i))) continue; uint64_t memusage = UINT64_MAX; #ifdef MYTHREAD_ENABLED if (mt != NULL) { assert(encode); mt_local.filters = chains[i]; memusage = lzma_stream_encoder_mt_memusage(&mt_local); } else #endif if (encode) { memusage = lzma_raw_encoder_memusage(chains[i]); } #ifdef HAVE_DECODERS else { memusage = lzma_raw_decoder_memusage(chains[i]); } #endif if (chains_memusages != NULL) chains_memusages[i] = memusage; if (memusage > max_memusage) max_memusage = memusage; } return max_memusage; } #endif extern void coder_set_compression_settings(void) { #ifdef HAVE_LZIP_DECODER // .lz compression isn't supported. assert(opt_format != FORMAT_LZIP); #endif // The default check type is CRC64, but fallback to CRC32 // if CRC64 isn't supported by the copy of liblzma we are // using. CRC32 is always supported. if (check_default) { check = LZMA_CHECK_CRC64; if (!lzma_check_is_supported(check)) check = LZMA_CHECK_CRC32; } #ifdef HAVE_ENCODERS if (opt_block_list != NULL) { // args.c ensures these. assert(opt_mode == MODE_COMPRESS); assert(opt_format == FORMAT_XZ); // Find out if block_list_chain_mask has a bit set that // isn't set in chains_used_mask. const uint32_t missing_chains_mask = (block_list_chain_mask ^ chains_used_mask) & block_list_chain_mask; // If a filter chain was specified in --block-list but no // matching --filtersX option was used, exit with an error. if (missing_chains_mask != 0) { // Get the number of the first missing filter chain // and show it in the error message. 
const unsigned first_missing = (unsigned)ctz32(missing_chains_mask); message_fatal(_("filter chain %u used by " "--block-list but not specified " "with --filters%u="), first_missing, first_missing); } // Omit the unused filter chains from mask of used chains. // // (FIXME? When built with debugging, coder_free() will free() // the filter chains (except the default chain) which makes // Valgrind show fewer reachable allocations. But coder_free() // uses this mask to determine which chains to free. Thus it // won't free the ones that are cleared here from the mask. // In practice this doesn't matter.) chains_used_mask &= block_list_chain_mask; } else { // Reset filters used mask in case --block-list is not // used, but --filtersX is used. chains_used_mask = 1U << 0; } #endif // Options for LZMA1 or LZMA2 in case we are using a preset. static lzma_options_lzma opt_lzma; // The first filter in the chains[] array is for the default // filter chain. lzma_filter *default_filters = chains[0]; if (filters_count == 0 && chains_used_mask & 1) { // We are using a preset. This is not a good idea in raw mode // except when playing around with things. Different versions // of this software may use different options in presets, and // thus make uncompressing the raw data difficult. if (opt_format == FORMAT_RAW) { // The message is shown only if warnings are allowed // but the exit status isn't changed. message(V_WARNING, _("Using a preset in raw mode " "is discouraged.")); message(V_WARNING, _("The exact options of the " "presets may vary between software " "versions.")); } // Get the preset for LZMA1 or LZMA2. if (lzma_lzma_preset(&opt_lzma, preset_number)) message_bug(); // Use LZMA2 except with --format=lzma we use LZMA1. default_filters[0].id = opt_format == FORMAT_LZMA ? LZMA_FILTER_LZMA1 : LZMA_FILTER_LZMA2; default_filters[0].options = &opt_lzma; filters_count = 1; // Terminate the filter options array. 
default_filters[1].id = LZMA_VLI_UNKNOWN; } // If we are using the .lzma format, allow exactly one filter // which has to be LZMA1. There is no need to check if the default // filter chain is being used since it can only be disabled if // --block-list is used, which is incompatible with FORMAT_LZMA. if (opt_format == FORMAT_LZMA && (filters_count != 1 || default_filters[0].id != LZMA_FILTER_LZMA1)) message_fatal(_("The .lzma format supports only " "the LZMA1 filter")); // If we are using the .xz format, make sure that there is no LZMA1 // filter to prevent LZMA_PROG_ERROR. With the chains from --filtersX // we have already ensured this by calling lzma_str_to_filters() // without setting the flags that would allow non-.xz filters. if (opt_format == FORMAT_XZ && chains_used_mask & 1) for (size_t i = 0; i < filters_count; ++i) if (default_filters[i].id == LZMA_FILTER_LZMA1) message_fatal(_("LZMA1 cannot be used " "with the .xz format")); if (chains_used_mask & 1) { // Print the selected default filter chain. message_filters_show(V_DEBUG, default_filters); } // The --flush-timeout option requires LZMA_SYNC_FLUSH support // from the filter chain. Currently the threaded encoder doesn't // support LZMA_SYNC_FLUSH so single-threaded mode must be used. if (opt_mode == MODE_COMPRESS && opt_flush_timeout != 0) { for (unsigned i = 0; i < ARRAY_SIZE(chains); ++i) { if (!(chains_used_mask & (1U << i))) continue; const lzma_filter *fc = chains[i]; for (size_t j = 0; fc[j].id != LZMA_VLI_UNKNOWN; j++) { switch (fc[j].id) { case LZMA_FILTER_LZMA2: case LZMA_FILTER_DELTA: break; default: message_fatal(_("Filter chain %u is " "incompatible with " "--flush-timeout"), i); } } } if (hardware_threads_is_mt()) { message(V_WARNING, _("Switching to single-threaded " "mode due to --flush-timeout")); hardware_threads_set(1); } } // Get memory limit and the memory usage of the used filter chains. // Note that if --format=raw was used, we can be decompressing // using the default filter chain. 
// // If multithreaded .xz compression is done, the memory limit // will be replaced. uint64_t memory_limit = hardware_memlimit_get(opt_mode); uint64_t memory_usage = UINT64_MAX; #ifdef HAVE_ENCODERS // Memory usage for each encoder filter chain (default // or --filtersX). The encoder options may need to be // scaled down depending on the memory usage limit. uint64_t encoder_memusages[ARRAY_SIZE(chains)]; #endif if (opt_mode == MODE_COMPRESS) { #ifdef HAVE_ENCODERS # ifdef MYTHREAD_ENABLED if (opt_format == FORMAT_XZ && hardware_threads_is_mt()) { memory_limit = hardware_memlimit_mtenc_get(); mt_options.threads = hardware_threads_get(); uint64_t block_size = opt_block_size; // If opt_block_size is not set, find the maximum // recommended Block size based on the filter chains if (block_size == 0) { for (unsigned i = 0; i < ARRAY_SIZE(chains); i++) { if (!(chains_used_mask & (1U << i))) continue; uint64_t size = lzma_mt_block_size( chains[i]); // If this returns an error, then one // of the filter chains in use is // invalid, so there is no point in // progressing further. if (size == UINT64_MAX) message_fatal(_("Unsupported " "options in filter " "chain %u"), i); if (size > block_size) block_size = size; } // If --block-list was used and our current // Block size exceeds the largest size // in --block-list, reduce the Block size of // the multithreaded encoder. The extra size // would only be a waste of RAM. With a // smaller Block size we might even be able // to use more threads in some cases. 
if (block_list_largest > 0 && block_size > block_list_largest) block_size = block_list_largest; } mt_options.block_size = block_size; mt_options.check = check; memory_usage = get_chains_memusage(encoder_memusages, &mt_options, true); if (memory_usage != UINT64_MAX) message(V_DEBUG, _("Using up to %" PRIu32 " threads."), mt_options.threads); } else # endif { memory_usage = get_chains_memusage(encoder_memusages, NULL, true); } #endif } else { #ifdef HAVE_DECODERS memory_usage = lzma_raw_decoder_memusage(default_filters); #endif } if (memory_usage == UINT64_MAX) message_fatal(_("Unsupported filter chain or filter options")); // Print memory usage info before possible dictionary // size auto-adjusting. // // NOTE: If only encoder support was built, we cannot show // what the decoder memory usage will be. message_mem_needed(V_DEBUG, memory_usage); #if defined(HAVE_ENCODERS) && defined(HAVE_DECODERS) if (opt_mode == MODE_COMPRESS && message_verbosity_get() >= V_DEBUG) { const uint64_t decmem = get_chains_memusage(NULL, NULL, false); if (decmem != UINT64_MAX) message(V_DEBUG, _("Decompression will need " "%s MiB of memory."), uint64_to_str( round_up_to_mib(decmem), 0)); } #endif if (memory_usage <= memory_limit) return; // With --format=raw settings are never adjusted to meet // the memory usage limit. if (opt_format == FORMAT_RAW) memlimit_too_small(memory_usage); assert(opt_mode == MODE_COMPRESS); #ifdef HAVE_ENCODERS # ifdef MYTHREAD_ENABLED if (opt_format == FORMAT_XZ && hardware_threads_is_mt()) { // Try to reduce the number of threads before // adjusting the compression settings down. while (mt_options.threads > 1) { // Reduce the number of threads by one and check // the memory usage. --mt_options.threads; memory_usage = get_chains_memusage(encoder_memusages, &mt_options, true); if (memory_usage == UINT64_MAX) message_bug(); if (memory_usage <= memory_limit) { // The memory usage is now low enough. 
// // Since 5.6.1: This is only shown at // V_DEBUG instead of V_WARNING because // changing the number of threads doesn't // affect the output. On some systems this // message would be too common now that // multithreaded compression is the default. message(V_DEBUG, _("Reduced the number of " "threads from %s to %s to not exceed " "the memory usage limit of %s MiB"), uint64_to_str( hardware_threads_get(), 0), uint64_to_str(mt_options.threads, 1), uint64_to_str(round_up_to_mib( memory_limit), 2)); return; } } // If the memory usage limit is only a soft limit (automatic // number of threads and no --memlimit-compress), the limit // is only used to reduce the number of threads and once at // just one thread, the limit is completely ignored. This // way -T0 won't use insane amount of memory but at the same // time the soft limit will never make xz fail and never make // xz change settings that would affect the compressed output. // // Since 5.6.1: Like above, this is now shown at V_DEBUG // instead of V_WARNING. if (hardware_memlimit_mtenc_is_default()) { message(V_DEBUG, _("Reduced the number of threads " "from %s to one. The automatic memory usage " "limit of %s MiB is still being exceeded. " "%s MiB of memory is required. " "Continuing anyway."), uint64_to_str(hardware_threads_get(), 0), uint64_to_str( round_up_to_mib(memory_limit), 1), uint64_to_str( round_up_to_mib(memory_usage), 2)); return; } // If --no-adjust was used, we cannot drop to single-threaded // mode since it produces different compressed output. // // NOTE: In xz 5.2.x, --no-adjust also prevented reducing // the number of threads. This changed in 5.3.3alpha. if (!opt_auto_adjust) memlimit_too_small(memory_usage); // Switch to single-threaded mode. It uses // less memory than using one thread in // the multithreaded mode but the output // is also different. 
hardware_threads_set(1); memory_usage = get_chains_memusage(encoder_memusages, NULL, true); message(V_WARNING, _("Switching to single-threaded mode " "to not exceed the memory usage limit of %s MiB"), uint64_to_str(round_up_to_mib(memory_limit), 0)); } # endif if (memory_usage <= memory_limit) return; // Don't adjust LZMA2 or LZMA1 dictionary size if --no-adjust // was specified as that would change the compressed output. if (!opt_auto_adjust) memlimit_too_small(memory_usage); // Adjust each filter chain that is exceeding the memory usage limit. for (unsigned i = 0; i < ARRAY_SIZE(chains); i++) { // Skip unused chains. if (!(chains_used_mask & (1U << i))) continue; // Skip chains that already meet the memory usage limit. if (encoder_memusages[i] <= memory_limit) continue; // Look for the last filter if it is LZMA2 or LZMA1, so we // can make it use less RAM. We cannot adjust other filters. unsigned j = 0; while (chains[i][j].id != LZMA_FILTER_LZMA2 && chains[i][j].id != LZMA_FILTER_LZMA1) { // NOTE: This displays the too high limit of this // particular filter chain. If multiple chains are // specified and another one would need more then // this message could be confusing. As long as LZMA2 // is the only memory hungry filter in .xz this // doesn't matter at all in practice. // // FIXME? However, it's sort of odd still if we had // switched from multithreaded mode to single-threaded // mode because single-threaded produces different // output. So the messages could perhaps be clearer. // Another case of this is a few lines below. if (chains[i][j].id == LZMA_VLI_UNKNOWN) memlimit_too_small(encoder_memusages[i]); ++j; } // Decrease the dictionary size until we meet the memory // usage limit. First round down to full mebibytes. lzma_options_lzma *opt = chains[i][j].options; const uint32_t orig_dict_size = opt->dict_size; opt->dict_size &= ~((UINT32_C(1) << 20) - 1); while (true) { // If it is below 1 MiB, auto-adjusting failed. // // FIXME? 
See the FIXME a few lines above. if (opt->dict_size < (UINT32_C(1) << 20)) memlimit_too_small(encoder_memusages[i]); encoder_memusages[i] = lzma_raw_encoder_memusage(chains[i]); if (encoder_memusages[i] == UINT64_MAX) message_bug(); // Accept it if it is low enough. if (encoder_memusages[i] <= memory_limit) break; // Otherwise adjust it 1 MiB down and try again. opt->dict_size -= UINT32_C(1) << 20; } // Tell the user that we decreased the dictionary size. // The message is slightly different between the default // filter chain (0) or and chains from --filtersX. const char lzma_num = chains[i][j].id == LZMA_FILTER_LZMA2 ? '2' : '1'; const char *from_size = uint64_to_str(orig_dict_size >> 20, 0); const char *to_size = uint64_to_str(opt->dict_size >> 20, 1); const char *limit_size = uint64_to_str(round_up_to_mib( memory_limit), 2); if (i == 0) message(V_WARNING, _("Adjusted LZMA%c dictionary " "size from %s MiB to %s MiB to not exceed the " "memory usage limit of %s MiB"), lzma_num, from_size, to_size, limit_size); else message(V_WARNING, _("Adjusted LZMA%c dictionary size " "for --filters%u from %s MiB to %s MiB to not " "exceed the memory usage limit of %s MiB"), lzma_num, i, from_size, to_size, limit_size); } #endif return; } #ifdef HAVE_DECODERS /// Return true if the data in in_buf seems to be in the .xz format. static bool is_format_xz(void) { // Specify the magic as hex to be compatible with EBCDIC systems. static const uint8_t magic[6] = { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 }; return strm.avail_in >= sizeof(magic) && memcmp(in_buf.u8, magic, sizeof(magic)) == 0; } /// Return true if the data in in_buf seems to be in the .lzma format. static bool is_format_lzma(void) { // The .lzma header is 13 bytes. if (strm.avail_in < 13) return false; // Decode the LZMA1 properties. 
lzma_filter filter = { .id = LZMA_FILTER_LZMA1 }; if (lzma_properties_decode(&filter, NULL, in_buf.u8, 5) != LZMA_OK) return false; // A hack to ditch tons of false positives: We allow only dictionary // sizes that are 2^n or 2^n + 2^(n-1) or UINT32_MAX. LZMA_Alone // created only files with 2^n, but accepts any dictionary size. // If someone complains, this will be reconsidered. lzma_options_lzma *opt = filter.options; const uint32_t dict_size = opt->dict_size; free(opt); if (dict_size != UINT32_MAX) { uint32_t d = dict_size - 1; d |= d >> 2; d |= d >> 3; d |= d >> 4; d |= d >> 8; d |= d >> 16; ++d; if (d != dict_size || dict_size == 0) return false; } // Another hack to ditch false positives: Assume that if the // uncompressed size is known, it must be less than 256 GiB. // Again, if someone complains, this will be reconsidered. uint64_t uncompressed_size = 0; for (size_t i = 0; i < 8; ++i) uncompressed_size |= (uint64_t)(in_buf.u8[5 + i]) << (i * 8); if (uncompressed_size != UINT64_MAX && uncompressed_size > (UINT64_C(1) << 38)) return false; return true; } #ifdef HAVE_LZIP_DECODER /// Return true if the data in in_buf seems to be in the .lz format. static bool is_format_lzip(void) { static const uint8_t magic[4] = { 0x4C, 0x5A, 0x49, 0x50 }; return strm.avail_in >= sizeof(magic) && memcmp(in_buf.u8, magic, sizeof(magic)) == 0; } #endif #endif /// Detect the input file type (for now, this done only when decompressing), /// and initialize an appropriate coder. Return value indicates if a normal /// liblzma-based coder was initialized (CODER_INIT_NORMAL), if passthru /// mode should be used (CODER_INIT_PASSTHRU), or if an error occurred /// (CODER_INIT_ERROR). static enum coder_init_ret coder_init(file_pair *pair) { lzma_ret ret = LZMA_PROG_ERROR; // In most cases if there is input left when coding finishes, // something has gone wrong. Exceptions are --single-stream // and decoding .lz files which can contain trailing non-.lz data. 
// These will be handled later in this function. allow_trailing_input = false; // Set the first filter chain. If the --block-list option is not // used then use the default filter chain (chains[0]). // Otherwise, use first filter chain from the block list. lzma_filter *active_filters = opt_block_list == NULL ? chains[0] : chains[opt_block_list[0].chain_num]; if (opt_mode == MODE_COMPRESS) { #ifdef HAVE_ENCODERS switch (opt_format) { case FORMAT_AUTO: // args.c ensures this. assert(0); break; case FORMAT_XZ: # ifdef MYTHREAD_ENABLED mt_options.filters = active_filters; if (hardware_threads_is_mt()) ret = lzma_stream_encoder_mt( &strm, &mt_options); else # endif ret = lzma_stream_encoder( &strm, active_filters, check); break; case FORMAT_LZMA: ret = lzma_alone_encoder(&strm, active_filters[0].options); break; # ifdef HAVE_LZIP_DECODER case FORMAT_LZIP: // args.c should disallow this. assert(0); ret = LZMA_PROG_ERROR; break; # endif case FORMAT_RAW: ret = lzma_raw_encoder(&strm, active_filters); break; } #endif } else { #ifdef HAVE_DECODERS uint32_t flags = 0; // It seems silly to warn about unsupported check if the // check won't be verified anyway due to --ignore-check. if (opt_ignore_check) flags |= LZMA_IGNORE_CHECK; else flags |= LZMA_TELL_UNSUPPORTED_CHECK; if (opt_single_stream) allow_trailing_input = true; else flags |= LZMA_CONCATENATED; // We abuse FORMAT_AUTO to indicate unknown file format, // for which we may consider passthru mode. enum format_type init_format = FORMAT_AUTO; switch (opt_format) { case FORMAT_AUTO: // .lz is checked before .lzma since .lzma detection // is more complicated (no magic bytes). 
if (is_format_xz()) init_format = FORMAT_XZ; # ifdef HAVE_LZIP_DECODER else if (is_format_lzip()) init_format = FORMAT_LZIP; # endif else if (is_format_lzma()) init_format = FORMAT_LZMA; break; case FORMAT_XZ: if (is_format_xz()) init_format = FORMAT_XZ; break; case FORMAT_LZMA: if (is_format_lzma()) init_format = FORMAT_LZMA; break; # ifdef HAVE_LZIP_DECODER case FORMAT_LZIP: if (is_format_lzip()) init_format = FORMAT_LZIP; break; # endif case FORMAT_RAW: init_format = FORMAT_RAW; break; } switch (init_format) { case FORMAT_AUTO: // Unknown file format. If --decompress --stdout // --force have been given, then we copy the input // as is to stdout. Checking for MODE_DECOMPRESS // is needed, because we don't want to do use // passthru mode with --test. if (opt_mode == MODE_DECOMPRESS && opt_stdout && opt_force) { // These are needed for progress info. strm.total_in = 0; strm.total_out = 0; return CODER_INIT_PASSTHRU; } ret = LZMA_FORMAT_ERROR; break; case FORMAT_XZ: # ifdef MYTHREAD_ENABLED mt_options.flags = flags; mt_options.threads = hardware_threads_get(); mt_options.memlimit_stop = hardware_memlimit_get(MODE_DECOMPRESS); // If single-threaded mode was requested, set the // memlimit for threading to zero. This forces the // decoder to use single-threaded mode which matches // the behavior of lzma_stream_decoder(). // // Otherwise use the limit for threaded decompression // which has a sane default (users are still free to // make it insanely high though). mt_options.memlimit_threading = mt_options.threads == 1 ? 
0 : hardware_memlimit_mtdec_get(); ret = lzma_stream_decoder_mt(&strm, &mt_options); # else ret = lzma_stream_decoder(&strm, hardware_memlimit_get( MODE_DECOMPRESS), flags); # endif break; case FORMAT_LZMA: ret = lzma_alone_decoder(&strm, hardware_memlimit_get( MODE_DECOMPRESS)); break; # ifdef HAVE_LZIP_DECODER case FORMAT_LZIP: allow_trailing_input = true; ret = lzma_lzip_decoder(&strm, hardware_memlimit_get( MODE_DECOMPRESS), flags); break; # endif case FORMAT_RAW: // Memory usage has already been checked in // coder_set_compression_settings(). ret = lzma_raw_decoder(&strm, active_filters); break; } // Try to decode the headers. This will catch too low // memory usage limit in case it happens in the first // Block of the first Stream, which is where it very // probably will happen if it is going to happen. // // This will also catch unsupported check type which // we treat as a warning only. If there are empty // concatenated Streams with unsupported check type then // the message can be shown more than once here. The loop // is used in case there is first a warning about // unsupported check type and then the first Block // would exceed the memlimit. if (ret == LZMA_OK && init_format != FORMAT_RAW) { strm.next_out = NULL; strm.avail_out = 0; while ((ret = lzma_code(&strm, LZMA_RUN)) == LZMA_UNSUPPORTED_CHECK) - message_warning(_("%s: %s"), pair->src_name, - message_strm(ret)); + message_warning(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), + message_strm(ret)); // With --single-stream lzma_code won't wait for // LZMA_FINISH and thus it can return LZMA_STREAM_END // if the file has no uncompressed data inside. // So treat LZMA_STREAM_END as LZMA_OK here. // When lzma_code() is called again in coder_normal() // it will return LZMA_STREAM_END again. 
if (ret == LZMA_STREAM_END) ret = LZMA_OK; } #endif } if (ret != LZMA_OK) { - message_error(_("%s: %s"), pair->src_name, message_strm(ret)); + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), + message_strm(ret)); if (ret == LZMA_MEMLIMIT_ERROR) message_mem_needed(V_ERROR, lzma_memusage(&strm)); return CODER_INIT_ERROR; } return CODER_INIT_NORMAL; } #ifdef HAVE_ENCODERS /// Resolve conflicts between opt_block_size and opt_block_list in single /// threaded mode. We want to default to opt_block_list, except when it is /// larger than opt_block_size. If this is the case for the current Block /// at *list_pos, then we break into smaller Blocks. Otherwise advance /// to the next Block in opt_block_list, and break apart if needed. static void split_block(uint64_t *block_remaining, uint64_t *next_block_remaining, size_t *list_pos) { if (*next_block_remaining > 0) { // The Block at *list_pos has previously been split up. assert(!hardware_threads_is_mt()); assert(opt_block_size > 0); assert(opt_block_list != NULL); if (*next_block_remaining > opt_block_size) { // We have to split the current Block at *list_pos // into another opt_block_size length Block. *block_remaining = opt_block_size; } else { // This is the last remaining split Block for the // Block at *list_pos. *block_remaining = *next_block_remaining; } *next_block_remaining -= *block_remaining; } else { // The Block at *list_pos has been finished. Go to the next // entry in the list. If the end of the list has been // reached, reuse the size and filters of the last Block. if (opt_block_list[*list_pos + 1].size != 0) { ++*list_pos; // Update the filters if needed. 
if (opt_block_list[*list_pos - 1].chain_num != opt_block_list[*list_pos].chain_num) { const unsigned chain_num = opt_block_list[*list_pos].chain_num; const lzma_filter *next = chains[chain_num]; const lzma_ret ret = lzma_filters_update( &strm, next); if (ret != LZMA_OK) { // This message is only possible if // the filter chain has unsupported // options since the filter chain is // validated using // lzma_raw_encoder_memusage() or // lzma_stream_encoder_mt_memusage(). // Some options are not validated until // the encoders are initialized. message_fatal( _("Error changing to " "filter chain %u: %s"), chain_num, message_strm(ret)); } } } *block_remaining = opt_block_list[*list_pos].size; // If in single-threaded mode, split up the Block if needed. // This is not needed in multi-threaded mode because liblzma // will do this due to how threaded encoding works. if (!hardware_threads_is_mt() && opt_block_size > 0 && *block_remaining > opt_block_size) { *next_block_remaining = *block_remaining - opt_block_size; *block_remaining = opt_block_size; } } } #endif static bool coder_write_output(file_pair *pair) { if (opt_mode != MODE_TEST) { if (io_write(pair, &out_buf, IO_BUFFER_SIZE - strm.avail_out)) return true; } strm.next_out = out_buf.u8; strm.avail_out = IO_BUFFER_SIZE; return false; } /// Compress or decompress using liblzma. static bool coder_normal(file_pair *pair) { // Encoder needs to know when we have given all the input to it. // The decoders need to know it too when we are using // LZMA_CONCATENATED. We need to check for src_eof here, because // the first input chunk has been already read if decompressing, // and that may have been the only chunk we will read. lzma_action action = pair->src_eof ? LZMA_FINISH : LZMA_RUN; lzma_ret ret; // Assume that something goes wrong. bool success = false; #ifdef HAVE_ENCODERS // block_remaining indicates how many input bytes to encode before // finishing the current .xz Block. 
The Block size is set with // --block-size=SIZE and --block-list. They have an effect only when // compressing to the .xz format. If block_remaining == UINT64_MAX, // only a single block is created. uint64_t block_remaining = UINT64_MAX; // next_block_remaining for when we are in single-threaded mode and // the Block in --block-list is larger than the --block-size=SIZE. uint64_t next_block_remaining = 0; // Position in opt_block_list. Unused if --block-list wasn't used. size_t list_pos = 0; // Handle --block-size for single-threaded mode and the first step // of --block-list. if (opt_mode == MODE_COMPRESS && opt_format == FORMAT_XZ) { // --block-size doesn't do anything here in threaded mode, // because the threaded encoder will take care of splitting // to fixed-sized Blocks. if (!hardware_threads_is_mt() && opt_block_size > 0) block_remaining = opt_block_size; // If --block-list was used, start with the first size. // // For threaded case, --block-size specifies how big Blocks // the encoder needs to be prepared to create at maximum // and --block-list will simultaneously cause new Blocks // to be started at specified intervals. To keep things // logical, the same is done in single-threaded mode. The // output is still not identical because in single-threaded // mode the size info isn't written into Block Headers. if (opt_block_list != NULL) { if (block_remaining < opt_block_list[list_pos].size) { assert(!hardware_threads_is_mt()); next_block_remaining = opt_block_list[list_pos].size - block_remaining; } else { block_remaining = opt_block_list[list_pos].size; } } } #endif strm.next_out = out_buf.u8; strm.avail_out = IO_BUFFER_SIZE; while (!user_abort) { // Fill the input buffer if it is empty and we aren't // flushing or finishing. 
if (strm.avail_in == 0 && action == LZMA_RUN) { strm.next_in = in_buf.u8; #ifdef HAVE_ENCODERS const size_t read_size = my_min(block_remaining, IO_BUFFER_SIZE); #else const size_t read_size = IO_BUFFER_SIZE; #endif strm.avail_in = io_read(pair, &in_buf, read_size); if (strm.avail_in == SIZE_MAX) break; if (pair->src_eof) { action = LZMA_FINISH; } #ifdef HAVE_ENCODERS else if (block_remaining != UINT64_MAX) { // Start a new Block after every // opt_block_size bytes of input. block_remaining -= strm.avail_in; if (block_remaining == 0) action = LZMA_FULL_BARRIER; } if (action == LZMA_RUN && pair->flush_needed) action = LZMA_SYNC_FLUSH; #endif } // Let liblzma do the actual work. ret = lzma_code(&strm, action); // Write out if the output buffer became full. if (strm.avail_out == 0) { if (coder_write_output(pair)) break; } #ifdef HAVE_ENCODERS if (ret == LZMA_STREAM_END && (action == LZMA_SYNC_FLUSH || action == LZMA_FULL_BARRIER)) { if (action == LZMA_SYNC_FLUSH) { // Flushing completed. Write the pending data // out immediately so that the reading side // can decompress everything compressed so far. if (coder_write_output(pair)) break; // Mark that we haven't seen any new input // since the previous flush. pair->src_has_seen_input = false; pair->flush_needed = false; } else { // Start a new Block after LZMA_FULL_BARRIER. if (opt_block_list == NULL) { assert(!hardware_threads_is_mt()); assert(opt_block_size > 0); block_remaining = opt_block_size; } else { split_block(&block_remaining, &next_block_remaining, &list_pos); } } // Start a new Block after LZMA_FULL_FLUSH or continue // the same block after LZMA_SYNC_FLUSH. action = LZMA_RUN; } else #endif if (ret != LZMA_OK) { // Determine if the return value indicates that we // won't continue coding. LZMA_NO_CHECK would be // here too if LZMA_TELL_ANY_CHECK was used. 
const bool stop = ret != LZMA_UNSUPPORTED_CHECK; if (stop) { // Write the remaining bytes even if something // went wrong, because that way the user gets // as much data as possible, which can be good // when trying to get at least some useful // data out of damaged files. if (coder_write_output(pair)) break; } if (ret == LZMA_STREAM_END) { if (allow_trailing_input) { io_fix_src_pos(pair, strm.avail_in); success = true; break; } // Check that there is no trailing garbage. // This is needed for LZMA_Alone and raw // streams. This is *not* done with .lz files // as that format specifically requires // allowing trailing garbage. if (strm.avail_in == 0 && !pair->src_eof) { // Try reading one more byte. // Hopefully we don't get any more // input, and thus pair->src_eof // becomes true. strm.avail_in = io_read( pair, &in_buf, 1); if (strm.avail_in == SIZE_MAX) break; assert(strm.avail_in == 0 || strm.avail_in == 1); } if (strm.avail_in == 0) { assert(pair->src_eof); success = true; break; } // We hadn't reached the end of the file. ret = LZMA_DATA_ERROR; assert(stop); } // If we get here and stop is true, something went // wrong and we print an error. Otherwise it's just // a warning and coding can continue. if (stop) { - message_error(_("%s: %s"), pair->src_name, - message_strm(ret)); + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), + message_strm(ret)); } else { - message_warning(_("%s: %s"), pair->src_name, - message_strm(ret)); + message_warning(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), + message_strm(ret)); // When compressing, all possible errors set // stop to true. assert(opt_mode != MODE_COMPRESS); } if (ret == LZMA_MEMLIMIT_ERROR) { // Display how much memory it would have // actually needed. message_mem_needed(V_ERROR, lzma_memusage(&strm)); } if (stop) break; } // Show progress information under certain conditions. 
message_progress_update(); } return success; } /// Copy from input file to output file without processing the data in any /// way. This is used only when trying to decompress unrecognized files /// with --decompress --stdout --force, so the output is always stdout. static bool coder_passthru(file_pair *pair) { while (strm.avail_in != 0) { if (user_abort) return false; if (io_write(pair, &in_buf, strm.avail_in)) return false; strm.total_in += strm.avail_in; strm.total_out = strm.total_in; message_progress_update(); strm.avail_in = io_read(pair, &in_buf, IO_BUFFER_SIZE); if (strm.avail_in == SIZE_MAX) return false; } return true; } extern void coder_run(const char *filename) { // Set and possibly print the filename for the progress message. message_filename(filename); // Try to open the input file. file_pair *pair = io_open_src(filename); if (pair == NULL) return; // Assume that something goes wrong. bool success = false; if (opt_mode == MODE_COMPRESS) { strm.next_in = NULL; strm.avail_in = 0; } else { // Read the first chunk of input data. This is needed // to detect the input file type. strm.next_in = in_buf.u8; strm.avail_in = io_read(pair, &in_buf, IO_BUFFER_SIZE); } if (strm.avail_in != SIZE_MAX) { // Initialize the coder. This will detect the file format // and, in decompression or testing mode, check the memory // usage of the first Block too. This way we don't try to // open the destination file if we see that coding wouldn't // work at all anyway. This also avoids deleting the old // "target" file if --force was used. const enum coder_init_ret init_ret = coder_init(pair); if (init_ret != CODER_INIT_ERROR && !user_abort) { // Don't open the destination file when --test // is used. if (opt_mode == MODE_TEST || !io_open_dest(pair)) { // Remember the current time. It is needed // for progress indicator. mytime_set_start_time(); // Initialize the progress indicator. 
// // NOTE: When reading from stdin, fstat() // isn't called on it and thus src_st.st_size // is zero. If stdin pointed to a regular // file, it would still be possible to know // the file size but then we would also need // to take into account the current reading // position since with stdin it isn't // necessarily at the beginning of the file. const bool is_passthru = init_ret == CODER_INIT_PASSTHRU; const uint64_t in_size = pair->src_st.st_size <= 0 ? 0 : (uint64_t)(pair->src_st.st_size); message_progress_start(&strm, is_passthru, in_size); // Do the actual coding or passthru. if (is_passthru) success = coder_passthru(pair); else success = coder_normal(pair); message_progress_end(success); } } } // Close the file pair. It needs to know if coding was successful to // know if the source or target file should be unlinked. io_close(pair, success); return; } #ifndef NDEBUG extern void coder_free(void) { // Free starting from the second filter chain since the default // filter chain may have its options set from a static variable // in coder_set_compression_settings(). Since this is only run in // debug mode and will be freed when the process ends anyway, we // don't worry about freeing it. 
for (uint32_t i = 1; i < ARRAY_SIZE(chains); i++) { if (chains_used_mask & (1U << i)) lzma_filters_free(chains[i], NULL); } lzma_end(&strm); return; } #endif diff --git a/src/xz/file_io.c b/src/xz/file_io.c index 678a9a5ca860..8c83269b13fa 100644 --- a/src/xz/file_io.c +++ b/src/xz/file_io.c @@ -1,1317 +1,1483 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file file_io.c /// \brief File opening, unlinking, and closing // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "private.h" #include <fcntl.h> #ifdef TUKLIB_DOSLIKE # include <io.h> #else # include <poll.h> +# include <libgen.h> static bool warn_fchown; #endif #if defined(HAVE_FUTIMES) || defined(HAVE_FUTIMESAT) || defined(HAVE_UTIMES) # include <sys/time.h> #elif defined(HAVE__FUTIME) # include <sys/utime.h> #elif defined(HAVE_UTIME) # include <utime.h> #endif #include "tuklib_open_stdxxx.h" #ifdef _MSC_VER # ifdef _WIN64 typedef __int64 ssize_t; # else typedef int ssize_t; # endif typedef int mode_t; # define S_IRUSR _S_IREAD # define S_IWUSR _S_IWRITE # define setmode _setmode # define open _open # define close _close # define lseek _lseeki64 # define unlink _unlink // The casts are to silence warnings. // The sizes are known to be small enough. # define read(fd, buf, size) _read(fd, buf, (unsigned int)(size)) # define write(fd, buf, size) _write(fd, buf, (unsigned int)(size)) # define S_ISDIR(m) (((m) & _S_IFMT) == _S_IFDIR) # define S_ISREG(m) (((m) & _S_IFMT) == _S_IFREG) #endif +#if defined(_WIN32) && !defined(__CYGWIN__) +# define fsync _commit +#endif + #ifndef O_BINARY # define O_BINARY 0 #endif #ifndef O_NOCTTY # define O_NOCTTY 0 #endif +// In musl 1.2.5, O_SEARCH is defined to O_PATH. As of Linux 6.12, +// a file descriptor from open("dir", O_SEARCH | O_DIRECTORY) cannot be +// used with fsync() (fails with EBADF). musl 1.2.5 doesn't emulate it +// using /proc/self/fd.
Even if it did, it might need to do it with +// fd = open("/proc/...", O_RDONLY); fsync(fd); which fails if the +// directory lacks read permission. Since we need a working fsync(), +// O_RDONLY imitates O_SEARCH better than O_PATH. +#if defined(O_SEARCH) && defined(O_PATH) && O_SEARCH == O_PATH +# undef O_SEARCH +#endif + +#ifndef O_SEARCH +# define O_SEARCH O_RDONLY +#endif + +#ifndef O_DIRECTORY +# define O_DIRECTORY 0 +#endif + // Using this macro to silence a warning from gcc -Wlogical-op. #if EAGAIN == EWOULDBLOCK # define IS_EAGAIN_OR_EWOULDBLOCK(e) ((e) == EAGAIN) #else # define IS_EAGAIN_OR_EWOULDBLOCK(e) \ ((e) == EAGAIN || (e) == EWOULDBLOCK) #endif typedef enum { IO_WAIT_MORE, // Reading or writing is possible. IO_WAIT_ERROR, // Error or user_abort IO_WAIT_TIMEOUT, // poll() timed out } io_wait_ret; /// If true, try to create sparse files when decompressing. static bool try_sparse = true; #ifndef TUKLIB_DOSLIKE /// File status flags of standard input. This is used by io_open_src() /// and io_close_src(). static int stdin_flags; static bool restore_stdin_flags = false; /// Original file status flags of standard output. This is used by /// io_open_dest() and io_close_dest() to save and restore the flags. static int stdout_flags; static bool restore_stdout_flags = false; /// Self-pipe used together with the user_abort variable to avoid /// race conditions with signal handling. static int user_abort_pipe[2]; #endif static bool io_write_buf(file_pair *pair, const uint8_t *buf, size_t size); extern void io_init(void) { // Make sure that stdin, stdout, and stderr are connected to // a valid file descriptor. Exit immediately with exit code ERROR // if we cannot make the file descriptors valid. Maybe we should // print an error message, but our stderr could be screwed anyway. tuklib_open_stdxxx(E_ERROR); #ifndef TUKLIB_DOSLIKE // If fchown() fails setting the owner, we warn about it only if // we are root. 
warn_fchown = geteuid() == 0; // Create a pipe for the self-pipe trick. if (pipe(user_abort_pipe)) message_fatal(_("Error creating a pipe: %s"), strerror(errno)); // Make both ends of the pipe non-blocking. for (unsigned i = 0; i < 2; ++i) { int flags = fcntl(user_abort_pipe[i], F_GETFL); if (flags == -1 || fcntl(user_abort_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1) message_fatal(_("Error creating a pipe: %s"), strerror(errno)); } #endif #ifdef __DJGPP__ // Avoid doing useless things when statting files. // This isn't important but doesn't hurt. _djstat_flags = _STAT_EXEC_EXT | _STAT_EXEC_MAGIC | _STAT_DIRSIZE; #endif return; } #ifndef TUKLIB_DOSLIKE extern void io_write_to_user_abort_pipe(void) { // If the write() fails, it's probably due to the pipe being full. // Failing in that case is fine. If the reason is something else, // there's not much we can do since this is called in a signal // handler. So ignore the errors and try to avoid warnings with // GCC and glibc when _FORTIFY_SOURCE=2 is used. uint8_t b = '\0'; const ssize_t ret = write(user_abort_pipe[1], &b, 1); (void)ret; return; } #endif extern void io_no_sparse(void) { try_sparse = false; return; } #ifndef TUKLIB_DOSLIKE /// \brief Waits for input or output to become available or for a signal /// /// This uses the self-pipe trick to avoid a race condition that can occur /// if a signal is caught after user_abort has been checked but before e.g. /// read() has been called. In that situation read() could block unless /// non-blocking I/O is used. With non-blocking I/O something like select() /// or poll() is needed to avoid a busy-wait loop, and the same race condition /// pops up again. There are pselect() (POSIX-1.2001) and ppoll() (not in /// POSIX) but neither is portable enough in 2013. The self-pipe trick is /// old and very portable. 
static io_wait_ret io_wait(file_pair *pair, int timeout, bool is_reading) { struct pollfd pfd[2]; if (is_reading) { pfd[0].fd = pair->src_fd; pfd[0].events = POLLIN; } else { pfd[0].fd = pair->dest_fd; pfd[0].events = POLLOUT; } pfd[1].fd = user_abort_pipe[0]; pfd[1].events = POLLIN; while (true) { const int ret = poll(pfd, 2, timeout); if (user_abort) return IO_WAIT_ERROR; if (ret == -1) { if (errno == EINTR || errno == EAGAIN) continue; message_error(_("%s: poll() failed: %s"), - is_reading ? pair->src_name - : pair->dest_name, + tuklib_mask_nonprint(is_reading + ? pair->src_name + : pair->dest_name), strerror(errno)); return IO_WAIT_ERROR; } if (ret == 0) return IO_WAIT_TIMEOUT; if (pfd[0].revents != 0) return IO_WAIT_MORE; } } #endif /// \brief Unlink a file /// /// This tries to verify that the file being unlinked really is the file that /// we want to unlink by verifying device and inode numbers. There's still /// a small unavoidable race, but this is much better than nothing (the file /// could have been moved/replaced even hours earlier). static void io_unlink(const char *name, const struct stat *known_st) { #if defined(TUKLIB_DOSLIKE) // On DOS-like systems, st_ino is meaningless, so don't bother // testing it. Just silence a compiler warning. (void)known_st; #else struct stat new_st; // If --force was used, use stat() instead of lstat(). This way // (de)compressing symlinks works correctly. However, it also means // that xz cannot detect if a regular file foo is renamed to bar // and then a symlink foo -> bar is created. Because of stat() // instead of lstat(), xz will think that foo hasn't been replaced // with another file. Thus, xz will remove foo even though it no // longer is the same file that xz used when it started compressing. // Probably it's not too bad though, so this doesn't need a more // complex fix. const int stat_ret = opt_force ? 
stat(name, &new_st) : lstat(name, &new_st); if (stat_ret # ifdef __VMS // st_ino is an array, and we don't want to // compare st_dev at all. || memcmp(&new_st.st_ino, &known_st->st_ino, sizeof(new_st.st_ino)) != 0 # else // Typical POSIX-like system || new_st.st_dev != known_st->st_dev || new_st.st_ino != known_st->st_ino # endif ) // TRANSLATORS: When compression or decompression finishes, // and xz is going to remove the source file, xz first checks // if the source file still exists, and if it does, does its // device and inode numbers match what xz saw when it opened // the source file. If these checks fail, this message is // shown, %s being the filename, and the file is not deleted. // The check for device and inode numbers is there, because // it is possible that the user has put a new file in place // of the original file, and in that case it obviously // shouldn't be removed. message_warning(_("%s: File seems to have been moved, " - "not removing"), name); + "not removing"), tuklib_mask_nonprint(name)); else #endif // There's a race condition between lstat() and unlink() // but at least we have tried to avoid removing wrong file. if (unlink(name)) message_warning(_("%s: Cannot remove: %s"), - name, strerror(errno)); + tuklib_mask_nonprint(name), + strerror(errno)); return; } /// \brief Copies owner/group and permissions /// /// \todo ACL and EA support /// static void io_copy_attrs(const file_pair *pair) { // Skip chown and chmod on Windows. #ifndef TUKLIB_DOSLIKE // This function is more tricky than you may think at first. // Blindly copying permissions may permit users to access the // destination file who didn't have permission to access the // source file. // Try changing the owner of the file. If we aren't root or the owner // isn't already us, fchown() probably doesn't succeed. We warn // about failing fchown() only if we are root. 
if (fchown(pair->dest_fd, pair->src_st.st_uid, (gid_t)(-1)) && warn_fchown) message_warning(_("%s: Cannot set the file owner: %s"), - pair->dest_name, strerror(errno)); + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); mode_t mode; // With BSD semantics the new dest file may have a group that // does not belong to the user. If the src file has the same gid // nothing has to be done. Nevertheless OpenBSD fchown(2) fails // in this case which seems to be POSIX compliant. As there is // nothing to do, skip the system call. if (pair->dest_st.st_gid != pair->src_st.st_gid && fchown(pair->dest_fd, (uid_t)(-1), pair->src_st.st_gid)) { message_warning(_("%s: Cannot set the file group: %s"), - pair->dest_name, strerror(errno)); + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); // We can still safely copy some additional permissions: // 'group' must be at least as strict as 'other' and // also vice versa. // // NOTE: After this, the owner of the source file may // get additional permissions. This shouldn't be too bad, // because the owner would have had permission to chmod // the original file anyway. mode = ((pair->src_st.st_mode & 0070) >> 3) & (pair->src_st.st_mode & 0007); mode = (pair->src_st.st_mode & 0700) | (mode << 3) | mode; } else { // Drop the setuid, setgid, and sticky bits. mode = pair->src_st.st_mode & 0777; } if (fchmod(pair->dest_fd, mode)) message_warning(_("%s: Cannot set the file permissions: %s"), - pair->dest_name, strerror(errno)); + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); #endif // Copy the timestamps. We have several possible ways to do this, of // which some are better in both security and precision. // // First, get the nanosecond part of the timestamps. As of writing, // it's not standardized by POSIX, and there are several names for // the same thing in struct stat. 
long atime_nsec; long mtime_nsec; # if defined(HAVE_STRUCT_STAT_ST_ATIM_TV_NSEC) // GNU and Solaris atime_nsec = pair->src_st.st_atim.tv_nsec; mtime_nsec = pair->src_st.st_mtim.tv_nsec; # elif defined(HAVE_STRUCT_STAT_ST_ATIMESPEC_TV_NSEC) // BSD atime_nsec = pair->src_st.st_atimespec.tv_nsec; mtime_nsec = pair->src_st.st_mtimespec.tv_nsec; # elif defined(HAVE_STRUCT_STAT_ST_ATIMENSEC) // GNU and BSD without extensions atime_nsec = pair->src_st.st_atimensec; mtime_nsec = pair->src_st.st_mtimensec; # elif defined(HAVE_STRUCT_STAT_ST_UATIME) // Tru64 atime_nsec = pair->src_st.st_uatime * 1000; mtime_nsec = pair->src_st.st_umtime * 1000; # elif defined(HAVE_STRUCT_STAT_ST_ATIM_ST__TIM_TV_NSEC) // UnixWare atime_nsec = pair->src_st.st_atim.st__tim.tv_nsec; mtime_nsec = pair->src_st.st_mtim.st__tim.tv_nsec; # else // Safe fallback atime_nsec = 0; mtime_nsec = 0; # endif // Construct a structure to hold the timestamps and call appropriate // function to set the timestamps. #if defined(HAVE_FUTIMENS) // Use nanosecond precision. struct timespec tv[2]; tv[0].tv_sec = pair->src_st.st_atime; tv[0].tv_nsec = atime_nsec; tv[1].tv_sec = pair->src_st.st_mtime; tv[1].tv_nsec = mtime_nsec; (void)futimens(pair->dest_fd, tv); #elif defined(HAVE_FUTIMES) || defined(HAVE_FUTIMESAT) || defined(HAVE_UTIMES) // Use microsecond precision. struct timeval tv[2]; tv[0].tv_sec = pair->src_st.st_atime; tv[0].tv_usec = atime_nsec / 1000; tv[1].tv_sec = pair->src_st.st_mtime; tv[1].tv_usec = mtime_nsec / 1000; # if defined(HAVE_FUTIMES) (void)futimes(pair->dest_fd, tv); # elif defined(HAVE_FUTIMESAT) (void)futimesat(pair->dest_fd, NULL, tv); # else // Argh, no function to use a file descriptor to set the timestamp. (void)utimes(pair->dest_name, tv); # endif #elif defined(HAVE__FUTIME) // Use one-second precision with Windows-specific _futime(). // We could use utime() too except that for some reason the // timestamp will get reset at close(). With _futime() it works. 
// This struct cannot be const as _futime() takes a non-const pointer. struct _utimbuf buf = { .actime = pair->src_st.st_atime, .modtime = pair->src_st.st_mtime, }; // Avoid warnings. (void)atime_nsec; (void)mtime_nsec; (void)_futime(pair->dest_fd, &buf); #elif defined(HAVE_UTIME) // Use one-second precision. utime() doesn't support using file // descriptor either. Some systems have broken utime() prototype // so don't make this const. struct utimbuf buf = { .actime = pair->src_st.st_atime, .modtime = pair->src_st.st_mtime, }; // Avoid warnings. (void)atime_nsec; (void)mtime_nsec; (void)utime(pair->dest_name, &buf); #endif return; } +/// \brief Synchronizes the destination file to permanent storage +/// +/// \param pair File pair having the destination file open for writing +/// +/// \return On success, false is returned. On error, error message +/// is printed and true is returned. +static bool +io_sync_dest(file_pair *pair) +{ + assert(pair->dest_fd != -1); + assert(pair->dest_fd != STDOUT_FILENO); + + if (fsync(pair->dest_fd)) { + message_error(_("%s: Synchronizing the file failed: %s"), + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); + return true; + } + +#ifndef TUKLIB_DOSLIKE + if (fsync(pair->dir_fd)) { + message_error(_("%s: Synchronizing the directory of " + "the file failed: %s"), + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); + return true; + } +#endif + + return false; +} + + /// Opens the source file. Returns false on success, true on error. static bool io_open_src_real(file_pair *pair) { // There's nothing to open when reading from stdin. if (pair->src_name == stdin_filename) { pair->src_fd = STDIN_FILENO; #ifdef TUKLIB_DOSLIKE setmode(STDIN_FILENO, O_BINARY); #else // Try to set stdin to non-blocking mode. It won't work // e.g. on OpenBSD if stdout is e.g. /dev/null. In such // case we proceed as if stdin were non-blocking anyway // (in case of /dev/null it will be in practice). 
The // same applies to stdout in io_open_dest_real(). stdin_flags = fcntl(STDIN_FILENO, F_GETFL); if (stdin_flags == -1) { message_error(_("Error getting the file status flags " "from standard input: %s"), strerror(errno)); return true; } if ((stdin_flags & O_NONBLOCK) == 0 && fcntl(STDIN_FILENO, F_SETFL, stdin_flags | O_NONBLOCK) != -1) restore_stdin_flags = true; #endif #ifdef HAVE_POSIX_FADVISE // It will fail if stdin is a pipe and that's fine. (void)posix_fadvise(STDIN_FILENO, 0, 0, opt_mode == MODE_LIST ? POSIX_FADV_RANDOM : POSIX_FADV_SEQUENTIAL); #endif return false; } // Symlinks are not followed unless writing to stdout or --force // or --keep was used. const bool follow_symlinks = opt_stdout || opt_force || opt_keep_original; // We accept only regular files if we are writing the output // to disk too. bzip2 allows overriding this with --force but // gzip and xz don't. const bool reg_files_only = !opt_stdout; // Flags for open() int flags = O_RDONLY | O_BINARY | O_NOCTTY; #ifndef TUKLIB_DOSLIKE // Use non-blocking I/O: // - It prevents blocking when opening FIFOs and some other // special files, which is good if we want to accept only // regular files. // - It can help avoiding some race conditions with signal handling. flags |= O_NONBLOCK; #endif #if defined(O_NOFOLLOW) if (!follow_symlinks) flags |= O_NOFOLLOW; #elif !defined(TUKLIB_DOSLIKE) // Some POSIX-like systems lack O_NOFOLLOW (it's not required // by POSIX). Check for symlinks with a separate lstat() on // these systems. if (!follow_symlinks) { struct stat st; if (lstat(pair->src_name, &st)) { - message_error(_("%s: %s"), pair->src_name, + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), strerror(errno)); return true; } else if (S_ISLNK(st.st_mode)) { message_warning(_("%s: Is a symbolic link, " - "skipping"), pair->src_name); + "skipping"), + tuklib_mask_nonprint(pair->src_name)); return true; } } #else // Avoid warnings. (void)follow_symlinks; #endif // Try to open the file. 
Signals have been blocked so EINTR shouldn't // be possible. pair->src_fd = open(pair->src_name, flags); if (pair->src_fd == -1) { // Signals (that have a signal handler) have been blocked. assert(errno != EINTR); #ifdef O_NOFOLLOW // Give an understandable error message if the reason // for failing was that the file was a symbolic link. // // Note that at least Linux, OpenBSD, Solaris, and Darwin // use ELOOP to indicate that O_NOFOLLOW was the reason // that open() failed. Because there may be // directories in the pathname, ELOOP may occur also // because of a symlink loop in the directory part. // So ELOOP doesn't tell us what actually went wrong, // and this stupidity went into POSIX-1.2008 too. // // FreeBSD associates EMLINK with O_NOFOLLOW and // Tru64 uses ENOTSUP. We use these directly here // and skip the lstat() call and the associated race. // I want to hear if there are other kernels that // fail with something else than ELOOP with O_NOFOLLOW. bool was_symlink = false; # if defined(__FreeBSD__) || defined(__DragonFly__) if (errno == EMLINK) was_symlink = true; # elif defined(__digital__) && defined(__unix__) if (errno == ENOTSUP) was_symlink = true; # elif defined(__NetBSD__) if (errno == EFTYPE) was_symlink = true; # else if (errno == ELOOP && !follow_symlinks) { const int saved_errno = errno; struct stat st; if (lstat(pair->src_name, &st) == 0 && S_ISLNK(st.st_mode)) was_symlink = true; errno = saved_errno; } # endif if (was_symlink) message_warning(_("%s: Is a symbolic link, " - "skipping"), pair->src_name); + "skipping"), + tuklib_mask_nonprint(pair->src_name)); else #endif // Something else than O_NOFOLLOW failing // (assuming that the race conditions didn't // confuse us). - message_error(_("%s: %s"), pair->src_name, + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), strerror(errno)); return true; } // Stat the source file. We need the result also when we copy // the permissions, and when unlinking. 
// // NOTE: Use stat() instead of fstat() with DJGPP, because // then we have a better chance to get st_ino value that can // be used in io_open_dest_real() to prevent overwriting the // source file. #ifdef __DJGPP__ if (stat(pair->src_name, &pair->src_st)) goto error_msg; #else if (fstat(pair->src_fd, &pair->src_st)) goto error_msg; #endif if (S_ISDIR(pair->src_st.st_mode)) { message_warning(_("%s: Is a directory, skipping"), - pair->src_name); + tuklib_mask_nonprint(pair->src_name)); goto error; } if (reg_files_only && !S_ISREG(pair->src_st.st_mode)) { message_warning(_("%s: Not a regular file, skipping"), - pair->src_name); + tuklib_mask_nonprint(pair->src_name)); goto error; } #ifndef TUKLIB_DOSLIKE if (reg_files_only && !opt_force && !opt_keep_original) { if (pair->src_st.st_mode & (S_ISUID | S_ISGID)) { // gzip rejects setuid and setgid files even // when --force was used. bzip2 doesn't check // for them, but calls fchown() after fchmod(), // and many systems automatically drop setuid // and setgid bits there. // // We accept setuid and setgid files if // --force or --keep was used. We drop these bits // explicitly in io_copy_attr(). message_warning(_("%s: File has setuid or " "setgid bit set, skipping"), - pair->src_name); + tuklib_mask_nonprint(pair->src_name)); goto error; } if (pair->src_st.st_mode & S_ISVTX) { message_warning(_("%s: File has sticky bit " "set, skipping"), - pair->src_name); + tuklib_mask_nonprint(pair->src_name)); goto error; } if (pair->src_st.st_nlink > 1) { message_warning(_("%s: Input file has more " - "than one hard link, " - "skipping"), pair->src_name); + "than one hard link, skipping"), + tuklib_mask_nonprint(pair->src_name)); goto error; } } // If it is something else than a regular file, wait until // there is input available. This way reading from FIFOs // will work when open() is used with O_NONBLOCK. 
if (!S_ISREG(pair->src_st.st_mode)) { signals_unblock(); const io_wait_ret ret = io_wait(pair, -1, true); signals_block(); if (ret != IO_WAIT_MORE) goto error; } #endif #ifdef HAVE_POSIX_FADVISE // It will fail with some special files like FIFOs but that is fine. (void)posix_fadvise(pair->src_fd, 0, 0, opt_mode == MODE_LIST ? POSIX_FADV_RANDOM : POSIX_FADV_SEQUENTIAL); #endif return false; error_msg: - message_error(_("%s: %s"), pair->src_name, strerror(errno)); + message_error(_("%s: %s"), tuklib_mask_nonprint(pair->src_name), + strerror(errno)); error: (void)close(pair->src_fd); return true; } extern file_pair * io_open_src(const char *src_name) { if (src_name[0] == '\0') { message_error(_("Empty filename, skipping")); return NULL; } // Since we have only one file open at a time, we can use // a statically allocated structure. static file_pair pair; // This implicitly also initializes src_st.st_size to zero // which is expected to be <= 0 by default. fstat() isn't // called when reading from standard input but src_st.st_size // is still read. pair = (file_pair){ .src_name = src_name, .dest_name = NULL, .src_fd = -1, .dest_fd = -1, +#ifndef TUKLIB_DOSLIKE + .dir_fd = -1, +#endif .src_eof = false, .src_has_seen_input = false, .flush_needed = false, .dest_try_sparse = false, .dest_pending_sparse = 0, }; // Block the signals, for which we have a custom signal handler, so // that we don't need to worry about EINTR. signals_block(); const bool error = io_open_src_real(&pair); signals_unblock(); #ifdef ENABLE_SANDBOX if (!error) sandbox_enable_strict_if_allowed(pair.src_fd, user_abort_pipe[0], user_abort_pipe[1]); #endif return error ? NULL : &pair; } /// \brief Closes source file of the file_pair structure /// /// \param pair File whose src_fd should be closed /// \param success If true, the file will be removed from the disk if /// closing succeeds and --keep hasn't been used. 
static void io_close_src(file_pair *pair, bool success) { #ifndef TUKLIB_DOSLIKE if (restore_stdin_flags) { assert(pair->src_fd == STDIN_FILENO); restore_stdin_flags = false; if (fcntl(STDIN_FILENO, F_SETFL, stdin_flags) == -1) message_error(_("Error restoring the status flags " "to standard input: %s"), strerror(errno)); } #endif if (pair->src_fd != STDIN_FILENO && pair->src_fd != -1) { // Close the file before possibly unlinking it. On DOS-like // systems this is always required since unlinking will fail // if the file is open. On POSIX systems it usually works // to unlink open files, but in some cases it doesn't and // one gets EBUSY in errno. // // xz 5.2.2 and older unlinked the file before closing it // (except on DOS-like systems). The old code didn't handle // EBUSY and could fail e.g. on some CIFS shares. The // advantage of unlinking before closing is negligible // (avoids a race between close() and stat()/lstat() and // unlink()), so let's keep this simple. (void)close(pair->src_fd); if (success && !opt_keep_original) io_unlink(pair->src_name, &pair->src_st); } return; } static bool io_open_dest_real(file_pair *pair) { if (opt_stdout || pair->src_fd == STDIN_FILENO) { // We don't modify or free() this. pair->dest_name = (char *)"(stdout)"; pair->dest_fd = STDOUT_FILENO; #ifdef TUKLIB_DOSLIKE setmode(STDOUT_FILENO, O_BINARY); #else // Try to set O_NONBLOCK if it isn't already set. // If it fails, we assume that stdout is non-blocking // in practice. See the comments in io_open_src_real() // for similar situation with stdin. // // NOTE: O_APPEND may be unset later in this function // and it relies on stdout_flags being set here. 
stdout_flags = fcntl(STDOUT_FILENO, F_GETFL); if (stdout_flags == -1) { message_error(_("Error getting the file status flags " "from standard output: %s"), strerror(errno)); return true; } if ((stdout_flags & O_NONBLOCK) == 0 && fcntl(STDOUT_FILENO, F_SETFL, stdout_flags | O_NONBLOCK) != -1) restore_stdout_flags = true; #endif } else { pair->dest_name = suffix_get_dest_name(pair->src_name); if (pair->dest_name == NULL) return true; +#ifndef TUKLIB_DOSLIKE + if (opt_synchronous) { + // Open the directory where the destination file will + // be created (the file descriptor is needed for + // fsync()). Do this before creating the destination + // file: + // + // - We currently have no files to clean up if + // opening the directory fails. (We aren't + // reading from stdin so there are no stdin_flags + // to restore either.) + // + // - Allocating memory with xstrdup() is safe only + // when we have nothing to clean up. + char *buf = xstrdup(pair->dest_name); + const char *dir_name = dirname(buf); + + // O_NOCTTY and O_NONBLOCK are there in case + // O_DIRECTORY is 0 and dir_name doesn't refer + // to a directory. (We opened the source file + // already but directories might have been renamed + // after the source file was opened.) + pair->dir_fd = open(dir_name, O_SEARCH | O_DIRECTORY + | O_NOCTTY | O_NONBLOCK); + if (pair->dir_fd == -1) { + // Since we did open the source file + // successfully, we should rarely get here. + // Perhaps something has been renamed or + // had its permissions changed. + // + // In an odd case, the directory has write + // and search permissions but not read + // permission (d-wx------), and O_SEARCH is + // actually O_RDONLY. Then we would be able + // to create a new file and only the directory + // syncing would be impossible. But let's be + // strict about syncing and require users to + // explicitly disable it if they don't want it. 
+ message_error(_("%s: Opening the directory " + "failed: %s"), + tuklib_mask_nonprint(dir_name), + strerror(errno)); + free(buf); + goto error; + } + + free(buf); + } +#endif + #ifdef __DJGPP__ struct stat st; if (stat(pair->dest_name, &st) == 0) { // Check that it isn't a special file like "prn". if (st.st_dev == -1) { message_error("%s: Refusing to write to " "a DOS special file", - pair->dest_name); - free(pair->dest_name); - return true; + tuklib_mask_nonprint( + pair->dest_name)); + goto error; } // Check that we aren't overwriting the source file. if (st.st_dev == pair->src_st.st_dev && st.st_ino == pair->src_st.st_ino) { message_error("%s: Output file is the same " "as the input file", - pair->dest_name); - free(pair->dest_name); - return true; + tuklib_mask_nonprint( + pair->dest_name)); + goto error; } } #endif // If --force was used, unlink the target file first. if (opt_force && unlink(pair->dest_name) && errno != ENOENT) { message_error(_("%s: Cannot remove: %s"), - pair->dest_name, strerror(errno)); - free(pair->dest_name); - return true; + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); + goto error; } // Open the file. int flags = O_WRONLY | O_BINARY | O_NOCTTY | O_CREAT | O_EXCL; #ifndef TUKLIB_DOSLIKE flags |= O_NONBLOCK; #endif const mode_t mode = S_IRUSR | S_IWUSR; pair->dest_fd = open(pair->dest_name, flags, mode); if (pair->dest_fd == -1) { - message_error(_("%s: %s"), pair->dest_name, + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->dest_name), strerror(errno)); - free(pair->dest_name); - return true; + goto error; } + + // We could sync dir_fd now and close it. However, performance + // can be better if this is delayed until dest_fd has been + // synced in io_sync_dest(). } if (fstat(pair->dest_fd, &pair->dest_st)) { // If fstat() really fails, we have a safe fallback here. 
#if defined(__VMS) pair->dest_st.st_ino[0] = 0; pair->dest_st.st_ino[1] = 0; pair->dest_st.st_ino[2] = 0; #else pair->dest_st.st_dev = 0; pair->dest_st.st_ino = 0; #endif } #if defined(TUKLIB_DOSLIKE) && !defined(__DJGPP__) // Check that the output file is a regular file. We open with O_EXCL // but that doesn't prevent open()/_open() on Windows from opening // files like "con" or "nul". // // With DJGPP this check is done with stat() even before opening // the output file. That method or a variant of it doesn't work on // Windows because on Windows stat()/_stat64() sets st.st_mode so // that S_ISREG(st.st_mode) will be true even for special files. // With fstat()/_fstat64() it works. else if (pair->dest_fd != STDOUT_FILENO && !S_ISREG(pair->dest_st.st_mode)) { - message_error("%s: Destination is not a regular file", - pair->dest_name); + message_error(_("%s: Destination is not a regular file"), + tuklib_mask_nonprint(pair->dest_name)); // dest_fd needs to be reset to -1 to keep io_close() working. (void)close(pair->dest_fd); pair->dest_fd = -1; - - free(pair->dest_name); - return true; + goto error; } #elif !defined(TUKLIB_DOSLIKE) else if (try_sparse && opt_mode == MODE_DECOMPRESS) { // When writing to standard output, we need to be extra // careful: // - It may be connected to something else than // a regular file. // - We aren't necessarily writing to a new empty file // or to the end of an existing file. // - O_APPEND may be active. // // TODO: I'm keeping this disabled for DOS-like systems // for now. FAT doesn't support sparse files, but NTFS // does, so maybe this should be enabled on Windows after // some testing. if (pair->dest_fd == STDOUT_FILENO) { if (!S_ISREG(pair->dest_st.st_mode)) return false; if (stdout_flags & O_APPEND) { // Creating a sparse file is not possible // when O_APPEND is active (it's used by // shell's >> redirection). 
As I understand // it, it is safe to temporarily disable // O_APPEND in xz, because if someone // happened to write to the same file at the // same time, results would be bad anyway // (users shouldn't assume that xz uses any // specific block size when writing data). // // The write position may be something else // than the end of the file, so we must fix // it to start writing at the end of the file // to imitate O_APPEND. if (lseek(STDOUT_FILENO, 0, SEEK_END) == -1) return false; // Construct the new file status flags. // If O_NONBLOCK was set earlier in this // function, it must be kept here too. int flags = stdout_flags & ~O_APPEND; if (restore_stdout_flags) flags |= O_NONBLOCK; // If this fcntl() fails, we continue but won't // try to create sparse output. The original // flags will still be restored if needed (to // unset O_NONBLOCK) when the file is finished. if (fcntl(STDOUT_FILENO, F_SETFL, flags) == -1) return false; // Disabling O_APPEND succeeded. Mark // that the flags should be restored // in io_close_dest(). (This may have already // been set when enabling O_NONBLOCK.) restore_stdout_flags = true; } else if (lseek(STDOUT_FILENO, 0, SEEK_CUR) != pair->dest_st.st_size) { // Writing won't start exactly at the end // of the file. We cannot use sparse output, // because it would probably corrupt the file. return false; } } pair->dest_try_sparse = true; } #endif return false; + +error: +#ifndef TUKLIB_DOSLIKE + // io_close() closes pair->dir_fd but let's do it here anyway. + if (pair->dir_fd != -1) { + (void)close(pair->dir_fd); + pair->dir_fd = -1; + } +#endif + + free(pair->dest_name); + return true; } extern bool io_open_dest(file_pair *pair) { signals_block(); const bool ret = io_open_dest_real(pair); signals_unblock(); return ret; } /// \brief Closes destination file of the file_pair structure /// /// \param pair File whose dest_fd should be closed /// \param success If false, the file will be removed from the disk. 
/// -/// \return Zero if closing succeeds. On error, -1 is returned and -/// error message printed. +/// \return If closing succeeds, false is returned. On error, an error +/// message is printed and true is returned. static bool io_close_dest(file_pair *pair, bool success) { #ifndef TUKLIB_DOSLIKE // If io_open_dest() has disabled O_APPEND, restore it here. if (restore_stdout_flags) { assert(pair->dest_fd == STDOUT_FILENO); restore_stdout_flags = false; if (fcntl(STDOUT_FILENO, F_SETFL, stdout_flags) == -1) { message_error(_("Error restoring the O_APPEND flag " "to standard output: %s"), strerror(errno)); return true; } } #endif if (pair->dest_fd == -1 || pair->dest_fd == STDOUT_FILENO) return false; +#ifndef TUKLIB_DOSLIKE + // dir_fd was only used for syncing the directory. + // Error checking was done when syncing. + if (pair->dir_fd != -1) + (void)close(pair->dir_fd); +#endif + if (close(pair->dest_fd)) { message_error(_("%s: Closing the file failed: %s"), - pair->dest_name, strerror(errno)); + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); // Closing destination file failed, so we cannot trust its // contents. Get rid of junk: io_unlink(pair->dest_name, &pair->dest_st); free(pair->dest_name); return true; } // If the operation using this file wasn't successful, we get rid // of the junk file. if (!success) io_unlink(pair->dest_name, &pair->dest_st); free(pair->dest_name); return false; } extern void io_close(file_pair *pair, bool success) { // Take care of sparseness at the end of the output file. if (success && pair->dest_try_sparse && pair->dest_pending_sparse > 0) { // Seek forward one byte less than the size of the pending // hole, then write one zero-byte. This way the file grows // to its correct size. An alternative would be to use // ftruncate() but that isn't portable enough (e.g.
it // doesn't work with FAT on Linux; FAT isn't that important // since it doesn't support sparse files anyway, but we don't // want to create corrupt files on it). if (lseek(pair->dest_fd, pair->dest_pending_sparse - 1, SEEK_CUR) == -1) { message_error(_("%s: Seeking failed when trying " "to create a sparse file: %s"), - pair->dest_name, strerror(errno)); + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); success = false; } else { const uint8_t zero[1] = { '\0' }; if (io_write_buf(pair, zero, 1)) success = false; } } signals_block(); - // Copy the file attributes. We need to skip this if destination - // file isn't open or it is standard output. - if (success && pair->dest_fd != -1 && pair->dest_fd != STDOUT_FILENO) + if (success && pair->dest_fd != -1 && pair->dest_fd != STDOUT_FILENO) { + // Copy the file attributes. This may produce warnings but + // not errors so "success" isn't affected. io_copy_attrs(pair); + // Synchronize the file and its directory if needed. + if (opt_synchronous) + success = !io_sync_dest(pair); + } + // Close the destination first. If it fails, we must not remove // the source file! if (io_close_dest(pair, success)) success = false; // Close the source file, and unlink it if the operation using this // file pair was successful and we haven't requested to keep the // source file. io_close_src(pair, success); signals_unblock(); return; } extern void io_fix_src_pos(file_pair *pair, size_t rewind_size) { assert(rewind_size <= IO_BUFFER_SIZE); if (rewind_size > 0) { // This doesn't need to work on unseekable file descriptors, // so just ignore possible errors. 
(void)lseek(pair->src_fd, -(off_t)(rewind_size), SEEK_CUR); } return; } extern size_t io_read(file_pair *pair, io_buf *buf, size_t size) { assert(size <= IO_BUFFER_SIZE); size_t pos = 0; while (pos < size) { const ssize_t amount = read( pair->src_fd, buf->u8 + pos, size - pos); if (amount == 0) { pair->src_eof = true; break; } if (amount == -1) { if (errno == EINTR) { if (user_abort) return SIZE_MAX; continue; } #ifndef TUKLIB_DOSLIKE if (IS_EAGAIN_OR_EWOULDBLOCK(errno)) { // Disable the flush-timeout if no input has // been seen since the previous flush and thus // there would be nothing to flush after the // timeout expires (avoids busy waiting). const int timeout = pair->src_has_seen_input ? mytime_get_flush_timeout() : -1; switch (io_wait(pair, timeout, true)) { case IO_WAIT_MORE: continue; case IO_WAIT_ERROR: return SIZE_MAX; case IO_WAIT_TIMEOUT: pair->flush_needed = true; return pos; default: message_bug(); } } #endif message_error(_("%s: Read error: %s"), - pair->src_name, strerror(errno)); + tuklib_mask_nonprint(pair->src_name), + strerror(errno)); return SIZE_MAX; } pos += (size_t)(amount); if (!pair->src_has_seen_input) { pair->src_has_seen_input = true; mytime_set_flush_time(); } } return pos; } extern bool io_seek_src(file_pair *pair, uint64_t pos) { // Caller must not attempt to seek past the end of the input file // (seeking to 100 in a 100-byte file is seeking to the end of // the file, not past the end of the file, and thus that is allowed). // // This also validates that pos can be safely cast to off_t. 
if (pos > (uint64_t)(pair->src_st.st_size)) message_bug(); if (lseek(pair->src_fd, (off_t)(pos), SEEK_SET) == -1) { message_error(_("%s: Error seeking the file: %s"), - pair->src_name, strerror(errno)); + tuklib_mask_nonprint(pair->src_name), + strerror(errno)); return true; } pair->src_eof = false; return false; } extern bool io_pread(file_pair *pair, io_buf *buf, size_t size, uint64_t pos) { // Using lseek() and read() is more portable than pread() and // for us it is as good as real pread(). if (io_seek_src(pair, pos)) return true; const size_t amount = io_read(pair, buf, size); if (amount == SIZE_MAX) return true; if (amount != size) { message_error(_("%s: Unexpected end of file"), - pair->src_name); + tuklib_mask_nonprint(pair->src_name)); return true; } return false; } static bool is_sparse(const io_buf *buf) { assert(IO_BUFFER_SIZE % sizeof(uint64_t) == 0); for (size_t i = 0; i < ARRAY_SIZE(buf->u64); ++i) if (buf->u64[i] != 0) return false; return true; } static bool io_write_buf(file_pair *pair, const uint8_t *buf, size_t size) { assert(size <= IO_BUFFER_SIZE); while (size > 0) { const ssize_t amount = write(pair->dest_fd, buf, size); if (amount == -1) { if (errno == EINTR) { if (user_abort) return true; continue; } #ifndef TUKLIB_DOSLIKE if (IS_EAGAIN_OR_EWOULDBLOCK(errno)) { if (io_wait(pair, -1, false) == IO_WAIT_MORE) continue; return true; } #endif +#if defined(_WIN32) && !defined(__CYGWIN__) + // On native Windows, broken pipe is reported as + // EINVAL. Don't show an error message in this case. + // Try: xz -dc bigfile.xz | head -n1 + if (errno == EINVAL + && pair->dest_fd == STDOUT_FILENO) { + // Emulate SIGPIPE by setting user_abort here. + user_abort = true; + set_exit_status(E_ERROR); + return true; + } +#endif + // Handle broken pipe specially. gzip and bzip2 // don't print anything on SIGPIPE. In addition, // gzip --quiet uses exit status 2 (warning) on // broken pipe instead of whatever raise(SIGPIPE) // would make it return. 
It is there to hide "Broken // pipe" message on some old shells (probably old // GNU bash). // // We don't do anything special with --quiet, which // is what bzip2 does too. If we get SIGPIPE, we // will handle it like other signals by setting // user_abort, and get EPIPE here. if (errno != EPIPE) message_error(_("%s: Write error: %s"), - pair->dest_name, strerror(errno)); + tuklib_mask_nonprint(pair->dest_name), + strerror(errno)); return true; } buf += (size_t)(amount); size -= (size_t)(amount); } return false; } extern bool io_write(file_pair *pair, const io_buf *buf, size_t size) { assert(size <= IO_BUFFER_SIZE); if (pair->dest_try_sparse) { // Check if the block is sparse (contains only zeros). If it // is sparse, we just store the amount and return. We will take // care of actually skipping over the hole when we hit the // next data block or close the file. // // Since io_close() requires that dest_pending_sparse > 0 // if the file ends with a sparse block, we must also return // if size == 0 to avoid doing the lseek(). if (size == IO_BUFFER_SIZE) { // Even if the block was sparse, treat it as non-sparse // if the pending sparse amount is large compared to // the size of off_t. In practice this only matters // on 32-bit systems where off_t isn't always 64 bits. const off_t pending_max = (off_t)(1) << (sizeof(off_t) * CHAR_BIT - 2); if (is_sparse(buf) && pair->dest_pending_sparse < pending_max) { pair->dest_pending_sparse += (off_t)(size); return false; } } else if (size == 0) { return false; } // This is not a sparse block. If we have a pending hole, // skip it now.
if (pair->dest_pending_sparse > 0) { if (lseek(pair->dest_fd, pair->dest_pending_sparse, SEEK_CUR) == -1) { message_error(_("%s: Seeking failed when " "trying to create a sparse " - "file: %s"), pair->dest_name, + "file: %s"), + tuklib_mask_nonprint( + pair->dest_name), strerror(errno)); return true; } pair->dest_pending_sparse = 0; } } return io_write_buf(pair, buf->u8, size); } diff --git a/src/xz/file_io.h b/src/xz/file_io.h index ae7e2f38f520..9903f5a0adf8 100644 --- a/src/xz/file_io.h +++ b/src/xz/file_io.h @@ -1,182 +1,188 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file file_io.h /// \brief I/O types and functions // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// // Some systems have suboptimal BUFSIZ. Use a bit bigger value on them. // We also need that IO_BUFFER_SIZE is a multiple of 8 (sizeof(uint64_t)) #if BUFSIZ <= 1024 # define IO_BUFFER_SIZE 8192 #else # define IO_BUFFER_SIZE (BUFSIZ & ~7U) #endif #ifdef _MSC_VER // The first one renames both "struct stat" -> "struct _stat64" // and stat() -> _stat64(). The documentation mentions only // "struct __stat64", not "struct _stat64", but the latter // works too. # define stat _stat64 # define fstat _fstat64 # define off_t __int64 #endif /// is_sparse() accesses the buffer as uint64_t for maximum speed. /// The u32 and u64 members must only be accessed through this union /// to avoid strict aliasing violations. Taking a pointer of u8 /// should be fine as long as uint8_t maps to unsigned char which /// can alias anything. typedef union { uint8_t u8[IO_BUFFER_SIZE]; uint32_t u32[IO_BUFFER_SIZE / sizeof(uint32_t)]; uint64_t u64[IO_BUFFER_SIZE / sizeof(uint64_t)]; } io_buf; typedef struct { /// Name of the source filename (as given on the command line) or /// pointer to static "(stdin)" when reading from standard input.
const char *src_name; /// Destination filename converted from src_name or pointer to static /// "(stdout)" when writing to standard output. char *dest_name; /// File descriptor of the source file int src_fd; /// File descriptor of the target file int dest_fd; +#ifndef TUKLIB_DOSLIKE + /// File descriptor of the directory of the target file (which is + /// also the directory of the source file) + int dir_fd; +#endif + /// True once end of the source file has been detected. bool src_eof; /// For --flush-timeout: True if at least one byte has been read /// since the previous flush or the start of the file. bool src_has_seen_input; /// For --flush-timeout: True when flushing is needed. bool flush_needed; /// If true, we look for long chunks of zeros and try to create /// a sparse file. bool dest_try_sparse; /// This is used only if dest_try_sparse is true. This holds the /// number of zero bytes we haven't written out, because we plan /// to make that byte range a sparse chunk. off_t dest_pending_sparse; /// Stat of the source file. struct stat src_st; /// Stat of the destination file. struct stat dest_st; } file_pair; /// \brief Initialize the I/O module extern void io_init(void); #ifndef TUKLIB_DOSLIKE /// \brief Write a byte to user_abort_pipe[1] /// /// This is called from a signal handler. extern void io_write_to_user_abort_pipe(void); #endif /// \brief Disable creation of sparse files when decompressing extern void io_no_sparse(void); /// \brief Open the source file extern file_pair *io_open_src(const char *src_name); /// \brief Open the destination file extern bool io_open_dest(file_pair *pair); /// \brief Closes the file descriptors and frees possible allocated memory /// /// The success argument determines if source or destination file gets /// unlinked: /// - false: The destination file is unlinked. /// - true: The source file is unlinked unless writing to stdout or --keep /// was used. 
extern void io_close(file_pair *pair, bool success); /// \brief Reads from the source file to a buffer /// /// \param pair File pair having the source file open for reading /// \param buf Destination buffer to hold the read data /// \param size Size of the buffer; must be at most IO_BUFFER_SIZE /// /// \return On success, number of bytes read is returned. On end of /// file zero is returned and pair->src_eof set to true. /// On error, SIZE_MAX is returned and error message printed. extern size_t io_read(file_pair *pair, io_buf *buf, size_t size); /// \brief Fix the position in src_fd /// /// This is used when --single-stream has been specified and decompression /// is successful. If the input file descriptor supports seeking, this /// function fixes the input position to point to the next byte after the /// decompressed stream. /// /// \param pair File pair having the source file open for reading /// \param rewind_size How many extra bytes have been read, i.e. /// how much to seek backwards. extern void io_fix_src_pos(file_pair *pair, size_t rewind_size); /// \brief Seek to the given absolute position in the source file /// /// This calls lseek() and also clears pair->src_eof. /// /// \param pair Seekable source file /// \param pos Offset relative to the beginning of the file, /// from which the data should be read. /// /// \return On success, false is returned. On error, error message /// is printed and true is returned. extern bool io_seek_src(file_pair *pair, uint64_t pos); /// \brief Read from source file from given offset to a buffer /// /// This is remotely similar to standard pread(). This uses lseek() though, /// so the read offset is changed on each call. /// /// \param pair Seekable source file /// \param buf Destination buffer /// \param size Amount of data to read /// \param pos Offset relative to the beginning of the file, /// from which the data should be read. /// /// \return On success, false is returned.
On error, error message /// is printed and true is returned. extern bool io_pread(file_pair *pair, io_buf *buf, size_t size, uint64_t pos); /// \brief Writes a buffer to the destination file /// /// \param pair File pair having the destination file open for writing /// \param buf Buffer containing the data to be written /// \param size Size of the buffer; must be at most IO_BUFFER_SIZE /// -/// \return On success, zero is returned. On error, -1 is returned -/// and error message printed. +/// \return On success, false is returned. On error, error message +/// is printed and true is returned. extern bool io_write(file_pair *pair, const io_buf *buf, size_t size); diff --git a/src/xz/list.c b/src/xz/list.c index e4a64668c76e..6a71d01e437e 100644 --- a/src/xz/list.c +++ b/src/xz/list.c @@ -1,1346 +1,1355 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file list.c /// \brief Listing information about .xz files // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "private.h" #include "tuklib_integer.h" /// Information about a .xz file typedef struct { /// Combined Index of all Streams in the file lzma_index *idx; /// Total amount of Stream Padding uint64_t stream_padding; /// Highest memory usage so far uint64_t memusage_max; /// True if all Blocks so far have Compressed Size and /// Uncompressed Size fields bool all_have_sizes; /// Oldest XZ Utils version that will decompress the file uint32_t min_version; } xz_file_info; #define XZ_FILE_INFO_INIT { NULL, 0, 0, true, 50000002 } /// Information about a .xz Block typedef struct { /// Size of the Block Header uint32_t header_size; /// A few of the Block Flags as a string char flags[3]; /// Size of the Compressed Data field in the Block lzma_vli compressed_size; /// Decoder memory usage for this Block uint64_t memusage; /// The filter chain of this Block in human-readable form char 
*filter_chain; } block_header_info; #define BLOCK_HEADER_INFO_INIT { .filter_chain = NULL } #define block_header_info_end(bhi) free((bhi)->filter_chain) /// Strings ending in a colon. These are used for lines like /// " Foo: 123 MiB". These are grouped because translated strings /// may have different maximum string length, and we want to pad all /// strings so that the values are aligned nicely. static const char *colon_strs[] = { N_("Streams:"), N_("Blocks:"), N_("Compressed size:"), N_("Uncompressed size:"), N_("Ratio:"), N_("Check:"), N_("Stream Padding:"), N_("Memory needed:"), N_("Sizes in headers:"), // This won't be aligned because it's so long: //N_("Minimum XZ Utils version:"), N_("Number of files:"), }; /// Enum matching the above strings. enum { COLON_STR_STREAMS, COLON_STR_BLOCKS, COLON_STR_COMPRESSED_SIZE, COLON_STR_UNCOMPRESSED_SIZE, COLON_STR_RATIO, COLON_STR_CHECK, COLON_STR_STREAM_PADDING, COLON_STR_MEMORY_NEEDED, COLON_STR_SIZES_IN_HEADERS, //COLON_STR_MINIMUM_XZ_VERSION, COLON_STR_NUMBER_OF_FILES, }; /// Field widths to use with printf to pad the strings to use the same number /// of columns on a terminal. static int colon_strs_fw[ARRAY_SIZE(colon_strs)]; /// Convenience macro to get the translated string and its field width /// using a COLON_STR_foo enum. #define COLON_STR(num) colon_strs_fw[num], _(colon_strs[num]) /// Column headings static struct { /// Table column heading string const char *str; /// Number of terminal-columns to use for this table-column. /// If a translated string is longer than the initial value, /// this value will be increased in init_headings(). int columns; /// Field width to use for printf() to pad "str" to use "columns" /// number of columns on a terminal. This is calculated in /// init_headings(). 
int fw; } headings[] = { { N_("Stream"), 6, 0 }, { N_("Block"), 9, 0 }, { N_("Blocks"), 9, 0 }, { N_("CompOffset"), 15, 0 }, { N_("UncompOffset"), 15, 0 }, { N_("CompSize"), 15, 0 }, { N_("UncompSize"), 15, 0 }, { N_("TotalSize"), 15, 0 }, { N_("Ratio"), 5, 0 }, { N_("Check"), 10, 0 }, { N_("CheckVal"), 1, 0 }, { N_("Padding"), 7, 0 }, { N_("Header"), 5, 0 }, { N_("Flags"), 2, 0 }, { N_("MemUsage"), 7 + 4, 0 }, // +4 is for " MiB" { N_("Filters"), 1, 0 }, }; /// Enum matching the above strings. enum { HEADING_STREAM, HEADING_BLOCK, HEADING_BLOCKS, HEADING_COMPOFFSET, HEADING_UNCOMPOFFSET, HEADING_COMPSIZE, HEADING_UNCOMPSIZE, HEADING_TOTALSIZE, HEADING_RATIO, HEADING_CHECK, HEADING_CHECKVAL, HEADING_PADDING, HEADING_HEADERSIZE, HEADING_HEADERFLAGS, HEADING_MEMUSAGE, HEADING_FILTERS, }; #define HEADING_STR(num) headings[num].fw, _(headings[num].str) /// Check ID to string mapping static const char check_names[LZMA_CHECK_ID_MAX + 1][12] = { // TRANSLATORS: Indicates that there is no integrity check. // This string is used in tables. In older xz version this // string was limited to ten columns in a fixed-width font, but // nowadays there is no strict length restriction anymore. N_("None"), "CRC32", // TRANSLATORS: Indicates that integrity check name is not known, // but the Check ID is known (here 2). In older xz version these // strings were limited to ten columns in a fixed-width font, but // nowadays there is no strict length restriction anymore. N_("Unknown-2"), N_("Unknown-3"), "CRC64", N_("Unknown-5"), N_("Unknown-6"), N_("Unknown-7"), N_("Unknown-8"), N_("Unknown-9"), "SHA-256", N_("Unknown-11"), N_("Unknown-12"), N_("Unknown-13"), N_("Unknown-14"), N_("Unknown-15"), }; /// Buffer size for get_check_names(). This may be a bit ridiculous, /// but at least it's enough if some language needs many multibyte chars. #define CHECKS_STR_SIZE 1024 /// Value of the Check field as hexadecimal string. /// This is set by parse_check_value(). 
static char check_value[2 * LZMA_CHECK_SIZE_MAX + 1]; /// Totals that are displayed if there was more than one file. /// The "files" counter is also used in print_info_adv() to show /// the file number. static struct { uint64_t files; uint64_t streams; uint64_t blocks; uint64_t compressed_size; uint64_t uncompressed_size; uint64_t stream_padding; uint64_t memusage_max; uint32_t checks; uint32_t min_version; bool all_have_sizes; } totals = { 0, 0, 0, 0, 0, 0, 0, 0, 50000002, true }; /// Initialize colon_strs_fw[]. static void init_colon_strs(void) { // Lengths of translated strings as bytes. size_t lens[ARRAY_SIZE(colon_strs)]; // Lengths of translated strings as columns. size_t widths[ARRAY_SIZE(colon_strs)]; // Maximum number of columns needed by a translated string. size_t width_max = 0; for (unsigned i = 0; i < ARRAY_SIZE(colon_strs); ++i) { widths[i] = tuklib_mbstr_width(_(colon_strs[i]), &lens[i]); // If debugging is enabled, catch invalid strings with // an assertion. However, when not debugging, use the // byte count as the fallback width. This shouldn't // ever happen unless there is a bad string in the // translations, but in such case I guess it's better // to try to print something useful instead of failing // completely. assert(widths[i] != (size_t)-1); if (widths[i] == (size_t)-1) widths[i] = lens[i]; if (widths[i] > width_max) width_max = widths[i]; } // Calculate the field width for printf("%*s") so that the strings // will use width_max columns on a terminal. for (unsigned i = 0; i < ARRAY_SIZE(colon_strs); ++i) colon_strs_fw[i] = (int)(lens[i] + width_max - widths[i]); return; } /// Initialize headings[]. static void init_headings(void) { // Before going through the heading strings themselves, treat // the Check heading specially: Look at the widths of the various // check names and increase the width of the Check column if needed. 
// The width of the heading name "Check" will then be handled normally // with other heading names in the second loop in this function. for (unsigned i = 0; i < ARRAY_SIZE(check_names); ++i) { size_t len; size_t w = tuklib_mbstr_width(_(check_names[i]), &len); // Error handling like in init_colon_strs(). assert(w != (size_t)-1); if (w == (size_t)-1) w = len; // If the translated string is wider than the minimum width // set at compile time, increase the width. if ((size_t)(headings[HEADING_CHECK].columns) < w) headings[HEADING_CHECK].columns = (int)w; } for (unsigned i = 0; i < ARRAY_SIZE(headings); ++i) { size_t len; size_t w = tuklib_mbstr_width(_(headings[i].str), &len); // Error handling like in init_colon_strs(). assert(w != (size_t)-1); if (w == (size_t)-1) w = len; // If the translated string is wider than the minimum width // set at compile time, increase the width. if ((size_t)(headings[i].columns) < w) headings[i].columns = (int)w; // Calculate the field width for printf("%*s") so that // the string uses .columns number of columns on a terminal. headings[i].fw = (int)(len + (size_t)headings[i].columns - w); } return; } /// Initialize the printf field widths that are needed to get nicely aligned /// output with translated strings. static void init_field_widths(void) { init_colon_strs(); init_headings(); return; } /// Convert XZ Utils version number to a string. static const char * xz_ver_to_str(uint32_t ver) { static char buf[32]; unsigned int major = ver / 10000000U; ver -= major * 10000000U; unsigned int minor = ver / 10000U; ver -= minor * 10000U; unsigned int patch = ver / 10U; ver -= patch * 10U; const char *stability = ver == 0 ? "alpha" : ver == 1 ? "beta" : ""; snprintf(buf, sizeof(buf), "%u.%u.%u%s", major, minor, patch, stability); return buf; } /// \brief Parse the Index(es) from the given .xz file /// /// \param xfi Pointer to structure where the decoded information /// is stored. 
/// \param pair Input file /// /// \return On success, false is returned. On error, true is returned. /// static bool parse_indexes(xz_file_info *xfi, file_pair *pair) { if (pair->src_st.st_size <= 0) { - message_error(_("%s: File is empty"), pair->src_name); + message_error(_("%s: File is empty"), + tuklib_mask_nonprint(pair->src_name)); return true; } if (pair->src_st.st_size < 2 * LZMA_STREAM_HEADER_SIZE) { message_error(_("%s: Too small to be a valid .xz file"), - pair->src_name); + tuklib_mask_nonprint(pair->src_name)); return true; } io_buf buf; lzma_stream strm = LZMA_STREAM_INIT; lzma_index *idx = NULL; lzma_ret ret = lzma_file_info_decoder(&strm, &idx, hardware_memlimit_get(MODE_LIST), (uint64_t)(pair->src_st.st_size)); if (ret != LZMA_OK) { - message_error(_("%s: %s"), pair->src_name, message_strm(ret)); + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), + message_strm(ret)); return true; } while (true) { if (strm.avail_in == 0) { strm.next_in = buf.u8; strm.avail_in = io_read(pair, &buf, IO_BUFFER_SIZE); if (strm.avail_in == SIZE_MAX) goto error; } ret = lzma_code(&strm, LZMA_RUN); switch (ret) { case LZMA_OK: break; case LZMA_SEEK_NEEDED: // liblzma won't ask us to seek past the known size // of the input file. assert(strm.seek_pos <= (uint64_t)(pair->src_st.st_size)); if (io_seek_src(pair, strm.seek_pos)) goto error; // avail_in must be zero so that we will read new // input. strm.avail_in = 0; break; case LZMA_STREAM_END: { lzma_end(&strm); xfi->idx = idx; // Calculate xfi->stream_padding. lzma_index_iter iter; lzma_index_iter_init(&iter, xfi->idx); while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_STREAM)) xfi->stream_padding += iter.stream.padding; return false; } default: - message_error(_("%s: %s"), pair->src_name, + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), message_strm(ret)); // If the error was too low memory usage limit, // show also how much memory would have been needed. 
if (ret == LZMA_MEMLIMIT_ERROR) message_mem_needed(V_ERROR, lzma_memusage(&strm)); goto error; } } error: lzma_end(&strm); return true; } /// \brief Parse the Block Header /// /// The result is stored into *bhi. The caller takes care of initializing it. /// /// \return False on success, true on error. static bool parse_block_header(file_pair *pair, const lzma_index_iter *iter, block_header_info *bhi, xz_file_info *xfi) { #if IO_BUFFER_SIZE < LZMA_BLOCK_HEADER_SIZE_MAX # error IO_BUFFER_SIZE < LZMA_BLOCK_HEADER_SIZE_MAX #endif // Get the whole Block Header with one read, but don't read past // the end of the Block (or even its Check field). const uint32_t size = my_min(iter->block.total_size - lzma_check_size(iter->stream.flags->check), LZMA_BLOCK_HEADER_SIZE_MAX); io_buf buf; if (io_pread(pair, &buf, size, iter->block.compressed_file_offset)) return true; // Zero would mean Index Indicator and thus not a valid Block. if (buf.u8[0] == 0) goto data_error; // Initialize the block structure and decode Block Header Size. lzma_filter filters[LZMA_FILTERS_MAX + 1]; lzma_block block; block.version = 0; block.check = iter->stream.flags->check; block.filters = filters; block.header_size = lzma_block_header_size_decode(buf.u8[0]); if (block.header_size > size) goto data_error; // Decode the Block Header. switch (lzma_block_header_decode(&block, NULL, buf.u8)) { case LZMA_OK: break; case LZMA_OPTIONS_ERROR: - message_error(_("%s: %s"), pair->src_name, + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), message_strm(LZMA_OPTIONS_ERROR)); return true; case LZMA_DATA_ERROR: goto data_error; default: message_bug(); } // Check the Block Flags. These must be done before calling // lzma_block_compressed_size(), because it overwrites // block.compressed_size. // // NOTE: If you add new characters here, update the minimum number of // columns in headings[HEADING_HEADERFLAGS] to match the number of // characters used here. 
bhi->flags[0] = block.compressed_size != LZMA_VLI_UNKNOWN ? 'c' : '-'; bhi->flags[1] = block.uncompressed_size != LZMA_VLI_UNKNOWN ? 'u' : '-'; bhi->flags[2] = '\0'; // Collect information if all Blocks have both Compressed Size // and Uncompressed Size fields. They can be useful e.g. for // multi-threaded decompression so it can be useful to know it. xfi->all_have_sizes &= block.compressed_size != LZMA_VLI_UNKNOWN && block.uncompressed_size != LZMA_VLI_UNKNOWN; // Validate or set block.compressed_size. switch (lzma_block_compressed_size(&block, iter->block.unpadded_size)) { case LZMA_OK: // Validate also block.uncompressed_size if it is present. // If it isn't present, there's no need to set it since // we aren't going to actually decompress the Block; if // we were decompressing, then we should set it so that // the Block decoder could validate the Uncompressed Size // that was stored in the Index. if (block.uncompressed_size == LZMA_VLI_UNKNOWN || block.uncompressed_size == iter->block.uncompressed_size) break; // If the above fails, the file is corrupt so // LZMA_DATA_ERROR is a good error code. - - // Fall through + FALLTHROUGH; case LZMA_DATA_ERROR: // Free the memory allocated by lzma_block_header_decode(). lzma_filters_free(filters, NULL); goto data_error; default: message_bug(); } // Copy the known sizes. bhi->header_size = block.header_size; bhi->compressed_size = block.compressed_size; // Calculate the decoder memory usage and update the maximum // memory usage of this Block. bhi->memusage = lzma_raw_decoder_memusage(filters); if (xfi->memusage_max < bhi->memusage) xfi->memusage_max = bhi->memusage; // Determine the minimum XZ Utils version that supports this Block. // - RISC-V filter needs 5.6.0. // // - ARM64 filter needs 5.4.0. // // - 5.0.0 doesn't support empty LZMA2 streams and thus empty // Blocks that use LZMA2. This decoder bug was fixed in 5.0.2. 
if (xfi->min_version < 50060002U) { for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) { if (filters[i].id == LZMA_FILTER_RISCV) { xfi->min_version = 50060002U; break; } } } if (xfi->min_version < 50040002U) { for (size_t i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) { if (filters[i].id == LZMA_FILTER_ARM64) { xfi->min_version = 50040002U; break; } } } if (xfi->min_version < 50000022U) { size_t i = 0; while (filters[i + 1].id != LZMA_VLI_UNKNOWN) ++i; if (filters[i].id == LZMA_FILTER_LZMA2 && iter->block.uncompressed_size == 0) xfi->min_version = 50000022U; } // Convert the filter chain to human readable form. const lzma_ret str_ret = lzma_str_from_filters( &bhi->filter_chain, filters, LZMA_STR_DECODER | LZMA_STR_GETOPT_LONG, NULL); // Free the memory allocated by lzma_block_header_decode(). lzma_filters_free(filters, NULL); // Check if the stringification succeeded. if (str_ret != LZMA_OK) { - message_error(_("%s: %s"), pair->src_name, + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), message_strm(str_ret)); return true; } return false; data_error: // Show the error message. - message_error(_("%s: %s"), pair->src_name, + message_error(_("%s: %s"), + tuklib_mask_nonprint(pair->src_name), message_strm(LZMA_DATA_ERROR)); return true; } /// \brief Parse the Check field and put it into check_value[] /// /// \return False on success, true on error. static bool parse_check_value(file_pair *pair, const lzma_index_iter *iter) { // Don't read anything from the file if there is no integrity Check. if (iter->stream.flags->check == LZMA_CHECK_NONE) { snprintf(check_value, sizeof(check_value), "---"); return false; } // Locate and read the Check field. const uint32_t size = lzma_check_size(iter->stream.flags->check); const uint64_t offset = iter->block.compressed_file_offset + iter->block.total_size - size; io_buf buf; if (io_pread(pair, &buf, size, offset)) return true; // CRC32 and CRC64 are in little endian. 
Guess that all the future // 32-bit and 64-bit Check values are little endian too. It shouldn't // be too big a problem if this guess is wrong. if (size == 4) snprintf(check_value, sizeof(check_value), "%08" PRIx32, conv32le(buf.u32[0])); else if (size == 8) snprintf(check_value, sizeof(check_value), "%016" PRIx64, conv64le(buf.u64[0])); else for (size_t i = 0; i < size; ++i) snprintf(check_value + i * 2, 3, "%02x", buf.u8[i]); return false; } /// \brief Parse detailed information about a Block /// /// Since this requires seek(s), listing information about all Blocks can /// be slow. /// /// \param pair Input file /// \param iter Location of the Block whose Check value should /// be printed. /// \param bhi Pointer to structure where to store the information /// about the Block Header field. /// /// \return False on success, true on error. If an error occurs, /// the error message is printed too so the caller doesn't /// need to worry about that. static bool parse_details(file_pair *pair, const lzma_index_iter *iter, block_header_info *bhi, xz_file_info *xfi) { if (parse_block_header(pair, iter, bhi, xfi)) return true; if (parse_check_value(pair, iter)) return true; return false; } /// \brief Get the compression ratio /// /// This has a slightly different format than the one used in message.c. static const char * get_ratio(uint64_t compressed_size, uint64_t uncompressed_size) { if (uncompressed_size == 0) return "---"; const double ratio = (double)(compressed_size) / (double)(uncompressed_size); if (ratio > 9.999) return "---"; static char buf[16]; snprintf(buf, sizeof(buf), "%.3f", ratio); return buf; } /// \brief Get a comma-separated list of Check names /// /// The check names are translated with gettext except when in robot mode.
/// /// \param buf Buffer to hold the resulting string /// \param checks Bit mask of Checks to print /// \param space_after_comma /// It's better to not use spaces in table-like listings, /// but in more verbose formats a space after a comma /// is good for readability. static void get_check_names(char buf[CHECKS_STR_SIZE], uint32_t checks, bool space_after_comma) { // If we get called when there are no Checks to print, set checks // to 1 so that we print "None". This can happen in the robot mode // when printing the totals line if there are no valid input files. if (checks == 0) checks = 1; char *pos = buf; size_t left = CHECKS_STR_SIZE; const char *sep = space_after_comma ? ", " : ","; bool comma = false; for (size_t i = 0; i <= LZMA_CHECK_ID_MAX; ++i) { if (checks & (UINT32_C(1) << i)) { my_snprintf(&pos, &left, "%s%s", comma ? sep : "", opt_robot ? check_names[i] : _(check_names[i])); comma = true; } } return; } static bool print_info_basic(const xz_file_info *xfi, file_pair *pair) { static bool headings_displayed = false; if (!headings_displayed) { headings_displayed = true; // TRANSLATORS: These are column headings. From Strms (Streams) // to Ratio, the columns are right aligned. Check and Filename // are left aligned. If you need longer words, it's OK to // use two lines here. Test with "xz -l foo.xz". 
puts(_("Strms Blocks Compressed Uncompressed Ratio " "Check Filename")); } char checks[CHECKS_STR_SIZE]; get_check_names(checks, lzma_index_checks(xfi->idx), false); - const char *cols[7] = { + const char *cols[6] = { uint64_to_str(lzma_index_stream_count(xfi->idx), 0), uint64_to_str(lzma_index_block_count(xfi->idx), 1), uint64_to_nicestr(lzma_index_file_size(xfi->idx), NICESTR_B, NICESTR_TIB, false, 2), uint64_to_nicestr(lzma_index_uncompressed_size(xfi->idx), NICESTR_B, NICESTR_TIB, false, 3), get_ratio(lzma_index_file_size(xfi->idx), lzma_index_uncompressed_size(xfi->idx)), checks, - pair->src_name, }; printf("%*s %*s %*s %*s %*s %-*s %s\n", tuklib_mbstr_fw(cols[0], 5), cols[0], tuklib_mbstr_fw(cols[1], 7), cols[1], tuklib_mbstr_fw(cols[2], 11), cols[2], tuklib_mbstr_fw(cols[3], 11), cols[3], tuklib_mbstr_fw(cols[4], 5), cols[4], tuklib_mbstr_fw(cols[5], 7), cols[5], - cols[6]); + tuklib_mask_nonprint(pair->src_name)); return false; } static void print_adv_helper(uint64_t stream_count, uint64_t block_count, uint64_t compressed_size, uint64_t uncompressed_size, uint32_t checks, uint64_t stream_padding) { char checks_str[CHECKS_STR_SIZE]; get_check_names(checks_str, checks, true); printf(" %-*s %s\n", COLON_STR(COLON_STR_STREAMS), uint64_to_str(stream_count, 0)); printf(" %-*s %s\n", COLON_STR(COLON_STR_BLOCKS), uint64_to_str(block_count, 0)); printf(" %-*s %s\n", COLON_STR(COLON_STR_COMPRESSED_SIZE), uint64_to_nicestr(compressed_size, NICESTR_B, NICESTR_TIB, true, 0)); printf(" %-*s %s\n", COLON_STR(COLON_STR_UNCOMPRESSED_SIZE), uint64_to_nicestr(uncompressed_size, NICESTR_B, NICESTR_TIB, true, 0)); printf(" %-*s %s\n", COLON_STR(COLON_STR_RATIO), get_ratio(compressed_size, uncompressed_size)); printf(" %-*s %s\n", COLON_STR(COLON_STR_CHECK), checks_str); printf(" %-*s %s\n", COLON_STR(COLON_STR_STREAM_PADDING), uint64_to_nicestr(stream_padding, NICESTR_B, NICESTR_TIB, true, 0)); return; } static bool print_info_adv(xz_file_info *xfi, file_pair *pair) { // Print 
the overall information. print_adv_helper(lzma_index_stream_count(xfi->idx), lzma_index_block_count(xfi->idx), lzma_index_file_size(xfi->idx), lzma_index_uncompressed_size(xfi->idx), lzma_index_checks(xfi->idx), xfi->stream_padding); // Size of the biggest Check. This is used to calculate the width // of the CheckVal field. The table would get insanely wide if // we always reserved space for 64-byte Check (128 chars as hex). uint32_t check_max = 0; // Print information about the Streams. // // All except Check are right aligned; Check is left aligned. // Test with "xz -lv foo.xz". printf(" %s\n %*s %*s %*s %*s %*s %*s %*s %-*s %*s\n", _(colon_strs[COLON_STR_STREAMS]), HEADING_STR(HEADING_STREAM), HEADING_STR(HEADING_BLOCKS), HEADING_STR(HEADING_COMPOFFSET), HEADING_STR(HEADING_UNCOMPOFFSET), HEADING_STR(HEADING_COMPSIZE), HEADING_STR(HEADING_UNCOMPSIZE), HEADING_STR(HEADING_RATIO), HEADING_STR(HEADING_CHECK), HEADING_STR(HEADING_PADDING)); lzma_index_iter iter; lzma_index_iter_init(&iter, xfi->idx); while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_STREAM)) { const char *cols1[4] = { uint64_to_str(iter.stream.number, 0), uint64_to_str(iter.stream.block_count, 1), uint64_to_str(iter.stream.compressed_offset, 2), uint64_to_str(iter.stream.uncompressed_offset, 3), }; printf(" %*s %*s %*s %*s ", tuklib_mbstr_fw(cols1[0], headings[HEADING_STREAM].columns), cols1[0], tuklib_mbstr_fw(cols1[1], headings[HEADING_BLOCKS].columns), cols1[1], tuklib_mbstr_fw(cols1[2], headings[HEADING_COMPOFFSET].columns), cols1[2], tuklib_mbstr_fw(cols1[3], headings[HEADING_UNCOMPOFFSET].columns), cols1[3]); const char *cols2[5] = { uint64_to_str(iter.stream.compressed_size, 0), uint64_to_str(iter.stream.uncompressed_size, 1), get_ratio(iter.stream.compressed_size, iter.stream.uncompressed_size), _(check_names[iter.stream.flags->check]), uint64_to_str(iter.stream.padding, 2), }; printf("%*s %*s %*s %-*s %*s\n", tuklib_mbstr_fw(cols2[0], headings[HEADING_COMPSIZE].columns), cols2[0], 
tuklib_mbstr_fw(cols2[1], headings[HEADING_UNCOMPSIZE].columns), cols2[1], tuklib_mbstr_fw(cols2[2], headings[HEADING_RATIO].columns), cols2[2], tuklib_mbstr_fw(cols2[3], headings[HEADING_CHECK].columns), cols2[3], tuklib_mbstr_fw(cols2[4], headings[HEADING_PADDING].columns), cols2[4]); // Update the maximum Check size. if (lzma_check_size(iter.stream.flags->check) > check_max) check_max = lzma_check_size(iter.stream.flags->check); } // Cache the verbosity level to a local variable. const bool detailed = message_verbosity_get() >= V_DEBUG; // Print information about the Blocks but only if there is // at least one Block. if (lzma_index_block_count(xfi->idx) > 0) { // Calculate the width of the CheckVal column. This can be // used as is as the field width for printf() when printing // the actual check value as it is hexadecimal. However, to // print the column heading, further calculation is needed // to handle a translated string (it's done a few lines later). assert(check_max <= LZMA_CHECK_SIZE_MAX); const int checkval_width = my_max( headings[HEADING_CHECKVAL].columns, (int)(2 * check_max)); // All except Check are right aligned; Check is left aligned. printf(" %s\n %*s %*s %*s %*s %*s %*s %*s %-*s", _(colon_strs[COLON_STR_BLOCKS]), HEADING_STR(HEADING_STREAM), HEADING_STR(HEADING_BLOCK), HEADING_STR(HEADING_COMPOFFSET), HEADING_STR(HEADING_UNCOMPOFFSET), HEADING_STR(HEADING_TOTALSIZE), HEADING_STR(HEADING_UNCOMPSIZE), HEADING_STR(HEADING_RATIO), detailed ? headings[HEADING_CHECK].fw : 1, _(headings[HEADING_CHECK].str)); if (detailed) { // CheckVal (Check value), Flags, and Filters are // left aligned. Block Header Size, CompSize, and // MemUsage are right aligned. Test with // "xz -lvv foo.xz". 
printf(" %-*s %*s %-*s %*s %*s %s", headings[HEADING_CHECKVAL].fw + checkval_width - headings[HEADING_CHECKVAL].columns, _(headings[HEADING_CHECKVAL].str), HEADING_STR(HEADING_HEADERSIZE), HEADING_STR(HEADING_HEADERFLAGS), HEADING_STR(HEADING_COMPSIZE), HEADING_STR(HEADING_MEMUSAGE), _(headings[HEADING_FILTERS].str)); } putchar('\n'); lzma_index_iter_init(&iter, xfi->idx); // Iterate over the Blocks. while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_BLOCK)) { // If in detailed mode, collect the information from // Block Header before starting to print the next line. block_header_info bhi = BLOCK_HEADER_INFO_INIT; if (detailed && parse_details(pair, &iter, &bhi, xfi)) return true; const char *cols1[4] = { uint64_to_str(iter.stream.number, 0), uint64_to_str( iter.block.number_in_stream, 1), uint64_to_str( iter.block.compressed_file_offset, 2), uint64_to_str( iter.block.uncompressed_file_offset, 3) }; printf(" %*s %*s %*s %*s ", tuklib_mbstr_fw(cols1[0], headings[HEADING_STREAM].columns), cols1[0], tuklib_mbstr_fw(cols1[1], headings[HEADING_BLOCK].columns), cols1[1], tuklib_mbstr_fw(cols1[2], headings[HEADING_COMPOFFSET].columns), cols1[2], tuklib_mbstr_fw(cols1[3], headings[ HEADING_UNCOMPOFFSET].columns), cols1[3]); const char *cols2[4] = { uint64_to_str(iter.block.total_size, 0), uint64_to_str(iter.block.uncompressed_size, 1), get_ratio(iter.block.total_size, iter.block.uncompressed_size), _(check_names[iter.stream.flags->check]) }; printf("%*s %*s %*s %-*s", tuklib_mbstr_fw(cols2[0], headings[HEADING_TOTALSIZE].columns), cols2[0], tuklib_mbstr_fw(cols2[1], headings[HEADING_UNCOMPSIZE].columns), cols2[1], tuklib_mbstr_fw(cols2[2], headings[HEADING_RATIO].columns), cols2[2], tuklib_mbstr_fw(cols2[3], detailed ? 
headings[HEADING_CHECK].columns : 1), cols2[3]); if (detailed) { const lzma_vli compressed_size = iter.block.unpadded_size - bhi.header_size - lzma_check_size( iter.stream.flags->check); const char *cols3[6] = { check_value, uint64_to_str(bhi.header_size, 0), bhi.flags, uint64_to_str(compressed_size, 1), uint64_to_str( round_up_to_mib(bhi.memusage), 2), bhi.filter_chain }; // Show MiB for memory usage, because it // is the only size which is not in bytes. printf(" %-*s %*s %-*s %*s %*s MiB %s", checkval_width, cols3[0], tuklib_mbstr_fw(cols3[1], headings[ HEADING_HEADERSIZE].columns), cols3[1], tuklib_mbstr_fw(cols3[2], headings[ HEADING_HEADERFLAGS].columns), cols3[2], tuklib_mbstr_fw(cols3[3], headings[ HEADING_COMPSIZE].columns), cols3[3], tuklib_mbstr_fw(cols3[4], headings[ HEADING_MEMUSAGE].columns - 4), cols3[4], cols3[5]); } putchar('\n'); block_header_info_end(&bhi); } } if (detailed) { printf(" %-*s %s MiB\n", COLON_STR(COLON_STR_MEMORY_NEEDED), uint64_to_str( round_up_to_mib(xfi->memusage_max), 0)); printf(" %-*s %s\n", COLON_STR(COLON_STR_SIZES_IN_HEADERS), xfi->all_have_sizes ? _("Yes") : _("No")); //printf(" %-*s %s\n", COLON_STR(COLON_STR_MINIMUM_XZ_VERSION), - printf(_(" Minimum XZ Utils version: %s\n"), + printf(" %s %s\n", _("Minimum XZ Utils version:"), xz_ver_to_str(xfi->min_version)); } return false; } static bool print_info_robot(xz_file_info *xfi, file_pair *pair) { char checks[CHECKS_STR_SIZE]; get_check_names(checks, lzma_index_checks(xfi->idx), false); - printf("name\t%s\n", pair->src_name); + // Robot mode has to mask at least some control chars to prevent + // the output from getting out of sync if filename is malicious. + // Masking all non-printable chars is more than we need but + // perhaps this is good enough in practice. 
+ printf("name\t%s\n", tuklib_mask_nonprint(pair->src_name)); printf("file\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%s\t%s\t%" PRIu64 "\n", lzma_index_stream_count(xfi->idx), lzma_index_block_count(xfi->idx), lzma_index_file_size(xfi->idx), lzma_index_uncompressed_size(xfi->idx), get_ratio(lzma_index_file_size(xfi->idx), lzma_index_uncompressed_size(xfi->idx)), checks, xfi->stream_padding); if (message_verbosity_get() >= V_VERBOSE) { lzma_index_iter iter; lzma_index_iter_init(&iter, xfi->idx); while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_STREAM)) printf("stream\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%s\t%s\t%" PRIu64 "\n", iter.stream.number, iter.stream.block_count, iter.stream.compressed_offset, iter.stream.uncompressed_offset, iter.stream.compressed_size, iter.stream.uncompressed_size, get_ratio(iter.stream.compressed_size, iter.stream.uncompressed_size), check_names[iter.stream.flags->check], iter.stream.padding); lzma_index_iter_rewind(&iter); while (!lzma_index_iter_next(&iter, LZMA_INDEX_ITER_BLOCK)) { block_header_info bhi = BLOCK_HEADER_INFO_INIT; if (message_verbosity_get() >= V_DEBUG && parse_details( pair, &iter, &bhi, xfi)) return true; printf("block\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%s\t%s", iter.stream.number, iter.block.number_in_stream, iter.block.number_in_file, iter.block.compressed_file_offset, iter.block.uncompressed_file_offset, iter.block.total_size, iter.block.uncompressed_size, get_ratio(iter.block.total_size, iter.block.uncompressed_size), check_names[iter.stream.flags->check]); if (message_verbosity_get() >= V_DEBUG) printf("\t%s\t%" PRIu32 "\t%s\t%" PRIu64 "\t%" PRIu64 "\t%s", check_value, bhi.header_size, bhi.flags, bhi.compressed_size, bhi.memusage, bhi.filter_chain); putchar('\n'); block_header_info_end(&bhi); } } if (message_verbosity_get() >= V_DEBUG) printf("summary\t%" PRIu64 "\t%s\t%" PRIu32 "\n", 
xfi->memusage_max, xfi->all_have_sizes ? "yes" : "no", xfi->min_version); return false; } static void update_totals(const xz_file_info *xfi) { // TODO: Integer overflow checks ++totals.files; totals.streams += lzma_index_stream_count(xfi->idx); totals.blocks += lzma_index_block_count(xfi->idx); totals.compressed_size += lzma_index_file_size(xfi->idx); totals.uncompressed_size += lzma_index_uncompressed_size(xfi->idx); totals.stream_padding += xfi->stream_padding; totals.checks |= lzma_index_checks(xfi->idx); if (totals.memusage_max < xfi->memusage_max) totals.memusage_max = xfi->memusage_max; if (totals.min_version < xfi->min_version) totals.min_version = xfi->min_version; totals.all_have_sizes &= xfi->all_have_sizes; return; } static void print_totals_basic(void) { // Print a separator line. char line[80]; memset(line, '-', sizeof(line)); line[sizeof(line) - 1] = '\0'; puts(line); // Get the check names. char checks[CHECKS_STR_SIZE]; get_check_names(checks, totals.checks, false); // Print the totals except the file count, which needs // special handling. printf("%5s %7s %11s %11s %5s %-7s ", uint64_to_str(totals.streams, 0), uint64_to_str(totals.blocks, 1), uint64_to_nicestr(totals.compressed_size, NICESTR_B, NICESTR_TIB, false, 2), uint64_to_nicestr(totals.uncompressed_size, NICESTR_B, NICESTR_TIB, false, 3), get_ratio(totals.compressed_size, totals.uncompressed_size), checks); #if defined(__sun) && (defined(__GNUC__) || defined(__clang__)) # pragma GCC diagnostic push # pragma GCC diagnostic ignored "-Wformat-nonliteral" #endif // Since we print totals only when there are at least two files, // the English message will always use "%s files". But some other // languages need different forms for different plurals so we // have to translate this with ngettext(). // // TRANSLATORS: %s is an integer. Only the plural form of this // message is used (e.g. "2 files"). Test with "xz -l foo.xz bar.xz". printf(ngettext("%s file\n", "%s files\n", totals.files <= ULONG_MAX ? 
totals.files : (totals.files % 1000000) + 1000000), uint64_to_str(totals.files, 0)); #if defined(__sun) && (defined(__GNUC__) || defined(__clang__)) # pragma GCC diagnostic pop #endif return; } static void print_totals_adv(void) { putchar('\n'); puts(_("Totals:")); printf(" %-*s %s\n", COLON_STR(COLON_STR_NUMBER_OF_FILES), uint64_to_str(totals.files, 0)); print_adv_helper(totals.streams, totals.blocks, totals.compressed_size, totals.uncompressed_size, totals.checks, totals.stream_padding); if (message_verbosity_get() >= V_DEBUG) { printf(" %-*s %s MiB\n", COLON_STR(COLON_STR_MEMORY_NEEDED), uint64_to_str( round_up_to_mib(totals.memusage_max), 0)); printf(" %-*s %s\n", COLON_STR(COLON_STR_SIZES_IN_HEADERS), totals.all_have_sizes ? _("Yes") : _("No")); //printf(" %-*s %s\n", COLON_STR(COLON_STR_MINIMUM_XZ_VERSION), - printf(_(" Minimum XZ Utils version: %s\n"), + printf(" %s %s\n", _("Minimum XZ Utils version:"), xz_ver_to_str(totals.min_version)); } return; } static void print_totals_robot(void) { char checks[CHECKS_STR_SIZE]; get_check_names(checks, totals.checks, false); printf("totals\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%" PRIu64 "\t%s\t%s\t%" PRIu64 "\t%" PRIu64, totals.streams, totals.blocks, totals.compressed_size, totals.uncompressed_size, get_ratio(totals.compressed_size, totals.uncompressed_size), checks, totals.stream_padding, totals.files); if (message_verbosity_get() >= V_DEBUG) printf("\t%" PRIu64 "\t%s\t%" PRIu32, totals.memusage_max, totals.all_have_sizes ? "yes" : "no", totals.min_version); putchar('\n'); return; } extern void list_totals(void) { if (opt_robot) { // Always print totals in --robot mode. It can be convenient // in some cases and doesn't complicate usage of the // single-file case much. print_totals_robot(); } else if (totals.files > 1) { // For non-robot mode, totals are printed only if there // is more than one file. 
if (message_verbosity_get() <= V_WARNING) print_totals_basic(); else print_totals_adv(); } return; } extern void list_file(const char *filename) { if (opt_format != FORMAT_XZ && opt_format != FORMAT_AUTO) { // The 'lzmainfo' message is printed only when --format=lzma // is used (it is implied if using "lzma" as the command // name). Thus instead of using message_fatal(), print // the messages separately and then call tuklib_exit() // like message_fatal() does. message(V_ERROR, _("--list works only on .xz files " "(--format=xz or --format=auto)")); if (opt_format == FORMAT_LZMA) message(V_ERROR, _("Try 'lzmainfo' with .lzma files.")); tuklib_exit(E_ERROR, E_ERROR, false); } message_filename(filename); if (filename == stdin_filename) { message_error(_("--list does not support reading from " "standard input")); return; } init_field_widths(); // Unset opt_stdout so that io_open_src() won't accept special files. // Set opt_force so that io_open_src() will follow symlinks. opt_stdout = false; opt_force = true; file_pair *pair = io_open_src(filename); if (pair == NULL) return; xz_file_info xfi = XZ_FILE_INFO_INIT; if (!parse_indexes(&xfi, pair)) { bool fail; // We have three main modes: // - --robot, which has submodes if --verbose is specified // once or twice // - Normal --list without --verbose // - --list with one or two --verbose if (opt_robot) fail = print_info_robot(&xfi, pair); else if (message_verbosity_get() <= V_WARNING) fail = print_info_basic(&xfi, pair); else fail = print_info_adv(&xfi, pair); // Update the totals that are displayed after all // the individual files have been listed. Don't count // broken files. 
if (!fail) update_totals(&xfi); lzma_index_end(xfi.idx, NULL); } io_close(pair, false); return; } diff --git a/src/xz/main.c b/src/xz/main.c index 71b5ef7b7001..1b8b37881172 100644 --- a/src/xz/main.c +++ b/src/xz/main.c @@ -1,367 +1,371 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file main.c /// \brief main() // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "private.h" #include /// Exit status to use. This can be changed with set_exit_status(). static enum exit_status_type exit_status = E_SUCCESS; #if defined(_WIN32) && !defined(__CYGWIN__) /// exit_status has to be protected with a critical section due to /// how "signal handling" is done on Windows. See signals.c for details. static CRITICAL_SECTION exit_status_cs; #endif /// True if --no-warn is specified. When this is true, we don't set /// the exit status to E_WARNING when something worth a warning happens. static bool no_warn = false; extern void set_exit_status(enum exit_status_type new_status) { assert(new_status == E_WARNING || new_status == E_ERROR); #if defined(_WIN32) && !defined(__CYGWIN__) EnterCriticalSection(&exit_status_cs); #endif if (exit_status != E_ERROR) exit_status = new_status; #if defined(_WIN32) && !defined(__CYGWIN__) LeaveCriticalSection(&exit_status_cs); #endif return; } extern void set_exit_no_warn(void) { no_warn = true; return; } static const char * read_name(const args_info *args) { // FIXME: Maybe we should have some kind of memory usage limit here // like the tool has for the actual compression and decompression. // Giving some huge text file with --files0 makes us to read the // whole file in RAM. static char *name = NULL; static size_t size = 256; // Allocate the initial buffer. This is never freed, since after it // is no longer needed, the program exits very soon. 
It is safe to // use xmalloc() and xrealloc() in this function, because while // executing this function, no files are open for writing, and thus // there's no need to cleanup anything before exiting. if (name == NULL) name = xmalloc(size); // Write position in name size_t pos = 0; // Read one character at a time into name. while (!user_abort) { const int c = fgetc(args->files_file); if (ferror(args->files_file)) { // Take care of EINTR since we have established // the signal handlers already. if (errno == EINTR) continue; message_error(_("%s: Error reading filenames: %s"), - args->files_name, strerror(errno)); + tuklib_mask_nonprint(args->files_name), + strerror(errno)); return NULL; } if (feof(args->files_file)) { if (pos != 0) message_error(_("%s: Unexpected end of input " "when reading filenames"), - args->files_name); + tuklib_mask_nonprint( + args->files_name)); return NULL; } if (c == args->files_delim) { // We allow consecutive newline (--files) or '\0' // characters (--files0), and ignore such empty // filenames. if (pos == 0) continue; // A non-empty name was read. Terminate it with '\0' // and return it. name[pos] = '\0'; return name; } if (c == '\0') { // A null character was found when using --files, // which expects plain text input separated with // newlines. message_error(_("%s: Null character found when " "reading filenames; maybe you meant " "to use '--files0' instead " - "of '--files'?"), args->files_name); + "of '--files'?"), + tuklib_mask_nonprint( + args->files_name)); return NULL; } name[pos++] = c; // Allocate more memory if needed. There must always be space // at least for one character to allow terminating the string // with '\0'. if (pos == size) { size *= 2; name = xrealloc(name, size); } } return NULL; } int main(int argc, char **argv) { #if defined(_WIN32) && !defined(__CYGWIN__) InitializeCriticalSection(&exit_status_cs); #endif // Set up the progname variable needed for messages. 
tuklib_progname_init(argv); // Initialize the file I/O. This makes sure that // stdin, stdout, and stderr are something valid. // This must be done before we might open any files // even indirectly like locale and gettext initializations. io_init(); #ifdef ENABLE_SANDBOX // Enable such sandboxing that can always be enabled. // This requires that progname has been set up. // It's also good that io_init() has been called because it // might need to do things that the initial sandbox won't allow. // Otherwise this should be called as early as possible. // // NOTE: Calling this before tuklib_gettext_init() means that // translated error message won't be available if sandbox // initialization fails. However, sandbox_init() shouldn't // fail and this order simply feels better. sandbox_init(); #endif // Set up the locale and message translations. tuklib_gettext_init(PACKAGE, LOCALEDIR); // Initialize progress message handling. It's not always needed // but it's simpler to do this unconditionally. message_init(); // Set hardware-dependent default values. These can be overridden // on the command line, thus this must be done before args_parse(). hardware_init(); // Parse the command line arguments and get an array of filenames. // This doesn't return if something is wrong with the command line // arguments. If there are no arguments, one filename ("-") is still // returned to indicate stdin. args_info args; args_parse(&args, argc, argv); if (opt_mode != MODE_LIST && opt_robot) message_fatal(_("Compression and decompression with --robot " "are not supported yet.")); // Tell the message handling code how many input files there are if // we know it. This way the progress indicator can show it. if (args.files_name != NULL) message_set_files(0); else message_set_files(args.arg_count); // Refuse to write compressed data to standard output if it is // a terminal. 
if (opt_mode == MODE_COMPRESS) { if (opt_stdout || (args.arg_count == 1 && strcmp(args.arg_names[0], "-") == 0)) { if (is_tty_stdout()) { message_try_help(); tuklib_exit(E_ERROR, E_ERROR, false); } } } // Set up the signal handlers. We don't need these before we // start the actual action and not in --list mode, so this is // done after parsing the command line arguments. // // It's good to keep signal handlers in normal compression and // decompression modes even when only writing to stdout, because // we might need to restore O_APPEND flag on stdout before exiting. // In --test mode, signal handlers aren't really needed, but let's // keep them there for consistency with normal decompression. if (opt_mode != MODE_LIST) signals_init(); #ifdef ENABLE_SANDBOX // Read-only sandbox can be enabled if we won't create or delete // any files: // // - --stdout, --test, or --list was used. Note that --test // implies opt_stdout = true but --list doesn't. // // - Output goes to stdout because --files or --files0 wasn't used // and no arguments were given on the command line or the // arguments are all "-" (indicating standard input). bool to_stdout_only = opt_stdout || opt_mode == MODE_LIST; if (!to_stdout_only && args.files_name == NULL) { // If all of the filenames provided are "-" (more than one // "-" could be specified), then we are only going to be // writing to standard output. Note that if no filename args // were provided, args.c puts a single "-" in arg_names[0]. to_stdout_only = true; for (unsigned i = 0; i < args.arg_count; ++i) { if (strcmp("-", args.arg_names[i]) != 0) { to_stdout_only = false; break; } } } if (to_stdout_only) { sandbox_enable_read_only(); // Allow strict sandboxing if we are processing exactly one // file to standard output. This requires that --files or // --files0 wasn't specified (an unknown number of filenames // could be provided that way). 
if (args.files_name == NULL && args.arg_count == 1) sandbox_allow_strict(); } #endif // coder_run() handles compression, decompression, and testing. // list_file() is for --list. void (*run)(const char *filename) = &coder_run; #ifdef HAVE_DECODERS if (opt_mode == MODE_LIST) run = &list_file; #endif // Process the files given on the command line. Note that if no names // were given, args_parse() gave us a fake "-" filename. for (unsigned i = 0; i < args.arg_count && !user_abort; ++i) { if (strcmp("-", args.arg_names[i]) == 0) { // Processing from stdin to stdout. Check that we // aren't writing compressed data to a terminal or // reading it from a terminal. if (opt_mode == MODE_COMPRESS) { if (is_tty_stdout()) continue; } else if (is_tty_stdin()) { continue; } // It doesn't make sense to compress data from stdin // if we are supposed to read filenames from stdin // too (enabled with --files or --files0). if (args.files_name == stdin_filename) { message_error(_("Cannot read data from " "standard input when " "reading filenames " "from standard input")); continue; } // Replace the "-" with a special pointer, which is // recognized by coder_run() and other things. // This way error messages get a proper filename // string and the code still knows that it is // handling the special case of stdin. args.arg_names[i] = (char *)stdin_filename; } // Do the actual compression or decompression. run(args.arg_names[i]); } // If --files or --files0 was used, process the filenames from the // given file or stdin. Note that here we don't consider "-" to // indicate stdin like we do with the command line arguments. if (args.files_name != NULL) { // read_name() checks for user_abort so we don't need to // check it as loop termination condition. while (true) { const char *name = read_name(&args); if (name == NULL) break; // read_name() doesn't return empty names. 
assert(name[0] != '\0'); run(name); } if (args.files_name != stdin_filename) (void)fclose(args.files_file); } #ifdef HAVE_DECODERS // All files have now been handled. If in --list mode, display // the totals before exiting. We don't have signal handlers // enabled in --list mode, so we don't need to check user_abort. if (opt_mode == MODE_LIST) { assert(!user_abort); list_totals(); } #endif #ifndef NDEBUG coder_free(); args_free(); #endif // If we have got a signal, raise it to kill the program instead // of calling tuklib_exit(). signals_exit(); // Make a local copy of exit_status to keep the Windows code // thread safe. At this point it is fine if we miss the user // pressing C-c and don't set the exit_status to E_ERROR on // Windows. #if defined(_WIN32) && !defined(__CYGWIN__) EnterCriticalSection(&exit_status_cs); #endif enum exit_status_type es = exit_status; #if defined(_WIN32) && !defined(__CYGWIN__) LeaveCriticalSection(&exit_status_cs); #endif // Suppress the exit status indicating a warning if --no-warn // was specified. if (es == E_WARNING && no_warn) es = E_SUCCESS; tuklib_exit((int)es, E_ERROR, message_verbosity_get() != V_SILENT); } diff --git a/src/xz/message.c b/src/xz/message.c index deafdb438320..7657e85648da 100644 --- a/src/xz/message.c +++ b/src/xz/message.c @@ -1,1172 +1,1326 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file message.c /// \brief Printing messages // // Authors: Lasse Collin // Jia Tan // /////////////////////////////////////////////////////////////////////////////// #include "private.h" - +#include "tuklib_mbstr_wrap.h" #include /// Number of the current file static unsigned int files_pos = 0; /// Total number of input files; zero if unknown. 
static unsigned int files_total; /// Verbosity level static enum message_verbosity verbosity = V_WARNING; /// Filename which we will print with the verbose messages static const char *filename; /// True once the a filename has been printed to stderr as part of progress /// message. If automatic progress updating isn't enabled, this becomes true /// after the first progress message has been printed due to user sending /// SIGINFO, SIGUSR1, or SIGALRM. Once this variable is true, we will print /// an empty line before the next filename to make the output more readable. static bool first_filename_printed = false; /// This is set to true when we have printed the current filename to stderr /// as part of a progress message. This variable is useful only if not /// updating progress automatically: if user sends many SIGINFO, SIGUSR1, or /// SIGALRM signals, we won't print the name of the same file multiple times. static bool current_filename_printed = false; /// True if we should print progress indicator and update it automatically /// if also verbose >= V_VERBOSE. static bool progress_automatic = false; /// True if message_progress_start() has been called but /// message_progress_end() hasn't been called yet. static bool progress_started = false; /// This is true when a progress message was printed and the cursor is still /// on the same line with the progress message. In that case, a newline has /// to be printed before any error messages. static bool progress_active = false; /// Pointer to lzma_stream used to do the encoding or decoding. static lzma_stream *progress_strm; /// This is true if we are in passthru mode (not actually compressing or /// decompressing) and thus cannot use lzma_get_progress(progress_strm, ...). /// That is, we are using coder_passthru() in coder.c. static bool progress_is_from_passthru; /// Expected size of the input stream is needed to show completion percentage /// and estimate remaining time. 
static uint64_t expected_in_size; // Use alarm() and SIGALRM when they are supported. This has two minor // advantages over the alternative of polling gettimeofday(): // - It is possible for the user to send SIGINFO, SIGUSR1, or SIGALRM to // get intermediate progress information even when --verbose wasn't used // or stderr is not a terminal. // - alarm() + SIGALRM seems to have slightly less overhead than polling // gettimeofday(). #ifdef SIGALRM const int message_progress_sigs[] = { SIGALRM, #ifdef SIGINFO SIGINFO, #endif #ifdef SIGUSR1 SIGUSR1, #endif 0 }; /// The signal handler for SIGALRM sets this to true. It is set back to false /// once the progress message has been updated. static volatile sig_atomic_t progress_needs_updating = false; /// Signal handler for SIGALRM static void progress_signal_handler(int sig lzma_attribute((__unused__))) { progress_needs_updating = true; return; } #else /// This is true when progress message printing is wanted. Using the same /// variable name as above to avoid some ifdefs. static bool progress_needs_updating = false; /// Elapsed time when the next progress message update should be done. static uint64_t progress_next_update; #endif extern void message_init(void) { // If --verbose is used, we use a progress indicator if and only // if stderr is a terminal. If stderr is not a terminal, we print // verbose information only after finishing the file. As a special // exception, even if --verbose was not used, user can send SIGALRM // to make us print progress information once without automatic // updating. progress_automatic = is_tty(STDERR_FILENO); #ifdef SIGALRM // Establish the signal handlers which set a flag to tell us that // progress info should be updated. 
struct sigaction sa; sigemptyset(&sa.sa_mask); sa.sa_flags = 0; sa.sa_handler = &progress_signal_handler; for (size_t i = 0; message_progress_sigs[i] != 0; ++i) if (sigaction(message_progress_sigs[i], &sa, NULL)) message_signal_handler(); #endif return; } extern void message_verbosity_increase(void) { if (verbosity < V_DEBUG) ++verbosity; return; } extern void message_verbosity_decrease(void) { if (verbosity > V_SILENT) --verbosity; return; } extern enum message_verbosity message_verbosity_get(void) { return verbosity; } extern void message_set_files(unsigned int files) { files_total = files; return; } /// Prints the name of the current file if it hasn't been printed already, /// except if we are processing exactly one stream from stdin to stdout. /// I think it looks nicer to not print "(stdin)" when --verbose is used /// in a pipe and no other files are processed. static void print_filename(void) { if (!opt_robot && (files_total != 1 || filename != stdin_filename)) { signals_block(); FILE *file = opt_mode == MODE_LIST ? stdout : stderr; // If a file was already processed, put an empty line // before the next filename to improve readability. if (first_filename_printed) fputc('\n', file); first_filename_printed = true; current_filename_printed = true; // If we don't know how many files there will be due // to usage of --files or --files0. if (files_total == 0) - fprintf(file, "%s (%u)\n", filename, + fprintf(file, "%s (%u)\n", + tuklib_mask_nonprint(filename), files_pos); else - fprintf(file, "%s (%u/%u)\n", filename, + fprintf(file, "%s (%u/%u)\n", + tuklib_mask_nonprint(filename), files_pos, files_total); signals_unblock(); } return; } extern void message_filename(const char *src_name) { // Start numbering the files starting from one. 
++files_pos; filename = src_name; if (verbosity >= V_VERBOSE && (progress_automatic || opt_mode == MODE_LIST)) print_filename(); else current_filename_printed = false; return; } extern void message_progress_start(lzma_stream *strm, bool is_passthru, uint64_t in_size) { // Store the pointer to the lzma_stream used to do the coding. // It is needed to find out the position in the stream. progress_strm = strm; progress_is_from_passthru = is_passthru; // Store the expected size of the file. If we aren't printing any // statistics, then is will be unused. But since it is possible // that the user sends us a signal to show statistics, we need // to have it available anyway. expected_in_size = in_size; // Indicate that progress info may need to be printed before // printing error messages. progress_started = true; // If progress indicator is wanted, print the filename and possibly // the file count now. if (verbosity >= V_VERBOSE && progress_automatic) { // Start the timer to display the first progress message // after one second. An alternative would be to show the // first message almost immediately, but delaying by one // second looks better to me, since extremely early // progress info is pretty much useless. #ifdef SIGALRM // First disable a possibly existing alarm. alarm(0); progress_needs_updating = false; alarm(1); #else progress_needs_updating = true; progress_next_update = 1000; #endif } return; } /// Make the string indicating completion percentage. static const char * progress_percentage(uint64_t in_pos) { // If the size of the input file is unknown or the size told us is // clearly wrong since we have processed more data than the alleged // size of the file, show a static string indicating that we have // no idea of the completion percentage. if (expected_in_size == 0 || in_pos > expected_in_size) return "--- %"; // Never show 100.0 % before we actually are finished. 
double percentage = (double)(in_pos) / (double)(expected_in_size) * 99.9; // Use big enough buffer to hold e.g. a multibyte decimal point. static char buf[16]; snprintf(buf, sizeof(buf), "%.1f %%", percentage); return buf; } /// Make the string containing the amount of input processed, amount of /// output produced, and the compression ratio. static const char * progress_sizes(uint64_t compressed_pos, uint64_t uncompressed_pos, bool final) { // Use big enough buffer to hold e.g. a multibyte thousand separators. static char buf[128]; char *pos = buf; size_t left = sizeof(buf); // Print the sizes. If this the final message, use more reasonable // units than MiB if the file was small. const enum nicestr_unit unit_min = final ? NICESTR_B : NICESTR_MIB; my_snprintf(&pos, &left, "%s / %s", uint64_to_nicestr(compressed_pos, unit_min, NICESTR_TIB, false, 0), uint64_to_nicestr(uncompressed_pos, unit_min, NICESTR_TIB, false, 1)); // Avoid division by zero. If we cannot calculate the ratio, set // it to some nice number greater than 10.0 so that it gets caught // in the next if-clause. const double ratio = uncompressed_pos > 0 ? (double)(compressed_pos) / (double)(uncompressed_pos) : 16.0; // If the ratio is very bad, just indicate that it is greater than // 9.999. This way the length of the ratio field stays fixed. if (ratio > 9.999) snprintf(pos, left, " > %.3f", 9.999); else snprintf(pos, left, " = %.3f", ratio); return buf; } /// Make the string containing the processing speed of uncompressed data. static const char * progress_speed(uint64_t uncompressed_pos, uint64_t elapsed) { // Don't print the speed immediately, since the early values look // somewhat random. if (elapsed < 3000) return ""; // The first character of KiB/s, MiB/s, or GiB/s: static const char unit[] = { 'K', 'M', 'G' }; size_t unit_index = 0; // Calculate the speed as KiB/s. double speed = (double)(uncompressed_pos) / ((double)(elapsed) * (1024.0 / 1000.0)); // Adjust the unit of the speed if needed. 
while (speed > 999.0) { speed /= 1024.0; if (++unit_index == ARRAY_SIZE(unit)) return ""; // Way too fast ;-) } // Use decimal point only if the number is small. Examples: // - 0.1 KiB/s // - 9.9 KiB/s // - 99 KiB/s // - 999 KiB/s // Use big enough buffer to hold e.g. a multibyte decimal point. static char buf[16]; snprintf(buf, sizeof(buf), "%.*f %ciB/s", speed > 9.9 ? 0 : 1, speed, unit[unit_index]); return buf; } /// Make a string indicating elapsed time. The format is either /// M:SS or H:MM:SS depending on if the time is an hour or more. static const char * progress_time(uint64_t mseconds) { // 9999 hours = 416 days static char buf[sizeof("9999:59:59")]; // 32-bit variable is enough for elapsed time (136 years). uint32_t seconds = (uint32_t)(mseconds / 1000); // Don't show anything if the time is zero or ridiculously big. if (seconds == 0 || seconds > ((9999 * 60) + 59) * 60 + 59) return ""; uint32_t minutes = seconds / 60; seconds %= 60; if (minutes >= 60) { const uint32_t hours = minutes / 60; minutes %= 60; snprintf(buf, sizeof(buf), "%" PRIu32 ":%02" PRIu32 ":%02" PRIu32, hours, minutes, seconds); } else { snprintf(buf, sizeof(buf), "%" PRIu32 ":%02" PRIu32, minutes, seconds); } return buf; } /// Return a string containing estimated remaining time when /// reasonably possible. static const char * progress_remaining(uint64_t in_pos, uint64_t elapsed) { // Don't show the estimated remaining time when it wouldn't // make sense: // - Input size is unknown. // - Input has grown bigger since we started (de)compressing. // - We haven't processed much data yet, so estimate would be // too inaccurate. // - Only a few seconds has passed since we started (de)compressing, // so estimate would be too inaccurate. if (expected_in_size == 0 || in_pos > expected_in_size || in_pos < (UINT64_C(1) << 19) || elapsed < 8000) return ""; // Calculate the estimate. 
Don't give an estimate of zero seconds, // since it is possible that all the input has been already passed // to the library, but there is still quite a bit of output pending. uint32_t remaining = (uint32_t)((double)(expected_in_size - in_pos) * ((double)(elapsed) / 1000.0) / (double)(in_pos)); if (remaining < 1) remaining = 1; static char buf[sizeof("9 h 55 min")]; // Select appropriate precision for the estimated remaining time. if (remaining <= 10) { // A maximum of 10 seconds remaining. // Show the number of seconds as is. snprintf(buf, sizeof(buf), "%" PRIu32 " s", remaining); } else if (remaining <= 50) { // A maximum of 50 seconds remaining. // Round up to the next multiple of five seconds. remaining = (remaining + 4) / 5 * 5; snprintf(buf, sizeof(buf), "%" PRIu32 " s", remaining); } else if (remaining <= 590) { // A maximum of 9 minutes and 50 seconds remaining. // Round up to the next multiple of ten seconds. remaining = (remaining + 9) / 10 * 10; snprintf(buf, sizeof(buf), "%" PRIu32 " min %" PRIu32 " s", remaining / 60, remaining % 60); } else if (remaining <= 59 * 60) { // A maximum of 59 minutes remaining. // Round up to the next multiple of a minute. remaining = (remaining + 59) / 60; snprintf(buf, sizeof(buf), "%" PRIu32 " min", remaining); } else if (remaining <= 9 * 3600 + 50 * 60) { // A maximum of 9 hours and 50 minutes left. // Round up to the next multiple of ten minutes. remaining = (remaining + 599) / 600 * 10; snprintf(buf, sizeof(buf), "%" PRIu32 " h %" PRIu32 " min", remaining / 60, remaining % 60); } else if (remaining <= 23 * 3600) { // A maximum of 23 hours remaining. // Round up to the next multiple of an hour. remaining = (remaining + 3599) / 3600; snprintf(buf, sizeof(buf), "%" PRIu32 " h", remaining); } else if (remaining <= 9 * 24 * 3600 + 23 * 3600) { // A maximum of 9 days and 23 hours remaining. // Round up to the next multiple of an hour. 
remaining = (remaining + 3599) / 3600; snprintf(buf, sizeof(buf), "%" PRIu32 " d %" PRIu32 " h", remaining / 24, remaining % 24); } else if (remaining <= 999 * 24 * 3600) { // A maximum of 999 days remaining. ;-) // Round up to the next multiple of a day. remaining = (remaining + 24 * 3600 - 1) / (24 * 3600); snprintf(buf, sizeof(buf), "%" PRIu32 " d", remaining); } else { // The estimated remaining time is too big. Don't show it. return ""; } return buf; } /// Get how much uncompressed and compressed data has been processed. static void progress_pos(uint64_t *in_pos, uint64_t *compressed_pos, uint64_t *uncompressed_pos) { uint64_t out_pos; if (progress_is_from_passthru) { // In passthru mode the progress info is in total_in/out but // the *progress_strm itself isn't initialized and thus we // cannot use lzma_get_progress(). *in_pos = progress_strm->total_in; out_pos = progress_strm->total_out; } else { lzma_get_progress(progress_strm, in_pos, &out_pos); } // It cannot have processed more input than it has been given. assert(*in_pos <= progress_strm->total_in); // It cannot have produced more output than it claims to have ready. assert(out_pos >= progress_strm->total_out); if (opt_mode == MODE_COMPRESS) { *compressed_pos = out_pos; *uncompressed_pos = *in_pos; } else { *compressed_pos = *in_pos; *uncompressed_pos = out_pos; } return; } extern void message_progress_update(void) { if (!progress_needs_updating) return; // Calculate how long we have been processing this file. const uint64_t elapsed = mytime_get_elapsed(); #ifndef SIGALRM if (progress_next_update > elapsed) return; progress_next_update = elapsed + 1000; #endif // Get our current position in the stream. uint64_t in_pos; uint64_t compressed_pos; uint64_t uncompressed_pos; progress_pos(&in_pos, &compressed_pos, &uncompressed_pos); // Block signals so that fprintf() doesn't get interrupted. signals_block(); // Print the filename if it hasn't been printed yet. 
if (!current_filename_printed) print_filename(); // Print the actual progress message. The idea is that there is at // least three spaces between the fields in typical situations, but // even in rare situations there is at least one space. const char *cols[5] = { progress_percentage(in_pos), progress_sizes(compressed_pos, uncompressed_pos, false), progress_speed(uncompressed_pos, elapsed), progress_time(elapsed), progress_remaining(in_pos, elapsed), }; fprintf(stderr, "\r %*s %*s %*s %10s %10s\r", tuklib_mbstr_fw(cols[0], 6), cols[0], tuklib_mbstr_fw(cols[1], 35), cols[1], tuklib_mbstr_fw(cols[2], 9), cols[2], cols[3], cols[4]); #ifdef SIGALRM // Updating the progress info was finished. Reset // progress_needs_updating to wait for the next SIGALRM. // // NOTE: This has to be done before alarm(1) or with (very) bad // luck we could be setting this to false after the alarm has already // been triggered. progress_needs_updating = false; if (verbosity >= V_VERBOSE && progress_automatic) { // Mark that the progress indicator is active, so if an error // occurs, the error message gets printed cleanly. progress_active = true; // Restart the timer so that progress_needs_updating gets // set to true after about one second. alarm(1); } else { // The progress message was printed because user had sent us // SIGALRM. In this case, each progress message is printed // on its own line. fputc('\n', stderr); } #else // When SIGALRM isn't supported and we get here, it's always due to // automatic progress update. We set progress_active here too like // described above. 
assert(verbosity >= V_VERBOSE); assert(progress_automatic); progress_active = true; #endif signals_unblock(); return; } static void progress_flush(bool finished) { if (!progress_started || verbosity < V_VERBOSE) return; uint64_t in_pos; uint64_t compressed_pos; uint64_t uncompressed_pos; progress_pos(&in_pos, &compressed_pos, &uncompressed_pos); // Avoid printing intermediate progress info if some error occurs // in the beginning of the stream. (If something goes wrong later in // the stream, it is sometimes useful to tell the user where the // error approximately occurred, especially if the error occurs // after a time-consuming operation.) if (!finished && !progress_active && (compressed_pos == 0 || uncompressed_pos == 0)) return; progress_active = false; const uint64_t elapsed = mytime_get_elapsed(); signals_block(); // When using the auto-updating progress indicator, the final // statistics are printed in the same format as the progress // indicator itself. if (progress_automatic) { const char *cols[5] = { finished ? "100 %" : progress_percentage(in_pos), progress_sizes(compressed_pos, uncompressed_pos, true), progress_speed(uncompressed_pos, elapsed), progress_time(elapsed), finished ? "" : progress_remaining(in_pos, elapsed), }; fprintf(stderr, "\r %*s %*s %*s %10s %10s\n", tuklib_mbstr_fw(cols[0], 6), cols[0], tuklib_mbstr_fw(cols[1], 35), cols[1], tuklib_mbstr_fw(cols[2], 9), cols[2], cols[3], cols[4]); } else { // The filename is always printed. - fprintf(stderr, _("%s: "), filename); + fprintf(stderr, _("%s: "), tuklib_mask_nonprint(filename)); // Percentage is printed only if we didn't finish yet. if (!finished) { // Don't print the percentage when it isn't known // (starts with a dash). const char *percentage = progress_percentage(in_pos); if (percentage[0] != '-') fprintf(stderr, "%s, ", percentage); } // Size information is always printed. 
fprintf(stderr, "%s", progress_sizes( compressed_pos, uncompressed_pos, true)); // The speed and elapsed time aren't always shown. const char *speed = progress_speed(uncompressed_pos, elapsed); if (speed[0] != '\0') fprintf(stderr, ", %s", speed); const char *elapsed_str = progress_time(elapsed); if (elapsed_str[0] != '\0') fprintf(stderr, ", %s", elapsed_str); fputc('\n', stderr); } signals_unblock(); return; } extern void message_progress_end(bool success) { assert(progress_started); progress_flush(success); progress_started = false; return; } static void vmessage(enum message_verbosity v, const char *fmt, va_list ap) { if (v <= verbosity) { signals_block(); progress_flush(false); // TRANSLATORS: This is the program name in the beginning // of the line in messages. Usually it becomes "xz: ". // This is a translatable string because French needs // a space before a colon. fprintf(stderr, _("%s: "), progname); #ifdef __clang__ # pragma GCC diagnostic push # pragma GCC diagnostic ignored "-Wformat-nonliteral" #endif vfprintf(stderr, fmt, ap); #ifdef __clang__ # pragma GCC diagnostic pop #endif fputc('\n', stderr); signals_unblock(); } return; } extern void message(enum message_verbosity v, const char *fmt, ...) { va_list ap; va_start(ap, fmt); vmessage(v, fmt, ap); va_end(ap); return; } extern void message_warning(const char *fmt, ...) { va_list ap; va_start(ap, fmt); vmessage(V_WARNING, fmt, ap); va_end(ap); set_exit_status(E_WARNING); return; } extern void message_error(const char *fmt, ...) { va_list ap; va_start(ap, fmt); vmessage(V_ERROR, fmt, ap); va_end(ap); set_exit_status(E_ERROR); return; } extern void message_fatal(const char *fmt, ...) 
{ va_list ap; va_start(ap, fmt); vmessage(V_ERROR, fmt, ap); va_end(ap); tuklib_exit(E_ERROR, E_ERROR, false); } extern void message_bug(void) { message_fatal(_("Internal error (bug)")); } extern void message_signal_handler(void) { message_fatal(_("Cannot establish signal handlers")); } extern const char * message_strm(lzma_ret code) { switch (code) { case LZMA_NO_CHECK: return _("No integrity check; not verifying file integrity"); case LZMA_UNSUPPORTED_CHECK: return _("Unsupported type of integrity check; " "not verifying file integrity"); case LZMA_MEM_ERROR: return strerror(ENOMEM); case LZMA_MEMLIMIT_ERROR: return _("Memory usage limit reached"); case LZMA_FORMAT_ERROR: return _("File format not recognized"); case LZMA_OPTIONS_ERROR: return _("Unsupported options"); case LZMA_DATA_ERROR: return _("Compressed data is corrupt"); case LZMA_BUF_ERROR: return _("Unexpected end of input"); case LZMA_OK: case LZMA_STREAM_END: case LZMA_GET_CHECK: case LZMA_PROG_ERROR: case LZMA_SEEK_NEEDED: case LZMA_RET_INTERNAL1: case LZMA_RET_INTERNAL2: case LZMA_RET_INTERNAL3: case LZMA_RET_INTERNAL4: case LZMA_RET_INTERNAL5: case LZMA_RET_INTERNAL6: case LZMA_RET_INTERNAL7: case LZMA_RET_INTERNAL8: // Without "default", compiler will warn if new constants // are added to lzma_ret, it is not too easy to forget to // add the new constants to this function. break; } return _("Internal error (bug)"); } extern void message_mem_needed(enum message_verbosity v, uint64_t memusage) { if (v > verbosity) return; // Convert memusage to MiB, rounding up to the next full MiB. // This way the user can always use the displayed usage as // the new memory usage limit. (If we rounded to the nearest, // the user might need to +1 MiB to get high enough limit.) memusage = round_up_to_mib(memusage); uint64_t memlimit = hardware_memlimit_get(opt_mode); // Handle the case when there is no memory usage limit. // This way we don't print a weird message with a huge number. 
if (memlimit == UINT64_MAX) { message(v, _("%s MiB of memory is required. " "The limiter is disabled."), uint64_to_str(memusage, 0)); return; } // With US-ASCII: // 2^64 with thousand separators + " MiB" suffix + '\0' = 26 + 4 + 1 // But there may be multibyte chars so reserve enough space. char memlimitstr[128]; // Show the memory usage limit as MiB unless it is less than 1 MiB. // This way it's easy to notice errors where one has typed // --memory=123 instead of --memory=123MiB. if (memlimit < (UINT32_C(1) << 20)) { snprintf(memlimitstr, sizeof(memlimitstr), "%s B", uint64_to_str(memlimit, 1)); } else { // Round up just like with memusage. If this function is // called for informational purposes (to just show the // current usage and limit), we should never show that // the usage is higher than the limit, which would give // a false impression that the memory usage limit isn't // properly enforced. snprintf(memlimitstr, sizeof(memlimitstr), "%s MiB", uint64_to_str(round_up_to_mib(memlimit), 1)); } message(v, _("%s MiB of memory is required. The limit is %s."), uint64_to_str(memusage, 0), memlimitstr); return; } extern void message_filters_show(enum message_verbosity v, const lzma_filter *filters) { if (v > verbosity) return; char *buf; const lzma_ret ret = lzma_str_from_filters(&buf, filters, LZMA_STR_ENCODER | LZMA_STR_GETOPT_LONG, NULL); if (ret != LZMA_OK) message_fatal("%s", message_strm(ret)); fprintf(stderr, _("%s: Filter chain: %s\n"), progname, buf); free(buf); return; } extern void message_try_help(void) { // Print this with V_WARNING instead of V_ERROR to prevent it from // showing up when --quiet has been specified. message(V_WARNING, _("Try '%s --help' for more information."), progname); return; } extern void message_version(void) { // It is possible that liblzma version is different than the command // line tool version, so print both. 
if (opt_robot) { printf("XZ_VERSION=%" PRIu32 "\nLIBLZMA_VERSION=%" PRIu32 "\n", LZMA_VERSION, lzma_version_number()); } else { printf("xz (" PACKAGE_NAME ") " LZMA_VERSION_STRING "\n"); printf("liblzma %s\n", lzma_version_string()); } tuklib_exit(E_SUCCESS, E_ERROR, verbosity != V_SILENT); } -extern void -message_help(bool long_help) +static void +detect_wrapping_errors(int error_mask) { - printf(_("Usage: %s [OPTION]... [FILE]...\n" - "Compress or decompress FILEs in the .xz format.\n\n"), - progname); +#ifndef NDEBUG + // This might help in catching problematic strings in translations. + // It's a debug message so don't translate this. + if (error_mask & TUKLIB_WRAP_WARN_OVERLONG) + message_fatal("The help text contains overlong lines"); +#endif - // NOTE: The short help doesn't currently have options that - // take arguments. - if (long_help) - puts(_("Mandatory arguments to long options are mandatory " - "for short options too.\n")); + if (error_mask & ~TUKLIB_WRAP_WARN_OVERLONG) + message_fatal(_("Error printing the help text " + "(error code %d)"), error_mask); - if (long_help) - puts(_(" Operation mode:\n")); + return; +} - puts(_( -" -z, --compress force compression\n" -" -d, --decompress force decompression\n" -" -t, --test test compressed file integrity\n" -" -l, --list list information about .xz files")); - if (long_help) - puts(_("\n Operation modifiers:\n")); +extern void +message_help(bool long_help) +{ + static const struct tuklib_wrap_opt wrap0 = { 0, 0, 0, 0, 79 }; + static const struct tuklib_wrap_opt wrap1 = { 1, 1, 1, 1, 79 }; + static const struct tuklib_wrap_opt wrap2 = { 2, 2, 22, 22, 79 }; + static const struct tuklib_wrap_opt wrap3 = { 24, 24, 36, 36, 79 }; - puts(_( -" -k, --keep keep (don't delete) input files\n" -" -f, --force force overwrite of output file and (de)compress links\n" -" -c, --stdout write to standard output and don't delete input files")); - // NOTE: --to-stdout isn't included above because it's not - // the recommended 
spelling. It was copied from gzip but other - // compressors with gzip-like syntax don't support it. + // Accumulated error codes from tuklib_wraps() and tuklib_wrapf() + int e = 0; + + printf(_("Usage: %s [OPTION]... [FILE]...\n"), progname); + e |= tuklib_wraps(stdout, &wrap0, + W_("Compress or decompress FILEs in the .xz format.")); + putchar('\n'); + + e |= tuklib_wraps(stdout, &wrap0, + W_("Mandatory arguments to long options are " + "mandatory for short options too.")); + putchar('\n'); if (long_help) { - puts(_( -" --single-stream decompress only the first stream, and silently\n" -" ignore possible remaining input data")); - puts(_( -" --no-sparse do not create sparse files when decompressing\n" -" -S, --suffix=.SUF use the suffix '.SUF' on compressed files\n" -" --files[=FILE] read filenames to process from FILE; if FILE is\n" -" omitted, filenames are read from the standard input;\n" -" filenames must be terminated with the newline character\n" -" --files0[=FILE] like --files but use the null character as terminator")); + e |= tuklib_wraps(stdout, &wrap1, W_("Operation mode:")); + putchar('\n'); } + e |= tuklib_wrapf(stdout, &wrap2, + "-z, --compress\v%s\r" + "-d, --decompress\v%s\r" + "-t, --test\v%s\r" + "-l, --list\v%s", + W_("force compression"), + W_("force decompression"), + W_("test compressed file integrity"), + W_("list information about .xz files")); + if (long_help) { - puts(_("\n Basic file format and compression options:\n")); - puts(_( -" -F, --format=FMT file format to encode or decode; possible values are\n" -" 'auto' (default), 'xz', 'lzma', 'lzip', and 'raw'\n" -" -C, --check=CHECK integrity check type: 'none' (use with caution),\n" -" 'crc32', 'crc64' (default), or 'sha256'")); - puts(_( -" --ignore-check don't verify the integrity check when decompressing")); + putchar('\n'); + e |= tuklib_wraps(stdout, &wrap1, W_("Operation modifiers:")); + putchar('\n'); } - puts(_( -" -0 ... 
-9 compression preset; default is 6; take compressor *and*\n" -" decompressor memory usage into account before using 7-9!")); + e |= tuklib_wrapf(stdout, &wrap2, + "-k, --keep\v%s\r" + "-f, --force\v%s\r" + "-c, --stdout\v%s", + W_("keep (don't delete) input files"), + W_("force overwrite of output file and (de)compress links"), + W_("write to standard output and don't delete input files")); + // NOTE: --to-stdout isn't included above because it's not + // the recommended spelling. It was copied from gzip but other + // compressors with gzip-like syntax don't support it. - puts(_( -" -e, --extreme try to improve compression ratio by using more CPU time;\n" -" does not affect decompressor memory requirements")); + if (long_help) { + e |= tuklib_wrapf(stdout, &wrap2, + " --no-sync\v%s\r" + " --single-stream\v%s\r" + " --no-sparse\v%s\r" + "-S, --suffix=%s\v%s\r" + " --files[=%s]\v%s\r" + " --files0[=%s]\v%s\r", + W_("don't synchronize the output file to the storage " + "device before removing the input file"), + W_("decompress only the first stream, and silently " + "ignore possible remaining input data"), + W_("do not create sparse files when decompressing"), + _(".SUF"), + W_("use the suffix '.SUF' on compressed files"), + _("FILE"), + W_("read filenames to process from FILE; " + "if FILE is omitted, " + "filenames are read from the standard input; " + "filenames must be terminated with " + "the newline character"), + _("FILE"), + W_("like --files but use the null character as " + "terminator")); + + e |= tuklib_wraps(stdout, &wrap1, + W_("Basic file format and compression options:")); + + e |= tuklib_wrapf(stdout, &wrap2, + "\n" + "-F, --format=%s\v%s\r" + "-C, --check=%s\v%s\r" + " --ignore-check\v%s", + _("FORMAT"), + W_("file format to encode or decode; possible values " + "are 'auto' (default), 'xz', 'lzma', 'lzip', " + "and 'raw'"), + _("NAME"), + W_("integrity check type: 'none' (use with caution), " + "'crc32', 'crc64' (default), or 'sha256'"), + W_("don't 
verify the integrity check when " + "decompressing")); + } - puts(_( -" -T, --threads=NUM use at most NUM threads; the default is 0 which uses\n" -" as many threads as there are processor cores")); + e |= tuklib_wrapf(stdout, &wrap2, + "-0 ... -9\v%s\r" + "-e, --extreme\v%s\r" + "-T, --threads=%s\v%s", + W_("compression preset; default is 6; take compressor *and* " + "decompressor memory usage into account before " + "using 7-9!"), + W_("try to improve compression ratio by using more CPU time; " + "does not affect decompressor memory requirements"), + // TRANSLATORS: Short for NUMBER. A longer string is fine but + // wider than 5 columns makes --long-help a few lines longer. + _("NUM"), + W_("use at most NUM threads; the default is 0 which uses " + "as many threads as there are processor cores")); if (long_help) { - puts(_( -" --block-size=SIZE\n" -" start a new .xz block after every SIZE bytes of input;\n" -" use this to set the block size for threaded compression")); - puts(_( -" --block-list=BLOCKS\n" -" start a new .xz block after the given comma-separated\n" -" intervals of uncompressed data; optionally, specify a\n" -" filter chain number (0-9) followed by a ':' before the\n" -" uncompressed data size")); - puts(_( -" --flush-timeout=TIMEOUT\n" -" when compressing, if more than TIMEOUT milliseconds has\n" -" passed since the previous flush and reading more input\n" -" would block, all pending data is flushed out" - )); - puts(_( // xgettext:no-c-format -" --memlimit-compress=LIMIT\n" -" --memlimit-decompress=LIMIT\n" -" --memlimit-mt-decompress=LIMIT\n" -" -M, --memlimit=LIMIT\n" -" set memory usage limit for compression, decompression,\n" -" threaded decompression, or all of these; LIMIT is in\n" -" bytes, % of RAM, or 0 for defaults")); - - puts(_( -" --no-adjust if compression settings exceed the memory usage limit,\n" -" give an error instead of adjusting the settings downwards")); + e |= tuklib_wrapf(stdout, &wrap2, + " --block-size=%s\v%s\r" + " 
--block-list=%s\v%s\r" + " --flush-timeout=%s\v%s", + _("SIZE"), + W_("start a new .xz block after every SIZE bytes " + "of input; use this to set the block size " + "for threaded compression"), + _("BLOCKS"), + W_("start a new .xz block after the given " + "comma-separated intervals of uncompressed " + "data; optionally, specify a " + "filter chain number (0-9) followed by " + "a ':' before the uncompressed data size"), + _("NUM"), + W_("when compressing, if more than NUM " + "milliseconds has passed since the previous " + "flush and reading more input would block, " + "all pending data is flushed out")); + + e |= tuklib_wrapf(stdout, &wrap2, + " --memlimit-compress=%s\n" + " --memlimit-decompress=%s\n" + " --memlimit-mt-decompress=%s\n" + "-M, --memlimit=%s\v%s\r" + " --no-adjust\v%s", + _("LIMIT"), + _("LIMIT"), + _("LIMIT"), + _("LIMIT"), + // xgettext:no-c-format + W_("set memory usage limit for compression, " + "decompression, threaded decompression, " + "or all of these; LIMIT is in " + "bytes, % of RAM, or 0 for defaults"), + W_("if compression settings exceed the " + "memory usage limit, " + "give an error instead of adjusting " + "the settings downwards")); } if (long_help) { - puts(_( -"\n Custom filter chain for compression (alternative for using presets):")); - - puts(_( -"\n" -" --filters=FILTERS set the filter chain using the liblzma filter string\n" -" syntax; use --filters-help for more information" - )); - - puts(_( -" --filters1=FILTERS ... --filters9=FILTERS\n" -" set additional filter chains using the liblzma filter\n" -" string syntax to use with --block-list" - )); - - puts(_( -" --filters-help display more information about the liblzma filter string\n" -" syntax and exit." - )); + putchar('\n'); + + e |= tuklib_wraps(stdout, &wrap1, + W_("Custom filter chain for compression " + "(an alternative to using presets):")); + + e |= tuklib_wrapf(stdout, &wrap2, + "\n" + "--filters=%s\v%s\r" + "--filters1=%s ... 
--filters9=%s\v%s\r" + "--filters-help\v%s", + _("FILTERS"), + W_("set the filter chain using the " + "liblzma filter string syntax; " + "use --filters-help for more information"), + _("FILTERS"), + _("FILTERS"), + W_("set additional filter chains using the " + "liblzma filter string syntax to use " + "with --block-list"), + W_("display more information about the " + "liblzma filter string syntax and exit")); #if defined(HAVE_ENCODER_LZMA1) || defined(HAVE_DECODER_LZMA1) \ || defined(HAVE_ENCODER_LZMA2) || defined(HAVE_DECODER_LZMA2) - // TRANSLATORS: The word "literal" in "literal context bits" - // means how many "context bits" to use when encoding - // literals. A literal is a single 8-bit byte. It doesn't - // mean "literally" here. - puts(_( -"\n" -" --lzma1[=OPTS] LZMA1 or LZMA2; OPTS is a comma-separated list of zero or\n" -" --lzma2[=OPTS] more of the following options (valid values; default):\n" -" preset=PRE reset options to a preset (0-9[e])\n" -" dict=NUM dictionary size (4KiB - 1536MiB; 8MiB)\n" -" lc=NUM number of literal context bits (0-4; 3)\n" -" lp=NUM number of literal position bits (0-4; 0)\n" -" pb=NUM number of position bits (0-4; 2)\n" -" mode=MODE compression mode (fast, normal; normal)\n" -" nice=NUM nice length of a match (2-273; 64)\n" -" mf=NAME match finder (hc3, hc4, bt2, bt3, bt4; bt4)\n" -" depth=NUM maximum search depth; 0=automatic (default)")); + e |= tuklib_wrapf(stdout, &wrap2, + "\n" + "--lzma1[=%s]\n" + "--lzma2[=%s]\v%s", + // TRANSLATORS: Short for OPTIONS. + _("OPTS"), + _("OPTS"), + // TRANSLATORS: Use semicolon (or its fullwidth form) + // in "(valid values; default)" even if it is weird in + // your language. There are non-translatable strings + // that look like "(foo, bar, baz; foo)" which list + // the supported values and the default value. 
+ W_("LZMA1 or LZMA2; OPTS is a comma-separated list " + "of zero or more of the following options " + "(valid values; default):")); + + e |= tuklib_wrapf(stdout, &wrap3, + "preset=%s\v%s (0-9[e])\r" + "dict=%s\v%s \b(4KiB - 1536MiB; 8MiB)\b\r" + "lc=%s\v%s \b(0-4; 3)\b\r" + "lp=%s\v%s \b(0-4; 0)\b\r" + "pb=%s\v%s \b(0-4; 2)\b\r" + "mode=%s\v%s (fast, normal; normal)\r" + "nice=%s\v%s \b(2-273; 64)\b\r" + "mf=%s\v%s (hc3, hc4, bt2, bt3, bt4; bt4)\r" + "depth=%s\v%s", + // TRANSLATORS: Short for PRESET. A longer string is + // fine but wider than 4 columns makes --long-help + // one line longer. + _("PRE"), + W_("reset options to a preset"), + _("NUM"), W_("dictionary size"), + _("NUM"), + // TRANSLATORS: The word "literal" in "literal context + // bits" means how many "context bits" to use when + // encoding literals. A literal is a single 8-bit + // byte. It doesn't mean "literally" here. + W_("number of literal context bits"), + _("NUM"), W_("number of literal position bits"), + _("NUM"), W_("number of position bits"), + _("MODE"), W_("compression mode"), + _("NUM"), W_("nice length of a match"), + _("NAME"), W_("match finder"), + _("NUM"), W_("maximum search depth; " + "0=automatic (default)")); #endif - puts(_( -"\n" -" --x86[=OPTS] x86 BCJ filter (32-bit and 64-bit)\n" -" --arm[=OPTS] ARM BCJ filter\n" -" --armthumb[=OPTS] ARM-Thumb BCJ filter\n" -" --arm64[=OPTS] ARM64 BCJ filter\n" -" --powerpc[=OPTS] PowerPC BCJ filter (big endian only)\n" -" --ia64[=OPTS] IA-64 (Itanium) BCJ filter\n" -" --sparc[=OPTS] SPARC BCJ filter\n" -" --riscv[=OPTS] RISC-V BCJ filter\n" -" Valid OPTS for all BCJ filters:\n" -" start=NUM start offset for conversions (default=0)")); + e |= tuklib_wrapf(stdout, &wrap2, + "\n" + "--x86[=%s]\v%s\r" + "--arm[=%s]\v%s\r" + "--armthumb[=%s]\v%s\r" + "--arm64[=%s]\v%s\r" + "--powerpc[=%s]\v%s\r" + "--ia64[=%s]\v%s\r" + "--sparc[=%s]\v%s\r" + "--riscv[=%s]\v%s\r" + "\v%s", + _("OPTS"), + W_("x86 BCJ filter (32-bit and 64-bit)"), + _("OPTS"), 
+ W_("ARM BCJ filter"), + _("OPTS"), + W_("ARM-Thumb BCJ filter"), + _("OPTS"), + W_("ARM64 BCJ filter"), + _("OPTS"), + W_("PowerPC BCJ filter (big endian only)"), + _("OPTS"), + W_("IA-64 (Itanium) BCJ filter"), + _("OPTS"), + W_("SPARC BCJ filter"), + _("OPTS"), + W_("RISC-V BCJ filter"), + W_("Valid OPTS for all BCJ filters:")); + e |= tuklib_wrapf(stdout, &wrap3, + "start=%s\v%s", + _("NUM"), + W_("start offset for conversions (default=0)")); #if defined(HAVE_ENCODER_DELTA) || defined(HAVE_DECODER_DELTA) - puts(_( -"\n" -" --delta[=OPTS] Delta filter; valid OPTS (valid values; default):\n" -" dist=NUM distance between bytes being subtracted\n" -" from each other (1-256; 1)")); + e |= tuklib_wrapf(stdout, &wrap2, + "\n" + "--delta[=%s]\v%s", + _("OPTS"), + W_("Delta filter; valid OPTS " + "(valid values; default):")); + e |= tuklib_wrapf(stdout, &wrap3, + "dist=%s\v%s \b(1-256; 1)\b", + _("NUM"), + W_("distance between bytes being subtracted " + "from each other")); #endif } - if (long_help) - puts(_("\n Other options:\n")); + if (long_help) { + putchar('\n'); + e |= tuklib_wraps(stdout, &wrap1, W_("Other options:")); + putchar('\n'); + } - puts(_( -" -q, --quiet suppress warnings; specify twice to suppress errors too\n" -" -v, --verbose be verbose; specify twice for even more verbose")); + e |= tuklib_wrapf(stdout, &wrap2, + "-q, --quiet\v%s\r" + "-v, --verbose\v%s", + W_("suppress warnings; specify twice to suppress errors too"), + W_("be verbose; specify twice for even more verbose")); if (long_help) { - puts(_( -" -Q, --no-warn make warnings not affect the exit status")); - puts(_( -" --robot use machine-parsable messages (useful for scripts)")); - puts(""); - puts(_( -" --info-memory display the total amount of RAM and the currently active\n" -" memory usage limits, and exit")); - puts(_( -" -h, --help display the short help (lists only the basic options)\n" -" -H, --long-help display this long help and exit")); + e |= tuklib_wrapf(stdout, &wrap2, + "-Q, 
--no-warn\v%s\r" + " --robot\v%s\r" + "\n" + " --info-memory\v%s\r" + "-h, --help\v%s\r" + "-H, --long-help\v%s", + W_("make warnings not affect the exit status"), + W_("use machine-parsable messages (useful for scripts)"), + W_("display the total amount of RAM and the currently active " + "memory usage limits, and exit"), + W_("display the short help (lists only the basic options)"), + W_("display this long help and exit")); } else { - puts(_( -" -h, --help display this short help and exit\n" -" -H, --long-help display the long help (lists also the advanced options)")); + e |= tuklib_wrapf(stdout, &wrap2, + "-h, --help\v%s\r" + "-H, --long-help\v%s", + W_("display this short help and exit"), + W_("display the long help (lists also the advanced options)")); } - puts(_( -" -V, --version display the version number and exit")); + e |= tuklib_wrapf(stdout, &wrap2, "-V, --version\v%s", + W_("display the version number and exit")); + + putchar('\n'); + e |= tuklib_wraps(stdout, &wrap0, + W_("With no FILE, or when FILE is -, read standard input.")); + putchar('\n'); - puts(_("\nWith no FILE, or when FILE is -, read standard input.\n")); + e |= tuklib_wrapf(stdout, &wrap0, + // TRANSLATORS: This message indicates the bug reporting + // address for this package. Please add another line saying + // "\nReport translation bugs to <...>." with the email or WWW + // address for translation bugs. Thanks! + W_("Report bugs to <%s> (in English or Finnish)."), + PACKAGE_BUGREPORT); - // TRANSLATORS: This message indicates the bug reporting address - // for this package. Please add _another line_ saying - // "Report translation bugs to <...>\n" with the email or WWW - // address for translation bugs. Thanks. - printf(_("Report bugs to <%s> (in English or Finnish).\n"), - PACKAGE_BUGREPORT); - printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL); + e |= tuklib_wrapf(stdout, &wrap0, + // TRANSLATORS: The first %s is the name of this software. + // The second <%s> is an URL. 
+ W_("%s home page: <%s>"), PACKAGE_NAME, PACKAGE_URL); #if LZMA_VERSION_STABILITY != LZMA_VERSION_STABILITY_STABLE - puts(_( + e |= tuklib_wraps(stdout, &wrap0, W_( "THIS IS A DEVELOPMENT VERSION NOT INTENDED FOR PRODUCTION USE.")); #endif + detect_wrapping_errors(e); tuklib_exit(E_SUCCESS, E_ERROR, verbosity != V_SILENT); } extern void message_filters_help(void) { + static const struct tuklib_wrap_opt wrap = { .right_margin = 76 }; + char *encoder_options; if (lzma_str_list_filters(&encoder_options, LZMA_VLI_UNKNOWN, LZMA_STR_ENCODER, NULL) != LZMA_OK) message_bug(); if (!opt_robot) { - puts(_( -"Filter chains are set using the --filters=FILTERS or\n" -"--filters1=FILTERS ... --filters9=FILTERS options. Each filter in the chain\n" -"can be separated by spaces or '--'. Alternatively a preset <0-9>[e] can be\n" -"specified instead of a filter chain.\n" - )); - - puts(_("The supported filters and their options are:")); + int e = tuklib_wrapf(stdout, &wrap, +W_("Filter chains are set using the --filters=FILTERS or " +"--filters1=FILTERS ... --filters9=FILTERS options. " +"Each filter in the chain can be separated by spaces or '--'. 
" +"Alternatively a preset %s can be specified instead of a filter chain."), + "<0-9>[e]"); + putchar('\n'); + e |= tuklib_wraps(stdout, &wrap, + W_("The supported filters and their options are:")); + + detect_wrapping_errors(e); } puts(encoder_options); tuklib_exit(E_SUCCESS, E_ERROR, verbosity != V_SILENT); } diff --git a/src/xz/options.c b/src/xz/options.c index bc8bc1a6c36c..c4f56b495609 100644 --- a/src/xz/options.c +++ b/src/xz/options.c @@ -1,358 +1,361 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file options.c /// \brief Parser for filter-specific options // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "private.h" /////////////////// // Generic stuff // /////////////////// typedef struct { const char *name; uint64_t id; } name_id_map; typedef struct { const char *name; const name_id_map *map; uint64_t min; uint64_t max; } option_map; /// Parses option=value pairs that are separated with commas: /// opt=val,opt=val,opt=val /// /// Each option is a string, that is converted to an integer using the /// index where the option string is in the array. /// /// Value can be /// - a string-id map mapping a list of possible string values to integers /// (opts[i].map != NULL, opts[i].min and opts[i].max are ignored); /// - a number with minimum and maximum value limit /// (opts[i].map == NULL && opts[i].min != UINT64_MAX); /// - a string that will be parsed by the filter-specific code /// (opts[i].map == NULL && opts[i].min == UINT64_MAX, opts[i].max ignored) /// /// When parsing both option and value succeed, a filter-specific function /// is called, which should update the given value to filter-specific /// options structure. /// /// This returns only if no errors occur. 
/// /// \param str String containing the options from the command line /// \param opts Filter-specific option map /// \param set Filter-specific function to update filter_options /// \param filter_options Pointer to filter-specific options structure /// static void parse_options(const char *str, const option_map *opts, void (*set)(void *filter_options, unsigned key, uint64_t value, const char *valuestr), void *filter_options) { if (str == NULL || str[0] == '\0') return; char *s = xstrdup(str); char *name = s; while (*name != '\0') { if (*name == ',') { ++name; continue; } char *split = strchr(name, ','); if (split != NULL) *split = '\0'; char *value = strchr(name, '='); if (value != NULL) *value++ = '\0'; if (value == NULL || value[0] == '\0') - message_fatal(_("%s: Options must be 'name=value' " - "pairs separated with commas"), str); + message_fatal(_("%s: %s"), tuklib_mask_nonprint(str), + _("Options must be 'name=value' " + "pairs separated with commas")); // Look for the option name from the option map. unsigned i = 0; while (true) { if (opts[i].name == NULL) message_fatal(_("%s: Invalid option name"), - name); + tuklib_mask_nonprint(name)); if (strcmp(name, opts[i].name) == 0) break; ++i; } // Option was found from the map. See how we should handle it. if (opts[i].map != NULL) { // value is a string which we should map // to an integer. unsigned j; for (j = 0; opts[i].map[j].name != NULL; ++j) { if (strcmp(opts[i].map[j].name, value) == 0) break; } if (opts[i].map[j].name == NULL) - message_fatal(_("%s: Invalid option value"), - value); + message_fatal(_("%s: %s"), + tuklib_mask_nonprint(value), + _("Invalid option value")); set(filter_options, i, opts[i].map[j].id, value); } else if (opts[i].min == UINT64_MAX) { // value is a special string that will be // parsed by set(). set(filter_options, i, 0, value); } else { // value is an integer. 
const uint64_t v = str_to_uint64(name, value, opts[i].min, opts[i].max); set(filter_options, i, v, value); } // Check if it was the last option. if (split == NULL) break; name = split + 1; } free(s); return; } /////////// // Delta // /////////// enum { OPT_DIST, }; static void set_delta(void *options, unsigned key, uint64_t value, const char *valuestr lzma_attribute((__unused__))) { lzma_options_delta *opt = options; switch (key) { case OPT_DIST: opt->dist = value; break; } } extern lzma_options_delta * options_delta(const char *str) { static const option_map opts[] = { { "dist", NULL, LZMA_DELTA_DIST_MIN, LZMA_DELTA_DIST_MAX }, { NULL, NULL, 0, 0 } }; lzma_options_delta *options = xmalloc(sizeof(lzma_options_delta)); *options = (lzma_options_delta){ // It's hard to give a useful default for this. .type = LZMA_DELTA_TYPE_BYTE, .dist = LZMA_DELTA_DIST_MIN, }; parse_options(str, opts, &set_delta, options); return options; } ///////// // BCJ // ///////// enum { OPT_START_OFFSET, }; static void set_bcj(void *options, unsigned key, uint64_t value, const char *valuestr lzma_attribute((__unused__))) { lzma_options_bcj *opt = options; switch (key) { case OPT_START_OFFSET: opt->start_offset = value; break; } } extern lzma_options_bcj * options_bcj(const char *str) { static const option_map opts[] = { { "start", NULL, 0, UINT32_MAX }, { NULL, NULL, 0, 0 } }; lzma_options_bcj *options = xmalloc(sizeof(lzma_options_bcj)); *options = (lzma_options_bcj){ .start_offset = 0, }; parse_options(str, opts, &set_bcj, options); return options; } ////////// // LZMA // ////////// enum { OPT_PRESET, OPT_DICT, OPT_LC, OPT_LP, OPT_PB, OPT_MODE, OPT_NICE, OPT_MF, OPT_DEPTH, }; tuklib_attr_noreturn static void error_lzma_preset(const char *valuestr) { - message_fatal(_("Unsupported LZMA1/LZMA2 preset: %s"), valuestr); + message_fatal(_("Unsupported LZMA1/LZMA2 preset: %s"), + tuklib_mask_nonprint(valuestr)); } static void set_lzma(void *options, unsigned key, uint64_t value, const char 
*valuestr) { lzma_options_lzma *opt = options; switch (key) { case OPT_PRESET: { if (valuestr[0] < '0' || valuestr[0] > '9') error_lzma_preset(valuestr); uint32_t preset = (uint32_t)(valuestr[0] - '0'); // Currently only "e" is supported as a modifier, // so keep this simple for now. if (valuestr[1] != '\0') { if (valuestr[1] == 'e') preset |= LZMA_PRESET_EXTREME; else error_lzma_preset(valuestr); if (valuestr[2] != '\0') error_lzma_preset(valuestr); } if (lzma_lzma_preset(options, preset)) error_lzma_preset(valuestr); break; } case OPT_DICT: opt->dict_size = value; break; case OPT_LC: opt->lc = value; break; case OPT_LP: opt->lp = value; break; case OPT_PB: opt->pb = value; break; case OPT_MODE: opt->mode = value; break; case OPT_NICE: opt->nice_len = value; break; case OPT_MF: opt->mf = value; break; case OPT_DEPTH: opt->depth = value; break; } } extern lzma_options_lzma * options_lzma(const char *str) { static const name_id_map modes[] = { { "fast", LZMA_MODE_FAST }, { "normal", LZMA_MODE_NORMAL }, { NULL, 0 } }; static const name_id_map mfs[] = { { "hc3", LZMA_MF_HC3 }, { "hc4", LZMA_MF_HC4 }, { "bt2", LZMA_MF_BT2 }, { "bt3", LZMA_MF_BT3 }, { "bt4", LZMA_MF_BT4 }, { NULL, 0 } }; static const option_map opts[] = { { "preset", NULL, UINT64_MAX, 0 }, { "dict", NULL, LZMA_DICT_SIZE_MIN, (UINT32_C(1) << 30) + (UINT32_C(1) << 29) }, { "lc", NULL, LZMA_LCLP_MIN, LZMA_LCLP_MAX }, { "lp", NULL, LZMA_LCLP_MIN, LZMA_LCLP_MAX }, { "pb", NULL, LZMA_PB_MIN, LZMA_PB_MAX }, { "mode", modes, 0, 0 }, { "nice", NULL, 2, 273 }, { "mf", mfs, 0, 0 }, { "depth", NULL, 0, UINT32_MAX }, { NULL, NULL, 0, 0 } }; lzma_options_lzma *options = xmalloc(sizeof(lzma_options_lzma)); if (lzma_lzma_preset(options, LZMA_PRESET_DEFAULT)) message_bug(); parse_options(str, opts, &set_lzma, options); if (options->lc + options->lp > LZMA_LCLP_MAX) message_fatal(_("The sum of lc and lp must not exceed 4")); return options; } diff --git a/src/xz/private.h b/src/xz/private.h index 
b370472e32c8..d351a995eec4 100644 --- a/src/xz/private.h +++ b/src/xz/private.h @@ -1,80 +1,81 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file private.h /// \brief Common includes, definitions, and prototypes // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "sysdefs.h" #include "mythread.h" #include "lzma.h" #include #include #include #include #include #include #ifndef _MSC_VER # include #endif #include "tuklib_gettext.h" #include "tuklib_progname.h" #include "tuklib_exit.h" +#include "tuklib_mbstr_nonprint.h" #include "tuklib_mbstr.h" #if defined(_WIN32) && !defined(__CYGWIN__) # define WIN32_LEAN_AND_MEAN # include #endif #ifdef _MSC_VER # define fileno _fileno #endif #ifndef STDIN_FILENO # define STDIN_FILENO (fileno(stdin)) #endif #ifndef STDOUT_FILENO # define STDOUT_FILENO (fileno(stdout)) #endif #ifndef STDERR_FILENO # define STDERR_FILENO (fileno(stderr)) #endif // Handling SIGTSTP keeps time-keeping for progress indicator correct // if xz is stopped. It requires use of clock_gettime() as that is // async-signal safe in POSIX. Require also SIGALRM support since // on systems where SIGALRM isn't available, progress indicator code // polls the time and the SIGTSTP handling adds slight overhead to // that code. Most (all?) systems that have SIGTSTP also have SIGALRM // so this requirement won't exclude many systems. 
#if defined(HAVE_CLOCK_GETTIME) && defined(SIGTSTP) && defined(SIGALRM) # define USE_SIGTSTP_HANDLER 1 #endif #include "main.h" #include "mytime.h" #include "coder.h" #include "message.h" #include "args.h" #include "hardware.h" #include "file_io.h" #include "options.h" #include "sandbox.h" #include "signals.h" #include "suffix.h" #include "util.h" #ifdef HAVE_DECODERS # include "list.h" #endif diff --git a/src/xz/sandbox.c b/src/xz/sandbox.c index 5bd227370751..f5576960d9aa 100644 --- a/src/xz/sandbox.c +++ b/src/xz/sandbox.c @@ -1,355 +1,311 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file sandbox.c /// \brief Sandbox support // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "private.h" #ifndef ENABLE_SANDBOX // Prevent an empty translation unit when no sandboxing is supported. typedef int dummy; #else /// If the conditions for strict sandboxing (described in main()) /// have been met, sandbox_allow_strict() can be called to set this /// variable to true. static bool strict_sandbox_allowed = false; extern void sandbox_allow_strict(void) { strict_sandbox_allowed = true; return; } // Strict sandboxing prevents opening any files. This *tries* to ensure // that any auxiliary files that might be required are already open. // // Returns true if strict sandboxing is allowed, false otherwise. static bool prepare_for_strict_sandbox(void) { if (!strict_sandbox_allowed) return false; const char dummy_str[] = "x"; // Try to ensure that both libc and xz locale files have been // loaded when NLS is enabled. snprintf(NULL, 0, "%s%s", _(dummy_str), strerror(EINVAL)); // Try to ensure that iconv data files needed for handling multibyte // characters have been loaded. This is needed at least with glibc. 
tuklib_mbstr_width(dummy_str, NULL); return true; } #endif #if defined(HAVE_PLEDGE) /////////////// // pledge(2) // /////////////// #include extern void sandbox_init(void) { if (pledge("stdio rpath wpath cpath fattr", "")) { // gettext hasn't been initialized yet so // there's no point to call it here. message_fatal("Failed to enable the sandbox"); } return; } extern void sandbox_enable_read_only(void) { // We will be opening files for reading but // won't create or remove any files. if (pledge("stdio rpath", "")) message_fatal(_("Failed to enable the sandbox")); return; } extern void sandbox_enable_strict_if_allowed(int src_fd lzma_attribute((__unused__)), int pipe_event_fd lzma_attribute((__unused__)), int pipe_write_fd lzma_attribute((__unused__))) { if (!prepare_for_strict_sandbox()) return; // All files that need to be opened have already been opened. if (pledge("stdio", "")) message_fatal(_("Failed to enable the sandbox")); return; } #elif defined(HAVE_LINUX_LANDLOCK) ////////////// // Landlock // ////////////// -#include -#include -#include - - -// Highest Landlock ABI version supported by this file: -// - For ABI versions 1-3 we don't need anything from -// that isn't part of version 1. -// - For ABI version 4 we need the larger struct landlock_ruleset_attr -// with the handled_access_net member. That is bundled with the macros -// LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP. -#ifdef LANDLOCK_ACCESS_NET_BIND_TCP -# define LANDLOCK_ABI_MAX 4 -#else -# define LANDLOCK_ABI_MAX 3 -#endif - - -/// Landlock ABI version supported by the kernel -static int landlock_abi; +#include "my_landlock.h" // The required_rights should have those bits set that must not be restricted. // This function will then bitwise-and ~required_rights with a mask matching // the Landlock ABI version, leaving only those bits set that are supported // by the ABI and allowed to be restricted by the function argument. 
static void enable_landlock(uint64_t required_rights) { - assert(landlock_abi <= LANDLOCK_ABI_MAX); - - if (landlock_abi <= 0) + // Initialize the ruleset to forbid all actions that the available + // Landlock ABI version supports. Return if Landlock isn't supported + // at all. + struct landlock_ruleset_attr attr; + if (my_landlock_ruleset_attr_forbid_all(&attr) == -1) return; - // We want to set all supported flags in handled_access_fs. - // This way the ruleset will initially forbid access to all - // actions that the available Landlock ABI version supports. - // Exceptions can be added using landlock_add_rule(2) to - // allow certain actions on certain files or directories. - // - // The same flag values are used on all archs. ABI v2 and v3 - // both add one new flag. - // - // First in ABI v1: LANDLOCK_ACCESS_FS_EXECUTE = 1ULL << 0 - // Last in ABI v1: LANDLOCK_ACCESS_FS_MAKE_SYM = 1ULL << 12 - // Last in ABI v2: LANDLOCK_ACCESS_FS_REFER = 1ULL << 13 - // Last in ABI v3: LANDLOCK_ACCESS_FS_TRUNCATE = 1ULL << 14 - // - // This makes it simple to set the mask based on the ABI - // version and we don't need to care which flags are #defined - // in the installed for ABI versions 1-3. - const struct landlock_ruleset_attr attr = { - .handled_access_fs = ~required_rights - & ((1ULL << (12 + my_min(3, landlock_abi))) - 1), -#if LANDLOCK_ABI_MAX >= 4 - .handled_access_net = landlock_abi < 4 ? 0 : - (LANDLOCK_ACCESS_NET_BIND_TCP - | LANDLOCK_ACCESS_NET_CONNECT_TCP), -#endif - }; + // Allow the required rights. + attr.handled_access_fs &= ~required_rights; - const int ruleset_fd = syscall(SYS_landlock_create_ruleset, - &attr, sizeof(attr), 0U); + // Create the ruleset in the kernel. This shouldn't fail. + const int ruleset_fd = my_landlock_create_ruleset( + &attr, sizeof(attr), 0); if (ruleset_fd < 0) message_fatal(_("Failed to enable the sandbox")); // All files we need should have already been opened. 
Thus, // we don't need to add any rules using landlock_add_rule(2) // before activating the sandbox. // // NOTE: It's possible that the hack prepare_for_strict_sandbox() // isn't be good enough. It tries to get translations and // libc-specific files loaded but if it's not good enough // then perhaps a Landlock rule to allow reading from /usr // and/or the xz installation prefix would be needed. // // prctl(PR_SET_NO_NEW_PRIVS, ...) was already called in // sandbox_init() so we don't do it here again. - if (syscall(SYS_landlock_restrict_self, ruleset_fd, 0U) != 0) + if (my_landlock_restrict_self(ruleset_fd, 0) != 0) message_fatal(_("Failed to enable the sandbox")); + (void)close(ruleset_fd); return; } extern void sandbox_init(void) { // Prevent the process from gaining new privileges. This must be done // before landlock_restrict_self(2) but since we will never need new // privileges, this call can be done here already. // // This is supported since Linux 3.5. Ignore the return value to // keep compatibility with old kernels. landlock_restrict_self(2) // will fail if the no_new_privs attribute isn't set, thus if prctl() // fails here the error will still be detected when it matters. (void)prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); - // Get the highest Landlock ABI version supported by the kernel. - landlock_abi = syscall(SYS_landlock_create_ruleset, - (void *)NULL, 0, LANDLOCK_CREATE_RULESET_VERSION); - - // The kernel might support a newer ABI than this file. - if (landlock_abi > LANDLOCK_ABI_MAX) - landlock_abi = LANDLOCK_ABI_MAX; - // These are all in ABI version 1 already. We don't need truncate // rights because files are created with open() using O_EXCL and // without O_TRUNC. // - // LANDLOCK_ACCESS_FS_READ_DIR is included here to get a clear error + // LANDLOCK_ACCESS_FS_READ_DIR is required to synchronize the + // directory before removing the source file. 
+ // + // LANDLOCK_ACCESS_FS_READ_DIR is also helpful to show a clear error // message if xz is given a directory name. Without this permission // the message would be "Permission denied" but with this permission // it's "Is a directory, skipping". It could be worked around with // stat()/lstat() but just giving this permission is simpler and // shouldn't make the sandbox much weaker in practice. const uint64_t required_rights = LANDLOCK_ACCESS_FS_WRITE_FILE | LANDLOCK_ACCESS_FS_READ_FILE | LANDLOCK_ACCESS_FS_READ_DIR | LANDLOCK_ACCESS_FS_REMOVE_FILE | LANDLOCK_ACCESS_FS_MAKE_REG; enable_landlock(required_rights); return; } extern void sandbox_enable_read_only(void) { // We will be opening files for reading but // won't create or remove any files. const uint64_t required_rights = LANDLOCK_ACCESS_FS_READ_FILE | LANDLOCK_ACCESS_FS_READ_DIR; enable_landlock(required_rights); return; } extern void sandbox_enable_strict_if_allowed(int src_fd lzma_attribute((__unused__)), int pipe_event_fd lzma_attribute((__unused__)), int pipe_write_fd lzma_attribute((__unused__))) { if (!prepare_for_strict_sandbox()) return; // Allow all restrictions that the kernel supports with the // highest Landlock ABI version that the kernel or xz supports. // // NOTE: LANDLOCK_ACCESS_FS_READ_DIR isn't needed here because // the only input file has already been opened. enable_landlock(0); return; } #elif defined(HAVE_CAP_RIGHTS_LIMIT) ////////////// // Capsicum // ////////////// #include extern void sandbox_init(void) { // Nothing to do. return; } extern void sandbox_enable_read_only(void) { // Nothing to do. return; } extern void sandbox_enable_strict_if_allowed( int src_fd, int pipe_event_fd, int pipe_write_fd) { if (!prepare_for_strict_sandbox()) return; // Capsicum needs FreeBSD 10.2 or later. 
cap_rights_t rights; if (cap_enter()) goto error; if (cap_rights_limit(src_fd, cap_rights_init(&rights, CAP_EVENT, CAP_FCNTL, CAP_LOOKUP, CAP_READ, CAP_SEEK))) goto error; // If not reading from stdin, remove all capabilities from it. if (src_fd != STDIN_FILENO && cap_rights_limit( STDIN_FILENO, cap_rights_clear(&rights))) goto error; if (cap_rights_limit(STDOUT_FILENO, cap_rights_init(&rights, CAP_EVENT, CAP_FCNTL, CAP_FSTAT, CAP_LOOKUP, CAP_WRITE, CAP_SEEK))) goto error; if (cap_rights_limit(STDERR_FILENO, cap_rights_init(&rights, CAP_WRITE))) goto error; if (cap_rights_limit(pipe_event_fd, cap_rights_init(&rights, CAP_EVENT))) goto error; if (cap_rights_limit(pipe_write_fd, cap_rights_init(&rights, CAP_WRITE))) goto error; return; error: // If a kernel is configured without capability mode support or // used in an emulator that does not implement the capability // system calls, then the Capsicum system calls will fail and set // errno to ENOSYS. In that case xz will silently run without // the sandbox. 
if (errno == ENOSYS) return; message_fatal(_("Failed to enable the sandbox")); } #endif diff --git a/src/xz/suffix.c b/src/xz/suffix.c index 1d548e485b8c..2fd4c7fc9573 100644 --- a/src/xz/suffix.c +++ b/src/xz/suffix.c @@ -1,406 +1,408 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file suffix.c /// \brief Checks filename suffix and creates the destination filename // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "private.h" #ifdef __DJGPP__ # include #endif // For case-insensitive filename suffix on case-insensitive systems #if defined(TUKLIB_DOSLIKE) || defined(__VMS) # ifdef HAVE_STRINGS_H # include # endif # ifdef _MSC_VER # define suffix_strcmp _stricmp # else # define suffix_strcmp strcasecmp # endif #else # define suffix_strcmp strcmp #endif static char *custom_suffix = NULL; /// \brief Test if the char is a directory separator static bool is_dir_sep(char c) { #ifdef TUKLIB_DOSLIKE return c == '/' || c == '\\' || c == ':'; #else return c == '/'; #endif } /// \brief Test if the string contains a directory separator static bool has_dir_sep(const char *str) { #ifdef TUKLIB_DOSLIKE return strpbrk(str, "/\\:") != NULL; #else return strchr(str, '/') != NULL; #endif } #ifdef __DJGPP__ /// \brief Test for special suffix used for 8.3 short filenames (SFN) /// /// \return If str matches *.?- or *.??-, true is returned. Otherwise /// false is returned. static bool has_sfn_suffix(const char *str, size_t len) { if (len >= 4 && str[len - 1] == '-' && str[len - 2] != '.' 
&& !is_dir_sep(str[len - 2])) { // *.?- if (str[len - 3] == '.') return !is_dir_sep(str[len - 4]); // *.??- if (len >= 5 && !is_dir_sep(str[len - 3]) && str[len - 4] == '.') return !is_dir_sep(str[len - 5]); } return false; } #endif /// \brief Checks if src_name has given compressed_suffix /// /// \param suffix Filename suffix to look for /// \param src_name Input filename /// \param src_len strlen(src_name) /// /// \return If src_name has the suffix, src_len - strlen(suffix) is /// returned. It's always a positive integer. Otherwise zero /// is returned. static size_t test_suffix(const char *suffix, const char *src_name, size_t src_len) { const size_t suffix_len = strlen(suffix); // The filename must have at least one character in addition to // the suffix. src_name may contain path to the filename, so we // need to check for directory separator too. if (src_len <= suffix_len || is_dir_sep(src_name[src_len - suffix_len - 1])) return 0; if (suffix_strcmp(suffix, src_name + src_len - suffix_len) == 0) return src_len - suffix_len; return 0; } /// \brief Removes the filename suffix of the compressed file /// /// \return Name of the uncompressed file, or NULL if file has unknown /// suffix. static char * uncompressed_name(const char *src_name, const size_t src_len) { static const struct { const char *compressed; const char *uncompressed; } suffixes[] = { { ".xz", "" }, { ".txz", ".tar" }, // .txz abbreviation for .txt.gz is rare. { ".lzma", "" }, #ifdef __DJGPP__ { ".lzm", "" }, #endif { ".tlz", ".tar" }, // Both .tar.lzma and .tar.lz #ifdef HAVE_LZIP_DECODER { ".lz", "" }, #endif }; const char *new_suffix = ""; size_t new_len = 0; if (opt_format != FORMAT_RAW) { for (size_t i = 0; i < ARRAY_SIZE(suffixes); ++i) { new_len = test_suffix(suffixes[i].compressed, src_name, src_len); if (new_len != 0) { new_suffix = suffixes[i].uncompressed; break; } } #ifdef __DJGPP__ // Support also *.?- -> *.? and *.??- -> *.?? on DOS. 
// This is done also when long filenames are available // to keep it easy to decompress files created when // long filename support wasn't available. if (new_len == 0 && has_sfn_suffix(src_name, src_len)) { new_suffix = ""; new_len = src_len - 1; } #endif } if (new_len == 0 && custom_suffix != NULL) new_len = test_suffix(custom_suffix, src_name, src_len); if (new_len == 0) { message_warning(_("%s: Filename has an unknown suffix, " - "skipping"), src_name); + "skipping"), tuklib_mask_nonprint(src_name)); return NULL; } const size_t new_suffix_len = strlen(new_suffix); char *dest_name = xmalloc(new_len + new_suffix_len + 1); memcpy(dest_name, src_name, new_len); memcpy(dest_name + new_len, new_suffix, new_suffix_len); dest_name[new_len + new_suffix_len] = '\0'; return dest_name; } -/// This message is needed in multiple places in compressed_name(), -/// so the message has been put into its own function. static void msg_suffix(const char *src_name, const char *suffix) { + char *mem = NULL; message_warning(_("%s: File already has '%s' suffix, skipping"), - src_name, suffix); + tuklib_mask_nonprint(src_name), + tuklib_mask_nonprint_r(suffix, &mem)); + free(mem); return; } /// \brief Appends suffix to src_name /// /// In contrast to uncompressed_name(), we check only suffixes that are valid /// for the specified file format. static char * compressed_name(const char *src_name, size_t src_len) { // The order of these must match the order in args.h. static const char *const all_suffixes[][4] = { { ".xz", ".txz", NULL }, { ".lzma", #ifdef __DJGPP__ ".lzm", #endif ".tlz", NULL #ifdef HAVE_LZIP_DECODER // This is needed to keep the table indexing in sync with // enum format_type from coder.h. }, { /* ".lz", */ NULL #endif }, { // --format=raw requires specifying the suffix // manually or using stdout. NULL } }; // args.c ensures these. 
assert(opt_format != FORMAT_AUTO); #ifdef HAVE_LZIP_DECODER assert(opt_format != FORMAT_LZIP); #endif const size_t format = opt_format - 1; const char *const *suffixes = all_suffixes[format]; // Look for known filename suffixes and refuse to compress them. for (size_t i = 0; suffixes[i] != NULL; ++i) { if (test_suffix(suffixes[i], src_name, src_len) != 0) { msg_suffix(src_name, suffixes[i]); return NULL; } } #ifdef __DJGPP__ // Recognize also the special suffix that is used when long // filename (LFN) support isn't available. This suffix is // recognized on LFN systems too. if (opt_format == FORMAT_XZ && has_sfn_suffix(src_name, src_len)) { msg_suffix(src_name, "-"); return NULL; } #endif if (custom_suffix != NULL) { if (test_suffix(custom_suffix, src_name, src_len) != 0) { msg_suffix(src_name, custom_suffix); return NULL; } } const char *suffix = custom_suffix != NULL ? custom_suffix : suffixes[0]; size_t suffix_len = strlen(suffix); #ifdef __DJGPP__ if (!_use_lfn(src_name)) { // Long filename (LFN) support isn't available and we are // limited to 8.3 short filenames (SFN). // // Look for suffix separator from the filename, and make sure // that it is in the filename, not in a directory name. const char *sufsep = strrchr(src_name, '.'); if (sufsep == NULL || sufsep[1] == '\0' || has_dir_sep(sufsep)) { // src_name has no filename extension. // // Examples: // xz foo -> foo.xz // xz -F lzma foo -> foo.lzm // xz -S x foo -> foox // xz -S x foo. -> foo.x // xz -S x.y foo -> foox.y // xz -S .x foo -> foo.x // xz -S .x foo. -> foo.x // // Avoid double dots: if (sufsep != NULL && sufsep[1] == '\0' && suffix[0] == '.') --src_len; } else if (custom_suffix == NULL && strcasecmp(sufsep, ".tar") == 0) { // ".tar" is handled specially. 
// // Examples: // xz foo.tar -> foo.txz // xz -F lzma foo.tar -> foo.tlz static const char *const tar_suffixes[] = { ".txz", // .tar.xz ".tlz", // .tar.lzma /* ".tlz", // .tar.lz */ }; suffix = tar_suffixes[format]; suffix_len = 4; src_len -= 4; } else { if (custom_suffix == NULL && opt_format == FORMAT_XZ) { // Instead of the .xz suffix, use a single // character at the end of the filename // extension. This is to minimize name // conflicts when compressing multiple files // with the same basename. E.g. foo.txt and // foo.exe become foo.tx- and foo.ex-. Dash // is rare as the last character of the // filename extension, so it seems to be // quite safe choice and it stands out better // in directory listings than e.g. x. For // comparison, gzip uses z. suffix = "-"; suffix_len = 1; } if (suffix[0] == '.') { // The first character of the suffix is a dot. // Throw away the original filename extension // and replace it with the new suffix. // // Examples: // xz -F lzma foo.txt -> foo.lzm // xz -S .x foo.txt -> foo.x src_len = sufsep - src_name; } else { // The first character of the suffix is not // a dot. Preserve the first 0-2 characters // of the original filename extension. // // Examples: // xz foo.txt -> foo.tx- // xz -S x foo.c -> foo.cx // xz -S ab foo.c -> foo.cab // xz -S ab foo.txt -> foo.tab // xz -S abc foo.txt -> foo.abc // // Truncate the suffix to three chars: if (suffix_len > 3) suffix_len = 3; // If needed, overwrite 1-3 characters. if (strlen(sufsep) > 4 - suffix_len) src_len = sufsep - src_name + 4 - suffix_len; } } } #endif char *dest_name = xmalloc(src_len + suffix_len + 1); memcpy(dest_name, src_name, src_len); memcpy(dest_name + src_len, suffix, suffix_len); dest_name[src_len + suffix_len] = '\0'; return dest_name; } extern char * suffix_get_dest_name(const char *src_name) { assert(src_name != NULL); // Length of the name is needed in all cases to locate the end of // the string to compare the suffix, so calculate the length here. 
const size_t src_len = strlen(src_name); return opt_mode == MODE_COMPRESS ? compressed_name(src_name, src_len) : uncompressed_name(src_name, src_len); } extern void suffix_set(const char *suffix) { // Empty suffix and suffixes having a directory separator are // rejected. Such suffixes would break things later. if (suffix[0] == '\0' || has_dir_sep(suffix)) - message_fatal(_("%s: Invalid filename suffix"), suffix); + message_fatal(_("%s: Invalid filename suffix"), + tuklib_mask_nonprint(suffix)); // Replace the old custom_suffix (if any) with the new suffix. free(custom_suffix); custom_suffix = xstrdup(suffix); return; } extern bool suffix_is_set(void) { return custom_suffix != NULL; } diff --git a/src/xz/util.c b/src/xz/util.c index 0d339aede675..e5485beef80d 100644 --- a/src/xz/util.c +++ b/src/xz/util.c @@ -1,307 +1,311 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file util.c /// \brief Miscellaneous utility functions // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "private.h" #include <stdarg.h> /// Buffers for uint64_to_str() and uint64_to_nicestr() static char bufs[4][128]; // Thousand separator support in uint64_to_str() and uint64_to_nicestr(): // // DJGPP 2.05 added support for thousands separators but it's broken // at least under WinXP with Finnish locale that uses a non-breaking space // as the thousands separator. Workaround by disabling thousands separators // for DJGPP builds. // // MSVC doesn't support thousand separators. -#if defined(__DJGPP__) || defined(_MSC_VER) +// +// MinGW-w64 supports thousand separators only with its own stdio functions +// which our sysdefs.h disables when _UCRT && HAVE_SMALL. 
+#if defined(__DJGPP__) || defined(_MSC_VER) \ + || (defined(__MINGW32__) && __USE_MINGW_ANSI_STDIO == 0) # define FORMAT_THOUSAND_SEP(prefix, suffix) prefix suffix # define check_thousand_sep(slot) do { } while (0) #else # define FORMAT_THOUSAND_SEP(prefix, suffix) ((thousand == WORKS) \ ? prefix "'" suffix \ : prefix suffix) static enum { UNKNOWN, WORKS, BROKEN } thousand = UNKNOWN; /// Check if thousands separator is supported. Run-time checking is easiest /// because it seems to be sometimes lacking even on a POSIXish system. /// Note that trying to use thousands separators when snprintf() doesn't /// support them results in undefined behavior. This just has happened to /// work well enough in practice. /// /// This must be called before using the FORMAT_THOUSAND_SEP macro. static void check_thousand_sep(uint32_t slot) { if (thousand == UNKNOWN) { bufs[slot][0] = '\0'; snprintf(bufs[slot], sizeof(bufs[slot]), "%'u", 1U); thousand = bufs[slot][0] == '1' ? WORKS : BROKEN; } return; } #endif extern void * xrealloc(void *ptr, size_t size) { assert(size > 0); // Save ptr so that we can free it if realloc fails. // The point is that message_fatal ends up calling stdio functions // which in some libc implementations might allocate memory from // the heap. Freeing ptr improves the chances that there's free // memory for stdio functions if they need it. void *p = ptr; ptr = realloc(ptr, size); if (ptr == NULL) { const int saved_errno = errno; free(p); message_fatal("%s", strerror(saved_errno)); } return ptr; } extern char * xstrdup(const char *src) { assert(src != NULL); const size_t size = strlen(src) + 1; char *dest = xmalloc(size); return memcpy(dest, src, size); } extern uint64_t str_to_uint64(const char *name, const char *value, uint64_t min, uint64_t max) { uint64_t result = 0; // Skip blanks. while (*value == ' ' || *value == '\t') ++value; // Accept special value "max". Supporting "min" doesn't seem useful. 
if (strcmp(value, "max") == 0) return max; if (*value < '0' || *value > '9') - message_fatal(_("%s: Value is not a non-negative " - "decimal integer"), value); + message_fatal(_("%s: %s"), value, + _("Value is not a non-negative decimal integer")); do { // Don't overflow. if (result > UINT64_MAX / 10) goto error; result *= 10; // Another overflow check const uint32_t add = (uint32_t)(*value - '0'); if (UINT64_MAX - add < result) goto error; result += add; ++value; } while (*value >= '0' && *value <= '9'); if (*value != '\0') { // Look for suffix. Originally this supported both base-2 // and base-10, but since there seems to be little need // for base-10 in this program, treat everything as base-2 // and also be more relaxed about the case of the first // letter of the suffix. uint64_t multiplier = 0; if (*value == 'k' || *value == 'K') multiplier = UINT64_C(1) << 10; else if (*value == 'm' || *value == 'M') multiplier = UINT64_C(1) << 20; else if (*value == 'g' || *value == 'G') multiplier = UINT64_C(1) << 30; ++value; // Allow also e.g. Ki, KiB, and KB. if (*value != '\0' && strcmp(value, "i") != 0 && strcmp(value, "iB") != 0 && strcmp(value, "B") != 0) multiplier = 0; if (multiplier == 0) { message(V_ERROR, _("%s: Invalid multiplier suffix"), value - 1); message_fatal(_("Valid suffixes are 'KiB' (2^10), " "'MiB' (2^20), and 'GiB' (2^30).")); } // Don't overflow here either. 
if (result > UINT64_MAX / multiplier) goto error; result *= multiplier; } if (result < min || result > max) goto error; return result; error: message_fatal(_("Value of the option '%s' must be in the range " "[%" PRIu64 ", %" PRIu64 "]"), name, min, max); } extern uint64_t round_up_to_mib(uint64_t n) { return (n >> 20) + ((n & ((UINT32_C(1) << 20) - 1)) != 0); } extern const char * uint64_to_str(uint64_t value, uint32_t slot) { assert(slot < ARRAY_SIZE(bufs)); check_thousand_sep(slot); snprintf(bufs[slot], sizeof(bufs[slot]), FORMAT_THOUSAND_SEP("%", PRIu64), value); return bufs[slot]; } extern const char * uint64_to_nicestr(uint64_t value, enum nicestr_unit unit_min, enum nicestr_unit unit_max, bool always_also_bytes, uint32_t slot) { assert(unit_min <= unit_max); assert(unit_max <= NICESTR_TIB); assert(slot < ARRAY_SIZE(bufs)); check_thousand_sep(slot); enum nicestr_unit unit = NICESTR_B; char *pos = bufs[slot]; size_t left = sizeof(bufs[slot]); if ((unit_min == NICESTR_B && value < 10000) || unit_max == NICESTR_B) { // The value is shown as bytes. my_snprintf(&pos, &left, FORMAT_THOUSAND_SEP("%", "u"), (unsigned int)value); } else { // Scale the value to a nicer unit. Unless unit_min and // unit_max limit us, we will show at most five significant // digits with one decimal place. double d = (double)(value); do { d /= 1024.0; ++unit; } while (unit < unit_min || (d > 9999.9 && unit < unit_max)); my_snprintf(&pos, &left, FORMAT_THOUSAND_SEP("%", ".1f"), d); } static const char suffix[5][4] = { "B", "KiB", "MiB", "GiB", "TiB" }; my_snprintf(&pos, &left, " %s", suffix[unit]); if (always_also_bytes && value >= 10000) snprintf(pos, left, FORMAT_THOUSAND_SEP(" (%", PRIu64 " B)"), value); return bufs[slot]; } extern void my_snprintf(char **pos, size_t *left, const char *fmt, ...) { va_list ap; va_start(ap, fmt); const int len = vsnprintf(*pos, *left, fmt, ap); va_end(ap); // If an error occurred, we want the caller to think that the whole // buffer was used. 
This way no more data will be written to the // buffer. We don't need better error handling here, although it // is possible that the result looks garbage on the terminal if // e.g. an UTF-8 character gets split. That shouldn't (easily) // happen though, because the buffers used have some extra room. if (len < 0 || (size_t)(len) >= *left) { *left = 0; } else { *pos += len; *left -= (size_t)(len); } return; } extern bool is_tty(int fd) { #if defined(_WIN32) && !defined(__CYGWIN__) // There is no need to check if handle == INVALID_HANDLE_VALUE // because it will return false anyway when used in GetConsoleMode(). // The resulting HANDLE is owned by the file descriptor. // The HANDLE must not be closed here. intptr_t handle = _get_osfhandle(fd); DWORD mode; // GetConsoleMode() is an easy way to tell if the HANDLE is a // console or not. We do not care about the value of mode since we // do not plan to use any further Windows console functions. return GetConsoleMode((HANDLE)handle, &mode); #else return isatty(fd); #endif } extern bool is_tty_stdin(void) { const bool ret = is_tty(STDIN_FILENO); if (ret) message_error(_("Compressed data cannot be read from " "a terminal")); return ret; } extern bool is_tty_stdout(void) { const bool ret = is_tty(STDOUT_FILENO); if (ret) message_error(_("Compressed data cannot be written to " "a terminal")); return ret; } diff --git a/src/xz/xz.1 b/src/xz/xz.1 index 5b880e81e8c2..0bc30a9af384 100644 --- a/src/xz/xz.1 +++ b/src/xz/xz.1 @@ -1,3183 +1,3271 @@ '\" t .\" SPDX-License-Identifier: 0BSD .\" .\" Authors: Lasse Collin .\" Jia Tan .\" -.TH XZ 1 "2024-04-08" "Tukaani" "XZ Utils" +.TH XZ 1 "2025-03-08" "Tukaani" "XZ Utils" . .SH NAME xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files . .SH SYNOPSIS .B xz .RI [ option... ] .RI [ file... ] . .SH COMMAND ALIASES .B unxz is equivalent to .BR "xz \-\-decompress" . .br .B xzcat is equivalent to .BR "xz \-\-decompress \-\-stdout" . 
.br .B lzma is equivalent to .BR "xz \-\-format=lzma" . .br .B unlzma is equivalent to .BR "xz \-\-format=lzma \-\-decompress" . .br .B lzcat is equivalent to .BR "xz \-\-format=lzma \-\-decompress \-\-stdout" . .PP When writing scripts that need to decompress files, it is recommended to always use the name .B xz with appropriate arguments .RB ( "xz \-d" or .BR "xz \-dc" ) instead of the names .B unxz and .BR xzcat . . .SH DESCRIPTION .B xz is a general-purpose data compression tool with command line syntax similar to .BR gzip (1) and .BR bzip2 (1). The native file format is the .B .xz format, but the legacy .B .lzma format used by LZMA Utils and raw compressed streams with no container format headers are also supported. In addition, decompression of the .B .lz format used by .B lzip is supported. .PP .B xz compresses or decompresses each .I file according to the selected operation mode. If no .I files are given or .I file is .BR \- , .B xz reads from standard input and writes the processed data to standard output. .B xz will refuse (display an error and skip the .IR file ) to write compressed data to standard output if it is a terminal. Similarly, .B xz will refuse to read compressed data from standard input if it is a terminal. .PP Unless .B \-\-stdout is specified, .I files other than .B \- are written to a new file whose name is derived from the source .I file name: .IP \(bu 3 When compressing, the suffix of the target file format .RB ( .xz or .BR .lzma ) is appended to the source filename to get the target filename. .IP \(bu 3 When decompressing, the .BR .xz , .BR .lzma , or .B .lz suffix is removed from the filename to get the target filename. .B xz also recognizes the suffixes .B .txz and .BR .tlz , and replaces them with the .B .tar suffix. .PP If the target file already exists, an error is displayed and the .I file is skipped. 
.PP Unless writing to standard output, .B xz will display a warning and skip the .I file if any of the following applies: .IP \(bu 3 .I File is not a regular file. Symbolic links are not followed, and thus they are not considered to be regular files. .IP \(bu 3 .I File has more than one hard link. .IP \(bu 3 .I File has setuid, setgid, or sticky bit set. .IP \(bu 3 The operation mode is set to compress and the .I file already has a suffix of the target file format .RB ( .xz or .B .txz when compressing to the .B .xz format, and .B .lzma or .B .tlz when compressing to the .B .lzma format). .IP \(bu 3 The operation mode is set to decompress and the .I file doesn't have a suffix of any of the supported file formats .RB ( .xz , .BR .txz , .BR .lzma , .BR .tlz , or .BR .lz ). .PP After successfully compressing or decompressing the .IR file , .B xz copies the owner, group, permissions, access time, and modification time from the source .I file to the target file. If copying the group fails, the permissions are modified so that the target file doesn't become accessible to users who didn't have permission to access the source .IR file . .B xz doesn't support copying other metadata like access control lists or extended attributes yet. .PP Once the target file has been successfully closed, the source .I file is removed unless .B \-\-keep was specified. The source .I file is never removed if the output is written to standard output or if an error occurs. .PP Sending .B SIGINFO or .B SIGUSR1 to the .B xz process makes it print progress information to standard error. This has only limited use since when standard error is a terminal, using .B \-\-verbose will display an automatically updating progress indicator. . .SS "Memory usage" The memory usage of .B xz varies from a few hundred kilobytes to several gigabytes depending on the compression settings. The settings used when compressing a file determine the memory requirements of the decompressor. 
Typically the decompressor needs 5\ % to 20\ % of the amount of memory that the compressor needed when creating the file. For example, decompressing a file created with .B xz \-9 currently requires 65\ MiB of memory. Still, it is possible to have .B .xz files that require several gigabytes of memory to decompress. .PP Especially users of older systems may find the possibility of very large memory usage annoying. To prevent uncomfortable surprises, .B xz has a built-in memory usage limiter, which is disabled by default. While some operating systems provide ways to limit the memory usage of processes, relying on it wasn't deemed to be flexible enough (for example, using .BR ulimit (1) to limit virtual memory tends to cripple .BR mmap (2)). .PP The memory usage limiter can be enabled with the command line option \fB\-\-memlimit=\fIlimit\fR. Often it is more convenient to enable the limiter by default by setting the environment variable +.\" TRANSLATORS: Don't translate the uppercase XZ_DEFAULTS. +.\" It's a name of an environment variable. .BR XZ_DEFAULTS , for example, .BR XZ_DEFAULTS=\-\-memlimit=150MiB . It is possible to set the limits separately for compression and decompression by using .BI \-\-memlimit\-compress= limit and \fB\-\-memlimit\-decompress=\fIlimit\fR. Using these two options outside .B XZ_DEFAULTS is rarely useful because a single run of .B xz cannot do both compression and decompression and .BI \-\-memlimit= limit (or .B \-M .IR limit ) is shorter to type on the command line. .PP If the specified memory usage limit is exceeded when decompressing, .B xz will display an error and decompressing the file will fail. If the limit is exceeded when compressing, .B xz will try to scale the settings down so that the limit is no longer exceeded (except when using .B \-\-format=raw or .BR \-\-no\-adjust ). This way the operation won't fail unless the limit is very small. 
The scaling of the settings is done in steps that don't match the compression level presets, for example, if the limit is only slightly less than the amount required for .BR "xz \-9" , the settings will be scaled down only a little, not all the way down to .BR "xz \-8" . . .SS "Concatenation and padding with .xz files" It is possible to concatenate .B .xz files as is. .B xz will decompress such files as if they were a single .B .xz file. .PP It is possible to insert padding between the concatenated parts or after the last part. The padding must consist of null bytes and the size of the padding must be a multiple of four bytes. This can be useful, for example, if the .B .xz file is stored on a medium that measures file sizes in 512-byte blocks. .PP Concatenation and padding are not allowed with .B .lzma files or raw streams. . .SH OPTIONS . .SS "Integer suffixes and special values" In most places where an integer argument is expected, an optional suffix is supported to easily indicate large integers. There must be no space between the integer and the suffix. .TP .B KiB Multiply the integer by 1,024 (2^10). .BR Ki , .BR k , .BR kB , .BR K , and .B KB are accepted as synonyms for .BR KiB . .TP .B MiB Multiply the integer by 1,048,576 (2^20). .BR Mi , .BR m , .BR M , and .B MB are accepted as synonyms for .BR MiB . .TP .B GiB Multiply the integer by 1,073,741,824 (2^30). .BR Gi , .BR g , .BR G , and .B GB are accepted as synonyms for .BR GiB . .PP The special value .B max can be used to indicate the maximum integer value supported by the option. . .SS "Operation mode" If multiple operation mode options are given, the last one takes effect. .TP .BR \-z ", " \-\-compress Compress. This is the default operation mode when no operation mode option is specified and no other operation mode is implied from the command name (for example, .B unxz implies .BR \-\-decompress ). 
+.IP "" +.\" The DESCRIPTION section already says this but it's good to repeat it +.\" here because the default behavior is a bit dangerous and new users +.\" in a hurry may skip reading the DESCRIPTION section. +After successful compression, the source file is removed +unless writing to standard output or +.B \-\-keep +was specified. .TP .BR \-d ", " \-\-decompress ", " \-\-uncompress Decompress. +.\" The DESCRIPTION section already says this but it's good to repeat it +.\" here because the default behavior is a bit dangerous and new users +.\" in a hurry may skip reading the DESCRIPTION section. +After successful decompression, the source file is removed +unless writing to standard output or +.B \-\-keep +was specified. .TP .BR \-t ", " \-\-test Test the integrity of compressed .IR files . This option is equivalent to .B "\-\-decompress \-\-stdout" except that the decompressed data is discarded instead of being written to standard output. No files are created or removed. .TP .BR \-l ", " \-\-list Print information about compressed .IR files . No uncompressed output is produced, and no files are created or removed. In list mode, the program cannot read the compressed data from standard input or from other unseekable sources. .IP "" The default listing shows basic information about .IR files , one file per line. To get more detailed information, use also the .B \-\-verbose option. For even more information, use .B \-\-verbose twice, but note that this may be slow, because getting all the extra information requires many seeks. The width of verbose output exceeds 80 characters, so piping the output to, for example, .B "less\ \-S" may be convenient if the terminal isn't wide enough. .IP "" The exact output may vary between .B xz versions and different locales. For machine-readable output, .B \-\-robot \-\-list should be used. . .SS "Operation modifiers" .TP .BR \-k ", " \-\-keep Don't delete the input files. 
.IP "" Since .B xz 5.2.6, this option also makes .B xz compress or decompress even if the input is a symbolic link to a regular file, has more than one hard link, or has the setuid, setgid, or sticky bit set. The setuid, setgid, and sticky bits are not copied to the target file. In earlier versions this was only done with .BR \-\-force . .TP .BR \-f ", " \-\-force This option has several effects: .RS .IP \(bu 3 If the target file already exists, delete it before compressing or decompressing. .IP \(bu 3 Compress or decompress even if the input is a symbolic link to a regular file, has more than one hard link, or has the setuid, setgid, or sticky bit set. The setuid, setgid, and sticky bits are not copied to the target file. .IP \(bu 3 When used with .B \-\-decompress .B \-\-stdout and .B xz cannot recognize the type of the source file, copy the source file as is to standard output. This allows .B xzcat .B \-\-force to be used like .BR cat (1) for files that have not been compressed with .BR xz . Note that in future, .B xz might support new compressed file formats, which may make .B xz decompress more types of files instead of copying them as is to standard output. .BI \-\-format= format can be used to restrict .B xz to decompress only a single file format. .RE .TP .BR \-c ", " \-\-stdout ", " \-\-to\-stdout Write the compressed or decompressed data to standard output instead of a file. This implies .BR \-\-keep . .TP .B \-\-single\-stream Decompress only the first .B .xz stream, and silently ignore possible remaining input data following the stream. Normally such trailing garbage makes .B xz display an error. .IP "" .B xz never decompresses more than one stream from .B .lzma files or raw streams, but this option still makes .B xz ignore the possible trailing data after the .B .lzma file or raw stream. .IP "" This option has no effect if the operation mode is not .B \-\-decompress or .BR \-\-test . 
+.IP "" +Since +.B xz +5.7.1alpha, +.B \-\-single\-stream +implies +.BR \-\-keep . .TP .B \-\-no\-sparse Disable creation of sparse files. By default, if decompressing into a regular file, .B xz tries to make the file sparse if the decompressed data contains long sequences of binary zeros. It also works when writing to standard output as long as standard output is connected to a regular file and certain additional conditions are met to make it safe. Creating sparse files may save disk space and speed up the decompression by reducing the amount of disk I/O. .TP \fB\-S\fR \fI.suf\fR, \fB\-\-suffix=\fI.suf When compressing, use .I .suf as the suffix for the target file instead of .B .xz or .BR .lzma . If not writing to standard output and the source file already has the suffix .IR .suf , a warning is displayed and the file is skipped. .IP "" When decompressing, recognize files with the suffix .I .suf in addition to files with the .BR .xz , .BR .txz , .BR .lzma , .BR .tlz , or .B .lz suffix. If the source file has the suffix .IR .suf , the suffix is removed to get the target filename. .IP "" When compressing or decompressing raw streams .RB ( \-\-format=raw ), the suffix must always be specified unless writing to standard output, because there is no default suffix for raw streams. .TP \fB\-\-files\fR[\fB=\fIfile\fR] Read the filenames to process from .IR file ; if .I file is omitted, filenames are read from standard input. Filenames must be terminated with the newline character. A dash .RB ( \- ) is taken as a regular filename; it doesn't mean standard input. If filenames are given also as command line arguments, they are processed before the filenames read from .IR file . .TP \fB\-\-files0\fR[\fB=\fIfile\fR] This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except that each filename must be terminated with the null character. . 
.SS "Basic file format and compression options" .TP \fB\-F\fR \fIformat\fR, \fB\-\-format=\fIformat Specify the file .I format to compress or decompress: .RS .TP +.\" TRANSLATORS: Don't translate bold string B<auto>. .B auto This is the default. When compressing, .B auto is equivalent to .BR xz . When decompressing, the format of the input file is automatically detected. Note that raw streams (created with .BR \-\-format=raw ) cannot be auto-detected. .TP .B xz Compress to the .B .xz file format, or accept only .B .xz files when decompressing. .TP .BR lzma ", " alone Compress to the legacy .B .lzma file format, or accept only .B .lzma files when decompressing. The alternative name .B alone is provided for backwards compatibility with LZMA Utils. .TP .B lzip Accept only .B .lz files when decompressing. Compression is not supported. .IP "" The .B .lz format version 0 and the unextended version 1 are supported. Version 0 files were produced by .B lzip 1.3 and older. Such files aren't common but may be found from file archives as a few source packages were released in this format. People might have old personal files in this format too. Decompression support for the format version 0 was removed in .B lzip 1.18. .IP "" .B lzip 1.4 and later create files in the format version 1. The sync flush marker extension to the format version 1 was added in .B lzip 1.6. This extension is rarely used and isn't supported by .B xz (diagnosed as corrupt input). .TP .B raw Compress or uncompress a raw stream (no headers). This is meant for advanced users only. To decode raw streams, you need use .B \-\-format=raw and explicitly specify the filter chain, which normally would have been stored in the container headers. .RE .TP \fB\-C\fR \fIcheck\fR, \fB\-\-check=\fIcheck Specify the type of the integrity check. The check is calculated from the uncompressed data and stored in the .B .xz file. 
This option has an effect only when compressing into the .B .xz format; the .B .lzma format doesn't support integrity checks. The integrity check (if any) is verified when the .B .xz file is decompressed. .IP "" Supported .I check types: .RS .TP +.\" TRANSLATORS: Don't translate the bold strings B<none>, B<crc32>, +.\" B<crc64>, and B<sha256>. The command line option --check accepts +.\" only the untranslated strings. .B none Don't calculate an integrity check at all. This is usually a bad idea. This can be useful when integrity of the data is verified by other means anyway. .TP .B crc32 Calculate CRC32 using the polynomial from IEEE-802.3 (Ethernet). .TP .B crc64 Calculate CRC64 using the polynomial from ECMA-182. This is the default, since it is slightly better than CRC32 at detecting damaged files and the speed difference is negligible. .TP .B sha256 Calculate SHA-256. This is somewhat slower than CRC32 and CRC64. .RE .IP "" Integrity of the .B .xz headers is always verified with CRC32. It is not possible to change or disable it. .TP .B \-\-ignore\-check Don't verify the integrity check of the compressed data when decompressing. The CRC32 values in the .B .xz headers will still be verified normally. .IP "" .B "Do not use this option unless you know what you are doing." Possible reasons to use this option: .RS .IP \(bu 3 Trying to recover data from a corrupt .xz file. .IP \(bu 3 Speeding up decompression. This matters mostly with SHA-256 or with files that have compressed extremely well. It's recommended to not use this option for this purpose unless the file integrity is verified externally in some other way. .RE .TP .BR \-0 " ... " \-9 Select a compression preset level. The default is .BR \-6 . If multiple preset levels are specified, the last one takes effect. If a custom filter chain was already specified, setting a compression preset level clears the custom filter chain. .IP "" The differences between the presets are more significant than with .BR gzip (1) and .BR bzip2 (1). 
The selected compression settings determine the memory requirements of the decompressor, thus using a too high preset level might make it painful to decompress the file on an old system with little RAM. Specifically, .B "it's not a good idea to blindly use \-9 for everything" like it often is with .BR gzip (1) and .BR bzip2 (1). .RS .TP .BR "\-0" " ... " "\-3" These are somewhat fast presets. .B \-0 is sometimes faster than .B "gzip \-9" while compressing much better. The higher ones often have speed comparable to .BR bzip2 (1) with comparable or better compression ratio, although the results depend a lot on the type of data being compressed. .TP .BR "\-4" " ... " "\-6" Good to very good compression while keeping decompressor memory usage reasonable even for old systems. .B \-6 is the default, which is usually a good choice for distributing files that need to be decompressible even on systems with only 16\ MiB RAM. .RB ( \-5e or .B \-6e may be worth considering too. See .BR \-\-extreme .) .TP .B "\-7 ... \-9" These are like .B \-6 but with higher compressor and decompressor memory requirements. These are useful only when compressing files bigger than 8\ MiB, 16\ MiB, and 32\ MiB, respectively. .RE .IP "" On the same hardware, the decompression speed is approximately a constant number of bytes of compressed data per second. In other words, the better the compression, the faster the decompression will usually be. This also means that the amount of uncompressed output produced per second can vary a lot. .IP "" The following table summarises the features of the presets: .RS .RS .PP .TS tab(;); c c c c c n n n n n. 
Preset;DictSize;CompCPU;CompMem;DecMem \-0;256 KiB;0;3 MiB;1 MiB \-1;1 MiB;1;9 MiB;2 MiB \-2;2 MiB;2;17 MiB;3 MiB \-3;4 MiB;3;32 MiB;5 MiB \-4;4 MiB;4;48 MiB;5 MiB \-5;8 MiB;5;94 MiB;9 MiB \-6;8 MiB;6;94 MiB;9 MiB \-7;16 MiB;6;186 MiB;17 MiB \-8;32 MiB;6;370 MiB;33 MiB \-9;64 MiB;6;674 MiB;65 MiB .TE .RE .RE .IP "" Column descriptions: .RS .IP \(bu 3 DictSize is the LZMA2 dictionary size. It is a waste of memory to use a dictionary bigger than the size of the uncompressed file. This is why it is good to avoid using the presets .BR \-7 " ... " \-9 when there's no real need for them. At .B \-6 and lower, the amount of memory wasted is usually low enough to not matter. .IP \(bu 3 CompCPU is a simplified representation of the LZMA2 settings that affect compression speed. The dictionary size affects speed too, so while CompCPU is the same for levels .BR \-6 " ... " \-9 , higher levels still tend to be a little slower. To get even slower and thus possibly better compression, see .BR \-\-extreme . .IP \(bu 3 CompMem contains the compressor memory requirements in the single-threaded mode. It may vary slightly between .B xz versions. .IP \(bu 3 DecMem contains the decompressor memory requirements. That is, the compression settings determine the memory requirements of the decompressor. The exact decompressor memory usage is slightly more than the LZMA2 dictionary size, but the values in the table have been rounded up to the next full MiB. .RE .IP "" Memory requirements of the multi-threaded mode are significantly higher than those of the single-threaded mode. With the default value of .BR \-\-block\-size , each thread needs 3*3*DictSize plus CompMem or DecMem. For example, four threads with preset .B \-6 need 660\(en670\ MiB of memory (4 * (3*3*8\ MiB + 94\ MiB) = 664\ MiB with the CompMem value from the table). .TP .BR \-e ", " \-\-extreme Use a slower variant of the selected compression preset level .RB ( \-0 " ... " \-9 ) to hopefully get a little bit better compression ratio, but with bad luck this can also make it worse.
Decompressor memory usage is not affected, but compressor memory usage increases a little at preset levels .BR \-0 " ... " \-3 . .IP "" Since there are two presets with dictionary sizes 4\ MiB and 8\ MiB, the presets .B \-3e and .B \-5e use slightly faster settings (lower CompCPU) than .B \-4e and .BR \-6e , respectively. That way no two presets are identical. .RS .RS .PP .TS tab(;); c c c c c n n n n n. Preset;DictSize;CompCPU;CompMem;DecMem \-0e;256 KiB;8;4 MiB;1 MiB \-1e;1 MiB;8;13 MiB;2 MiB \-2e;2 MiB;8;25 MiB;3 MiB \-3e;4 MiB;7;48 MiB;5 MiB \-4e;4 MiB;8;48 MiB;5 MiB \-5e;8 MiB;7;94 MiB;9 MiB \-6e;8 MiB;8;94 MiB;9 MiB \-7e;16 MiB;8;186 MiB;17 MiB \-8e;32 MiB;8;370 MiB;33 MiB \-9e;64 MiB;8;674 MiB;65 MiB .TE .RE .RE .IP "" For example, there are a total of four presets that use an 8\ MiB dictionary, whose order from the fastest to the slowest is .BR \-5 , .BR \-6 , .BR \-5e , and .BR \-6e . .TP .B \-\-fast .PD 0 .TP .B \-\-best .PD These are somewhat misleading aliases for .B \-0 and .BR \-9 , respectively. These are provided only for backwards compatibility with LZMA Utils. Avoid using these options. .TP .BI \-\-block\-size= size When compressing to the .B .xz format, split the input data into blocks of .I size bytes. The blocks are compressed independently from each other, which helps with multi-threading and makes limited random-access decompression possible. This option is typically used to override the default block size in multi-threaded mode, but this option can be used in single-threaded mode too. .IP "" In multi-threaded mode about three times .I size bytes will be allocated in each thread for buffering input and output. The default .I size is three times the LZMA2 dictionary size or 1 MiB, whichever is more. Typically a good value is 2\(en4 times the size of the LZMA2 dictionary or at least 1 MiB. Using a .I size less than the LZMA2 dictionary size is a waste of RAM because then the LZMA2 dictionary buffer will never get fully used.
In multi-threaded mode, the sizes of the blocks are stored in the block headers. This size information is required for multi-threaded decompression. .IP "" In single-threaded mode no block splitting is done by default. Setting this option doesn't affect memory usage. No size information is stored in block headers, thus files created in single-threaded mode won't be identical to files created in multi-threaded mode. The lack of size information also means that .B xz won't be able to decompress the files in multi-threaded mode. .TP .BI \-\-block\-list= items When compressing to the .B .xz format, start a new block with an optional custom filter chain after the given intervals of uncompressed data. .IP "" The .I items are a comma-separated list. Each item consists of an optional filter chain number between 0 and 9 followed by a colon .RB ( : ) and a required size of uncompressed data. Omitting an item (two or more consecutive commas) is a shorthand to use the size and filters of the previous item. .IP "" If the input file is bigger than the sum of the sizes in .IR items , the last item is repeated until the end of the file. A special value of .B 0 may be used as the last size to indicate that the rest of the file should be encoded as a single block. .IP "" An alternative filter chain for each block can be specified in combination with the .BI \-\-filters1= filters \&...\& .BI \-\-filters9= filters options. These options define filter chains with an identifier in the range 1\(en9. Filter chain 0 can be used to refer to the default filter chain, which is the same as not specifying a filter chain. The filter chain identifier can be used before the uncompressed size, followed by a colon .RB ( : ).
For example, if one specifies .B \-\-block\-list=1:2MiB,3:2MiB,2:4MiB,,2MiB,0:4MiB then blocks will be created using: .RS .IP \(bu 3 The filter chain specified by .B \-\-filters1 and 2 MiB input .IP \(bu 3 The filter chain specified by .B \-\-filters3 and 2 MiB input .IP \(bu 3 The filter chain specified by .B \-\-filters2 and 4 MiB input .IP \(bu 3 The filter chain specified by .B \-\-filters2 and 4 MiB input .IP \(bu 3 The default filter chain and 2 MiB input .IP \(bu 3 The default filter chain and 4 MiB input for every block until end of input. .RE .IP "" If one specifies a size that exceeds the encoder's block size (either the default value in threaded mode or the value specified with \fB\-\-block\-size=\fIsize\fR), the encoder will create additional blocks while keeping the boundaries specified in .IR items . For example, if one specifies .B \-\-block\-size=10MiB .B \-\-block\-list=5MiB,10MiB,8MiB,12MiB,24MiB and the input file is 80 MiB, one will get 11 blocks: 5, 10, 8, 10, 2, 10, 10, 4, 10, 10, and 1 MiB. .IP "" In multi-threaded mode the sizes of the blocks are stored in the block headers. This isn't done in single-threaded mode, so the encoded output won't be identical to that of the multi-threaded mode. .TP .BI \-\-flush\-timeout= timeout When compressing, if more than .I timeout milliseconds (a positive integer) has passed since the previous flush and reading more input would block, all the pending input data is flushed from the encoder and made available in the output stream. This can be useful if .B xz is used to compress data that is streamed over a network. Small .I timeout values make the data available at the receiving end with a small delay, but large .I timeout values give better compression ratio. .IP "" This feature is disabled by default. If this option is specified more than once, the last one takes effect. The special .I timeout value of .B 0 can be used to explicitly disable this feature. 
.IP "" This feature is not available on non-POSIX systems. .IP "" .\" FIXME .B "This feature is still experimental." Currently .B xz is unsuitable for decompressing the stream in real time due to how .B xz does buffering. .TP +.B \-\-no\-sync +Do not synchronize the target file and its directory +to the storage device before removing the source file. +This can improve performance if compressing or decompressing +many small files. +However, if the system crashes soon after the deletion, +it is possible that the target file was not written +to the storage device but the delete operation was. +In that case neither the original source file +nor the target file is available. +.IP "" +This option has an effect only when +.B xz +is going to remove the source file. +In other cases synchronization is never done. +.IP "" +The synchronization and +.B \-\-no\-sync +were added in +.B xz +5.7.1alpha. +.TP .BI \-\-memlimit\-compress= limit Set a memory usage limit for compression. If this option is specified multiple times, the last one takes effect. .IP "" If the compression settings exceed the .IR limit , .B xz will attempt to adjust the settings downwards so that the limit is no longer exceeded and display a notice that automatic adjustment was done. The adjustments are done in this order: reducing the number of threads, switching to single-threaded mode if even one thread in multi-threaded mode exceeds the .IR limit , and finally reducing the LZMA2 dictionary size. .IP "" When compressing with .B \-\-format=raw or if .B \-\-no\-adjust has been specified, only the number of threads may be reduced since it can be done without affecting the compressed output. .IP "" If the .I limit cannot be met even with the adjustments described above, an error is displayed and .B xz will exit with exit status 1. .IP "" The .I limit can be specified in multiple ways: .RS .IP \(bu 3 The .I limit can be an absolute value in bytes. Using an integer suffix like .B MiB can be useful. 
Example: .B "\-\-memlimit\-compress=80MiB" .IP \(bu 3 The .I limit can be specified as a percentage of total physical memory (RAM). This can be useful especially when setting the .B XZ_DEFAULTS environment variable in a shell initialization script that is shared between different computers. That way the limit is automatically bigger on systems with more memory. Example: .B "\-\-memlimit\-compress=70%" .IP \(bu 3 The .I limit can be reset back to its default value by setting it to .BR 0 . This is currently equivalent to setting the .I limit to .B max (no memory usage limit). .RE .IP "" For 32-bit .B xz there is a special case: if the .I limit would be over .BR "4020\ MiB" , the .I limit is set to .BR "4020\ MiB" . On MIPS32 .B "2000\ MiB" is used instead. (The values .B 0 and .B max aren't affected by this. A similar feature doesn't exist for decompression.) This can be helpful when a 32-bit executable has access to 4\ GiB address space (2 GiB on MIPS32) while hopefully doing no harm in other situations. .IP "" See also the section .BR "Memory usage" . .TP .BI \-\-memlimit\-decompress= limit Set a memory usage limit for decompression. This also affects the .B \-\-list mode. If the operation is not possible without exceeding the .IR limit , .B xz will display an error and decompressing the file will fail. See .BI \-\-memlimit\-compress= limit for possible ways to specify the .IR limit . .TP .BI \-\-memlimit\-mt\-decompress= limit Set a memory usage limit for multi-threaded decompression. This can only affect the number of threads; this will never make .B xz refuse to decompress a file. If .I limit is too low to allow any multi-threading, the .I limit is ignored and .B xz will continue in single-threaded mode. Note that if also .B \-\-memlimit\-decompress is used, it will always apply to both single-threaded and multi-threaded modes, and so the effective .I limit for multi-threading will never be higher than the limit set with .BR \-\-memlimit\-decompress . 
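.IP ""
The interaction between the two decompression limits described above can be sketched as follows. This is only an illustration in Python, not how xz is implemented: the function names and the per_thread_bytes estimate are hypothetical, and the real memory accounting in xz is more detailed.

```python
def effective_mt_limit(memlimit_mt_decompress, memlimit_decompress=None):
    """Return the limit (in bytes) that applies to threaded decompression.

    memlimit_decompress=None means no --memlimit-decompress was given.
    The effective limit is never higher than --memlimit-decompress.
    """
    if memlimit_decompress is None:
        return memlimit_mt_decompress
    return min(memlimit_mt_decompress, memlimit_decompress)


def plan_threads(requested_threads, per_thread_bytes, mt_limit):
    """Reduce the thread count until the (hypothetical) estimate fits.

    If not even two threads fit, fall back to one thread instead of
    failing, because this limit only affects the number of threads.
    """
    threads = requested_threads
    while threads > 1 and threads * per_thread_bytes > mt_limit:
        threads -= 1
    return threads


MIB = 1024 * 1024
limit = effective_mt_limit(1024 * MIB, 512 * MIB)
print(limit // MIB, plan_threads(8, 100 * MIB, limit))
```

In this sketch a too-low limit only lowers the thread count to one. Refusing to decompress is the job of the separate --memlimit-decompress check, which is not modeled here.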
.IP "" In contrast to the other memory usage limit options, .BI \-\-memlimit\-mt\-decompress= limit has a system-specific default .IR limit . .B "xz \-\-info\-memory" can be used to see the current value. .IP "" This option and its default value exist because without any limit the threaded decompressor could end up allocating an insane amount of memory with some input files. If the default .I limit is too low on your system, feel free to increase the .I limit but never set it to a value larger than the amount of usable RAM as with appropriate input files .B xz will attempt to use that amount of memory even with a low number of threads. Running out of memory or swapping will not improve decompression performance. .IP "" See .BI \-\-memlimit\-compress= limit for possible ways to specify the .IR limit . Setting .I limit to .B 0 resets the .I limit to the default system-specific value. .TP \fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit This is equivalent to specifying .BI \-\-memlimit\-compress= limit .BI \-\-memlimit-decompress= limit \fB\-\-memlimit\-mt\-decompress=\fIlimit\fR. .TP .B \-\-no\-adjust Display an error and exit if the memory usage limit cannot be met without adjusting settings that affect the compressed output. That is, this prevents .B xz from switching the encoder from multi-threaded mode to single-threaded mode and from reducing the LZMA2 dictionary size. Even when this option is used the number of threads may be reduced to meet the memory usage limit as that won't affect the compressed output. .IP "" Automatic adjusting is always disabled when creating raw streams .RB ( \-\-format=raw ). .TP \fB\-T\fR \fIthreads\fR, \fB\-\-threads=\fIthreads Specify the number of worker threads to use. Setting .I threads to a special value .B 0 makes .B xz use up to as many threads as the processor(s) on the system support. 
The actual number of threads can be fewer than .I threads if the input file is not big enough for threading with the given settings or if using more threads would exceed the memory usage limit. .IP "" The single-threaded and multi-threaded compressors produce different output. The single-threaded compressor gives the smallest file size but only the output from the multi-threaded compressor can be decompressed using multiple threads. Setting .I threads to .B 1 will use the single-threaded mode. Setting .I threads to any other value, including .BR 0 , will use the multi-threaded compressor even if the system supports only one hardware thread. .RB ( xz 5.2.x used single-threaded mode in this situation.) .IP "" To use multi-threaded mode with only one thread, set .I threads to .BR +1 . The .B + prefix has no effect with values other than .BR 1 . A memory usage limit can still make .B xz switch to single-threaded mode unless .B \-\-no\-adjust is used. Support for the .B + prefix was added in .B xz 5.4.0. .IP "" If an automatic number of threads has been requested and no memory usage limit has been specified, then a system-specific default soft limit will be used to possibly limit the number of threads. It is a soft limit in the sense that it is ignored if the number of threads becomes one, thus a soft limit will never stop .B xz from compressing or decompressing. This default soft limit will not make .B xz switch from multi-threaded mode to single-threaded mode. The active limits can be seen with .BR "xz \-\-info\-memory" . .IP "" Currently the only threading method is to split the input into blocks and compress them independently from each other. The default block size depends on the compression level and can be overridden with the .BI \-\-block\-size= size option. .IP "" Threaded decompression only works on files that contain multiple blocks with size information in block headers.
All large enough files compressed in multi-threaded mode meet this condition, but files compressed in single-threaded mode don't even if .BI \-\-block\-size= size has been used. .IP "" The default value for .I threads is .BR 0 . In .B xz 5.4.x and older the default is .BR 1 . . .SS "Custom compressor filter chains" A custom filter chain allows specifying the compression settings in detail instead of relying on the settings associated to the presets. When a custom filter chain is specified, preset options .RB ( \-0 \&...\& .B \-9 and .BR \-\-extreme ) earlier on the command line are forgotten. If a preset option is specified after one or more custom filter chain options, the new preset takes effect and the custom filter chain options specified earlier are forgotten. .PP A filter chain is comparable to piping on the command line. When compressing, the uncompressed input goes to the first filter, whose output goes to the next filter (if any). The output of the last filter gets written to the compressed file. The maximum number of filters in the chain is four, but typically a filter chain has only one or two filters. .PP Many filters have limitations on where they can be in the filter chain: some filters can work only as the last filter in the chain, some only as a non-last filter, and some work in any position in the chain. Depending on the filter, this limitation is either inherent to the filter design or exists to prevent security issues. .PP A custom filter chain can be specified in two different ways. The options .BI \-\-filters= filters and .BI \-\-filters1= filters \&...\& .BI \-\-filters9= filters allow specifying an entire filter chain in one option using the liblzma filter string syntax. Alternatively, a filter chain can be specified by using one or more individual filter options in the order they are wanted in the filter chain. That is, the order of the individual filter options is significant! 
When decoding raw streams .RB ( \-\-format=raw ), the filter chain must be specified in the same order as it was specified when compressing. Any individual filter or preset options specified before the full chain option (\fB\-\-filters=\fIfilters\fR) will be forgotten. Individual filters specified after the full chain option will reset the filter chain. .PP Both the full and individual filter options take filter-specific .I options as a comma-separated list. Extra commas in .I options are ignored. Every option has a default value, so specify those you want to change. .PP To see the whole filter chain and .IR options , use .B "xz \-vv" (that is, use .B \-\-verbose twice). This works also for viewing the filter chain options used by presets. .TP .BI \-\-filters= filters Specify the full filter chain or a preset in a single option. Each filter can be separated by spaces or two dashes .RB ( \-\- ). .I filters may need to be quoted on the shell command line so it is parsed as a single option. To denote .IR options , use .B : or .BR = . A preset can be prefixed with a .B \- and followed with zero or more flags. The only supported flag is .B e to apply the same options as .BR \-\-extreme . .TP \fB\-\-filters1\fR=\fIfilters\fR ... \fB\-\-filters9\fR=\fIfilters Specify up to nine additional filter chains that can be used with .BR \-\-block\-list . .IP "" For example, when compressing an archive with executable files followed by text files, the executable part could use a filter chain with a BCJ filter and the text part only the LZMA2 filter. .TP .B \-\-filters-help Display a help message describing how to specify presets and custom filter chains in the .B \-\-filters and .BI \-\-filters1= filters \&...\& .BI \-\-filters9= filters options, and exit successfully. .TP \fB\-\-lzma1\fR[\fB=\fIoptions\fR] .PD 0 .TP \fB\-\-lzma2\fR[\fB=\fIoptions\fR] .PD Add LZMA1 or LZMA2 filter to the filter chain. These filters can be used only as the last filter in the chain. 
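.IP ""
The position rules that this section gives for the individual filters (at most four filters, LZMA1 and LZMA2 only as the last filter, BCJ and Delta filters only as a non-last filter) can be checked with a short sketch like the one below. It is a hypothetical helper written in Python for illustration, not part of xz or liblzma, and it checks only these documented rules.

```python
# Filter names as used by the individual filter options in this section.
LAST_ONLY = {"lzma1", "lzma2"}
NON_LAST_ONLY = {
    "x86", "arm", "armthumb", "arm64",
    "powerpc", "ia64", "sparc", "riscv", "delta",
}
MAX_FILTERS = 4


def check_chain(chain):
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    if len(chain) > MAX_FILTERS:
        problems.append(f"too many filters: {len(chain)} > {MAX_FILTERS}")
    for index, name in enumerate(chain):
        is_last = index == len(chain) - 1
        if name in LAST_ONLY and not is_last:
            problems.append(f"{name} must be the last filter")
        elif name in NON_LAST_ONLY and is_last:
            problems.append(f"{name} cannot be the last filter")
        elif name not in LAST_ONLY | NON_LAST_ONLY:
            problems.append(f"unknown filter: {name}")
    return problems


print(check_chain(["x86", "lzma2"]))
print(check_chain(["lzma2", "x86"]))
```

The order of the list matters in the same way as the order of the individual filter options on the command line.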
.IP "" LZMA1 is a legacy filter, which is supported almost solely due to the legacy .B .lzma file format, which supports only LZMA1. LZMA2 is an updated version of LZMA1 to fix some practical issues of LZMA1. The .B .xz format uses LZMA2 and doesn't support LZMA1 at all. Compression speed and ratios of LZMA1 and LZMA2 are practically the same. .IP "" LZMA1 and LZMA2 share the same set of .IR options : .RS .TP +.\" TRANSLATORS: Don't translate bold strings like B, B, +.\" B, B, B, or B because those are command line +.\" options. On the other hand, do translate the italic strings like +.\" I, I, and I, because such italic strings are +.\" placeholders which a user replaces with an actual value. .BI preset= preset Reset all LZMA1 or LZMA2 .I options to .IR preset . .I Preset consists of an integer, which may be followed by single-letter preset modifiers. The integer can be from .B 0 to .BR 9 , matching the command line options .B \-0 \&...\& .BR \-9 . The only supported modifier is currently .BR e , which matches .BR \-\-extreme . If no .B preset is specified, the default values of LZMA1 or LZMA2 .I options are taken from the preset .BR 6 . .TP .BI dict= size Dictionary (history buffer) .I size indicates how many bytes of the recently processed uncompressed data are kept in memory. The algorithm tries to find repeating byte sequences (matches) in the uncompressed data, and replace them with references to the data currently in the dictionary. The bigger the dictionary, the higher the chance of finding a match. Thus, increasing dictionary .I size usually improves compression ratio, but a dictionary bigger than the uncompressed file is a waste of memory. .IP "" Typical dictionary .I size is from 64\ KiB to 64\ MiB. The minimum is 4\ KiB. The maximum for compression is currently 1.5\ GiB (1536\ MiB). The decompressor already supports dictionaries up to one byte less than 4\ GiB, which is the maximum for the LZMA1 and LZMA2 stream formats.
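.IP ""
As a rough illustration of the dictionary size limits above, and of the 2^n and 2^n + 2^(n\-1) sizes that this dict= description says the .xz headers can store, the following Python sketch checks a size for compression and computes the size it would be rounded up to. The function names are hypothetical and the code is not taken from liblzma.

```python
KIB = 1024
MIB = 1024 * KIB

DICT_MIN = 4 * KIB              # documented minimum
DICT_MAX_COMPRESS = 1536 * MIB  # documented current maximum for compression


def is_valid_for_compression(size):
    """True if size is within the documented range for compression."""
    return DICT_MIN <= size <= DICT_MAX_COMPRESS


def stored_dict_size(size):
    """Smallest value of the form 2^n or 2^n + 2^(n-1) that is >= size."""
    n = 12  # 2^12 = 4 KiB, the documented minimum
    while True:
        for candidate in (1 << n, (1 << n) + (1 << (n - 1))):
            if candidate >= size:
                return candidate
        n += 1


print(is_valid_for_compression(5 * MIB), stored_dict_size(5 * MIB) // MIB)
```

With these assumptions a 5 MiB dictionary would be recorded as 6 MiB, which is why sizes that already have one of the two forms are somewhat preferred.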
.IP "" Dictionary .I size and match finder .RI ( mf ) together determine the memory usage of the LZMA1 or LZMA2 encoder. The same (or bigger) dictionary .I size is required for decompressing that was used when compressing, thus the memory usage of the decoder is determined by the dictionary size used when compressing. The .B .xz headers store the dictionary .I size either as .RI "2^" n or .RI "2^" n " + 2^(" n "\-1)," so these .I sizes are somewhat preferred for compression. Other .I sizes will get rounded up when stored in the .B .xz headers. .TP .BI lc= lc Specify the number of literal context bits. The minimum is 0 and the maximum is 4; the default is 3. In addition, the sum of .I lc and .I lp must not exceed 4. .IP "" All bytes that cannot be encoded as matches are encoded as literals. That is, literals are simply 8-bit bytes that are encoded one at a time. .IP "" The literal coding makes an assumption that the highest .I lc bits of the previous uncompressed byte correlate with the next byte. For example, in typical English text, an upper-case letter is often followed by a lower-case letter, and a lower-case letter is usually followed by another lower-case letter. In the US-ASCII character set, the highest three bits are 010 for upper-case letters and 011 for lower-case letters. When .I lc is at least 3, the literal coding can take advantage of this property in the uncompressed data. .IP "" The default value (3) is usually good. If you want maximum compression, test .BR lc=4 . Sometimes it helps a little, and sometimes it makes compression worse. If it makes it worse, test .B lc=2 too. .TP .BI lp= lp Specify the number of literal position bits. The minimum is 0 and the maximum is 4; the default is 0. .IP "" .I Lp affects what kind of alignment in the uncompressed data is assumed when encoding literals. See .I pb below for more information about alignment. .TP .BI pb= pb Specify the number of position bits. 
The minimum is 0 and the maximum is 4; the default is 2. .IP "" .I Pb affects what kind of alignment in the uncompressed data is assumed in general. The default means four-byte alignment .RI (2^ pb =2^2=4), which is often a good choice when there's no better guess. .IP "" When the alignment is known, setting .I pb accordingly may reduce the file size a little. For example, with text files having one-byte alignment (US-ASCII, ISO-8859-*, UTF-8), setting .B pb=0 can improve compression slightly. For UTF-16 text, .B pb=1 is a good choice. If the alignment is an odd number like 3 bytes, .B pb=0 might be the best choice. .IP "" Even though the assumed alignment can be adjusted with .I pb and .IR lp , LZMA1 and LZMA2 still slightly favor 16-byte alignment. It might be worth taking into account when designing file formats that are likely to be often compressed with LZMA1 or LZMA2. .TP .BI mf= mf Match finder has a major effect on encoder speed, memory usage, and compression ratio. Usually Hash Chain match finders are faster than Binary Tree match finders. The default depends on the .IR preset : 0 uses .BR hc3 , 1\(en3 use .BR hc4 , and the rest use .BR bt4 . .IP "" The following match finders are supported. The memory usage formulas below are rough approximations, which are closest to the reality when .I dict is a power of two. 
.RS .TP .B hc3 Hash Chain with 2- and 3-byte hashing .br Minimum value for .IR nice : 3 .br Memory usage: .br .I dict * 7.5 (if .I dict <= 16 MiB); .br .I dict * 5.5 + 64 MiB (if .I dict > 16 MiB) .TP .B hc4 Hash Chain with 2-, 3-, and 4-byte hashing .br Minimum value for .IR nice : 4 .br Memory usage: .br .I dict * 7.5 (if .I dict <= 32 MiB); .br .I dict * 6.5 (if .I dict > 32 MiB) .TP .B bt2 Binary Tree with 2-byte hashing .br Minimum value for .IR nice : 2 .br Memory usage: .I dict * 9.5 .TP .B bt3 Binary Tree with 2- and 3-byte hashing .br Minimum value for .IR nice : 3 .br Memory usage: .br .I dict * 11.5 (if .I dict <= 16 MiB); .br .I dict * 9.5 + 64 MiB (if .I dict > 16 MiB) .TP .B bt4 Binary Tree with 2-, 3-, and 4-byte hashing .br Minimum value for .IR nice : 4 .br Memory usage: .br .I dict * 11.5 (if .I dict <= 32 MiB); .br .I dict * 10.5 (if .I dict > 32 MiB) .RE .TP .BI mode= mode Compression .I mode specifies the method to analyze the data produced by the match finder. Supported .I modes are .B fast and .BR normal . The default is .B fast for .I presets 0\(en3 and .B normal for .I presets 4\(en9. .IP "" Usually .B fast is used with Hash Chain match finders and .B normal with Binary Tree match finders. This is also what the .I presets do. .TP .BI nice= nice Specify what is considered to be a nice length for a match. Once a match of at least .I nice bytes is found, the algorithm stops looking for possibly better matches. .IP "" .I Nice can be 2\(en273 bytes. Higher values tend to give better compression ratio at the expense of speed. The default depends on the .IR preset . .TP .BI depth= depth Specify the maximum search depth in the match finder. The default is the special value of 0, which makes the compressor determine a reasonable .I depth from .I mf and .IR nice . .IP "" Reasonable .I depth for Hash Chains is 4\(en100 and 16\(en1000 for Binary Trees. Using very high values for .I depth can make the encoder extremely slow with some files. 
Avoid setting the .I depth over 1000 unless you are prepared to interrupt the compression in case it is taking far too long. .RE .IP "" When decoding raw streams .RB ( \-\-format=raw ), LZMA2 needs only the dictionary .IR size . LZMA1 needs also .IR lc , .IR lp , and .IR pb . .TP \fB\-\-x86\fR[\fB=\fIoptions\fR] .PD 0 .TP \fB\-\-arm\fR[\fB=\fIoptions\fR] .TP \fB\-\-armthumb\fR[\fB=\fIoptions\fR] .TP \fB\-\-arm64\fR[\fB=\fIoptions\fR] .TP \fB\-\-powerpc\fR[\fB=\fIoptions\fR] .TP \fB\-\-ia64\fR[\fB=\fIoptions\fR] .TP \fB\-\-sparc\fR[\fB=\fIoptions\fR] .TP \fB\-\-riscv\fR[\fB=\fIoptions\fR] .PD Add a branch/call/jump (BCJ) filter to the filter chain. These filters can be used only as a non-last filter in the filter chain. .IP "" A BCJ filter converts relative addresses in the machine code to their absolute counterparts. This doesn't change the size of the data but it increases redundancy, which can help LZMA2 to produce 0\(en15\ % smaller .B .xz file. The BCJ filters are always reversible, so using a BCJ filter for wrong type of data doesn't cause any data loss, although it may make the compression ratio slightly worse. The BCJ filters are very fast and use an insignificant amount of memory. .IP "" These BCJ filters have known problems related to the compression ratio: .RS .IP \(bu 3 Some types of files containing executable code (for example, object files, static libraries, and Linux kernel modules) have the addresses in the instructions filled with filler values. These BCJ filters will still do the address conversion, which will make the compression worse with these files. .IP \(bu 3 If a BCJ filter is applied on an archive, it is possible that it makes the compression ratio worse than not using a BCJ filter. For example, if there are similar or even identical executables then filtering will likely make the files less similar and thus compression is worse. The contents of non-executable files in the same archive can matter too. 
In practice one has to try with and without a BCJ filter to see which is better in each situation. .RE .IP "" Different instruction sets have different alignment: the executable file must be aligned to a multiple of this value in the input data to make the filter work. .RS .RS .PP .TS tab(;); l n l l n l. Filter;Alignment;Notes x86;1;32-bit or 64-bit x86 ARM;4; ARM-Thumb;2; ARM64;4;4096-byte alignment is best PowerPC;4;Big endian only IA-64;16;Itanium SPARC;4; RISC-V;2; .TE .RE .RE .IP "" Since the BCJ-filtered data is usually compressed with LZMA2, the compression ratio may be improved slightly if the LZMA2 options are set to match the alignment of the selected BCJ filter. Examples: .RS .IP \(bu 3 IA-64 filter has 16-byte alignment so .B pb=4,lp=4,lc=0 is good with LZMA2 (2^4=16). .IP \(bu 3 RISC-V code has 2-byte or 4-byte alignment depending on whether the file contains 16-bit compressed instructions (the C extension). When 16-bit instructions are used, .B pb=2,lp=1,lc=3 or .B pb=1,lp=1,lc=3 is good. When 16-bit instructions aren't present, .B pb=2,lp=2,lc=2 is the best. .B readelf \-h can be used to check if "RVC" appears on the "Flags" line. .IP \(bu 3 ARM64 is always 4-byte aligned so .B pb=2,lp=2,lc=2 is the best. .IP \(bu 3 The x86 filter is an exception. It's usually good to stick to LZMA2's defaults .RB ( pb=2,lp=0,lc=3 ) when compressing x86 executables. .RE .IP "" All BCJ filters support the same .IR options : .RS .TP .BI start= offset Specify the start .I offset that is used when converting between relative and absolute addresses. The .I offset must be a multiple of the alignment of the filter (see the table above). The default is zero. In practice, the default is good; specifying a custom .I offset is almost never useful. .RE .TP \fB\-\-delta\fR[\fB=\fIoptions\fR] Add the Delta filter to the filter chain. The Delta filter can be only used as a non-last filter in the filter chain. .IP "" Currently only simple byte-wise delta calculation is supported. 
It can be useful when compressing, for example, uncompressed bitmap images or uncompressed PCM audio. However, special purpose algorithms may give significantly better results than Delta + LZMA2. This is true especially with audio, which compresses faster and better, for example, with .BR flac (1). .IP "" Supported .IR options : .RS .TP .BI dist= distance Specify the .I distance of the delta calculation in bytes. .I distance must be 1\(en256. The default is 1. .IP "" For example, with .B dist=2 and eight-byte input A1 B1 A2 B3 A3 B5 A4 B7, the output will be A1 B1 01 02 01 02 01 02. .RE . .SS "Other options" .TP .BR \-q ", " \-\-quiet Suppress warnings and notices. Specify this twice to suppress errors too. This option has no effect on the exit status. That is, even if a warning was suppressed, the exit status to indicate a warning is still used. .TP .BR \-v ", " \-\-verbose Be verbose. If standard error is connected to a terminal, .B xz will display a progress indicator. Specifying .B \-\-verbose twice will give even more verbose output. .IP "" The progress indicator shows the following information: .RS .IP \(bu 3 Completion percentage is shown if the size of the input file is known. That is, the percentage cannot be shown in pipes. .IP \(bu 3 Amount of compressed data produced (compressing) or consumed (decompressing). .IP \(bu 3 Amount of uncompressed data consumed (compressing) or produced (decompressing). .IP \(bu 3 Compression ratio, which is calculated by dividing the amount of compressed data processed so far by the amount of uncompressed data processed so far. .IP \(bu 3 Compression or decompression speed. This is measured as the amount of uncompressed data consumed (compression) or produced (decompression) per second. It is shown after a few seconds have passed since .B xz started processing the file. .IP \(bu 3 Elapsed time in the format M:SS or H:MM:SS. 
.IP \(bu 3 Estimated remaining time is shown only when the size of the input file is known and a couple of seconds have already passed since .B xz started processing the file. The time is shown in a less precise format which never has any colons, for example, 2 min 30 s. .RE .IP "" When standard error is not a terminal, .B \-\-verbose will make .B xz print the filename, compressed size, uncompressed size, compression ratio, and possibly also the speed and elapsed time on a single line to standard error after compressing or decompressing the file. The speed and elapsed time are included only when the operation took at least a few seconds. If the operation didn't finish, for example, due to user interruption, also the completion percentage is printed if the size of the input file is known. .TP .BR \-Q ", " \-\-no\-warn Don't set the exit status to 2 even if a condition worth a warning was detected. This option doesn't affect the verbosity level, thus both .B \-\-quiet and .B \-\-no\-warn have to be used to not display warnings and to not alter the exit status. .TP .B \-\-robot Print messages in a machine-parsable format. This is intended to ease writing frontends that want to use .B xz instead of liblzma, which may be the case with various scripts. The output with this option enabled is meant to be stable across .B xz releases. See the section .B "ROBOT MODE" for details. .TP .B \-\-info\-memory Display, in human-readable format, how much physical memory (RAM) and how many processor threads .B xz thinks the system has and the memory usage limits for compression and decompression, and exit successfully. .TP .BR \-h ", " \-\-help Display a help message describing the most commonly used options, and exit successfully. .TP .BR \-H ", " \-\-long\-help Display a help message describing all features of .BR xz , and exit successfully .TP .BR \-V ", " \-\-version Display the version number of .B xz and liblzma in human readable format. 
To get machine-parsable output, specify .B \-\-robot before .BR \-\-version . . .SH "ROBOT MODE" The robot mode is activated with the .B \-\-robot option. It makes the output of .B xz easier to parse by other programs. Currently .B \-\-robot is supported only together with .BR \-\-list , .BR \-\-filters\-help , .BR \-\-info\-memory , and .BR \-\-version . It will be supported for compression and decompression in the future. . .SS "List mode" .B "xz \-\-robot \-\-list" uses tab-separated output. The first column of every line has a string that indicates the type of the information found on that line: .TP +.\" TRANSLATORS: The bold strings B<name>, B<file>, B<stream>, B<block>, +.\" B<summary>, and B<totals> are produced by the xz tool for scripts to +.\" parse, thus the untranslated strings must be included in the translated +.\" man page. It may be useful to provide a translated string in parenthesis +.\" without bold, for example: "B<name> (nimi)" .B name This is always the first line when starting to list a file. The second column on the line is the filename. .TP .B file This line contains overall information about the .B .xz file. This line is always printed after the .B name line. .TP .B stream This line type is used only when .B \-\-verbose was specified. There are as many .B stream lines as there are streams in the .B .xz file. .TP .B block This line type is used only when .B \-\-verbose was specified. There are as many .B block lines as there are blocks in the .B .xz file. The .B block lines are shown after all the .B stream lines; different line types are not interleaved. .TP .B summary This line type is used only when .B \-\-verbose was specified twice. This line is printed after all .B block lines. Like the .B file line, the .B summary line contains overall information about the .B .xz file. .TP .B totals This line is always the very last line of the list output. It shows the total counts and sizes. .PP The columns of the .B file lines: .PD 0 .RS .IP 2. 4 Number of streams in the file .IP 3.
4 Total number of blocks in the stream(s) .IP 4. 4 Compressed size of the file .IP 5. 4 Uncompressed size of the file .IP 6. 4 Compression ratio, for example, .BR 0.123 . If ratio is over 9.999, three dashes .RB ( \-\-\- ) are displayed instead of the ratio. .IP 7. 4 Comma-separated list of integrity check names. The following strings are used for the known check types: +.\" TRANSLATORS: Don't translate the bold strings B<None>, B<CRC32>, +.\" B<CRC64>, B<SHA-256>, or B<Unknown-> here. In robot mode, xz produces +.\" them in untranslated form for scripts to parse. .BR None , .BR CRC32 , .BR CRC64 , and .BR SHA\-256 . For unknown check types, .BI Unknown\- N is used, where .I N is the Check ID as a decimal number (one or two digits). .IP 8. 4 Total size of stream padding in the file .RE .PD .PP The columns of the .B stream lines: .PD 0 .RS .IP 2. 4 Stream number (the first stream is 1) .IP 3. 4 Number of blocks in the stream .IP 4. 4 Compressed start offset .IP 5. 4 Uncompressed start offset .IP 6. 4 Compressed size (does not include stream padding) .IP 7. 4 Uncompressed size .IP 8. 4 Compression ratio .IP 9. 4 Name of the integrity check .IP 10. 4 Size of stream padding .RE .PD .PP The columns of the .B block lines: .PD 0 .RS .IP 2. 4 Number of the stream containing this block .IP 3. 4 Block number relative to the beginning of the stream (the first block is 1) .IP 4. 4 Block number relative to the beginning of the file .IP 5. 4 Compressed start offset relative to the beginning of the file .IP 6. 4 Uncompressed start offset relative to the beginning of the file .IP 7. 4 Total compressed size of the block (includes headers) .IP 8. 4 Uncompressed size .IP 9. 4 Compression ratio .IP 10. 4 Name of the integrity check .RE .PD .PP If .B \-\-verbose was specified twice, additional columns are included on the .B block lines. These are not displayed with a single .BR \-\-verbose , because getting this information requires many seeks and can thus be slow: .PD 0 .RS .IP 11.
4 Value of the integrity check in hexadecimal .IP 12. 4 Block header size .IP 13. 4 Block flags: .B c indicates that compressed size is present, and .B u indicates that uncompressed size is present. If the flag is not set, a dash .RB ( \- ) is shown instead to keep the string length fixed. New flags may be added to the end of the string in the future. .IP 14. 4 Size of the actual compressed data in the block (this excludes the block header, block padding, and check fields) .IP 15. 4 Amount of memory (in bytes) required to decompress this block with this .B xz version .IP 16. 4 Filter chain. Note that most of the options used at compression time cannot be known, because only the options that are needed for decompression are stored in the .B .xz headers. .RE .PD .PP The columns of the .B summary lines: .PD 0 .RS .IP 2. 4 Amount of memory (in bytes) required to decompress this file with this .B xz version .IP 3. 4 .B yes or .B no indicating if all block headers have both compressed size and uncompressed size stored in them .PP .I Since .B xz .I 5.1.2alpha: .IP 4. 4 Minimum .B xz version required to decompress the file .RE .PD .PP The columns of the .B totals line: .PD 0 .RS .IP 2. 4 Number of streams .IP 3. 4 Number of blocks .IP 4. 4 Compressed size .IP 5. 4 Uncompressed size .IP 6. 4 Average compression ratio .IP 7. 4 Comma-separated list of integrity check names that were present in the files .IP 8. 4 Stream padding size .IP 9. 4 Number of files. This is here to keep the order of the earlier columns the same as on .B file lines. .PD .RE .PP If .B \-\-verbose was specified twice, additional columns are included on the .B totals line: .PD 0 .RS .IP 10. 4 Maximum amount of memory (in bytes) required to decompress the files with this .B xz version .IP 11. 4 .B yes or .B no indicating if all block headers have both compressed size and uncompressed size stored in them .PP .I Since .B xz .I 5.1.2alpha: .IP 12. 
4 Minimum .B xz version required to decompress the file .RE .PD .PP Future versions may add new line types and new columns can be added to the existing line types, but the existing columns won't be changed. . .SS "Filters help" .B "xz \-\-robot \-\-filters-help" prints the supported filters in the following format: .PP \fIfilter\fB:\fIoption\fB=<\fIvalue\fB>,\fIoption\fB=<\fIvalue\fB>\fR... .TP .I filter Name of the filter .TP .I option Name of a filter specific option .TP .I value Numeric .I value ranges appear as \fB<\fImin\fB\-\fImax\fB>\fR. String .I value choices are shown within .B "< >" and separated by a .B | character. .PP Each filter is printed on its own line. . .SS "Memory limit information" .B "xz \-\-robot \-\-info\-memory" prints a single line with multiple tab-separated columns: .IP 1. 4 Total amount of physical memory (RAM) in bytes. .IP 2. 4 Memory usage limit for compression in bytes .RB ( \-\-memlimit\-compress ). A special value of .B 0 indicates the default setting which for single-threaded mode is the same as no limit. .IP 3. 4 Memory usage limit for decompression in bytes .RB ( \-\-memlimit\-decompress ). A special value of .B 0 indicates the default setting which for single-threaded mode is the same as no limit. .IP 4. 4 Since .B xz 5.3.4alpha: Memory usage for multi-threaded decompression in bytes .RB ( \-\-memlimit\-mt\-decompress ). This is never zero because a system-specific default value shown in the column 5 is used if no limit has been specified explicitly. This is also never greater than the value in the column 3 even if a larger value has been specified with .BR \-\-memlimit\-mt\-decompress . .IP 5. 4 Since .B xz 5.3.4alpha: A system-specific default memory usage limit that is used to limit the number of threads when compressing with an automatic number of threads .RB ( \-\-threads=0 ) and no memory usage limit has been specified .RB ( \-\-memlimit\-compress ). 
This is also used as the default value for .BR \-\-memlimit\-mt\-decompress . .IP 6. 4 Since .B xz 5.3.4alpha: Number of available processor threads. .PP In the future, the output of .B "xz \-\-robot \-\-info\-memory" may have more columns, but never more than a single line. . .SS Version .B "xz \-\-robot \-\-version" prints the version number of .B xz and liblzma in the following format: .PP +.\" TRANSLATORS: Don't translate the uppercase XZ_VERSION or LIBLZMA_VERSION. .BI XZ_VERSION= XYYYZZZS .br .BI LIBLZMA_VERSION= XYYYZZZS .TP .I X Major version. .TP .I YYY Minor version. Even numbers are stable. Odd numbers are alpha or beta versions. .TP .I ZZZ Patch level for stable releases or just a counter for development releases. .TP .I S Stability. 0 is alpha, 1 is beta, and 2 is stable. .I S should be always 2 when .I YYY is even. .PP .I XYYYZZZS are the same on both lines if .B xz and liblzma are from the same XZ Utils release. .PP Examples: 4.999.9beta is .B 49990091 and 5.0.0 is .BR 50000002 . . .SH "EXIT STATUS" .TP .B 0 All is good. .TP .B 1 An error occurred. .TP .B 2 Something worth a warning occurred, but no actual errors occurred. .PP Notices (not warnings or errors) printed on standard error don't affect the exit status. . .SH ENVIRONMENT .B xz parses space-separated lists of options from the environment variables +.\" TRANSLATORS: Don't translate the uppercase XZ_DEFAULTS or XZ_OPT. +.\" They are names of environment variables. .B XZ_DEFAULTS and .BR XZ_OPT , in this order, before parsing the options from the command line. Note that only options are parsed from the environment variables; all non-options are silently ignored. Parsing is done with .BR getopt_long (3) which is used also for the command line arguments. +.PP +.B Warning: +By setting these environment variables, +one is effectively modifying programs and scripts that run +.BR xz . 
+Most of the time it is safe to set memory usage limits, number of threads, +and compression options via the environment variables. +However, some options can break scripts. +An obvious example is +.B \-\-help +which makes +.B xz +show the help text instead of compressing or decompressing a file. +More subtle examples are +.B \-\-quiet +and +.BR \-\-verbose . +In many cases it works well to enable the progress indicator using +.BR \-\-verbose , +but in some situations the extra messages create problems. +The verbosity level also affects the behavior of +.BR \-\-list . .TP .B XZ_DEFAULTS User-specific or system-wide default options. Typically this is set in a shell initialization script to enable .BR xz 's -memory usage limiter by default. +memory usage limiter by default or set the default number of threads. Excluding shell initialization scripts -and similar special cases, scripts must never set or unset +and similar special cases, scripts should never set or unset .BR XZ_DEFAULTS . .TP .B XZ_OPT This is for passing options to .B xz when it is not possible to set the options directly on the .B xz command line. This is the case when .B xz is run by a script or tool, for example, GNU .BR tar (1): .RS .RS .PP .nf .ft CR XZ_OPT=\-2v tar caf foo.tar.xz foo .ft R .fi .RE .RE .IP "" Scripts may use .BR XZ_OPT , for example, to set script-specific default compression options. It is still recommended to allow users to override .B XZ_OPT if that is reasonable. For example, in .BR sh (1) scripts one may use something like this: .RS .RS .PP .nf .ft CR XZ_OPT=${XZ_OPT\-"\-7e"} export XZ_OPT .ft R .fi .RE .RE . .SH "LZMA UTILS COMPATIBILITY" The command line syntax of .B xz is practically a superset of .BR lzma , .BR unlzma , and .B lzcat as found from LZMA Utils 4.32.x. In most cases, it is possible to replace LZMA Utils with XZ Utils without breaking existing scripts. There are some incompatibilities though, which may sometimes cause problems. . 
.SS "Compression preset levels" The numbering of the compression level presets is not identical in .B xz and LZMA Utils. The most important difference is how dictionary sizes are mapped to different presets. Dictionary size is roughly equal to the decompressor memory usage. .RS .PP .TS tab(;); c c c c n n. Level;xz;LZMA Utils \-0;256 KiB;N/A \-1;1 MiB;64 KiB \-2;2 MiB;1 MiB \-3;4 MiB;512 KiB \-4;4 MiB;1 MiB \-5;8 MiB;2 MiB \-6;8 MiB;4 MiB \-7;16 MiB;8 MiB \-8;32 MiB;16 MiB \-9;64 MiB;32 MiB .TE .RE .PP The dictionary size differences affect the compressor memory usage too, but there are some other differences between LZMA Utils and XZ Utils, which make the difference even bigger: .RS .PP .TS tab(;); c c c c n n. Level;xz;LZMA Utils 4.32.x \-0;3 MiB;N/A \-1;9 MiB;2 MiB \-2;17 MiB;12 MiB \-3;32 MiB;12 MiB \-4;48 MiB;16 MiB \-5;94 MiB;26 MiB \-6;94 MiB;45 MiB \-7;186 MiB;83 MiB \-8;370 MiB;159 MiB \-9;674 MiB;311 MiB .TE .RE .PP The default preset level in LZMA Utils is .B \-7 while in XZ Utils it is .BR \-6 , so both use an 8 MiB dictionary by default. . .SS "Streamed vs. non-streamed .lzma files" The uncompressed size of the file can be stored in the .B .lzma header. LZMA Utils does that when compressing regular files. The alternative is to mark that uncompressed size is unknown and use end-of-payload marker to indicate where the decompressor should stop. LZMA Utils uses this method when uncompressed size isn't known, which is the case, for example, in pipes. .PP .B xz supports decompressing .B .lzma files with or without end-of-payload marker, but all .B .lzma files created by .B xz will use end-of-payload marker and have uncompressed size marked as unknown in the .B .lzma header. This may be a problem in some uncommon situations. For example, a .B .lzma decompressor in an embedded device might work only with files that have known uncompressed size. 
If you hit this problem, you need to use LZMA Utils or LZMA SDK to create .B .lzma files with known uncompressed size. . .SS "Unsupported .lzma files" The .B .lzma format allows .I lc values up to 8, and .I lp values up to 4. LZMA Utils can decompress files with any .I lc and .IR lp , but always creates files with .B lc=3 and .BR lp=0 . Creating files with other .I lc and .I lp is possible with .B xz and with LZMA SDK. .PP The implementation of the LZMA1 filter in liblzma requires that the sum of .I lc and .I lp must not exceed 4. Thus, .B .lzma files, which exceed this limitation, cannot be decompressed with .BR xz . .PP LZMA Utils creates only .B .lzma files which have a dictionary size of .RI "2^" n (a power of 2) but accepts files with any dictionary size. liblzma accepts only .B .lzma files which have a dictionary size of .RI "2^" n or .RI "2^" n " + 2^(" n "\-1)." This is to decrease false positives when detecting .B .lzma files. .PP These limitations shouldn't be a problem in practice, since practically all .B .lzma files have been compressed with settings that liblzma will accept. . .SS "Trailing garbage" When decompressing, LZMA Utils silently ignore everything after the first .B .lzma stream. In most situations, this is a bug. This also means that LZMA Utils don't support decompressing concatenated .B .lzma files. .PP If there is data left after the first .B .lzma stream, .B xz considers the file to be corrupt unless .B \-\-single\-stream was used. This may break obscure scripts which have assumed that trailing garbage is ignored. . .SH NOTES . .SS "Compressed output may vary" The exact compressed output produced from the same uncompressed input file may vary between XZ Utils versions even if compression options are identical. This is because the encoder can be improved (faster or better compression) without affecting the file format. The output can vary even between different builds of the same XZ Utils version, if different build options are used. 
.PP The above means that once .B \-\-rsyncable has been implemented, the resulting files won't necessarily be rsyncable unless both old and new files have been compressed with the same xz version. This problem can be fixed if a part of the encoder implementation is frozen to keep rsyncable output stable across xz versions. . .SS "Embedded .xz decompressors" Embedded .B .xz decompressor implementations like XZ Embedded don't necessarily support files created with integrity .I check types other than .B none and .BR crc32 . Since the default is .BR \-\-check=crc64 , you must use .B \-\-check=none or .B \-\-check=crc32 when creating files for embedded systems. .PP Outside embedded systems, all .B .xz format decompressors support all the .I check types, or at least are able to decompress the file without verifying the integrity check if the particular .I check is not supported. .PP XZ Embedded supports BCJ filters, but only with the default start offset. . .SH EXAMPLES . .SS Basics Compress the file .I foo into .I foo.xz using the default compression level .RB ( \-6 ), and remove .I foo if compression is successful: .RS .PP .nf .ft CR xz foo .ft R .fi .RE .PP Decompress .I bar.xz into .I bar and don't remove .I bar.xz even if decompression is successful: .RS .PP .nf .ft CR xz \-dk bar.xz .ft R .fi .RE .PP Create .I baz.tar.xz with the preset .B \-4e .RB ( "\-4 \-\-extreme" ), which is slower than the default .BR \-6 , but needs less memory for compression and decompression (48\ MiB and 5\ MiB, respectively): .RS .PP .nf .ft CR tar cf \- baz | xz \-4e > baz.tar.xz .ft R .fi .RE .PP A mix of compressed and uncompressed files can be decompressed to standard output with a single command: .RS .PP .nf .ft CR xz \-dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt .ft R .fi .RE . .SS "Parallel compression of many files" On GNU and *BSD, .BR find (1) and .BR xargs (1) can be used to parallelize compression of many files: .RS .PP .nf .ft CR find . \-type f \e! 
\-name '*.xz' \-print0 \e | xargs \-0r \-P4 \-n16 xz \-T1 .ft R .fi .RE .PP The .B \-P option to .BR xargs (1) sets the number of parallel .B xz processes. The best value for the .B \-n option depends on how many files there are to be compressed. If there are only a couple of files, the value should probably be 1; with tens of thousands of files, 100 or even more may be appropriate to reduce the number of .B xz processes that .BR xargs (1) will eventually create. .PP The option .B \-T1 for .B xz is there to force it to single-threaded mode, because .BR xargs (1) is used to control the amount of parallelization. . .SS "Robot mode" Calculate how many bytes have been saved in total after compressing multiple files: .RS .PP .nf .ft CR xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}' .ft R .fi .RE .PP A script may want to know that it is using new enough .BR xz . The following .BR sh (1) script checks that the version number of the .B xz tool is at least 5.0.0. This method is compatible with old beta versions, which didn't support the .B \-\-robot option: .RS .PP .nf .ft CR if ! eval "$(xz \-\-robot \-\-version 2> /dev/null)" || [ "$XZ_VERSION" \-lt 50000002 ]; then echo "Your xz is too old." fi unset XZ_VERSION LIBLZMA_VERSION .ft R .fi .RE .PP Set a memory usage limit for decompression using .BR XZ_OPT , but if a limit has already been set, don't increase it: .RS .PP .nf .ft CR NEWLIM=$((123 << 20))\ \ # 123 MiB OLDLIM=$(xz \-\-robot \-\-info\-memory | cut \-f3) if [ $OLDLIM \-eq 0 \-o $OLDLIM \-gt $NEWLIM ]; then XZ_OPT="$XZ_OPT \-\-memlimit\-decompress=$NEWLIM" export XZ_OPT fi .ft R .fi .RE . .SS "Custom compressor filter chains" The simplest use for custom filter chains is customizing a LZMA2 preset. This can be useful, because the presets cover only a subset of the potentially useful combinations of compression settings. .PP The CompCPU columns of the tables from the descriptions of the options .BR "\-0" " ... 
" "\-9" and .B \-\-extreme are useful when customizing LZMA2 presets. Here are the relevant parts collected from those two tables: .RS .PP .TS tab(;); c c n n. Preset;CompCPU \-0;0 \-1;1 \-2;2 \-3;3 \-4;4 \-5;5 \-6;6 \-5e;7 \-6e;8 .TE .RE .PP If you know that a file requires somewhat big dictionary (for example, 32\ MiB) to compress well, but you want to compress it quicker than .B "xz \-8" would do, a preset with a low CompCPU value (for example, 1) can be modified to use a bigger dictionary: .RS .PP .nf .ft CR xz \-\-lzma2=preset=1,dict=32MiB foo.tar .ft R .fi .RE .PP With certain files, the above command may be faster than .B "xz \-6" while compressing significantly better. However, it must be emphasized that only some files benefit from a big dictionary while keeping the CompCPU value low. The most obvious situation, where a big dictionary can help a lot, is an archive containing very similar files of at least a few megabytes each. The dictionary size has to be significantly bigger than any individual file to allow LZMA2 to take full advantage of the similarities between consecutive files. .PP If very high compressor and decompressor memory usage is fine, and the file being compressed is at least several hundred megabytes, it may be useful to use an even bigger dictionary than the 64 MiB that .B "xz \-9" would use: .RS .PP .nf .ft CR xz \-vv \-\-lzma2=dict=192MiB big_foo.tar .ft R .fi .RE .PP Using .B \-vv .RB ( "\-\-verbose \-\-verbose" ) like in the above example can be useful to see the memory requirements of the compressor and decompressor. Remember that using a dictionary bigger than the size of the uncompressed file is waste of memory, so the above command isn't useful for small files. .PP Sometimes the compression time doesn't matter, but the decompressor memory usage has to be kept low, for example, to make it possible to decompress the file on an embedded system. 
The following command uses .B \-6e .RB ( "\-6 \-\-extreme" ) as a base and sets the dictionary to only 64\ KiB. The resulting file can be decompressed with XZ Embedded (that's why there is .BR \-\-check=crc32 ) using about 100\ KiB of memory. .RS .PP .nf .ft CR xz \-\-check=crc32 \-\-lzma2=preset=6e,dict=64KiB foo .ft R .fi .RE .PP If you want to squeeze out as many bytes as possible, adjusting the number of literal context bits .RI ( lc ) and number of position bits .RI ( pb ) can sometimes help. Adjusting the number of literal position bits .RI ( lp ) might help too, but usually .I lc and .I pb are more important. For example, a source code archive contains mostly US-ASCII text, so something like the following might give slightly (like 0.1\ %) smaller file than .B "xz \-6e" (try also without .BR lc=4 ): .RS .PP .nf .ft CR xz \-\-lzma2=preset=6e,pb=0,lc=4 source_code.tar .ft R .fi .RE .PP Using another filter together with LZMA2 can improve compression with certain file types. For example, to compress a x86-32 or x86-64 shared library using the x86 BCJ filter: .RS .PP .nf .ft CR xz \-\-x86 \-\-lzma2 libfoo.so .ft R .fi .RE .PP Note that the order of the filter options is significant. If .B \-\-x86 is specified after .BR \-\-lzma2 , .B xz will give an error, because there cannot be any filter after LZMA2, and also because the x86 BCJ filter cannot be used as the last filter in the chain. .PP The Delta filter together with LZMA2 can give good results with bitmap images. It should usually beat PNG, which has a few more advanced filters than simple delta but uses Deflate for the actual compression. .PP The image has to be saved in uncompressed format, for example, as uncompressed TIFF. The distance parameter of the Delta filter is set to match the number of bytes per pixel in the image. 
For example, 24-bit RGB bitmap needs .BR dist=3 , and it is also good to pass .B pb=0 to LZMA2 to accommodate the three-byte alignment: .RS .PP .nf .ft CR xz \-\-delta=dist=3 \-\-lzma2=pb=0 foo.tiff .ft R .fi .RE .PP If multiple images have been put into a single archive (for example, .BR .tar ), the Delta filter will work on that too as long as all images have the same number of bytes per pixel. . .SH "SEE ALSO" .BR xzdec (1), .BR xzdiff (1), .BR xzgrep (1), .BR xzless (1), .BR xzmore (1), .BR gzip (1), .BR bzip2 (1), .BR 7z (1) .PP XZ Utils: .br XZ Embedded: .br LZMA SDK: diff --git a/src/xzdec/xzdec.c b/src/xzdec/xzdec.c index a75ea42a52fb..96e2444438c2 100644 --- a/src/xzdec/xzdec.c +++ b/src/xzdec/xzdec.c @@ -1,484 +1,489 @@ // SPDX-License-Identifier: 0BSD /////////////////////////////////////////////////////////////////////////////// // /// \file xzdec.c /// \brief Simple single-threaded tool to uncompress .xz or .lzma files // // Author: Lasse Collin // /////////////////////////////////////////////////////////////////////////////// #include "sysdefs.h" #include "lzma.h" #include #include +#include #include #ifndef _MSC_VER # include #endif #ifdef HAVE_CAP_RIGHTS_LIMIT # include #endif #ifdef HAVE_LINUX_LANDLOCK -# include -# include -# include -# ifdef LANDLOCK_ACCESS_NET_BIND_TCP -# define LANDLOCK_ABI_MAX 4 -# else -# define LANDLOCK_ABI_MAX 3 -# endif +# include "my_landlock.h" #endif #if defined(HAVE_CAP_RIGHTS_LIMIT) || defined(HAVE_PLEDGE) \ || defined(HAVE_LINUX_LANDLOCK) # define ENABLE_SANDBOX 1 #endif #include "getopt.h" #include "tuklib_progname.h" +#include "tuklib_mbstr_nonprint.h" #include "tuklib_exit.h" #ifdef TUKLIB_DOSLIKE # include # include # ifdef _MSC_VER # define fileno _fileno # define setmode _setmode # endif #endif #ifdef LZMADEC # define TOOL_FORMAT "lzma" #else # define TOOL_FORMAT "xz" #endif /// Error messages are suppressed if this is zero, which is the case when /// --quiet has been given at least twice. 
static int display_errors = 2; lzma_attribute((__format__(__printf__, 1, 2))) static void my_errorf(const char *fmt, ...) { va_list ap; va_start(ap, fmt); if (display_errors) { fprintf(stderr, "%s: ", progname); vfprintf(stderr, fmt, ap); fprintf(stderr, "\n"); } va_end(ap); return; } tuklib_attr_noreturn static void help(void) { printf( "Usage: %s [OPTION]... [FILE]...\n" "Decompress files in the ." TOOL_FORMAT " format to standard output.\n" "\n" " -d, --decompress (ignored, only decompression is supported)\n" " -k, --keep (ignored, files are never deleted)\n" " -c, --stdout (ignored, output is always written to standard output)\n" " -q, --quiet specify *twice* to suppress errors\n" " -Q, --no-warn (ignored, the exit status 2 is never used)\n" " -h, --help display this help and exit\n" " -V, --version display the version number and exit\n" "\n" "With no FILE, or when FILE is -, read standard input.\n" "\n" "Report bugs to <" PACKAGE_BUGREPORT "> (in English or Finnish).\n" PACKAGE_NAME " home page: <" PACKAGE_URL ">\n", progname); tuklib_exit(EXIT_SUCCESS, EXIT_FAILURE, display_errors); } tuklib_attr_noreturn static void version(void) { printf(TOOL_FORMAT "dec (" PACKAGE_NAME ") " LZMA_VERSION_STRING "\n" "liblzma %s\n", lzma_version_string()); tuklib_exit(EXIT_SUCCESS, EXIT_FAILURE, display_errors); } /// Parses command line options. 
static void parse_options(int argc, char **argv) { static const char short_opts[] = "cdkhqQV"; static const struct option long_opts[] = { { "stdout", no_argument, NULL, 'c' }, { "to-stdout", no_argument, NULL, 'c' }, { "decompress", no_argument, NULL, 'd' }, { "uncompress", no_argument, NULL, 'd' }, { "keep", no_argument, NULL, 'k' }, { "quiet", no_argument, NULL, 'q' }, { "no-warn", no_argument, NULL, 'Q' }, { "help", no_argument, NULL, 'h' }, { "version", no_argument, NULL, 'V' }, { NULL, 0, NULL, 0 } }; int c; while ((c = getopt_long(argc, argv, short_opts, long_opts, NULL)) != -1) { switch (c) { case 'c': case 'd': case 'k': case 'Q': break; case 'q': if (display_errors > 0) --display_errors; break; case 'h': help(); case 'V': version(); default: exit(EXIT_FAILURE); } } return; } static void uncompress(lzma_stream *strm, FILE *file, const char *filename) { lzma_ret ret; // Initialize the decoder #ifdef LZMADEC ret = lzma_alone_decoder(strm, UINT64_MAX); #else ret = lzma_stream_decoder(strm, UINT64_MAX, LZMA_CONCATENATED); #endif // The only reasonable error here is LZMA_MEM_ERROR. if (ret != LZMA_OK) { my_errorf("%s", ret == LZMA_MEM_ERROR ? strerror(ENOMEM) : "Internal error (bug)"); exit(EXIT_FAILURE); } // Input and output buffers uint8_t in_buf[BUFSIZ]; uint8_t out_buf[BUFSIZ]; strm->avail_in = 0; strm->next_out = out_buf; strm->avail_out = BUFSIZ; lzma_action action = LZMA_RUN; while (true) { if (strm->avail_in == 0) { strm->next_in = in_buf; strm->avail_in = fread(in_buf, 1, BUFSIZ, file); if (ferror(file)) { // POSIX says that fread() sets errno if // an error occurred. ferror() doesn't // touch errno. my_errorf("%s: Error reading input file: %s", - filename, strerror(errno)); + tuklib_mask_nonprint(filename), + strerror(errno)); exit(EXIT_FAILURE); } #ifndef LZMADEC // When using LZMA_CONCATENATED, we need to tell // liblzma when it has got all the input. 
 			if (feof(file))
 				action = LZMA_FINISH;
 #endif
 		}

 		ret = lzma_code(strm, action);

 		// Write and check write error before checking decoder error.
 		// This way as much data as possible gets written to output
 		// even if decoder detected an error.
 		if (strm->avail_out == 0 || ret != LZMA_OK) {
 			const size_t write_size = BUFSIZ - strm->avail_out;

 			if (fwrite(out_buf, 1, write_size, stdout)
 					!= write_size) {
 				// Wouldn't be a surprise if writing to stderr
 				// would fail too but at least try to show an
 				// error message.
-				my_errorf("Cannot write to standard output: "
+#if defined(_WIN32) && !defined(__CYGWIN__)
+				// On native Windows, broken pipe is reported
+				// as EINVAL. Don't show an error message
+				// in this case.
+				if (errno != EINVAL)
+#endif
+				{
+					my_errorf("Cannot write to "
+						"standard output: "
 						"%s", strerror(errno));
+				}
 				exit(EXIT_FAILURE);
 			}

 			strm->next_out = out_buf;
 			strm->avail_out = BUFSIZ;
 		}

 		if (ret != LZMA_OK) {
 			if (ret == LZMA_STREAM_END) {
 #ifdef LZMADEC
 				// Check that there's no trailing garbage.
 				if (strm->avail_in != 0
 						|| fread(in_buf, 1, 1, file)
 							!= 0
 						|| !feof(file))
 					ret = LZMA_DATA_ERROR;
 				else
 					return;
 #else
 				// lzma_stream_decoder() already guarantees
 				// that there's no trailing garbage.
 				assert(strm->avail_in == 0);
 				assert(action == LZMA_FINISH);
 				assert(feof(file));
 				return;
 #endif
 			}

 			const char *msg;
 			switch (ret) {
 			case LZMA_MEM_ERROR:
 				msg = strerror(ENOMEM);
 				break;

 			case LZMA_FORMAT_ERROR:
 				msg = "File format not recognized";
 				break;

 			case LZMA_OPTIONS_ERROR:
 				// FIXME: Better message?
 				msg = "Unsupported compression options";
 				break;

 			case LZMA_DATA_ERROR:
 				msg = "File is corrupt";
 				break;

 			case LZMA_BUF_ERROR:
 				msg = "Unexpected end of input";
 				break;

 			default:
 				msg = "Internal error (bug)";
 				break;
 			}

-			my_errorf("%s: %s", filename, msg);
+			my_errorf("%s: %s", tuklib_mask_nonprint(filename),
+					msg);
 			exit(EXIT_FAILURE);
 		}
 	}
 }

 #ifdef ENABLE_SANDBOX
 static void
 sandbox_enter(int src_fd)
 {
 #if defined(HAVE_CAP_RIGHTS_LIMIT)
 	// Capsicum needs FreeBSD 10.2 or later.
 	cap_rights_t rights;

 	if (cap_enter())
 		goto error;

 	if (cap_rights_limit(src_fd, cap_rights_init(&rights, CAP_READ)))
 		goto error;

 	// If not reading from stdin, remove all capabilities from it.
 	if (src_fd != STDIN_FILENO && cap_rights_limit(
 			STDIN_FILENO, cap_rights_clear(&rights)))
 		goto error;

 	if (cap_rights_limit(STDOUT_FILENO, cap_rights_init(&rights,
 			CAP_WRITE)))
 		goto error;

 	if (cap_rights_limit(STDERR_FILENO, cap_rights_init(&rights,
 			CAP_WRITE)))
 		goto error;

 #elif defined(HAVE_PLEDGE)
 	// pledge() was introduced in OpenBSD 5.9.
 	if (pledge("stdio", ""))
 		goto error;

 	(void)src_fd;

 #elif defined(HAVE_LINUX_LANDLOCK)
-	int landlock_abi = syscall(SYS_landlock_create_ruleset,
-			(void *)NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
-
-	if (landlock_abi > 0) {
-		if (landlock_abi > LANDLOCK_ABI_MAX)
-			landlock_abi = LANDLOCK_ABI_MAX;
-
-		const struct landlock_ruleset_attr attr = {
-			.handled_access_fs = (1ULL
-				<< (12 + my_min(3, landlock_abi))) - 1,
-# if LANDLOCK_ABI_MAX >= 4
-			.handled_access_net = landlock_abi < 4 ? 0 :
-				(LANDLOCK_ACCESS_NET_BIND_TCP
-				| LANDLOCK_ACCESS_NET_CONNECT_TCP),
-# endif
-		};
-
-		const int ruleset_fd = syscall(SYS_landlock_create_ruleset,
-				&attr, sizeof(attr), 0U);
+	struct landlock_ruleset_attr attr;
+	if (my_landlock_ruleset_attr_forbid_all(&attr) > 0) {
+		const int ruleset_fd = my_landlock_create_ruleset(
+				&attr, sizeof(attr), 0);
 		if (ruleset_fd < 0)
 			goto error;

 		// All files we need should have already been opened. Thus,
 		// we don't need to add any rules using landlock_add_rule(2)
 		// before activating the sandbox.
-		if (syscall(SYS_landlock_restrict_self, ruleset_fd, 0U) != 0)
+		if (my_landlock_restrict_self(ruleset_fd, 0) != 0)
 			goto error;
+
+		(void)close(ruleset_fd);
 	}

 	(void)src_fd;

 #else
 # error ENABLE_SANDBOX is defined but no sandboxing method was found.
 #endif

 	return;

 error:
 #ifdef HAVE_CAP_RIGHTS_LIMIT
 	// If a kernel is configured without capability mode support or
 	// used in an emulator that does not implement the capability
 	// system calls, then the Capsicum system calls will fail and set
 	// errno to ENOSYS. In that case xzdec will silently run without
 	// the sandbox.
 	if (errno == ENOSYS)
 		return;
 #endif

 	my_errorf("Failed to enable the sandbox");
 	exit(EXIT_FAILURE);
 }
 #endif

 int
 main(int argc, char **argv)
 {
+	// Initialize progname which will be used in error messages.
+	tuklib_progname_init(argv);
+
 #ifdef HAVE_PLEDGE
 	// OpenBSD's pledge(2) sandbox.
 	// Initially enable the sandbox slightly more relaxed so that
 	// the process can still open files. This allows the sandbox to
 	// be enabled when parsing command line arguments and decompressing
 	// all files (the more strict sandbox only restricts the last file
 	// that is decompressed).
 	if (pledge("stdio rpath", "")) {
 		my_errorf("Failed to enable the sandbox");
 		exit(EXIT_FAILURE);
 	}
 #endif

 #ifdef HAVE_LINUX_LANDLOCK
 	// Prevent the process from gaining new privileges. This must be done
 	// before landlock_restrict_self(2) but since we will never need new
 	// privileges, this call can be done here already.
 	//
 	// This is supported since Linux 3.5. Ignore the return value to
 	// keep compatibility with old kernels. landlock_restrict_self(2)
 	// will fail if the no_new_privs attribute isn't set, thus if prctl()
 	// fails here the error will still be detected when it matters.
 	(void)prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
 #endif

-	// Initialize progname which we will be used in error messages.
-	tuklib_progname_init(argv);
+	// We need to set the locale even though we don't have any
+	// translated messages:
+	//
+	// - tuklib_mask_nonprint() has locale-specific behavior (LC_CTYPE).
+	//
+	// - This is needed on Windows to make non-ASCII filenames display
+	//   properly when the active code page has been set to UTF-8
+	//   in the application manifest.
+	setlocale(LC_ALL, "");

 	// Parse the command line options.
 	parse_options(argc, argv);

 	// The same lzma_stream is used for all files that we decode. This way
 	// we don't need to reallocate memory for every file if they use same
 	// compression settings.
 	lzma_stream strm = LZMA_STREAM_INIT;

 	// Some systems require setting stdin and stdout to binary mode.
 #ifdef TUKLIB_DOSLIKE
 	setmode(fileno(stdin), O_BINARY);
 	setmode(fileno(stdout), O_BINARY);
 #endif

 	if (optind == argc) {
 		// No filenames given, decode from stdin.
 #ifdef ENABLE_SANDBOX
 		sandbox_enter(STDIN_FILENO);
 #endif
 		uncompress(&strm, stdin, "(stdin)");
 	} else {
 		// Loop through the filenames given on the command line.
 		do {
 			FILE *src_file;
 			const char *src_name;

 			// "-" indicates stdin.
 			if (strcmp(argv[optind], "-") == 0) {
 				src_file = stdin;
 				src_name = "(stdin)";
 			} else {
 				src_name = argv[optind];
 				src_file = fopen(src_name, "rb");
 				if (src_file == NULL) {
-					my_errorf("%s: %s", src_name,
-							strerror(errno));
+					my_errorf("%s: %s",
+						tuklib_mask_nonprint(
+							src_name),
+						strerror(errno));
 					exit(EXIT_FAILURE);
 				}
 			}

 #ifdef ENABLE_SANDBOX
 			// Enable the strict sandbox for the last file.
 			// Then the process can no longer open additional
 			// files. The typical xzdec use case is to decompress
 			// a single file so this way the strictest sandboxing
 			// is used in most cases.
 			if (optind == argc - 1)
 				sandbox_enter(fileno(src_file));
 #endif

 			uncompress(&strm, src_file, src_name);

 			if (src_file != stdin)
 				(void)fclose(src_file);
 		} while (++optind < argc);
 	}

 #ifndef NDEBUG
 	// Free the memory only when debugging. Freeing wastes some time,
 	// but allows detecting possible memory leaks with Valgrind.
 	lzma_end(&strm);
 #endif

 	tuklib_exit(EXIT_SUCCESS, EXIT_FAILURE, display_errors);
 }