XZ Utils review notes

General

Changes to translations were checked separately, not on a per-commit basis.

While the commits weren’t signed, I didn’t spot any signs of committer fraud.

v5.2, v5.4, and v5.6

Cherry-picks to v5.2, v5.4, and v5.6 are OK. This means that (apart from translations) it’s enough to review the master branch. If master is fine then the stable branches are too.

Timezones

There has been discussion about commit timezones and whether they might give clues about something. There are mundane explanations for them, though:

Putting a commit in another person’s name with permission.
Use of git rebase --reset-author-date.
Possibly use of git rebase --committer-date-is-author-date.

Of these I never did the last one but I don’t know if Jia did.
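
To see how such explanations show up in the metadata, the following is a minimal sketch (not taken from xz.git) that creates a throwaway repository with one commit whose author and committer differ in both name and timezone. All names, addresses, and timestamps are made up for illustration. It only demonstrates that Git records the two identities and dates independently, and how to print them side by side.

```shell
#!/bin/sh
# Sketch: author and committer fields are independent in Git.
# All names, email addresses, and timestamps below are hypothetical.
repo=$(mktemp -d)
git init -q "$repo"

# One commit: authored "by" Alice in +0200, committed by Bob in +0800.
# Dates use Git's internal "<unix seconds> <utc offset>" format.
GIT_AUTHOR_NAME='Alice Example' \
GIT_AUTHOR_EMAIL='alice@example.invalid' \
GIT_AUTHOR_DATE='1677664800 +0200' \
GIT_COMMITTER_NAME='Bob Example' \
GIT_COMMITTER_EMAIL='bob@example.invalid' \
GIT_COMMITTER_DATE='1677668400 +0800' \
git -C "$repo" commit -q --allow-empty -m 'Demo commit'

# %an/%ai = author name/date, %cn/%ci = committer name/date.
summary=$(git -C "$repo" log -1 --format='%an %ai | %cn %ci')
echo "$summary"

rm -rf "$repo"
```

The same format string works with git log in a real repository. According to the git-rebase documentation, --reset-author-date replaces the author date with the current time, and --committer-date-is-author-date copies the author date into the committer date, so either option can leave a timezone in a field where one would not expect it.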

Commits

Trivial commits like updates to THANKS or typo fixes in comments aren’t mentioned here.

I never reviewed anything under the .github directory and those files are skipped here as well.

The months refer to commit date, not author date.

2022-12

CMake: Update .gitignore for CMake artifacts from in source build.

OK.

liblzma: Fix lzma_microlzma_encoder() return value.

OK.

Doxygen: Update .gitignore for generating docs for in source build.

OK.

Tests: Adds lzip decoder tests

test_lzip_decoder.c got several minor tweaks later but overall this was a fine start.

.gitignore has a typo which was fixed in Tests: Update .gitignore..

liblzma: Update documentation for lzma_filter_encoder.

OK.

Build: No longer require HAVE_DECL_CLOCK_MONOTONIC to always be set.
and
Build: Only define HAVE_PROGRAM_INVOCATION_NAME if it is set to 1.

The default behavior of AC_CHECK_DECLS can feel a bit awkward so these changes are nice.

The comments got a quick fix by me (Build: Fix config.h comments.).

liblzma: Includes sys/time.h conditionally in mythread
and
Style: Change #if !defined() to #ifndef in mythread.h.

OK.

xz: Includes <time.h> and <sys/time.h> conditionally in mytime.c.

OK.

2023-01

Tests: Replace non portable shell parameter expansion

OK with my style fixes (Tests: Adjust style in test_compress.sh.).

Build: Add missing stream_decoder_mt.c to .vcxproj files.

OK. Also, the VS project files aren’t present anymore.

liblzma: Replaced hardcoded 0x0 index indicator byte with macro

OK.

liblzma: Add NULL check to lzma_index_hash_append.

OK.

Tests: Creates test_index_hash.c

I made a bunch of cleanups to this a few days later, including fixing a memory leak (Tests: test_index_hash: Fix a memory leak.). It’s OK enough now.

Adds test_index_hash to .gitignore.

OK.

liblzma: Remove common.h include from common/index.h.

He wanted to use liblzma’s internal headers in some tests and eventually I gave in. This required making those particular internal headers standalone in the sense that they don’t include other internal headers.

Tests: Refactors existing filter flags tests.
and
Tests: test_filter_flags: Clean up minor issues.

OK enough.

The second commit was combined from my edits squashed into a single commit. The commit message was clearly edited by him, collecting the messages from my separate commits. For example, the way the bullet points are formatted doesn’t match the way I would have done them.

Tests: Fix unused function warning in test_index_hash.
and
Tests: Fix unused function warning in test_block_header.

OK.

Tests: Fix type-limits warning in test_filter_flags.
and
Tests: Fix test_filter_flags copy/paste error..

These are OK together.

When all filters are disabled, the array initialization becomes { } which isn’t valid C99. This was fixed in Tests: Fix C99/C11 compatibility when features are disabled.

xz: Fix warning -Wformat-nonliteral on clang in message.c.

OK.

xz: Do not set compression settings with raw format in list mode.

OK.

xz: Add missing comment for coder_set_compression_settings()

OK.

tuklib_common: Define __has_warning if it is not defined.

I didn’t like this and reverted it a few days later. See Revert "tuklib_common: Define __has_warning if it is not defined." for the reasons.

tuklib_physmem: Silence warning from -Wcast-function-type on MinGW-w64.

I cleaned this up a few days later in tuklib_physmem: Clean up the way -Wcast-function-type is silenced on Windows. and tuklib_physmem: Check for __has_warning before GCC version.. Months later, it was simplified in tuklib_physmem: Comment out support for Windows versions older than 2000..

liblzma: Highlight liblzma API headers should not be included directly.

OK.

Doxygen: Make Doxygen only produce liblzma API documentation by default.

It has quoting typos in the messages in configure.ac. These were fixed in Build: Avoid different quoting style in --enable-doxygen doc.. That code isn’t present anymore anyway.

Doxygen: Update Doxyfile.in from 1.4.7 to 1.8.17.

Likely OK. The Doxyfile was updated later again anyway.

liblzma: Set documentation on all reserved fields to private.

OK.

xz: Refactor duplicated check for custom suffix when using --format=raw
and
xz: Flip the return value of suffix_is_set to match the documentation.

The intent is good, fixing a very minor issue in a rarely-seen corner case. It had waited some time on a branch until I gave permission to merge it. This introduced a regression that was fixed in 2023-11.

liblzma: Fix documentation in filter.h for lzma_str_to_filters()

The fix is good but the commit message goes beyond overthinking. The doc simply was incorrect and that’s it.

2023-02

From
liblzma: Clarify a comment about LZMA_STR_NO_VALIDATION.
to
liblzma: Fix typos in comments in string_conversion.c.

OK. These are the first commits that update the API docs.

Tests: Create test_filter_str.c.

Reviewing the test revealed a bug that was fixed in liblzma: lzma_str_to_filters: Set *error_pos on all errors and in Tests: test_filter_str: Test *error_pos more thoroughly.

liblzma: Fix bug in lzma_str_from_filters() not checking filters[] length.

OK.

liblzma: Improve documentation for version.h.
and
Build: Adjust CMake version search regex.

OK.

xz: Add a comment clarifying the use of start_time in mytime.c.

OK.

From
xz: Improve the comment about start_time in mytime.c.
to
liblzma: Improve documentation for stream_flags.h

OK (documentation updates).

Some cleanups were made in liblzma: Very minor API doc tweaks..

From
liblzma: Rename field => member in documentation.
to
liblzma: Adjust spacing in doc headers in bcj.h.

OK (more documentation updates).

CMake: Add LZIP decoder test to list of tests.

OK.

From
liblzma: Improve documentation for container.h
to
liblzma: Adjust container.h for consistency with filter.h.

OK (even more documentation updates).

From
liblzma: Improve documentation in filter.h.
to
liblzma: Replace ' ' -> newline in filter.h documentation.

OK (documentation updates continue still).

The return value updates were good to do:

lzma_properties_encode should only return LZMA_OK or LZMA_PROG_ERROR. LZMA_OPTIONS_ERROR shouldn’t be possible although it was listed before as a possible value too.
lzma_filter_flags_decode can also return LZMA_DATA_ERROR.

Tests: Small tweak to test-vli.c.

OK.

2023-03

liblzma: Clarify lzma_lzma_preset() documentation in lzma12.h.

OK.

From
xz: Reorder cap_enter() to beginning of capsicum sandbox code.
to
xz: Simplify the error-label in Capsicum sandbox code.

This series has intermediate steps that I didn’t like. The end result is good though (the last two commits are by me).

This series is identical to the two following commits in v5.4:

xz: Don't fail if Capsicum is enabled but kernel doesn't support it.
and xz: Make Capsicum sandbox more strict with stdin and stdout.

Tests: Refactors existing lzma_index tests.

With the tweaks that were made later, this is OK enough.

liblzma: Defines masks for return values from lzma_index_checks().

I found and still find these useless additions to the API. He kept insisting on them so eventually I gave in. There’s nothing technically wrong, I just think they don’t improve readability.

Tests: Remove unused macros and functions.

tuktest.h had made these obsolete.

Doxygen: Refactor Doxyfile.in to doxygen/Doxyfile.

This is one of the supposedly suspicious commits due to the timezone. In reality Jia put it in my name by common agreement because I had done a significant portion of it.

Build: Install Doxygen docs and include in distribution if generated.
and
Build: Create doxygen/update-doxygen script.
and
Build: Generate doxygen documentation in autogen.sh.

OK. These will look different after the 5.6.1 release because pre-generated Doxygen output won’t be included in distribution tarballs anymore (not only to reduce the number of generated files but also to simplify the license questions of the source tarball).

liblzma: Add set lzma.h as the main page for Doxygen documentation.
and
Doc: Update PACKAGERS with details about liblzma API docs install.
and
COPYING: Add a note about the included Doxygen-generated HTML.
and
liblzma: Remove note from lzma_options_bcj about the ARM64 exception.

OK.

Doc: Rename Doxygen HTML doc directory name liblzma => api.

OK.

CMake: Allow configuring features as cache variables.

OK.

CMake: Conditionally build xz list.* files if decoders are enabled.

OK.

CMake: Bump maximum policy version to 3.26.

OK.

Build: Removes redundant check for LZMA1 filter support.

OK.

CMake: Only build xzdec if decoders are enabled.

OK.

CMake: Allows setting thread method.

OK.

CMake: Update liblzma-config.cmake generation.

OK.

2023-04

liblzma: Cleans up old commented out code.

OK.

tuklib_integer: Use __builtin_clz() with Clang.

This is broken. It was fixed in tuklib_integer.h: Fix a recent copypaste error in Clang detection..

Windows: Include <intrin.h> when needed.

It’s OK. The change to memcmplen.h was cleaned up in liblzma: Omit unnecessary parenthesis in a preprocessor directive..

2023-05

tuklib_integer.h: Changes two other UINT_MAX == UINT32_MAX to >=.
and
tuklib_integer.h: Reverts previous commit.

The error got to the master branch because my feedback in the chat came slightly too late. Thus the second commit reverted it. The commit message in the second one likely was supposed to say “if the integer size was not 32 bits”.

liblzma: Creates IS_ENC_DICT_SIZE_VALID() macro.

OK. Needed trivial white space cleanup: liblzma: Clean up white space.

liblzma: Exports lzma_mt_block_size() as an API function.

OK.

liblzma: Adds lzma_nothrow to MicroLZMA API functions.

OK.

liblzma: Slightly rewords lzma_str_list_filters() documentation.

OK.

2023-06

CMake: Protects against double find_package

This is from PR #51. The commit message seems slightly wrong as it speaks about ZLIB::ZLIB but otherwise everything is fine.

Add ifunc check to configure.ac
and
Add ifunc check to CMakeLists.txt
and
liblzma: Add ifunc implementation to crc64_fast.c.

Some genuine fixes were needed later but because ifunc support is now gone from the package, these commits aren’t relevant anymore.

liblzma: Add ifunc implementation to crc64_fast.c.
and
CMake: Rename CHECK_ATTR_IFUNC to ALLOW_ATTR_IFUNC.
and
Minor tweaks to style and comments.

These are my commits that Jia merged. Since ifunc support is now gone from the master branch, these commits aren’t relevant anymore.

liblzma: Prevent warning for MSYS2 Windows build.

It’s fine but I edited it later in liblzma: Tweak #if condition in memcmplen.h. and liblzma: Use 8-byte method in memcmplen.h on ARM64..

liblzma: Prevent uninitialzed warning in mt stream encoder.
and
liblzma: Remove non-portable empty initializer.

These together are good. In v5.4 these were squashed into liblzma: Prevent uninitialzed warning in mt stream encoder..

Tests: Fix memory leaks in test_block_header.

OK. An inline function could have been better than a macro.

Tests: Fix memory leaks in test_index.

OK.

2023-07

Tests: Improve feature testing for skipping.

OK. I think the CI setup with various build configs helped to spot this.

xz: Add --filters option to CLI.

OK.

xz: Update --long-help and man page for new --filters option.

OK.

Tests: Use new --filters option in test_compress.sh

OK.

--powerpc had been abbreviated to --power by accident in the old version but it still worked correctly because getopt_long accepts unambiguous abbreviations.

xz: Separate string to filter conversion into a helper function.

OK. str_to_filter was renamed to more correct plural str_to_filters two commits later.

This commit begins the series of commits that add the --filters1 … --filters9 options.

xz: Use lzma_filters_free() in forget_filter_chain().

OK.

xz: Create command line options for filters[1-9].

It’s OK as an intermediate step. The next commits improve things.

xz: Add a message if --block-list is used outside of xz compresssion.

OK.

xz: Free filters[] in debug mode.

OK.

xz: Do not include block splitting if encoders are disabled.

OK.

I restored the { } to one if statement. I like to keep the two if-statements in the nested form instead of using short-circuiting (&&) to make the function call conditional.

xz: Allows --block-list filters to scale down memory usage.

This does what it is supposed to do but it needed cleaning up:

filters_memusage_max() sets the static array filter_memusages but it’s better to make that array a local variable.
The second half is more complex than it needs to be. Splitting the dictionary size scaling into three steps makes it hard to read.
- orig_dict_size should be uint32_t, not uint64_t.
- The while (count > 0) loop and the need for the count variable are an odd way to do this. It adjusts the chains in parallel instead of handling one chain at a time.
- if (filt_mem_usage < memory_limit) should have had <=. In practice it didn’t matter though as the values are in bytes.

I edited this code quite a bit in 2024-05, including the fixes and cleanups for the things listed above. It might still be a bit heavy to read but it’s not due to the 2023 commits.

xz: Validate --flush-timeout for all specified filter chains.

OK. Using multiple filter chains with --flush-timeout isn’t a likely use case but this isn’t much extra code either.

xz: Set the Block size for mt encoding correctly.

This had some issues which were fixed in xz: Minor clean up for coder.c.

The last comment was odd, maybe a leftover from an earlier version. It was fixed in xz: Omit an incorrect comment.

However, it’s better to collect the required information in parse_block_list() because it avoids the need to loop through the opt_block_list array. I did that change in 2024-05, thus not a lot remains from this commit.

This is the last commit to the --filters1 … --filters9 series.

xz: Ignore filter chains that are set but never used in --block-list.

xz: Update --long-help for the new --filtersX option.

OK.

xz: Update the man page for --block-list and --filtersX

OK.

xz: Add a new --filters-help option.

OK.

xz: Update man page for new --filters-help option.

OK.

xz: Reorder robot mode subsections in the man page.

OK. As the commit message says, it only reorders lines. The following can be helpful:

git log -p f5dc172a402f^! | sed 's/^-/+/' | grep ^+ \
    | sort | uniq -u
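
How that pipeline works can be seen on a toy input. The sketch below wraps the same sed, grep, sort, and uniq steps in a hypothetical helper, unpaired_lines, and feeds it made-up hunk lines. Every removed line is turned into an added line, so a line that was merely moved appears exactly twice and uniq -u drops it. Anything that is printed was added or removed without a counterpart.

```shell
#!/bin/sh
# Sketch of the reorder check on made-up diff hunks.
# unpaired_lines is a hypothetical helper name, not part of xz.git.
unpaired_lines() {
    # Turn "-line" into "+line", keep only "+" lines, and print the
    # ones that occur exactly once.
    sed 's/^-/+/' | grep '^+' | LC_ALL=C sort | uniq -u
}

# "alpha" moves below "beta": a pure reordering, nothing is printed.
moved=$(printf '%s\n' '-alpha' ' beta' '+alpha' ' gamma' | unpaired_lines)

# "alpha" is replaced by "alphx": both versions are printed.
changed=$(printf '%s\n' '-alpha' '+alphx' ' beta' | unpaired_lines)

printf 'moved:   [%s]\n' "$moved"
printf 'changed: [%s]\n' "$changed"
```

On real git log -p output the "---" and "+++" file header lines are also printed because each occurs only once. Note that uniq -u hides any line that occurs more than once, so a line added twice would not show up either; the pipeline is a quick aid for reading the diff, not a replacement for it.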

xz: Slight reword in xz man page for consistency.

OK.

xz: Add a section to man page for robot mode --filters-help.

OK.

xz: Update man page Authors and date.

xz: Minor clean up for coder.c

OK.

xz: Fix typo in man page.

OK.

Tip: Never tell how many things you are going to list.

Docs: Add a new section to INSTALL for Tests.

OK. We had been asked how to cross-compile on one machine but then run the tests on the target machine, so it was good to document it.

liblzma: Improve comment in string_conversion.c.

OK.

liblzma: Reword lzma_str_list_filters() documentation.

OK.

liblzma: Suppress -Wunused-function warning.

OK but it’s gone from the master branch now.

Tests: Add ARM64 filter test to test_compress.sh.

OK.

Tests: Skip .lz files in test_files.sh if not configured.

OK. It makes test_files.sh use exit status 77 (skipped test) when the feature isn’t configured instead of exit status 0 (passed test).
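
The distinction matters because the Automake test harness reports exit status 77 as SKIP rather than PASS. A minimal sketch of the pattern follows; the function and its yes/no argument are hypothetical stand-ins for the test script and its feature check, not the code in test_files.sh.

```shell
#!/bin/sh
# Sketch of the skip-vs-pass pattern. run_lz_tests and its argument
# are hypothetical stand-ins, not the code in test_files.sh.
run_lz_tests() {
    lzip_decoder_enabled=$1

    if [ "$lzip_decoder_enabled" != yes ]; then
        echo 'lzip decoder not configured, skipping .lz tests'
        return 77   # reported as SKIP by the Automake test harness
    fi

    # The real .lz decompression tests would run here.
    return 0        # reported as PASS
}

status=0
run_lz_tests no || status=$?
echo "exit status without the decoder: $status"
```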

liblzma: Prevent an empty translation unit in Windows builds.

OK.

CMake: Conditionally allow the creation of broken symlinks.

The special cases on Windows were researched and commented well but the end result is more complex than it needs to be. Instead of creating the symlinks in the build directory and then installing (copying) them, it’s simpler to create the symlinks at install time directly in the target directory. This was simplified in CMake: Simplify symlink creation and install translated man pages. which itself is a messy commit though as it did multiple things.

Update .gitignore.
and
Add newline to end of .gitignore.

OK.

2023-08

Tests: Style fixes to test_lzip_decoder.c.

OK.

mythread.h: Fix typo error in Vista threads mythread_once().

OK.

Build: Conditionally allow win95 threads and --enable-small.
and
CMake: Conditionally allow win95 threads and --enable-small.

OK.

liblzma: Add overflow check for Unpadded size in lzma_index_append().
and
liblzma: Update assert in vli_ceil4().
and
Tests: Improve invalid unpadded size check in test_lzma_index_append().

This was pondered for a while, and it was determined that in practice the only problem was an assertion failure (and assertions are rarely enabled in production code). A proper fix was still good to do.

Tests: Improve comments in test_index.c.

The diff might appear misleading. The intent is to move the comment but the diff shows it as a move of the lzma_index_end call.

2023-09

CMake: Fix unconditionally defining HAVE_CLOCK_MONOTONIC.

OK.

CMake: Fix time.h checks not running on second CMake run.

OK.

From
lib: Copy new header files from Gnulib without modification.
to
lib: Silence -Wsign-conversion in getopt.c.

These are OK, including getopt.m4 which was simplified a lot. xz doesn’t need support for all corner cases of getopt_long so checking for those corner cases isn’t needed either.

The GNU getopt_long isn’t used on GNU/Linux, *BSDs, Solaris, or MinGW-w64.

From
Docs: Change quoting style from `...' to '...'.
to
Scripts: Change quoting style from `...' to '...'.

This was an overdue change. It’s also done properly (I verified it back then as well). It missed one change around CLOCK_MONOTONIC but that turned out to be mildly convenient as it avoided a simple merge conflict with Build: Check for clock_gettime() even if not using POSIX threads..

To verify these commits, the following can be helpful:

git diff ce162db07f03^..9fb5de41f2fb \
    | sed "/^-/{s/\`/'/g;s/^-/+/}" \
    | grep ^+ | sort | uniq -u
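
The sed step is what makes this check work: on removed lines it rewrites every backquote to an apostrophe before pairing them with the added lines. A minimal sketch on made-up diff lines, using a hypothetical helper name (the extra ; before the closing brace is only there for sed implementations that require it):

```shell
#!/bin/sh
# Sketch of the quoting check on made-up diff lines.
# unpaired_after_requote is a hypothetical helper name.
unpaired_after_requote() {
    # On removed lines only: replace each backquote with an apostrophe
    # and turn the leading "-" into "+". Then print the "+" lines that
    # occur exactly once.
    sed "/^-/{s/\`/'/g;s/^-/+/;}" | grep '^+' | LC_ALL=C sort | uniq -u
}

# Only the quoting style changes: nothing is printed.
requoted=$(printf '%s\n' "-use \`make' here" "+use 'make' here" \
    | unpaired_after_requote)

# The wording changes too: both versions are printed.
reworded=$(printf '%s\n' "-use \`make' here" "+run 'make' here" \
    | unpaired_after_requote)

printf 'requoted: [%s]\n' "$requoted"
printf 'reworded: [%s]\n' "$reworded"
```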

2023-10

Build: Update visibility.m4 from Gnulib.

It matches the file in Gnulib.

The x86 CRC32 CLMUL commits begin from liblzma: Rename crc_macros.h to crc_common.h. and there is a lot of code churn as the code was moved around more than once.

I made further changes that moved the code. Jia didn’t like them at first but he changed his mind in January 2024. I merged my changes on 2024-01-11, the big one being liblzma: Avoid extern lzma_crc32_clmul() and lzma_crc64_clmul()..

The final version is fine.

liblzma: Fix -fsanitize=address failure with crc_clmul functions.

This commit is a continuation of my commit liblzma: Mark crc64_clmul() with __attribute__((__no_sanitize_address__)). where I added no_sanitize_address the first time. It just was needed in more places with some compiler versions when the code was rearranged.

This attribute is obviously scary but it is unfortunately required with this version of the x86 SIMD code. The code makes aligned 16-byte reads which may read up to 15 bytes before the beginning or past the end of the buffer if the buffer is misaligned. The unneeded bytes are then ignored. It cannot cross page boundaries and thus cannot cause access violations.
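
The page-boundary claim follows from simple arithmetic: a 16-byte block that starts at a multiple of 16 cannot straddle a multiple of the page size as long as the page size is itself a multiple of 16. The sketch below checks this by brute force for every start address within two pages. The 4096-byte page size is an assumption for illustration only.

```shell
#!/bin/sh
# Brute-force check: an aligned 16-byte read never crosses a page
# boundary. PAGE_SIZE=4096 is an assumption for illustration.
PAGE_SIZE=4096
crossings=0
addr=0
while [ "$addr" -lt $((2 * PAGE_SIZE)) ]; do
    # Round the (possibly misaligned) address down to a multiple of 16.
    first=$((addr - addr % 16))
    last=$((first + 15))

    # Compare the page numbers of the first and last byte read.
    if [ $((first / PAGE_SIZE)) -ne $((last / PAGE_SIZE)) ]; then
        crossings=$((crossings + 1))
    fi
    addr=$((addr + 1))
done
echo "aligned reads crossing a page boundary: $crossings"
```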

The correctness is easy to review because memory is read only in a few places. If you do this, I suggest looking at the latest code in the master branch as that is the code that is actually in use now.

However, it also trips memory sanitizer which is a different thing than address sanitizer. Instead of adding another attribute to disable it, this code will be changed so that such attributes aren’t needed. Such a change won’t be backported to 5.4.x though because the current code does work correctly.

CMake: Add ALLOW_CLMUL_CRC option to enable/disable CLMUL.

OK.

Build: Remove check for COND_CHECK_CRC32 in check/Makefile.inc.

OK.

Build: Fix text wrapping in an output message.

OK.

2023-11

liblzma: Add missing comments to lz_encoder.h.

OK.

From
xz: Detect when all data will be written to standard out earlier.
to
xz: Move the check for --suffix with --format=raw a few lines earlier.

In v5.4 these were squashed into these commits:
xz: Fix suffix check.
and
Tests: Create test_suffix.sh.
and
Tests: Fix typo in a comment.

It’s simplest to review this as a whole. It was a somewhat hilarious series of commits in the master branch: I had spotted a regression that was introduced in xz: Refactor duplicated check for custom suffix when using --format=raw. We discussed it and exchanged ideas about the fix. The fix then turned out to be a bit wrong and needed another fix, until he thought it was better to finish it another day. Then the test script was added as well.

xz: Create separate is_tty() function.
and
xz: Use is_tty() in message.c.

These are good although I clarified a comment in the next commit. It fixed a very old bug in the Windows build of xz.

Build: Change --enable-ifunc handling.
and
CMake: Change __attribute__((__ifunc__())) detection.

Ifunc detection was causing issues with musl. This seems fine but ifunc support was removed in liblzma: Remove ifunc support. so this doesn’t matter anymore anyway.

CMake: Use consistent indentation with check_c_source_compiles().

It really is just a white space change, no dots.

2023-12

From
Tests: Rename OSS-Fuzz files.
to
Tests: Add fuzz_encode_stream ossfuzz target.

These are from PR #73 and have been merged correctly with cleanups to commit messages and white space usage.

Tests: Minor cleanups to OSS-Fuzz files.

OK. Needed one minor comment fix.

Tests: Silence -Wsign-conversion warning on GCC version < 10.

OK.

Docs: Update repository URL in Changelog.

The URL is correct.

liblzma: Improve lzma encoder init function consistency.

OK. options definitely must not be NULL, and in practice it never is because it would be a bug in the application too.

liblzma: Make parameter names in function definition match declaration.

OK, just like the commit message says.

liblzma: Set all values in lzma_lz_encoder to NULL after allocation.

This and the next commit, liblzma: Initialize lzma_lz_encoder pointers with NULL., were squashed into liblzma: Set all values in lzma_lz_encoder to NULL after allocation. in the v5.4 branch for the 5.4.6 release. Together these commits only add two new lines to add missing initializers.

The bug isn’t as serious as the commit message makes it sound as no one has a reason to call lzma_filters_update on a LZMA1 encoder, a function that is very rarely used anyway.

xzdec: Add sandbox support for Pledge, Capsicum, and Landlock.

The order of the macros in the #if directive is different from src/xz/sandbox.h but the directive is correct still.

The Capsicum code is slightly simpler and stricter than in xz because xzdec needs to do less. Silently running without sandbox when kernel support is missing (ENOSYS) is an intentional feature and commonly accepted practice: without this, on such kernels xz wouldn’t run at all.

Build: Allow sandbox to be configured for just xzdec.

OK.

CMake: Move sandbox detection outside of xz section.

This moves code around and edits it a little. Comparing the moved sections shows it’s fine including the small edits. No suspicious typos (like misspelling SANDBOX_COMPILE_DEFINITION) are apparent.

2024-01

liblzma: Add RISC-V BCJ filter.

The riscv.c file in this commit was almost solely written by me and matched the file I had outside of the Git tree. Jia did some minor work on it like adding macros to avoid repeated code and a few comment improvements. Those I reviewed and edited further.

The rest of the commit was by Jia. Those changes are good.

liblzma: Update string_conversion.c to support RISC-V Filter.

OK.

Tests: Add two RISC-V Filter test files.

This is the first commit preparing for the backdoor.

xz: Update xz -lvv for RISC-V filter.

It’s very similar to the ARM64 code below the new code.

liblzma: RISC-V filter: Use byte-by-byte access.
and
xz: Man page: Add more examples of LZMA2 options with BCJ filters.

I did author these.

Tests: Skip RISC-V test files if decoder was not built.

This is fine although now that the backdoor has been removed, this commit alone doesn’t do anything. But I left it there so that it’s ready if proper files along with a generator program are added.

Tests: Use smaller dictionary size in RISC-V test files.

I had mentioned the dictionary size and that gave a good excuse to update something in the backdoor code.

CI: Use RISC-V filter when building with BCJ support.

OK.

From
Speed up CRC32 calculation on ARM64
to
liblzma: Update Authors list in crc32_arm64.h.

There is some churn in these commits so it’s simplest to review only the final outcome.

FreeBSD case lacked error checking which was fixed in liblzma: ARM64 CRC32: Add error checking to FreeBSD-specific code.

The check for the CRC32 table omission macro had an odd typo which was fixed in liblzma: ARM64 CRC: Fix omission of CRC32 table.

2024-02

liblzma: Fix build error if only RISC-V BCJ filter is enabled.

OK.

Tests: Add RISC-V filter support in a few places.

OK.

liblzma: Creates separate "safe" range decoder mode.
and
liblzma: Creates Non-resumable and Resumable modes for lzma_decoder.

These are good and were created at my request. They are big but it’s hard to split them into smaller pieces. The original versions of these are from 2023-04-24. I made small edits but it was agreed that I would commit these in his name.

Since these commits are complex, it’s understandable if they look scary. The new code does input buffer bounds checking only once per loop iteration, checking that there are at least 21 bytes available. At first this may look insufficient because, in the worst case, rc_normalize may be executed up to 48 times (22 bit model + 26 direct) per loop iteration. Each rc_normalize macro will read one input byte if needed. However, the condition to read a new byte cannot be true every time. This has been verified both with math and experimentally. Comments in the code about this could be improved.

The small edits I made included fixing an off-by-one error with the loop condition. The commit originally had 20 but I changed it to 21 because the final normalization after the end of payload marker (EOPM) was included in the non-resumable path. The next commit by me (liblzma: LZMA decoder improvements.) made the non-resumable variant jump to the safe code (goto eopm) instead and thus 21 was changed to 20.

XZ Embedded has had somewhat comparable code with the per-loop 20-byte check since the beginning (2009) and it’s been in the Linux kernel since 2010.

As with most of the core compression code in XZ Utils, the original idea for this detail comes from LZMA SDK. It’s just fairly different-looking code.

Update m4/.gitignore.

Modified build-to-host.m4 had the backdoor trigger. Ignoring the file here is correct because it is added among several other .m4 files when running ./autogen.sh or autoreconf -fi.

Build: Fix ARM64 CRC32 instruction feature test.

OK.

xz: Fix Capsicum sandbox compile error.

This is correct.

Build: Define HAVE_MICROLZMA when it is configured.
and
Tests: Add test_microlzma to .gitignore and CMakeLists.txt.
and
Tests: Replace HAVE_MICROLZMA usage in CMake and Autotools builds.

I didn’t like the extra macro that had been added solely for this test. The last commit does more than reverting a single commit but the change is OK.

Tests: Add MicroLZMA test.
and
Tests: Replace HAVE_MICROLZMA usage in CMake and Autotools builds.

This test assumes that the encoder output doesn’t change, that is, the test will fail if the encoder is modified too much. If the encoder is modified, updating the test to match isn’t much extra to do.

Tests: Add a few test files.

Backdoor files.

Note that tests/test_files.sh uses globs to pick the files. So just adding files means that a decompression test will be done with them.
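
Why no list needs updating can be seen in a minimal sketch of glob-driven file selection. The directory, file names, and loop below are hypothetical and only mimic the idea; they are not copied from tests/test_files.sh.

```shell
#!/bin/sh
# Sketch: a glob picks up whatever files exist at run time.
# The directory and file names are made up for this demo.
dir=$(mktemp -d)
: > "$dir/good-1.xz"
: > "$dir/bad-1.xz"

count_good() {
    n=0
    for f in "$dir"/good-*.xz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        n=$((n + 1))
    done
    echo "$n"
}

before=$(count_good)
: > "$dir/good-2.xz"     # "just adding a file"
after=$(count_good)

echo "good files before: $before, after: $after"
rm -rf "$dir"
```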

xzmore: Fix typo in xzmore.1.
and
Translations: Patch man pages to avoid fuzzy matches.

We discussed this and I agreed to it. This way man page translations didn’t need to go via translators in the last days of the release rush to fix a typo in the English source text.

Bump version and soname for 5.7.0alpha.
and
Bump version and soname for 5.6.0.

The symbol name was supposed to be XZ_5.6 not XZ_5.6.0. It makes no difference in practice as it is merely a string.

Fix typos in NEWS and CMakeLists.

OK. He liked to run codespell while I didn’t.

Tests: Correct license header in test_microlzma.c.

He fixed it because I noticed it. Clearly the test file had been written some time ago already.

Tests: Add test_microlzma to .gitignore and CMakeLists.txt.

OK.

Build: Fix Linux Landlock feature test in Autotools and CMake builds.

This has a . at the beginning of a line in CMakeLists.txt that sabotages the Landlock detection. This was fixed in CMake: Fix sabotaged Landlock sandbox check..

xz: Add missing RISC-V on the filter list in the man page

The need to add this fix had been discussed.

xz: Change logging level for thread reduction to highest verbosity only.

This was discussed at length with multiple people. The commit matches what was decided.

2024-03

Build: Require attribute no_profile_instrument_function for ifunc usage.
and
liblzma: Use attribute no_profile_instrument_function with ifunc.

This is a real bug but the bug number in the commit message lacks one digit.

liblzma: Fix false Valgrind error report with GCC.

It’s one of the backdoor commits.

liblzma: Fix typos in crc32_fast.c and crc64_fast.c.
and
Tests: Update RISC-V test files.
and
Tests: Test --single-stream can decompress bad-3-corrupt_lzma2.xz.
and
Tests: Update two test files.

More backdoor commits.

Translations: Add missing --riscv option to man page translations.

This is a continuation of xz: Add missing RISC-V on the filter list in the man page. It’s almost never good to touch the translations without upstream approval. In this case I had mixed feelings as I wondered if the date translations would be correct, but I didn’t want to stop it so that the translation could be in the 5.6.1 release that was going to be made the same day.

When translations are sent to the translators, they will remake these changes anyway (they only pick the original English text from the tarball).

Docs: Simplify SECURITY.md.

I got the impression that a lot of speculation happened around this commit.

I had asked Jia to simplify the GitHub PR and Issue templates already, and simplifying SECURITY.md seemed reasonable too. People reporting such bugs don’t need detailed handholding.

I felt 21–30 days would be appropriate but Jia wanted to keep 90. With backdoor-like “bugs”, fast full disclosure is the only correct way though.

Translations

In short, the translation files are OK.

Translations in xz.git

This section applies to the xz.git tags v5.6.1, v5.4.6, and v5.2.12. The master branch had the same .po files as v5.6.1.

The .po files for both program (in the po directory) and man page translations (po4a) are OK. There are differences to the files from the Translation Project but the reasons for the differences are clear.

For example, it has been common that white space changes are needed to fix alignment or line length problems. In most cases these have been discussed with the translators. However, such white space changes might not be in the Translation Project for a few reasons:

The translator replied via email that the proposed changes are OK but never uploaded the new file to the Translation Project.
The translator or the translation team couldn’t be reached. For simple white space changes it seemed better to include a modified file instead of omitting the translation completely.

Translations in tarballs

This section applies to XZ Utils 5.2.12, 5.4.6, and 5.6.1 tarballs and the matching xz.git tags.

When creating a distribution tarball with make dist (in a generic Autotools-based project) or the make mydist wrapper target in XZ Utils:

The .po files get updated to match the current source code. (This can be disabled by setting DIST_DEPENDS_ON_UPDATE_PO = no in po/Makevars.)
From the updated .po files, matching .gmo binary files are generated and included in the tarball. This way the msgfmt tool isn’t needed when the package is built.

I updated the .po files in the matching xz.git tags using make -C po update-po and compared them against the .po files in the tarballs. There are no differences except POT-Creation-Date. Thus the .po files are OK.
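A comparison of this kind could be scripted. The following is a minimal Python sketch, not the procedure that was actually used: it assumes both .po files have already been read as text, and the function name is hypothetical. It drops the POT-Creation-Date header line from both sides and compares what remains.

```python
def po_equal_ignoring_creation_date(po_a: str, po_b: str) -> bool:
    """Compare two .po file contents, ignoring the POT-Creation-Date header."""

    def significant_lines(text: str) -> list[str]:
        # The .po header entry is a sequence of quoted lines, so the field
        # appears as a line that starts with "POT-Creation-Date:
        return [
            line
            for line in text.splitlines()
            if not line.startswith('"POT-Creation-Date:')
        ]

    return significant_lines(po_a) == significant_lines(po_b)
```

Any other difference, including one in a different header field, makes the function return False.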

Recreating the .gmo files from the .po files produced files identical to those in the tarballs. To reproduce the .gmo files in 5.2.12 and 5.4.6 tarballs using GNU gettext 0.22 or later, the option --no-redundancy needs to be passed on the msgfmt command line.

Release tarballs

This section applies to the release tarballs of 5.2.12 and 5.4.3 to 5.4.6. These tarballs were created and signed by Jia Tan. These have been checked and they don’t contain malicious content.

Note: The tags v5.2.11 and v5.4.2 in the Git repository were signed by Jia Tan but the tarballs were created and signed by me.

With the following exceptions, the files in the Git repository match the tarballs:

.po files get updated as part of make mydist (or make dist).
ChangeLog is a generated file in the tarballs.

Each release is available in more than one compression format. The uncompressed .tar is the same for all compression formats for each release.
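One way to confirm this is to hash the decompressed stream of each file and compare the digests. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and it covers only the formats that Python's standard library modules lzma, gzip, and bz2 can read, so a .zst file would need separate handling.

```python
import bz2
import gzip
import hashlib
import lzma

# Standard-library decompressors keyed by file name suffix.
# .zst is deliberately left out of this sketch.
OPENERS = {".xz": lzma.open, ".gz": gzip.open, ".bz2": bz2.open}


def tar_digest(path: str) -> str:
    """Return the SHA-256 of the uncompressed .tar inside a compressed tarball."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            break
    else:
        raise ValueError(f"unsupported compression format: {path}")

    digest = hashlib.sha256()
    with opener(path, "rb") as stream:
        # Read in chunks so that large tarballs need not fit in memory.
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_uncompressed_tar(paths: list[str]) -> bool:
    """True if every listed tarball decompresses to the same byte stream."""
    return len({tar_digest(p) for p in paths}) == 1
```

Comparing digests instead of raw bytes keeps memory use flat and leaves a value that can be recorded next to the release.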

The file lists in the tarballs are good. For example, the same file doesn’t appear more than once.
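The duplicate-entry check mentioned above could look roughly like this with Python's tarfile module. This is a hedged sketch with a hypothetical function name; it says nothing about how the check was actually carried out.

```python
import tarfile
from collections import Counter


def duplicate_members(tar: tarfile.TarFile) -> list[str]:
    """Return the member names that occur more than once in the archive."""
    counts = Counter(member.name for member in tar.getmembers())
    return sorted(name for name, count in counts.items() if count > 1)
```

A repeated name matters because extraction processes members in order, so a later entry can silently overwrite an earlier one. An empty result means every name in the archive is unique.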

The PDF files are hard to reproduce as they contain a timestamp and also depend on the version of the tools being used. However, the PDFs look normal and their file sizes are normal too (only differing by a few bytes).

There still are a few known harmless oddities in certain releases. They are listed in the following subsections.

xz-5.4.4.tar.zst reupload

The first xz-5.4.4.tar.zst on GitHub was clearly too big. It was compressed with the default compression level. I pointed this out and Jia recompressed it with zstd -19 and uploaded a new tarball and signature to GitHub. This is the reason why its signature timestamp differs significantly from the other three tarball signatures.

This reupload was done before sending the announcement email to the xz-announce mailing list. Thus it’s possible that no one downloaded the first file or its signature.

5.4.4 ChangeLog

The newest two entries in ChangeLog have commit IDs that don’t match the v5.4.4 tag in the v5.4 branch in the Git repository.

Jia had created the last two commits in a different directory and then used patch files (possibly with git am) to apply them to the v5.4 branch that was pushed to GitHub. However, the v5.4.4 tag was pushed from the other tree, matching the commit ID that is in the ChangeLog. Thus, the HEAD of v5.4 and the v5.4.4 tag had the same content but the tag wasn’t attached to any branch in the repository on GitHub. Possible fixes:

Force push the commit that was tagged into v5.4. This would replace the latest two commits. The contents of ChangeLog would then also match the tag.
Remove the v5.4.4 tag and recreate it on the commit that was already in v5.4.

I thought that someone might have already gotten the latest v5.4 contents and a force push would then look bad. It felt likely that no one had gotten the tag because it wasn’t attached to any branch; even I didn’t have it locally as git fetch didn’t download it.

Thus the latter option above was chosen even though it meant that the two entries in the ChangeLog would have wrong commit IDs. This also explains why the timestamp of the signature of the final v5.4.4 tag is newer than any of the release tarballs. Requiring Jia to recreate the tarballs didn’t feel worth it.

5.4.5 tar metadata

The command make dist is the standard way to create a distribution tarball when using Autotools (specifically Automake). This uses the current UID, GID, and permissions of the files. I don’t like this. XZ Utils has its own make mydist which, among a few other things, sets the environment variable TAR_OPTIONS from which GNU tar will read extra command line options:

TAR_OPTIONS='--owner=0 --group=0 --numeric-owner --mode=u+rw,go+r-w'

These options clean up the UID, GID, and permissions.

The 5.4.5 tarballs have wrong metadata: UID and GID aren’t zero and file permissions aren’t consistently 0644 or 0755. I noticed this before the release was made official. The simplest explanation would have been that the tarball was created with make dist instead of make mydist. However, Jia didn’t admit that it was such a simple error even though it would have been the easiest way to make me stop asking further questions. But because the error itself was harmless and the tarball had nothing else wrong when I compared to my locally-created one using diff -Narup, I allowed the release to go forward.
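Metadata problems of this kind can be detected mechanically. The following Python sketch uses the standard tarfile module to list members whose UID or GID is nonzero or whose permissions are neither 0644 nor 0755. The function name is hypothetical, and limiting the permission check to regular files and directories is an assumption of this sketch.

```python
import tarfile


def metadata_problems(tar: tarfile.TarFile) -> list[str]:
    """List members whose owner or permissions differ from the cleaned-up form."""
    problems = []
    for member in tar.getmembers():
        if member.uid != 0 or member.gid != 0:
            problems.append(f"{member.name}: uid={member.uid} gid={member.gid}")
        # Only regular files and directories get a permission check here;
        # symlinks and other member types are skipped.
        if (member.isfile() or member.isdir()) and member.mode not in (0o644, 0o755):
            problems.append(f"{member.name}: mode={member.mode:04o}")
    return problems
```

An empty list is what a tarball created with the TAR_OPTIONS shown earlier would be expected to produce, assuming the source files have no unusual permission bits to begin with.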