The .xz file format
The .xz file format is a container format for compressed streams. It has no archiving capabilities: like the .gz and .bz2 file formats used by gzip and bzip2, respectively, a .xz file can hold only a single file.
Compared to a few other popular stream compression formats, the .xz format provides a few advanced features. At the same time, it has been kept simple enough to be usable in many embedded systems.
Features
- Streamable: It is always possible to create and decompress .xz files in a pipe; no seeking is required.
- Random-access reading: The data can be split into independently compressed blocks. Every .xz file contains an index of the blocks, which makes limited random-access reading possible when the block size is small enough.
- Multiple filters (compression methods): It is possible to add support for new filters, so no new file format is needed every time a new algorithm has been developed. Developers can use a developer-specific filter ID space for experimental filters.
- Filter chaining: Up to four filters can be chained, which is very similar to piping on the UN*X command line. Chaining can improve compression ratio with some file types. Different filter chains can be used for every independently compressed block.
- Integrity checks: Metadata is always protected with CRC32. The integrity of the actual data may be verified with CRC32, CRC64, SHA-256, or the check may be omitted completely. It is possible to add new integrity checks in future, but there is no possibility for developer-specific check IDs like there is for filter IDs.
- Concatenation: Just like with .gz and .bz2 files, .xz files can be concatenated. A decompressor can handle a concatenated file as if it were a regular single-stream .xz file.
- Padding: Binary zeros may be appended to .xz files to pad them to fill e.g. a block on a backup tape. Padding must be a multiple of four bytes because the size of a valid .xz file is always a multiple of four bytes.
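Several of the features above can be tried out with the lzma module from the Python standard library, which wraps liblzma. The following is only a minimal sketch: the sample data, the variable names, and the choice of a Delta filter with a distance of four bytes are illustrative assumptions, not something prescribed by the .xz format.

```python
import lzma
import struct

# Hypothetical sample data: 10,000 little-endian 32-bit counters.
data = b"".join(struct.pack("<I", i) for i in range(10_000))

# Streamable: feed the encoder piece by piece, as one would in a pipe.
comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)
pieces = [comp.compress(data[i:i + 4096]) for i in range(0, len(data), 4096)]
pieces.append(comp.flush())
streamed = b"".join(pieces)

# Filter chaining: a Delta filter followed by LZMA2 (two filters in the chain).
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
chained = lzma.compress(data, format=lzma.FORMAT_XZ, filters=chain)
plain = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)

# Integrity checks: the check type is chosen when the file is created.
checks = [lzma.CHECK_NONE, lzma.CHECK_CRC32, lzma.CHECK_CRC64, lzma.CHECK_SHA256]
sizes = {
    check: len(lzma.compress(data, format=lzma.FORMAT_XZ, check=check))
    for check in checks
    if lzma.is_check_supported(check)  # depends on how liblzma was built
}

# Concatenation: two complete .xz streams decode as one.
joined = lzma.compress(b"first ") + lzma.compress(b"second")

print("delta + LZMA2:", len(chained), "bytes; LZMA2 only:", len(plain), "bytes")
print("file size by check ID:", sizes)
print("concatenated:", lzma.decompress(joined))
```

The decompressor does not need to be told which filter chain or check was used, because both are recorded in the file itself. Every size in the output should be a multiple of four bytes, which is why padding must be one as well.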
Once a new filter or integrity check has been added to the .xz file format specification, it won't be removed. This ensures that all .xz files that use only the filters defined in the .xz file format specification can always be decompressed in future.
New filters, integrity checks, or other additions to the .xz file format are unlikely to occur very often. New filters are only added to the official list when they are clearly useful.
Currently, the only compression algorithm supported by the .xz file format is LZMA2. The other existing filters do not change the size of the data, but instead can increase the redundancy of the stream. LZMA2 is a container format that improves upon LZMA in the following ways:
- Data chunks can be stored uncompressed. This improves compression on streams with incompressible parts and limits the possible inflation of the stream when none of the data is compressible.
- The dictionary can be reset between data chunks, which allows multithreaded encoding and decoding.
- The LZMA properties can be reset between data chunks to match different types of data being compressed.
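The effect of uncompressed chunks can be observed by compressing data that cannot be compressed at all. The sketch below uses Python's standard lzma module and assumes that the underlying liblzma encoder falls back to uncompressed LZMA2 chunks for such input; the one-mebibyte random payload is an arbitrary choice for illustration.

```python
import lzma
import os

# Random bytes are effectively incompressible.
payload = os.urandom(1 << 20)

packed = lzma.compress(payload, format=lzma.FORMAT_XZ)
overhead = len(packed) - len(payload)

print(f"{len(payload)} bytes in, {len(packed)} bytes out, "
      f"overhead {overhead} bytes")
```

The output should be only slightly larger than the input: the overhead consists of the .xz container fields plus a small header per LZMA2 chunk, rather than growing in proportion to the data.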