The .xz file format
The .xz file format is a container format for compressed streams. It has no archiving capabilities: like the .gz and .bz2 file formats used by gzip and bzip2, respectively, the .xz format can hold only a single file.
Compared to a few other popular stream compression formats, the .xz format provides a couple of advanced features. At the same time, it has been kept simple enough to be usable in many embedded systems. Here is a summary of the features:
- Streamable: It is always possible to create and decompress .xz files in a pipe; no seeking is required.
- Random-access reading: The data can be split into independently compressed blocks. Every .xz file contains an index of the blocks, which makes limited random-access reading possible when the block size is small enough.
- Multiple filters (algorithms): It is possible to add support for new filters, so no new file format is needed every time a new algorithm has been developed. Developers can use a developer-specific filter ID space for experimental filters.
- Filter chaining: Up to four filters can be chained, which is very similar to piping on the UN*X command line. Chaining can improve the compression ratio with some file types. A different filter chain can be used for every independently compressed block.
- Integrity checks: The integrity of all headers is always protected with CRC32. The integrity of the actual data may be verified with CRC32, CRC64, or SHA-256, or the check may be omitted completely. It is possible to add new integrity checks in the future, but there is no possibility for developer-specific check IDs like there is for filter IDs.
- Concatenation: Just like with .gz and .bz2 files, it is possible to concatenate .xz files as is. The decompressor can decompress a concatenated file as if it were a regular single-stream .xz file.
- Padding: Binary zeros may be appended to .xz files to pad them to fill e.g. a block on a backup tape. The padding needs to be a multiple of four bytes, because the size of every valid .xz file is a multiple of four bytes.
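Several of the features above can be tried out with a short sketch. It assumes Python's standard lzma module (a binding to liblzma) as the implementation; the sample data, the variable names, and the choice of a delta filter with distance 1 are made up for illustration and are not recommendations from the format itself.

```python
import lzma

data_a = b"first file contents\n" * 100
data_b = bytes(range(256)) * 16  # slowly changing bytes suit a delta filter

# Streamable: feed the compressor small chunks, as a pipe would deliver them.
compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32)
chunks = [compressor.compress(data_a[i:i + 256]) for i in range(0, len(data_a), 256)]
chunks.append(compressor.flush())
xz_a = b"".join(chunks)

# Filter chaining: a delta filter feeding LZMA2 (two of the up-to-four slots).
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 1},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
xz_b = lzma.compress(data_b, format=lzma.FORMAT_XZ,
                     check=lzma.CHECK_CRC64, filters=chain)

# Integrity checks: the decoder reports which check the stream carries.
decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
restored_b = decoder.decompress(xz_b)
used_check = decoder.check

# Concatenation: two .xz files joined as is decode like a single file.
combined = lzma.decompress(xz_a + xz_b)

# Every valid .xz file has a size that is a multiple of four bytes.
sizes_ok = len(xz_a) % 4 == 0 and len(xz_b) % 4 == 0
```

The two files deliberately use different integrity checks and filter chains to show that concatenated streams do not have to share settings. Handling of trailing zero padding is left out, since it depends on how a given decoder is invoked.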
Once a new filter or integrity check has been added to the .xz file format specification, it won't be removed. This ensures that all .xz files that use only the filters defined in the .xz file format specification can always be decompressed in the future.
New filters, integrity checks, or other additions to the .xz file format are unlikely to occur very often. Adding new filters to the official list only when they are clearly useful avoids useless bloat.
The official .xz file format specification
The latest version of the official .xz file format specification is available in plain text form at http://tukaani.org/xz/xz-file-format.txt.
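As a small illustration of working from the specification, the sketch below checks the Stream Header of a .xz file by hand. It assumes the header layout as described in the specification: six magic bytes, two Stream Flags bytes, and a CRC32 of the Stream Flags stored in little-endian order, with the check ID in the low four bits of the second flags byte. The function name read_stream_header is hypothetical, and the layout should be verified against the specification before relying on it.

```python
import lzma
import zlib

XZ_MAGIC = b"\xfd7zXZ\x00"  # FD 37 7A 58 5A 00


def read_stream_header(blob: bytes) -> int:
    """Validate the 12-byte Stream Header and return the integrity check ID."""
    if len(blob) < 12 or blob[:6] != XZ_MAGIC:
        raise ValueError("not an .xz stream")
    flags = blob[6:8]
    stored_crc = int.from_bytes(blob[8:12], "little")
    # The header is protected by a CRC32 computed over the Stream Flags.
    if zlib.crc32(flags) != stored_crc:
        raise ValueError("Stream Header CRC32 mismatch")
    return flags[1] & 0x0F


xz = lzma.compress(b"example", format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32)
check_id = read_stream_header(xz)
```

Here the returned ID can be compared with the check constants of Python's lzma module, which are assumed to mirror the IDs used in the file format.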
Specific versions of the specification:
At least the following software supports the .xz file format: