How the test files were selected

I was especially interested in how well LZMA compression would fit in

  • binary package management of GNU/*/Linux distributions

  • distributing source code of free software

In both uses the files are compressed on one computer and decompressed many times by users around the world. In practice the most important factors are:

  • compressed size (faster to download; more packages fit into one CD or DVD)

  • time required in decompression (fast installation is nice)

  • memory requirements for decompression (should the user want to use the file on e.g. an old i486 with 8 MB RAM)

  • common format that everyone knows how to uncompress/install

Less important:

  • time spent on compression in the package build process; compiling software usually takes several minutes or even hours, so spending one or two minutes to compress the package tightly increases the build time only a little.

  • memory requirements for the compressing process; few people build packages on i486 or i586 class machines having 16 to 64 megs of RAM. However, no one wants to use a tool that needs hundreds of megabytes or even gigabytes of RAM to achieve good results.
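The trade-off between download size and decompression time can be put into numbers. The following is a minimal sketch, not part of the original measurements: the link speed (1 Mbit/s) is a pure assumption and the function name is made up for illustration. The sizes and decompression times are the OpenOffice.org tarball figures from the tables later in this article.

```python
def per_user_seconds(compressed_bytes, decompress_seconds, bytes_per_second):
    """Download time plus decompression time for a single user."""
    return compressed_bytes / bytes_per_second + decompress_seconds

# Hypothetical link speed: 1 Mbit/s = 125000 bytes/s (an assumption).
LINK = 125_000

# OpenOffice.org tarball: compressed size in bytes and decompression time
# in seconds, taken from the tables in this article.
gzip_9 = per_user_seconds(78_768_334, 3.1, LINK)
lzmash_7 = per_user_seconds(55_535_273, 10.1, LINK)

print(f"gzip -9:   {gzip_9:.0f} s per user")
print(f"lzmash -7: {lzmash_7:.0f} s per user")
```

Under this assumed link speed the smaller file wins clearly, even though it decompresses more slowly; on a much faster link the difference shrinks.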

Despite the many common factors, the contents of binary packages and source tarballs are quite different. Binary packages primarily contain executables and libraries while source tarballs contain mostly ASCII text of some programming language. Naturally both contain data files used by the program and (hopefully) some documentation.

Test conditions

Tests were run on a laptop:

  • AMD mobile Athlon XP2400+

  • 512 MB RAM

  • Linux 2.6.12-rc4 (preempt, 4k stacks, regparm)

  • gzip 1.3.3, bzip2 1.0.3, LZMA SDK 4.17

bzip2 has two compression modes, one for normal use and another designed for small memory footprint (which can be invoked with bzip2 --small). Only the normal mode was tested because it’s faster.

Times are from the output of the command time (line user) and rounded. Because of this, the compression and decompression time and speed tables should be taken as suggestive and not as the absolute truth. In practice, the bigger test files should be more reliable in terms of speed comparison.
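The user line of time reports the CPU seconds a process spent in user mode. A minimal sketch of the same kind of measurement, assuming a POSIX system and Python's os.times() (the function name and the example file name are made up for illustration; this is not the script used for the tables):

```python
import os
import subprocess
import sys

def user_cpu_seconds(cmd):
    """Run cmd and return the user-mode CPU seconds used by the child."""
    before = os.times().children_user
    # Discard stdout so that e.g. "gzip -c" output is not printed.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    after = os.times().children_user
    return after - before

# Times a trivial child process here; a real run might use something like
# ["gzip", "-9", "-c", "testfile.tar"], where testfile.tar is a placeholder.
elapsed = user_cpu_seconds([sys.executable, "-c", "pass"])
print(f"{elapsed:.1f}s")
```

With short runs the result is dominated by rounding and timer granularity, which is why the bigger test files give more trustworthy numbers.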

When reading the tables, it is important to keep in mind which settings are the default in each program:

  • gzip -6 (best speed/filesize ratio)

  • bzip2 -9 (best compression ratio)

  • lzmash -7 (excellent compression ratio and reasonable memory requirements)

  • lzmash -e (the extreme mode) is only for reference in case someone wants to see how it affects the compression ratio.

Added in 2024:

lzmash was a wrapper script that provided a gzip-like interface for the not-gzip-like lzma tool from LZMA SDK. lzmash preceded the gzip-like lzma tool written in C++.
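For readers who want to repeat a comparison of this kind without lzmash, here is a minimal sketch. It assumes Python's standard gzip, bz2 and lzma modules as stand-ins for the tools tested here. Those modules wrap zlib, libbzip2 and liblzma, so the level numbers do not map exactly to the lzmash settings and the results will not match the tables. The function name and the sample data are made up for illustration.

```python
import bz2
import gzip
import lzma

def compressed_sizes(data, level):
    """Return the compressed size in bytes per compressor for one level."""
    return {
        "gzip": len(gzip.compress(data, compresslevel=level)),
        "bzip2": len(bz2.compress(data, compresslevel=level)),
        # FORMAT_ALONE selects the legacy .lzma container.
        "lzma": len(lzma.compress(data, format=lzma.FORMAT_ALONE,
                                  preset=level)),
        # PRESET_EXTREME is liblzma's "extreme" flag, OR-ed into the preset.
        "lzma -e": len(lzma.compress(data, format=lzma.FORMAT_ALONE,
                                     preset=level | lzma.PRESET_EXTREME)),
    }

# Synthetic stand-in data; real tests should use real tarballs.
sample = b"int main(void) { return 0; }\n" * 2000

for level in (1, 3, 6):
    print(level, compressed_sizes(sample, level))
```

Highly repetitive sample data like this compresses far better than real packages, so only the mechanics carry over, not the numbers.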

The tables of the test results

Note: The first column with numbers 1…9 indicates the compression setting passed to gzip, bzip2 and lzmash (e.g. gzip -9).

Tarball made from a full installation of OpenOffice.org 1.1.4 for Linux

Uncompressed size: 212664320 bytes (203 MB)

Compressed file size in bytes
        gzip            bzip2           lzmash          lzmash -e
1       86322815        76147880        67456213        -
2       84858575        74320824        62085798        -
3       83561997        73467586        59547691        59278372
4       81312776        73044026        58245872        57964166
5       79798262        72762041        56694215        56411631
6       79179298        72540199        56182079        55859514
7       78995264        72512833        55535273        55269226
8       78816280        72314472        54678948        54405078
9       78768334        72223858        54068819        53769958

Compressed size / Uncompressed size * 100%
        gzip            bzip2           lzmash          lzmash -e
1       40.6%           35.8%           31.7%           -
2       39.9%           34.9%           29.2%           -
3       39.3%           34.5%           28.0%           27.9%
4       38.2%           34.3%           27.4%           27.3%
5       37.5%           34.2%           26.7%           26.5%
6       37.2%           34.1%           26.4%           26.3%
7       37.1%           34.1%           26.1%           26.0%
8       37.1%           34.0%           25.7%           25.6%
9       37.0%           34.0%           25.4%           25.3%
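The percentages follow directly from the byte counts. A small sketch of the arithmetic, using figures from the OpenOffice.org tables (the function names are made up for illustration):

```python
UNCOMPRESSED = 212_664_320  # OpenOffice.org tarball, bytes

def ratio_percent(compressed_bytes, uncompressed_bytes=UNCOMPRESSED):
    """Compressed size as a percentage of the uncompressed size."""
    return round(compressed_bytes / uncompressed_bytes * 100, 1)

def savings_percent(smaller_bytes, larger_bytes):
    """How much smaller one compressed file is than another, in percent."""
    return round((1 - smaller_bytes / larger_bytes) * 100, 1)

print(ratio_percent(86_322_815))                # gzip -1    -> 40.6
print(ratio_percent(54_068_819))                # lzmash -9  -> 25.4
print(savings_percent(54_068_819, 78_768_334))  # lzmash -9 vs gzip -9 -> 31.4
```

The last line shows the relative view: for this file, lzmash -9 output is about 31% smaller than gzip -9 output.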

Compression time
        gzip            bzip2           lzmash          lzmash -e
1        11.5s          1m 26s           0m 58s         -
2        12.0s          1m 40s           2m  7s         -
3        13.7s          1m 54s           4m 58s          7m 37s
4        15.1s          2m  5s           5m 26s          8m  2s
5        18.4s          2m 11s           6m 47s         11m 18s
6        24.5s          2m 18s           7m 30s         12m  4s
7        29.4s          2m 25s           8m 24s         12m 59s
8        45.5s          2m 32s          10m 59s         20m 17s
9        66.9s          2m 37s          12m 20s         21m 56s

Decompression time
        gzip            bzip2           lzmash          lzmash -e
1       3.3s            16.5s           11.3s           -
2       3.3s            24.2s           10.5s           -
3       3.3s            29.2s           10.5s           10.4s
4       3.3s            32.1s           10.4s           10.3s
5       3.2s            34.2s           10.2s           10.2s
6       3.2s            35.4s           10.2s           10.1s
7       3.2s            36.5s           10.1s           10.0s
8       3.2s            37.5s           10.0s            9.9s
9       3.1s            38.2s           10.0s            9.9s

Compression speed, MB/s of uncompressed data (1 MB = 1024 * 1024 bytes)
        gzip            bzip2           lzmash          lzmash -e
1       18              2.4             3.5             -
2       17              2.0             1.6             -
3       15              1.8             0.68            0.44
4       13              1.6             0.62            0.42
5       11              1.5             0.50            0.30
6        8.3            1.5             0.45            0.28
7        6.9            1.4             0.40            0.26
8        4.5            1.3             0.31            0.17
9        3.0            1.3             0.27            0.15
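The speed figures are derived from the uncompressed size and the measured times. A minimal sketch of that derivation, assuming the time notation used in the tables ("66.9s", "12m 20s"); the function names are made up for illustration:

```python
import re

MIB = 1024 * 1024

def parse_seconds(text):
    """Convert strings like '66.9s' or '12m 20s' to seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)m\s*)?([\d.]+)s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))

def speed_mib_per_s(uncompressed_bytes, time_text):
    """MB/s of uncompressed data, with 1 MB = 1024 * 1024 bytes."""
    return uncompressed_bytes / MIB / parse_seconds(time_text)

size = 212_664_320  # OpenOffice.org tarball, bytes
print(round(speed_mib_per_s(size, "11.5s")))       # gzip -1   -> 18
print(round(speed_mib_per_s(size, "12m 20s"), 2))  # lzmash -9 -> 0.27
```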

Decompression speed, MB/s of uncompressed data (1 MB = 1024 * 1024 bytes)
        gzip            bzip2           lzmash          lzmash -e
1       61              12              18              -
2       61               8.4            19              -
3       61               6.9            19              20
4       61               6.3            20              20
5       63               5.9            20              20
6       63               5.7            20              20
7       63               5.6            20              20
8       63               5.4            20              20
9       65               5.3            20              20

The Linux kernel 2.6.11.0 source tarball

Uncompressed size: 208250880 bytes (199 MB)

Compressed file size in bytes
        gzip            bzip2           lzmash          lzmash -e
1       57860603        43873922        43933138        -
2       55274813        41108704        38871392        -
3       53416918        39791569        34863499        34823465
4       49695438        39040694        33545762        33513509
5       47775348        38395197        32481024        32445716
6       47004031        37975094        31686173        31661947
7       46797152        37676593        30881464        30841602
8       46578138        37365408        30295730        30261027
9       46578138        37075679        29809336        29780803

Compressed size / Uncompressed size * 100%
        gzip            bzip2           lzmash          lzmash -e
1       27.8%           21.1%           21.1%           -
2       26.5%           19.7%           18.7%           -
3       25.7%           19.1%           16.7%           16.7%
4       23.9%           18.7%           16.1%           16.1%
5       22.9%           18.4%           15.6%           15.6%
6       22.6%           18.2%           15.2%           15.2%
7       22.5%           18.1%           14.8%           14.8%
8       22.4%           17.9%           14.5%           14.5%
9       22.4%           17.8%           14.3%           14.3%

Compression time
        gzip            bzip2           lzmash          lzmash -e
1        8.3s           1m  9s           0m 45s         -
2        8.7s           1m 22s           1m 45s         -
3        9.8s           1m 34s           5m 10s          8m 43s
4       11.1s           1m 45s           5m 43s          9m 41s
5       13.8s           1m 57s           7m 39s         14m 38s
6       17.8s           2m  2s           8m 23s         15m 32s
7       20.7s           2m 11s           9m 11s         16m 23s
8       29.7s           2m 21s          11m 34s         24m 47s
9       40.9s           2m 26s          12m 31s         25m 53s

Decompression time
        gzip            bzip2           lzmash          lzmash -e
1       2.8s            12.8s           7.7s            -
2       2.7s            19.4s           6.9s            -
3       2.6s            23.8s           6.4s            6.6s
4       2.5s            26.4s           6.3s            6.3s
5       2.5s            28.3s           6.3s            6.3s
6       2.4s            29.6s           6.2s            6.3s
7       2.4s            30.6s           6.2s            6.2s
8       2.4s            31.3s           6.1s            6.1s
9       2.4s            32.1s           6.1s            6.1s

Compression speed, MB/s of uncompressed data (1 MB = 1024 * 1024 bytes)
        gzip            bzip2           lzmash          lzmash -e
1       24              2.9             4.4             -
2       23              2.4             1.9             -
3       20              2.1             0.64            0.38
4       18              1.9             0.58            0.34
5       14              1.7             0.43            0.23
6       11              1.6             0.39            0.21
7        9.6            1.5             0.36            0.20
8        6.7            1.4             0.29            0.13
9        4.9            1.4             0.26            0.13

Decompression speed, MB/s of uncompressed data (1 MB = 1024 * 1024 bytes)
        gzip            bzip2           lzmash          lzmash -e
1       71              16              26              -
2       74              10              29              -
3       76               8.3            31              30
4       79               7.5            32              32
5       79               7.0            32              32
6       83               6.7            32              32
7       83               6.5            32              32
8       83               6.3            33              33
9       83               6.2            33              33

In this test bzip2 is a tough adversary to lzmash in the fast modes. lzmash -e makes files a few kB smaller at the expense of a much longer compression time.

XMMS 1.2.10 binary package

XMMS 1.2.10 binary package (xmms-1.2.10-i486-2.tgz) from Slackware 10.1. The file was first gunzipped, resulting in an uncompressed size of 5498880 bytes (5.2 MB).

Compressed file size in bytes
        gzip            bzip2           lzmash          lzmash -e
1       2160102         1803573         1431699         -
2       2112332         1611408         1140030         -
3       2072044         1539083         1034903         1038615
4       2031519         1487237         1004176         1007692
5       1992713         1464332          987189          988758
6       1979068         1433617          983305          983198
7       1973404         1431276          982125          983240
8       1972424         1414142          980836          983582
9       1970643         1385112          980836          983582

Compressed size / Uncompressed size * 100%
        gzip            bzip2           lzmash          lzmash -e
1       39.3%           32.8%           26.0%           -
2       38.4%           29.3%           20.7%           -
3       37.7%           28.0%           18.8%           18.9%
4       36.9%           27.0%           18.3%           18.3%
5       36.2%           26.6%           18.0%           18.0%
6       36.0%           26.1%           17.9%           17.9%
7       35.9%           26.0%           17.9%           17.9%
8       35.9%           25.7%           17.8%           17.9%
9       35.8%           25.2%           17.8%           17.9%

Compression time
        gzip            bzip2           lzmash          lzmash -e
1       0.3s            2.4s             1.4s           -
2       0.3s            2.9s             2.7s           -
3       0.4s            3.2s             6.2s            8.9s
4       0.4s            3.3s             6.6s            9.3s
5       0.5s            4.6s             8.2s           13.3s
6       0.7s            5.6s             8.5s           13.7s
7       0.8s            4.7s             8.6s           13.6s
8       1.1s            4.9s            10.5s           21.5s
9       1.8s            5.1s            10.5s           21.5s

Decompression time
        gzip            bzip2           lzmash          lzmash -e
1       0.1s            0.4s            0.3s            -
2       0.1s            0.6s            0.2s            -
3       0.1s            0.7s            0.2s            0.2s
4       0.1s            0.8s            0.2s            0.2s
5       0.1s            0.9s            0.2s            0.2s
6       0.1s            0.9s            0.2s            0.2s
7       0.1s            0.9s            0.2s            0.2s
8       0.1s            1.0s            0.2s            0.2s
9       0.1s            1.0s            0.2s            0.2s

For some reason, bzip2 -6 took more time than even bzip2 -9. The result didn’t change when the test was repeated. The extreme mode of lzmash creates slightly bigger files at most settings; it seems that with smaller files lzmash -e makes compression both slower and less efficient. Speed tables are omitted because the small test file makes measuring the elapsed time with the time command too inaccurate.

XMMS 1.2.10 source tarball

Uncompressed size: 15964160 bytes (15.2 MB)

Compressed file size in bytes
        gzip            bzip2           lzmash          lzmash -e
1       4705710         3702465         3390291         -
2       4560441         3172615         2117511         -
3       4460478         2914692         1921894         1929077
4       4213705         2748562         1803104         1808532
5       4095300         2670185         1721301         1723689
6       4060060         2591439         1642013         1643645
7       4046707         2500735         1540827         1541735
8       4035433         2464688         1533283         1531514
9       4034855         2418265         1533283         1531514

Compressed size / Uncompressed size * 100%
        gzip            bzip2           lzmash          lzmash -e
1       29.5%           23.2%           21.2%           -
2       28.6%           19.9%           13.3%           -
3       27.9%           18.3%           12.0%           12.1%
4       26.4%           17.2%           11.3%           11.3%
5       25.7%           16.7%           10.8%           10.8%
6       25.4%           16.2%           10.3%           10.3%
7       25.3%           15.7%            9.7%            9.7%
8       25.3%           15.4%            9.6%            9.6%
9       25.3%           15.1%            9.6%            9.6%

Compression time
        gzip            bzip2           lzmash          lzmash -e
1       0.7s             6.1s            3.5s           -
2       0.7s             7.3s            6.0s           -
3       0.8s             8.5s           19.0s            30.8s
4       0.9s             9.9s           19.9s            31.2s
5       1.1s            11.2s           28.9s           1m  1s
6       1.4s            11.0s           30.1s           1m  2s
7       1.7s            12.5s           30.9s           1m  4s
8       2.5s            15.9s           41.7s           1m 56s
9       2.9s            17.5s           41.7s           1m 56s

Decompression time
        gzip            bzip2           lzmash          lzmash -e
1       0.2s            1.0s            0.6s            -
2       0.2s            1.5s            0.4s            -
3       0.2s            1.9s            0.4s            0.4s
4       0.2s            2.1s            0.4s            0.4s
5       0.2s            2.3s            0.4s            0.4s
6       0.2s            2.5s            0.4s            0.4s
7       0.2s            2.6s            0.4s            0.4s
8       0.2s            2.7s            0.4s            0.4s
9       0.2s            2.8s            0.4s            0.4s

For some reason, bzip2 -6 compressed a little faster than bzip2 -5, yet bzip2 -6 still created a smaller file. Speed tables are omitted because the small test file makes measuring the elapsed time with the time command too inaccurate.

Memory requirements

The memory requirements depend only on the compression mode used (-1 … -9). bzip2 also has a mode that uses less memory but is slower. This small memory mode hasn’t been tested.

RAM usage on compression
        gzip            bzip2           lzmash          lzmash -e
1       <1 MB           2 MB              2 MB           -
2       <1 MB           2 MB             12 MB           -
3       <1 MB           3 MB             12 MB           12 MB
4       <1 MB           4 MB             16 MB           16 MB
5       <1 MB           5 MB             26 MB           26 MB
6       <1 MB           5 MB             45 MB           45 MB
7       <1 MB           6 MB             83 MB           83 MB
8       <1 MB           7 MB            159 MB          159 MB
9       <1 MB           7 MB            311 MB          311 MB

RAM usage on decompression
        gzip            bzip2           lzmash          lzmash -e
1       <1 MB           1 MB             1 MB            -
2       <1 MB           2 MB             2 MB            -
3       <1 MB           2 MB             1 MB            1 MB
4       <1 MB           2 MB             2 MB            2 MB
5       <1 MB           3 MB             3 MB            3 MB
6       <1 MB           3 MB             5 MB            5 MB
7       <1 MB           3 MB             9 MB            9 MB
8       <1 MB           4 MB            17 MB           17 MB
9       <1 MB           4 MB            33 MB           33 MB

Conclusions

Compression

When very fast compression is needed, gzip is the clear winner. It also has a very small memory footprint, making it ideal for systems with limited memory.

bzip2 creates about 15% smaller files than gzip. bzip2 compresses somewhat slower than gzip, but it seems that this hasn’t prevented bzip2 from becoming popular. Nowadays most source code is available as both gzip and bzip2 compressed tar archives.

lzmash -3 and lzmash -4 seem to be almost equally fast (or slow); the same can be said for lzmash -5, lzmash -6 and lzmash -7. However, the memory requirements increase with every option, meaning that lzmash -3, lzmash -5 and lzmash -6 are usually useful only if you (or the recipient) do not have enough memory for lzmash -4 or lzmash -7.

lzmash -8 and lzmash -9 require lots of memory and are practical only on newer computers; the files compressed with them are probably a pain to decompress on systems with less than 32 MB or 64 MB of memory.

The extreme mode (lzmash -e) roughly doubles the compression time but, especially with small files, can lead to an even worse compression ratio than the normal mode. The extreme mode might be worth trying if you want to make files as small as possible, but in that case skipping the lzmash wrapper script and playing with the command line options of lzma directly can lead to better results.

Decompression

In terms of speed, gzip is the winner again. lzma comes right behind it, two to three times slower than gzip. bzip2 is a lot slower, usually taking two to six times more time than lzma, that is, four to twelve times more than gzip. One interesting thing is that gzip and lzma decompress faster the smaller the compressed file is, while bzip2 gets slower as the compression ratio gets better.

The memory usage of lzma stays competitive with bzip2 when files have been compressed with lzmash -6 or a lower setting. Files compressed with the default lzmash -7 can still be decompressed even on machines with only 16 MB of RAM, but sometimes you don’t have even that much memory available. If you compress with lzmash -8 or lzmash -9, you should consider whether users need to be able to decompress your files on “ancient” computers as well.
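Choosing a setting for a given target machine can be expressed as a simple lookup. A minimal sketch, using the lzmash column of the "RAM usage on decompression" table above; the function name is made up for illustration, and the budget means memory actually free for the decompressor, not the total RAM of the machine:

```python
# Decompression RAM in MB for lzmash -1 .. -9, copied from the table above.
LZMASH_DECOMPRESS_MB = {1: 1, 2: 2, 3: 1, 4: 2, 5: 3, 6: 5, 7: 9, 8: 17, 9: 33}

def highest_level_within(budget_mb):
    """Return the highest lzmash level whose decompression fits the budget,
    or None if no level fits."""
    fitting = [level for level, mb in LZMASH_DECOMPRESS_MB.items()
               if mb <= budget_mb]
    return max(fitting) if fitting else None

print(highest_level_within(8))   # -> 6
print(highest_level_within(16))  # -> 7
print(highest_level_within(64))  # -> 9
```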

So what is the best?

Of course, it depends on the intended application. gzip is very fast and has a small memory footprint. According to this benchmark, neither bzip2 nor lzma can compete with gzip in terms of speed or memory usage. bzip2 has a notably better compression ratio than gzip, which has to be the reason for the popularity of bzip2; it is slower than gzip, especially in decompression, and uses more memory. However, the memory requirements of bzip2 should nowadays be no problem even on older hardware.

Both gzip and bzip2 are bundled with practically all GNU/*/Linux distributions and *BSDs. Because everybody has the tools to handle gzip and bzip2 compressed files, they are by far the most commonly used formats to distribute e.g. source code of free software. However, the situation might change because better free (as in freedom) alternatives have become available.

LZMA clearly has the potential to become the third commonly used general purpose compression format on *NIX systems. It mainly competes with bzip2 by offering a significantly better compression ratio while still keeping decompression speed relatively close to that of gzip. Its excellence has already been seen in the Tukaani Linux package management system, and in software installers such as Nullsoft Scriptable Install System (NSIS), Inno Setup and the installers of the MS-Windows versions of Mozilla products, including Firefox and Thunderbird.