A new OSS compression: XZ

XZ compression has actually been out for a little while, but it has only recently begun to gain traction in Linux distributions.

It has been adopted by default in both Slackware and Arch Linux.

It’s basically an evolution of LZMA, the popular compression algorithm used in 7-Zip.

As of version 1.22 of GNU Tar, the short option -J is reassigned as a shortcut for --xz, meaning that instead of the usual tar czvf, you’d replace the z (for gzip) with J (for xz): tar cJvf.
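For scripted use, the same kind of archive can be produced without calling tar at all. The following is a minimal sketch using Python's standard tarfile module, where the "w:xz" mode plays roughly the role that -J plays on the command line. The helper names make_tar_xz and list_tar_xz are made up for this illustration.

```python
import io
import tarfile


def make_tar_xz(files: dict) -> bytes:
    """Build an in-memory .tar.xz archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    # Mode "w:xz" writes a tar stream wrapped in xz compression.
    with tarfile.open(fileobj=buffer, mode="w:xz") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_tar_xz(blob: bytes) -> list:
    """Return the member names stored in a .tar.xz archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:xz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    blob = make_tar_xz({"hello.txt": b"hello xz\n" * 1000})
    print(len(blob), "bytes:", list_tar_xz(blob))
```

Swapping "w:xz" for "w:gz" or "w:bz2" gives the gzip and bzip2 equivalents, which makes it easy to compare the three on your own data.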

Here are a few benchmarks I lifted from the Arch Linux mailing list.

Did some testing with openoffice-base 3.2.0-1-x86_64.tar:

XZ allows choosing the level of compression, from 0 to 9 (0 being the least amount of compression, 9 being the most, and 6 being the default).
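To get a feel for what the levels do on your own data, here is a small sketch using Python's standard lzma module, which exposes presets 0 through 9 through its preset argument. The sample input is synthetic and purely illustrative, so the sizes it prints say nothing about real packages.

```python
import lzma


def sizes_by_preset(data: bytes, presets=(0, 1, 3, 6, 9)) -> dict:
    """Return {preset: compressed size in bytes} for the given xz presets."""
    return {p: len(lzma.compress(data, preset=p)) for p in presets}


if __name__ == "__main__":
    # Hypothetical sample input; substitute a real file's bytes for a
    # meaningful comparison.
    sample = b"".join(
        b"line %d of some fairly repetitive text\n" % i for i in range(50_000)
    )
    for preset, size in sizes_by_preset(sample).items():
        print(f"xz -{preset}: {size} bytes ({size / len(sample):.1%} of original)")
```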

compression speed:

gzip: 0m28.945s

bzip2: 1m21.876s

xz -1: 0m49.244s

xz -2: 1m18.444s

xz -3: 3m34.208s

xz -6: 4m41.148s

decompression speed:

gzip: 0m5.772s

bzip2: 0m29.433s

xz -1: 0m13.983s

xz -2: 0m12.949s

xz -3: 0m12.706s

xz -6: 0m11.462s

Interesting, right? Obviously, the more you compress a file, the longer it takes, but the interesting part is the decompression speed: decompression gets faster at higher compression ratios! With ‘xz -6’ you only need to read and process 124MB, whereas with ‘xz -1’ you have to read 150MB. The decompression algorithm is the same for both; the only differences are the archive size and the dictionary used. The downside is that the higher the ratio, the bigger the dictionary becomes, and the more memory you’ll need for decompression.
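The effect of the dictionary can be seen in isolation with a contrived experiment. This sketch uses Python's standard lzma module with an explicit LZMA2 filter chain; the 64 KiB and 4 MiB dictionary sizes and the helper name compress_with_dict are choices made for this illustration, not what any xz preset actually uses. The input is random data repeated once, so the only redundancy is a match 1 MiB back, which only the larger dictionary can reach.

```python
import lzma
import os


def compress_with_dict(data: bytes, dict_size: int) -> bytes:
    """Compress to .xz using LZMA2 with an explicit dictionary size (bytes)."""
    # "preset" supplies defaults for every option not set explicitly.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


if __name__ == "__main__":
    # 1 MiB of incompressible bytes, followed by an exact copy of itself.
    block = os.urandom(1 << 20)
    data = block + block

    small = compress_with_dict(data, 64 * 1024)        # 64 KiB dictionary
    large = compress_with_dict(data, 4 * 1024 * 1024)  # 4 MiB dictionary

    print(f"64 KiB dictionary: {len(small)} bytes")
    print(f"4 MiB dictionary:  {len(large)} bytes")
```

The larger dictionary should produce an archive roughly half the size here, at the cost of the decompressor having to keep a correspondingly larger window in memory.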

Here are some more benchmarks comparing file size (using the default ‘xz -6’), lifted from the Arch Linux forums.

The kernel compressed extremely well, to 27.5% of its original size, vs. gzip’s 35%. It may not be worth it for many applications, though, as it takes over 3x as long to compress.

86M kernel26-2.6.29.3-1-i686.pkg.tar

30M kernel26-2.6.29.3-1-i686.pkg.tar.gz

22M kernel26-2.6.29.3-1-i686.pkg.tar.xz

287M wesnoth-1.6.1-1-i686.pkg.tar

220M wesnoth-1.6.1-1-i686.pkg.tar.gz

202M wesnoth-1.6.1-1-i686.pkg.tar.xz
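The ratios can be recomputed from the sizes listed above with a few lines of Python. Note that the listing only gives sizes rounded to whole megabytes, so the results are approximate and will not exactly match percentages that were presumably computed from exact byte counts.

```python
# Sizes in MB exactly as shown in the listing (rounded figures).
SIZES_MB = {
    "kernel26": {"tar": 86, "gz": 30, "xz": 22},
    "wesnoth": {"tar": 287, "gz": 220, "xz": 202},
}


def percent_of_original(compressed: float, original: float) -> float:
    """Compressed size expressed as a percentage of the original size."""
    return 100.0 * compressed / original


if __name__ == "__main__":
    for name, sizes in SIZES_MB.items():
        gz = percent_of_original(sizes["gz"], sizes["tar"])
        xz = percent_of_original(sizes["xz"], sizes["tar"])
        print(f"{name}: gzip {gz:.1f}%, xz {xz:.1f}%")
```

With these rounded inputs the kernel comes out at about 34.9% for gzip and 25.6% for xz, and Wesnoth at about 76.7% and 70.4%. The Wesnoth package barely shrinks with either tool.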

conclusion:

xz (in its default configuration) takes 3-4 times longer than gzip to compress, for an extra 10-15% compression ratio. It also decompresses at about half the speed.
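For comparison, the mailing-list timings quoted earlier can be turned into ratios relative to gzip. This is just arithmetic on the numbers already shown; the to_seconds and relative_to_gzip helpers are names invented for this sketch.

```python
import re

# Timings copied from the openoffice-base test quoted earlier.
COMPRESS_TIMES = {
    "gzip": "0m28.945s", "bzip2": "1m21.876s", "xz -1": "0m49.244s",
    "xz -2": "1m18.444s", "xz -3": "3m34.208s", "xz -6": "4m41.148s",
}
DECOMPRESS_TIMES = {
    "gzip": "0m5.772s", "bzip2": "0m29.433s", "xz -1": "0m13.983s",
    "xz -2": "0m12.949s", "xz -3": "0m12.706s", "xz -6": "0m11.462s",
}


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style string such as '4m41.148s' to seconds."""
    match = re.fullmatch(r"(\d+)m\s*([\d.]+)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def relative_to_gzip(times: dict) -> dict:
    """Return each tool's time as a multiple of the gzip time."""
    base = to_seconds(times["gzip"])
    return {tool: to_seconds(stamp) / base for tool, stamp in times.items()}


if __name__ == "__main__":
    for label, times in (("compress", COMPRESS_TIMES),
                         ("decompress", DECOMPRESS_TIMES)):
        for tool, ratio in relative_to_gzip(times).items():
            print(f"{label:10s} {tool:6s} {ratio:4.1f}x gzip")
```

On that particular package, xz -6 comes out at roughly 9.7 times gzip's compression time and about 2 times its decompression time, so the compression penalty depends heavily on the input.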

So for the average user, this won’t be of huge interest. If you want the best combination of size and speed, gzip is still king.

The real benefit here is for website mirrors and people who value size more than speed. Imagine hosting a file for millions of people to download (like being a mirror for Firefox, the kernel, or OpenOffice.org): shaving 10% off your entire bandwidth can be huge. For this reason, Arch Linux and Slackware have switched their repositories to xz. If you use bzip2, it’s certainly worth switching! For the average user, however, it probably is not worth it.