blog.awill.me

24 Apr 2010

A new OSS compression format: XZ

XZ compression has been around for a little while, but it has only recently begun to gain traction in Linux distributions.

It has been adopted as the default package compression format in both Slackware and Arch Linux.

It’s basically an evolution of LZMA, the popular compression algorithm used by 7-Zip.
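One way to see the family resemblance is Python's standard `lzma` module, which wraps liblzma from XZ Utils and can write both the older `.lzma` format and the newer `.xz` container (whose default filter is LZMA2). This is only an illustrative sketch, not how the distributions build their packages, and the sample data is made up.

```python
import lzma

# Made-up sample input for the demo
data = b"XZ wraps LZMA2 in a new container. " * 1000

# Newer .xz container; its default filter is LZMA2
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)

# Older .lzma container from LZMA Utils: raw LZMA plus a small header
lzma_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

# Every .xz stream starts with the magic bytes FD 37 7A 58 5A 00
print(xz_blob[:6] == b"\xfd7zXZ\x00")  # True

# lzma.decompress defaults to FORMAT_AUTO, which recognises both containers
assert lzma.decompress(xz_blob) == data
assert lzma.decompress(lzma_blob) == data
```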

As of GNU tar 1.22, the short option -J is reassigned as a shortcut for xz, so instead of the usual tar czvf you replace the z (for gzip) with a J: tar cJvf.
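In a shell that means `tar cJvf archive.tar.xz dir/` to create an archive and `tar xJvf archive.tar.xz` to extract one. For scripting, Python's `tarfile` module exposes the same choice through its `"w:xz"` and `"r:xz"` modes. The sketch below is a rough equivalent, not a reimplementation of tar; the `demo` directory and `hello.txt` file are hypothetical.

```python
from __future__ import annotations

import tarfile
import tempfile
from pathlib import Path


def make_tar_xz(src_dir: Path, archive: Path) -> None:
    """Roughly what `tar cJf archive.tar.xz src_dir` does."""
    # "w:xz" selects xz compression, much like -J does for GNU tar
    with tarfile.open(archive, "w:xz") as tar:
        tar.add(src_dir, arcname=src_dir.name)


def list_tar_xz(archive: Path) -> list[str]:
    """Member names, roughly what `tar tJf archive.tar.xz` prints."""
    with tarfile.open(archive, "r:xz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo"  # hypothetical directory to archive
    src.mkdir()
    (src / "hello.txt").write_text("hello, xz\n")
    archive = Path(tmp) / "demo.tar.xz"
    make_tar_xz(src, archive)
    names = list_tar_xz(archive)
    print(names)  # ['demo', 'demo/hello.txt']
```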

Here are a few benchmarks I lifted from the Arch Linux mailing list.

XZ lets you choose the compression level, from 1 to 9 (1 being the least compression, 9 the most, and 6 the default).

Did some testing with openoffice-base 3.2.0-1-x86_64.tar:

Compression time:

gzip: 0m28.945s
bzip2: 1m21.876s
xz -1: 0m49.244s
xz -2: 1m18.444s
xz -3: 3m34.208s
xz -6: 4m41.148s

Decompression time:

gzip: 0m5.772s
bzip2: 0m29.433s
xz -1: 0m13.983s
xz -2: 0m12.949s
xz -3: 0m12.706s
xz -6: 0m11.462s
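To rerun a comparison like this on your own files, here's a rough sketch using Python's standard `gzip`, `bz2` and `lzma` modules. It's an illustration only: the numbers above came from the command-line tools, in-memory timings in Python won't match them exactly, and the synthetic input is an assumption. The gzip command line defaults to level 6 while Python's `gzip.compress` defaults to 9, so the level is set explicitly.

```python
from __future__ import annotations

import bz2
import gzip
import lzma
import time


def timed(fn, data):
    """Run fn(data) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(data)
    return result, time.perf_counter() - start


# (label, compress, decompress); lzma.PRESET_DEFAULT is 6, like the xz CLI
CODECS = [
    ("gzip", lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    ("bzip2", lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
] + [
    (f"xz -{lvl}", lambda d, lvl=lvl: lzma.compress(d, preset=lvl), lzma.decompress)
    for lvl in (1, 2, 3, 6)  # the levels benchmarked above
]


def benchmark(data: bytes) -> list[tuple[str, int, float, float]]:
    """Return (label, compressed size, compress s, decompress s) per codec."""
    rows = []
    for label, compress, decompress in CODECS:
        blob, t_comp = timed(compress, data)
        restored, t_decomp = timed(decompress, blob)
        assert restored == data  # sanity check: lossless round trip
        rows.append((label, len(blob), t_comp, t_decomp))
    return rows


if __name__ == "__main__":
    # Synthetic stand-in; for real numbers use open("some.tar", "rb").read()
    sample = b"".join(b"line %d of some fairly repetitive text\n" % i for i in range(20_000))
    for label, size, t_comp, t_decomp in benchmark(sample):
        print(f"{label:6} {size:>10} bytes  compress {t_comp:.3f}s  decompress {t_decomp:.3f}s")
```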

Interesting, right? Obviously, the more you compress a file, the longer compression takes, but the interesting part is decompression: it gets faster at higher compression levels! With xz -6 you only need to read and process 124MB, whereas with xz -1 you have to read 150MB. The decompression algorithm is the same at every level; the only differences are the archive size and the dictionary size. The downside is that the higher the level, the bigger the dictionary and the more memory you’ll need to decompress.
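The memory point is easy to demonstrate with Python's `lzma` module, which lets a decompressor refuse streams that would need more than a given amount of memory. In current XZ Utils, -1 uses a 1 MiB dictionary and -6 an 8 MiB one (per the xz man page), so with a hypothetical 4 MiB cap the same decoder unpacks the first and rejects the second. A sketch, not a benchmark:

```python
import lzma

# Small made-up input: the dictionary size comes from the preset, not the data
data = bytes(range(256)) * 64


def decompress_with_limit(blob: bytes, memlimit: int) -> bytes:
    """Decompress an .xz blob, failing if the decoder would need more than memlimit bytes."""
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    return decomp.decompress(blob)


LIMIT = 4 * 1024 * 1024  # hypothetical 4 MiB cap, between the two dictionary sizes

low = lzma.compress(data, preset=1)   # 1 MiB dictionary
high = lzma.compress(data, preset=6)  # 8 MiB dictionary (the default level)

# Same decoder and same call for both; only the memory requirement differs
print(decompress_with_limit(low, LIMIT) == data)  # True
try:
    decompress_with_limit(high, LIMIT)
except lzma.LZMAError as exc:
    print("xz -6 stream rejected under a 4 MiB limit:", exc)
```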

Here are some more benchmarks comparing file sizes, using the default xz -6 (also lifted from the Arch Linux forums):

The kernel compressed extremely well, to about 26% of its original size (22M from 86M), versus 35% for gzip. It may not be worth it for many applications, though, as it takes over 3x as long to compress.

86M  kernel26-2.6.29.3-1-i686.pkg.tar
30M  kernel26-2.6.29.3-1-i686.pkg.tar.gz
22M  kernel26-2.6.29.3-1-i686.pkg.tar.xz

287M  wesnoth-1.6.1-1-i686.pkg.tar
220M  wesnoth-1.6.1-1-i686.pkg.tar.gz
202M  wesnoth-1.6.1-1-i686.pkg.tar.xz
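The percentages quoted above are just these sizes divided by the uncompressed tarball. Since the listing rounds to whole megabytes, treat the results as approximate. A quick sketch of the arithmetic:

```python
# Sizes in MB, copied from the listing above (rounded to whole MB)
packages = {
    "kernel26-2.6.29.3-1-i686": {"tar": 86, "gz": 30, "xz": 22},
    "wesnoth-1.6.1-1-i686": {"tar": 287, "gz": 220, "xz": 202},
}


def percent_of_original(compressed: float, original: float) -> float:
    """Compressed size as a percentage of the uncompressed tarball."""
    return 100 * compressed / original


for name, s in packages.items():
    gz = percent_of_original(s["gz"], s["tar"])
    xz = percent_of_original(s["xz"], s["tar"])
    print(f"{name}: gzip {gz:.1f}%, xz {xz:.1f}%")

# kernel26-2.6.29.3-1-i686: gzip 34.9%, xz 25.6%
# wesnoth-1.6.1-1-i686: gzip 76.7%, xz 70.4%
```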

Conclusion:

In its default configuration, xz takes 3-4 times longer than gzip to compress, in exchange for an extra 10-15% of compression. It also decompresses at about half gzip's speed.

So for the average user, this won’t be of huge interest. If you want the best combination of size and speed, gzip is still king.

The real benefit here is for website mirrors and people who value size more than speed. Imagine hosting a file for millions of people to download (say, as a mirror for Firefox, the kernel or OpenOffice.org): shaving 10% off your entire bandwidth can be huge. For this reason, Arch Linux and Slackware have switched their repositories to xz. If you use bzip2, it’s certainly worth switching! For the average user, however, it probably is not worth it.
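To put a number on the mirror argument, here's the arithmetic using the package sizes listed above and a purely hypothetical one million downloads (not a figure from any real mirror):

```python
def bandwidth_saved_mb(gz_mb: float, xz_mb: float, downloads: int) -> float:
    """Total MB not transferred when every download fetches the .xz instead of the .gz."""
    return (gz_mb - xz_mb) * downloads


DOWNLOADS = 1_000_000  # hypothetical download count

for name, gz, xz in [("kernel26", 30, 22), ("wesnoth", 220, 202)]:
    saved = bandwidth_saved_mb(gz, xz, DOWNLOADS)
    print(f"{name}: {saved:,.0f} MB saved ({(gz - xz) / gz:.0%} less per download)")

# kernel26: 8,000,000 MB saved (27% less per download)
# wesnoth: 18,000,000 MB saved (8% less per download)
```

The per-download saving depends heavily on the content: the kernel package shrinks by over a quarter relative to gzip, while the mostly pre-compressed game data in wesnoth gains under 10%.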
