bzz - Man Page

DjVu general purpose compression utility.

Synopsis

Encoding

bzz -e[blocksize] inputfile outputfile

Decoding

bzz -d inputfile outputfile

Description

The first form of the command line (option -e) compresses the data from file inputfile and writes the compressed data into outputfile. The second form of the command line (option -d) decompresses file inputfile and writes the output to outputfile.

Options

-d

Decoding mode.

-e[blocksize]

Encoding mode. The optional argument blocksize specifies the size, expressed in kilobytes, of the input file blocks processed by the Burrows-Wheeler transform. The default block size is 2048 KB. The maximal block size is 4096 KB. Specifying a larger block size usually produces higher compression ratios and increases the memory requirements of both the encoder and decoder. There is no benefit in specifying a block size that is larger than the input file.
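
The two command forms and the block size limit described above can be sketched as a pair of small helpers that only build argument lists. This is an illustrative sketch, not part of bzz: the function names bzz_encode_argv and bzz_decode_argv are hypothetical, and the lower bound of 1 KB is an assumption, since this page only states the 4096 KB maximum.

```python
MAX_BLOCK_KB = 4096  # maximal block size stated in this page


def bzz_encode_argv(inputfile, outputfile, blocksize_kb=None):
    """Build the argument list for 'bzz -e[blocksize] inputfile outputfile'."""
    if blocksize_kb is None:
        flag = "-e"  # let bzz use its default block size (2048 KB)
    else:
        # Lower bound of 1 is an assumption; only the maximum is documented.
        if not 1 <= blocksize_kb <= MAX_BLOCK_KB:
            raise ValueError("block size must be between 1 and 4096 KB")
        flag = f"-e{blocksize_kb}"  # the size is attached directly to -e
    return ["bzz", flag, inputfile, outputfile]


def bzz_decode_argv(inputfile, outputfile):
    """Build the argument list for 'bzz -d inputfile outputfile'."""
    return ["bzz", "-d", inputfile, outputfile]
```

On a system where bzz is installed, either list can be passed to subprocess.run(argv, check=True) from the Python standard library.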

Algorithms

The Burrows-Wheeler transform is performed using a combination of the Karp-Miller-Rosenberg and the Bentley-Sedgewick algorithms. This approach is comparable to that of Sadakane (DCC 98), with a slightly more flexible ranking scheme. Symbols are then ordered according to a running estimate of their occurrence frequencies. The symbol ranks are then coded using a simple fixed tree and the ZP binary adaptive coder (Bottou, DCC 98).

The Burrows-Wheeler transform is also used in the well-known compressor bzip2. The originality of bzz is the use of the ZP adaptive coder. The adaptation noise can cost up to 5 percent in file size, but this penalty is usually offset by the benefits of adaptation.
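
For readers unfamiliar with the Burrows-Wheeler transform, the following is a minimal sketch of the transform and its inverse on a single block. It sorts all rotations naively, which is far slower than the Karp-Miller-Rosenberg and Bentley-Sedgewick combination that bzz uses, and it omits the rank coding and the ZP coder entirely. The function names bwt and inverse_bwt are chosen here for illustration only.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform of one block.

    Returns the last column of the sorted rotation matrix and the row
    index at which the original block appears.
    """
    n = len(block)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation they denote.
    starts = sorted(range(n), key=lambda i: block[i:] + block[:i])
    # The last byte of the rotation starting at i is block[i - 1].
    last = bytes(block[(i - 1) % n] for i in starts)
    return last, starts.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Rebuild the original block from the last column and the row index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte value maps each row of the first
    # column to the matching position in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)
```

The transform tends to group identical bytes together, which is what makes the subsequent frequency-based ranking and adaptive coding effective.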

Performance

The following table shows comparative results (in bits per character, so lower is better) on the Canterbury Corpus (http://corpus.canterbury.ac.nz). The very good bzz performance on the spreadsheet file excl puts its weighted average ahead of much more sophisticated compressors such as fsmx.

Compression performance

          text  fax   csrc  excl  sprc  tech  poem  html  lisp  man   play  Weighted  Average
compress  3.27  0.97  3.56  2.41  4.21  3.06  3.38  3.68  3.90  4.43  3.51  2.55      3.31
gzip -9   2.85  0.82  2.24  1.63  2.67  2.71  3.23  2.59  2.65  3.31  3.12  2.08      2.53
bzip2 -9  2.27  0.78  2.18  1.01  2.70  2.02  2.42  2.48  2.79  3.33  2.53  1.54      2.23
ppmd      2.31  0.99  2.11  1.08  2.68  2.19  2.48  2.38  2.43  3.00  2.53  1.65      2.20
fsmx      2.10  0.79  1.89  1.48  2.52  1.84  2.21  2.24  2.29  2.91  2.35  1.63      2.06
bzz       2.25  0.76  2.13  0.78  2.67  2.00  2.40  2.52  2.60  3.19  2.52  1.44      2.16

Note that DjVu contributors have several entries in this table. The compress program was written some time ago by Joe Orost. The ppmd program is an improvement of the PPM-C method invented by Paul Howard.

Credits

Program bzz was written by Léon Bottou <leonb@users.sourceforge.net> and was then improved by Andrei Erofeev <andrew_erofeev@yahoo.com>, Bill Riemers <docbill@sourceforge.net> and many others.

See Also

djvu(1), compress(1), gzip(1), bzip2(1)


10/11/2001 DjVuLibre-3.5