File compression program (compress R3.0) posted to mod.sources
Joe Orost
joe at petsd.UUCP
Sat Jan 5 00:37:01 AEST 1985
EXTENDED ABSTRACT
Compresses the specified files or standard input. Each file
is replaced by a file with the extension .Z, but only if the
file got smaller. If no files are specified, the
compression is applied to the standard input and is written
to standard output regardless of the results. Compressed
files can be restored to their original form by specifying
the -d option, or by running uncompress (linked to
compress), on the .Z files or the standard input.
When file names are given, the ownership (if run by root),
modes, and access and modification times are carried over
from the file to its .Z version. Because of this, compress
can be used for archival purposes, and the restored files
still work with make(1) after uncompression.
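As a rough illustration of the metadata handling described above,
here is a minimal Python sketch. It is not taken from the compress
source; the function name carry_over_metadata is hypothetical, and
it assumes a Unix-like system where both files already exist.

```python
import os
import stat


def carry_over_metadata(original, compressed):
    """Copy owner (root only), mode, and timestamps to the .Z file.

    Hypothetical helper for illustration; not part of compress itself.
    """
    st = os.stat(original)

    # Ownership can only be handed over reliably when running as root.
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        os.chown(compressed, st.st_uid, st.st_gid)

    # Permission bits only (strip the file-type bits from st_mode).
    os.chmod(compressed, stat.S_IMODE(st.st_mode))

    # Access and modification times, at nanosecond resolution.
    os.utime(compressed, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Keeping the modification time is what lets make(1) see the
restored file as no newer than it was before compression.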
Compress uses the modified Lempel-Ziv algorithm described in
"A Technique for High Performance Data Compression", Terry
A. Welch, IEEE Computer Vol 17, No 6 (June 1984), pp 8-19.
Common substrings in the file are first replaced by 9-bit
codes 257 and up. When code 512 is reached, the algorithm
switches to 10-bit codes and continues to use more bits
until the bits limit as specified by the -b flag is reached
(default 16). Bits must be between 9 and 16. The default
can be changed in the source to allow compress to be run on
a smaller machine.
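The growing code width can be sketched as follows. This is a
simplified LZW encoder written for illustration, not the compress
source: the names code_width and lzw_codes are hypothetical, code
256 is simply left unused because the text says new codes start at
257, and the exact moment at which the real program widens its
output may differ from this sketch.

```python
FIRST_FREE = 257  # 0-255 are literal bytes; 256 is left unused in this sketch


def code_width(next_code, max_bits=16):
    """Bits per output code, given the next unassigned table code."""
    highest_assigned = next_code - 1
    return min(max_bits, max(9, highest_assigned.bit_length()))


def lzw_codes(data, max_bits=16):
    """Return a list of (code, width_in_bits) pairs for `data`."""
    if not 9 <= max_bits <= 16:
        raise ValueError("bits must be between 9 and 16")
    table = {bytes([i]): i for i in range(256)}
    next_code = FIRST_FREE
    limit = 1 << max_bits  # no new codes once the table is full
    out = []
    prefix = b""
    for byte in data:
        candidate = prefix + bytes([byte])
        if candidate in table:
            prefix = candidate  # keep extending the known substring
            continue
        out.append((table[prefix], code_width(next_code, max_bits)))
        if next_code < limit:
            table[candidate] = next_code  # remember the new substring
            next_code += 1
        prefix = bytes([byte])
    if prefix:
        out.append((table[prefix], code_width(next_code, max_bits)))
    return out
```

For example, lzw_codes(b"abababab") yields the codes 97, 98, 257,
259, 98, all 9 bits wide; the width becomes 10 once code 512 has
been assigned, and never exceeds max_bits.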
After the bits limit is reached, compress periodically
checks the compression ratio. If it is increasing, compress
continues to use the codes that were previously found in the
file. However, if the compression ratio decreases, compress
discards the table of substrings and rebuilds it from
scratch. This allows the algorithm to adapt to the next
"block" of the file.
The amount of compression obtained depends on the size of
the input file, the number of bits per code, and the
distribution of character substrings. Typically, text
files, such as C programs, are reduced by 50-60%.
Compression is generally much better than that achieved by
Huffman coding (as used in pack), or adaptive Huffman coding
(compact), and takes less time to compute.
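To make the percentages concrete, the sketch below computes the
reduction for a pair of file sizes and applies the rule stated
earlier that a file is replaced only if it got smaller. Both
function names are hypothetical and the sizes are invented for
the example.

```python
def percent_reduction(original_size, compressed_size):
    """Percentage by which the file shrank (negative if it grew)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - compressed_size) / original_size


def keep_compressed(original_size, compressed_size):
    """True if the .Z file should replace the original."""
    return compressed_size < original_size
```

A 10000-byte C source that compresses to 4500 bytes is reduced by
55%, which falls in the typical range quoted above.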
Some practical uses for compress are:
    Saving disk space
    Lowering uucp phone expenses
    Saving space in archive tape storage
regards,
joe
--
Full-Name: Joseph M. Orost
UUCP: ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
US Mail: MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
Phone: (201) 870-5844
Location: 40 19'49" N / 74 04'37" W