Compaction Algorithm (pack vs compress)

Richard Harter g-rh at cca.UUCP
Sat Mar 1 14:06:37 AEST 1986


Just a note on packing.  Recently people in one of the newsgroups were
speculating about one character predictor packing.  We implemented it
and found that it works as well as pack (for text files).  The idea is
that you make a first pass through the file and determine, for each
character, its most probable successor.  You make a second pass through
the file and record one bit for each character, F if the character is not
followed by its most probable successor, and T if it is.  You also record
the incorrect guesses.  You store the file as a bit array followed by the
wrongly predicted characters.  The point of these shenanigans is that the
unpacking is fast.  In our context the determination of the predicted
characters only has to be done once so packing is also fast.  Our results
are that this style of packing is as efficient as PACK for source code
and English text.  The bit array is an automatic 12.5% penalty.  Prediction
rate is around 50% for source code (depending on language and style --
higher for operating systems that store files with trailing blanks in
fixed records).  Not bad for quick and dirty.  However it is of no use
for object code.

	Richard Harter, SMDS Inc.



More information about the Comp.lang.c mailing list