REPOST lharc102A Part 01/04 BSD Unix to Amiga archives
Kent Paul Dolan
xanthian at zorch.SF-Bay.ORG
Fri Feb 1 20:19:05 AEST 1991
bernie at metapro.DIALix.oz.au (Bernd Felsche) writes:
> De-flamed deliberately.
Spoil sport.
xanthian at zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>> Second, "compress,uuencode,recompress" is not the best use of
>> technology; I did a little test with the same files in just one big
>> shar, to simplify the reporting of the results:
> WHOAH THERE! Shouldn't you be using tar to generate the archive
> instead of shar? Its wrapper information is more compact and
> efficient.
It is more efficient yet because putting everything in one big file
lets compression proceed across file boundaries rather than start
fresh at each file, but filewise storage is nearly as efficient.
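The cross-boundary effect is easy to reproduce with any dictionary-based compressor. What follows is a minimal sketch, not a measurement of compress or lharc: it assumes Python's zlib as a stand-in compressor, and the sample "files" built by make_files() are made up for illustration.

```python
import zlib

def make_files(count=20):
    """Build small, similar text 'files' (hypothetical sample data)."""
    files = []
    for i in range(count):
        body = "".join(
            f"static int helper_{i}_{j}(int value) {{ return value + {j}; }}\n"
            for j in range(10)
        )
        files.append(("#include <stdio.h>\n" + body).encode("ascii"))
    return files

def separate_size(files):
    """Total size when each file is compressed on its own (fresh dictionary each time)."""
    return sum(len(zlib.compress(data, 9)) for data in files)

def combined_size(files):
    """Size when all files are concatenated and compressed as one stream."""
    return len(zlib.compress(b"".join(files), 9))

if __name__ == "__main__":
    files = make_files()
    print("compressed separately:", separate_size(files))
    print("compressed as one stream:", combined_size(files))
```

The single stream comes out smaller because strings learned from one file are reused in the next; how large the gap is depends on the compressor and on how alike the files are.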
> Then you compress the tar archive... and uuencode it. Please try this
> and publish the results for comparison.
You had to ask; well, I was sitting home grumpy because I was too sick to
make the party tonight, so why not:
-------------------------------------------------------------------------
original data:
   3091 Makefile
   3841 amiga_patch
   2885 generic_patch
  11521 lh.doc.japanese
   2800 lh.inst.japanese
   6783 lh.n.japanese
  13133 lhadd.c
  29556 lharc.c
   7568 lharc.doc.posted
  11220 lharc.doc.revised
   9279 lharc.h
   9588 lharc.l
   2010 lhdir.c
    886 lhdir.h
   6154 lhext.c
   6504 lhio.c
   1483 lhio.h
   6672 lhlist.c
  22476 lzhuf.c
   1229 read.me_1
    486 read.me_2
   1770 read.me_3
original data size, total of file sizes (from wc -c):
 160935 lha
three files uuencoded because they contain control characters:
  15910 lh.doc.japanese.uu
   3895 lh.inst.japanese.uu
   9376 lh.n.japanese.uu
original data size but with those three uuencodings instead:
 169012 lha3uu
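The two totals can be re-derived from the listing itself. A small sketch using only the sizes quoted above; the dictionary and function names are illustrative, not part of any tool.

```python
# Per-file sizes in bytes, copied from the listing above.
SIZES = {
    "Makefile": 3091, "amiga_patch": 3841, "generic_patch": 2885,
    "lh.doc.japanese": 11521, "lh.inst.japanese": 2800, "lh.n.japanese": 6783,
    "lhadd.c": 13133, "lharc.c": 29556, "lharc.doc.posted": 7568,
    "lharc.doc.revised": 11220, "lharc.h": 9279, "lharc.l": 9588,
    "lhdir.c": 2010, "lhdir.h": 886, "lhext.c": 6154, "lhio.c": 6504,
    "lhio.h": 1483, "lhlist.c": 6672, "lzhuf.c": 22476,
    "read.me_1": 1229, "read.me_2": 486, "read.me_3": 1770,
}

# Sizes of the three files after uuencoding, also from the listing.
UUENCODED = {
    "lh.doc.japanese": 15910, "lh.inst.japanese": 3895, "lh.n.japanese": 9376,
}

def total_plain():
    """Sum of the original file sizes (the 'lha' figure)."""
    return sum(SIZES.values())

def total_with_uuencoded():
    """Same sum, with the three Japanese files replaced by their .uu sizes."""
    return sum(UUENCODED.get(name, size) for name, size in SIZES.items())

if __name__ == "__main__":
    print(total_plain(), total_with_uuencoded())
```

Both values agree with the posted totals, 160935 and 169012.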
Plan a, just shar'ing the original files, is unworkable; shars with control
characters won't unpack reliably:
176274 lha.sh
Plan b: current net practice; shar, compress:
 184153 lha3uu.sh            shar three files uuencoded, rest plain text;
  82885 lha3uu.sh.Z          its size as transmitted after compression
Plan c: other current net practice; tar, compress, uuencode, compress:
 180224 lha.tar              original data tarred - not transmittable, so
  73149 lha.tar.Z            compress it and
 100810 lha.tar.Z.uu         uuencode it for safety;
  91533 lha.tar.Z.uu.Z       its size as transmitted after compression
Plan d: improve plan b by replacing compress with lharc, uuencode, compress:
  63604 lha3uu.sh.lzh        lharc of shar file is binary
  87666 lha3uu.sh.lzh.uu     must be uuencoded to hide control characters;
  79863 lha3uu.sh.lzh.uu.Z   its size as transmitted after compression
Plan e: improve plan c by replacing first compress by lharc:
  56476 lha.tar.lzh          lharc of tar file is binary
  77844 lha.tar.lzh.uu       must be uuencoded to hide control characters;
  70839 lha.tar.lzh.uu.Z     its size as transmitted after compression
Plan f: improve plan c by replacing tar | compress by lharc:
  56944 lha.lzh              lharc of original files is binary
  78484 lha.lzh.uu           must be uuencoded to hide control characters;
  71211 lha.lzh.uu.Z         its size as transmitted after compression
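The uuencode expansion in these plans is predictable from the input size alone. The sketch below is a size model, not the uuencode program. It assumes the traditional layout: a "begin MODE NAME" header line, 45 input bytes per full output line, a two-byte zero-length line, and "end". The three-digit mode "644" and the use of the un-suffixed file name in the header are assumptions, and uuencoded_size is an illustrative name.

```python
def uuencoded_size(nbytes, name, mode="644"):
    """Predict the byte count of a traditionally uuencoded file.

    Assumes 45 input bytes per full line and a header naming the file
    without the .uu suffix; both are assumptions about the tool used.
    """
    full_lines, rest = divmod(nbytes, 45)
    size = len(f"begin {mode} {name}\n")      # header line
    size += full_lines * 62                   # length char + 60 data chars + newline
    if rest:
        groups = -(-rest // 3)                # ceiling division: 3 bytes -> 4 chars
        size += 1 + 4 * groups + 1            # length char + data chars + newline
    size += 2                                 # zero-length terminator line
    size += len("end\n")
    return size

if __name__ == "__main__":
    for nbytes, name in [(73149, "lha.tar.Z"), (63604, "lha3uu.sh.lzh"),
                         (56476, "lha.tar.lzh"), (56944, "lha.lzh")]:
        print(name, nbytes, "->", uuencoded_size(nbytes, name))
```

Under those assumptions the model reproduces the .uu sizes listed above, which puts the cost of uuencoding at a little under 38% before the final compress pass.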
(Note: plan c is not the same as simple news transmission, where tar |
compress | transmit | uncompress | untar is the paradigm; that
process is not required to create a news article as an intermediate
product, and plans b to f must and do.)
Note: zoo could also have been used wherever lharc was, but lharc compresses
better, and so dominates the zoo data.
Results:
      Costs in bytes
   Data     Telecomm
  storage    volume    Plan
  184153     82885     b: partial uuencode, shar, compress
  100810     91533     c: tar, compress, uuencode, compress
   87666     79863     d: partial uuencode, shar, lharc, uuencode, compress
   77844     70839     e: tar, lharc, uuencode, compress
   78484     71211     f: lharc, uuencode, compress
The absolute storage champion is plan e, but plan f is nearly as good and
requires one fewer tool; neither of the current plans, nor plan d, has a lot
to recommend it. The choice between e and f should be made mostly on economic
grounds.
-------------------------------------------------------------------------
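To put the results table in relative terms, here is a short sketch that ranks the plans by telecomm volume and expresses each against plan b. The figures are copied from the table; the names PLANS, ranked_by_telecomm and saving_vs are illustrative.

```python
# (data storage bytes, telecomm volume bytes) per plan, from the results table.
PLANS = {
    "b": (184153, 82885),
    "c": (100810, 91533),
    "d": (87666, 79863),
    "e": (77844, 70839),
    "f": (78484, 71211),
}

def ranked_by_telecomm():
    """Plan letters ordered from smallest to largest transmitted size."""
    return sorted(PLANS, key=lambda plan: PLANS[plan][1])

def saving_vs(baseline, plan):
    """Percent reduction in telecomm volume relative to the baseline plan.

    A negative value means the plan transmits more than the baseline.
    """
    base = PLANS[baseline][1]
    return 100.0 * (base - PLANS[plan][1]) / base

if __name__ == "__main__":
    for plan in ranked_by_telecomm():
        print(plan, PLANS[plan][1], f"{saving_vs('b', plan):+.1f}% vs plan b")
```

On these figures plan e transmits about 14.5% less than plan b and plan f about 14.1% less, while plan c transmits about 10.4% more.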
> Depending on software versions, you can do all this in a pipe (which
> you undoubtedly know) "tar cf - files | compress | uuencode
> >bugs.tar.Z.uu"
> For transmission, it can be compressed again, (it would be smarter to
> uudecode) though this _should_ be done by a network layer, even though
> it often isn't. Wouldn't it be nice if modem transfer protocols were
> smart enough to compress on the fly?
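For illustration only, the tar | compress | uuencode pipe can be imitated in memory. This is a sketch under loud assumptions: zlib stands in for compress (it is a different algorithm and will give different sizes), the output is bare uuencoded lines without the begin/end framing that uuencode adds, and pack and unpack are hypothetical helper names.

```python
import binascii
import io
import tarfile
import zlib

def pack(files):
    """tar -> compress -> uuencode, in memory. `files` maps name -> bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    packed = zlib.compress(buf.getvalue(), 9)   # stand-in for compress, NOT LZW
    # uuencode 45 bytes per line; each line is printable ASCII plus a newline
    return b"".join(binascii.b2a_uu(packed[i:i + 45])
                    for i in range(0, len(packed), 45))

def unpack(text):
    """Reverse of pack(): uudecode -> decompress -> untar."""
    packed = b"".join(binascii.a2b_uu(line)
                      for line in text.splitlines(keepends=True))
    with tarfile.open(fileobj=io.BytesIO(zlib.decompress(packed)), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}

if __name__ == "__main__":
    original = {"read.me": b"hello\n", "bin.dat": bytes(range(256))}
    article = pack(original)
    print(len(article), "bytes of mail-safe text")
    print(unpack(article) == original)
```

The encoded form contains only printable characters and newlines, which is the whole point of the uuencode step: the binary bytes in bin.dat survive a text-only transport.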
>> So in fact, for the files being sent, there is some modest _gain_ in
>> telecommunications efficiency by using the best compression
>> technology on text, and then uuencoding it and letting the standard
>> net node to node compression have its way with the files.
> Agreed. In fact, the more text, the better the gain.
>> I have yet to see a single argument for the present methods that
>> comes down, at the last, to anything but sheer laziness on the part
>> of those who don't want to change their habits. Compressed, uuencoded
>> transmission methods win on every reasonable criterion.
> Although one should be wary of zoo archives, which don't work well if
> there are many small text files in it (i.e. typical source code).
> Compression can be as little as 10-15%, which uuencoding explodes past
> the original size.
Yeah, lharc is _much_ better at compressing small files than is zoo, which
is why putting a shar or tar wrapper around them and then zooing them looks
better than zooing them separately.
>> By the way, it is _not_ a solution to replace compress with a filter
>> form of lharc as the typical file compressor for telecommunications;
>> lharc is _much_ too slow to use at every step along the way, so it
>> needs to be done just once at the originating site to accomplish
>> these savings.
> TANSTAAFL.
Kent, the man from xanth.
<xanthian at Zorch.SF-Bay.ORG> <xanthian at well.sf.ca.us>