Resolved tar-to-a-pipe problem ("tar: blocksize=1")
Chris Torek
torek at elf.ee.lbl.gov
Thu Apr 4 08:00:28 AEST 1991
In article <5932 at tahoe.unr.edu> malc at equinox.unr.edu (Malcolm Carlock) writes:
>It looks as if tar commands of the form
>
> tar cf - files | rsh somewhere dd [ibs=???] [obs=???] of=/dev/sometape
>
>just can't be made to work the way you'd expect ...
Sure they can:
    tar cf - . | rsh foo dd of=/dev/tape/1n@6250 obs=20b
Be forewarned that most incarnations of `dd' handle this egregiously
slowly.
What is going on? This answer requires some background:
A. Tapes have `block sizes'. Not all tapes, mind you---most SCSI
tapes have a fixed block size that can, for the most part, be
ignored. 9-track tapes, however, typically record data in
`records' separated by `gaps', and only whole records can be
re-read later.
B. In order to accommodate this, Unix tape drivers generally translate
each read() or write() system call into a single record transfer.
The size of a written record is the number of bytes passed to
write(). (There may be some additional constraints, such as
`the size must be even' or `the size must be no more than 32768
bytes'. Note that phase encoded [1600 bpi] blocks should be no
longer than 10240 bytes, and GCR [6250 bpi] blocks should be no
longer than 32768 bytes, to reduce the chance of an unrecoverable
error.) Each read() call must ask for at least one whole record
(many drivers get this wrong and silently drop trailing portions
of a record that was longer than the byte count given to read());
each read() returns the actual number of bytes in the record.
C. Network connections are generally `byte streams': the two host
`peers' (above, the machine running tar, and the machine with the
tape drive) will exchange data but will drop any `record boundary'
notion at the protocol interface level. If record boundaries are
to be preserved, this must be done in a layer above the network
protocol itself. (Not all network protocols are stream-oriented;
even some flow-controlled, error-recovering protocols are not. Internet RDP
and XNS SPP are two examples of reliable record-oriented protocols.
Many of these, however, impose fairly small record sizes.)
D. rsh simply opens a stream protocol, and does no work to preserve
`packet boundaries'.
E. dd works in mysterious ways.
    dd if=x of=y
is the same as
    dd if=x of=y ibs=512 obs=512
which means `open files x and y, then loop doing read(fd_x) with
a byte count of 512, take whatever you got, copy it into an output
buffer for file y, and each time that buffer reaches 512 bytes,
do a single write(fd_y) with 512 bytes'.
On the other hand,
    dd if=x of=y bs=512
means something completely different: `open files x and y, then
loop doing read(fd_x) with a byte count of 512, take whatever
you got, and do a single write(fd_y) with that count'.
All of this means that
    tar cf - . | rsh otherhost dd of=/dev/tape/0
will write 512 byte blocks (not what you wanted), while
    tar cf - . | rsh otherhost dd of=/dev/tape/0 bs=20b
will be even worse: it will take whatever it gets from stdin---which,
being a TCP connection, will be arbitrarily lumpy depending on the
underlying network parameters and the particular TCP implementation
---and write essentially random-sized records. On purely `local'
(Ethernet) connections, with typical implementations, you will wind
up with 1024 byte blocks (a tar `block factor' of 2).
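
The two dd behaviours described in item E can be modelled in a few lines
of Python. This is only a sketch, not dd itself: `dd_write_sizes` and its
`lumps` argument are names invented here, and the model assumes that each
read() returns at most what is left of the current lump of input, which is
a simplification of what a real TCP socket does.

```python
def dd_write_sizes(lumps, ibs=512, obs=512, bs=None):
    """Return the size of every write() a simplified dd would issue.

    lumps: sizes in bytes of the pieces in which input arrives
           (hypothetical stand-in for TCP delivery).
    bs:    if given, models `bs=N' (one write per read, same count);
           otherwise models separate `ibs=' and `obs='.
    """
    writes = []
    buffered = 0                      # bytes waiting in the output buffer
    read_count = bs if bs is not None else ibs
    for lump in lumps:
        remaining = lump
        while remaining > 0:
            got = min(read_count, remaining)   # a read() may come up short
            remaining -= got
            if bs is not None:
                writes.append(got)             # written exactly as read
            else:
                buffered += got
                while buffered >= obs:         # write only full obs blocks
                    writes.append(obs)
                    buffered -= obs
    if bs is None and buffered > 0:
        writes.append(buffered)                # final short record at EOF
    return writes


# 20 lumps of 1024 bytes, as on the `local' Ethernet case above.
lumps = [1024] * 20
print(dd_write_sizes(lumps))                 # forty 512-byte records
print(dd_write_sizes(lumps, bs=10240))       # twenty 1024-byte records
print(dd_write_sizes(lumps, obs=10240))      # two 10240-byte records
```

Under this model only the `obs=' form produces the intended 10240-byte
tape records; the `bs=' form just mirrors however the network happened to
deliver the data.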
If a blocking factor of 2 is acceptable, and if `cat' forces 1024 byte
blocks (both true in some cases), you can use
    tar cf - . | rsh otherhost 'cat >/dev/tape/0'
but this depends on undocumented features in `cat'. In any case, on
9-track tapes, since each `gap' occupies approximately% 0.7 inches of
otherwise useful tape space, a block size of 1024 has 10 times as many
gaps as a block size of 10240, wasting 9x1600x0.7 = 10 kbytes of
tape for every 10240 bytes written at 1600 bpi, or 32 times as many as
a size of 32768, wasting 31x6250x0.7 = 136 kbytes of tape for every
32768 bytes written at 6250 bpi.
-----
% Actual gap sizes vary. In particular, certain `streaming' drives
(all too often called `streaming' because they do not---in some cases
the controller is too `smart' to be able to keep up with the required
data rate, even when fed back-to-back DMA requests) have been known
to stretch the gaps to 0.9".
-----
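
The gap arithmetic above can be checked with a short Python sketch. The
function names are made up for illustration, and the code follows the
article in treating `bpi' as bytes per inch of tape and in assuming a
0.7 inch gap; real gap sizes vary, as noted in the footnote.

```python
def extra_gap_bytes(small_record, large_record, bpi, gap_inches=0.7):
    """Tape capacity, in bytes, lost to the extra gaps when one
    large_record worth of data is written as small_record pieces."""
    extra_gaps = large_record // small_record - 1
    return round(extra_gaps * bpi * gap_inches)


def data_fraction(record, bpi, gap_inches=0.7):
    """Fraction of the tape holding data rather than gaps."""
    data_inches = record / bpi
    return data_inches / (data_inches + gap_inches)


print(extra_gap_bytes(1024, 10240, 1600))   # 10080, about 10 kbytes
print(extra_gap_bytes(1024, 32768, 6250))   # 135625, about 136 kbytes
print(f"{data_fraction(1024, 1600):.0%}")   # 48%
print(f"{data_fraction(10240, 1600):.0%}")  # 90%
```

Put another way, under these assumptions 1024-byte records at 1600 bpi
leave less than half of the tape for data, while 10240-byte records use
about 90% of it.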
In general, because of tape gaps, you should use the largest record size
that permits error recovery. Note, however, that some olid% hardware (such
as that found on certain AT&T 3B systems) puts a ridiculous upper limit
(5K) on tape blocks.
-----
% Go ahead, look it up... it is a perfectly good crossword puzzle word :-)
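
As a rough illustration of turning those record-size limits into a tar
blocking factor (and the matching dd `obs=' value, which also counts in
512-byte blocks), here is a small Python sketch. The helper name and the
table labels are invented for this example; the limits themselves are the
ones quoted in this article.

```python
TAR_BLOCK = 512  # tar blocks and dd's `b' suffix are both 512 bytes


def blocking_factor(max_record_bytes):
    """Largest blocking factor whose record fits in max_record_bytes."""
    return max_record_bytes // TAR_BLOCK


limits = {
    "1600 bpi (phase encoded)": 10240,
    "6250 bpi (GCR)": 32768,
    "5K-limited hardware": 5 * 1024,
}
for name, limit in limits.items():
    print(f"{name}: obs={blocking_factor(limit)}b")
```

With these limits the sketch prints obs=20b, obs=64b and obs=10b
respectively.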
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA Domain: torek at ee.lbl.gov