Some thoughts on enhancing cpio(1)
Sam Kendall
sam at delftcc.UUCP
Thu Apr 3 06:29:50 AEST 1986
I've had some thoughts recently about features that cpio(1) needs. Some
of these apply to tar(1) also.
(1) Optional error recovery. If the header of just one file in a cpio
archive is munged, cpio will issue the pitiful message "Out of
phase--get help" and terminate. This message is confusing to
ordinary users, and it then takes a guru to recover the files in the
archive past the garbled point. This is a bit ridiculous. There
should be some optional error recovery, like the ability to retrieve
the file following the garbled header (even if its name is unknown),
and then to recognize the next file header in the garbled archive
and proceed from there. This might break down if another cpio
archive were one of the files in the garbled archive (its headers
would look like headers of the outer archive), but that is no big
deal.
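A minimal sketch of the resynchronization step in (1), in Python rather than as a patch to cpio itself. It assumes the -c header is the usual 76-byte run of octal digits beginning with the magic string "070707", with the pathname size at offsets 59-64 and the file size at offsets 65-75, followed by a NUL-terminated pathname. The names parse_header and resync are made up for illustration, and a real implementation would read the archive in blocks rather than hold it all in memory.

```python
import re

# Assumed -c header layout: magic "070707", seven 6-digit octal fields,
# an 11-digit mtime, a 6-digit name size and an 11-digit file size.
# That is 76 octal digits in total.
HEADER_RE = re.compile(rb"070707[0-7]{70}")
HEADER_LEN = 76


def parse_header(buf, pos):
    """Return (name_size, file_size) for a header at pos, or None."""
    if not HEADER_RE.match(buf, pos):
        return None
    name_size = int(buf[pos + 59:pos + 65], 8)
    file_size = int(buf[pos + 65:pos + 76], 8)
    return name_size, file_size


def resync(buf, start):
    """Find the next offset >= start that looks like a valid header.

    Returns -1 if no plausible header is found.
    """
    pos = buf.find(b"070707", start)
    while pos != -1:
        sizes = parse_header(buf, pos)
        if sizes is not None:
            name_size, _ = sizes
            name_end = pos + HEADER_LEN + name_size
            # A real header is followed by a NUL-terminated pathname.
            if name_size > 0 and name_end <= len(buf) and buf[name_end - 1] == 0:
                return pos
        pos = buf.find(b"070707", pos + 1)
    return -1
```

Checking for the trailing NUL of the pathname cuts down on false matches inside file data, but it cannot rule out the nested-archive case mentioned above.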
(2) Automatic recognition of -c vs. non-"-c" formats. The -c option
could be ignored with -i (copy in); cpio should recognize which
format the archive is in. This is easy to implement, though it
complicates error recovery when the beginning of the archive is
munged.
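The recognition in (2) could look something like the following sketch. It assumes the -c format starts with the six ASCII characters "070707" and the non-"-c" format starts with the same value, octal 070707, stored as a 16-bit integer in the byte order of the machine that wrote the archive. The function name detect_format and its return strings are invented for this example.

```python
import struct

MAGIC = 0o070707  # the magic number, assumed common to both formats


def detect_format(head):
    """Classify an archive from its first few bytes."""
    if head[:6] == b"070707":
        return "ascii"            # -c header: magic as six octal digits
    if len(head) >= 2:
        if struct.unpack("<H", head[:2])[0] == MAGIC:
            return "binary-le"    # 16-bit magic, low byte first
        if struct.unpack(">H", head[:2])[0] == MAGIC:
            return "binary-be"    # 16-bit magic, high byte first
    return "unknown"
```

If the first bytes are munged, this returns "unknown", which is exactly the case where recognition and error recovery get in each other's way.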
(3) Fix the bug that -m (restore file modification times) is ineffective
on directories that are being copied. This is vital for the next
feature:
(4) Optional save and restore of directory contents, with file
deletion. The purpose of this feature is to correctly handle full
and incremental backups with cpio; specifically, to correctly
restore a directory in which files have been removed after the full
backup was made, but before the incremental backup was made.
Currently, when -o (copy out) gets the name of a directory, it
outputs a header for that directory, but no contents. My proposal
is for an option "-D" which would work with both -o and -i. With
-o, a list of files in a directory is saved along with the
directory. With -i, when a directory is being restored and is
"replacing" an already existing directory on disk, all files that
are in the existing directory but NOT in the archived directory are
REMOVED.
Another way to look at it: with a cpio -i, the action of a file
replacing an already existing file means, of course, that the
archived contents replace the contents on disk. But there is no
corresponding action for directories. -D adds such an action.
N.B.: as with files, the archived directory will replace the
existing directory only if it is newer or the -u option is given;
this is why (3) above is necessary.
-D would also work with -p (pass), of course.
Example: a directory "d" contains files "a" and "b". A full backup
(using cpio) is made including "d" and its contents. The file "b"
is deleted. Now an incremental backup of files that have changed
since the full backup is made using cpio -D. "d" is on the
incremental backup, because it has changed since the full backup was
made. (It changed when "b" was deleted.) Now suppose "d" is lost on
disk, and we try to restore it to disk from backup. We first
restore the full backup; "d" contains "a" and "b" again. We next
restore the incremental backup. On the incremental backup, "d"
contains "a" but not "b". So "b" is deleted from disk. The restore
has worked correctly. With the current cpio, "b" would still exist,
incorrectly, after the incremental backup was restored.
This is extremely useful for backup purposes. It sounds
complicated, but it fits in beautifully.
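To make the -i half of (4) concrete, here is a rough Python sketch of what restoring a directory with -D might do. Everything in it is hypothetical: the function name, the idea that the saved listing arrives as a plain list of names, and the unconditional parameter standing in for -u. It only shows the comparison and the removal, not how the listing would be stored in the archive.

```python
import os
import shutil


def restore_directory_listing(path, archived_names, archived_mtime,
                              unconditional=False):
    """Remove entries in `path` that the archived listing does not mention.

    As with files, nothing happens unless the archived directory is newer
    than the one on disk or `unconditional` (standing in for -u) is set.
    Returns the sorted names that were removed.
    """
    if not unconditional and archived_mtime <= os.path.getmtime(path):
        return []
    keep = set(archived_names)
    removed = []
    for name in sorted(os.listdir(path)):
        if name in keep:
            continue
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)   # a subdirectory that no longer exists
        else:
            os.remove(full)       # a plain file, symlink, etc.
        removed.append(name)
    return removed
```

In the example above, restoring the incremental backup would call this on "d" with the listing ["a"], and "b" would be removed. The mtime comparison is the reason (3) has to be fixed first.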
(5) Preservation of printable ASCII + short lines. It is too late for
this, since the format is already frozen, but it would have been
good. The idea here is that an archive of mailable files should be
itself mailable, except perhaps for its size. A file that is
mailable has only printable ASCII characters, and has no lines
longer than some length, maybe 80 characters (I'm not sure).
A cpio -c archive has headers which are about 80 characters plus the
length of the pathname; this can get too long. Also, the header
includes a NUL character or two. I wish someone had thought about
this a little bit more before designing the format. It is so close
to preserving mailability!
Of course, "shar" and Martin Minow's "arch" programs (decvax!minow;
I think they are his) do preserve mailability in almost all cases.
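The notion of "mailable" in (5) can be written down as a small check. This is a sketch under my own assumptions: the 80-character limit is the guess from above, tab is allowed alongside the printable range, and is_mailable is an invented name.

```python
def is_mailable(data, max_line=80):
    """True if data is printable ASCII with no line longer than max_line."""
    for line in data.split(b"\n"):
        if len(line) > max_line:
            return False
        for byte in line:
            # Allow tab plus the printable range space (0x20) .. tilde (0x7E).
            if byte != 0x09 and not 0x20 <= byte <= 0x7E:
                return False
    return True
```

By this test a -c header fails twice over: it runs to roughly 80 characters plus the pathname with no newline, and the pathname is followed by a NUL.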
(6) cpio should be in the public domain. This would avoid the annoying
scenario where people receive cpio archives but cannot unpack them.
I haven't recommended that checksums be introduced into cpio, because I
think this can be handled by some other filter. (There are some tools
to package software for transmission, available through the AT&T
Toolchest, that probably do what I want here.) One could argue that
mailability can also be handled by other filters; but I would rather
keep things simple for unpacking mailed archives.
Comments?
----
Sam Kendall { ihnp4 | seismo!cmcl2 }!delftcc!sam
Delft Consulting Corp. ARPA: delftcc!sam at NYU.ARPA