Magic Numbers (and incredible stupidity in "cpio")
Guy Harris
guy at sun.uucp
Sat Dec 7 16:03:45 AEST 1985
> Executables using ``standard'' binary formats, i.e. a.out (PDP-11, Z8000)
> and b.out (MC68000) use the standard magic numbers 0405, 0407, 0410, 0411.
> Non-standard formats, like Xenix x.out (0x0206) and COFF (flames to
> /dev/null; most systems are [ab].out) use distinctive magic numbers.
Well, VAX UNIX (32V, 4.xBSD, System III, Version 8?) also uses those magic
numbers (with 0413 added for demand-paged executables on 4.xBSD), as
probably do lots of other 4.xBSD systems (Sun's does). Does "most" mean "most
UNIX implementations" or "most boxes running UNIX"? If the latter, I think
Xenix is running on a lot of systems, possibly most. Then again, *my* copy
of "Xenix(TM) Standard Object File Format (January 1983)" implies that
"0x0206" is the "magic number" and is *not* distinctive; the "x_cpu" field
indicates what CPU it's intended for. (This is sort of like the new Sun
UNIX 3.0 object file format, where the "a_machtype" field indicates whether
it's intended for a 68010 or 68020).
COFF seems to invert this, since the "file header" indicates what machine
it's intended for (and tons of other glop) and the "UNIX header" (which is
basically the old a.out header) has the 0405, 0407, 0410, 0411, and 0413
magic numbers (yes, 0413 is what they use for paged executables, surprise
surprise), which indicate the format of the image but are machine-independent
(modulo byte ordering). Then again, the "file header" magic number seems to indicate
something about the format of the executable, but see a previous posting of
mine for some dyspepsia caused by the proliferation of multiple file header
magic numbers.
> There are other magic numbers. Old-style archives (ar) have 0177545 as a
> magic number; again, the loader knows about this, since a library is an
> archive. System V archives begin with the magic ``number'' "!<arch>\n".
System V, Release 2 archives, anyway; System V Release 1 had a portable
archive format which was different from the 4.xBSD one, which was the first
to use the "!<arch>\n" magic "number". I'm told they came to their
senses because Version 8, being 4.1BSD-based, used that format.
> Cpio archives also have magic numbers in them, but at the archive-member
> level.
No, it has a magic number at the beginning - 070707 (either as a "short" or
a string, depending on whether it's an old cruddy "cpio" archive or a nice
new "gee, we've finally caught up with 'tar' when it comes to portability"
"cpio -c" archive). (S3 had "-c", but it had a bug so it wasn't really
portable. S5 fixed this bug. S5 also broke the byte-swapping garbage:
S3 had an option to swap the bytes within 2-byte quantities.
Presumably, this was because running the tape through "dd" to
byte-swap *everything*, and then byte-swapping the data and
pathnames inside "cpio", thus swapping the binary portion of the
header once and everything else twice, is obviously more efficient
than just swapping the binary portion of the header once. ("cpio"
already has hacks to deal with 4-byte quantities - namely,
file size and modified time - automagically, by shoving "1L" into
a "long" and seeing whether the 0th byte of that "long" is 0 or
not, so PDP-11s and VAXes don't have problems.) It is also
obvious that forcing the user to specify a byte-swapping option
is better than just looking at the magic number and seeing whether
it's 070707 or a byte-swapped 070707 and deciding whether to
swap or not based on that.
Whoever worked on "cpio" for S5 obviously figured that the
purpose of this byte-swapping crap was to make it possible to
move binary data between machines with different byte orders
(as everybody knows, most files with binary data are continuous
streams of 2-byte or 4-byte quantities), not to provide a gross
and kludgy way of byte-swapping the binary portion of a "cpio"
header, so they added an option to swap the 2-byte portions
of 4-byte quantities ("stupid FP-11", to quote - if I remember
correctly - the VAX System III linker, that particular piece of
DEC hardware being responsible for some PDP-11 software, including
but *NOT* limited to UNIX, having a different format for 32-bit
integers than the VAX's hardware supports) and an option to
swap both bytes and 2-byte quantities. They also "fixed" it
not to swap the bytes of the pathnames. This "fix" means that
running the "cpio" archive through "dd" to swap the bytes, and
then doing a byte swap again in "cpio", results in path names
with their bytes swapped! ("/nuxi", anyone?) In effect, you
are now screwed if you have a "cpio" tape, not made with "-c",
which was produced on a machine with a different byte order.
You can't read it in conveniently. (This has been experimentally
verified. I had to whip up a version of "cpio" which does what
"cpio" should have done in the first place - namely, just byte
swap the damn "short"s in the header - to read a tape made on
a System V VAX using the System V "cpio" on a Sun.))
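The "just byte swap the shorts in the header" approach described above can be
sketched in C roughly as follows. This is an illustration only, not the
modified "cpio" mentioned in the text: the names swap16, cpio_fix_header and
cpio_long are hypothetical, and the sketch assumes the old binary header is
13 16-bit words with the magic number first and each 4-byte quantity stored
as two shorts, high-order half first.

```c
#include <stddef.h>
#include <stdint.h>

#define CPIO_MAGIC         070707u   /* magic as the writing machine saw it */
#define CPIO_MAGIC_SWAPPED 0143561u  /* same 16 bits with the bytes exchanged */
#define CPIO_HDR_SHORTS    13        /* assumed size of the old binary header */

/* Exchange the two bytes of a 16-bit quantity. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Look at the magic number and decide whether the header needs swapping.
 * Only the binary header is touched; pathnames and file data are left alone.
 * Returns 0 if the header was already in host order, 1 if it was swapped
 * in place, and -1 if the first word is not a cpio magic number at all.
 */
int cpio_fix_header(uint16_t hdr[CPIO_HDR_SHORTS])
{
    size_t i;

    if (hdr[0] == CPIO_MAGIC)
        return 0;
    if (hdr[0] != CPIO_MAGIC_SWAPPED)
        return -1;
    for (i = 0; i < CPIO_HDR_SHORTS; i++)
        hdr[i] = swap16(hdr[i]);
    return 1;
}

/*
 * Rebuild a 4-byte quantity (file size, modified time) from its two
 * header shorts, assuming the high-order half comes first.
 */
uint32_t cpio_long(const uint16_t pair[2])
{
    return ((uint32_t)pair[0] << 16) | pair[1];
}
```

Because the 4-byte quantities are rebuilt arithmetically from two shorts that
are already in host order, no user-specified swapping option is needed.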
There are a number of quite intelligent and talented people working on UNIX
development at AT&T Information Systems. It looks like the people in charge
of keeping track of COFF magic numbers, and in charge of "cpio", are in need
of some supervision by the aforementioned people. (Fortunately, it looks
like the IEEE P1003 committee is looking at a "tar"-based format, with fixes
to support storing information about directories and special files, for
tapes.) I'm told that the European UNIX vendor consortium, X/OPEN, chose a
"cpio" format because of the "cpio" *program*'s byte-swapping
"capabilities". Aside from the basic stupidity (and incorrectness, in the
case of the S5 "cpio") of these "capabilities", they are irrelevant to the
choice of tape *format* because:
1) "tar" doesn't need byte-swapping options because the
control information is in printable ASCII string format
(any tape controller which is good as anything other than
a target for skeet-shooting will write character strings
in memory out to the tape in character-string order),
2) "cpio" has the "-c" option which does the same thing,
so it doesn't need those options except for reading old
tapes (any reasonable "cpio"-format-based standard would
be based on "cpio -c" format, not "cpio" format),
and
3) a *good* program which handles "cpio" format can figure
out the byte order it needs for reading pre-"cpio -c"
tapes by looking at the magic number anyway!
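As a rough sketch of point 3, a reader can classify an archive from its first
few bytes alone, independent of the byte order of the machine doing the
reading. The enum and the function name cpio_classify are hypothetical, and
the sketch assumes the "cpio -c" magic is the six ASCII characters "070707"
while the old format starts with the 16-bit value 070707 in the writer's
byte order.

```c
#include <stddef.h>
#include <string.h>

enum cpio_kind {
    CPIO_UNKNOWN,         /* not recognizable as a cpio archive */
    CPIO_ASCII,           /* "cpio -c": magic is the string "070707" */
    CPIO_BIN_LOW_FIRST,   /* old format, written low byte first (PDP-11, VAX) */
    CPIO_BIN_HIGH_FIRST   /* old format, written high byte first (68000) */
};

/* Classify an archive by examining the bytes at its very beginning. */
enum cpio_kind cpio_classify(const unsigned char *buf, size_t len)
{
    if (len >= 6 && memcmp(buf, "070707", 6) == 0)
        return CPIO_ASCII;

    if (len >= 2) {
        /* Assemble the first two bytes both ways; only one can match. */
        unsigned low_first  = buf[0] | ((unsigned)buf[1] << 8);
        unsigned high_first = ((unsigned)buf[0] << 8) | buf[1];

        if (low_first == 070707u)
            return CPIO_BIN_LOW_FIRST;
        if (high_first == 070707u)
            return CPIO_BIN_HIGH_FIRST;
    }
    return CPIO_UNKNOWN;
}
```

Working on raw bytes rather than on a host "short" means the same code gives
the same answer on a VAX and on a Sun.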
(Flame off, until next time a collection of stupidities this gross comes to
light.)
Guy Harris