Byte order (retitled)
Spencer W. Thomas
thomas at utah-gr.UUCP
Thu Apr 10 00:23:10 AEST 1986
The best (and most tongue in cheek) discussion I have ever seen about
this issue is a USC/ISI memo written by Danny Cohen, titled "On Holy
Wars and a Plea for Peace". The whole thing is about 33000 bytes, so I
won't post it, but I will give some excerpts. [Well, even my excerpts
come to about 1/3 of the original.]
IEN 137 Danny Cohen
U S C/I S I
1 April 1980
ON HOLY WARS AND A PLEA FOR PEACE
INTRODUCTION
This is an attempt to stop a war. I hope it is not too late and that
somehow, magically perhaps, peace will prevail again.
The latecomers into the arena believe that the issue is: "What is the
proper byte order in messages?".
The root of the conflict lies much deeper than that. It is the question
of which bit should travel first, the bit from the little end of the
word, or the bit from the big end of the word? The followers of the
former approach are called the Little-Endians, and the followers of the
latter are called the Big-Endians. The details of the holy war between
the Little-Endians and the Big-Endians are documented in [6] and
described, in brief, in the Appendix. I recommend that you read it at
this point.
...
In a consistent order, the bit-order, the byte-order, the word-order,
the page-order, and all the other higher level orders are all the same.
Hence, when considering a serial bit-stream, along a communication line
for example, the "chunk" size which the originator of that stream has in
mind is not important.
There are two possible consistent orders. One starts with the narrow
end of each word (aka "LSB"), as the Little-Endians do; the other
starts with the wide end (aka "MSB"), as their rivals, the Big-Endians,
do.
In this note we usually use the following sample numbers: a "word" is a
32-bit quantity and is designated by a "W", and a "byte" is an 8-bit
quantity which is designated by a "C" (for "Character", not to be
confused with "B" for "Bit").
MEMORY ORDER
The first word in memory is designated as W0, by both regimes.
Unfortunately, the harmony goes no further.
The Little-Endians assign B0 to the LSB of the words and B31 is the MSB.
The Big-Endians do just the opposite, B0 is the MSB and B31 is the LSB.
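The two bit-numbering conventions can be sketched in C as follows. This
is only an illustration under the memo's sample sizes (a 32-bit word);
the function names bit_le and bit_be are made up here and do not come
from the memo.

```c
#include <assert.h>
#include <stdint.h>

/* Bit k of a 32-bit word under the Little-Endians' numbering:
   B0 is the LSB and B31 is the MSB. */
unsigned bit_le(uint32_t w, unsigned k)
{
    assert(k < 32);
    return (unsigned)((w >> k) & 1u);
}

/* Bit k of the same word under the Big-Endians' numbering:
   B0 is the MSB and B31 is the LSB. */
unsigned bit_be(uint32_t w, unsigned k)
{
    assert(k < 32);
    return (unsigned)((w >> (31u - k)) & 1u);
}
```

For the word whose value is 1, bit_le(w, 0) and bit_be(w, 31) both
select the same physical bit; only the label differs.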
By the way, if mathematicians had their way, every sequence would be
numbered from ZERO up, not from ONE, as is traditionally done. If so,
the first item would be called the "zeroth"....
Since most computers are not built by mathematicians, it is no wonder
that some computers designate bits from B1 to B32, in either the
Little-Endians' or the Big-Endians' order. These people probably would
like to number their words from W1 up, just to be consistent.
...
On the other hand, the Little-Endians have their view, which is
different but also self-consistent.
They believe that one should start with the narrow end of every word,
and that low addresses are of lower order than high addresses.
Therefore they put their words on paper as if they were written in
Hebrew, like this:
...|---word2---|---word1---|---word0---|
When they add the bit order and the byte order they get:
...|---word2---|---word1---|---word0---|
....|C3,C2,C1,C0|C3,C2,C1,C0|C3,C2,C1,C0|
.....|B31......B0|B31......B0|B31......B0|
In this regime, when word W(n) is shifted right, its LSB moves into the
MSB of word W(n-1).
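As a rough sketch of that shift in C, assuming the number is held in an
array with W0 at index 0 (the function name shift_right_1 is
hypothetical, chosen only for this illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift an n-word number right by one bit. w[0] is the least
   significant word (the Little-Endians' convention), so the LSB of
   w[i] becomes the MSB of w[i-1]. Returns the bit shifted out of w[0]. */
unsigned shift_right_1(uint32_t *w, size_t n)
{
    unsigned out;
    size_t i;

    assert(w != NULL && n > 0);
    out = (unsigned)(w[0] & 1u);
    for (i = 0; i + 1 < n; i++)
        w[i] = (w[i] >> 1) | ((uint32_t)(w[i + 1] & 1u) << 31);
    w[n - 1] >>= 1;
    return out;
}
```

The carry always travels from a higher index to the next lower one,
which is the sense in which significance follows the address.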
English text strings are stored in the same order, with the first
character in C0 of W0, the next in C1 of W0, and so on.
This order is very consistent with itself, with the Hebrew language, and
(more importantly) with mathematics, because significance increases with
increasing item numbers (address).
It has the disadvantage that English character streams appear to be
written backwards; this is only an aesthetic problem but, admittedly, it
looks funny, especially to speakers of English.
In order to avoid receiving strange comments about this order, the
Little-Endians pretend that they are Chinese, and write the bytes, not
right-to-left but top-to-bottom, like:
C0: "J"
C1: "O"
C2: "H"
C3: "N"
..etc..
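The Little-Endians' text layout above can be sketched in C without
depending on the host machine. The names pack_le and byte_c are
invented for this example, and the hexadecimal value in the closing
note assumes the ASCII character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four characters into a 32-bit word the Little-Endians' way:
   s[0] goes into C0 (the least significant byte), s[3] into C3. */
uint32_t pack_le(const char *s)
{
    assert(s != NULL);
    return  (uint32_t)(unsigned char)s[0]
         | ((uint32_t)(unsigned char)s[1] << 8)
         | ((uint32_t)(unsigned char)s[2] << 16)
         | ((uint32_t)(unsigned char)s[3] << 24);
}

/* Byte Ck of word w, where C0 is the least significant byte. */
unsigned char byte_c(uint32_t w, unsigned k)
{
    assert(k < 4);
    return (unsigned char)((w >> (8u * k)) & 0xFFu);
}
```

With ASCII, pack_le("JOHN") is 0x4E484F4A: written as a number, with
the MSB on the left, the bytes read "N" "H" "O" "J", which is the
backwards appearance described above.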
... (Discussion of PDP-11 Floating point unit leads into ...)
However, due to some oversights in the security screening process, the
Blefuscuians took over, again. They assigned, as they always do, the
wide end to the LOWer addresses in memory, and the narrow to the HIGHer
addresses.
Let "xy" and "abcd" be 32- and 64-bit floating-point numbers,
respectively. Let's look at how these numbers are stored in memory:
ddddddddL ccccccccc bbbbbbbbb SMaaaaaaa yyyyyyyyL SMxxxxxxx
....|--word5--|--word4--|--word3--|--word2--|--word1--|--word0--|
.....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|
Well, Blefuscu scores many points for this. The above reference in [3]
does not even try to camouflage it by any Chinese notation.
Encouraged by this success, as minor as it is, the Blefuscuians tried to
pull another fast one. This time it was on the VAX, the sacred machine
which all the Little-Endians worship.
Let's look at the VAX order. Again, we look at the way the above data
(with xy being a 32-bit integer) is stored in memory:
"N" "H" "O" "J" SMzzzzzzL SMxxxxxxx yyyyyyyyL
...ng2-------|-------long1-------|-------long0-------|
....|--word4--|--word3--|--word2--|--word1--|--word0--|
.....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|
What a beautifully consistent Little-Endians' order this is !!!
So, what about the infiltrators? Did they completely fail in carrying
out their mission? Since the integer arithmetic was closely guarded
they attacked the floating point and the double-floating which were
already known to be easy prey.
Let's look, again, at the way the above data is stored, except that now
the 32-bit quantity xy is a floating point number: now this data is
organized in memory in the following Blefuscuian way:
"N" "H" "O" "J" SMzzzzzzL yyyyyyyyL SMxxxxxxx
...ng2-------|-------long1-------|-------long0-------|
....|--word4--|--word3--|--word2--|--word1--|--word0--|
.....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|
Blefuscu scores again. The VAX is found guilty, though with the
explanation that it tries to be compatible with the PDP11.
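The two layouts shown above for a 32-bit quantity can be contrasted in
a few lines of C. This is only a sketch: the function names are made up
for illustration, and the value is treated as an opaque 32-bit pattern
rather than as a real VAX floating-point number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Consistent Little-Endian order (the integer case): the least
   significant byte goes to the lowest address, p[0]. */
void store_le32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Mixed order (the floating-point case): the more significant 16-bit
   word sits at the lower address (word0), but the bytes inside each
   word are still in Little-Endian order. */
void store_mixed32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)((v >> 16) & 0xFFu);  /* word0, C0 */
    p[1] = (uint8_t)((v >> 24) & 0xFFu);  /* word0, C1 */
    p[2] = (uint8_t)(v & 0xFFu);          /* word1, C0 */
    p[3] = (uint8_t)((v >> 8) & 0xFFu);   /* word1, C1 */
}
```

For the pattern 0x0A0B0C0D, store_le32 lays down the bytes 0D 0C 0B 0A
from low to high address, while store_mixed32 lays down 0B 0A 0D 0C.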
Having found themselves there, the VAXians found a way around this
unaesthetic appearance: the VAX literature (e.g., p. 10 of [4])
describes this order by using the Chinese top-to-bottom notation, rather
than an embarrassing left-to-right or right-to-left one. This page is a
marvel. One has to admire the skillful way in which some quantities are
shown in columns 8-bit wide, some in 16 and others in 32, all in order to
avoid the egg-on-the-face problem.....
By the way, some engineering-type people complain about the "Chinese"
(vertical) notation because usually the top (aka "up") of the diagrams
corresponds to "low"-memory (low addresses). However, anyone who was
brought up by computer scientists, rather than by botanists, knows that
trees grow downward, having their roots at the top of the page and their
leaves down below. Computer scientists seldom remember which way "up"
really is (see 2.3 of [5], pp. 305-309).
...
SUMMARY (of the Memory-Order section)
To the best of my knowledge only the Big-Endians of Blefuscu have built
systems with a consistent order which works across chunk-boundaries,
registers, instructions and memories. I failed to find a
Little-Endians' system which is totally consistent.
... (Discussion in similar vein of various transmission protocols)
SUMMARY (of the Transmission-Order section)
There are two camps each with its own language. These languages are as
compatible with each other as any Semitic and Latin languages are.
All Big-Endians can talk to each other with relative ease.
So can all the Little-Endians, even though there are some differences
among the dialects used by different tribes.
There is no middle ground. Only one end can go first.
CONCLUSION
Each camp tries to convert the other. As in all the religious wars of
the past, logic is not the decisive tool. Power is. This holy war is
not the first one, and probably will not be the last one either.
The "Be reasonable, do it my way" approach does not work. Neither does
the Esperanto approach of "let's all switch to yet a new language".
Our communication world may split according to the language used. A
certain book (which is NOT mentioned in the references list) has an
interesting story about a similar phenomenon, the Tower of Babel.
Little-Endians are Little-Endians and Big-Endians are Big-Endians and
never the twain shall meet.
We would like to see some Gulliver standing up between the two islands,
forcing a unified communication regime on all of us. I do hope that my
way will be chosen, but I believe that, after all, which way is chosen
does not make too much difference. It is more important to agree upon
an order than which order is agreed upon.
How about tossing a coin ???
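Once an order has been agreed upon, each side can honor it regardless
of its own memory order by building messages with shifts instead of
copying memory. The sketch below assumes, purely for illustration, that
the coin came up "big end first"; put_be32 and get_be32 are
hypothetical names, not part of any protocol cited in the memo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v into the message with the most significant byte first.
   Shifts operate on values, so the result is the same on any host. */
void put_be32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)((v >> 24) & 0xFFu);
    p[1] = (uint8_t)((v >> 16) & 0xFFu);
    p[2] = (uint8_t)((v >> 8) & 0xFFu);
    p[3] = (uint8_t)(v & 0xFFu);
}

/* Read back a 32-bit value that was sent most significant byte first. */
uint32_t get_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

Had the coin landed the other way, only the four index expressions
would change; the callers on both islands would not.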
...
For ease of reference please note that Lilliput and Little-Endians
both start with an "L", and that both Blefuscu and Big-Endians start
with a "B". This is handy while reading this note.
R E F E R E N C E S
[1] Bolt Beranek & Newman.
Report No. 1822: Interface Message Processor.
Technical Report, BB&N, May, 1978.
[2] CCITT.
Orange Book. Volume VIII.2: Public Data Networks.
International Telecommunication Union, Geneva, 1977.
[3] DEC.
PDP11 04/05/10/35/40/45 processor handbook.
Digital Equipment Corp., 1975.
[4] DEC.
VAX11 - Architecture Handbook.
Digital Equipment Corp., 1979.
[5] Knuth, D. E.
The Art of Computer Programming. Volume I: Fundamental
Algorithms.
Addison-Wesley, 1968.
[6] Swift, Jonathan.
Gulliver's Travels.
Unknown publisher, 1726.
OTHER SLIGHTLY RELATED TOPICS (IF AT ALL)
Who's on first? Zero or One ??
People start counting from the number ONE. The very word FIRST is
abbreviated into the symbol "1st" which indicates ONE, but this is a
very modern notation. The older notions do not necessarily support this
relationship.
In English and French - the word "first" is not derived from the word
"one" but from an old word for "prince" (which means "foremost").
Similarly, the English word "second" is not derived from the number
"two" but from an old word which means "to follow". Obviously there is
a close relation between "third" and "three", "fourth" and "four" and
so on.
Similarly, in Hebrew, for example, the word "first" is derived from the
word "head", meaning "the foremost", but not specifically No. 1. The
Hebrew word for "second" is specifically derived from the word "two".
The same for three, four and all the other numbers.
...
SWIFT's POINT
It may be interesting to notice that the point which Jonathan Swift
tried to convey in Gulliver's Travels is exactly the opposite of the
point of this note.
Swift's point is that the difference between breaking the egg at the
little-end and breaking it at the big-end is trivial. Therefore, he
suggests that everyone do it in his own preferred way.
We agree that the difference between sending eggs with the little- or
the big-end first is trivial, but we insist that everyone must do it in
the same way, to avoid anarchy. Since the difference is trivial we may
choose either way, but a decision must be made.
--
=Spencer ({ihnp4,decvax}!utah-cs!thomas, thomas at utah-cs.ARPA)