Representation of integral types
Mark Brader
msb at sq.com
Mon Apr 17 09:24:22 AEST 1989
This comment was posted to comp.lang.c with "Distribution: na" and
"Subject: Re: calloc (actually NULL =?= 0)". I've added comp.std.c and
directed followups to there.
> Actually, I can't see any particular reason for (int)0 to be a zero
> bit pattern either (unless it's mandated by pANS).
Following are excerpts from section 3.1.2.5, "Types".
# There are four "signed integer types", designated as "signed char", "short
# int", "int", and "long int". (The signed integer and other types may also
# be designated in several additional ways, as described in section 3.5.2.)
# ...
# For each of the signed integer types, there is a corresponding (but
# different) "unsigned integer type" (designated with the keyword "unsigned")
# that uses the same amount of storage (including sign information) and has
# the same alignment requirements. The range of nonnegative values of a
# signed integer type is a subrange of the corresponding unsigned integer
# type, and the representation of the same value in each type is the same.*
Footnote:
* The same representation and alignment requirements are meant to imply
* interchangeability as arguments to functions, return values from functions,
* and members of unions.
# ...
# The type "char", the signed and unsigned integer types, and the enumerated
# types are collectively called "integral types". The representations of
# integral types shall define values by use of a pure binary numeration
# system.*
Footnote:
* A positional representation for integers that uses the binary digits 0
* and 1, in which the values represented by successive bits are additive,
* begin with 1, and are multiplied by successive integral powers of 2,
* except perhaps the bit with the highest position. (Adapted from the
* "American National Dictionary for Information Processing Systems.")
Now, let me review the usual representations of integers in binary.
Pretend that integers are only 3 bits long, so there are only 8 possible
bit patterns; that way we can enumerate them all in one line of this
article. The "usual" interpretations are:
Bits             000  001  010  011  100  101  110  111
unsigned           0    1    2    3    4    5    6    7
2's complement     0    1    2    3   -4   -3   -2   -1
1's complement     0    1    2    3   -3   -2   -1   -0
sign-magnitude     0    1    2    3   -0   -1   -2   -3
The intention of the second-quoted footnote is to allow each of the
interpretations tabulated above. Note that "all zero-bits" is an
integer 0 in each one of them.
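As a rough illustration of the table above (not part of the pANS text), the four interpretations can be sketched in C for the pretend 3-bit word. The names WIDTH, SIGN_BIT, MASK and the as_*() helpers are hypothetical, chosen only for this example; the word is simulated in an ordinary unsigned int.

```c
/* Decode a simulated 3-bit pattern (0..7) under each "usual" scheme.
   All names here are illustrative assumptions, not from the pANS. */
enum { WIDTH = 3, SIGN_BIT = 1 << (WIDTH - 1), MASK = (1 << WIDTH) - 1 };

int as_unsigned(unsigned bits)
{
    return (int)(bits & MASK);
}

int as_twos_complement(unsigned bits)
{
    bits &= MASK;
    /* The top bit has weight -(2^(WIDTH-1)). */
    return (bits & SIGN_BIT) ? (int)bits - (1 << WIDTH) : (int)bits;
}

int as_ones_complement(unsigned bits)
{
    bits &= MASK;
    /* The top bit has weight -(2^(WIDTH-1) - 1); 111 decodes as "-0", i.e. 0. */
    return (bits & SIGN_BIT) ? (int)bits - MASK : (int)bits;
}

int as_sign_magnitude(unsigned bits)
{
    int magnitude;

    bits &= MASK;
    magnitude = (int)(bits & (SIGN_BIT - 1));
    /* The top bit only negates; 100 decodes as "-0", i.e. 0. */
    return (bits & SIGN_BIT) ? -magnitude : magnitude;
}
```

Calling any of the four helpers with the pattern 000 yields 0, which is the point made above.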
The issue of "-0" is one on which I have seen different opinions. My point
of view is that since -0 is mathematically equivalent to 0, there is only
one *value* there, and since the pANS speaks of "the" representation of a
value, it can have only one representation. Consequently, I feel that a
conforming 1's complement implementation, for instance, is required to
silently convert any instance of the all-1's bit pattern to all-0's before
doing any bitwise operations on it.
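To make the "silently convert" idea concrete, here is a minimal sketch of what such a normalization step might look like on a simulated 1's complement word. The function name normalize_negative_zero and its parameters are hypothetical, and the sketch assumes the simulated width is smaller than the width of unsigned int on the host.

```c
/* Map the all-1's pattern ("-0") of a simulated 1's complement word
   to all-0's, leaving every other pattern unchanged.
   Assumption: 0 < width < number of bits in unsigned int. */
unsigned normalize_negative_zero(unsigned bits, unsigned width)
{
    unsigned mask = (1u << width) - 1u;   /* width low-order 1 bits */

    bits &= mask;
    return (bits == mask) ? 0u : bits;
}
```

With the 3-bit word used in this article, the pattern 111 would become 000 while 101 (that is, -2) would be left alone.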
A second issue is whether the following interpretation is allowed:
Bits             000  001  010  011  100  101  110  111
"unsigned"         0    1    2    3    0    1    2    3
The question here is whether such an interpretation is "using" all
the bits of the storage, as required by the quoted paragraph about
unsigned types. On a machine where "int", and therefore "unsigned int",
are 16-bit types, "unsigned int" could not use this representation because
the highest unsigned int value would be 32767 and the pANS requires it to
be at least 65535. But on a machine where ints were 18 bits, it might
(depending on this point of interpretation) be permissible for unsigned
ints to use only 17 of their 18 bits and have the same highest value,
131071, as ints.
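The arithmetic behind the 16-bit and 18-bit cases can be checked with a short sketch. It tests only the range requirement (a maximum of at least 65535 for unsigned int); it says nothing about whether such an interpretation "uses" the storage, which is the open question. The names below are hypothetical.

```c
/* Minimum value the pANS requires for the largest unsigned int. */
#define MIN_UINT_MAX 65535UL

/* Largest value representable if the top bit of a width-bit word is
   ignored, i.e. 2^(width-1) - 1.  Assumption: 1 <= width <= 32. */
unsigned long max_with_top_bit_unused(unsigned width)
{
    return (1UL << (width - 1)) - 1UL;
}

/* Nonzero if ignoring the top bit still meets the range requirement. */
int top_bit_may_be_unused(unsigned width)
{
    return max_with_top_bit_unused(width) >= MIN_UINT_MAX;
}
```

For width 16 the maximum is 32767, which is too small; for width 18 it is 131071, which satisfies the range requirement.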
I think I've heard it said both that allowing this was an error and that
it was intentional and some implementation was using it. I don't know
which is right.
A third issue occurred to me as I was writing this article. I see nothing
in all of this text to prohibit the FOLLOWING interpretations:
Bits               000  001  010  011  100  101  110  111
"unsigned"           4    5    6    7    0    1    2    3
"2's complement"    -4   -3   -2   -1    0    1    2    3
"1's complement"    -3   -2   -1   -0    0    1    2    3
"sign-magnitude"    -0   -1   -2   -3    0    1    2    3
In THIS interpretation, all 0-bits is NOT necessarily a zero value!
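As a sketch of why this matters for the calloc question, the first two rows of that table can be written as decoders for the pretend 3-bit word. Both rows amount to treating the top bit as inverted. The names N_BITS, TOP_BIT, ALL_BITS and the flipped_*() helpers are hypothetical, used only for illustration.

```c
/* Decode a simulated 3-bit pattern under the "top bit inverted"
   readings tabulated above.  Names are illustrative assumptions. */
enum { N_BITS = 3, TOP_BIT = 1 << (N_BITS - 1), ALL_BITS = (1 << N_BITS) - 1 };

/* "unsigned" row: 000 -> 4, ..., 011 -> 7, 100 -> 0, ..., 111 -> 3 */
int flipped_unsigned(unsigned bits)
{
    return (int)((bits ^ TOP_BIT) & ALL_BITS);
}

/* "2's complement" row: 000 -> -4, ..., 100 -> 0, ..., 111 -> 3 */
int flipped_twos_complement(unsigned bits)
{
    return (int)(bits & ALL_BITS) - TOP_BIT;
}
```

Under either decoder the all-zero pattern is nonzero (4 and -4 respectively), and the value 0 is stored as 100, so memory cleared to all 0-bits would not read back as integer zero.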
If this is indeed a loophole, it's clear from other places in the pANS
that it was unintentional. For example, the "null character" is defined in
section 2.2.1 to have all bits zero, and the Examples in 3.1.3.4 say that
'\0' is commonly used to represent it, and the body of 3.1.3.4 requires '\0'
to be a synonym for 0. But we can't deduce that 0 has to have all bits 0
from this, because the Examples are not part of the pANS proper. (Neither
are the footnotes, for that matter, and I did complain about the material
in the second-quoted footnote belonging in my opinion in the text proper.)
--
Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb at sq.com
"I'm a little worried about the bug-eater," she said. "We're embedded
in bugs, have you noticed?" -- Niven, "The Integral Trees"
This article is in the public domain.