ANSIfication: value preserving rules
Chris Torek
chris at mimsy.UUCP
Sun Apr 10 19:20:34 AEST 1988
Several people have expressed confusion over the difference between
`sign preserving' rules and `value preserving' rules. These rules
control the result when the compiler has to expand an unsigned char,
unsigned short, or unsigned int value to a larger type. (From here on
the unsigned prefix will be abbreviated |u_|.)
The first kind of expansion happens whenever an object of type |u_char|
or |u_short| appears in an expression. The object must be widened to
|int| or |u_int|. The second occurs when |u_int| values (possibly
produced by the former expansion) are mixed with |long| or |u_long|
values in any arithmetic expression.
The `sign preserving' rules can be stated in four words: the result is
unsigned. The table below shows the result of each conversion
(u_int:long means u_int in long context):

	    SIGN PRESERVING RULES
	input type	output type
	----------	-----------
	u_char		u_int
	u_short		u_int
	u_int		u_int
	u_int:long	u_long

The `value preserving' rule table looks like this:

	    VALUE PRESERVING RULES
	input type	output type
	----------	-----------
	u_char		int or u_int
	u_short		int or u_int
	u_int		u_int
	u_int:long	long or u_long

Whether |int| or |u_int| (|long| or |u_long|) is chosen depends on
whether |int| (|long|) can hold all the values of the input type.
More specifically, on a machine with 16-bit |int|s and 32-bit
|long|s (e.g., IBM PC, PDP-11, some 68000 systems), the table
looks like this:

	VALUE PRESERVING RULES FOR PDP-11/IBM-PC
	input type	output type
	----------	-----------
	u_char		int
	u_short		u_int
	u_int		u_int
	u_int:long	long

whereas on a 32-bit |int| and |long| machine (e.g., VAX, IBM PS/2
in 386 mode, most 68000 systems), it appears instead as

	VALUE PRESERVING RULES FOR VAX/SUN/IBM PS2
	input type	output type
	----------	-----------
	u_char		int
	u_short		int
	u_int		u_int
	u_int:long	u_long

The Rationale provides the following, er, rationale:

    The unsigned preserving rules greatly increase the number of
    situations where |unsigned int| confronts |signed int| [in an
    expression] to yield a questionably signed result [where a negative
    number suddenly becomes a large positive number, a possibly
    unintended result], whereas the value preserving rules minimize
    such confrontations.  Thus, the value preserving rules were
    considered to be safer for the novice, or unwary, programmer.
    After much discussion, the Committee decided in favor of value
    preserving rules, despite the fact that the UNIX C compilers had
    evolved in the direction of unsigned preserving.

    QUIET CHANGE
	A program that depended upon unsigned preserving arithmetic
	conversions will behave differently, probably without
	complaint.  This is considered the most serious semantic
	change made by the Committee to a widespread current practice.

I claim that the value-preserving rules are no easier for novices,
particularly because the expansion of |u_short| is so terribly
context-dependent. One might note that the following prints
"conformant" twice on every existing conformant implementation:

	unsigned char uc = -1;
	unsigned int ui = -1;

	if (-uc < 0)
		printf("conformant\n");
	if (-ui > 0)
		printf("conformant\n");

We are supposed to believe that this is somehow less confusing than the
alternative (-uc > 0, -ui > 0). The Rationale notes that the behaviour
of expressions such as
if (-(unsigned short)-1 < 0)
is machine-dependent, without going so far as to give examples like
those above. It also notes that all the ambiguity (along with the
default rules) can be eliminated with judicious use of casts. Why
not, then, ask novices always to write those casts, and/or to remember
the rule `unsigned widens to unsigned'?
I find it significant that the unsigned preserving rules can be stated
in four words, while the value preserving rules require a paragraph
full of conditional wording. How can something that is that hard to
say be `safer'? As for the argument that the value-preserving rules
minimise the presence of mixed signed and unsigned operations, I submit
that a majority of these will occur between |u_int| and |long| objects,
and I note that in this case, on most modern systems (counting the
80386 as modern, but not the 286), the value preserving rules help
not at all.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris at mimsy.umd.edu Path: uunet!mimsy!chris