Floating point puzzle
Larry Riddle
riddle at emory.uucp
Sun Aug 7 13:15:22 AEST 1988
The following is a very simple C program, compiled and run on a Sun-4
with no special command-line options.
#include <stdio.h>

main()
{
    float x, y;

    x = 1.0/10.0;
    y = 1677721.0/16777216.0;
    printf("x: %x", x);
    printf("%20.17f\n", x);
    printf("y: %x", y);
    printf("%20.17f\n", y);
}
Here is the output:
x: 3fb99999 0.10000000149011612
y: 3fb99999 0.09999996423721313
Notice that x and y, which are declared as floats and thus have a
32-bit representation (according to the manual this follows the IEEE
floating-point standard), print identically in hex but give different
values when printed as floats. I believe the hex is a straight
translation of the internal bit representation. The divisions used to
compute x and y are done in double precision (64 bits) and the results
are then converted to floats.
Can anyone enlighten me on why this output comes out this way?
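In case anyone wants to poke at this, here is a small variation
(untried, and assuming unsigned long is 32 bits on the Sun-4) that
stores each float in a union so that %lx is handed an integer of the
matching size instead of the float itself:

#include <stdio.h>

main()
{
    union { float f; unsigned long u; } x, y;

    x.f = 1.0/10.0;
    y.f = 1677721.0/16777216.0;

    /* print the 32-bit pattern stored in each float, then its value */
    printf("x: %08lx %20.17f\n", x.u, x.f);
    printf("y: %08lx %20.17f\n", y.u, y.f);
}

If the hex printed by the original program really is a straight
translation of the float bits, the two versions should agree.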
**********
According to what I have read about the IEEE standard, floats should
have 1 sign bit, a biased exponent of 8 bits, and a 23-bit normalized
mantissa. However, my experiments seem to imply that floats have an
11-bit biased exponent (offset by 1023) and only a 20-bit normalized
mantissa, exactly the same as doubles, except that a double has a
52-bit mantissa. For example, the bit pattern 3fb99999 given above for
1/10 corresponds to
sign  exponent     mantissa
  0   01111111011  10011001100110011001
The 11 bits of this exponent give 1019 - 1023 = -4, which, coupled with
the mantissa, gives the binary number
.0001100110011001100110011001...
which is the (non-terminating) binary representation of 1/10. Notice
also that this 32-bit representation has been chopped rather than
rounded.
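For reference, here is a rough sketch (again untried, and assuming a
32-bit unsigned long) that splits a 32-bit pattern into the 1 sign /
8 exponent / 23 mantissa fields the standard describes for floats, so
the pattern above can be checked against the single-precision layout
directly:

#include <stdio.h>

main()
{
    unsigned long w = 0x3fb99999L;            /* pattern printed above */
    unsigned long sign = (w >> 31) & 0x1;     /* 1 sign bit */
    unsigned long exp  = (w >> 23) & 0xff;    /* 8 exponent bits, bias 127 */
    unsigned long mant = w & 0x7fffffL;       /* 23 mantissa bits */

    printf("sign %lu  exponent %lu (unbiased %ld)  mantissa 0x%06lx\n",
           sign, exp, (long)exp - 127, mant);
}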
I don't understand this discrepancy either. Any suggestions?
Thanks.
***********
--
Larry Riddle | gatech!emory!riddle USENET
Emory University | riddle at emory CSNET,BITNET
Dept of Math and CS | riddle.emory at csnet-relay ARPANET
Atlanta, Ga 30322 | (404) 727-7922 AT&T