IEEE floating point format
Andrew Koenig
ark at alice.UUCP
Sat Jul 29 12:28:38 AEST 1989
In article <2170002 at hpldsla.HP.COM>, manoj at hpldsla.HP.COM (Manoj Joshi) writes:
> What is the format for the IEEE floating point storage
> convention? In other words (for a 32-bit float) where is
> the exact position of the 4 fields (1 Byte each):
> Similarly how is this stored in a 64-bit double precision
> real number?
The IEEE spec gives the format only modulo permutation of the bits.
That is, different machines are allowed to put the bits in different
parts of the word.
The format is:
field      32-bit format   64-bit format
sign             1               1
exponent         8              11
fraction        23              52
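
As a rough sketch of the 32-bit column, the following C pulls the three fields out of a float. The names split_float and struct float_fields are made up for illustration. It assumes that float is the 32-bit IEEE format, that the sign sits in the highest bit with the exponent next and the fraction lowest, and that float and uint32_t share a byte order; none of that is guaranteed on every machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a 32-bit float. */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, exactly as stored */
    uint32_t fraction;  /* 23 bits */
};

/* Assumes the bit order sign, exponent, fraction from high to low. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    /* The sketch only makes sense if float is 32 bits wide. */
    assert(sizeof f == sizeof bits);

    /* Copy the object representation instead of casting pointers. */
    memcpy(&bits, &f, sizeof bits);

    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

On a machine that matches those assumptions, split_float(1.0f) gives sign 0, exponent 0x7F and fraction 0.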
If the exponent is all 0-bits or all 1-bits, the number is a
special case that I'll discuss later. Otherwise, flip the high-order
bit of the exponent and treat the result as a 2's-complement number.
Put a 1 ahead of all the fraction bits, then put a binary point
between the first fraction bit and the rest of them. The value
of the number is the fraction times 2^exponent, negated if the
sign bit is 1.
For example, the 32-bit representation of 1 is:
sign      0
exponent  01111111                     (means -1)
fraction  (1)0.0000000000000000000000  (means 2)
So the number is 2*(2^-1) or 1. This number might be
represented this way:
0 01111111 00000000000000000000000 0x3F800000
and indeed it is on some machines.
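
The recipe above can be followed step by step in C. This is only a sketch: decode_normal is a made-up name, it takes the 32-bit pattern as an integer with the sign in the highest bit, and it handles only the ordinary case where the exponent is neither all 0's nor all 1's.

```c
#include <math.h>
#include <stdint.h>

/* Decode a normal 32-bit pattern by the recipe in the text.
   Assumes the layout sign, exponent, fraction from high bit to low. */
double decode_normal(uint32_t bits)
{
    uint32_t sign       = bits >> 31;
    uint32_t exp_field  = (bits >> 23) & 0xFFu;
    uint32_t frac_field = bits & 0x7FFFFFu;

    /* Flip the high-order exponent bit, then read the 8 bits
       as a 2's-complement number in the range -128..127. */
    int flipped  = (int)(exp_field ^ 0x80u);
    int exponent = flipped >= 128 ? flipped - 256 : flipped;

    /* Put a 1 ahead of the fraction bits to get a 24-bit integer.
       The binary point goes after the first fraction bit, which
       leaves 22 bits to its right, so scale by 2^-22. */
    double fraction = ldexp((double)(frac_field | 0x800000u), -22);

    double value = ldexp(fraction, exponent);
    return sign ? -value : value;
}
```

Calling decode_normal(0x3F800000) walks through the same arithmetic as the example: the exponent comes out as -1, the fraction as 2, and the result as 1.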
Now the special cases:
exponent all 1's    fraction == 0    infinity
exponent all 1's    fraction != 0    NaN
exponent all 0's                     denormalized
Infinity can be positive or negative. NaN means `not a number'
and is used to signal things like 0 divided by 0 or infinity - infinity.
Denormalized numbers are little tiny numbers too small to represent
otherwise -- for denormalized numbers the leading 1 isn't added
and the exponent is offset by 1 to compensate. If all bits are 0,
the number is 0.
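
For illustration, the special cases can be sorted out with a few comparisons on the exponent and fraction fields. The names float_class, classify_bits and decode_denormalized are invented for this sketch, which again assumes a 32-bit pattern with the sign in the highest bit. It ignores the sign bit when classifying, so a pattern with only the sign bit set is also reported as zero.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical labels for the cases described in the text. */
enum float_class {
    FC_ZERO, FC_DENORMALIZED, FC_NORMAL, FC_INFINITY, FC_NAN
};

enum float_class classify_bits(uint32_t bits)
{
    uint32_t exp_field  = (bits >> 23) & 0xFFu;
    uint32_t frac_field = bits & 0x7FFFFFu;

    if (exp_field == 0xFFu)     /* exponent all 1's */
        return frac_field == 0 ? FC_INFINITY : FC_NAN;
    if (exp_field == 0)         /* exponent all 0's */
        return frac_field == 0 ? FC_ZERO : FC_DENORMALIZED;
    return FC_NORMAL;
}

/* Value of a pattern whose exponent field is all 0's.
   No leading 1 is added. Flipping the high bit of 00000000 gives
   -128, and the offset of 1 makes that -127. With the binary point
   22 bits from the right, the total scale is 2^-22 * 2^-127 = 2^-149. */
double decode_denormalized(uint32_t bits)
{
    uint32_t sign       = bits >> 31;
    uint32_t frac_field = bits & 0x7FFFFFu;
    double value = ldexp((double)frac_field, -149);
    return sign ? -value : value;
}
```

With this sketch, classify_bits(0x7F800000) reports infinity and classify_bits(0x00000001) reports a denormalized number, the smallest positive one.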
--
--Andrew Koenig
ark at europa.att.com