IEEE floating point format

Sat Jul 29 12:28:38 AEST 1989

In article <2170002 at hpldsla.HP.COM>, manoj at hpldsla.HP.COM (Manoj Joshi) writes:

> What is the format for the IEEE floating point storage
> convention? In other words (for a 32-bit float) where is 
> the exact position of the 4 fields (1 Byte each):

> Similarly how is this stored in a 64-bit double precision
> real number? 

The IEEE spec gives the format only modulo permutation of the bits.
That is, different machines are allowed to put the bits in different
parts of the word.

The format is:

	field		32-bit format		64-bit format
			(bits)			(bits)

	sign			1			1
	exponent		8			11
	fraction		23			52
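
To make the field widths concrete, here is a small sketch in Python that
splits a bit pattern into its three fields.  It assumes the pattern is
already in hand as an integer with the sign in the most significant bit,
then the exponent, then the fraction; the names WIDTHS and split_fields
are mine, not anything from the spec.

```python
# Field widths in bits: (sign, exponent, fraction).
WIDTHS = {32: (1, 8, 23), 64: (1, 11, 52)}

def split_fields(bits, size=32):
    """Split an integer bit pattern into (sign, exponent, fraction).

    Assumes sign is the top bit, followed by exponent, then fraction.
    """
    _, exp_w, frac_w = WIDTHS[size]
    fraction = bits & ((1 << frac_w) - 1)
    exponent = (bits >> frac_w) & ((1 << exp_w) - 1)
    sign = bits >> (frac_w + exp_w)
    return sign, exponent, fraction
```

Getting from bytes in memory to that integer is the machine-dependent
part, and the sketch deliberately leaves it out.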

If the exponent is all 0-bits or all 1-bits, the number is a
special case that I'll discuss later.  Otherwise, flip the
high-order bit of the exponent and treat the result as a
2's-complement number.  Put a binary point between the first
fraction bit and the rest of them.  Put a 1 ahead of all the
fraction bits.  The value of the number is the fraction times
2^exponent, negated if the sign bit is 1.
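
The recipe above can be followed step by step in code.  This is only a
sketch for the 32-bit format, assuming the bit pattern is given as an
integer with the sign bit on top; decode_normal is a name I made up.

```python
def decode_normal(bits):
    """Decode a normal 32-bit pattern by the flip-the-high-bit recipe."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent in (0x00, 0xFF):
        raise ValueError("special case, not a normal number")
    # Flip the high-order exponent bit, then read the 8 bits as a
    # 2's-complement number.
    flipped = exponent ^ 0x80
    if flipped >= 0x80:
        flipped -= 0x100
    # Put a 1 ahead of the 23 fraction bits, with the binary point after
    # the first fraction bit, so the result lies in [2, 4).
    significand = ((1 << 23) | fraction) / (1 << 22)
    value = significand * 2.0 ** flipped
    return -value if sign else value

print(decode_normal(0x3F800000))  # 1.0
```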

For example, the 32-bit representation of 1 is:

	sign		0
	exponent	01111111			(means -1)
	fraction	(1)0.0000000000000000000000	(means 2)

So the number is 2*(2^-1) or 1.  This number might be
represented this way:

	0 01111111 00000000000000000000000		0x3F800000

and indeed it is on some machines.
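
One way to see both the pattern and the machine dependence is Python's
struct module, which can pack a float in either byte order.  This is
just an illustration; the variable names are mine.

```python
import struct

big = struct.pack(">f", 1.0)     # most significant byte first
little = struct.pack("<f", 1.0)  # least significant byte first

print(big.hex())     # 3f800000
print(little.hex())  # 0000803f
```

The same 32 bits come out in both cases; only the order of the bytes in
storage differs.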

Now the special cases:

	exponent all 1's	fraction == 0		infinity
	exponent all 1's	fraction != 0		NaN
	exponent all 0's				denormalized

Infinity can be positive or negative.  NaN means `not a number'
and is used to signal things like 0 divided by 0 or infinity - infinity.
Denormalized numbers are little tiny numbers too small to represent
otherwise -- for denormalized numbers the leading 1 isn't added
and the exponent is offset by 1 to compensate.  If the exponent and
fraction bits are all 0, the number is 0 (positive or negative,
according to the sign bit).
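
The special cases can be sorted out with a few comparisons.  Again a
sketch for the 32-bit format only, assuming the pattern is an integer
with the sign bit on top; classify and decode_denormal are hypothetical
names.

```python
def classify(bits):
    """Name the case a 32-bit pattern falls into."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    return "normal"

def decode_denormal(bits):
    """Value of a 32-bit pattern whose exponent field is all 0s."""
    sign = bits >> 31
    fraction = bits & 0x7FFFFF
    # No leading 1 is added, and the power of two is -126, the same as
    # for the smallest normal numbers.
    value = fraction / (1 << 23) * 2.0 ** -126
    return -value if sign else value
```

For instance, the pattern 0x00000001 is the smallest positive
denormalized number, 2^-149.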

-- 
				--Andrew Koenig
				  ark at europa.att.com