IEEE 754

IEEE 754 is a standardised way of storing floating point numbers with three components

  • A sign bit
  • A biased exponent
  • A normalised mantissa
TypeSignExponentMantissaBias
Single Precision (32 bit)1 (bit 31)8 (bit 30 - 23)23 (bit 22- 0)127
Double Precision (64 bit)1 (bit 63)11 (bit 62 - 52)52 (51 - 0)1023

The examples below all refer to 32 bit numbers, but the principles apply to 64 bit.

  • The exponent is an 8 bit unsigned number in biased form
    • To get the true exponent, subtract 127 from the binary value
  • The mantissa is a binary fraction, with the first bit representing , second bit , etc.
    • The mantissa has an implicit , so 1 must always be added to the mantissa

Formula

Decimal to Float

The number is converted to a binary fractional format, then adjusted to fit into the form we need. Take 12.375 for example:

  • Integer part
  • Fraction part

Combining the two parts yields . However, the standard requires that the mantissa have an implicit 1, so it must be shifted to the right until the number is normalised (ie has only 1 as an integer part). This yields . As this has been shifted, it is actually . The three is therefore the exponent, but this has to be normalised (+127) to yield 130 . The number is positive (sign bit zero) so this yields:

SignBiased ExponentNormalised Mantissa
01000 0010100011

Float to Decimal

Starting with the value 0x41C80000 = 01000001110010000000000000000000:

SignBiased ExponentNormalised Mantissa
01000 00111001
  • The exponent is 131, biasing (-127) gives 4
  • The mantissa is 0.5625, adding 1 (normalising) gives 1.5625
  • gives 25

Special Values

  • Zero
    • When both exponent and mantissa are zero, the number is zero
    • Can have both positive and negative zero
  • Infinity
    • Exponent is all 1s, mantissa is zero
    • Can be either positive or negative
  • Denormalised
    • If the exponent is all zeros but the mantissa is non-zero, then the value is a denormalised number
    • The mantissa does not have an assumed leading one
  • NaN (Not a Number)
    • Exponent is all 1s, mantissa is non-zero
    • Represents error values
ExponentMantissaValue
00
2550
0not 0denormalised
255not 0NaN