The imprecision of double precision floating point: a demonstration

Alonso Del Arte
4 min read · Mar 2, 2024
Photo by Lucas Kepner on Unsplash

Many beginning programmers, and even a lot of professional programmers, don’t understand that the floating point number format is inherently imprecise. Any number can be represented in floating point if we are willing to allow infinite variance between the number and its representation.

In other words, the result of a floating point calculation can be wrong by a factor of infinity. The demonstration I’ll give here today will be far less dramatic than that, but important nevertheless.
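As a quick taste of that imprecision before the main demonstration, consider the classic example of adding 0.1 and 0.2 as 64-bit floating point values in Java. This is just a minimal sketch, and the class name is purely illustrative:

public class ImprecisionTaste {

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        // Neither 0.1 nor 0.2 has an exact binary representation, so the
        // sum prints as 0.30000000000000004 rather than exactly 0.3.
        System.out.println("0.1 + 0.2 = " + sum);
        System.out.println("Equal to 0.3? " + (sum == 0.3));
    }

}

The equality check prints false, even though on paper the sum is exactly 0.3.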

Many computer chips available today can perform arithmetic on 32-bit floating point numbers and 64-bit floating point numbers. The former are so-called “single precision,” and the latter are so-called “double precision.”

A lot of computer programming languages offer these as numeric primitive types that conform to the IEEE-754 standard for floating point numbers.

In Java, for example, 32-bit floating point numbers are of type float and 64-bit floating point numbers are of type double, on account of being “double precision.”
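To get a rough sense of how much more precision a double carries than a float, consider dividing 1 by 3 with each type. This is just a small sketch with an illustrative class name, not the main demonstration:

public class SingleVersusDouble {

    public static void main(String[] args) {
        float singleThird = 1.0F / 3.0F;
        double doubleThird = 1.0 / 3.0;
        // A float carries roughly 7 significant decimal digits, a double
        // roughly 15 or 16, but neither can hold one third exactly.
        System.out.println("As float:  " + singleThird);
        System.out.println("As double: " + doubleThird);
    }

}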

There’s also the 16-bit floating point format, which is accordingly called “half precision.” I am unaware of any general purpose programming languages that support 16-bit floating point out of the box.
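There is no half precision primitive type in Java, but, if I recall correctly, Java 20 added conversion helper methods to the Float class. The following sketch (the class name is purely illustrative) round-trips a value through the 16-bit format to show how much precision gets thrown away:

public class HalfPrecisionRoundTrip {

    public static void main(String[] args) {
        float value = 0.1F;
        // Float.floatToFloat16 and Float.float16ToFloat were added in Java 20.
        short halfBits = Float.floatToFloat16(value);
        float roundTripped = Float.float16ToFloat(halfBits);
        System.out.println("Original 32-bit value:   " + value);
        System.out.println("After 16-bit round trip: " + roundTripped);
    }

}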


Alonso Del Arte is a Java and Scala developer from Detroit, Michigan. AWS Cloud Practitioner Foundational certified.