Imprecision and Overflow

Have you heard about the Boeing 787 that needed to be rebooted after a set number of days in order to prevent potentially catastrophic consequences? This was due to a bug where a counter in the generator overflowed once it could no longer store the number of seconds it had been running since the last reboot. How about the Y2K scare, when we thought the world was going to end as computer clocks rolled over from 1999 to 2000? That was a similar issue, and it all has to do with bits and the way we use them. Problems like this are a more common occurrence in the world of computing than you might think. We humans have the luxury of an infinite set of real numbers to tackle our problems with, but in the world of computers, where we use bits to represent real numbers, we often run into issues because we only have access to a finite number of them.

So what happens when you are trying to compute something and you run out of bits? The answer is that it depends on the data type. Since our computer's memory is finite to begin with, we can only afford to allocate a certain number of bits to each of our data types, such as the well known int and float. As our programs run, the values stored in our variables can grow beyond what their allotted bits can represent, and then we run into issues that programmers need to be aware of.

Thus we enter the world of 'floating point imprecision' and 'integer overflow'. To demonstrate these issues we will write a couple of programs in the C programming language.

Floating Point Imprecision

When we need to work with a decimal number, the computer converts it to a floating point representation in binary. However, due to the aforementioned finite number of bits assigned to our data types, the converted number is a close approximation rather than an exact value. This leads to small errors which can accumulate within our calculations, producing imprecise results.
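You can see this for yourself by printing a value like 0.1 with more decimal places than usual; a small sketch:

    #include <stdio.h>

    int main(void)
    {
        float f = 0.1;

        // with 20 decimal places the stored approximation becomes visible:
        // the printed value is close to, but not exactly, 0.1
        printf("%.20f\n", f);
    }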

In this program we simply add y to x ten times and then print the result to the console.
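A minimal sketch of such a program, assuming x starts at 1 and y is 0.1, could look like this:

    #include <stdio.h>

    int main(void)
    {
        // starting value and the amount we add on each iteration
        float x = 1;
        float y = 0.1;

        // add y to x ten times
        for (int i = 0; i < 10; i++)
        {
            x += y;
        }

        printf("%f\n", x);
    }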

The program prints 2.0, which seems correct, and we move on. But is it correct? As we mentioned already, when we convert our decimal numbers to floating point representations in binary, the results are approximations and thus contain small errors. We should be able to see this accumulation of errors if we simply increase the number of times the loop iterates to a larger number like 50000. Ignoring the error entirely, the expected result would be 1 plus 50000 times 0.1, which amounts to 5001.
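Only the loop bound in the sketch needs to change:

    #include <stdio.h>

    int main(void)
    {
        float x = 1;
        float y = 0.1;

        // iterate 50000 times instead of 10
        for (int i = 0; i < 50000; i++)
        {
            x += y;
        }

        printf("%f\n", x);
    }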

Let's see what the computer thinks.

We can clearly see that the error indeed exists and that it accumulates as expected. Sometimes the result will be higher than the expected number and sometimes lower, but the point is that the error is there and it affects our computation. The longer we let our program run, the bigger the error will become.

With our understanding of the issue so far, it should be clear that it is not possible to entirely eliminate floating point imprecision, because it is a fundamental limitation of representing real numbers with a finite number of bits. The solution to our predicament is to "lessen" its impact by using better, i.e. more precise, data types. When we say more precise, we actually mean that more bits are used to represent the data held within the data type, which in theory should make our problem less noticeable. In C, a 'more precise' data type than float is double, or double-precision floating point. Let us change our program so that it uses double instead of float and then check the results.
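A sketch with double, keeping the same structure as before (the %f specifier still works, since float arguments are promoted to double in printf anyway):

    #include <stdio.h>

    int main(void)
    {
        // double-precision instead of single-precision
        double x = 1;
        double y = 0.1;

        for (int i = 0; i < 50000; i++)
        {
            x += y;
        }

        printf("%f\n", x);
    }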

This time, with the improved precision, the program prints a result that matches the expected 5001.

This seems to fix the problem, although it is important to understand that the issue cannot truly be fixed; it can only be mitigated with "better" data types, which should be chosen carefully according to the needs of the application being designed. In fact, if we change our program to iterate 5000000 times instead of 50000, we will see the error become noticeable again. For scientific or financial applications where precision is key, there are more precise data types that can be used, such as long double. Additionally, there are libraries that improve upon the issue, such as Multiple Precision Floating-point Reliable (MPFR) and GNU Multiple Precision (GMP).
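For illustration, a long double sketch would look much the same; only the declarations, the iteration count, and the format specifier change (how much extra precision long double actually provides depends on the platform):

    #include <stdio.h>

    int main(void)
    {
        long double x = 1;
        long double y = 0.1L;  // the L suffix makes the constant a long double

        for (int i = 0; i < 5000000; i++)
        {
            x += y;
        }

        printf("%Lf\n", x);  // %Lf is the printf conversion for long double
    }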

Integer Overflow

Similarly, when we talk about integer overflow we are facing the problem of our int data type running out of bits to represent a value. When that happens, it effectively "rolls over" from its maximum, which just "overflowed", to its minimum.

The maximum and minimum values of an integer are determined by the number of bits used to represent it. In most modern computers, integers are represented using a fixed number of bits, typically 8, 16, 32, or 64, and that width determines the range of values the integer can hold.

For example, an 8-bit signed integer can represent values from -128 to 127, while a 16-bit signed integer can represent values from -32768 to 32767. A 32-bit integer covers -2147483648 to 2147483647, and a 64-bit integer covers -9223372036854775808 to 9223372036854775807. In general, an n-bit signed integer in two's complement representation spans -2^(n-1) to 2^(n-1) - 1.
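You can check these limits on your own system with the constants defined in <limits.h>; a small sketch:

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        // the exact values depend on the platform's type sizes
        printf("int:          %d to %d\n", INT_MIN, INT_MAX);
        printf("long:         %ld to %ld\n", LONG_MIN, LONG_MAX);
        printf("unsigned int: 0 to %u\n", UINT_MAX);
    }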

Let's write a program to demonstrate integer overflow in action. In the following program we will have a never-ending for-loop which doubles the value of i on each iteration. We use the <unistd.h> header for the sleep function, which delays each iteration of the loop by one second so we can follow the results in real time.
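A minimal sketch, assuming i starts at 1, could look like this:

    #include <stdio.h>
    #include <unistd.h>  // for sleep()

    int main(void)
    {
        // never-ending loop: i doubles on every iteration
        // note: signed overflow is technically undefined behaviour in C,
        // but on typical hardware the value wraps around as described below
        for (int i = 1; ; i *= 2)
        {
            printf("%i\n", i);
            sleep(1);  // wait one second so we can follow along
        }
    }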

As i gets into the billions by reaching 1073741824, it tries to double itself one more time, which would take it to 2147483648, just 1 higher than the maximum positive value of an int on my system. Exceeding this value means the int overflows its allocated bit size and wraps around to its minimum value.

As with floating point imprecision, we can use a "better" data type to accommodate our need for an integer substantially larger than two billion. In this case we could use the long int data type, and if you need an even higher range, you could opt for long long int.
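A sketch of the long version, following the same structure:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        for (long int i = 1; ; i *= 2)
        {
            printf("%li\n", i);  // %li instead of %i for a long int
            sleep(1);
        }
    }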

Notice how in the printf function we changed the format code from %i to %li to account for our data type being of the long variety this time around. It should take substantially more time to reach the maximum.

Works like a charm.

Another way to increase the maximum limit of your integers without moving to a bigger data type such as long is to use unsigned int instead of plain int. An unsigned int cannot hold negative numbers, but it keeps the same number of bits, so its entire range shifts to the positive side and its positive maximum is greater than that of a normal int. On my machine that maximum is 4294967295.
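A sketch of the unsigned version; note that unlike signed overflow, unsigned wrap-around is well defined in C, so once the value exceeds the maximum it simply wraps back around past zero:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        for (unsigned int i = 1; ; i *= 2)
        {
            printf("%u\n", i);  // %u is the conversion for an unsigned int
            sleep(1);
        }
    }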


In conclusion, floating point imprecision and integer overflow are important issues that can affect the accuracy and correctness of a program. You can read about some real-world problems caused by them over at Wikipedia. These examples demonstrate the importance of being aware of floating point imprecision and integer overflow, and of taking the appropriate measures to prevent serious problems in real-world applications.

Resources

  1. If you would like to learn more about how to calculate the minimum and maximum values for data types, you can read this great post by Nickolas Teixeira Lanza on Medium.
  2. If you would like to learn more about bits and how we've used them to design computer programming as we know it, check out the beginning of CS50 – Introduction to Computer Science by Harvard. It's free!
