Imprecision and Overflow

Have you heard about the Boeing 787 that needed to be rebooted after a set number of days in order to prevent potentially catastrophic consequences? This was due to a bug where a counter in the generator overflowed once it could no longer store the number of seconds it had been running since the last reboot. How about the Y2K scare, when we thought the world was going to end as computer clocks rolled over from 1999 to 2000? That was a similar issue, and it all has to do with bits and the way we use them. Problems like this are a more common occurrence in the world of computing than you might think. We humans have the luxury of an infinite set of real numbers to tackle our problems with, but in the world of computers, where we use bits to represent real numbers, we often run into issues because we only have access to a finite number of them.

So what happens when you are trying to compute something and you run out of bits? The answer is that it depends on the data type. Since our computer's memory is finite to begin with, we can only afford to allocate a certain number of bits to each of our data types, such as the well known int and float. As our programs run, the values stored in our variables can grow beyond what their allotted bits can represent, and then we run into issues that programmers need to be aware of.

Thus we enter the world of 'floating point imprecision' and 'integer overflow'. To demonstrate these issues we will write a couple of programs in the C programming language.

Floating Point Imprecision

When we need to work with a decimal number, the computer converts it to a floating point representation in binary. However, due to the aforementioned finite number of bits assigned to our data types, the converted number is a close approximation rather than an exact value. This leads to small errors which can accumulate within our calculations, producing imprecise results.
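You can see this for yourself by printing a value like 0.1 with more decimal places than usual; a small sketch:

    #include <stdio.h>

    int main(void)
    {
        float f = 0.1;

        // with 20 decimal places the stored approximation becomes visible:
        // the printed value is close to, but not exactly, 0.1
        printf("%.20f\n", f);
    }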

In this program we simply add y to x ten times and then print the result to the console.
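A minimal sketch of such a program, assuming x starts at 1 and y is 0.1, could look like this:

    #include <stdio.h>

    int main(void)
    {
        // starting value and the amount we add on each iteration
        float x = 1;
        float y = 0.1;

        // add y to x ten times
        for (int i = 0; i < 10; i++)
        {
            x += y;
        }

        printf("%f\n", x);
    }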

The program prints 2.0, which seems correct, and we move on. But is it correct? As we mentioned already, when we convert our decimal numbers to floating point representations in binary, the results are approximations and thus contain small errors. We should be able to see this accumulation of errors if we simply increase the number of times the loop iterates to a larger number like 50000. Ignoring the error entirely, the expected result would be 1 plus 50000 times 0.1, which amounts to 5001.
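Only the loop bound in the sketch needs to change:

    #include <stdio.h>

    int main(void)
    {
        float x = 1;
        float y = 0.1;

        // iterate 50000 times instead of 10
        for (int i = 0; i < 50000; i++)
        {
            x += y;
        }

        printf("%f\n", x);
    }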

Let's see what the computer thinks.

We can clearly see that the error indeed exists and that it accumulates as expected. Sometimes the result will be higher than the expected number and sometimes lower, but the point is that the error is there and it affects our computation. The longer we let our program run, the bigger the error will become.

With our understanding of the issue so far, it should be clear that it is not possible to entirely eliminate floating point imprecision, because it is a fundamental limitation of representing real numbers with a finite number of bits. The solution to our predicament is to "lessen" its impact by using better, i.e. more precise, data types. When we say more precise, we actually mean that more bits are used to represent the data held within the data type, which in theory should make our problem less noticeable. In C, a 'more precise' data type than float is double, or double-precision floating point. Let us change our program so that it uses double instead of float and then check the results.
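A sketch with double, keeping the same structure as before (the %f specifier still works, since float arguments are promoted to double in printf anyway):

    #include <stdio.h>

    int main(void)
    {
        // double-precision instead of single-precision
        double x = 1;
        double y = 0.1;

        for (int i = 0; i < 50000; i++)
        {
            x += y;
        }

        printf("%f\n", x);
    }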

This time, with the improved precision, the program prints a result that matches the expected 5001.

This seems to fix the problem, although it is important to understand that the issue cannot truly be fixed; it can only be mitigated with "better" data types, which should be chosen carefully according to the needs of the application being designed. In fact, if we change our program to iterate 5000000 times instead of 50000, we will see the error become noticeable again. For scientific or financial applications where precision is key, there are more precise data types that can be used, such as long double. Additionally, there are libraries that improve upon the issue, such as Multiple Precision Floating-point Reliable (MPFR) and GNU Multiple Precision (GMP).
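For illustration, a long double sketch would look much the same; only the declarations, the iteration count, and the format specifier change (how much extra precision long double actually provides depends on the platform):

    #include <stdio.h>

    int main(void)
    {
        long double x = 1;
        long double y = 0.1L;  // the L suffix makes the constant a long double

        for (int i = 0; i < 5000000; i++)
        {
            x += y;
        }

        printf("%Lf\n", x);  // %Lf is the printf conversion for long double
    }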

Integer Overflow

Similarly, when we talk about integer overflow we are facing the problem of our int data type running out of bits to represent a value. When that happens, it effectively "rolls over" from its maximum, which just "overflowed", to its minimum.

The maximum and minimum values of an integer are determined by the number of bits used to represent it. In most modern computers, integers are represented using a fixed number of bits, typically 8, 16, 32, or 64, and that width determines the range of values the integer can hold.

For example, an 8-bit signed integer can represent values from -128 to 127, while a 16-bit signed integer can represent values from -32768 to 32767. A 32-bit integer covers -2147483648 to 2147483647, and a 64-bit integer covers -9223372036854775808 to 9223372036854775807. In general, an n-bit signed integer in two's complement representation spans -2^(n-1) to 2^(n-1) - 1.
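You can check these limits on your own system with the constants defined in <limits.h>; a small sketch:

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        // the exact values depend on the platform's type sizes
        printf("int:          %d to %d\n", INT_MIN, INT_MAX);
        printf("long:         %ld to %ld\n", LONG_MIN, LONG_MAX);
        printf("unsigned int: 0 to %u\n", UINT_MAX);
    }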

Let's write a program to demonstrate integer overflow in action. In the following program we will have a never-ending for-loop which doubles the value of i on each iteration. We use the <unistd.h> header for the sleep function, which delays each iteration of the loop by one second so we can follow the results in real time.
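A minimal sketch, assuming i starts at 1, could look like this:

    #include <stdio.h>
    #include <unistd.h>  // for sleep()

    int main(void)
    {
        // never-ending loop: i doubles on every iteration
        // note: signed overflow is technically undefined behaviour in C,
        // but on typical hardware the value wraps around as described below
        for (int i = 1; ; i *= 2)
        {
            printf("%i\n", i);
            sleep(1);  // wait one second so we can follow along
        }
    }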

As i gets into the billions by reaching 1073741824, it tries to double itself one more time, which would take it to 2147483648, just 1 higher than the maximum positive value of an int on my system. Exceeding this value means the int overflows its allocated bit size and wraps around to its minimum value.

As with floating point imprecision, we can use a "better" data type to accommodate our need for an integer substantially larger than two billion. In this case we could use the long int data type, and if you need an even higher range, you could opt for long long int.
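A sketch of the long version, following the same structure:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        for (long int i = 1; ; i *= 2)
        {
            printf("%li\n", i);  // %li instead of %i for a long int
            sleep(1);
        }
    }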

Notice how in the printf function we changed the format code from %i to %li to account for our data type being of the long variety this time around. It should take substantially more time to reach the maximum.

Works like a charm.

Another way to increase the maximum limit of your integers without moving to a bigger data type such as long is to use unsigned int instead of plain int. An unsigned int cannot hold negative numbers, but it keeps the same number of bits, so its entire range shifts to the positive side and its positive maximum is greater than that of a normal int. On my machine that maximum is 4294967295.
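A sketch of the unsigned version; note that unlike signed overflow, unsigned wrap-around is well defined in C, so once the value exceeds the maximum it simply wraps back around past zero:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        for (unsigned int i = 1; ; i *= 2)
        {
            printf("%u\n", i);  // %u is the conversion for an unsigned int
            sleep(1);
        }
    }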


In conclusion, floating point imprecision and integer overflow are important issues that can affect the accuracy and correctness of a program. You can read about some real-world problems caused by them over at Wikipedia. These examples demonstrate the importance of being aware of floating point imprecision and integer overflow, and of taking the appropriate measures to prevent serious problems in real-world applications.

Resources

  1. If you would like to learn more about how to calculate the minimum and maximum values for data types, you can read this great post by Nickolas Teixeira Lanza on Medium.
  2. If you would like to learn more about bits and how we've used them to design computer programming as we know it, check out the beginning of CS50 – Introduction to Computer Science by Harvard. It's free!
