Andrey Karpov

Mar 22 2016

Tags:

#Cpp #Knowledge #64bit

Detecting Overflows of 32-Bit Variables in Long Loops in 64-Bit Programs

One of the problems that 64-bit software developers have to face is the overflow of 32-bit variables in very long loops. The PVS-Studio code analyzer is very good at catching issues of this type (see the Viva64 diagnostic set). A lot of questions about variable overflows are asked on Stack Overflow. But since my answers there may be treated as pure advertisement rather than useful reference information, I decided to write an article where I could talk about PVS-Studio's capabilities.

A loop is a typical C/C++ construct. When software is ported to a 64-bit architecture, loops suddenly become problem spots, because few developers think in advance about what would happen if the program had to execute billions of iterations.

In our articles we call such issues 64-bit errors. Actually, these are simple errors. What makes them special is that they manifest themselves only in 64-bit applications. You simply don't have such long loops in 32-bit programs, and in practice you can't create an array with more than INT_MAX elements there.

So, we've got a problem: 32-bit types overflow in a 64-bit program. 32-bit types include int, unsigned, and long (if you're working on Win64). We need to find a way to detect all such dangerous spots. The PVS-Studio analyzer can do it, and it is what we are going to talk about.
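Which types count as "too narrow" depends on the platform's data model: on Win64 long stays 32-bit, while on most 64-bit Unix-like systems it is 64-bit. The following is only a small sketch for checking this on the platform at hand; the helper name is_narrower_than_size_t is made up for this illustration and is not part of PVS-Studio or any library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (name invented for this sketch): true when T has
// fewer bytes than std::size_t, so it cannot count every element of the
// largest possible array on this platform.
template <typename T>
constexpr bool is_narrower_than_size_t()
{
  return sizeof(T) < sizeof(std::size_t);
}
```

In a typical 64-bit build, is_narrower_than_size_t<int>() is expected to be true. For long the answer depends on the platform, which is exactly why code that relies on long behaves differently after porting.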

Let's discuss different scenarios of variable overflows occurring in long loops.

Scenario one. See the corresponding topic at Stack Overflow here: "How can elusive 64-bit portability issues be detected?". We have the following code:

int n;
size_t pos, npos;
/* ... initialization ... */
while((pos = find(ch, start)) != npos)
{
    /* ... advance start position ... */
    n++; // this will overflow if the loop iterates too many times
}

This program processes very long strings. In a 32-bit program, a string can't be longer than INT_MAX, so no errors of this kind can occur there. Yes, the program can't process large amounts of data, but that is just a limitation of the 32-bit architecture, not a bug.

In a 64-bit program, however, the length of a string can exceed INT_MAX; therefore, the n variable may overflow. Signed integer overflow is undefined behavior in C and C++. It is a mistake to assume that an overflow would simply turn the number 2147483647 into -2147483648: the behavior is literally undefined, and you can't predict the consequences. If you don't believe that an overflowed signed variable can cause unexpected changes in program execution, please see my article "Undefined behavior is closer than you think".
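Because the overflow itself is undefined, any check has to happen before the increment, not after it. Here is a minimal sketch of that idea; checked_increment is a hypothetical helper introduced only for this illustration.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: increments n only when that cannot overflow.
// Returns false and leaves n unchanged if n is already INT_MAX.
bool checked_increment(int &n)
{
  if (n == INT_MAX)
    return false;  // n + 1 would be a signed overflow (undefined behavior)
  ++n;
  return true;
}
```

Testing "n + 1 < n" after the fact is not a reliable alternative: the compiler is allowed to assume that signed overflow never happens and may remove such a test.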

OK, we need to check if the n variable can overflow. No problem – we run PVS-Studio on this code and get the following message:

V127 An overflow of the 32-bit 'n' variable is possible inside a long cycle which utilizes a memsize-type loop counter. mfcapplication2dlg.cpp 190

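Here, "memsize-type" is PVS-Studio's term for a type whose size follows the platform's pointer size, such as size_t, ptrdiff_t, or intptr_t. Changing the type of the n variable to size_t will make both the error and the message disappear.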
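As an illustration, here is one way the fragment might look once it is made self-contained. This is only a sketch: the function name count_char and the use of std::string::find are assumptions, since the original snippet does not show what find, start, or npos actually are.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts how many times ch occurs in text.
std::size_t count_char(const std::string &text, char ch)
{
  std::size_t n = 0;      // memsize-type counter instead of int
  std::size_t start = 0;
  std::size_t pos;
  while ((pos = text.find(ch, start)) != std::string::npos)
  {
    start = pos + 1;      // advance past the match just found
    ++n;                  // cannot overflow before the string length does
  }
  return n;
}
```

The counter now has the same width as the string's own length, so it can represent every possible number of matches.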

In the same topic, one more code example is discussed that needs to be checked:

int i = 0;
for (iter = c.begin(); iter != c.end(); iter++, i++)
{
    /* ... */
}

Again, we run PVS-Studio and get warning V127:

V127 An overflow of the 32-bit 'i' variable is possible inside a long cycle which utilizes a memsize-type loop counter. mfcapplication2dlg.cpp 201
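The fix follows the same pattern as before: give the extra counter a memsize-type. The sketch below assumes a std::list<int> container and a made-up helper named index_of, because the original fragment shows neither the container type nor the loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical helper: returns the position of the first element equal to
// value, or c.size() if no such element exists.
std::size_t index_of(const std::list<int> &c, int value)
{
  std::size_t i = 0;  // was: int i = 0;
  for (auto iter = c.begin(); iter != c.end(); ++iter, ++i)
  {
    if (*iter == value)
      break;
  }
  return i;
}
```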

That topic at Stack Overflow also raises the question of what one should do when the code base is huge and all errors of this kind have to be found.

As we have already seen, the PVS-Studio static code analyzer can catch these bugs. Moreover, I believe static analysis is the only practical way to cope with a large project. The analyzer also provides a convenient user interface for working with multiple diagnostic messages. You can use interactive filters on messages, mark them as false positives, and so on. However, a description of PVS-Studio's capabilities is beyond the scope of this article. If you want to learn more about the tool, please see the following resources:

Article PVS-Studio for Visual C++.
Article Best Practices of using PVS-Studio.
Documentation.

By the way, we also have experience porting a large project of 9 million LOC to a 64-bit platform, and PVS-Studio handled that task rather well.

Let's see another topic at Stack Overflow: "Can Klocwork (or other tools) be aware of types, typedefs and #define directives?".

As far as I understand, the programmer has set out to find a tool that could spot all the loops with 32-bit counters. In other words, all the loops where type int is used.

This task is somewhat different from the previous one. But such loops do need to be found and fixed, because a variable of type int can't be used to index or count the elements of a huge array.

However, the person chose a wrong approach. But it wasn't his fault; he simply didn't know about PVS-Studio. You'll see what I mean in a moment.

So, what he wants to search for is the following construct:

for (int i = 0; i < 10; i++)
    // ...

It's horrible. You would have to look through an enormous number of loops to figure out if they can cause a bug or not. It's a huge amount of work, and I doubt anyone could do it staying focused all the way. So, missing a lot of dangerous fragments seems inevitable.

On the other hand, fixing every single loop by replacing int with, say, intptr_t is not a good idea either. This approach involves too much work and too many changes in the code.

The PVS-Studio analyzer can help here. It won't find the loop from the example above – because it doesn't have to. That loop simply has no room for a bug, as it executes only 10 iterations and will never end up with an overflow. We don't need to waste our time checking that code.

But what the analyzer can find are loops like this one:

void Foo(std::vector<float> &v)
{
  for (int i = 0; i < v.size(); i++)
    v[i] = 1.0;
}

The tool will generate two warnings at once. The first tells us that a 32-bit type is being compared with a memsize-type:

V104 Implicit conversion of 'i' to memsize type in an arithmetic expression: i < v.size() mfcapplication2dlg.cpp 210

Indeed, the i variable's type is not suited for long loops.

The second warning tells us that it is strange to use a 32-bit variable for indexing. If the array is large, the code is incorrect.

V108 Incorrect index type: v[not a memsize-type]. Use memsize type instead. mfcapplication2dlg.cpp 211

The fixed code should look like this:

void Foo(std::vector<float> &v)
{
  for (std::vector<float>::size_type i = 0; i < v.size(); i++)
    v[i] = 1.0;
}

It has become long and ugly, so you may feel tempted to use the auto keyword. But you can't, because doing so would make the code incorrect again:

for (auto i = 0; i < v.size(); i++)
  v[i] = 1.0;

Since the constant 0 is of type int, the i variable would be of type int as well. That is, we would end up where we started. By the way, since we've started talking about new features of the C++ standard, I recommend reading the article "C++11 and 64-bit Issues".
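The deduction rule can be checked at compile time. The sketch below also shows one possible way to keep the declaration short without falling back to int, namely decltype(v.size()); the function name FillByIndex is invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// auto takes its type from the initializer, not from later comparisons.
static_assert(std::is_same<decltype(0), int>::value,
              "the literal 0 has type int");
static_assert(std::is_same<decltype(std::size_t{0}), std::size_t>::value,
              "an explicitly typed zero keeps its type");

// Hypothetical variant of Foo: decltype(v.size()) names the container's
// own size type, so the counter is as wide as the value it is compared to.
void FillByIndex(std::vector<float> &v)
{
  for (decltype(v.size()) i = 0; i < v.size(); i++)
    v[i] = 1.0f;
}
```

Writing auto i = std::size_t{0} would also work here, since the initializer then carries the intended type.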

I think we could make a trade-off and write a version of that code that is not perfect but still correct:

for (size_t i = 0; i < v.size(); i++)
  v[i] = 1.0;

Note. Of course, an even better solution would be to use iterators or the fill() algorithm, but we are talking about searching for overflows of 32-bit variables in old programs. That's why I don't discuss those fixing techniques in this article; that's a different story.
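For completeness, here is a brief sketch of what those index-free alternatives might look like. The function names are made up for this illustration; std::fill and the range-based for loop are standard C++.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// No index variable at all, so there is nothing that could overflow.
void FillByAlgorithm(std::vector<float> &v)
{
  std::fill(v.begin(), v.end(), 1.0f);
}

// The same effect with a range-based for loop (C++11).
void FillByRangeFor(std::vector<float> &v)
{
  for (float &x : v)
    x = 1.0f;
}
```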

Note that the analyzer is pretty smart and tries not to bother you without a good reason. For example, it won't issue a warning for a code fragment where a small array is processed:

void Foo(int n)
{
  float A[100];
  for (int i = 0; i < n; i++)
    A[i] = 1.0;
}

Conclusion

The PVS-Studio analyzer is the leader as far as searching for 64-bit bugs goes. After all, it was originally conceived and created precisely as a tool to help programmers port their software to 64-bit systems, and it was known as Viva64 at that time. Only later did it turn into a general-purpose analyzer, but the 64-bit diagnostics have always been there, ready to help you out.