Error on verge of extinction, or why I put if (x = 42) in Red List of C & C++ bugs

May 30 2024

Author: Andrey Karpov

If we ask a programmer what bugs are the most common in C and C++ code, they'll name a null pointer dereference, undefined behavior, array overrun, and other typical error patterns. They may name an accidental assignment in condition as well. However, let's see if this error is common today.

C and C++ use the = symbol for the assignment operator and == for the comparison operator. That's why we may make typos when we write = instead of == and get compliable but incorrectly operating code.

These errors look like this:

int abcd = foo();
if (abcd = -1)

There are two unpleasant things going on at once:

The condition is always true or false, depending on what is assigned to the variable.
The correct value has been lost. If the abcd variable is used elsewhere, its value will be invalid (corrupted).

These errors are simple yet common and notorious among programmers. Although devs don't film movies about bugs, they make memes about them.

The bugs look like they're from these memes, though. For example, I've found the following code in the AMT SDK project:

if (status = true)
{
  PrintSuccess();
}

Because of this bug, developers invented the Yoda notation: a programming style where the constant is placed on the left side of the comparison operator.

if (-1 == abcd)

This style was meant to prevent a typo. If a programmer writes = instead of ==, the code won't compile.

However, this notation hasn't caught on in real cases. At least, I've seen it very rarely in open-source projects.

Btw, I don't like this coding style too. First, it makes the code a bit more complicated: you look at the constant and only then realize what the code is about and what is checked.

Second, the coding style won't help if two variables are compared with each other:

int x = getx();
int y = gety();
if (x = y)

In my opinion, it's better to declare all variables that are further used read-only as const. This reduces the likelihood that typos or failed refactoring will cause accidental modification.

const int abcd = foo();
if (abcd = -1)            // The compilation error
{
  const int x = getx();
  const int y = gety();
  if (x = y)              // The compilation error

Sorry, I got a little distracted. In the beginning, I talked about the rarity of such bugs and the Red List. Let's get back to the main topic of the article.

Once I was preparing for a conference talk, where I've discussed how expectations to find errors in code align with reality. To prove it, I researched our collection of bugs found over several years of writing articles about open-source project checking.

Sometimes, expectations match reality. For example, programmers suppose that there are many bugs in C and C++ code related to null pointer dereference and array overrun. Indeed, such bugs are common.

Sometimes, expectations don't match reality. A programmer is very likely to say something about division by zero, but there are few such errors in reality. At least if we compare it to many other bugs.

Of course, I couldn't help but look at assignment errors in conditions. It seems like there should be a mountain of them.

But, surprise, surprise, that such errors are quite rare in open-source projects! We only found 14 cases. Fun fact, but the last error we wrote out was in 2016!

It's amazing how high expectations are, and there are no errors!

Let's dig deeper into the statistics again. With the V559 diagnostic rule, the PVS-Studio analyzer detects errors related to an assignment in a condition. The rule was introduced in PVS-Studio 4.12 on February 7, 2011.

For some time, the analyzer found these errors, but gradually they became fewer and fewer. The last one we found was in 2016. Although we still write many articles about checking open-source projects, it has been eight years since we last saw errors like that.

I think this is an amazing and nice example of how some kinds of errors go extinct due to development of programmers' tools. I don't think that such errors have died off because everyone is aware of them, and now developers write code more carefully. As our experience shows, typos are still as common as they're 10 years ago.

So, it looks like the tools affect it. At first, static analyzers detected such errors. Then the compilers caught it up and started to detect it too. Generally, there are no compilers that will ignore an assignment in a condition. If we write such code now, and we'll see warnings all around: an IDE highlights the code, compilers and static analyzers issue warnings. We just can't help but see the error. Gorgeous!

Of course, I'm not saying that there are no programs with these bugs at all. We encounter such programs, but they're very rare. In addition, in my experience, we haven't seen them in a long time when checking different projects.

By the way, doesn't it mean that static analyzers will become unnecessary as compilers issue more and more warnings? No, of course not. The analyzer development doesn't stand still, either. They are working on to detect more and more complex and intricate errors. Analyzers comprises such technologies as interprocedural and intermodular analysis, data-flow analysis, symbolic execution, taint analysis, and so on. In addition, they can afford to run longer and use more memory than compilers. This means we can go deeper and analyze more complex data.

We may say that static analyzers set the direction of compilers development in terms of detecting bugs. We've even noticed how diagnostic rules from PVS-Studio move into compilers. Our team is proud that PVS-Studio is a source of inspiration for new things coming out. Let good diagnostic rules influence the tools. This way, there will be fewer errors. Meanwhile, we'll write more rules. After all, someone has to find bugs in compilers :)

auto FirstSemiPos = Line.find(';');
if (FirstSemiPos == std::string::npos)
  continue;

auto SecondSemiPos = Line.find(';', FirstSemiPos + 1);
if (FirstSemiPos == std::string::npos)  // <= SecondSemiPos should be checked
  continue;

Here's an example of the typo in the LLVM code — the PVS-Studio analyzer issued the following warning: V547 Expression 'FirstSemiPos == std::string::npos' is always false. UnicodeNameMappingGenerator.cpp 46.

If you wish, you can look at other bugs in compilers.

Finally, let's look back at the assignment in the condition:

int abcd;
while (abcd = foo())

What if the code author really wants to assign and check at the same time?

The first option: if we write in C++ and we need the variable only inside the body of the if/while instruction, we can declare it directly in the condition.

while (int abcd = foo())

Now it's obvious, that there's a variable declaration with initialization and subsequent check.

The second option: almost all developers of compilers and code analyzers have an unspoken agreement that additional parentheses are a sign that everything is OK.

while ((abcd = foo()))

If a programmer has written the code like this, it almost sounds like, "I know what I am doing, this code is correct". In this case, compilers and analyzers don't issue the warning.