Part 2: Upsetting Opinions about Static Analyzers

Oct 16 2020

Author: Andrey Karpov

Joke to the topic
Our view of this case
PVS-Studio is right once again
Note for the most meticulous readers
Additional links

By writing the article "Upsetting opinions about static analyzers" we were supposed to get it off our chest and peacefully let it all go. However, the article unexpectedly triggered robust feedback. Unfortunately, the discussion went in the wrong direction, and now we will make a second attempt to explain our view of this situation.

Joke to the topic

It all started with the article "Upsetting opinions about static analyzers". It came into a question on some resources and that discussion reminded me of an old joke.

Some tough Alabama loggers once got a fancy Japanese gas chainsaw.

They gathered around and decided to give it a try.

Started it up and shoved it a thin tree.

"Whoosh," said the Japanese saw.

"Oh, my ..." said the loggers.

They gave it a thicker tree. "Whoooosh!", said the saw.

"Ugh, damn it!" said the loggers.

A thick cedar came next. "WH-WH-WH-WH-OOOOOOSH!!!" said the saw.

"Wow, holy shit!!"— said the loggers.

They gave it a crowbar. "BANG!" said the saw.

"Oh crap, gotcha!!!"— reproachfully yelled tough Alabama loggers! And went to cut down forest with axes...

This story is just one and the same. People looked at this code:

if (A[0] == 0)
{
  X = Y;
  if (A[0] == 0)
    ....
}

And started coming up with cases when it might be justified, which means that the PVS-Studio analyzer warning was a false-positive. Some speculations about the change in memory between two checks came into play which occurs due to:

running parallel threads;
signal/interrupt handlers;
the variable X is a reference to the element A[0];
hardware, such as performing DMA operations;
and so on.

After heated debate on the analyzer's inability to comprehend all cases, they left to cut down forest with axes. In other words, they found an excuse why they could still avoid using a static code analyzer in their work.

Our view of this case

This approach is counterproductive. An imperfect tool may well be useful, and its use will be economically feasible.

Yes, any static analyzer issues false-positive warnings. There's nothing we can do about it. However, this misfortune is greatly exaggerated. In practice, static analyzers can be configured and used in various ways to suppress and deal with false positives (see 1, 2, 3, 4). In addition, it is fitting here to recall the article "False positives are our enemies, but may still be your friends".

On the other hand, even this is not the main thing. Special cases of exotic code do not make sense to consider at all! Can complex code confuse the analyzer? Yes, it can. At the same time, for one such case, there will be hundreds of useful analyzer findings. You can find and fix a lot of errors at the earliest stage. As for one or two false positives, they will be safely suppressed and will not bother you any longer.

PVS-Studio is right once again

This is where the article could end. Nevertheless, some may consider the previous section to be not rational considerations, but attempts to hide the weaknesses and shortcomings of the PVS-Studio tool. So, we'll have to continue.

Let's take look at the actual compiled code with variable declarations:

void SetSynchronizeVar(int *);

int foo()
{
    int flag = 0;
    SetSynchronizeVar(&flag);

    int X, Y = 1;

    if (flag == 0)
    {
        X = Y;
        if (flag == 0)
            return 1;
    }
    return 2;
}

The PVS-Studio analyzer reasonably issues a warning: V547 Expression 'flag == 0' is always true.

It is perfectly right. If someone starts ranting that a variable can change in another thread, in a signal handler, and so on, they just don't understand the C and C++ language. You just mustn't write code in such a way.

The compiler has the right to throw out the second check for optimization purposes and will be absolutely right. From the language point of view, the variable can't change. Its background change is nothing more than undefined behavior.

For the check to remain in place, the variable must be declared as volatile:

void SetSynchronizeVar(volatile int *);

int foo()
{
    volatile int flag = 0;
    SetSynchronizeVar(&flag);
    ....
}

The PVS-Studio analyzer knows about this and no longer issues a warning for such code.

Here we go back to what was discussed in the first article. There is no problem. Whereas what we have here is criticism or misunderstanding as to why the analyzer has the right to issue a warning.

Note for the most meticulous readers

Some readers may return to the synthetic example from the first article:

char get();
int foo(char *p, bool arg)
{
    if (p[1] == 1)
    {
        if (arg)
            p[0] = get();
        if (p[1] == 1)          // Warning
            return 1;
    }
    // ....
    return 3;
}

And add volatile:

char get();
int foo(volatile char *p, bool arg)
{
    if (p[1] == 1)
    {
        if (arg)
            p[0] = get();
        if (p[1] == 1)          // Warning :-(
            return 1;
    }
    // ....
    return 3;
}

After that, it is fair to note that the analyzer still issues the warning V547 Expression 'p[1] == 1' is always true.

Hooray, finally the analyzer is obviously wrong :). This is a false positive!

As you see, we do not hide any shortcomings. When analyzing the data flow for array elements, this unfortunate volatile was lost. This flaw has already been found and fixed. The edit will be available in the next analyzer version. There will be no false positive.

Why wasn't this bug detected earlier? Because in fact, this is again contrived code that is not found in real projects. Truth be told, we haven't seen such code yet, although we have checked a lot of open projects.

Why is the code unrealistic? First, in practice, there will be some sort of synchronization or delay function between the two checks. Second, no one in their right mind creates arrays consisting of volatile elements unless absolutely necessary. Working with such an array is a huge drop in performance.

Let's recap. You can easily create examples where the analyzer makes mistakes. But from a practical point of view, the identified flaws practically do not affect the quality of code analysis and the number of real errors detected. After all, the code of real applications is just code that is understandable to both the analyzer and the person, and is not a quiz or a puzzle. If the code is a puzzle, then there are other things to worry about :).

Thanks for your attention.