Static and Dynamic Code Analysis

As a PVS-Studio developer, I am often asked to implement new diagnostics in our tool. Many of these requests are based on users' experience with dynamic code analyzers such as Valgrind. Unfortunately, it is usually impossible, or nearly so, for us to implement such diagnostics. In this article, I will briefly explain why static code analyzers cannot do what dynamic analyzers can, and vice versa. Each of these analysis methodologies has its own pros and cons; one cannot replace the other, but they complement each other very well.

Static code analysis is the process of detecting errors and defects in a program's source code without running the program. It can be viewed as an automated code review process.

Dynamic code analysis is the method of analyzing an application while it is running.

I often come across user suggestions that can be summarized as follows:

Your tool is wonderful, but it can't detect certain errors. For example, Valgrind recently found such-and-such a bug while PVS-Studio kept silent about it. I wish you would add diagnostics for this kind of bug to PVS-Studio. Then I could make do with just one tool, which would be very convenient.

Sure, it would be much more convenient to work with just one tool. Unfortunately, static analyzers cannot perform many of the kinds of analysis that dynamic analyzers are capable of. This doesn't mean that static analysis is worse; these are simply two different technologies that complement each other. I will now try to explain the limitations of both.

Suppose you have a function like this:

void OutstandingIssue(const char *strCount)
{
  unsigned nCount;
  sscanf_s(strCount, "%u", &nCount);
  
  int array[10];
  memset(array, 0, nCount * sizeof(int));
}

A static code analyzer cannot figure out whether an array overrun will occur here. If the string to be converted into a number is read from a file, detecting this error statically is simply impossible, because the value of nCount is not known until the program runs.

Well, reading a string from a file is a borderline case, so let's make the task a bit simpler. Suppose the string is formed somewhere in another function. Then the analysis is theoretically possible. In practice, however, it is enormously difficult: the analyzer has to work out the code execution sequence, which values the variables can take, what data will be written into buffers in memory, and whether an array overrun is possible.
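To make the "string formed in another function" case concrete, here is a minimal sketch. The helper names FormatCount and WouldOverrun and the constant ARRAY_LEN are hypothetical and not part of the original sample, and standard sscanf is used in place of sscanf_s for portability. Instead of performing the dangerous memset, the second function only reports whether the parsed count would exceed the array, which is exactly the fact a static analyzer would have to deduce by following the value through the string.

```c
#include <stdio.h>
#include <stddef.h>

#define ARRAY_LEN 10  /* same size as the array in OutstandingIssue */

/* Hypothetical helper: the string is formed here, far from its use. */
static void FormatCount(char *buf, size_t bufSize, unsigned value)
{
  snprintf(buf, bufSize, "%u", value);
}

/* Same parsing step as OutstandingIssue, but instead of calling memset
   it reports whether nCount elements would not fit into the array.
   Returns 1 if an overrun would occur, 0 otherwise. */
static int WouldOverrun(const char *strCount)
{
  unsigned nCount = 0;
  if (sscanf(strCount, "%u", &nCount) != 1)
    return 0;  /* nothing parsed: this sketch treats it as "no overrun" */
  return nCount > ARRAY_LEN;
}
```

To answer the question statically, an analyzer has to model snprintf producing the text, sscanf turning it back into a number, and every call path in between. At run time, the answer falls out of a single comparison.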

I've mentioned the case where data is stored in a string to show how hard a static analyzer's job is, and there are many other difficulties it has to face. The further the use of a value is from the place where it was calculated, the more work it takes to analyze the situation. When the fragment where a string is formed is separated from the fragment where it is used by several function calls, I can't even imagine how complex the analyzer would have to be and how much memory it would need to handle the task. The number of possible states and variable values grows extremely fast. To find an error like that, you would in effect need to execute the program virtually, covering all the possible branches. That is a very complicated algorithmic task, and it would require computational resources that are simply impossible to provide.
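A rough way to see how quickly the number of states grows: if a value passes through a sequence of independent two-way branches, each branch doubles the number of execution paths. The function below is only an illustrative model with a hypothetical name (count_paths), not a description of how any analyzer counts states.

```c
#include <stdint.h>

/* Number of distinct execution paths through `branches` independent
   two-way branches executed one after another. Each branch doubles
   the count, so the result is 2^branches. Saturates at UINT64_MAX
   when the exact value no longer fits into 64 bits. */
static uint64_t count_paths(unsigned branches)
{
  if (branches >= 64)
    return UINT64_MAX;
  uint64_t paths = 1;
  for (unsigned i = 0; i < branches; ++i)
    paths *= 2;
  return paths;
}
```

Under this model, 10 branches give 1024 paths and 40 branches already give more than a trillion. That is before loops, and before the range of every variable on every path is taken into account.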

The important thing is that you don't have to fight all these difficulties! What static analysis finds incredibly hard is a piece of cake for dynamic analysis. A dynamic analyzer can, for example, place a marker right after an array and detect when that marker gets overwritten, which indicates an array overrun.
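The marker idea can be sketched in a few lines. This is a deliberately simplified model, not how Valgrind or any particular tool is implemented; the names GuardedBlock, guarded_alloc, guard_intact and guarded_free and the constants are hypothetical. The block is allocated together with a guard zone filled with a known byte pattern, so a write past the user region lands in the guard zone and can be detected afterwards.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xA5  /* arbitrary non-zero marker pattern */
#define GUARD_SIZE 16    /* bytes of marker placed after the user region */

typedef struct {
  unsigned char *base;   /* start of the user region */
  size_t size;           /* size of the user region in bytes */
} GuardedBlock;

/* Allocates `size` user bytes followed by a guard zone. Returns 1 on success. */
static int guarded_alloc(GuardedBlock *b, size_t size)
{
  b->base = (unsigned char *)malloc(size + GUARD_SIZE);
  if (b->base == NULL)
    return 0;
  b->size = size;
  memset(b->base + size, GUARD_BYTE, GUARD_SIZE);
  return 1;
}

/* Returns 1 if the marker after the user region is untouched, 0 otherwise. */
static int guard_intact(const GuardedBlock *b)
{
  for (size_t i = 0; i < GUARD_SIZE; ++i)
    if (b->base[b->size + i] != GUARD_BYTE)
      return 0;
  return 1;
}

static void guarded_free(GuardedBlock *b)
{
  free(b->base);
  b->base = NULL;
  b->size = 0;
}
```

If a block is allocated for 10 ints and a memset clears 11 of them, the extra bytes overwrite part of the marker and guard_intact returns 0. The check does not care where the count came from, which is why reading it from a file makes no difference to a dynamic analyzer.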

Moreover, a dynamic analyzer can detect this error even when a string is read from a file!

Does it mean the dynamic analysis technology is better? Perhaps we just need to improve it a bit so that dynamic analyzers can do the job of static analyzers too?

The answer is again no: that is impossible as well. Some tasks are easily solved by static analysis but cannot be solved by dynamic analysis.

Static analysis works with the source code of a program and can notice anomalies which don't exist for a dynamic analyzer. Let's discuss one example.

The code fragment below is flawless from the viewpoint of a dynamic analyzer. It compares a part of a buffer, which is nothing to worry about. It is very common for memcmp() to compare only a part of a buffer rather than the whole of it, because code often needs only a part of a buffer. A dynamic analyzer has nothing to complain about in this code.

But a static analyzer examines the code and figures out that the number of bytes being compared is very likely to have been incorrectly calculated. This sample was taken from a real open source project:

const unsigned char stopSgn[2] = {0x04, 0x66};
....
if (memcmp(stopSgn, answer, sizeof(stopSgn) != 0))
  return ERR_UNRECOGNIZED_ANSWER;

The error is a misplaced closing parenthesis: the expression sizeof(stopSgn) != 0 evaluates to 1, so only one byte is compared instead of two. A static analyzer can detect this anomaly quickly and easily and inform the user about it. From the viewpoint of dynamic analysis, everything is all right here: one byte is being compared, so what? Comparing just one byte is a common situation, especially in macros.
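The effect of the misplaced parenthesis is easy to demonstrate. In the sketch below, the function names mismatch_as_written and mismatch_intended are hypothetical, and the size argument is computed in a separate variable so that the value being passed is explicit. The first function reproduces the behavior of the code as written; the second shows what was presumably intended.

```c
#include <string.h>

static const unsigned char stopSgn[2] = {0x04, 0x66};

/* Behavior of the code as written: the comparison "!= 0" ends up inside
   the size argument, so memcmp receives 1 and compares only one byte. */
static int mismatch_as_written(const unsigned char *answer)
{
  size_t n = (sizeof(stopSgn) != 0);  /* always 1 */
  return memcmp(stopSgn, answer, n) != 0;
}

/* Presumably intended: compare all sizeof(stopSgn) bytes, then test the result. */
static int mismatch_intended(const unsigned char *answer)
{
  return memcmp(stopSgn, answer, sizeof(stopSgn)) != 0;
}
```

For an answer of {0x04, 0x00}, the first function reports no mismatch because only the first byte is checked, while the second correctly reports one. Neither call reads outside a buffer, so there is nothing for a dynamic analyzer to flag.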

Conclusion. We have discussed two types of errors, each of which can be detected by only one of the two kinds of analyzers; this limitation follows from their fundamental working principles. So, unfortunate as it is, we cannot make do with only one type of tool. The best results are achieved by using static and dynamic analysis together.
