What's the Use of Dynamic Analysis When You Have Static Analysis?

In order to verify the quality of software, you have to use a lot of different tools, including static and dynamic analyzers. In this article, we'll try to figure out why only one type of analysis, whether static or dynamic, may not be enough for comprehensive software analysis and why it's preferable to use both.

Our team writes a lot about the usefulness of static analysis and the benefits it brings to your projects. We like to run our tool on various open-source projects to find possible bugs, which is our way of popularizing static code analysis. Static analysis, in turn, helps make programs more reliable and higher in quality and reduces the number of potential vulnerabilities. Perhaps everyone who works directly with source code knows the satisfaction of having bugs fixed. But even if spotting (and fixing) bugs doesn't trigger your endorphins, you surely enjoy the thought of reducing development expenses thanks to a static analyzer that helps your programmers use their time more efficiently. To find out more about how static analysis pays off in terms of money, see this article. It gives an approximate estimate for PVS-Studio, but those results can be extrapolated to other static analysis tools available on the market.

Everything said above seems to suggest that the purpose of static analysis is to find bugs in the source code as early as possible, thus reducing the cost of fixing them. But why do we need dynamic analysis then, and why may sticking to only one of the two techniques be insufficient? Let's give more formal and clear definitions of static and dynamic analysis and try to answer these questions.

Static code analysis is the process of detecting errors and code smells in a program's source code. To analyze a program, you don't need to execute it; the analysis is performed on the available code base. The closest analogy to static analysis is the so-called code review, except that static analysis is an automated version of it (i.e. performed by a program).

The main pros of static analysis:

Bug detection at the early development stages. This helps to make bug fixing much cheaper because the earlier a defect is detected, the easier - and, therefore, the cheaper - it is to fix.
It allows you to precisely locate the potential bug in the source code.
Full code coverage. No matter how often one block of code or another gets control while executing, static analysis checks the entire code base.
Easy to use. You don't need to prepare any input data sets to do a check.
Static analyzers detect typos and copy-paste related mistakes fairly quickly and easily.

The objective cons of static analysis:

Inevitable false positives. A static analyzer can get angry about code fragments that actually don't have any bugs in them. Only the programmer can solve this problem and mark a warning as a false positive, which means it will take some of their working time.
Static analysis is generally bad at detecting memory leaks and concurrency-related errors. To detect such errors, the analyzer would in fact have to execute part of the program virtually, which is an extremely difficult task. Besides, such algorithms would require too much memory and CPU time. Static analyzers typically don't go any further than analyzing some simple cases. Dynamic analyzers are better suited to diagnosing memory leaks and concurrency-related errors.

It should be noted that static analyzers don't focus exclusively on bug catching. For instance, they can provide recommendations on code formatting. Some tools allow you to check your code for compliance with the coding standard your company sticks to. This includes indentation of various constructs, the use of space/tabulation characters, and so on. In addition, static analysis can be helpful for measuring metrics. A software metric is a quantitative measure of the degree to which a program or its specifications possess some property. See this article to learn about other uses of static analysis.

Dynamic code analysis is the analysis performed on a program at execution time. This means you must have your source code converted into an executable file first. In other words, code containing compilation or build errors can't be checked by this type of analysis. The check is done with a set of input data fed to the program under analysis. That's why the effectiveness of dynamic analysis directly depends on the quality and quantity of the test input data. It is this data that determines the extent of code coverage at the end of the test.

With dynamic testing, you can get the following metrics and warnings:

Resources used: execution time of the entire program or its individual parts, the number of external queries (for instance, to a database), the amount of RAM and other resources used by the program.
The extent of code coverage by tests and other metrics.
Software bugs: division by zero, null dereference, memory leaks, race conditions.
Some security vulnerabilities.
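
As a rough illustration of the first item in the list above, the execution time of an individual part of a program can be measured from inside the running program. This is only a minimal sketch: the helper name measureMicroseconds is hypothetical, and real profilers collect such data without any changes to the code.

```cpp
#include <chrono>

// Hypothetical helper: runs the callable once and returns how long it took,
// in microseconds, using a monotonic clock.
template <typename Func>
long long measureMicroseconds(Func &&work)
{
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
      .count();
}
```

The measured value depends on the input data and on the machine, so it is only meaningful for the specific run in which it was taken.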

The main pros of dynamic analysis:

You don't have to have access to the program's source code to analyze it. It should be noted, however, that dynamic analysis tools are differentiated by the way they interact with the program under analysis (this is discussed in more detail here). For example, one quite common dynamic analysis technique involves code instrumentation before the check, i.e. the addition of special code fragments to the application's source code for the analyzer to be able to diagnose errors. In that case, you do need to have the source code of the program at hand.
It can detect complex memory handling errors such as indexing beyond array bounds and memory leaks.
It can analyze multithreaded code at execution time, thus detecting potential problems that have to do with access to shared resources or possible deadlocks.
Most implementations of dynamic analyzers don't generate false positives since errors get caught as they occur. Therefore, a warning issued by a dynamic analyzer is not a prediction made by the tool based on the analysis of the program model but a mere statement of the fact that an error has occurred.
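
The idea of code instrumentation can be sketched roughly as follows. Everything in this example is an assumption made for illustration (the checkedRead function, the g_violations counter, the CHECKED_READ macro); real tools insert far more elaborate checks, usually at compile time or at the binary level.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical bookkeeping that an instrumenting tool might add.
static std::size_t g_violations = 0;

// Stands in for a plain "arr[index]" read. If the index is out of bounds,
// the access is reported and a fallback value is returned instead of
// touching memory outside the array.
int checkedRead(const int *arr, std::size_t size, std::size_t index,
                const char *file, int line)
{
  if (index >= size)
  {
    ++g_violations;
    std::fprintf(stderr, "%s:%d: read of index %zu, array size is %zu\n",
                 file, line, index, size);
    return 0;
  }
  return arr[index];
}

// What the instrumented program would contain instead of "arr[index]".
#define CHECKED_READ(arr, size, index) \
  checkedRead((arr), (size), (index), __FILE__, __LINE__)
```

A report appears only when an out-of-bounds index actually occurs during a run, which is why such a warning is a statement of fact and not a prediction. For the same reason, a path that is never executed is never checked.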

The cons of dynamic analysis:

Full code coverage is not guaranteed. That is, you are very unlikely to get 100% coverage by dynamic testing.
Dynamic analyzers are bad at detecting logic errors. For example, an always true condition is not a bug from a dynamic analyzer's perspective, since the compiler usually removes such a meaningless check before the program ever runs.
It's more difficult to precisely locate the error in the code.
Dynamic analysis is more difficult to use in comparison with static analysis as you need to feed enough data to the program to get better results and attain as full code coverage as possible.
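
To make the point about logic errors more concrete, here is a minimal sketch of an always true condition. The function name isRejected and the values 1 and 2 are hypothetical and chosen only for illustration.

```cpp
// Hypothetical example: the author meant to reject every mode except 1 and 2,
// i.e. "mode != 1 && mode != 2", but typed "||" instead of "&&".
bool isRejected(int mode)
{
  // No integer equals both 1 and 2 at once, so this condition is true
  // for every possible value of mode.
  return mode != 1 || mode != 2;
}
```

The program runs without any crash or memory error, so a dynamic analyzer has nothing to report, while a static analyzer can see from the expression alone that the check is always true.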

Dynamic analysis is particularly useful in those areas where program reliability, response time, or resources consumed are the primary concern. A real-time system managing a critical production sector or a database server are some examples of such systems. Any error in these areas can be critical.

Getting back to the question of why sticking to only one of the two types of analysis may not be sufficient, let's look at a couple of quite trivial examples of bugs that one analysis method diagnoses without any problem while the other cannot detect them, and vice versa.

The following example is taken from the Clang project:

MapTy PerPtrTopDown;
MapTy PerPtrBottomUp;
void clearBottomUpPointers() {
  PerPtrTopDown.clear();
}
void clearTopDownPointers() {
  PerPtrTopDown.clear();
}

A static analyzer would point out that the bodies of the two functions are identical. Of course, two functions having identical bodies aren't necessarily a definite sign of a bug, but it is very likely that they have resulted from using the copy-paste technique combined with carelessness on the programmer's side - and that leads to unexpected behavior. In this case, the clearBottomUpPointers method should call the PerPtrBottomUp.clear method. Dynamic analysis wouldn't notice anything wrong in this example because it's an absolutely legitimate piece of code from its point of view.
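
For comparison, a corrected version might look like the sketch below. The definition of MapTy is an assumption made only so that the fragment compiles on its own; the real type in Clang is different.

```cpp
#include <map>

// Assumption: std::map<int, int> is only a stand-in for the real MapTy.
using MapTy = std::map<int, int>;

MapTy PerPtrTopDown;
MapTy PerPtrBottomUp;

void clearBottomUpPointers() {
  PerPtrBottomUp.clear();  // the copy-pasted version cleared PerPtrTopDown here
}

void clearTopDownPointers() {
  PerPtrTopDown.clear();
}
```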

Here is another example. Suppose we have the following code fragment:

size_t index = 0;
....
if (scanf("%zu", &index) == 1)
{
  ....
  DoSomething(arr[index]);
}

Let's take a closer look at the code above. A user may enter a value that exceeds the maximum allowed index of the arr array, or a negative value, which typically ends up as a very large number in the unsigned index variable. Either way, the arr array index may be out of bounds. It might seem that a static analyzer will not be able to detect such errors. After all, you can only find out what number the user will enter when the program is actually running. However, modern static analyzers implement quite complex logic, including an annotation mechanism. Annotations provide various information about arguments, return values, and internal features of functions. Such information cannot be inferred automatically. A programmer annotates well-known and widely used functions to teach the analyzer what to expect from a particular function call. This lets static analyzers reason in terms of "unsafe input data" (tainted data) and track whether the resulting value can lead to an error.

From the code example above, the analyzer can understand that the index variable got its value from the annotated scanf function. The tool knows that the value of the index variable may be larger than the size of the arr array, so it issues a warning. The message suggests checking the variable first. After that, you can safely access the arr array by that index. For example, in the code below, the author first checks the index variable and only then accesses the array element. The analyzer understands this and does not issue a warning.

size_t index = 0;
....
if (scanf("%zu", &index) == 1)
{
  ....
  if (index < arraySize)
    DoSomething(arr[index]);
}

PVS-Studio already implements a similar diagnostic rule, V1010. It warns that data received from outside was used without a prior check.
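
The checked access can also be written as a small self-contained function. This is only a sketch under a few assumptions: the helper name readElement is hypothetical, and sscanf on a string is used in place of scanf so that the example does not depend on console input.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: parses an index from text and reads arr[index] only
// if the index is within bounds. Returns false if the input is not a number
// or the index is out of range.
bool readElement(const int *arr, std::size_t arraySize,
                 const char *input, int *result)
{
  std::size_t index = 0;
  if (std::sscanf(input, "%zu", &index) != 1)
    return false;  // the input was not a number
  if (index >= arraySize)
    return false;  // the external value is rejected before it is used
  *result = arr[index];
  return true;
}
```

Note that a negative input such as "-1" is rejected as well: after conversion to size_t it becomes a very large number and fails the bounds check.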

With the right set of input data, dynamic analyzers can also detect the problem above. Some errors can be found by both a dynamic and a static analyzer. But there are also errors that only one of the approaches can detect.

Check out the following example of code:

void OutstandingIssue(int number)
{
    int array[10];
    unsigned nCount = MathLibrary::MathFunctions::Abs(number);

    memset(array, 0, nCount * sizeof(int));
}

Abs is a static method from a library we use, but we don't have access to MathLibrary's source code. Imagine an error has crept into this method: for a certain value of number, it returns a number that exceeds the size of array. In that case, the memset call overruns the array. How can a static analyzer understand that the Abs method can return a number that exceeds the size of the array? No one has annotated an unknown Abs method from an unfamiliar MathLibrary, and one can't annotate all methods. If the analyzer issued warnings for every fragment with dubious input data, we would be on a highway to a bunch of false positives. Read more about the philosophy of the PVS-Studio static analyzer in this article.

On the other hand, a dynamic analyzer would have no trouble noticing and pointing out the memory handling error in this code (given that the program is fed the right data).
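
What the dynamic analyzer "sees" at run time can be imitated with the sketch below. Both the planted bug in Abs and the checkedZeroFill helper are assumptions made for illustration; the real MathLibrary is a closed-source library, and a real tool performs such a check without any changes to the code.

```cpp
#include <cstddef>
#include <cstring>

namespace MathLibrary {
struct MathFunctions {
  // Hypothetical stand-in for the library method with a planted bug:
  // the value is converted to unsigned instead of being negated,
  // so any negative input turns into a huge number.
  static unsigned Abs(int number) { return static_cast<unsigned>(number); }
};
}

// Plays the role of the dynamic analyzer: it sees the actual element count
// that is about to be written and refuses to go past the end of the array.
bool checkedZeroFill(int *array, std::size_t capacity, std::size_t count)
{
  if (count > capacity)
    return false;  // overrun detected, nothing is written
  std::memset(array, 0, count * sizeof(int));
  return true;
}

// Returns false if the call would have overrun the array.
bool OutstandingIssue(int number)
{
  int array[10];
  unsigned nCount = MathLibrary::MathFunctions::Abs(number);
  return checkedZeroFill(array, 10, nCount);
}
```

With this stand-in, OutstandingIssue(5) passes the check, while OutstandingIssue(-5) is reported. The defect shows up only when the test data contains a negative number, which is exactly the dependence on input data described earlier.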

This article doesn't aim at comparing static and dynamic analysis. There's no single technique that could diagnose the whole variety of software defects. Neither type of analysis can completely replace the other. To improve the quality of your programs, you'll have to use different types of tools so that they complement each other. I hope the examples shown above are persuasive enough.

I don't wish to look too biased toward static analysis, but it is this technique that has been talked about the most and, more importantly, that companies have been adding to their CI processes lately. Static analysis acts as one of the steps of the so-called quality gates on the way to a reliable and high-quality software product. We believe static analysis is going to become a standard software development practice in a couple of years, just like unit testing once did.

To wrap up, I'd like to point out once again that dynamic analysis and static analysis are just two different methods, which complement each other. In the end, all these techniques serve the single purpose of increasing software quality and reducing development expenses.

References:

Terminology. Static code analysis.
Terminology. Dynamic code analysis.
Andrey Karpov. Static and Dynamic Code Analysis.
Andrey Karpov. Myths about static analysis. The third myth - dynamic analysis is better than static analysis.
Andrey Karpov. PVS-Studio ROI.