Verifying software quality takes many different tools, including static and dynamic analyzers. In this article, we'll look at why a single type of analysis, static or dynamic, may not be enough for a comprehensive check and why it's better to use both.
Our team writes a lot about the usefulness of static analysis and the benefits it brings to your projects. We like to run our tool on various open-source projects to find possible bugs, which is our way of popularizing static code analysis. Static analysis, in turn, helps make programs more reliable and reduces the number of potential vulnerabilities. Perhaps everyone who works directly with source code knows the satisfaction of having bugs fixed. But even if spotting (and fixing) bugs doesn't trigger your endorphins, you surely enjoy the thought of lower development expenses thanks to a static analyzer that has helped your programmers use their time more effectively. To find out more about how static analysis can pay off in terms of money, see this article. It gives an approximate estimate for PVS-Studio, but those results can be extrapolated to other static analysis tools available on the market.
Everything said above suggests that the purpose of static analysis is to find bugs in source code as early as possible, which reduces the cost of fixing them. But then why do we need dynamic analysis, and why might sticking to just one of the two techniques be insufficient? Let's define static and dynamic analysis more formally and try to answer these questions.
Static code analysis is the process of detecting errors and code smells in a program's source code. You don't need to execute the program to analyze it; the analysis is performed on the available code base. The closest analogy is the so-called code review, except that static analysis is automated: the review is performed by a program.
The main pros of static analysis:
The objective cons of static analysis:
Note that static analyzers don't focus exclusively on catching bugs. For instance, they can provide recommendations on code formatting. Some tools let you check your code for compliance with the coding standard your company follows. This includes indentation of various constructs, the use of spaces or tabs, and so on. In addition, static analysis can be helpful for measuring metrics. A software metric is a quantitative measure of the degree to which a program or its specifications possess some property. See this article to learn about other uses of static analysis.
Dynamic code analysis is the analysis performed on a program at execution time. This means you must have your source code converted into an executable file first. In other words, code containing compilation or build errors can't be checked by this type of analysis. The check is done with a set of input data fed to the program under analysis. That's why the effectiveness of dynamic analysis directly depends on the quality and quantity of the test input data. It is this data that determines the extent of code coverage at the end of the test.
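The dependence on input data can be illustrated with a small sketch. ClampPercent and RunChecks are hypothetical names introduced only for this illustration, and the defect is planted on purpose. A run-time check can only notice the defect if some input actually drives execution into the faulty branch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper with a deliberately planted defect.
// Intended behavior: clamp value to the range [0, 100].
int ClampPercent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 10;  // planted defect: should return 100
    return value;
}

// A tiny run-time check: calls ClampPercent on every input and compares
// the result with the expected clamped value.
bool RunChecks(const std::vector<int>& inputs) {
    for (int v : inputs) {
        const int expected = v < 0 ? 0 : (v > 100 ? 100 : v);
        if (ClampPercent(v) != expected) return false;
    }
    return true;
}

// Usage sketch: the first input set never reaches the faulty branch.
void DemoCoverage() {
    assert(RunChecks({-5, 0, 42, 100}));        // defect stays hidden
    assert(!RunChecks({-5, 0, 42, 100, 101}));  // 101 exposes it
}
```

With the inputs -5, 0, 42, and 100, every check passes because the `value > 100` branch is never executed; adding 101 to the set is what exposes the defect.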
With dynamic testing, you can get the following metrics and warnings:
The main pros of dynamic analysis:
The cons of dynamic analysis:
Dynamic analysis is particularly useful in areas where program reliability, response time, or resource consumption is the primary concern. Examples include a real-time system managing a critical production sector or a database server. Any error in these areas can be critical.
Coming back to the question of why relying on only one of the two types of analysis may not be sufficient, let's look at a few simple examples of bugs that one method diagnoses without trouble while the other cannot detect them, and vice versa.
The following example is taken from the Clang project:
MapTy PerPtrTopDown;
MapTy PerPtrBottomUp;
void clearBottomUpPointers() {
  PerPtrTopDown.clear();
}
void clearTopDownPointers() {
  PerPtrTopDown.clear();
}
A static analyzer would point out that the bodies of the two functions are identical. Identical bodies are not necessarily a bug, but they very likely result from careless copy-paste, which leads to unexpected behavior. In this case, the clearBottomUpPointers method should call the PerPtrBottomUp.clear method. Dynamic analysis wouldn't notice anything wrong in this example, because from its point of view this is perfectly legitimate code.
Here is another example. Suppose we have the following code fragment:
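For comparison, here is a sketch of what the intended code might look like. The fragment above does not show how MapTy is defined, so std::map<int, int> is used as a stand-in, and the PtrState wrapper is a hypothetical name added only to make the sketch self-contained.

```cpp
#include <cassert>
#include <map>

// Stand-in for the real MapTy, whose definition is not shown in the fragment.
using MapTy = std::map<int, int>;

// Hypothetical wrapper that makes the sketch self-contained.
struct PtrState {
    MapTy PerPtrTopDown;
    MapTy PerPtrBottomUp;

    // Each function clears the map its name refers to.
    void clearBottomUpPointers() { PerPtrBottomUp.clear(); }
    void clearTopDownPointers() { PerPtrTopDown.clear(); }
};

// Usage sketch: clearing one map must leave the other untouched.
void DemoClear() {
    PtrState s;
    s.PerPtrTopDown[1] = 10;
    s.PerPtrBottomUp[2] = 20;
    s.clearBottomUpPointers();
    assert(s.PerPtrBottomUp.empty());
    assert(s.PerPtrTopDown.size() == 1);
}
```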
size_t index = 0;
....
if (scanf("%zu", &index) == 1)
{
  ....
  DoSomething(arr[index]);
}
Let's take a closer look at the code above. A user may enter a value that exceeds the maximum valid index of the arr array. Even a negative number is dangerous: when read with %zu, it typically wraps around to a very large value. Either way, the index may be out of bounds. It might seem that a static analyzer cannot detect such errors. After all, you can only find out what number the user will enter when the program is actually running. However, modern static analyzers implement quite complex logic, including an annotation mechanism. Annotations provide information about the arguments, return values, and internal behavior of functions that cannot be derived automatically. A programmer annotates well-known and widely used functions to teach the analyzer what to expect from a particular call. This lets static analyzers reason in terms of "unsafe input data" (tainted data) and track whether the resulting value can lead to an error.
From the code example above, the analyzer can understand that the index variable got its value from scanf, which is annotated. The tool knows that the value of index may be larger than the size of the arr array, so it issues a warning. The message suggests checking the variable first; after that, you can safely access the array by index. For example, in the code below, the author checks the index variable and only then accesses the array element. The analyzer understands this and does not issue a warning.
size_t index = 0;
....
if (scanf("%zu", &index) == 1)
{
  ....
  if (index < arraySize)
    DoSomething(arr[index]);
}
PVS-Studio already has a similar diagnostic rule, V1010. It warns that data received from an external source was used without a prior check.
With the right set of input data, dynamic analyzers can also detect the problem above. Some errors can be found by both a dynamic and a static analyzer, but there are also errors that only one of the approaches can detect.
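The checked pattern can also be written as a small self-contained sketch. ReadElement is a hypothetical helper, and sscanf replaces scanf only so that the input can be supplied as a string; neither comes from the original fragment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: parses an index from `text` and, if it is within
// bounds, stores arr[index] in `out`. Returns false for unparsable or
// out-of-range input without touching the array.
bool ReadElement(const char* text, const int* arr, std::size_t arraySize,
                 int& out) {
    std::size_t index = 0;
    if (std::sscanf(text, "%zu", &index) != 1) return false;
    if (index >= arraySize) return false;  // validate the tainted value
    out = arr[index];
    return true;
}

// Usage sketch with a three-element array.
void DemoReadElement() {
    const int arr[3] = {10, 20, 30};
    int value = 0;
    assert(ReadElement("1", arr, 3, value) && value == 20);
    assert(!ReadElement("3", arr, 3, value));    // one past the end
    assert(!ReadElement("abc", arr, 3, value));  // not a number
}
```

Returning a flag instead of the element keeps the bounds check and the array access in one place, so the tainted value never reaches arr unchecked.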
Check out the following example of code:
void OutstandingIssue(int number)
{
  int array[10];
  unsigned nCount = MathLibrary::MathFunctions::Abs(number);
  memset(array, 0, nCount * sizeof(int));
}
Abs is a static method from a library we use, but we don't have access to MathLibrary's source code. Imagine that an error has crept into this method. For a certain value of number, it returns a value that exceeds the size of array, and the memset call then overruns the array. How can a static analyzer know that Abs may return a number larger than the array size? No one has annotated an unknown Abs method from an unfamiliar MathLibrary, and it's impossible to annotate every method. If the analyzer issued warnings for every fragment with dubious input data, we would be on a highway to a pile of false positives. Read more about the philosophy of the PVS-Studio static analyzer in this article.
On the other hand, a dynamic analyzer would have no trouble noticing and pointing out the memory handling error in this code (given that the program is fed the right data).
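If the range of the value returned by Abs cannot be trusted, one defensive option is to validate the count at run time before it reaches memset. The sketch below assumes a stand-in implementation of MathLibrary::MathFunctions::Abs, because the real library code is not available; ZeroFirstElements is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for the closed-source library; the real implementation is unknown.
namespace MathLibrary {
struct MathFunctions {
    // Computes |n| in unsigned arithmetic, so INT_MIN does not overflow.
    static unsigned Abs(int n) {
        const unsigned u = static_cast<unsigned>(n);
        return n < 0 ? 0u - u : u;
    }
};
}  // namespace MathLibrary

// Zeroes the first |number| elements, but only if that many elements exist.
template <std::size_t N>
bool ZeroFirstElements(int (&array)[N], int number) {
    const unsigned nCount = MathLibrary::MathFunctions::Abs(number);
    if (nCount > N) return false;  // reject counts larger than the array
    std::memset(array, 0, nCount * sizeof(int));
    return true;
}

// Usage sketch with a ten-element array.
void DemoZero() {
    int array[10];
    for (int& x : array) x = 7;
    assert(ZeroFirstElements(array, -3));   // zeroes elements 0..2
    assert(array[2] == 0 && array[3] == 7);
    assert(!ZeroFirstElements(array, 11));  // would overrun: rejected
}
```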
This article doesn't aim to compare static and dynamic analysis. No single technique can diagnose the whole variety of software defects, and neither type of analysis can completely replace the other. To improve the quality of your programs, you'll have to use different types of tools so that they complement each other. I hope the examples shown above are persuasive enough.
I don't want to look too biased toward static analysis, but it is the technique that is being talked about the most lately and, more importantly, that companies are adding to their CI processes. Static analysis acts as one of the steps of the so-called quality gates on the way to a reliable, high-quality software product. We believe static analysis will become a standard software development practice in a couple of years, just as unit testing once did.
To wrap up, I'd like to point out once again that dynamic analysis and static analysis are just two different methods, which complement each other. In the end, all these techniques serve the single purpose of increasing software quality and reducing development expenses.