Jan 31 2013

Dynamic code analysis

Dynamic code analysis is the method of analyzing an application during its execution. The dynamic analysis process can be divided into several steps: preparing input data, performing a test launch of the program and gathering the necessary parameters, and analyzing the output data. During the test launch, the program can be executed on either a real or a virtual processor. The source code must first be compiled into an executable file, i.e. you can't use this method to analyze code that contains compilation or build errors.
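The steps above can be sketched as a tiny harness in Python. This is an illustration under stated assumptions, not any real tool's design. `parse_ratio` is a hypothetical program under test and `dynamic_check` a hypothetical driver. The driver feeds prepared inputs to the program, runs it, and collects every failure it actually observes, leaving that output for analysis:

```python
def parse_ratio(text):
    """Hypothetical program under test: parses "a/b" and returns a / b."""
    num, den = text.split("/")
    return int(num) / int(den)


def dynamic_check(func, inputs):
    """Run func on each prepared input and record every exception it raises."""
    findings = []
    for data in inputs:                # step 1: prepared input data
        try:
            func(data)                 # step 2: test launch
        except Exception as exc:       # step 2: gather the observed failure
            findings.append((data, type(exc).__name__))
    return findings                    # step 3: output data to analyze


for data, kind in dynamic_check(parse_ratio, ["6/3", "1/0", "7", "x/2"]):
    print(f"{data!r}: {kind}")
```

Every finding is tied to the concrete input that triggered it ("1/0" raises ZeroDivisionError, "7" and "x/2" raise ValueError). This is why errors reported this way are observed facts rather than predictions.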

Dynamic analysis can be performed on programs written in various programming languages: C, C++, Java, C#, PHP, Python, Erlang, and many others.

There are special dynamic code analysis utilities intended for launching programs and for gathering and analyzing output data. Many contemporary development environments already include dynamic analysis tools as one of their modules; examples are the debugger and profiler in Microsoft Visual Studio 2012.

Dynamic analysis tools differ in the way they interact with the program being checked:

source code instrumentation - special code is added to the source code before compilation to detect errors;
object code instrumentation - special code is added directly to the executable file;
compilation stage instrumentation - checking code is added through special compiler switches (this mode is supported, for instance, by the GNU C/C++ 4.x compiler);
execution stage libraries - the tool doesn't change the code; instead, special debugging versions of system libraries are used at run time to detect errors.
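Python has no separate native compilation step to hook into. Its standard `ast` module still lets us sketch the first approach in the list, source code instrumentation: probe code is inserted into the program's syntax tree before it is compiled and run. `CallCounter`, `instrument`, `_hits` and `safe_div` are hypothetical names chosen for this illustration. The probe here only counts function calls, whereas a real tool would insert error checks:

```python
import ast
from collections import Counter

# Hypothetical program under test, kept as source text so it can be rewritten
SOURCE = """
def safe_div(a, b):
    if b == 0:
        return None
    return a / b
"""


class CallCounter(ast.NodeTransformer):
    """Insert `_hits['<name>'] += 1` as the first statement of each function.

    Simplification: a function docstring would stop being a docstring,
    because the probe is placed before it.
    """

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # instrument nested functions too
        probe = ast.parse(f"_hits[{node.name!r}] += 1").body[0]
        node.body.insert(0, probe)
        return node


def instrument(source):
    """Rewrite the source tree, compile it and return (namespace, hit counter)."""
    tree = CallCounter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    hits = Counter()
    namespace = {"_hits": hits}  # the probes update this shared counter
    exec(compile(tree, "<instrumented>", "exec"), namespace)
    return namespace, hits


ns, hits = instrument(SOURCE)
ns["safe_div"](6, 3)
ns["safe_div"](1, 0)
print(hits)  # Counter({'safe_div': 2})
```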

Dynamic analysis is performed by passing a set of data to the input of the program being checked. That's why the efficiency of the analysis depends directly on the quality and quantity of the input test data: they determine how complete the code coverage obtained through testing will be.
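A small sketch makes the dependence on input data concrete. It uses Python's standard `sys.settrace` hook to record which lines of a function each input executes. Line offsets are counted from the `def` line, and `classify` and `lines_executed` are hypothetical names. A single input reaches only part of the code, while a more varied input set covers all of it:

```python
import sys


def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def lines_executed(func, *args):
    """Return the line offsets (relative to the def line) executed by func(*args)."""
    code, hit = func.__code__, set()

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            if offset > 0:  # count body lines only
                hit.add(offset)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hit


only_positive = lines_executed(classify, 5)
all_inputs = set().union(*(lines_executed(classify, n) for n in (-1, 0, 5)))
print(sorted(only_positive))  # [1, 3, 5]
print(sorted(all_inputs))     # [1, 2, 3, 4, 5]
```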

Dynamic testing can provide the following metrics to you:

resources consumed - the execution time of the program as a whole or of its individual modules, the number of external queries (for example, to a database), the amount of memory used, and other resources;
cyclomatic complexity, the degree of test code coverage, and other program metrics;
program errors - division by zero, null pointer dereferencing, memory leaks, "race conditions";
vulnerabilities in the program.
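Two of the resource metrics above, execution time and memory use, can be gathered in Python with the standard `time.perf_counter` and `tracemalloc` APIs. The sketch below assumes that the hypothetical `build_table` stands in for one module of the program under test, and `measure` is likewise a hypothetical helper:

```python
import time
import tracemalloc


def build_table(n):
    """Hypothetical module under test: builds a list of squares."""
    return [i * i for i in range(n)]


def measure(func, *args):
    """Run func(*args); return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
        tracemalloc.stop()
    return result, elapsed, peak


result, seconds, peak_bytes = measure(build_table, 100_000)
print(f"{seconds:.4f} s, peak {peak_bytes / 1024:.0f} KiB")
```

Tracing every allocation slows the measured code down, so the time recorded this way is itself affected by the instrumentation. This is the kind of influence on timing and resources that should be kept to a minimum.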

Dynamic testing is most important in areas where program reliability, response time, and consumed resources are the crucial criteria: for instance, a real-time system controlling a critical production process, or a database server. Any bug that occurs in such systems may be critical.

Dynamic testing can be performed on white-box and black-box principles. The only difference between them is that in the "white box" case you have information about the program code, while in the "black box" case you don't. There is also the so-called "gray box" method, where you know the program structure but this information is not used in the testing itself.

When performing dynamic testing, you also have to minimize the influence of instrumentation on the execution of the program being tested (its timing characteristics, consumed resources, or program errors).

Dynamic testing can either confirm that the product works correctly or reveal errors showing that it does not. The second goal is the more productive one from the viewpoint of quality improvement, as it doesn't let you ignore the program's shortcomings. However, if no defects have been revealed during testing, it doesn't necessarily mean there are none. Even 100% code coverage with tests doesn't guarantee that the code is free of errors, since dynamic testing cannot reveal logic errors. Another important consideration is that the testing utilities may contain errors themselves.
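A tiny illustration of why full coverage is not proof of correctness follows. The two tests execute every line of the hypothetical `is_leap_year` function and both pass, yet the function is wrong for century years. The standard `calendar.isleap` serves as the reference answer here:

```python
import calendar


def is_leap_year(year):
    # Deliberate logic error: the century rules (divisible by 100 but not by 400) are missing
    return year % 4 == 0


# These two tests execute every line of is_leap_year (100% line coverage) and both pass
assert is_leap_year(2024) is True
assert is_leap_year(2023) is False

# ...yet the function is wrong for century years such as 1900
print(is_leap_year(1900), calendar.isleap(1900))  # True False
```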

Third-party utilities may also be used for a separate task: creating input test data. Some utilities use the following method: they mark the input data and track how it moves through the program as it executes. On the next iteration of the test launch, the utility generates a new set of input parameters, and so on, until it gets the needed result.
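Tracking marked input data as described above requires a taint-tracking engine. As a simpler stand-in, the sketch below uses a different but related feedback technique, coverage-guided mutation. It does not mark the input data. Instead, it observes which lines of the program each generated input reaches, keeps the inputs that reach new code, and mutates them further until one triggers a failure. `check`, `run_traced`, `mutate`, `fuzz` and the uppercase `ALPHABET` are all hypothetical and chosen only for illustration:

```python
import random
import sys

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumption: inputs are uppercase ASCII


def check(data):
    """Hypothetical program under test with a bug hidden behind nested conditions."""
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("bug reached")


def run_traced(func, data):
    """Run func(data) once; return (line numbers executed in func, exception or None)."""
    code, hit = func.__code__, set()

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            hit.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(data)
        return hit, None
    except Exception as exc:
        return hit, exc
    finally:
        sys.settrace(None)


def mutate(data, rng):
    """Overwrite one random byte or append one byte, picked from ALPHABET."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.choice(ALPHABET)
    else:
        buf.append(rng.choice(ALPHABET))
    return bytes(buf)


def fuzz(func, max_iters=20_000, seed=0):
    """Keep inputs that reach new lines; stop when an input triggers an exception."""
    rng = random.Random(seed)
    corpus, seen = [b""], set()
    for i in range(max_iters):
        data = mutate(rng.choice(corpus), rng)
        hit, exc = run_traced(func, data)
        if exc is not None:
            return data, i + 1      # failing input and number of launches used
        if not hit <= seen:         # new lines reached: remember this input
            seen |= hit
            corpus.append(data)
    return None, max_iters


found, runs = fuzz(check)
print(found, runs)
```

Each launch follows a single execution path, so reaching the nested branch takes many runs of the program. Random guessing without the coverage feedback would need far more.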

Thus, dynamic analysis has both weak and strong points.

The pros of dynamic code analysis are the following:

In most cases, false positives are impossible, because an error is detected at the moment it occurs; thus, the detected error is not a prediction based on an analysis of the program model, but a record of its actual occurrence;
Usually you don't need the source code, which allows you to test proprietary code.

These are the cons of dynamic code analysis:

Dynamic analysis detects defects only on the route defined by the concrete input data; defects in other code fragments won't be found;
It cannot check the correctness of the code's operation, i.e. whether the code does what it is supposed to do;
Significant computational resources are required to perform the testing;
Only one execution path can be checked at each particular moment, which makes you run the test many times to provide as complete testing as possible;
When the test is run on a real processor, execution of incorrect code may have unpredictable consequences.

Because it has both strengths and weaknesses, dynamic analysis is most effective when used together with static analysis.