This method detects errors that match known patterns. For example, a variable assigned to itself:
acx->window_size = acx->window_size;
Although these errors are obvious, and the diagnostic rules to detect them are usually simple, a static analyzer is still needed to search for them. The example above is not synthetic: the PVS-Studio analyzer found this bug in the Linux kernel. Let's take a look at the full fragment:
int wl12xx_acx_config_hangover(struct wl1271 *wl)
{
  ....
  acx->recover_time = cpu_to_le32(conf->recover_time);
  acx->hangover_period = conf->hangover_period;
  acx->dynamic_mode = conf->dynamic_mode;
  acx->early_termination_mode = conf->early_termination_mode;
  acx->max_period = conf->max_period;
  acx->min_period = conf->min_period;
  acx->increase_delta = conf->increase_delta;
  acx->decrease_delta = conf->decrease_delta;
  acx->quiet_time = conf->quiet_time;
  acx->increase_time = conf->increase_time;
  acx->window_size = acx->window_size; // <=
  ....
}
The developer missed the error while writing a monotonous run of similar assignments. Such errors also slip through code review because people quickly lose focus. Tireless static code analysis comes to the rescue!
Such simple errors aren't as rare as they may seem, and they can hide in the code of well-known projects. You can see some other examples in our bug collection.
As mentioned above, pattern-based analysis is fairly simple. However, it is rarely used on its own: it needs additional information, for example, about types, which helps reduce the number of false positives.
Regular expressions can detect some errors, but the analysis quality will be poor because there will be many false negatives and false positives. In other words, some errors go undetected, and the analyzer issues warnings where there are no errors.
Consider searching for repeated conditions in if-else-if constructs:
if (A < B) { .... }
else if (B > A) { .... }
Whenever the second condition is reached, it is always false because it is equivalent to the first. This is a classic typo pattern. Here you can check examples from real applications.
Searching for repeated conditions with regular expressions is a very challenging task because the same error pattern can take many forms:
if (A < B) ....
else if (A < B) ....

if (A & B == 0) ....
else if (0 == B & A) ....

if (A < B) ....
else if (x == y) .... // mid-checks
else if (A < B) ....
As a result, only a small share of such typos is detected.
It's even more complicated than that. The static analyzer also has to take extra information into account. For example, it should check that the operators aren't overloaded, or that the condition doesn't modify a variable:
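To make the limitation concrete, here is a minimal sketch of a naive regex-based check. The function name regexFindsDuplicate and the pattern itself are assumptions made for illustration; this is not how PVS-Studio works. The pattern only recognizes an else-if whose condition repeats the preceding if condition character for character.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when a naive regex sees the same
// condition text in an `if` and in the `else if` that immediately follows.
bool regexFindsDuplicate(const std::string& code) {
    // Group 1 captures a condition without nested parentheses;
    // \1 then requires the next `else if` to repeat it verbatim.
    static const std::regex pattern(
        R"re(if\s*\(([^()]+)\)[^;]*;\s*else\s+if\s*\(\1\))re");
    return std::regex_search(code, pattern);
}
```

With this sketch, "if (A < B) f(); else if (A < B) g();" is reported, while the swapped form (B > A), the reordered form (0 == B & A), and the variant with an intermediate check all go unnoticed. Each of them would need its own, increasingly fragile expression.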
if ((A = get()) < B) ....
else if ((A = get()) < B) .... // no need to issue a warning
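For comparison, the same checks are easier to express on a tree than on raw text. The sketch below uses a toy expression tree; the Expr type and the helpers leaf, bin, call, hasSideEffects, canonical, and sameCondition are all hypothetical names introduced for illustration. It assumes built-in operators on plain integers (no overloads) and treats every assignment or call as a side effect.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical, not a real analyzer's data structure).
struct Expr {
    std::string op;          // "<", ">", "==", "&", "=", "call"; empty for a leaf
    std::string name;        // identifier, literal, or called function name
    std::vector<Expr> kids;  // operands (two for binary operators)
};

Expr leaf(std::string n) { return Expr{"", std::move(n), {}}; }
Expr call(std::string fn) { return Expr{"call", std::move(fn), {}}; }
Expr bin(std::string op, Expr l, Expr r) {
    return Expr{std::move(op), "", {std::move(l), std::move(r)}};
}

// Assignments and calls may change state, so repeating them is not a typo.
bool hasSideEffects(const Expr& e) {
    if (e.op == "=" || e.op == "call") return true;
    return std::any_of(e.kids.begin(), e.kids.end(), hasSideEffects);
}

// Builds a normalized text form: B > A becomes A < B, and the operands
// of the commutative operators == and & are put in a fixed order.
std::string canonical(const Expr& e) {
    if (e.kids.size() != 2) return e.op.empty() ? e.name : e.op + "(" + e.name + ")";
    std::string op = e.op;
    std::string l = canonical(e.kids[0]);
    std::string r = canonical(e.kids[1]);
    if (op == ">") { op = "<"; std::swap(l, r); }
    if ((op == "==" || op == "&") && r < l) std::swap(l, r);
    return "(" + l + " " + op + " " + r + ")";
}

// Two conditions count as duplicates only if neither has side effects
// and their normalized forms match.
bool sameCondition(const Expr& a, const Expr& b) {
    return !hasSideEffects(a) && !hasSideEffects(b) && canonical(a) == canonical(b);
}
```

Under these assumptions, A < B and B > A compare as equal, as do A & B == 0 and 0 == B & A, while the two (A = get()) < B conditions are not reported. A real analyzer would also need type information to rule out overloaded operators, which this sketch ignores.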
So, regular expressions aren't always the best approach, even in seemingly simple cases. They are used for some static analysis tasks, but their scope is extremely limited.
Take the PVS-Studio code analyzer, or rather, its core for analyzing C and C++ code. It uses regular expressions in only 5 out of about 700 diagnostic rules (at the time of writing).
The pattern-based analysis in PVS-Studio and other modern analyzers is built on detecting patterns and regularities while traversing the syntax tree. Data about control flow, data flow, and so on is needed to enhance the analysis quality.
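As a rough illustration of a rule that works during tree traversal, here is a minimal sketch that finds self-assignments like the one from the Linux kernel example. The Node type, the helper functions, and the warning text are hypothetical; real analyzers keep far richer trees and combine such rules with type and data-flow information.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy syntax tree node (hypothetical).
struct Node {
    std::string kind;            // "block", "assign", "member", or "ident"
    std::string text;            // identifier or field name
    std::vector<Node> children;
};

Node ident(std::string name) { return Node{"ident", std::move(name), {}}; }
Node member(Node base, std::string field) {
    return Node{"member", std::move(field), {std::move(base)}};
}
Node assign(Node lhs, Node rhs) {
    return Node{"assign", "", {std::move(lhs), std::move(rhs)}};
}

// Renders identifier and member-access chains, e.g. acx->window_size.
// Any other node kind yields an empty string, so it never matches.
std::string render(const Node& n) {
    if (n.kind == "ident") return n.text;
    if (n.kind == "member" && n.children.size() == 1) {
        std::string base = render(n.children[0]);
        if (base.empty()) return "";
        return base + "->" + n.text;
    }
    return "";
}

// Walks the tree depth-first and records every assignment whose
// left and right sides are the same plain variable or field.
void findSelfAssignments(const Node& n, std::vector<std::string>& warnings) {
    if (n.kind == "assign" && n.children.size() == 2) {
        std::string lhs = render(n.children[0]);
        if (!lhs.empty() && lhs == render(n.children[1]))
            warnings.push_back(lhs + " is assigned to itself");
    }
    for (const Node& child : n.children) findSelfAssignments(child, warnings);
}
```

For a block that contains acx->increase_time = conf->increase_time and acx->window_size = acx->window_size, this traversal records a single warning, for window_size. Because the rule compares tree nodes rather than text, spacing, line breaks, and comments in the source do not affect it.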