Difficulties of comparing code analyzers, or don't forget about usability

Mar 31 2011

Authors: Andrey Karpov, Evgenii Ryzhkov

Introduction
What parameters are just unreasonable to compare
Tool's usability is very important for adequate comparison
Summary

Users' desire to compare different code analyzers is natural and understandable. However, fulfilling this desire is not as easy as it may seem at first sight. The point is that it is unclear which particular factors should be compared.

Introduction

Even if we discard plainly ridiculous ideas such as "we should compare the number of diagnosable errors" or "we should compare the number of tool-generated messages", the seemingly reasonable "signal-to-noise ratio" parameter still does not look like an ideal criterion for evaluating code analyzers.

Do you doubt that comparing the parameters mentioned above is unreasonable? Here are some examples.

What parameters are just unreasonable to compare

Let's take a characteristic that looks simple at first sight: the number of diagnostics. It seems that the more diagnostics, the better. But the total number of rules doesn't matter to an end user who works with a particular set of operating systems and compilers. Diagnostic rules relevant to systems, libraries and compilers the user doesn't work with give nothing useful. They even get in the way: they overload the settings and the documentation, and they complicate the use and integration of the tool.

Here is an analogy: say, a man comes to a store to buy a heater. He is interested in the domestic appliances department, and it's good if this department has a wide range of goods. But the customer doesn't need the other departments. It's fine that he can also buy an inflatable boat, a cell phone or a chair in this store, but the inflatable boats department doesn't widen the range of heaters in any way.

Take, for instance, the Klocwork tool, which supports a lot of various systems, including exotic ones. One of them has a compiler that easily "swallows" this code:

inline int x;

The Klocwork analyzer has a special diagnostic message to detect this anomaly in code: "The 'inline' keyword is applied to something other than a function or method". Well, it seems good to have such a diagnostic. But developers using the Microsoft Visual C++ compiler or any other adequate compiler won't benefit from this diagnostic at all. Visual C++ simply refuses to compile this code: "error C2433: 'x' : 'inline' not permitted on data declarations".

Another example. Some compilers provide poor support for the bool type. So Klocwork may warn you when a class has a member of the bool type: "PORTING.STRUCT.BOOL: This checker detects situations in which a struct/class has a bool member".

"They wrote bool in class! How awful..." It's clear that only a few developers will benefit from this diagnostic message.

There are plenty of such examples. So it turns out that the number of diagnostic rules is in no way related to the number of errors an analyzer can detect in a particular project. An analyzer that implements 100 diagnostics and targets Windows applications can find many more errors in a project built with Microsoft Visual Studio than a cross-platform analyzer that implements 1000 diagnostics.

The conclusion is that the number of diagnostic rules is not a relevant criterion when comparing how useful analyzers are.

You may say: "OK, let's compare the number of diagnostics relevant for a particular system then. For instance, let's single out all the rules that search for errors in Windows applications". But this approach doesn't work either, for two reasons:

First, the same check may be implemented as one diagnostic rule in one analyzer and as several rules in another. If you compare them by the number of diagnostics, the latter analyzer seems better, although both have the same functionality for detecting a certain type of error.

Second, the implementation quality of a given diagnostic may differ. For instance, nearly all analyzers search for "magic numbers". But, say, one analyzer detects only those magic numbers that are dangerous when migrating code to 64-bit systems (4, 8, 32, etc.), while another simply detects all magic numbers (1, 2, 3, etc.). So it is not enough to put a plus mark for each analyzer in the comparison table.

People also like to compare the speed of a tool, or the number of code lines processed per second. But this is unreasonable from the practical point of view as well. There is no relation between the speed of a code analyzer and the speed at which a person works through its results! First, code analysis is often launched automatically during night builds; it only has to be finished by morning. Second, those who compare analyzers often forget about the usability parameter. Let's study this issue in detail.
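To show why the two kinds of "magic number" checks are not equivalent, here is a minimal sketch. The function names are hypothetical and are not taken from any analyzer's documentation. The literal 4 silently assumes a 32-bit pointer size, which is exactly the kind of constant that matters for 64-bit migration, while a literal such as 1, 2 or 3 in ordinary arithmetic usually carries no such risk.

```cpp
#include <cstddef>

// Hypothetical example: bytes needed for a table of `count` pointers.
// The literal 4 assumes 32-bit pointers, so on a 64-bit build the
// result is too small for the table it is meant to describe.
std::size_t pointer_table_bytes_fragile(std::size_t count) {
    return count * 4;
}

// Portable version: the element size comes from the type itself,
// not from a hard-coded literal.
std::size_t pointer_table_bytes_portable(std::size_t count) {
    return count * sizeof(void*);
}
```

An analyzer that flags only the first function and one that flags every numeric literal would both get a plus mark for "magic numbers", although their output differs greatly.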

Tool's usability is very important for adequate comparison

The point is that the usability of a tool strongly influences how code analyzers are actually used in practice.

We have recently checked the eMule project with two code analyzers and assessed how convenient this was in each case. One of the tools was the static analyzer integrated into some Visual Studio editions. The second was our PVS-Studio. We immediately ran into several issues with the analyzer integrated into Visual Studio, and those issues were not related to analysis quality or speed.

The first issue is that you cannot save the list of analyzer-generated messages for further examination. For instance, while checking eMule with the integrated analyzer, I got two thousand messages. No one can thoroughly investigate them all in one sitting, so the examination takes several days. But since the analysis results cannot be saved, I had to re-analyze the project each time, which was very tiresome. PVS-Studio lets you save analysis results and continue examining them later.

The second issue is the way duplicate analyzer messages are handled. I mean the diagnosis of problems in header files (.h files). Say the analyzer has detected an issue in an .h file included into ten .cpp files. While analyzing each of these ten .cpp files, the Visual Studio-integrated analyzer produces the same message about the issue in the .h file, ten times in total! Here is a real sample. The following message was generated more than ten times while checking eMule:
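The idea of saving results can be sketched in a few lines. This is only an illustration under our own assumptions: the SavedMessage structure, the function names and the tab-separated format are hypothetical and do not describe the file format of PVS-Studio or any other tool. The sketch also assumes that fields contain no tab or newline characters and that the line-number field is a valid integer.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one analyzer message.
struct SavedMessage {
    std::string file;
    int line;
    std::string code;
    std::string text;
};

// Write one message per row as tab-separated fields.
void save_messages(std::ostream& out, const std::vector<SavedMessage>& messages) {
    for (const SavedMessage& m : messages) {
        out << m.file << '\t' << m.line << '\t' << m.code << '\t' << m.text << '\n';
    }
}

// Read the rows back; rows with missing fields are skipped.
std::vector<SavedMessage> load_messages(std::istream& in) {
    std::vector<SavedMessage> messages;
    std::string row;
    while (std::getline(in, row)) {
        std::istringstream fields(row);
        SavedMessage m;
        std::string line_number;
        if (std::getline(fields, m.file, '\t') &&
            std::getline(fields, line_number, '\t') &&
            std::getline(fields, m.code, '\t') &&
            std::getline(fields, m.text)) {
            m.line = std::stoi(line_number);  // assumes a well-formed number
            messages.push_back(m);
        }
    }
    return messages;
}
```

With something like this in place, a review session spread over several days needs only one analysis run.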

c:\users\evg\documents\emuleplus\dialogmintraybtn.hpp(450): 
warning C6054: String 'szwThemeColor' might not be zero-terminated:
Lines: 434, 437, 438, 443, 445, 448, 450

Because of this, the analysis results get cluttered and you have to review nearly identical messages over and over. I should say that PVS-Studio has filtered out duplicate messages instead of showing them to the user since the very beginning.

The third issue is the generation of messages about issues in included system files (from folders like C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include). The analyzer built into Visual Studio is not ashamed to blame system header files, although there is little sense in it. Again, here is an example. While checking eMule, we got one and the same message about system files several times:
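Filtering such duplicates is not hard in principle. Below is a minimal sketch that assumes two messages are duplicates when their file, line and diagnostic code coincide. The Diagnostic structure and the remove_duplicates function are hypothetical names chosen for this illustration; we are not describing how any particular analyzer implements it.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record for one analyzer message.
struct Diagnostic {
    std::string file;
    int line;
    std::string code;
    std::string text;
};

// Keep only the first message for every (file, line, code) triple,
// preserving the original order of the remaining messages.
std::vector<Diagnostic> remove_duplicates(const std::vector<Diagnostic>& messages) {
    std::set<std::tuple<std::string, int, std::string>> seen;
    std::vector<Diagnostic> unique;
    for (const Diagnostic& m : messages) {
        // insert() reports through .second whether the key was new.
        if (seen.insert(std::make_tuple(m.file, m.line, m.code)).second) {
            unique.push_back(m);
        }
    }
    return unique;
}
```

A header included into ten .cpp files then yields one entry in the list instead of ten.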

1>c:\program files (x86)\microsoft
sdks\windows\v7.0a\include\ws2tcpip.h(729): 
warning C6386: Buffer overrun: accessing 'argument 1', 
the writable size is '1*4' bytes, 
but '4294967272' bytes might be written: 
Lines: 703, 704, 705, 707, 713, 714, 715, 720, 
721, 722, 724, 727, 728, 729

Nobody will ever edit system files. Why "curse" them? PVS-Studio has never done that.

The same category includes the inability to tell the analyzer to skip files that match a mask, for instance, all "*_generated.cpp" files or everything under "c:\libs\". In PVS-Studio, you can specify such exclusions.

The fourth issue relates to the process of working with the list of analyzer-generated messages. Of course, any code analyzer lets you disable particular diagnostic messages by code. But the convenience of doing so varies. To be more exact, the question is whether the analysis must be relaunched to hide unnecessary messages by code. In the Visual Studio-integrated analyzer, you have to enter the codes of the messages to be disabled in the project settings and relaunch the analysis. Of course, you can hardly specify all the "unnecessary" diagnostics on the first try, so you will have to relaunch the analysis several times. In PVS-Studio, you can easily hide and reveal messages by code without relaunching the analysis, which is much more convenient.
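Here is a rough sketch of how exclusion by mask might work. It rests on our own assumptions: only the '*' wildcard is supported, a mask without '*' is treated as a path prefix, and the comparison is case-sensitive (a real Windows tool would most likely ignore case). The names matches_mask and is_excluded are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Match `path` against `mask`, where '*' stands for any run of characters.
bool matches_mask(const std::string& mask, const std::string& path) {
    std::size_t m = 0, p = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    std::size_t resume = 0;                // where in `path` to retry from
    while (p < path.size()) {
        if (m < mask.size() && mask[m] == '*') {
            star = m++;
            resume = p;
        } else if (m < mask.size() && mask[m] == path[p]) {
            ++m;
            ++p;
        } else if (star != std::string::npos) {
            // Backtrack: let the last '*' absorb one more character.
            m = star + 1;
            p = ++resume;
        } else {
            return false;
        }
    }
    while (m < mask.size() && mask[m] == '*') ++m;
    return m == mask.size();
}

// A file is excluded if any mask matches it.
bool is_excluded(const std::string& path, const std::vector<std::string>& masks) {
    for (const std::string& mask : masks) {
        if (mask.find('*') == std::string::npos) {
            // No wildcard: treat the mask as a folder or path prefix.
            if (path.compare(0, mask.size(), mask) == 0) return true;
        } else if (matches_mask(mask, path)) {
            return true;
        }
    }
    return false;
}
```

With the masks "*_generated.cpp" and "c:\libs\", generated sources and third-party libraries drop out of the report, and so would system header folders if they were added to the list.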

The fifth issue is filtering of messages not only by code but by text as well. For instance, it might be useful to hide all the messages containing "printf". The analyzer integrated into Visual Studio doesn't have this feature while PVS-Studio has it.
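Hiding messages by code or by text without relaunching the analysis comes down to keeping the full list of results and applying a filter only when the list is displayed. The sketch below illustrates this idea; ReportEntry, ViewFilter and the function names are hypothetical, and this is not a description of PVS-Studio internals.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one analyzer message.
struct ReportEntry {
    std::string code;
    std::string text;
};

// What the user has chosen to hide; it can change at any time.
struct ViewFilter {
    std::set<std::string> hidden_codes;
    std::vector<std::string> hidden_substrings;
};

bool is_visible(const ReportEntry& entry, const ViewFilter& filter) {
    if (filter.hidden_codes.count(entry.code) != 0) return false;
    for (const std::string& fragment : filter.hidden_substrings) {
        if (entry.text.find(fragment) != std::string::npos) return false;
    }
    return true;
}

// The stored results are never modified, so changing the filter
// requires no new analysis run, only a new call to this function.
std::vector<ReportEntry> visible_entries(const std::vector<ReportEntry>& all,
                                         const ViewFilter& filter) {
    std::vector<ReportEntry> shown;
    for (const ReportEntry& entry : all) {
        if (is_visible(entry, filter)) shown.push_back(entry);
    }
    return shown;
}
```

Clearing the filter brings every hidden message back immediately, because nothing was ever removed from the stored list.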

Finally, the sixth issue is the convenience of reporting false alarms to the tool. The #pragma warning(disable) mechanism employed in Visual Studio lets you hide a message only by relaunching the analysis. The mechanism in PVS-Studio lets you mark messages as "False Alarm" and hide them without relaunching the analysis.

None of the six issues mentioned above relates to code analysis itself, yet they are very important: the usability of a tool is the integral indicator that determines whether it will ever come to evaluating analysis quality at all.

Let's see what we've got. The static analyzer integrated into Visual Studio checks the eMule project several times faster than PVS-Studio. But it took us 3 days to finish working with the Visual Studio analyzer (actually less, but we had to switch to other tasks to have a rest). With PVS-Studio, the same work took us only 4 hours.

Note. As far as the number of errors found is concerned, both analyzers showed almost the same results and found the same errors.
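For reference, this is roughly what suppression through #pragma warning looks like in source code. The function copied_length is a made-up example, and we simply assume that the analyzer reported C6054 for it; whether it really would is not the point here. The pragmas are wrapped in an _MSC_VER check so that other compilers ignore them.

```cpp
#include <cstddef>
#include <cstring>

#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 6054)  // assumed false alarm: string is terminated below
#endif

// Hypothetical function: copy at most 15 characters and return the
// length of the copy.
std::size_t copied_length(const char* source) {
    char buffer[16];
    std::size_t i = 0;
    for (; i + 1 < sizeof(buffer) && source[i] != '\0'; ++i) {
        buffer[i] = source[i];
    }
    buffer[i] = '\0';  // the buffer is always zero-terminated here
    return std::strlen(buffer);
}

#ifdef _MSC_VER
#pragma warning(pop)
#endif
```

Since the suppression lives in the source code, its effect becomes visible only after the file is analyzed again.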

Summary

Comparing two static analyzers is a very difficult and complex task, and there is no answer to the question of which tool is the best IN GENERAL. You can only say which tool is better for a particular project and user.