PVS-Studio's New Message Suppression Mechanism

Nov 13 2014

Author: Paul Eremeev

The tasks of suppressing false positives and previous analysis results
The new mechanism to associate source code with diagnostics based on message bases files
Statistics on using the new suppression mechanism

The PVS-Studio analyzer already has a false positive suppression mechanism, and it completely suits us when its functionality is concerned, i.e. we have no complaints about its reliability. However, some of our customers would like to work with the messages generated by the analyzer only for new, i.e. freshly written, code. And we can understand why they want it, since we know that the analyzer generates thousands or even dozens of thousands of messages for the existing source code in a large-scale project and surely no one would feel like fixing all of them.

The feature of marking messages as false positives in a sense correlates with the wish to work with 'fresh' messages only as there is theoretically no obstacle that could prevent one from marking every generated message as a false one to work with messages for fresh code only after that.

But the current false positive marking mechanism has a usability issue (we'll speak about it later) that may make it difficult to use it for solving this task in real-life projects. In particular, this mechanism is not suitable for mass marking of messages, which is inevitable when handling thousands of messages.

Since the above mentioned issue is fundamental for the current mechanism, it can't be eliminated while keeping this mechanism intact. That's why we need to consider implementing an alternative approach to solving this task.

The tasks of suppressing false positives and previous analysis results

The mechanism of associating source code and the analyzer's diagnostics implies the possibility to associate a line of source code with a certain diagnostic, and it is important that this connection should be sustained throughout a long period of time during which both the user code and the analyzer's output may change.

The mechanism of associating source code with diagnostics can be used to solve the following two tasks:

The task of suppressing false positives. The user can mark those analyzer-generated messages that he considers to be false positive by special comments so that he could filter such messages in the future (for instance, hide them). Such a markup should be preserved for all subsequent launches of the analyzer, even if the source code is somehow modified. This feature has been available in PVS-Studio for quite a long time now.
The task of suppressing analysis results of previous launches is about enabling the user to see only fresh analysis results (that is, those messages that were generated during the previous launches must not be shown). PVS-Studio hasn't had a solution for this task until recently, and it is this task we are going to discuss in the next section.

PVS-Studio provides a mechanism for associating source code with diagnostics which is based on special markers (comments of a special pattern) placed in the source code. This mechanism is implemented at the level of the analyzer core (PVS-Studio.exe) and IDE plugins. An IDE plugin performs initial placement of markers in the code and also enables filtering of analysis results by these markers. The analyzer core can "pick up" already present markers and use them to adjust its output thus preserving the markings from previous launches.

Let's review what pros and cons the existing mechanism provides.

Pros:

Simple to implement at the level of the analyzer and plugins.
Easy-to-use by tool users, the manual marking capability.
The "code - diagnostic" association is guaranteed to be preserved during next analysis launches regardless of any code modifications introduced by the user.
"Free" support of team development: since markers are stored in source files, they can be synced through the same system as used to sync the files themselves (for instance, a version control system).

Cons:

Cluttering code with special-pattern comments not related to the code execution logic.
The issue when using version control systems - the necessity to submit special comments into the repository.
A potential danger of damaging source code through global mass marking.

The issues above described make it unreasonable in practice to use the existing code-diagnostic associating mechanism for solving the task of suppressing results of previous launches, i.e. mass message marking on the currently existing codebase.

In other words, no one would be willing to add 20,000 comments suppressing current messages without looking at them and to submit all these changes into the version control system.

The new mechanism to associate source code with diagnostics based on message bases files

As was shown earlier, the basic problem with the current mechanism is that it relies on modifying user source code. This implies both obvious pros and cons of this method. What becomes evident is that we need to abandon the practice of modifying user source code if we want to implement an alternative approach, and keep the information about the code-diagnostic association in some external storage instead of source files themselves.

Long-term storage of such association markings sets a fundamental task of taking into account both changes of the analyzer's diagnostics themselves and changes of user source code within a large timeframe. Disappearance of diagnostics from the analyzer's output is not a problem for us as the corresponding message should have been already marked as false/irrelevant. However, changing something in the source code may cause a "second coming" of messages which were already marked before.

This is not an issue when utilizing an old source code marking mechanism. No matter how much a certain code fragment is changed, the marker is still preserved inside it until the user himself (either intentionally or not) removes it, which doesn't seem to be likely. Moreover, an experienced user may add such a marker into a new (or modified) code fragment if he knows for sure that the analyzer will be angry about this code.

What exactly do we need to identify an analyzer-generated diagnostic message? The message itself contains the file name, project name, source file line number for which the message was generated, and hash sums of this line as well as one preceding and one following it. To associate a diagnostic with a code fragment when making changes in the source code, we will definitely have not to take the line number into account as it may change unpredictably after even a slightest modification of the file.

To preserve the above described association we chose the method of creating local "base files". These files (with the .suppress extension) are created in the same folder as the project files (vcproj/vcxproj) and contain lists of diagnostics marked as "irrelevant". These diagnostics are stored without taking into account the line numbers, while the paths to the files these diagnostics were associated with are stored in a relative (to the project files) format. It allows the user to transfer such files between developers' computers even if they have the projects deployed in different locations (from the viewpoint of the file system). These files can be submitted to version control systems as in most cases project files themselves store the paths to the source files in a similar relative format. An exception to this are auto-generated project files like in CMake where the source files tree can be placed independently from the project files tree.

We used the following fields to identify messages in a suppress file:

Diagnostic message text;
Message's error code;
Relative path to the file the message was triggered by;
Hash sums of the triggering code line as well as the preceding and following lines.

As you can see, it is through storing the code lines' hash sums that we are intending to associate messages with user code. If the code shifts, the corresponding message will shift as well, however its context (i.e. the surrounding code lines) will remain the same. And if the user changes the code from which the message was triggered by, it would be just logical to treat this code as "fresh" and allow the message for this code to be shown anew. But if the user has really "fixed" the issue the message was pointing to, it will simply disappear. Otherwise - if the suspicious fragment wasn't "fixed" - the analyzer will again display the message.

It's clear that relying on hash sums of code lines in user files, we will inevitably face certain limitations. For instance, if there are several identical lines in a user file, the analyzer will treat all the messages for such lines as suppressed even if only one was actually marked. In the next section, we will in more detail discuss the problems and limitations we had faced when using the described method.

PVS-Studio's IDE plugins automatically create suppress files during initial message marking and at next analysis launches compare all of the newly generated diagnostics with those contained in the suppress bases - and if a newly generated message is found to be in the base already, it won't be shown again.

Statistics on using the new suppression mechanism

After implementing the first working prototype, we naturally wanted to test it on real-life projects. Rather than waiting for months or years for a large enough amount of changes to accumulate in such projects, we instead just took a number of earlier revisions of some of the large open-source projects.

What did we expect to see? We took some of previous project revisions old enough (which depended on developers' activity, varying between a week and a year), checked it with our analyzer and submitted all the generated messages to the suppress bases. Then we updated a project to its latest head revision and ran the analyzer over it once again. Ideally, we expected to get messages triggered by the new code only, i.e. code fragments written during the timeframe between two selected revisions.

It was when checking the very first project already that we were faced by a number of problems and limitations of our method. Let's discuss them in detail.

First, just as we had actually expected, suppressed messages reappeared if the corresponding code had been changed either in the triggering line itself or the one preceding or following it. While the fact of modifying a message-triggering line leading to the message's "resurrection" looked quite normal, getting the same result when modifying the nearby lines didn't. This is, in particular, one of the main limitations of our method: we rely on a source file's text fragment consisting of 3 lines. Reducing it to one line only doesn't seem reasonable as there's a risk of ending up with too many messages all mixed up. In the project statistics that will be provided further, we named such messages as "doubled", i.e. messages already saved in the suppress bases but popping up for the second time.

Second, we've come across another nuance (or to be more exact, another limitation) to our new mechanism: reappearance of messages in header files when these files were included into other source files in other projects. This limitation is caused by the fact that suppression bases are generated at the level of an IDE project. The same also occurs when new projects are added into a solution, and if these projects are using the same header/source files as well.

Also, it happens that it is a bad idea to rely on the message text to identify the message in the base. Sometimes the message text may contain line numbers of the source code (these change when the code shifts) and names of variables used in the code. We fixed this issue by saving an incomplete message in the base - we have it devoid of any numerals. But when "resurrection" of a message when changing the variable name is concerned, we found it fair and correct, for it's not only the name but also the definition that could have changed. That we treat as new code.

Finally, some messages have "migrated" - it's either that the code was copied into other files or the files themselves were included into other projects, which in fact coincides with the very first issue mentioned above.

Below is the statistics on several projects we have tested our new system on. The large amount of diagnostic messages is due to the fact that every single message was counted - including 64-bit diagnostics that unfortunately tend to produce too many false positives by themselves, which we can't help.

LLVM, a large compiler infrastructure designed for compile-time, link-time, run-time, and idle-time optimization of programs. The project has been in active development for a few years now, so it was enough to take a timeframe within just 1.5 months to get a large number of changes in the code. The well-known Clang compiler is a part of this project. With the project comprised of about 1,600-1,800 files, 52,000 messages were marked as irrelevant. In the revision 1.5 months older, 18,000 new messages were generated, among them 500 doubled and 500 migrated to other files;
Miranda, a Windows instant messaging application widely known all over the world. There have been 11 versions of it since its first release. We took the latest one of them, Miranda IM. Unfortunately, because of conflicts inside the Miranda developer team, this version is changing very slowly: we had to take a timeframe between two revisions as long as 2 years. For 700 project files, 51,000 messages were marked as irrelevant. In the next selected revision, two years older, we got only 41 messages;
ffdShow, a codec used for decoding of video in the MPEG-4 ASP and H.264/MPEG-4 AVC video formats. ffdShow is quite a completed project by the time of writing this article, its last release dating back to April 2013. For this, we chose a timeframe of one year. For 570 files, 21,000 messages were marked as irrelevant. A one-year older version gave us 120 new ones;
Torque3D, a 3D computer game engine. The project development has almost stopped nowadays, but things were quite different when it all started. The latest release by the time of writing this article dates back to May 1st, 2007. At the times of active development, we got 43,259 messages for the first version and 222 new ones a week later;
OpenCV, a library of programming functions mainly aimed at real-time computer vision. It's quite an actively developing project. We took one timeframe of 2 months and another of one year. 50,948 diagnostic messages were marked as irrelevant, 1,174 new ones appearing in the 2-month older version and 19,471 in the one-year older version;

So what conclusions are to be drawn from these results?

It was quite expected that we wouldn't get many new messages for slowly developing projects even in such a long period as one year. Notice that we didn't count the number of doubled and migrated messages for such projects.

But it was "living" projects we were most interested in, of course. In particular, taking the example of LLVM, we can see that the number of new messages makes 34% of the number for the version released just 1.5 months earlier! However, out of these 18,000 new messages only 1000 (500 doubled plus 500 migrated) have to do with the limitations of our new method, i.e. just 5% of the total number of the new messages.

In our opinion, these figures demonstrate very well the viability of our new mechanism. Of course we should remember that this new message suppression mechanism is no cure-all, but nothing prevents one from using the regular, long-existing multiple suppression/filtering mechanisms. For instance, if some message starts "showing off" too often in some header file, it would be OK to "kill" it once and for all by adding a //-Vxxx comment to the corresponding line.

Although the new mechanism is pretty well tuned now and we are ready to offer it to our users in the next release, we decided we should keep testing it by arranging regular (nightly) analysis of the LLVM/Clang project. The new mechanism will allow us to get only messages for freshly written code, so theoretically we could find errors even before the developers themselves do. It's a very good way to demonstrate how much useful it is indeed to use static analysis regularly - and it wouldn't be possible without our new suppression system, for one could never possibly check a pile of 50,000 diagnostic messages every day. Stay with us in twitter for reports about fresh bugs found in Clang.