
What do static analysis and search engines have in common? A good "top"!

Apr 18 2012

Developers of search engines like Google/Yandex and developers of static code analysis tools solve, to some extent, the same task. Both have to provide users with a selection of resources that meets the users' wishes. Of course, search engine developers would like to confine themselves to the "I'm Feeling Lucky!" button, while developers of static code analysis tools would like to generate a list of real errors only. But reality imposes constraints, as usual. Do you want to know how we fight the cruel reality while developing PVS-Studio?

OpenMP support in PVS-Studio was dropped after version 5.20. If you have any questions, feel free to contact our support.

Introduction

So, what is the task of a search engine under these constraints? Without pretending to cover the issue fully, I'll say that a search engine should give several answers to a user's explicitly stated query. That is, it should show several websites that might be of interest to the user. At the same time, it can show some advertisements as well.

From the viewpoint of static code analyzers, the task is almost the same. In response to the user's implicit query ("You, a smart program, please show me where I have errors in my code"), the tool should point at the code fragments that are most likely to be of interest to the user.

Those who have dealt with static code analyzers (regardless of the language) understand that any tool produces false positives. A false positive is a case where the tool "formally" sees an error in the code, but a human sees that there is none. This is where human perception comes into play. So, imagine the following situation.

Someone downloads a trial version of the code analyzer and launches it. It doesn't even crash (a miracle!) and manages to work for some time. It shows the user a list of tens, hundreds or thousands of messages. If there are just a few dozen messages, the user will review them all. If he/she finds anything interesting, that is a reason to think about using the tool regularly and buying it. If he/she doesn't find anything interesting, he/she will soon forget about it. But if there are hundreds or thousands of messages in the list, the user will review just a few of them and draw a conclusion from what he/she has seen. That's why it is very important that relevant messages "catch" the user's eye at once. This is the similarity between the approaches to the "right top" taken by search engine developers and by developers of static code analyzers.

So how do we provide the "right top" for static analysis?

To let PVS-Studio users see the most interesting messages first, we use several tricks.

First, all the messages are categorized into levels similar to Compiler Warning Levels. Only first-level and second-level messages are shown at the first launch by default, while the third level is disabled.

Second, our diagnostics are divided into the classes "General Analysis", "64-bit diagnostics" and "OpenMP diagnostics". By default, the OpenMP and 64-bit diagnostics are also disabled, so users don't see them. It doesn't mean that they are bad, meaningless, or buggy. It's just that you are much more likely to find the most interesting errors in the "General Analysis" category. And if a user does find anything interesting there, he/she will turn on the other diagnostics and work with them if needed.
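The first two tricks amount to a default filter over the message list. Here is a minimal sketch of that idea in Python. The `Message` record, the `visible` function and the sample report are hypothetical and are not part of PVS-Studio; the levels assigned to the sample messages are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    code: str   # diagnostic code, e.g. "V547"
    level: int  # 1, 2 or 3, similar to compiler warning levels
    group: str  # "GA" (General Analysis), "64" (64-bit) or "MP" (OpenMP)
    file: str

# Assumed defaults that mirror the text: levels 1 and 2, General Analysis only.
DEFAULT_LEVELS = frozenset({1, 2})
DEFAULT_GROUPS = frozenset({"GA"})

def visible(messages, levels=DEFAULT_LEVELS, groups=DEFAULT_GROUPS):
    """Return only the messages that pass the level and group filters."""
    return [m for m in messages if m.level in levels and m.group in groups]

# Invented sample report.
report = [
    Message("V547", 1, "GA", "a.cpp"),
    Message("V595", 2, "GA", "b.cpp"),
    Message("V560", 3, "GA", "b.cpp"),  # hidden: third level
    Message("V101", 1, "64", "c.cpp"),  # hidden: 64-bit group
]

for m in visible(report):
    print(m.code, m.file)
```

A user who wants the other diagnostics would simply pass wider `levels` or `groups` sets.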

Third, we are constantly fighting against false positives.

So how do you do all this?

We have an internal tool that lets us perform statistical (not to be confused with "static"!) analysis of our code analyzer's output. It allows us to estimate the following three parameters:

  • Share of an error in the project — how prevalent errors are (by their codes) at the project level (Project Level Share).
  • Average density of an error — the ratio of the number of errors of one type to the number of files where errors of this type occur (Average Density (project level)).
  • Distribution of errors of one type throughout the project files compared to their average density (Errors count on file).
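The three parameters above can be sketched in a few lines of Python. This is only an illustration under assumptions: the report is modeled as a list of `(file, code)` pairs, and the function names and sample data are invented here. It is not the code of our internal tool.

```python
from collections import Counter, defaultdict

def project_level_share(warnings):
    """Fraction of all warnings that each diagnostic code accounts for."""
    totals = Counter(code for _file, code in warnings)
    n = sum(totals.values())
    return {code: count / n for code, count in totals.items()}

def average_density(warnings):
    """Warnings of one code divided by the number of files where it occurs."""
    totals = Counter()
    files = defaultdict(set)
    for file, code in warnings:
        totals[code] += 1
        files[code].add(file)
    return {code: totals[code] / len(files[code]) for code in totals}

def errors_count_on_file(warnings, code):
    """Number of warnings with the given code in each file."""
    return Counter(file for file, c in warnings if c == code)

# Invented sample report: (file, diagnostic code) pairs.
warnings = [
    ("a.cpp", "V547"), ("a.cpp", "V547"), ("a.cpp", "V547"),
    ("b.cpp", "V547"), ("b.cpp", "V595"), ("c.cpp", "V560"),
]

print(project_level_share(warnings))
print(average_density(warnings))
print(errors_count_on_file(warnings, "V547"))
```

In the sample, V547 is reported four times across two files, so its average density is 2.0 even though one of those files contains three of the four reports.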

Let's see how we use this internal tool by the example of the Miranda IM project.

Note that this post is not about errors found in Miranda IM. If you want to see them, please refer to this post.

So, we open the analysis report (plog-file) in our internal tool, turn off the third error level and leave only the GA-analyzer (General Analysis). The error distribution is shown in Figure 1.

0143_Static_analysis_and_search_engines/image1.png

Figure 1 - Distribution of errors in the Miranda IM project.

The color sectors correspond to a more than 2.5% share of reports of a certain diagnostic out of the general amount of detected issues. The black sectors correspond to shares less than 2.5%. You can see that errors with codes V547, V595 and V560 are the most frequent. Let's keep them in mind.
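The split into colored and black sectors is a simple threshold on each diagnostic's share. A minimal sketch, assuming per-code counts are already available; the function name and the counts are invented, and treating a share of exactly 2.5% as "minor" is an assumption the text does not settle.

```python
def split_sectors(counts, threshold=0.025):
    """Split diagnostic codes into major and minor sectors by share of all reports."""
    total = sum(counts.values())
    major, minor = {}, {}
    if total == 0:
        return major, minor
    for code, n in counts.items():
        share = n / total
        # Assumption: a share exactly equal to the threshold counts as minor.
        (major if share > threshold else minor)[code] = share
    return major, minor

# Invented counts, not the real Miranda IM numbers.
counts = {"V547": 50, "V595": 30, "V560": 18, "V501": 2}
major, minor = split_sectors(counts)
print("colored:", sorted(major))
print("black:", sorted(minor))
```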

In Figure 2, you can see the average number of errors of each type per file (i.e. their average density for the project).

0143_Static_analysis_and_search_engines/image3.png

Figure 2 - Average density of errors in the Miranda IM project.

As you can see from this graph, the errors with codes V547, V595 and V560 are reported from 1.5 to 2.5 times per file. This is a normal value, and we see no reason to "fight" these errors as false positives. But the final conclusion is drawn from the third type of graph, shown for these errors in Figure 3, Figure 4 and Figure 5.

0143_Static_analysis_and_search_engines/image5.png

Figure 3 - Distribution of V547 errors in the Miranda IM project compared to their average density.

0143_Static_analysis_and_search_engines/image7.png

Figure 4 - Distribution of V595 errors in the Miranda IM project compared to their average density.

0143_Static_analysis_and_search_engines/image9.png

Figure 5 - Distribution of V560 errors in the Miranda IM project compared to their average density.

In Figures 3-5, file names run along the horizontal axis, and the number of times the error was reported for each file runs along the vertical axis. The red columns are files where the error was reported more often than the average (blue dots) for this error type.
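Picking out the "red" files is a comparison of each file's count with the average density for that error type. A minimal sketch, again modeling the report as `(file, code)` pairs; the function name and data are invented for illustration.

```python
from collections import Counter

def files_above_average(warnings, code):
    """Return (file, count) pairs where `code` is reported more often than its average density."""
    per_file = Counter(file for file, c in warnings if c == code)
    if not per_file:
        return []
    average = sum(per_file.values()) / len(per_file)
    red = [(file, n) for file, n in per_file.items() if n > average]
    # Most frequently flagged files first; ties are ordered by file name.
    return sorted(red, key=lambda item: (-item[1], item[0]))

# Invented sample: V547 is reported 5 times in a.cpp and once in b.cpp.
warnings = [("a.cpp", "V547")] * 5 + [("b.cpp", "V547"), ("b.cpp", "V595")]
print(files_above_average(warnings, "V547"))
```

Here the average density of V547 is 3 reports per file, so only `a.cpp` would be drawn as a red column.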

So what do you do with these graphs?

We then study these "red" files and make a decision: if there is a false positive and it occurs frequently in other projects too, we eliminate it. And if it is a real error that was, in addition, swiftly cloned by copy-paste, there's nothing to "improve".

In this post, I'm deliberately omitting the code samples the analyzer swore at in order not to overload the text.

In other words, after drawing a whole lot of such graphs and analyzing them, we can easily see where our analyzer misses and fix those places. It confirms an old truth that the visual representation of "boring" data allows you to have a better view of the issue being investigated.

What is that OP button in the pictures?

Attentive readers have noticed one more button (OP) in the pictures besides the three standard analyzer buttons (GA, 64, MP). OP stands for "optimization". In PVS-Studio 4.60, we introduced a new group of diagnostic messages that refer to micro-optimizations. Diagnosing possible micro-optimizations is quite an ambiguous feature of our analyzer. Somebody will be glad to find a place where a large object is passed into a function by value instead of by reference (V801). Somebody will save a significant amount of memory by decreasing structure sizes for large object arrays (V802). And somebody thinks it is all rubbish and premature optimization. Everything depends on the project type.
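To show why structure size can matter for large arrays, here is a rough illustration that uses Python's `ctypes` module to mimic C structure layout. The two structures are invented, the exact sizes depend on the platform's alignment rules, and this only demonstrates the general idea of reordering fields to reduce padding. It is not how the V802 diagnostic is implemented.

```python
import ctypes

class Original(ctypes.Structure):
    # Small fields on both sides of a large one usually force padding twice.
    _fields_ = [
        ("flag", ctypes.c_char),
        ("value", ctypes.c_double),
        ("tag", ctypes.c_char),
    ]

class Reordered(ctypes.Structure):
    # Same fields, largest first, so the small ones can share one padded slot.
    _fields_ = [
        ("value", ctypes.c_double),
        ("flag", ctypes.c_char),
        ("tag", ctypes.c_char),
    ]

count = 1_000_000  # hypothetical number of array elements
for struct in (Original, Reordered):
    size = ctypes.sizeof(struct)
    print(struct.__name__, size, "bytes per element,", size * count, "bytes per array")
```

Whether the difference is worth acting on depends, as noted above, on the project type.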

Anyway, after analyzing our tool's output, we concluded that we needed to:

  • arrange optimization diagnostics into a separate group so that they can be easily hidden or shown;
  • turn them off by default, as they can "jam" the error list with diagnostics that not everyone likes.

That's how the new OP button appeared in the PVS-Studio Output Window (Figure 6):

0143_Static_analysis_and_search_engines/image11.png

Figure 6 - OP button (optimization) has appeared in PVS-Studio 4.60.

By the way, in the same version we have also significantly reduced the number of false positives in the 64-bit diagnostics.

I invite you to download the new PVS-Studio version and to check how adequate the recommendations on optimizing your code are.

Conclusion

Developers of static code analyzers, as well as search engine developers, are interested in making the output as adequate as possible. Both employ many methods to achieve that, including statistical analysis methods. In this post I have shown you how we achieve that when developing PVS-Studio.

A question to the audience

I have a small question for those who have dealt with (or at least played around with) PVS-Studio or any other code analyzer. Do you think a code analyzer's end user needs the graphs shown in this article as an end-user tool? In other words, do you think you could learn anything useful from such diagrams if your code analyzer contained them? Or is it a tool "for internal use" only? Please share your opinion by writing to us.
