To get a trial key
fill out the form below
Team License (standard version)
Enterprise License (extended version)
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Request our prices
New License
License Renewal
--Select currency--
USD
EUR
GBP
RUB
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
PVS-Studio Impressed by the Code Qualit…

PVS-Studio Impressed by the Code Quality of ABBYY NeoML

Jul 02 2020

ABBYY has recently opened the source code of their NeoML framework. Someone suggested that we check this library with PVS-Studio. We liked the idea and got down to work without further delay. This article won't take long to read because the project has proved to be pretty high-quality :).

0746_Abbyy_NeoML/image1.png

The source code of NeoML can be downloaded from GitHub. This is a cross-platform framework designed for implementing machine learning models. It is used by ABBYY engineers for computer vision and natural language processing tasks, including image preprocessing, document layout analysis, and so on. It currently supports C++, Java, and Objective-C, with Python support coming soon. The framework itself is mainly written in C++.

Starting analysis

Starting the analysis on this framework was easy. Once I had the Visual Studio project generated in CMake, I ran PVS-Studio from Visual Studio on the projects in that solution, except for third-party libraries. Besides NeoML itself, the solution also included ABBYY libraries such as NeoOnnx and NeoMathEngine, which I also included in the list of projects to be analyzed.

Analysis results

Needless to say, I was hoping to find some bad bugs, but... the code turned out to be pretty clean, and I had to settle for just a few warnings. It's very likely that the project was already checked with some static analysis tool during development. Many of the warnings were produced by the same diagnostics on similar code fragments.

For example, calling a virtual method in a constructor is very common in this project, though it's generally an unsafe practice. Such cases are detected by the V1053 diagnostic: Calling the 'foo' virtual function in the constructor/destructor may lead to unexpected result at runtime. I got a total of 10 warnings of this type. To learn more about why this practice is unsafe and what issues it leads to, see the article "Never Call Virtual Functions during Construction or Destruction" by Scott Meyers. But the NeoML developers seem to understand what they are doing, so those warnings can be ignored.

There were also 11 warnings issued by the medium-level diagnostic V803, which deals with micro-optimizations. This diagnostic recommends replacing postfix increments with prefix ones when the iterator's previous value is not used. With a postfix increment, an unnecessary temporary object is created. It's not a bug, of course – just a minor detail. If this diagnostic is irrelevant, you can simply turn it off. Actually, the "micro-optimizations" set is turned off by default.

You must have already guessed that me talking about trifles like iterator increment means the code is fine and I'm just looking for something to pick on.

Certain diagnostics are very often irrelevant or inapplicable to a given project, so we recommend that you spend some time configuring the analyzer before analysis rather than put up with the pain of working with non-optimal settings. If you want to get to the most interesting warnings right off, follow the steps described in our article "How to quickly check out interesting warnings given by the PVS-Studio analyzer for C and C++ code?"

A few interesting warnings related to "micro-optimizations" were produced by diagnostic V802, which recommends rearranging a structure's fields by type size in decreasing order, thus reducing the structure's overall size.

V802 On 64-bit platform, structure size can be reduced from 24 to 16 bytes by rearranging the fields according to their sizes in decreasing order. HierarchicalClustering.h 31

struct CParam {
  TDistanceFunc DistanceType; 
  double MaxClustersDistance;
  int MinClustersCount; 
};

By simply swapping the MaxClustersDistance field of type double and the enumerator DistanceType field, we can reduce the structure's size from 24 to 16 bytes.


struct CParam {
  TDistanceFunc DistanceType; 
  int MinClustersCount; 
  double MaxClustersDistance; 
};

TDistanceFunc is enum, so its size is the same as that of int or smaller, which means we should move it to the bottom of the structure.

Again, that's not a bug, but if you want to have micro-optimizations just for the sake of it or if they are objectively crucial to your project, warnings like the ones shown above will help you quickly find spots in your code that could use at least some basic refactoring.

Overall, the code of NeoML is neat and clear, but the V807 diagnostic pointed out a couple of lines that could be optimized and made somewhat clearer. Here's one example:

V807 Decreased performance. Consider creating a reference to avoid using the same expression repeatedly. GradientBoostFullTreeBuilder.cpp 469

0746_Abbyy_NeoML/image2.png

The chain curLevelStatistics[i]->ThreadStatistics[j] can be replaced with a call to an individual variable. There are no calls to any complex methods in this chain, so this optimization wouldn't give any noticeable boost, but it would still make this fragment clearer and shorter, I believe. Besides, it would indicate to any future maintainers that the original developer meant to address these exact indexes and there's no error here. This is the version with the suggested fix applied:

auto threadStatistics = curLevelStatistics[i]->ThreadStatistics[j];

if(threadStatistics.FeatureIndex != NotFound ) {
  if(   threadStatistics.Criterion > criterion
     || ( .... ))
  {
    criterion = threadStatistics.Criterion;
    curLevelStatistics[i]->FeatureIndex    = threadStatistics.FeatureIndex;
    curLevelStatistics[i]->Threshold       = threadStatistics.Threshold;
    curLevelStatistics[i]->LeftStatistics  = threadStatistics.LeftStatistics;
    curLevelStatistics[i]->RightStatistics = threadStatistics.RightStatistics;
  }
}

Conclusion

As you can see, the code base of the NeoML framework turns out to be very clean.

One thing you should keep in mind is that a single run of a static analyzer on an intensely developing project doesn't say much in favor of adopting static analysis because many of the bugs, especially grave ones, have already been found and fixed using other – more time- and resource-intensive – means. The article "Errors that static code analysis does not find because it is not used" elaborates on this topic.

But even with that fact considered, PVS-Studio issued particularly few warnings on NeoML and I give credit to the developers for the quality of their code, no matter if they used static analysis or not.

Popular related articles
Appreciate Static Code Analysis!

Date: Oct 16 2017

Author: Andrey Karpov

I am really astonished by the capabilities of static code analysis even though I am one of the developers of PVS-Studio analyzer myself. The tool surprised me the other day as it turned out to be sma…
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Date: Nov 21 2018

Author: Andrey Karpov

A brief description of technologies used in the PVS-Studio tool, which let us effectively detect a large number of error patterns and potential vulnerabilities. The article describes the implementati…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: Oct 22 2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…
The way static analyzers fight against false positives, and why they do it

Date: Mar 20 2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
PVS-Studio for Java

Date: Jan 17 2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
The Last Line Effect

Date: May 31 2014

Author: Andrey Karpov

I have studied many errors caused by the use of the Copy-Paste method, and can assure you that programmers most often tend to make mistakes in the last fragment of a homogeneous code block. I have ne…
PVS-Studio ROI

Date: Jan 30 2019

Author: Andrey Karpov

Occasionally, we're asked a question, what monetary value the company will receive from using PVS-Studio. We decided to draw up a response in the form of an article and provide tables, which will sho…
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…

Comments (0)

Next comments

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →
Accept