To get a trial key
fill out the form below
Team License (standard version)
Enterprise License (extended version)
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Request our prices
New License
License Renewal
--Select currency--
USD
EUR
GBP
RUB
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
How PVS-Studio Checked ELKI in January

How PVS-Studio Checked ELKI in January

Jan 22 2021

If you feel like the New Year just came, and you missed the first half of January, then all this time you've been busy looking for tricky bugs in the code you maintain. It also means that our article is what you need. PVS-Studio has checked the ELKI open source project to show you errors that may occur in the code, how cunningly they can hide there, and how you can deal with them.

0791_ELKI/image1.png

What kind of library is ELKI?

The abbreviation ELKI stands for Environment for Developing KDD-Applications Supported by Index-Structures. This project is written in Java and is designed for data mining. Most users of this library are students, researchers, data scientists, and software engineers. No wonder since this library was developed for research only.

The algorithms included in the library are mainly related to cluster analysis and outlier detection that may indicate experimental errors. Every year, there are more and more references to research conducted using ELKI. On the developers' official website, you can find a whole list of scientific works that were conducted via this library. From year to year, this list is constantly growing and being supplemented, and the topics of scientific works are getting more diverse.

The only obstacle to using this library may be the AGPL license, which requires the full source code of the application to be open to any user. In other words, you won't be able to use this library in commercial projects with closed source code.

Let's start checking

The ELKI library contains 2 630 java files with 186 444 lines of code, except for comments. Therefore, the project seems to be quite small compared to some open source giants.

However, the small size of the library does not guarantee the absence of errors. As you know, errors can occur in any projects, regardless of their scale. That's why we are here now! Let's see my top 10 most interesting warnings found when viewing the report.

Unchecked boundaries

V6001 There are identical sub-expressions 'bounds[j + 1]' to the left and to the right of the '!=' operator. CLIQUEUnit.java(252)

private boolean checkDimensions(CLIQUEUnit other, int e) {
    for(int i = 0, j = 0; i < e; i++, j += 2) {
        if (dims[i] != other.dims[i]
            || bounds[j] != other.bounds[j]
            || bounds[j + 1] != bounds[j + 1]) {
          return false;
        }
    }
    return true;
}

In the if block of the checkDimensions method, the value of bounds[j + 1] is checked for inequality with its own value. Naturally, this condition will always be false, and one of the boundaries will never be checked. That is, the checkDimensions method may return true when checking, even if the boundaries of the arrays do not match.

The correct check in the if block must look this way:

bounds[j + 1] != other.bounds[j + 1]

Lazy constructor

V6022 Parameter 'updates' is not used inside constructor body. DataStoreEvent.java(60)

V6022 Parameter 'removals' is not used inside constructor body. DataStoreEvent.java(60)

public DataStoreEvent(DBIDs inserts, DBIDs removals, DBIDs updates) {
    super();
    this.inserts = inserts;
    this.removals = inserts;
    this.updates = inserts;
}

Two warnings were received for this code snippet. Here, three parameters are passed to the DataStoreEvent constructor, two of which are not used for initialization. Looking closer at the code in the constructor, it becomes clear that the programmer used copy-paste and forgot to correct the names of variables that should have been involved in initialization.

If you go even further and take a closer look at the DataStoreEvent class, you can see three methods in it that are actively using the above constructor.

public static DataStoreEvent insertionEvent(DBIDs inserts) {
  return new DataStoreEvent(inserts, DBIDUtil.EMPTYDBIDS, DBIDUtil.EMPTYDBIDS);
}

public static DataStoreEvent removalEvent(DBIDs removals) {
  return new DataStoreEvent(DBIDUtil.EMPTYDBIDS, removals, DBIDUtil.EMPTYDBIDS);
}

public static DataStoreEvent updateEvent(DBIDs updates) {
  return new DataStoreEvent(DBIDUtil.EMPTYDBIDS, DBIDUtil.EMPTYDBIDS, updates);
}

Due to incorrect initialization in the constructor, the behavior of these methods will also differ from what is expected.

The correct code version should be as follows:

this.inserts = inserts;
this.removals = removals;
this.updates = updates;

Opaque variable

V6012 The '?:' operator, regardless of its conditional expression, always returns one and the same value '0.5'. ClusterHullVisualization.java(173), ClusterHullVisualization.java(173)

public void fullRedraw() {
    ....
    boolean flat = (clusters.size() == topc.size());
    // Heuristic value for transparency:
    double baseopacity = flat ? 0.5 : 0.5;
    ....
}

In this code snippet, the value of the baseopacity variable will always be 0.5, regardless of what the value of the flat variable is. It's clear that there should have been two different values, but due to inattention, the author of the code forgot to correct one of them.

The mistake is simple and very disappointing. It immediately strikes your eye if you exclude lines of code that are not related to this variable from the method. But it's not an easy task to find a mistake in a method consisting of almost 90 lines. It looks like this is exactly what became the obstacle for the programmer.

In pursuit of something beyond the limits

V6025 Index '1' is out of bounds. GeneratorStatic.java(104)

@Override
public double[] computeMean() {
    // Not supported except for singletons.
    return points.size() == 1 ? points.get(1) : null;
}

In the computeMean method, the code author checks that the size of the points collection is equal to one. If so, the developer tries to return ... a collection element with index 1 from the method. Since indexing starts from zero, an IndexOutOfBoundsException is inevitable.

The correct version of the code must look like this:

return points.size() == 1 ? points.get(0) : null;

Is it possible to divide by zero?

V6020 Divide by zero. The range of the 'referenceSetSize' denominator values includes zero. PreDeConNeighborPredicate.java(138)

protected PreDeConModel computeLocalModel(DoubleDBIDList neighbors, ....) {
    final int referenceSetSize = neighbors.size();
    ....
    // Shouldn't happen:
    if(referenceSetSize < 0) {
        LOG.warning("Empty reference set – 
            should at least include the query point!");
        return new PreDeConModel(Integer.MAX_VALUE, DBIDUtil.EMPTYDBIDS);
    }
    ....
    for(int d = 0; d < dim; d++) {
        s[d] /= referenceSetSize;
        mvVar.put(s[d]);
    }
    ....
}

Every programmer knows that you can't divide by zero. Every programmer tries to avoid the occurrence of such an error in their code. And it works in most cases. But not always.

In this code snippet, everything depends on the size of the neighbors parameter passed to the computeLocalModel method. The code author checks the neighbors size and cuts off the values that are less than zero by checking in the if statement, but isn't taking any actions if the size is zero. Since the check of referenceSetSize < 0 doesn't make any sense, and the message for logging also says about an empty set, all of that seems like a typo. The check of referenceSetSize == 0 was most likely to be here.

If an empty neighbors container is passed in this method anyway, the division by zero will occur in the for loop. We can only hope that this will never really happen.

Endless initialization

V6062 Possible infinite recursion inside the 'setInitialMeans' method. Predefined.java(65), Predefined.java(66)

public void setInitialMeans(List<double[]> initialMeans) {
    this.setInitialMeans(initialMeans);
}

This one-line method is a true recursion. It's difficult to guess what exactly the code author wanted to write here. The only line of this method probably should have looked as follows:

this.setInitialMeans(initialMeans.toArray(new double[0][0]));

Since there is another method with the same name in this class, the programmer most likely wanted to pass data for initialization to it, but in the end, something went wrong. This is what the body of the second method looks like, by the way:

public void setInitialMeans(double[][] initialMeans) {
    double[][] vecs = initialMeans.clone(); // TODO: deep copy?
    this.initialMeans = vecs;
}

Lost weight

V6094 The expression was implicitly cast from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. ProbabilityWeightedMoments.java(130)

public static <A> double[] alphaBetaPWM(...., final int nmom) {
    final int n = adapter.size(data);
    final double[] xmom = new double[nmom << 1];
    double aweight = 1. / n, bweight = aweight;
    for(int i = 0; i < n; i++) {
        ....
        for(int j = 1, k = 2; j < nmom; j++, k += 2) {
            xmom[k + 1] += val * (aweight *= (n - i - j + 1) / (n - j + 1));
            xmom[k + 1] += val * (bweight *= (i - j + 1) / (n - j + 1));
        }
    }
    return xmom;
}

In the for loop of this code snippet, the (n - i - j + 1) / (n - j + 1) expression is converted implicitly from the int type to the double type. In this case, the loss of accuracy may be quite different: from a few digits after the decimal point to a complete nullification if the number modulo is less than one. Probably, it's not exactly the expected behavior given that the xmom array is of the double type. The (n – i – j + 1) / (n – j + 1) expression helps to prove that the programmer meant something completely different here. Let's say n – j + 1 equals x. Then we get the expression: (x – i) / x. This expression will always give 0 for integer division unless we start going into negative ranges. But since the values of n in this code snippet are always greater than zero, we can conclude that the programmer did not intend to use integer division here.

Not to lose precision, an explicit conversion to the double type is required:

xmom[k + 1] += val * (aweight *= (double) (n - i - j + 1) / (n - j + 1));
xmom[k + 1] += val * (bweight *= (double) (i - j + 1) / (n - j + 1));

Out of bounds

V6079 Value of the 'splitpoint' variable is checked after use. Potential logical error is present. KernelDensityFittingTest.java(97), KernelDensityFittingTest.java(97)

public final void testFitDoubleArray() throws IOException {
    ....
    int splitpoint = 0;
    while(fulldata[splitpoint] < splitval && splitpoint < fulldata.length) {
        splitpoint++;
    }
    ....
}

In this code fragment, in the while loop, first, the fulldata array element with the splitpoint index is compared with the value of the splitval variable, and only then we check whether the splitpoint value is smaller than the size of the array itself. These two checks in the while loop need to be changed, otherwise, you can very easily find yourself outside the array's bounds.

Unreachable code

V6019 Unreachable code detected. It is possible that an error is present. Tokenizer.java(172)

V6007 Expression 'c != '\n'' is always true. Tokenizer.java(169)

public String getStrippedSubstring() {
    int sstart = start, ssend = end;
    while(sstart < ssend) {
        char c = input.charAt(sstart);
        if(c != ' ' || c != '\n' || c != '\r' || c != '\t') {
            break;
        }
        ++sstart;
    }
    ....
}

Two diagnostics decided to act together this time. They both issued warnings. The V6019 diagnostic pointed to an unreachable code fragment: ++sstart, and V6007 pointed to a condition in the if statement that will always be true.

Why will there always be true in the if block? The answer is very simple. In this statement several conditions are checked at once: c !=' ' or c != '\n', or c != '\r', or c != '\t'. Whatever the input data, some of those will be true. Even if one of the checks is false, the next check will return true, and because of the || (or) operator, the condition in if will eventually be true. The condition in the if block will always be true. Therefore, the break statement, which prematurely ends the while loop, will trigger. As a result, the increment of the sstart variable will never be executed. This is exactly what the V6019 diagnostics noticed and started sounding the alarm.

Most likely, the programmer wanted to write something like this:

if(c != ' ' && c != '\n' && c != '\r' && c != '\t')

Override but check

V6009 Function 'equals' receives an odd argument. An object 'other.similarityFunction' is used as an argument to its own method. AbstractSimilarityAdapter.java(91)

@Override
public boolean equals(Object obj) {
    if(obj == null) {
        return false;
    }

    if(!this.getClass().equals(obj.getClass())) {
        return false;
    }

    AbstractSimilarityAdapter<?> other = (AbstractSimilarityAdapter<?>) obj;
    return other.similarityFunction.equals(other.similarityFunction);
}

The code author decided to override the equals method in the AbstractSimilarityAdapter class. If it is assumed that objects in the program will be stored in containers, or checked for equality, the equals override is required. However, the whole author's idea was spoiled by the method's last line, in which equals is called for the same object. As a result, even the most common comparison will be incorrect.

The code was probably meant to be like this:

return this.similarityFunction.equals(other.similarityFunction);

I'd like to note that this error pattern is often found in programs in different languages, and not only in Java. We've got an article 'The Evil within the Comparison Functions' on this topic. You must read it if you're interested in finding out why programmers often make mistakes in fairly simple functions designed to compare two objects.

Let's recap

As you can see, mistakes may appear in any project. Some of them are simple and disappointing, others are cunning and tricky. And it doesn't matter whether your project is small or large. It doesn't matter if you're a professional developer or just starting to write code. Mistakes can appear anywhere in your program and hide safely from your eyes.

For every developer and every project, static code analysis is an indispensable tool. As essential as a code review or unit tests. That's why, we recommend using static code analysis in your work right now, so that in the New Year your code will become better and clearer than in the previous one.

GetFreeTrialImage

In conclusion

It's not the first time we are checking packages related to calculations, statistics, and scientific researches. And every time we worry about mistakes in the code of such projects. Can we trust the data calculated using these libraries?

Of course, it's incredibly difficult to write code without mistakes, and in practice, it's even impossible. However, libraries used as building blocks in scientific, medical, and research programs should be checked as carefully as possible. The code should be as clean as possible. After all, every mistake can lead to very unpleasant consequences in the future. In any case, static analysis will prove to be extremely useful for such projects.

We have already covered this topic in some of our other articles, so if this question is interesting for you, don't hesitate to check out some of them:

Popular related articles
The way static analyzers fight against false positives, and why they do it

Date: Mar 20 2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
Static analysis as part of the development process in Unreal Engine

Date: Jun 27 2017

Author: Andrey Karpov

Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in th…
Appreciate Static Code Analysis!

Date: Oct 16 2017

Author: Andrey Karpov

I am really astonished by the capabilities of static code analysis even though I am one of the developers of PVS-Studio analyzer myself. The tool surprised me the other day as it turned out to be sma…
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: Oct 22 2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Date: Nov 21 2018

Author: Andrey Karpov

A brief description of technologies used in the PVS-Studio tool, which let us effectively detect a large number of error patterns and potential vulnerabilities. The article describes the implementati…
The Ultimate Question of Programming, Refactoring, and Everything

Date: Apr 14 2016

Author: Andrey Karpov

Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors, save time and effort. The au…
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
PVS-Studio ROI

Date: Jan 30 2019

Author: Andrey Karpov

Occasionally, we're asked a question, what monetary value the company will receive from using PVS-Studio. We decided to draw up a response in the form of an article and provide tables, which will sho…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…

Comments (0)

Next comments

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →
Accept