An unusual bug in Lucene.Net

Mar 14 2016
Author:

When they hear about static analysis, some programmers say they don't really need it: their code is fully covered by unit tests, and that is enough to catch all the bugs. Recently I found a bug that unit tests could theoretically catch, but unless you already know it is there, you are very unlikely to write a test that exposes it.

Introduction

Lucene.Net is a port of the Lucene search engine library, written in C#, and targeted at .NET runtime users. The source code is open and available on the project website https://lucenenet.apache.org/.

The analyzer detected only 5 suspicious fragments. This is explained by the project's slow pace of development, its small size, and the fact that it is widely used in other projects for full-text search [1].

To be honest, I didn't expect to find more bugs. One of these errors seemed especially interesting to me, so I decided to tell our readers about it in our blog.

About the bug found

We have a diagnostic, V3035, that detects cases where a programmer mistakenly writes =+ instead of +=, so that + becomes a unary plus. When I was writing it by analogy with the V588 diagnostic for C++, I wondered whether a programmer could really make the same mistake in C#. In C++ it is understandable: people use various text editors instead of an IDE, and a typo can easily go unnoticed. But is it possible to overlook the misprint when typing in Visual Studio, which automatically formats the code as soon as a semicolon is typed? It turns out that it is. Such a bug was found in Lucene.Net. It is of great interest to us, mostly because it is rather hard to detect by any means other than static analysis. Let's take a look at the code:

protected virtual void Substitute( StringBuilder buffer )
{
    substCount = 0;
    for ( int c = 0; c < buffer.Length; c++ ) 
    {
        ....

        // Take care that at least one character
        // is left left side from the current one
        if ( c < buffer.Length - 1 ) 
        {
            // Masking several common character combinations
            // with an token
            if ( ( c < buffer.Length - 2 ) && buffer[c] == 's' &&
                buffer[c + 1] == 'c' && buffer[c + 2] == 'h' )
            {
                buffer[c] = '$';
                buffer.Remove(c + 1, 2);
                substCount =+ 2;
            }
            ....
            else if ( buffer[c] == 's' && buffer[c + 1] == 't' ) 
            {
                buffer[c] = '!';
                buffer.Remove(c + 1, 1);
                substCount++;
            }
            ....
        }
    }
}

Lucene.Net has a GermanStemmer class, which cuts off the suffixes of German words to extract a common root. It works in the following way. First, the Substitute method replaces certain combinations of letters with other symbols, so that they are not confused with a suffix. Examples of such substitutions are 'sch' to '$' and 'st' to '!' (you can see them in the code fragment). At the same time, the number of characters by which these changes shorten the word is stored in the substCount variable. Next, the Strip method cuts off the suffixes, and finally the Resubstitute method performs the reverse substitution: '$' to 'sch', '!' to 'st'. For instance, for the word "kapitalistischen" (capitalistic), the stemmer does the following: kapitalistischen => kapitali!i$en (Substitute) => kapitali!i$ (Strip) => kapitalistisch (Resubstitute).
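The last step of that pipeline can be illustrated with a small sketch. This is not the Lucene.Net implementation: ResubstituteSketch and its Resubstitute method are hypothetical names, and the method reverses only the two substitutions mentioned above, '$' to 'sch' and '!' to 'st'.

```csharp
using System;
using System.Text;

static class ResubstituteSketch
{
    // Hypothetical helper: restores only the two masks named in the article.
    // The real method handles more substitutions than these.
    public static string Resubstitute(string stem)
    {
        var result = new StringBuilder();
        foreach (char ch in stem)
        {
            switch (ch)
            {
                case '$': result.Append("sch"); break; // '$' masks "sch"
                case '!': result.Append("st"); break;  // '!' masks "st"
                default: result.Append(ch); break;     // everything else is kept
            }
        }
        return result.ToString();
    }

    static void Main()
    {
        // The stripped stem from the example above.
        Console.WriteLine(Resubstitute("kapitali!i$")); // kapitalistisch
    }
}
```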

Because of this typo, when 'sch' is replaced with '$', the substCount variable is assigned 2 instead of being increased by 2. This error is really hard to find by methods other than static analysis. That's the answer to those who ask, "Do I need static analysis if I have unit tests?" To catch such a bug with unit tests, one would have to test Lucene.Net on German texts using GermanStemmer; the tests would have to index a word that contains the 'sch' combination and one more letter combination for which a substitution is performed. Moreover, that second combination must appear in the word before 'sch', so that substCount is nonzero by the time the expression substCount =+ 2 is executed. Quite an unusual combination for a test, especially if you don't see the bug.
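To make the required test input concrete, here is a minimal, self-contained sketch. It is not Lucene.Net's code: the hypothetical SubstituteSketch.Substitute method reimplements only the two branches visible in the fragment above ('sch' and 'st'), returns the counter instead of storing it in a field, and takes a withTypo flag (an assumption made for this illustration) that switches between =+ and +=.

```csharp
using System;
using System.Text;

static class SubstituteSketch
{
    // Simplified stand-in for the method from the article: only the 'sch'
    // and 'st' branches are kept. withTypo selects "=+" (the bug) or "+=".
    public static int Substitute(StringBuilder buffer, bool withTypo)
    {
        int substCount = 0;
        for (int c = 0; c < buffer.Length; c++)
        {
            if (c < buffer.Length - 1)
            {
                if (c < buffer.Length - 2 && buffer[c] == 's' &&
                    buffer[c + 1] == 'c' && buffer[c + 2] == 'h')
                {
                    buffer[c] = '$';
                    buffer.Remove(c + 1, 2);
                    if (withTypo)
                        substCount = +2;  // assignment of unary plus 2
                    else
                        substCount += 2;  // intended compound addition
                }
                else if (buffer[c] == 's' && buffer[c + 1] == 't')
                {
                    buffer[c] = '!';
                    buffer.Remove(c + 1, 1);
                    substCount++;
                }
            }
        }
        return substCount;
    }

    static void Main()
    {
        foreach (string word in new[] { "schule", "kapitalistischen" })
        {
            int buggy = Substitute(new StringBuilder(word), withTypo: true);
            int intended = Substitute(new StringBuilder(word), withTypo: false);
            Console.WriteLine($"{word}: with =+ -> {buggy}, with += -> {intended}");
        }
    }
}
```

For "schule" both variants return 2, so a test on that word passes despite the typo. Only a word such as "kapitalistischen", where 'st' precedes 'sch', gives different counts (2 with the typo, 3 with +=).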

Conclusion

Unit tests and static analysis should not exclude each other; they complement each other as software development practices [2]. I suggest downloading the PVS-Studio static analyzer and finding the bugs that unit testing did not detect.

Additional links
