To get a trial key
fill out the form below
Team License (standard version)
Enterprise License (extended version)
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Request our prices
New License
License Renewal
--Select currency--
USD
EUR
GBP
RUB
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
Why PVS-Studio Uses Data Flow Analysis:…

Why PVS-Studio Uses Data Flow Analysis: Based on Gripping Error in Open Asset Import Library

Feb 18 2021
Author:

An essential part of any modern static code analyzer is data flow analysis. However, from an outside perspective, the use of data flow analysis and its benefit is unclear. Some people still consider static analysis a tool searching for something in code according to a certain pattern. Thus, we occasionally write blog posts to show how this or that technology, used in the PVS-Studio analyzer, helps to identify another interesting error. Today, we have such an article about the bug found in the Base64, one of the encoding standard implementations of binary data.

0803_DataFlow_And_Assimp/image1.png

It all started with checking the latest version of the Qt 6 library. There was a separate usual article on this, where I'd described 77 errors found. It turned out that at first, I decided to flip through the report, not excluding the third-party libraries' warnings. In other words, I didn't exclude the warnings related to \src\3rdparty in the settings. It so happened that I immediately ran up against a gripping error example in the Open Asset Import Library. So, I decided to write this extra little note about it.

This defect highlights the benefit of data flow analysis in tools such as PVS-Studio. Without that, it's impossible to find numerous errors. By the way, if you're interested in learning more about data flow analysis and other aspects of the tool's setup, you can read the Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities article.

Now, let's turn our attention right to the error, found in the Open Asset Import Library (assimp). File: \src\3rdparty\assimp\src\code\FBX\FBXUtil.cpp.

std::string EncodeBase64(const char* data, size_t length)
{
    // calculate extra bytes needed to get a multiple of 3
    size_t extraBytes = 3 - length % 3;

    // number of base64 bytes
    size_t encodedBytes = 4 * (length + extraBytes) / 3;

    std::string encoded_string(encodedBytes, '=');

    // read blocks of 3 bytes
    for (size_t ib3 = 0; ib3 < length / 3; ib3++)
    {
        const size_t iByte = ib3 * 3;
        const size_t iEncodedByte = ib3 * 4;
        const char* currData = &data[iByte];

        EncodeByteBlock(currData, encoded_string, iEncodedByte);
    }

    // if size of data is not a multiple of 3,
    // also encode the final bytes (and add zeros where needed)
    if (extraBytes > 0)
    {
        char finalBytes[4] = { 0,0,0,0 };
        memcpy(&finalBytes[0], &data[length - length % 3], length % 3);

        const size_t iEncodedByte = encodedBytes - 4;
        EncodeByteBlock(&finalBytes[0], encoded_string, iEncodedByte);

        // add '=' at the end
        for (size_t i = 0; i < 4 * extraBytes / 3; i++)
            encoded_string[encodedBytes - i - 1] = '=';
    }
    return encoded_string;
}

If you want, for a start you may try to detect the error yourself. So that you don't accidentally read the answer right away, let me show you some other exciting articles and briefly tell you what Base64 is:). Here's a list of additional articles on related topics:

Ok, let's go on. Here is the coding algorithm implementation of a byte string in Base64 encoding. This is the coding standard of binary data with only 64 characters. The encoding alphabet contains text and numeric Latin characters A-Z, a-z, and 0-9 (62 characters) and 2 additional characters that vary among implementations. Base64 encoding converts every 3 source bytes into 4 encoded characters.

If only one or two bytes are left to encode, as a result, we have only the first two or three characters of the line. The output will be padded with one or two additional pad characters (=). The padding character "=" prevents further bits from being added to the reconstructed data. This point is implemented incorrectly in the function considered.

Found the error? Well done. If not, that's ok too. You need to delve into the code to notice that something goes wrong. The analyzer reports about this "something wrong" with the warning: V547 [CWE-571] Expression 'extraBytes > 0' is always true. FBXUtil.cpp 224

To understand what worried the analyzer, let's take a look at the initialization of the extraBytes variable:

// calculate extra bytes needed to get a multiple of 3
size_t extraBytes = 3 - length % 3;

The programmer planned to calculate how many additional bytes of input data need to be processed if their total number is not equal to 3. To do this, we just need to divide the number of processed bytes by modulo 3. A correct option of the variable initialization looks like this:

size_t extraBytes = length % 3;

Then, if, for example, 5 bytes are processed, we get 5 % 3 = 2. So, we need to additionally process 2 bytes. If the input received 6 bytes, then nothing needs to be processed separately, since 6 % 3 = 0.

Although, it may have meant the number of bytes missing for a multiple of three. Then, the correct code should look that way:

size_t extraBytes = (3 - length % 3) % 3;

Right now, I'm not interested in trying to figure out the right variant. Anyway, the programmer wrote some average meaningless version of the code:

size_t extraBytes = 3 - length % 3;

Right at the moment of analyzing this code, the analyzer uses data flow analysis. Whatever value is in the length variable, after modulo division, a value in the range [0..2] will be obtained. The PVS-Studio analyzer can work with ranges, exact values, and sets. That is, we are talking about Value Range Analysis. In this case, it is the range of values that will be used.

Let's continue the evaluations:

size_t extraBytes = 3 - [0..2];

It turns out that the extraBytes variable will never be equal to zero. The analyzer will evaluate the following possible range of its values: [1..3].

Until the moment of checking, the variable is not changed anywhere. The analyzer reports us that the check result will always be true. Therefore, the tool is absolutely right:

if (extraBytes > 0)

This is a simple but wonderful example. It shows how the data flow analysis allowed us to evaluate the range of variable values. It also helped us to be certain that the variable does not change, and finally, that the condition is always true.

Of course, the incorrectness of function operation is not limited to the execution of a code fragment that should not be executed. Everything goes awry there. Imagine, you want to encode 6 characters. In this case, the output string must contain 8 characters. Let's quickly estimate how the considered function will behave.

// calculate extra bytes needed to get a multiple of 3
size_t extraBytes = 3 - length % 3; // 3-6%3 = 3

// number of base64 bytes
size_t encodedBytes = 4 * (length + extraBytes) / 3; // 4*(6+3)/3 = 12

std::string encoded_string(encodedBytes, '=');

The output string happened to contain 12 characters, not 8. Further, everything will work in a wrong way, too. There's no point in going into details.

That's how nice and easy static analysis found the error in the code. Just imagine how painful it would be to debug and understand why the characters encoding in Base64 encoding went wrong. By the way, here comes the question of the third-party libraries' quality. I touched upon it in the following article: Why it is important to apply static analysis for open libraries that you add to your project.

Try to use PVS-Studio regularly in your development process to find many bugs as early as possible. You'll like it :). If you are developing an open-source project, you can use the analyzer for free. Thanks for your attention. Wish you bugless code.

Popular related articles
The way static analyzers fight against false positives, and why they do it

Date: Mar 20 2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
Appreciate Static Code Analysis!

Date: Oct 16 2017

Author: Andrey Karpov

I am really astonished by the capabilities of static code analysis even though I am one of the developers of PVS-Studio analyzer myself. The tool surprised me the other day as it turned out to be sma…
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
PVS-Studio ROI

Date: Jan 30 2019

Author: Andrey Karpov

Occasionally, we're asked a question, what monetary value the company will receive from using PVS-Studio. We decided to draw up a response in the form of an article and provide tables, which will sho…
PVS-Studio for Java

Date: Jan 17 2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
The Ultimate Question of Programming, Refactoring, and Everything

Date: Apr 14 2016

Author: Andrey Karpov

Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors, save time and effort. The au…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: Oct 22 2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Date: Nov 21 2018

Author: Andrey Karpov

A brief description of technologies used in the PVS-Studio tool, which let us effectively detect a large number of error patterns and potential vulnerabilities. The article describes the implementati…

Comments (0)

Next comments

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →
Accept