
PVS-Studio Learns What strlen is All About

Apr 27 2021
Author:

We often write about our diagnostics, but rarely touch on how we enhance the analyzer's internal mechanics. So, for a change, today we'll talk about a useful new upgrade to our data flow analysis.

0824_DataFlow_And_Strlen/image1.png

How It Started: a Tweet from JetBrains CLion IDE

A few days ago I saw a post from JetBrains about new features offered by CLion's built-in static analyzer.

0824_DataFlow_And_Strlen/image2.png

Since we are soon planning to release the PVS-Studio plugin for CLion, I could not just ignore their announcement! I had to point out that PVS-Studio is also powerful. And that the PVS-Studio plugin for CLion can find even more mistakes.

0824_DataFlow_And_Strlen/image3.png

So I had a nice little chat with JetBrains:

I pondered all this for a little bit. Very nice! They enhanced their data flow analysis and told the world about it. We are no worse! We're always enhancing the analyzer's engine, including its data flow analysis mechanisms. So here I am, writing this note.

What's up with Our Data Flow

One of our clients described an error that PVS-Studio unfortunately failed to find: sometimes, when an unsigned value wrapped around, the analyzer got confused about the variable's possible values. A couple of days ago we upgraded the analyzer so that it can find this error. The code that caused the problem looked something like this:

bool foo()
{
  unsigned N = 2;
  for (unsigned i = 0; i < N; ++i)
  {
    bool stop = (i - 1 == N);
    if (stop)
      return true;
  }
  return false;
}

The analyzer could not understand that the stop variable was always assigned the false value.

Why false? Let's do a quick calculation:

  • the variable's value range is i = [0; 1];
  • the expression's possible result is i-1 = [0; 0] U [UINT_MAX; UINT_MAX];
  • the N variable equals two and falls beyond the { 0, UINT_MAX } set;
  • the expression is always false.

Note. There is no undefined behavior here, because arithmetic on unsigned types does not overflow: the result wraps around modulo UINT_MAX + 1.
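The calculation above can be checked with a small sketch. The helper names minus_one, count_stops and check_ranges are made up for this illustration and are not part of the client's code or of the analyzer; the loop itself mirrors foo().

```cpp
#include <cassert>
#include <climits>

// i - 1 evaluated in unsigned arithmetic, exactly as in foo().
unsigned minus_one(unsigned i) { return i - 1; }

// Hypothetical helper: counts the iterations of foo()'s loop
// in which `stop` would be true.
unsigned count_stops()
{
  unsigned N = 2;
  unsigned hits = 0;
  for (unsigned i = 0; i < N; ++i)
  {
    bool stop = (i - 1 == N);
    if (stop)
      ++hits;
  }
  return hits;
}

// Checks the value ranges from the list above.
void check_ranges()
{
  assert(minus_one(0) == UINT_MAX); // 0 - 1 wraps around
  assert(minus_one(1) == 0);
  assert(count_stops() == 0);       // stop is never true
}
```

Calling check_ranges() in a debug build passes silently: i - 1 only ever takes the values UINT_MAX and 0, so it never equals N.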

Now we have taught PVS-Studio to process these expressions correctly and to issue an appropriate warning. Interestingly, this change led to other improvements.

For example, the initial change caused false positives related to string length processing. While fighting them, we introduced more enhancements and taught the analyzer about functions like strlen - how and why they are used. Now I'll go ahead and show you the analyzer's new abilities.

We use a test base of open-source projects for regression testing of the analyzer's core. One of the projects in it is the FCEUX emulator. The upgraded analyzer found an interesting error in its Assemble function.

int Assemble(unsigned char *output, int addr, char *str) {
  output[0] = output[1] = output[2] = 0;
  char astr[128],ins[4];
  if ((!strlen(str)) || (strlen(str) > 0x127)) return 1;
  strcpy(astr,str);
  ....
}

Can you see it? To be honest, we did not notice it immediately and our first thought was, "Oh no, we broke something!" Then we saw what was up and took a minute to appreciate the advantages of static analysis.

PVS-Studio warned: V512 A call of the 'strcpy' function will lead to overflow of the buffer 'astr'. asm.cpp 21

Still don't see the error? Let's go through the code step by step. To start with, we'll remove everything irrelevant:

int Assemble(char *str) {
  char astr[128];
  if ((!strlen(str)) || (strlen(str) > 0x127)) return 1;
  strcpy(astr,str);
  ....
}

The code above declares a 128-byte array. The plan is to verify a string and then pass it to the strcpy function, which copies the string to the array. The string should not be copied if it is empty or contains more than 127 characters (not counting the terminating null).

So far, all is well and good, right? Wait, wait, wait. What do we see here? What kind of a constant is 0x127?!

It's not 127 at all. Far from it!

This constant is written in hexadecimal notation. If you convert it to decimal, you get 1*256 + 2*16 + 7 = 295.

So, the code above is equivalent to the following:

int Assemble(char *str) {
  char astr[128];
  if ((!strlen(str)) || (strlen(str) > 295)) return 1;
  strcpy(astr,str);
  ....
}

As you can see, the str string check does not prevent possible buffer overflows. The analyzer correctly warns you about the problem.

Previously, the analyzer could not find the error. It could not understand that both strlen calls work with the same string and that the string does not change between them. Although things like this are obvious to developers, they are not obvious to the analyzer. It needs to be taught explicitly.

Now PVS-Studio warns that the str string length is in the [1..295] range, and thus may exceed the array bounds if copied to the astr buffer.
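One way to close the gap between the [1..295] range and the 128-byte buffer is to compare the length against the buffer size instead of a hand-written constant. The sketch below is only an illustration under that assumption: AssembleChecked is a hypothetical reduced variant, not the FCEUX function and not the fix the FCEUX authors chose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

static_assert(0x127 == 295, "0x127 is hexadecimal: 1*256 + 2*16 + 7");

// Hypothetical reduced variant of Assemble():
// returns 1 if str is rejected, 0 if it was copied.
int AssembleChecked(const char *str)
{
  char astr[128];
  const std::size_t len = std::strlen(str);
  // Reject empty strings and strings that would not fit
  // together with the terminating null.
  if (len == 0 || len >= sizeof(astr)) return 1;
  std::strcpy(astr, str);
  assert(std::strlen(astr) == len); // the copy fits, so the length is preserved
  return 0;
}
```

With this check a 127-character string is copied, while a 128-character string is rejected, because it would need 129 bytes including the terminating null.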

0824_DataFlow_And_Strlen/image4.png

New Challenges

The error above also exists in the FCEUX project's current code base. But the analyzer will not find it there, because now the string's length is written to a variable. This breaks the connection between the string and its length. For now, the analyzer is oblivious to this error in the code's new version:

int Assemble(unsigned char *output, int addr, char *str) {
  output[0] = output[1] = output[2] = 0;
  char astr[128],ins[4];
  int len = strlen(str);
  if ((!len) || (len > 0x127)) return 1;
  strcpy(astr,str);
  ....
}

This code is easy for a human to understand. The static analyzer, however, has a difficult time tracking values here. It needs to know that the len variable represents the str string's length. Additionally, it needs to carefully track when this connection breaks. This happens when the len variable or the string contents are modified.
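To illustrate the two ways this connection can break, here is a minimal sketch. The function names and the sample string are invented for the example; it says nothing about how PVS-Studio represents such relations internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if `len` still equals the string's length
// after the string contents are modified.
bool LenValidAfterStringChange()
{
  char str[] = "LDA #$10";
  std::size_t len = std::strlen(str); // len == 8 and describes str
  str[3] = '\0';                      // contents changed: str is now "LDA"
  return len == std::strlen(str);
}

// Returns true if `len` still equals the string's length
// after the variable itself is modified.
bool LenValidAfterLenChange()
{
  const char *str = "LDA #$10";
  std::size_t len = std::strlen(str);
  --len;                              // variable changed: the relation is gone
  return len == std::strlen(str);
}

// Both modifications invalidate the "len is the length of str" fact.
void check_relation()
{
  assert(!LenValidAfterStringChange());
  assert(!LenValidAfterLenChange());
}
```

In both functions, everything the analyzer knew about len stops applying to str after the modification, so any range derived from a check on len must no longer be used for the string.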

So far, PVS-Studio does not know how to track these values. On the bright side, this gives us one more direction to grow and develop! Over time, the analyzer will learn to find the error in this new code as well.

By the way, the reader may wonder why we analyze projects' old code and do not upgrade the test projects regularly. It's simple, really. If we update the test projects, we won't be able to perform regression testing. It will be unclear what caused the analyzer to behave differently - the analyzer's or the test projects' code changes. This is why we do not update open-source projects we use for testing.

Of course, we need to test the analyzer on modern code written in C++14, C++17, and so on. To do this, we add new projects to the test base. For example, one of our recent additions was a header-only C++ library collection (awesome-hpp).

Conclusion

It's always interesting and useful to enhance data flow analysis mechanisms. Do you think so too? Do you want to know more about how static code analysis tools work? Then we recommend you read the following articles:

On a final note, I invite you to download the PVS-Studio analyzer and check your projects.
