To get a trial key
fill out the form below
Team License (a basic version)
Enterprise License (an extended version)
* By clicking this button you agree to our Privacy Policy statement

Request our prices
New License
License Renewal
--Select currency--
USD
EUR
GBP
RUB
* By clicking this button you agree to our Privacy Policy statement

Free PVS-Studio license for Microsoft MVP specialists
* By clicking this button you agree to our Privacy Policy statement

To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
Protocol Buffers, a brutal protocol fro…

Protocol Buffers, a brutal protocol from Google, vs. PVS-Studio, a static code analyzer

Nov 05 2021
Author:

Protocol Buffers is a very popular, cool, and high-quality product that is mostly developed by Google. This is a good challenge for the PVS-Studio static code analyzer. Finding at least something is already an achievement. Let's give it a shot.

0882_Protocol_Buffers_vs_PVS_Studio/image1.png

I am writing about Protocol Buffers (protobuf) as part of a long-term series of articles about checking open-source projects. The library implements a protocol for structured data serialization. This is an effective binary alternative to the XML text format.

The project seemed like an intriguing challenge for the PVS-Studio analyzer, because Google takes a very serious approach to the quality of the C++ code it produces. Take, for example, the "Safer Usage of C++" document that has been actively discussed recently. Additionally, many developers use protobuf in their projects - which means the protobuf product is well-tested. Finding at least a couple of mistakes in this project is a challenge that we have taken upon us. So what are we waiting for? Time to find out what PVS-Studio can do!

0882_Protocol_Buffers_vs_PVS_Studio/image2.png

We have never checked this project on purpose before. Once, three years ago, we examined it when writing a series of articles about checking Chromium. We found an interesting error in a data-checking function and described it in a standalone article - "February 31".

To be honest, when I was writing my article this time, I had a specific plan. I wanted to demonstrate the analyzer's new feature - the intermodular analysis mechanism for C++ projects - and what it can do. Unfortunately, this time, intermodular analysis didn't produce any new interesting results. With or without it - it was all the same, no new interesting analyzer triggers in code. Although this was not surprising. It's hard to find anything in this project, at all :).

So let's see what mistakes evaded the eye of developers and assisting tools.

Copy-paste

void SetPrimitiveVariables(....) {
  ....
  if (HasHasbit(descriptor)) {
    (*variables)["get_has_field_bit_message"] = ....;
    (*variables)["set_has_field_bit_message"] = ....;
    (*variables)["clear_has_field_bit_message"] = ....;
    ....
  } else {
    (*variables)["set_has_field_bit_message"] = "";      // <=
    (*variables)["set_has_field_bit_message"] = "";      // <=
    (*variables)["clear_has_field_bit_message"] = "";
  ....
}

PVS-Studio warns: V519 [CWE-563] The variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 163, 164. java_primitive_field_lite.cc 164

This is a classic error that occurred when a developer was copying code lines. The developer fixed some code lines, but missed the other ones. As a result, the code sets the same key - "set_has_field_bit_message" - twice.

If you look at the code above, it becomes clear that, in the else code block, the developer intended to write the following:

(*variables)["get_has_field_bit_message"] = "";
(*variables)["set_has_field_bit_message"] = "";
(*variables)["clear_has_field_bit_message"] = "";

File descriptor leak

ExpandWildcardsResult ExpandWildcards(
    const string& path, std::function<void(const string&)> consume) {
  ....
  HANDLE handle = ::FindFirstFileW(wpath.c_str(), &metadata);
  ....
  do {
    // Ignore ".", "..", and directories.
    if ((metadata.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) == 0 &&
        kDot != metadata.cFileName && kDotDot != metadata.cFileName) {
      matched = ExpandWildcardsResult::kSuccess;
      string filename;
      if (!strings::wcs_to_utf8(metadata.cFileName, &filename)) {
        return ExpandWildcardsResult::kErrorOutputPathConversion;       // <=
      }
    ....
  } while (::FindNextFileW(handle, &metadata));
  FindClose(handle);
  return matched;
}

PVS-Studio warns: V773 [CWE-401] The function was exited without releasing the 'handle' handle. A resource leak is possible. io_win32.cc 400

Before the function exits, the FindClose(handle) method call must close the handle file descriptor. However, this does not happen if UTF-8-encoded text fails to convert to UTF-8. In this case, the function exits with an error.

Potential overflow

uint32_t GetFieldOffset(const FieldDescriptor* field) const {
  if (InRealOneof(field)) {
    size_t offset =
        static_cast<size_t>(field->containing_type()->field_count() +
                            field->containing_oneof()->index());
    return OffsetValue(offsets_[offset], field->type());
  } else {
    return GetFieldOffsetNonOneof(field);
  }
}

PVS-Studio warns: V1028 [CWE-190] Possible overflow. Consider casting operands, not the result. generated_message_reflection.h 140

Two int type values are added and placed into the size_t variable:

size_t offset = static_cast<size_t>(int_var_1 + int_var_2);

It is assumed that in case of a 64-bit build, the sum of two 32-bit variables can exceed the INT_MAX value. This is why the result is written to the size_t type variable that will be a 64-bit variable in a 64-bit application. Moreover, since adding two int values can result in overflow, the developer uses an explicit cast.

However, this explicit cast is used incorrectly. And it does not protect from anything. The implicit cast from int to size_t would have worked without it. So the code does not differ in any way from the following:

size_t offset = int_var_1 + int_var_2;

I assume that, by accident, the developer placed a parenthesis in the wrong spot. Here's the correct code:

size_t offset = static_cast<size_t>(int_var_1) + int_var_2;

Null pointer dereferencing

bool KotlinGenerator::Generate(....)
{
  ....
  std::unique_ptr<FileGenerator> file_generator;
  if (file_options.generate_immutable_code) {
    file_generator.reset(
        new FileGenerator(file, file_options, /* immutable_api = */ true));
  }

  if (!file_generator->Validate(error)) {
    return false;
  }
  ....
}

PVS-Studio warns: V614 [CWE-457] Potentially null smart pointer 'file_generator' used. java_kotlin_generator.cc 100

If the generate_immutable_code variable equals false, then the smart file_generator pointer remains equal to nullptr. Consequently, the null pointer will be dereferenced.

Apparently the generate_immutable_code variable is always true - otherwise the mistake would have been detected already. It can be called insignificant. As soon as someone edits the code and its logic, the null pointer will be dereferenced, someone will notice and fix the problem. On the other hand, this code contains, so-to-say, a mine. And it's better to find it early than to sit and wait till someone blows themselves up on it in the future. The point of static analysis is to find errors before they get dangerous.

Is the parenthesis in the right spot?

AlphaNum::AlphaNum(strings::Hex hex) {
  char *const end = &digits[kFastToBufferSize];
  char *writer = end;
  uint64 value = hex.value;
  uint64 width = hex.spec;
  // We accomplish minimum width by OR'ing in 0x10000 to the user's value,
  // where 0x10000 is the smallest hex number that is as wide as the user
  // asked for.
  uint64 mask = ((static_cast<uint64>(1) << (width - 1) * 4)) | value;
  ....
}

Let's take a look at this subexpression:

((static_cast<uint64>(1) << (width - 1) * 4))

The analyzer does not like this code for two reasons:

  • V634 [CWE-783] The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression. strutil.cc 1408
  • V592 The expression was enclosed by parentheses twice: ((expression)). One pair of parentheses is unnecessary or misprint is present. strutil.cc 1408

You probably agree that these warnings compliment each other. The shift and multiplication operators are used together. It's easy to forget which one has a higher priority. And the recurring parentheses hint that the author knew about the ambiguity and wanted to avoid it. But that didn't work.

There are two ways to understand this code. Version one: the code is correct. In this case, additional parentheses only make it easier to read the code and do not affect anything:

uint64 mask = (static_cast<uint64>(1) << ((width - 1) * 4)) | value;

Version two: the expression contains a mistake. If this is so, the additional parentheses must change the order of the operations performed:

uint64 mask = ((static_cast<uint64>(1) << (width - 1)) * 4) | value;

Conclusion

It's a good feeling to be able to find flaws in a well-known and quality product - such as protobuf. On the other hand, it's probably not the best idea to use protobuf to demonstrate static code analysis capabilities :). It's difficult to show off the tool's features if the tool can find only a couple of errors :).

0882_Protocol_Buffers_vs_PVS_Studio/image3.png

Let me remind you that the static analyzer is the most beneficial when used regularly to check new code - and not for one-time checks of already tested projects.

However, you need to start somewhere. So I recommend downloading PVS-Studio, checking your project, and taking a look at the Best Warnings. Most likely, you will see many things that require your attention :).

If your code is of the highest quality - like that of protobuf - I recommend you start using the analyzer as intended. Try integrating PVS-Studio into the development process and see what it can find every day. Wondering how you can do this if yours is a large project? Click here.

Popular related articles
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
The Last Line Effect

Date: May 31 2014

Author: Andrey Karpov

I have studied many errors caused by the use of the Copy-Paste method, and can assure you that programmers most often tend to make mistakes in the last fragment of a homogeneous code block. I have ne…
PVS-Studio ROI

Date: Jan 30 2019

Author: Andrey Karpov

Occasionally, we're asked a question, what monetary value the company will receive from using PVS-Studio. We decided to draw up a response in the form of an article and provide tables, which will sho…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: Oct 22 2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
Static analysis as part of the development process in Unreal Engine

Date: Jun 27 2017

Author: Andrey Karpov

Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in th…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…
Appreciate Static Code Analysis!

Date: Oct 16 2017

Author: Andrey Karpov

I am really astonished by the capabilities of static code analysis even though I am one of the developers of PVS-Studio analyzer myself. The tool surprised me the other day as it turned out to be sma…
The way static analyzers fight against false positives, and why they do it

Date: Mar 20 2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
PVS-Studio for Java

Date: Jan 17 2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…

Comments (0)

Next comments
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →
Accept