>
>
>
PVS-Studio Has Finally Got to Boost

Andrey Karpov
Articles: 674

PVS-Studio Has Finally Got to Boost

We thought of checking the Boost library long ago but were not sure if we would collect enough results to write an article. However, the wish remained. We tried to do that twice but gave up each time because we didn't know how to replace a compiler call with a PVS-Studio.exe call. Now we've got us new arms, and the third attempt has been successful. So, are there any bugs to be found in Boost?

Analysis based on pre-generated preprocessed files is no longer supported. Please consider using the Compiler Monitoring mode instead. This mode allows a generation and analysis of preprocessed files automatically during regular build process.

Boost

Boost is a set of open-source libraries for the C++ programming language to extend its functionality. The project was started after the C++ standard had been published, when many programmers felt unsatisfied about some libraries not having been included into the standard. The project is kind of a "test site" for language extensions, some libraries being candidates to be included into future C++ standards. References:

Boost is a "heavy code" which extensively exploits complex templates and is in a sense a test for compilers. It is a usual thing that a certain compiler is able to compile only some of all the projects contained in the contemporary Boost version.

However, it was not with the task of code parsing that we faced troubles. After all, the analyzer can at the worst afford to secretly skip some too complicated construct. But we had hard time integrating into the building process as such.

Let me remind you the currently existing implementations of the analysis process in PVS-Studio.

If you have a common Visual Studio project

It's all very simple. You can perform project analysis directly from within the development environment. Another way is to run PVS-Studio from the command line and get a file with the bug report. This mode is convenient to use with continuous integration systems (for example Cruise Control, Draco.NET, or Team Foundation Build). See the documentation for details on this analysis mode. If you want to learn about interaction with continuous integration systems, follow this link.

If you don't have a project file (or it is in fact a disguised makefile)

In this case you need to enable a build mode when the code analyzer is run instead of (or together with) the compiler. As the output you also get a bug report. To run analysis in this mode, you need the magic spells described in the documentation as well. The magician should be very attentive, study the manual carefully and never forget symbols.

It was this approach we needed to use to check Boost. However, we appeared to be magicians of a level too low for this task (or maybe just too lazy magicians). We could never find our way around the build system to pass all the necessary parameters to the analyzer console version.

The third (new) implementation of project analysis came to help us

My colleague has already mentioned this new mode in the recent post "News from PVS-Studio Developers' Secret Basement Lab". You see, you don't have to perform an absolute, complete integration into the build system; you just need to get the preprocessed *.i files. This method is much easier, and we used it.

With the help of a prototype of our new tool (PVS-Studio Standalone) we analyzed all the *.i files and finally got the bug report we had been waiting for so long. Our new program also allows you to handle the warning list and edit the code.

I hope we will include this tool into the distribution package a few versions later. Perhaps it will happen when releasing PVS-Studio 5.10.

A few words about the mode we don't yet have but are dreaming of

We are gradually approaching the solution of the task of tracing compiler actions. This mode will also be a feature of the new PVS-Studio Standalone tool. It will allow tracing all compiler calls and collecting its call switches. Thus, you will only need to perform the following steps. Tell the tool: "Start tracing!"; build a project with any build system; tell the tool: "Stop! ". After that, the analyzer will know exactly how this project should be checked from now on. Of course, if the project structure or parameters are changed, you will have to repeat the whole process. OK, enough of dreaming. Let's go back to our Boost check.

The Feeling of Despair

At some moment I accepted the thought that we would not be able to write an article about checking Boost. There was some ground for that assumption.

Too many compilers and tools

Boost can be built by many compilers. Some of them build it completely, while some others only partially. I didn't investigate this issue, but as far as I understand, Boost is pretty well compiled by Visual C++, Intel C++, Sun Studio, Compaq C++, GCC, and Clang. Each compiler has its own unique diagnostic features, and the total effect of using them all must provide a very high quality of the code: one compiler finds bug A, another finds bug B, and so on.

Moreover, the Boost library is kind of a test site for various tools and static code analyzers. Since Boost extensively exploits various contemporary features of the C++ language, one is curious to know if one's tool can manage such a complex code. As a consequence, Boost has been by now checked and re-checked all through by numbers of code analyzers.

Seeking for any bugs and misprints in the library after so many compilers and other tools have worked on it is an almost hopeless task.

Too many users

The Boost library is used in many projects. We ourselves were using it for some time in the PVS-Studio project (it was called Viva64 at that time). We employed mechanisms of handling regular expressions, configuration files, and a couple of other features. Then we figured out that regular expressions were a dead-end and gradually eliminated them from the code. On the other hand, continuing to carry Boost with us just for the sake of its configuration files was doubtful. In addition, some unpleasant drawbacks were revealed: for example, it was impossible to use the character '#' in a filename, as it served to indicate the beginning of a comment. Thus, Boost was not of much help in our particular case, but it is surely a very useful library as such.

Since many programmers extensively use it, they also quickly catch bugs in it. Some remaining bugs can only be found in rarely used code fragments or exotic subsystems few users employ.

Templates

Boost uses a lot of template classes. If they are not instanced, it's almost impossible to check them. For example, Visual C++ does not at all parse template classes if they are not utilized. You can write any nonsense in a non-instanced template class and get a successfully compiled file - you only need to make sure that all the opening braces (), <>, {}, [] and quotes "", '' have the corresponding closing counterparts.

When analyzing a template class, you have to make a compromise between the number of false positives you want to see and chances to skip genuine errors. Here is an example to explain that difficulty.

template <typename T>
bool IsNAN(T x) { return x != x; }

This function checks if the variable's value is not-a-number. This comparison makes sense only in case of types float/double/long double but is meaningless for integer types and therefore indicates an error if present.

What should you do when you don't know the variable's type? An unresolvable question. For the analyzer to be able to perform a complete analysis, all the templates should be used in all the possible variants. More than that, templates need to be parsed, which is a difficult task indeed. PVS-Studio does have some problems with it: some items it can parse and even try to instance, while some others it cannot.

Anyway, analyzing templates is a very ungrateful task, and Boost has numbers of templates.

Evaluating chances of success

Pondering over and evaluating the above mentioned troubles, I was feeling quite pessimistic. I guessed that we might find nothing interesting at all or find a single bug at most which would be not enough for writing an article anyway.

Finding 3 or 4 bugs in Boost would be a huge success.

So, let's see what PVS-Studio 5.06 has managed to find in Boost 1.55 (this version is still in the development process).

Boost Analysis Results

Pretty few bugs and suspicious fragments were found in the library, which is natural. But I still think it's just a really grand result.

Fragment No. 1. A misprint

point3D operator/(const point3D &p1, const point3D &p2)
{
  return point3D( p1.x/p2.x , p1.y/p2.y , p1.z/p1.z );
}

PVS-Studio's diagnostic message: V501 There are identical sub-expressions to the left and to the right of the '/' operator: p1.z / p1.z lorenz_point.cpp 61

The variable 'p1.z' is divided by itself in the function's third argument. It must be probably divided by 'p2.z'.

Fragment No. 2. Class member initialization error

BOOST_UBLAS_INLINE
compressed_matrix_view(const compressed_matrix_view& o) :
  size1_(size1_), size2_(size2_),
  nnz_(nnz_),
  index1_data_(index1_data_),
  index2_data_(index2_data_),
  value_data_(value_data_)
{}

The first diagnostic message by PVS-Studio (no reason citing all the rest): V546 Member of a class is initialized by itself: 'size1_(size1_)'. sparse_view.hpp 193

The class members are initialized to themselves. I guess the data from the 'o' object should be used instead. And I think the constructor should look like this:

BOOST_UBLAS_INLINE
compressed_matrix_view(const compressed_matrix_view& o) :
  size1_(o.size1_), size2_(o.size2_),
  nnz_(o.nnz_),
  index1_data_(o.index1_data_),
  index2_data_(o.index2_data_),
  value_data_(o.value_data_)
{}

Fragment No. 3. Memory released in an inappropriate way

static std::basic_string<wchar_t> get(char const* source = "")
{
  ....
  std::auto_ptr<wchar_t> result (new wchar_t[len+1]);
  ....
}

PVS-Studio's diagnostic message: V554 Incorrect use of auto_ptr. The memory allocated with 'new []' will be cleaned using 'delete'. tree_to_xml.ipp 71

The 'std::auto_ptr' container is not a good type to store a pointer to an object array. To destroy the object, the 'delete' operator will be used instead of 'delete []'. This bug does not look fatal, but it is genuine indeed.

A similar bug is found here: generate_static.hpp 53.

Fragment No. 4. SOCKET, a classic of the genre

I believe there are very few projects which don't have at least one bug related to using the SOCKET type. Let me remind you what it is all about. Programmers often check the status of an operation in the following way:

SOCKET s = Foo();
if (s < 0) { Error(); }

This check is illegal. You must compare the variable to the constant SOCKET_ERROR. But programmers are lazy to do that and write "socket < 0" or "socket >= 0" instead.

In Linux, the SOCKET type is signed, so such a slopwork is pardoned there. In Windows, the SOCKET type is unsigned, so the condition it is used in is always false and the error is not processed in any way.

Boost has one defect of that kind too.

typedef SOCKET socket_type;

class socket_holder
{
  ....
  socket_type socket_;
  ....
  socket_type get() const { return socket_; }
  ....
};

template <typename Socket>
boost::system::error_code accept(....)
{
  ....
  // On success, assign new connection to peer socket object.
  if (new_socketnew_socket.get() >= 0)
  {
    if (peer_endpoint)
      peer_endpoint->resize(addr_len);
    if (!peer.assign(impl.protocol_, new_socket.get(), ec))
      new_socket.release();
  }
  return ec;
}

PVS-Studio's diagnostic message: V547 Expression 'new_socket.get() >= 0' is always true. Unsigned type value is always >= 0. win_iocp_socket_service.hpp 436

In Windows, this code fragment will work quite in a different way than the programmer expected. The condition "new_socketnew_socket.get() >= 0" is always true.

Fragment No. 5. A misprint

void set_duration_style(duration_style style)
{
  duration_style_ == style;
}

PVS-Studio's diagnostic message: V607 Ownerless expression 'duration_style_ == style'. base_formatter.hpp 51

I don't think I need to comment much upon this. As the function name suggests to me, the following line should be written: "duration_style_ = style". It's just a misprint.

Take a look at this bug and imagine how many nerves and how much time PVS-Studio can help you to save when being used regularly. All of us constantly make such slip-ups. Then we spend much time seeking for and fixing them. They don't stick in our memory because they are too small. But being summed up, they turn into hours and days wasted by the programmer. PVS-Studio is very good at finding misprints. Try the incremental analysis mode, and when the tool shows you a few slip-ups like that after compilation, you'll fall in love with it.

You don't make such mistakes, do you? Look closely at the foregoing sample once again (and various other samples). It doesn't look like being written by a student. You see, the human brain is not perfect; we all make mistakes. That's a normal thing and there's nothing bad about securing yourself against them by using different auxiliary utilities and techniques: static code analyzers, the TDD methodology, and code review.

Fragment No. 6. Potentially dangerous reading from a stream

template< typename CharT >
basic_settings< CharT > parse_settings(std::basic_istream< CharT >&
  strm)
{
  ....
  string_type line;
  while (!strm.eof())
  {
     std::getline(strm, line);

     const char_type* p = line.c_str();
     parser.parse_line(p, p + line.size());

     line.clear();
     ++line_number;
  }
  ....
}

PVS-Studio's diagnostic message: V663 Infinite loop is possible. The 'cin.eof()' condition is insufficient to break from the loop. Consider adding the 'cin.fail()' function call to the conditional expression. settings_parser.cpp 285

This code fragment does what it should - reading data from a file. The analyzer does not like this situation because it may cause an infinite loop. I don't know how to simulate the dangerous situation exactly, but I will try to guess. Assume the file is being stored on a network disk. We start reading it, and suddenly the connection breaks. The function 'eof()' will be returning 'false', for the end of the file has not been reached. To catch such cases it is recommended to use the function 'eof()' together with 'fail()'. In the fragment above, the 'fail()' function is not called anywhere, which means that the described trouble may occur. It is from such nuances and subtleties that software reliability and bug tolerance is made up.

And here is one more potentially dangerous fragment: V663 Infinite loop is possible. The 'cin.eof()' condition is insufficient to break from the loop. Consider adding the 'cin.fail()' function call to the conditional expression. adjacency_list_io.hpp 195

Fragment No. 7. Suspicious subtraction

template<> 
struct identity_element<boost::gregorian::date_duration>
{
  static boost::gregorian::date_duration value()
  { 
    return
      boost::gregorian::date(boost::gregorian::min_date_time) -
      boost::gregorian::date(boost::gregorian::min_date_time); 
  }
};

PVS-Studio's diagnostic message: V501 There are identical sub-expressions 'boost::gregorian::date(boost::gregorian::min_date_time)' to the left and to the right of the '-' operator. gregorian.hpp 57

Am I right supposing this function will always return 0?

Conclusion

I believe PVS-Studio has proved to be a fine tool. It's a great success to find anything in Boost, and the analyzer has managed to do that!