>
>
>
How we were looking for a bug in PVS-St…

Grigory Semenchev
Articles: 3

Phillip Khandeliants
Articles: 30

Sergey Larin
Articles: 13

How we were looking for a bug in PVS-Studio or 278 GB of log files

Here is an interesting story of how our team were looking for a bug in the PVS-Studio analyzer. Well, we make mistakes too. However, we are ready to roll up our sleeves and dive deep into the rabbit hole.

A small backstory

A colleague of ours has already talked about our technical support. But it's always interesting to read some tech support stories. We do have them!

If you want heavy programming stuff, you can skip straight to the next section. If you want to get to know the inner workings of our support team, keep reading :).

At the moment we have five development departments:

  • the C and C++ analyzer development department;
  • the C# analyzer development department;
  • Tools & DevOps department;
  • web development department;
  • CRM development department.

The first two departments, as their names suggest, are engaged in the development and support of corresponding static code analyzers. This includes:

The third department develops and supports all additional software for our analyzers:

  • integration with popular IDEs — Visual Studio 2010-2022, IntelliJ IDEA, Rider, CLion;
  • integration with the SonarQube continuous quality control platform;
  • integration with the Unreal Engine and Unity game engines;
  • the utility for converting the analyzer report into various formats — SARIF, TeamCity, HTML, FullHtml, etc.;
  • the utility that notifies the development teams about suspicious fragments in code.

In addition to development, all departments are also engaged in technical support. Every month, we select one or two people from each department to communicate with users via email. Please note: these people do not sit in any call centers and are not engaged in the primary processing of requests. For this, we have another department with great experience. They manage to keep guys from the development department away from most typical user questions, except those that are technically complex. Actually, this is the moment when we — developers — start cooperating with support to solve such complex tasks. In most cases, such tasks will require fixes in code. We believe that this approach not only improves the quality and speed of tech support, but also demonstrates to developers the importance and relevance of the functionality they have developed.

Now let's take a closer look at the support of the C++ department. Support requests for C and C++ analyzer can be divided into the following types:

  • The diagnostic rule issues a false positive. The developer is very lucky if the user sends the code example to reproduce the issue. In most cases, examples sent by email are greatly simplified, and correcting a diagnostic sometimes may become an ordeal.
  • The analyzer does not issue a warning on the user's code example. Here two outcomes are possible:
    • the analyzer does not issue a warning on purpose. Here you can learn more about the reasons why it does so in some cases;
    • the user is correct. We get the necessary clarification on the example from the user, and then we decide: either we fine-tune the existing diagnostic, or we write a new one.
  • The analyzer did not understand some construction of C and C++ languages. The grammar of these languages allow you to write super complicated code, and sometimes the analyzer cannot handle it. In such situations, users send us the V001 errors. To fix such problems, we usually request code examples that can be reproduced or intermediate files for analysis (*.i and *.cfg files).
  • The crash of the C and C++ analyzer's kernel. No one is safe from making mistakes, crashes sometimes happen. And it happens to our analyzer too (V003). Our users help a lot by sending us a stack trace, a memory dump or intermediate files for analysis.
  • One of the many product use cases does not work. The problems of this kind are extremely varied, and it is not possible to describe them all in a sentence or two.

The story mentioned in the title of the article began just with the user's request to the support. The client complained about the freezing of the incremental analysis, so next we will talk about the latter item of the list above.

Incremental analysis that failed

The story began with the user contacting our support with the following issue:

  • they run the analysis in incremental mode or run a check of the file list;
  • they parallel the analysis in N threads;
  • the analyzer works perfectly well up to a certain time in N threads and then "collapses" to a single thread. At the same time, a bunch of V008 errors saying about the inability to preprocess the file begin to pour into the report.

The first action to take in this situation is to look at the log file. After looking at the analyzer's log sent by the user, we found a lot of lines of the following kind:

Command "/usr/bin/c++ -DBOOST_ASIO_DYN_LINK ...." returned code 3.

This line means that the preprocessor stopped working due to a timeout. We run the preprocessor on the compiled project files in order to expand macros and make substitutions of files specified in the #include directives. And only after that we run the analysis on the received files with some additional information (target platform, paths to excluded directories from the analysis, etc.).

Many C++ developers are familiar with the pain of compiling projects with included Boost libraries — the build time greatly increases. Preprocessing is affected by this as well. As you can see from the command above, the user uses Boost in the project. Previously, we also received emails with a similar issue: with high CPU usage, files do not have time to preprocess.

We have had an idea for some time to remove the 30-second preprocessing timeout. And then we got the similar case. It has been decided — we removed the timeout. We send the user a beta and were waiting for a response.

We were about to forget about the fixed bug when the user reported back about the beta:

  • previously the analysis made it to the end, but there was a pile of V008s in the report;
  • now the analysis freezes at the parsing stage of the same files (about 86% of the progress).

Due to the special features of the C and C++ languages, incremental analysis and file list checking mode requires analyzing the dependencies of the files being compiled. For example, if a header file has been modified in incremental analysis mode, any inclusion of that file into other compiled files should be checked. In theory, this allows you to see more different use cases for code from the header file and improve the quality of your analysis.

Therefore, if a header file has been modified, you need to find all the dependent compilation files and analyze them. The same applies to the check of the file list.

Well, the problem turned out to be more complicated. We continued to dig deeper.

Since the preprocessor crash is gone and now apparently it was the C and C++ analyzer's kernel that was freezing, we decided to look at the generated configuration files. And it seems that this is exactly what we needed. There was nothing unusual in the client's settings, except for one small detail:

exclude-path=*/generated/sip*
exclude-path=*/pacs/soapserver/generated/*
exclude-path=*/soap_engine/*
exclude-path=*/tech1utils/tests/googlemock/*
exclude-path=*/sdk-common/*
exclude-path=*/tech1grabbers/SDKs/*
# ....
# 200+ similar entries
# ....
exclude-path=/mnt/nvme/jenkins/workspace/..../lpr-ide.cpp

The exclude-path setting allows to suppress warnings on code from third-party libraries and tests. In a standard situation, users either specify multiple paths to specific directories, or use a search pattern. And the number of entries rarely exceeds 30-40. In our case, there were 200+ different paths with excluded files, including search patterns. We suspected that our algorithm for excluding files from analysis, written 10+ years ago, simply could not quickly handle such a number of entries in the configuration file.

The algorithm for excluding files from analysis worked as follows:

  • We specify platform-specific paths to directories with system libraries. In C and C++ analyzer, some standard paths already exist, for example:
    • "?:\\program files (x86)\\microsoft visual studio *\\vc\\*"
    • "/usr/include/"
    • "/usr/local/Cellar"
    • and other paths, up to about 30.
  • We combine them with the paths specified by the user.
  • We compare the input path with each of the collected list:
    • if the path to be excluded contains the "?" or "*" characters, we need to use a platform-specific function to search by the pattern. On Windows, this is PathMatchSpec, on *nix-like OS — fnmatch;
    • otherwise, we check whether the input path starts with the path from the collected list. When comparing lines, a platform-specific comparison function is used. As we remember, on Windows, the comparison of paths is case-insensitive, on *nix–like OS — mostly case-sensitive.

As you can easily see, the algorithm appears highly unoptimised. Each path in the collected list is first scanned for wildcard characters — this is a full line pass in the worst case. Then a method of comparison is chosen, and in the worst case we are already going through the path twice. And these two passes are performed on all lines in the list.

The first optimization that immediately came to mind was to divide the excluded paths into search patterns (glob patterns) and normal paths in advance. This is how a new class was born in the analyzer, PathMatcher, which contains 2 containers. One container for patterns and one for standard paths:

class PathMatcher
{
// ....
private:
  using GlobsCollection = std::set<std::string, std::less<>>;
  using PathsCollection = ???;

  GlobsCollection m_globs; // templates
  PathsCollection m_paths; // normal paths
};

The search patterns are simple enough — you can store them in the standard std::set container to eliminate duplicates — but there is a nuance with paths. Yes, here you can also use an associative container like std::set, but in most cases the paths passed by the user will contain some common prefix and differ at the very end:

/home/user/folderToExclude/fileToExclude.cpp
/home/user/folderToExclude/
|______common prefix______|

All this suggests the "trie" data structure. It allows you to optimize both memory consumption and the search for the longest prefix possible. After searching for ready-made implementations, we chose Tessil/hat-trie. In order to distinguish files from directories, we used tsl::htrie_map, where the key was our path and the value was the file type.

Now the algorithm works like this:

  • When we read the configuration file, we specify in which container within the PathMatcher class to put the excluded path:
    • if a wildcard character was found, we put the glob patterns in a container for patterns;
    • otherwise, we put it in the prefix tree.
  • We compare the input path to the paths inside the PathMatcher class like this:
    • we search for a common prefix with paths in the trie. If it is found, the file is excluded from the analysis;
    • otherwise, we go through all the glob patterns to compare them with the path of the input file by calling platform-specific functions.

After optimizing the algorithm on a test case with 200+ excluded paths in the configuration file, the analyzer started parsing and analysing files several times faster. It was definitely a success. It remained the case for small — to build a beta, give it to the user and rejoice at our small victory.

The butler did it!

But it was too early to celebrate the victory (close the ticket). The user again wrote about the same freezing.

Well, quick fixes didn't help, we needed to dig even deeper into this issue. We decided to ask the user to run our utility on strace and send us all the logs generated. If anyone does not know, the strace utility allows you to track all system calls of the program and much more. By the way, we use it as one of the options for running the analyzer on the project (the compilation tracing).

Here is the command that the user used to generate logs:

strace -y -v -s 4096 -ff -o strace-logs/log.txt -- pvs-studio-analyzer ....

They left the program running for about 20 minutes before terminating the process. Since during the freeze, the strace utility continued to write information to logs, the size of logs turned out to be impressive — 22795 files with a total weight of 278 GB (!) without compression.

First, we looked at the strace results. And immediately we saw a huge number of nanosleep calls. This meant that the child processes generated by the pvs-studio-analyzer utility, for some reason, were pending. We went through the logs from top to bottom and found the problem (the image is clickable):

When you click on the image, the gif will show that the file descriptor number gradually increases after opening the files. After this number approached the value of 1024, the EMFILE error was generated when an attempt was made to allocate a new descriptor, and then the analysis stopped. This behavior indicates a leak of file descriptors.

In Linux OS, when the file is opened, it is assigned a special number — a descriptor. The descriptor is then used to work with the file: read, write, view attributes, etc. The number of such descriptors is limited and is determined by the system settings.

By the way, it is very easy to reproduce the problem. To reproduce the problem, it is enough to write the following CMakeLists.txt:

cmake_minimum_required(VERSION 3.5)
project(many-files LANGUAGES C CXX)

set(SRC "")

foreach(i RANGE 10000)
  set(file "${CMAKE_CURRENT_BINARY_DIR}/src-${i}.c")
  file(TOUCH "${file}")
  set(SRC "${SRC};${file}")
endforeach()

add_library(many-files STATIC
            ${SRC})

Next, we form the cache in the directory with CMakeLists.txt and run the pvs-studio-analyzer utility version 7.18 and earlier:

cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=On
pvs-studio-analyzer analyze -f ./build/compile_commands.json -j -i -o pvs.log

Unfortunately, at the time of writing this article, the original logs have sunk into oblivion. So in the picture above is a log that we reproduced ourselves.

As previously mentioned, to work correctly, the incremental mode and file list checks require a dependency analysis of the files being compiled. For this we generate a special depend_info.json file which shows the dependencies of compiled files on header files.

Some source files can be compiled multiple times in different projects with different flags within the same "solution". When creating a preprocessed file with the ".PVS-Studio.i" postfix, we have to cut off the extension from the name of the source file. This is due to the fact that some preprocessors refuse to preprocess a file if the final one contains postfixes like ".cpp", ".cxx", etc. in the name.

This can lead to a collision if, for example, two files are preprocessed — "source.cpp " and "source.cxx". To eliminate the race condition, we lock the resulting path — the special ".pvslock" file is created and opened. If a collision occurs, the number 1, 2, 3, etc. is added to the name of the next preprocessed file.

After the first stage of parsing, we already have all the necessary preprocessed files. There is no point in running the preprocessor a second time, and this can save considerable time in subsequent analysis. So after the first step, we move the PreprocessedFile object, which contains all the required paths as well as lock objects, to the cache. At the second stage, we check the cache to see if it contains the preprocessed file we need.

The problem was that when moving the object at the first stage, we forgot to remove the lock. This caused the number of opened temporary ".pvslock" files to increase and the program would freeze if a certain number of file descriptors were exceeded.

The fix was simple enough — moving the object to the cache now removes the lock and the ".pvslock" file is closed and deleteted.

We fixed the resource handling in the program and the problem disappeared. We suspect that this error has not occurred to anyone before, as the Linux version of the analyzer is more commonly used on build servers in normal mode. Incremental analysis is often used in combination with IDEs, and at the moment we fully support only JetBrains CLion on Linux. It appears that, until then, there was no user with the need to analyze a project in incremental mode with a large number of files.

Giving the beta to the client for the third time, we finally solved the problem with the freeze of the analyzer.

Conclusion

Unfortunately, not all problems coming to support are easy to handle. Often the most trivial bugs lie deep inside and are difficult to debug.

We hope that our story was interesting for you. And if you have any problems with our product, do not hesitate to contact our awesome support. We promise we will help you.

Related articles