Webinar: C++ semantics - 06.11
Here is an interesting story of how our team were looking for a bug in the PVS-Studio analyzer. Well, we make mistakes too. However, we are ready to roll up our sleeves and dive deep into the rabbit hole.
A colleague of ours has already talked about our technical support. But it's always interesting to read some tech support stories. We do have them!
If you want heavy programming stuff, you can skip straight to the next section. If you want to get to know the inner workings of our support team, keep reading :).
At the moment we have five development departments:
The first two departments, as their names suggest, are engaged in the development and support of corresponding static code analyzers. This includes:
The third department develops and supports all additional software for our analyzers:
In addition to development, all departments are also engaged in technical support. Every month, we select one or two people from each department to communicate with users via email. Please note: these people do not sit in any call centers and are not engaged in the primary processing of requests. For this, we have another department with great experience. They manage to keep guys from the development department away from most typical user questions, except those that are technically complex. Actually, this is the moment when we — developers — start cooperating with support to solve such complex tasks. In most cases, such tasks will require fixes in code. We believe that this approach not only improves the quality and speed of tech support, but also demonstrates to developers the importance and relevance of the functionality they have developed.
Now let's take a closer look at the support of the C++ department. Support requests for C and C++ analyzer can be divided into the following types:
The story mentioned in the title of the article began just with the user's request to the support. The client complained about the freezing of the incremental analysis, so next we will talk about the latter item of the list above.
The story began with the user contacting our support with the following issue:
The first action to take in this situation is to look at the log file. After looking at the analyzer's log sent by the user, we found a lot of lines of the following kind:
Command "/usr/bin/c++ -DBOOST_ASIO_DYN_LINK ...." returned code 3.
This line means that the preprocessor stopped working due to a timeout. We run the preprocessor on the compiled project files in order to expand macros and make substitutions of files specified in the #include directives. And only after that we run the analysis on the received files with some additional information (target platform, paths to excluded directories from the analysis, etc.).
Many C++ developers are familiar with the pain of compiling projects with included Boost libraries — the build time greatly increases. Preprocessing is affected by this as well. As you can see from the command above, the user uses Boost in the project. Previously, we also received emails with a similar issue: with high CPU usage, files do not have time to preprocess.
We have had an idea for some time to remove the 30-second preprocessing timeout. And then we got the similar case. It has been decided — we removed the timeout. We send the user a beta and were waiting for a response.
We were about to forget about the fixed bug when the user reported back about the beta:
Due to the special features of the C and C++ languages, incremental analysis and file list checking mode requires analyzing the dependencies of the files being compiled. For example, if a header file has been modified in incremental analysis mode, any inclusion of that file into other compiled files should be checked. In theory, this allows you to see more different use cases for code from the header file and improve the quality of your analysis.
Therefore, if a header file has been modified, you need to find all the dependent compilation files and analyze them. The same applies to the check of the file list.
Well, the problem turned out to be more complicated. We continued to dig deeper.
Since the preprocessor crash is gone and now apparently it was the C and C++ analyzer's kernel that was freezing, we decided to look at the generated configuration files. And it seems that this is exactly what we needed. There was nothing unusual in the client's settings, except for one small detail:
exclude-path=*/generated/sip*
exclude-path=*/pacs/soapserver/generated/*
exclude-path=*/soap_engine/*
exclude-path=*/tech1utils/tests/googlemock/*
exclude-path=*/sdk-common/*
exclude-path=*/tech1grabbers/SDKs/*
# ....
# 200+ similar entries
# ....
exclude-path=/mnt/nvme/jenkins/workspace/..../lpr-ide.cpp
The exclude-path setting allows to suppress warnings on code from third-party libraries and tests. In a standard situation, users either specify multiple paths to specific directories, or use a search pattern. And the number of entries rarely exceeds 30-40. In our case, there were 200+ different paths with excluded files, including search patterns. We suspected that our algorithm for excluding files from analysis, written 10+ years ago, simply could not quickly handle such a number of entries in the configuration file.
The algorithm for excluding files from analysis worked as follows:
As you can easily see, the algorithm appears highly unoptimised. Each path in the collected list is first scanned for wildcard characters — this is a full line pass in the worst case. Then a method of comparison is chosen, and in the worst case we are already going through the path twice. And these two passes are performed on all lines in the list.
The first optimization that immediately came to mind was to divide the excluded paths into search patterns (glob patterns) and normal paths in advance. This is how a new class was born in the analyzer, PathMatcher, which contains 2 containers. One container for patterns and one for standard paths:
class PathMatcher
{
// ....
private:
using GlobsCollection = std::set<std::string, std::less<>>;
using PathsCollection = ???;
GlobsCollection m_globs; // templates
PathsCollection m_paths; // normal paths
};
The search patterns are simple enough — you can store them in the standard std::set container to eliminate duplicates — but there is a nuance with paths. Yes, here you can also use an associative container like std::set, but in most cases the paths passed by the user will contain some common prefix and differ at the very end:
/home/user/folderToExclude/fileToExclude.cpp
/home/user/folderToExclude/
|______common prefix______|
All this suggests the "trie" data structure. It allows you to optimize both memory consumption and the search for the longest prefix possible. After searching for ready-made implementations, we chose Tessil/hat-trie. In order to distinguish files from directories, we used tsl::htrie_map, where the key was our path and the value was the file type.
Now the algorithm works like this:
After optimizing the algorithm on a test case with 200+ excluded paths in the configuration file, the analyzer started parsing and analysing files several times faster. It was definitely a success. It remained the case for small — to build a beta, give it to the user and rejoice at our small victory.
But it was too early to celebrate the victory (close the ticket). The user again wrote about the same freezing.
Well, quick fixes didn't help, we needed to dig even deeper into this issue. We decided to ask the user to run our utility on strace and send us all the logs generated. If anyone does not know, the strace utility allows you to track all system calls of the program and much more. By the way, we use it as one of the options for running the analyzer on the project (the compilation tracing).
Here is the command that the user used to generate logs:
strace -y -v -s 4096 -ff -o strace-logs/log.txt -- pvs-studio-analyzer ....
They left the program running for about 20 minutes before terminating the process. Since during the freeze, the strace utility continued to write information to logs, the size of logs turned out to be impressive — 22795 files with a total weight of 278 GB (!) without compression.
First, we looked at the strace results. And immediately we saw a huge number of nanosleep calls. This meant that the child processes generated by the pvs-studio-analyzer utility, for some reason, were pending. We went through the logs from top to bottom and found the problem (the image is clickable):
When you click on the image, the gif will show that the file descriptor number gradually increases after opening the files. After this number approached the value of 1024, the EMFILE error was generated when an attempt was made to allocate a new descriptor, and then the analysis stopped. This behavior indicates a leak of file descriptors.
In Linux OS, when the file is opened, it is assigned a special number — a descriptor. The descriptor is then used to work with the file: read, write, view attributes, etc. The number of such descriptors is limited and is determined by the system settings.
By the way, it is very easy to reproduce the problem. To reproduce the problem, it is enough to write the following CMakeLists.txt:
cmake_minimum_required(VERSION 3.5)
project(many-files LANGUAGES C CXX)
set(SRC "")
foreach(i RANGE 10000)
set(file "${CMAKE_CURRENT_BINARY_DIR}/src-${i}.c")
file(TOUCH "${file}")
set(SRC "${SRC};${file}")
endforeach()
add_library(many-files STATIC
${SRC})
Next, we form the cache in the directory with CMakeLists.txt and run the pvs-studio-analyzer utility version 7.18 and earlier:
cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=On
pvs-studio-analyzer analyze -f ./build/compile_commands.json -j -i -o pvs.log
Unfortunately, at the time of writing this article, the original logs have sunk into oblivion. So in the picture above is a log that we reproduced ourselves.
As previously mentioned, to work correctly, the incremental mode and file list checks require a dependency analysis of the files being compiled. For this we generate a special depend_info.json file which shows the dependencies of compiled files on header files.
Some source files can be compiled multiple times in different projects with different flags within the same "solution". When creating a preprocessed file with the ".PVS-Studio.i" postfix, we have to cut off the extension from the name of the source file. This is due to the fact that some preprocessors refuse to preprocess a file if the final one contains postfixes like ".cpp", ".cxx", etc. in the name.
This can lead to a collision if, for example, two files are preprocessed — "source.cpp " and "source.cxx". To eliminate the race condition, we lock the resulting path — the special ".pvslock" file is created and opened. If a collision occurs, the number 1, 2, 3, etc. is added to the name of the next preprocessed file.
After the first stage of parsing, we already have all the necessary preprocessed files. There is no point in running the preprocessor a second time, and this can save considerable time in subsequent analysis. So after the first step, we move the PreprocessedFile object, which contains all the required paths as well as lock objects, to the cache. At the second stage, we check the cache to see if it contains the preprocessed file we need.
The problem was that when moving the object at the first stage, we forgot to remove the lock. This caused the number of opened temporary ".pvslock" files to increase and the program would freeze if a certain number of file descriptors were exceeded.
The fix was simple enough — moving the object to the cache now removes the lock and the ".pvslock" file is closed and deleteted.
We fixed the resource handling in the program and the problem disappeared. We suspect that this error has not occurred to anyone before, as the Linux version of the analyzer is more commonly used on build servers in normal mode. Incremental analysis is often used in combination with IDEs, and at the moment we fully support only JetBrains CLion on Linux. It appears that, until then, there was no user with the need to analyze a project in incremental mode with a large number of files.
Giving the beta to the client for the third time, we finally solved the problem with the freeze of the analyzer.
Unfortunately, not all problems coming to support are easy to handle. Often the most trivial bugs lie deep inside and are difficult to debug.
We hope that our story was interesting for you. And if you have any problems with our product, do not hesitate to contact our awesome support. We promise we will help you.
0