What memory release strategy does the PVS-Studio C and C++ core use?
In various discussions, we have already commented on how the PVS-Studio C and C++ module works with memory. Now it's time to turn that comment into a small article.
At the time of publication, the PVS-Studio analyzer contains three console modules that analyze the program code in the following languages:
- C++, as well as the C language and a number of dialects: C++/CLI, C++/CX;
- C#;
- Java.
We call these modules analyzer cores.
So, the core of the C# analyzer is written in C#, and the core of the Java analyzer in Java. In these languages, the garbage collector releases memory, so no questions arise here. Of course, there are nuances of optimization. For example, in articles [1, 2, 3] my teammates described how they reduced the number of temporary objects created, configured the garbage collector, interned strings, etc. But now we're interested in the core of the C and C++ analyzer, written in C++.
General information about the core
To explain why we chose a particular strategy to work with memory, let's talk a little about the general principles of the analyzer's work. The project analysis is performed in small steps. This is important.
A new process is started to analyze each translation unit (.c, .cpp files). This allows us to parallelize the analysis across files. Since there is no parallelism within a process, there is nothing to synchronize, which reduces the complexity of development.
But wouldn't internal parallelization help check files faster? Yes, but there's no point in it. First, each individual file is checked quickly anyway. Second, the file analysis time does not shrink in proportion to the number of threads created. This may be unexpected, so let me explain.
Before a file is analyzed, it is preprocessed. An external preprocessor (the compiler) is used for that, and we don't control how long it runs. Let's assume the preprocessor runs for 3 seconds and the analysis also takes 3 seconds. Add another conditional second spent on collecting information about the file, starting processes, reading files, and other non-parallelizable or poorly parallelizable operations. That's 7 seconds in total.
Imagine that internal parallelization is implemented and the analysis takes 0.5 seconds instead of 3. Then the total time for checking one file drops from the conditional 7 seconds to 4.5. That's nice, but nothing changes dramatically. When we analyze multiple files, such parallelization makes no sense: the analysis of the files themselves is parallelized, which is more efficient. And if only one file needs checking, the analysis won't speed up significantly. For this slight acceleration, we'd have to pay a high price: writing a complex mechanism for parallelizing algorithms and synchronizing access to shared objects.
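The arithmetic above can be sketched as a tiny Amdahl-style estimate, where only the analysis phase benefits from extra threads (the function name is mine; the numbers are the conditional ones from the text):

```cpp
#include <cassert>

// Only the analysis phase scales with the number of threads;
// preprocessing and the auxiliary work (process startup, file
// reading) stay strictly sequential.
double TotalSeconds(double preprocess, double analysis, double other,
                    int threads) {
    return preprocess + analysis / threads + other;
}
```

With the conditional numbers, 3 + 3 + 1 = 7 seconds single-threaded, and even a sixfold speedup of the analysis phase alone only brings the total down to 4.5 seconds.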
Note. You may wonder: how does PVS-Studio run intermodular analysis if each process works with only one compilation unit? The analysis runs in two passes. First, the analyzer collects the necessary information into a special file. Then it uses the previously collected information to re-analyze the files.
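A minimal sketch of such a two-pass scheme, with an assumed plain-text format and hypothetical names (the real core's file format is not public):

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>

// Pass 1: each process dumps the facts it learned about its
// translation unit (here: a made-up "function returns non-null" flag).
void DumpFacts(const std::string &path,
               const std::map<std::string, int> &facts) {
    std::ofstream out(path);
    for (const auto &kv : facts)
        out << kv.first << ' ' << kv.second << '\n';
}

// Pass 2: before re-analyzing a unit, the collected facts are loaded
// back so diagnostics can use cross-unit information.
std::map<std::string, int> LoadFacts(const std::string &path) {
    std::map<std::string, int> facts;
    std::ifstream in(path);
    std::string name;
    int value;
    while (in >> name >> value)
        facts[name] = value;
    return facts;
}
```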
Memory release strategy
Parallelizing the analyzer at the file processing level has another important consequence, which relates to memory usage.
We don't release memory in the PVS-Studio C and C++ core until the analysis is complete. This was a conscious decision.
Our unicorn always eats memory :)
Okay-okay, it's not entirely true. Objects with automatic storage duration are destroyed in the natural way, and the heap memory they allocated for their needs is released along with them.
There are many other objects with a short lifetime. Classic smart pointers are used to delete them in time.
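As a hedged illustration (the class and function names are made up, not the analyzer's real ones), short-lived ownership via a classic smart pointer looks like this:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical short-lived helper: it exists only while one line is
// being processed and is destroyed by the unique_ptr automatically.
struct ScratchBuffer {
    std::string text;
};

std::string StripLineComment(const std::string &line) {
    auto scratch = std::make_unique<ScratchBuffer>();
    // find() returns npos when there is no "//", and substr(0, npos)
    // then copies the whole string.
    scratch->text = line.substr(0, line.find("//"));
    return scratch->text;
}  // scratch is released here; no manual delete needed
```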
However, there are three types of data that are only created, but not destroyed until the analysis is complete:
- Abstract syntax tree;
- Various data collected during tree traversal;
- "Virtual values" used for data flow analysis and symbolic execution.
Until the end of the analysis, we don't know for sure which of this data a diagnostic may require. Therefore, until the last diagnostic has run on the last tree node, all the data is kept.
By the end of the analysis, it no longer makes sense to destroy each created tree node individually, or the information about what functions can return, and so on. Technically, we could walk all the saved pointers and free them with delete. Still, there's no point, and it would only slow the analysis down. The operating system will release all the memory used by the process anyway, and it will do so almost instantly.
Practically, it's safe not to delete these objects: all these "forgotten" objects contain no finalizers. Their destructors don't output messages, write logs, or delete files. These are very simple classes that contain only numbers, strings, and pointers/references to other similar objects.
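This "create but never destroy" pattern is essentially what compiler writers call an arena (or region) allocator. Here is a minimal sketch under that assumption; it is an illustration of the idea, not PVS-Studio's actual allocator:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bump-pointer arena: nodes are carved out of large chunks and never
// freed individually; everything is reclaimed wholesale at process exit.
class Arena {
    std::vector<char *> blocks_;  // big chunks, never returned early
    std::size_t used_ = 0;
    static constexpr std::size_t kBlockSize = 1 << 20;  // 1 MiB per chunk
public:
    void *Allocate(std::size_t n) {
        // Round the request up so every node is suitably aligned.
        n = (n + alignof(std::max_align_t) - 1) &
            ~(alignof(std::max_align_t) - 1);
        if (blocks_.empty() || used_ + n > kBlockSize) {
            blocks_.push_back(new char[kBlockSize]);
            used_ = 0;
        }
        void *p = blocks_.back() + used_;
        used_ += n;
        return p;
    }
    // Deliberately no per-node deallocation: when the process ends,
    // the operating system reclaims all the chunks at once.
};
```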
So, since each process works with only one compilation unit, we no longer need to care whether a piece of data is still needed. It's easier to keep everything until the end. This increases memory consumption, but for modern hardware these amounts are not critical. In return, it slightly simplifies development and reduces the execution time: according to our approximate measurements, releasing the memory ourselves at the end would slow the analysis down by about 5%.
Handling internal errors
What if we run out of memory? Since every file is processed in a separate process, the failure of one process doesn't affect the whole analysis.
Of course, a failure may happen for many reasons. For example, the analyzed file may contain uncompiled code or garbage. Then one of the processes may start consuming a lot of memory or run unacceptably long (V006). If this happens, the process is terminated, and the project analysis continues.
The process doesn't hold any special information that cannot be lost. Yes, it's a pity that the analyzer won't issue some warnings, but nothing critical happens.
So what happens if the analyzer runs out of memory and the next call to the new operator throws a std::bad_alloc exception? The exception is caught at the top level, and the core shuts down after issuing the corresponding warning.
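A minimal sketch of such a top-level handler; the entry-point name, message, and exit codes are my assumptions, and allocation failure is simulated by throwing what a failed new would throw:

```cpp
#include <cassert>
#include <iostream>
#include <new>
#include <string>

// Hypothetical core entry point: returns 0 on success, 1 on an
// internal out-of-memory failure. The parent process sees the
// non-zero exit code and simply moves on to the next file.
int RunCoreOnFile(const std::string &file, bool simulateOom) {
    try {
        if (simulateOom)
            throw std::bad_alloc();  // what a failed new would throw
        // ... the real analysis would happen here ...
        return 0;
    } catch (const std::bad_alloc &) {
        std::cerr << "Out of memory while analyzing " << file << '\n';
        return 1;
    }
}
```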
This approach to handling internal errors may seem harsh, but in real life these failures rarely occur. It's better to stop than to try to process a situation where everything has already gone wrong. Failures usually happen when the analyzer encounters something unusual. Stopping on such input data is quite a rational option.
Of course, it's hard to explain this without examples, so let me point you to a humorous talk by my teammate. It describes a couple of cases when excessive memory consumption led to processes being stopped by timeout.
These cases include string literals of 26 megabytes and a function with a length of more than 800 KLOC.
Yuri Minaev. CoreHard 2019. Don't take on C++ programmers support.