We'd like to present the series of articles dealing with the recommendations on writing code of high quality using the examples of errors found in the Chromium project. This is the third part, which will be focused on memory leaks.
I think the code of Chromium project and the libraries used in it is of very high quality. Sure, in the introductory article, I wrote about 250 errors, but actually, this is a very small number. In view of probability laws, no doubts that in a large project many errors will be found.
However, if we talk about memory leaks, there are not few of them. I suppose that what lets the Chromium developers down is that they give preference to dynamic code analyzers. Of course, these tools have a number of benefits. For instance, they don't issue false positives, once a dynamic analyzer has detected an error, we know for sure that an errors is really presented.
On the other hand, dynamic analysis also has weaknesses. If a code is not executed, an error will not be detected. But any developer realizes that it is very difficult to cover 100% of code with tests, or rather, this is impossible in practice. As a result, the number of errors remains in code and they are waiting for a favorable set of circumstances to reveal themselves.
Here static code analysis might come to aid. Yes, this is a hint for Google developers, that we will be glad, if they become our clients. Moreover, we are ready to get the additional work done on PVS-Studio adaptation and configuring for the specifications of the Chromium project. Our team is also ready to take on the correction of errors found. We already had similar experience (example).
But let's get back to memory leaks. As you will see, they hide in code that is rarely controlled. Basically, these are different error handlers. Static analyzers, unlike dynamic, are not always able to monitor the "future of a pointer" on the allocated memory and don't detect a lot of memory leaks. On the other hand, static analyzers check all code, regardless of the likelihood of its execution and notice errors. Thus, dynamic and static analyzers are complementary.
Let's see what I've noticed during studying the report issued by PVS-Studio. As I wrote in the introductory article, I skimmed through the report quite fluently, so there may be other, unnoticed errors. I would also like to note that memory leaks are extremely unpleasant for such a project as Chromium, so it will be exciting to talk about them. Such errors can be classified as CWE-401.
Let's look at the error in the Chromium code. First I'll show you the BnNew helper function, which allocates and returns a nullified memory buffer:
uint32_t* BnNew() {
uint32_t* result = new uint32_t[kBigIntSize];
memset(result, 0, kBigIntSize * sizeof(uint32_t));
return result;
}
Now let's look at the code that can lead to a memory leak:
std::string AndroidRSAPublicKey(crypto::RSAPrivateKey* key) {
....
uint32_t* n = BnNew();
....
RSAPublicKey pkey;
pkey.len = kRSANumWords;
pkey.exponent = 65537; // Fixed public exponent
pkey.n0inv = 0 - ModInverse(n0, 0x100000000LL);
if (pkey.n0inv == 0)
return kDummyRSAPublicKey;
....
}
If the condition (pkey.n0inv == 0) is executed, then the function exit occurs without freeing the buffer, a pointer at which is stored in the n variable.
The analyzer points at this defect by issuing the warning: V773 CWE-401 The function was exited without releasing the 'n' pointer. A memory leak is possible. android_rsa.cc 248
By the way, at this point, memory leaks related to Chromium itself, ended. Anyway, many of them are presented in the used libraries. Users don't care whether there will be memory leaks in Chromium libraries or Chromium itself. That's why the errors in libraries are no less important.
The following bugs relate to the WebKit engine. We'll begin again with the helper function:
static CSSValueList* CreateSpaceSeparated() {
return new CSSValueList(kSpaceSeparator);
}
Now here is the code containing the error:
const CSSValue* CSSTransformValue::ToCSSValue(....) const {
CSSValueList* transform_css_value =
CSSValueList::CreateSpaceSeparated();
for (size_t i = 0; i < transform_components_.size(); i++) {
const CSSValue* component =
transform_components_[i]->ToCSSValue(secure_context_mode);
if (!component)
return nullptr; // <=
transform_css_value->Append(*component);
}
return transform_css_value;
}
If a pointer component turns out to be null, the function will finish its work, and in doing so, a memory leak will occur.
PVS-Studio analyzer issues a warning: V773 CWE-401 The function was exited without releasing the 'transform_css_value' pointer. A memory leak is possible. csstransformvalue.cpp 73
Let's see another error related to WebKit.
Request* Request::CreateRequestWithRequestOrString(....)
{
....
BodyStreamBuffer* temporary_body = ....;
....
temporary_body =
new BodyStreamBuffer(script_state, std::move(init.GetBody()));
....
if (exception_state.HadException())
return nullptr;
....
}
If the function HadException() returns true, then the function will prematurely exit. Whereas no one will call the delete operator for a pointer, stored in the variable temporary_body.
PVS-Studio warning: V773 CWE-401 The function was exited without releasing the 'temporary_body' pointer. A memory leak is possible. request.cpp 381
Other errors that I noticed in WebKit, are no different from those described, so I don't see any reason to consider them in the article and I shall confine myself to list the analyzer warnings:
A lot? Yeap, a lot. I had enough energy only for the warnings that I highlighted. I quickly got bored and just skimmed through the warnings. Most likely, with closer analysis of errors there would be much more errors found in WebKit.
What does it mean? This means that the WebKit project has problems with memory leaks, so accept my "congratulations".
Now let's move on to the ICU project and consider a bug found in it.
UVector*
RuleBasedTimeZone::copyRules(UVector* source) {
if (source == NULL) {
return NULL;
}
UErrorCode ec = U_ZERO_ERROR;
int32_t size = source->size();
UVector *rules = new UVector(size, ec);
if (U_FAILURE(ec)) {
return NULL;
}
....
}
If an error of the UVector type occurs when object initializing, this will have the impact on the status, which is placed in the ec variable. For example, the constructor will return a status U_MEMORY_ALLOCATION_ERROR if it cannot allocate the memory buffer to store the desired number of elements. However, regardless whether it is possible to allocate memory for storing elements or not, an object of the UVector type will be created itself and a pointer to that object will be placed in the rules variable.
If the constructor returns a status U_MEMORY_ALLOCATION_ERROR, then there will be an exit from the function. The object of the UVector type will not be removed and a memory leak will occur.
PVS-Studio warning: V773 CWE-401 The function was exited without releasing the 'rules' pointer. A memory leak is possible. rbtz.cpp 668
Other errors from the ICU library will be also listed:
What else did I notice?
Libwebm Library:
SwiftShader library:
Probably, these are not all errors, but they are enough for me to demonstrate the abilities of PVS-Studio and write this article.
What unites all of the above cases? The errors became possible due to manual memory management!
Friends, we are already using C++17. Stop calling the new operator, placing the result in the ordinary pointer, and then forgetting to free it! So embarrassing!
No more ordinary pointers and subsequent manual management of allocated resources! Let's always use smart pointers.
The modern C++ standard offers such smart pointers as unique_ptr, shared_ptr and weak_ptr. In most cases, just unique_ptr will be enough.
For example, let's return to this incorrect code:
const CSSValue* CSSTransformValue::ToCSSValue(....) const {
CSSValueList* transform_css_value =
CSSValueList::CreateSpaceSeparated();
for (size_t i = 0; i < transform_components_.size(); i++) {
const CSSValue* component =
transform_components_[i]->ToCSSValue(secure_context_mode);
if (!component)
return nullptr;
transform_css_value->Append(*component);
}
return transform_css_value;
}
Let's rewrite it using unique_ptr. To do this, first, we need to change the type of the pointer. Secondly, in the end we need to call the release function to return the pointer to a controlled object and do not control it any more.
Correct code:
const CSSValue* CSSTransformValue::ToCSSValue(....) const {
unique_ptr<CSSValueList> transform_css_value(
CSSValueList::CreateSpaceSeparated());
for (size_t i = 0; i < transform_components_.size(); i++) {
const CSSValue* component =
transform_components_[i]->ToCSSValue(secure_context_mode);
if (!component)
return nullptr;
transform_css_value->Append(*component);
}
return transform_css_value.release();
}
In this article, I'm not going to teach how to use smart pointers. This topic is widely discussed in articles and book sections. I just wanted to show that the code has not become more difficult and cumbersome due to changes. But now it will be much harder to make a mistake.
Don't think that you can handle the new/delete or malloc/free and not slip up. Chromium developers make such mistakes. Other developers do. You are doing and will do such errors. There is no need to indulge in illusions that your team is oh-so-special :). I'd like to take this opportunity to ask managers to read now this info.
Use smart pointers.
According to my own observations, programmers sometimes incorrectly use the function realloc. Here is a classic pattern of errors, associated with using this function:
p = realloc(p, n);
if (!p)
return ERROR;
Let's pay attention to the following function property: If there is not enough memory, the old memory block is not freed and null pointer is returned.
As NULL will be written in the p variable, which stored a pointer to a buffer, then you lose the opportunity to release this buffer. A memory leak occurs.
The right thing to do is to rewrite the code as follows:
void *old_p = p;
p = realloc(p, n);
if (!p)
{
free(old_p);
return ERROR;
}
It has not been without such errors in libraries used in the Chromium project. For example, let's consider the following code fragment in the FLAC codec.
FLAC__bool FLAC__format_entropy_codi.....ce_contents_ensure_size(
FLAC__EntropyCodingMethod_PartitionedRiceContents *object,
unsigned max_partition_order)
{
....
if(object->capacity_by_order < max_partition_order) {
if(0 == (object->parameters =
realloc(object->parameters, ....)))
return false;
if(0 == (object->raw_bits = realloc(object->raw_bits, ....)))
return false;
....
}
The function increases the size of two buffers:
If a memory allocation error occurs, the function prematurely finishes and returns the false value. With this, the previous value of the pointer is lost, and a memory leak will occur.
Here PVS-Studio analyzer reports two relevant warnings:
Similar shortcomings in the WebRTC project:
Fortunately, there are few errors of this type in Chromium. At least, a lot less than I usually come across in other projects.
It is not always possible to give up using the realloc function, as it allows writing efficient code when you need to frequently change the size of a buffer.
So we will not get ahead of ourselves to recommend avoiding it completely. Sometimes it would be unreasonable. I just ask to be careful with this function and not to forget the error pattern I've described above.
However, very often in C++ it is quite possible to do without this function and use containers like std::vector or std::string. The effectiveness of the containers has significantly increased in recent years. For example, I was pleasantly surprised when I saw that in the core of PVS-Studio there is no more difference in performance between a self-made string class and std::string. Nevertheless, many years ago a self-made string class gave about 10% of productivity gains to the analyzer. There is no such effect any more, so it became possible to remove your own class. Now the class std::string is not the same as it was 10 years ago. Efficiency has improved significantly thanks to modern compilers and optimization abilities and such language innovations as, for example, a move constructor.
Anyway, no rush to roll up your sleeves and manage memory manually by using the functions malloc, realloc, free. Almost certainly, std::vector will prove to be no less effective for your needs. In addition, it's much simpler to use std::vector. It will become more difficult to make an error. It makes sense to return to low-level functions only when the profiler shows that it is really one of the bottlenecks in the program work.
Thank you for your attention. I invite you all to download and try PVS-Studio.