We'd like to present a series of articles with recommendations on writing high-quality code, based on examples of errors found in the Chromium project. This is the sixth part, which focuses on the malloc function, or rather, on why you should always check the pointer this function returns. Most likely, you have no idea what the catch with malloc is, so we recommend looking through this article.
Note. In this article, the malloc function will often stand in not only for malloc itself but also for calloc, realloc, _aligned_malloc, _recalloc, strdup, and so on. I don't want to clutter the text by constantly repeating the names of all these functions. What they have in common is that they may return a null pointer.
This article is a bit outdated, although the topic it covers is very important. So, we invite you to read its updated version: "Four reasons to check what the malloc function returned".
If the malloc function is unable to allocate a memory buffer, it returns NULL. Any normal program should check the pointers returned by malloc and properly handle the case where the allocation failed.
Unfortunately, many programmers are careless about checking pointers, and sometimes they deliberately skip checking whether the memory was allocated. Their reasoning goes like this:
'If the malloc function failed to allocate memory, my program is unlikely to keep functioning properly anyway. Most likely, there won't be enough memory for other operations either, so I need not bother with memory allocation errors. The first access to memory through a null pointer will trigger a structured exception in Windows, or the process will receive a SIGSEGV signal on Unix-like systems. As a result, the program will crash, which is fine by me. No memory, no suffering. Alternatively, I can catch the structured exception/signal and handle null pointer dereferences in one central place. This is more convenient than writing thousands of checks.'
I'm not making this up; I have talked with people who consider this approach appropriate and consciously never check the result that the malloc function returns.
By the way, developers have another excuse for not checking what malloc returned: malloc only reserves memory and does not guarantee that there will be enough physical memory when we actually begin using the allocated buffer. Therefore, if there are no guarantees anyway, there is no need to perform a check. For example, this is how Carsten Haitzler, one of the developers of the EFL Core libraries, explained why I counted more than 500 fragments in the library code where there are no checks. Here is his comment on the article:
OK so this is a general acceptance that at least on Linux, which was always our primary focus, and for a long time was our only target, returns from malloc/calloc/realloc can't be trusted especially for small amounts. Linux overcommits memory by default. That means you get new memory but the kernel has not actually assigned real physical memory pages to it yet. Only virtual space. Not until you touch it. If the kernel cannot service this request your program crashes anyway trying to access memory in what looks like a valid pointer. So all in all the value of checking returns of allocs that are small at least on Linux is low. Sometimes we do it... sometimes not. But the returns cannot be trusted in general UNLESS it's for very large amounts of memory and your alloc is never going to be serviced - e.g. your alloc cannot fit in virtual address space at all (happens sometimes on 32bit). Yes overcommit can be tuned but it comes at a cost that most people never want to pay or no one even knows they can tune. Secondly, if an alloc fails for a small chunk of memory - e.g. a linked list node... realistically if NULL is returned... crashing is about as good as anything you can do. Your memory is so low that you can crash, call abort() like glib does with g_malloc because if you can't allocate 20-40 bytes ... your system is going to fall over anyway as you have no working memory left anyway. I'm not talking about tiny embedded systems here, but large machines with virtual memory and a few megabytes of memory etc. which has been our target. I can see why PVS-Studio doesn't like this. Strictly it is actually correct, but in reality code spent on handling this stuff is kind of a waste of code given the reality of the situation. I'll get more into that later.
The developers' reasoning above is wrong, and I'll explain why in detail below. But first we must answer the question: "What does this have to do with Chromium?"
Chromium is relevant because its bundled libraries contain at least 70 errors caused by a missing check after calls to functions such as malloc, calloc, and realloc. True, in Chromium itself these functions are almost never used directly; only containers or operator new are applied there. However, if the errors are in the libraries, they are effectively in Chromium. Of course, some parts of the libraries may not be exercised when running Chromium, but it is difficult and unnecessary to determine that. All the errors need to be corrected anyway.
I will not cite many code fragments with errors in this article, as they are all similar. I'll give just one error, detected in the Yasm library, as an example:
static SubStr *
SubStr_new_u(unsigned char *s, unsigned int l)
{
  SubStr *r = malloc(sizeof(SubStr));
  r->str = (char*)s;
  r->len = l;
  return r;
}
PVS-Studio warning: V522 CWE-690 There might be dereferencing of a potential null pointer 'r'. Check lines: 52, 51. substr.h 52
There is no protection from a null pointer in this code. I have collected other similar errors from Chromium and its libraries in a file and uploaded it here: chromium_malloc.txt. There are 72 errors mentioned in the file, but in fact there may be more. As I wrote in the introductory article, I have only looked through the report superficially.
According to the Common Weakness Enumeration, PVS-Studio classifies the errors found as CWE-690: Unchecked Return Value to NULL Pointer Dereference.
As you can see, even in such a high-quality project as Chromium you can find many defects caused by missing checks. Now I'm getting to the most interesting part: why the checks are needed.
There are four reasons, each of which is enough on its own to prove that it is necessary to write a check after calling the malloc function. If someone on your team doesn't write these checks, make them read this article.
Before I start, a brief theoretical note on why structured exceptions or signals occur when a null pointer is dereferenced. It will be important for the rest of the story.
At the beginning of the address space, one or more pages of memory are protected by the operating system. This makes it possible to detect errors that access memory through a null pointer, or through a pointer whose value is close to 0.
Different operating systems reserve different amounts of memory for this purpose, and in some of them the value is configurable. Therefore, it makes no sense to name a specific number of reserved bytes. But to give the reader some bearings: on Linux systems the standard value is 64 KB.
The important point is that by adding a sufficiently large number to a null pointer, you can "jump over" the guard pages and accidentally write to any page that is not write-protected. Thus it is possible to corrupt some data somewhere, while the operating system will not notice it and will not generate any signal/exception.
Make your coffee, let's get started!
In terms of the C and C++ languages, null pointer dereferencing causes undefined behavior. Undefined behavior can be anything. Don't assume that you know how the program will behave if a nullptr dereference occurs. Modern compilers perform serious optimizations, and as a result it is sometimes impossible to predict how a given code error will reveal itself.
Undefined behavior in a program is very nasty. You mustn't allow it in your code.
Don't think that you will be able to cope with a null pointer dereference using structured exception handlers (SEH in Windows) or signals (in UNIX-like systems). If a null pointer dereference has taken place, the program's execution is already broken and anything can happen. Let's look at an abstract example of why we can't rely on SEH handlers and the like.
size_t *ptr = (size_t *)malloc(sizeof(size_t) * N * 2);
for (size_t i = 0; i != N; ++i)
{
  ptr[i] = i;
  ptr[N * 2 - i - 1] = i;
}
This code fills an array from the edges to the center; the element values increase toward the center. This is a one-minute example, so don't try to guess why anyone would need such an array; I don't know either. What mattered to me was that the writes in two adjacent lines go to the beginning of the array and somewhere near its end. This is sometimes needed in practical tasks, and we will consider actual code of this sort when we get to the fourth reason.
Let's look closely at these two lines:
ptr[i] = i;
ptr[N * 2 - i - 1] = i;
From a programmer's perspective, at the beginning of the loop a write will occur to the element ptr[0], a structured exception/signal will appear, it will be handled, and everything will be fine.
However, the compiler can swap the assignments for optimization purposes. It has every right to do so. From the compiler's point of view, if a pointer is dereferenced, it cannot be equal to nullptr. If the pointer is null, that is undefined behavior, and the compiler is not required to think about the consequences of the optimization.
So, the compiler might decide that it is more profitable to perform the assignments as follows:
ptr[N * 2 - i - 1] = i;
ptr[i] = i;
As a result, the first write will occur at the address ((size_t *)nullptr)[N * 2 - 0 - 1]. If the value of N is large enough, the guard page at the beginning of memory will be "jumped over", and the value of the variable i can be written into any cell that happens to be writable. Overall, some data will be corrupted.
Only after that will the assignment at the address ((size_t *)nullptr)[0] be performed. The operating system will notice the attempt to write into the protected area and will generate a signal/exception.
The program can handle this structured exception/signal, but it is already too late. Somewhere in memory there is corrupted data, and it is not clear which data is corrupted or what consequences that may have!
Is the compiler to blame for rearranging the assignments? No. The programmer allowed a null pointer dereference and thereby put the program into a state of undefined behavior. In this particular case, the undefined behavior of the program is that data is corrupted somewhere in memory.
Conclusion
Take it as an axiom: any null pointer dereference is undefined behavior of a program. There is no such thing as "harmless" undefined behavior. Any undefined behavior is unacceptable.
Do not allow dereferencing of pointers returned by the malloc function and its analogs without checking them first. Do not rely on any other way of catching a null pointer dereference. Use only the good old if statement.
What one programmer perceives as not an error is a vulnerability for others. This is exactly the situation with null pointer dereferences.
For some, it is acceptable if a program crashes because of a null pointer dereference, or if the error is handled in a common way by catching a signal/structured exception.
Others believe that a null pointer dereference causes denial of service and represents a vulnerability. Instead of handling the memory shortage gracefully, the program, or one of its threads, terminates. This can lead to loss of data, loss of data integrity, and so on. In other words, a CAD system will simply close if it is unable to allocate memory for some complex operation, without even offering the user a chance to save the results of their work.
I wouldn't like to sound unfounded, so here is proof. There is a program called Ytnef, made for decoding TNEF streams, for example those created in Outlook. Its developers consider the absence of a check after calling calloc a vulnerability: CVE-2017-6298.
All the fixed fragments that could contain a null pointer dereference looked approximately the same:
vl->data = calloc(vl->size, sizeof(WORD));
temp_word = SwapWord((BYTE*)d, sizeof(WORD));
memcpy(vl->data, &temp_word, vl->size);
Conclusion
If you are developing a not-so-significant application for which a crash is no big trouble, then yes, you don't need to write the checks.
However, if you are developing a library, the lack of checks is unacceptable! Your library can be used not only by lazy programmers writing irresponsible applications, such as a Tetris game. We must care about both normal programmers and normal programs.
That is why I ideologically disagree with, for example, Carsten Haitzler about there being no checks in the EFL Core library (see the article). Such libraries don't let programmers build reliable applications on top of them.
In general, if you are creating a library, keep in mind that in some applications a null pointer dereference is a vulnerability. You must handle memory allocation errors and duly return information about the failure.
Those who are too lazy to write checks assume that dereferences affect exactly null pointers. Yes, it often happens that way. But can a developer vouch for the entire application code? I'm sure they can't.
I'll show what I mean with practical examples. Let's take, for instance, code from the LLVM-subzero library, which is used in Chromium. Honestly, I can only guess at the relationship between the Chromium project and LLVM, but there is one.
void StringMapImpl::init(unsigned InitSize) {
  assert((InitSize & (InitSize-1)) == 0 &&
         "Init Size must be a power of 2 or zero!");
  NumBuckets = InitSize ? InitSize : 16;
  NumItems = 0;
  NumTombstones = 0;
  TheTable = (StringMapEntryBase **)
             calloc(NumBuckets+1,
                    sizeof(StringMapEntryBase **) +
                    sizeof(unsigned));
  // Allocate one extra bucket, set it to look filled
  // so the iterators stop at end.
  TheTable[NumBuckets] = (StringMapEntryBase*)2;
}
PVS-Studio warning: V522 CWE-690 There might be dereferencing of a potential null pointer 'TheTable'. Check lines: 65, 59. stringmap.cpp 65
Right after the memory buffer is allocated, a write happens to the cell TheTable[NumBuckets]. If the value of the variable NumBuckets is large enough, we'll corrupt some data with unpredictable consequences. After such damage, it generally makes no sense to speculate about how the program will run; the most unexpected consequences are possible.
I saw similar dangerous assignments in two more places. So this is not a unique case, but quite a typical situation where data is written not exactly through a null pointer, but at a random offset from it.
I will continue my absentee discussion with Carsten Haitzler. He argues that the EFL developers understand what they are doing when they don't check the result of a malloc call. No, they don't. Let's take a look at a code fragment from the EFL library:
static void
st_collections_group_parts_part_description_filter_data(void)
{
  ....
  filter->data_count++;
  array = realloc(filter->data,
                  sizeof(Edje_Part_Description_Spec_Filter_Data) *
                  filter->data_count);
  array[filter->data_count - 1].name = name;
  array[filter->data_count - 1].value = value;
  filter->data = array;
}
PVS-Studio warning: V522 There might be dereferencing of a potential null pointer 'array'. edje_cc_handlers.c 14249
Note. I use old source files of the EFL Core Libraries left over from writing articles about the library. Therefore, the code or line numbers may no longer match what is there now. However, this is not essential for my story.
Here we have a typical situation: there is not enough space to store data in a buffer, and it should be enlarged. To increase the size of the buffer, the realloc function is used, which may return NULL.
If this happens, a structured exception/signal will not necessarily occur on the dereference. Let's take a look at these lines:
array[filter->data_count - 1].name = name;
array[filter->data_count - 1].value = value;
If the value of the variable filter->data_count is large enough, the values will be written to a strange address.
Some data in memory will be corrupted, but the program will keep running. The effects are unpredictable again, and nothing good will come of it for sure.
I wasn't studying the old report on EFL Core Libraries carefully, but this is definitely not the only such error. I noticed at least two similar places where data is written at an index right after realloc.
Conclusion
Let me ask the question once again: "Where is the guarantee that the dereference will hit exactly a null pointer?" There are no such guarantees. While developing or modifying code, it is impossible to keep the nuance just discussed in mind at all times. You can easily spoil something in memory while the program continues executing as if nothing happened.
The only way to write reliable and correct code is to always check the result returned by the malloc function. Check it and live a peaceful life.
Someone will surely say something like this:
I understand perfectly about realloc and everything else written in the article. But I am a professional and, when allocating memory, I simply fill it with zeros using memset right away. I use checks where necessary. But I'm not going to write extra checks after every malloc.
Filling memory immediately after allocating the buffer is a rather odd idea in itself, strange because the calloc function exists. However, people do this very often. You don't need to look far for examples; here is code from the WebRTC library, used in Chromium:
int Resampler::Reset(int inFreq, int outFreq, size_t num_channels) {
  ....
  state1_ = malloc(8 * sizeof(int32_t));
  memset(state1_, 0, 8 * sizeof(int32_t));
  ....
}
Memory is allocated, and then the buffer is filled with zeros. This is a very common practice, although the two lines could be reduced to one using calloc. But that doesn't matter.
The main thing is that even such code is not safe! The memset function is not obliged to start filling memory from the beginning of the buffer and thereby trigger a null pointer dereference right away.
memset has the right to start filling the buffer from the end. And if a large buffer was allocated, some useful data could be wiped out. Yes, while filling the memory, memset will eventually reach the write-protected page, and the operating system will generate a structured exception/signal. But there is no point in handling them anyway: by that moment, a large fragment of memory will have been corrupted, and the subsequent work of the program will be unpredictable.
The reader might object that all this is purely theoretical: yes, memset could in theory fill the buffer starting from its end, but in practice nobody would implement the function that way.
I would agree that this implementation of memset is really exotic, and I even asked a question about it on Stack Overflow. This is the reply:
The Linux kernel's memset for the SuperH architecture has this property: link.
Unfortunately, that is code in an assembler I'm not familiar with, so I'm not going to speculate about it. But there is also such an interesting implementation in the C language. I will cite the beginning of the function:
void *memset(void *dest, int c, size_t n)
{
  unsigned char *s = dest;
  size_t k;

  if (!n) return dest;
  s[0] = c;
  s[n-1] = c;
  ....
}
Pay attention to these lines:
s[0] = c;
s[n-1] = c;
Here we come back to reason N1: dereferencing a null pointer is undefined behavior. There is no guarantee that the compiler will not swap the assignments. If your compiler does, and the argument n has a large value, a byte of memory will be corrupted first. The null pointer dereference will occur only after that.
Still not convinced? Then how about this implementation:
void *memset(void *dest, int c, size_t n)
{
  unsigned char *s = dest;
  size_t k;

  if (!n) return dest;
  s[0] = s[n-1] = c;
  if (n <= 2) return dest;
  ....
}
Conclusion
You cannot even trust the memset function. Yes, this may be an artificial and far-fetched problem. I just wanted to show how many nuances appear if one doesn't check the pointer value. It is simply impossible to take all of this into account. Therefore, don't try to show off; carefully check every pointer returned by malloc and similar functions. That is how you become a professional.
Always check, right away, the pointer returned by the malloc function or a similar one.
As you can see, the PVS-Studio analyzer is right to warn about the absence of a pointer check after a malloc call. It's impossible to write reliable code without such checks. This is especially important and relevant for library developers.
I hope you now have a fresh look at the malloc function, pointer checks, and the warnings of the PVS-Studio code analyzer. Don't forget to show this article to your colleagues and start using PVS-Studio. I wish you fewer bugs!