>
>
>
Four reasons to check what the malloc f…

Andrey Karpov
Articles: 674

Four reasons to check what the malloc function returned

Some developers may be dismissive of checks: they deliberately do not check whether the malloc function allocated memory or not. Their reasoning is simple — they think that there will be enough memory. And if there is not enough memory to complete operations, let the program crash. Seems like a bad approach, doesn't it? For a variety of reasons.

A few years ago, I already published a similar article titled "Why it is important to check what the malloc function returned". The article you are reading now is its updated version. Firstly, I have some new ideas to share with you. Secondly, the previous article was a part of a series dedicated to the Chromium project that we checked — it contains details that distract from the main topic.

Note. In the article, under the malloc function will be implied that the question is not only about this particular function, but also of calloc, realloc, _aligned_malloc, _recalloc, strdup, and so on. I don't want to clutter up the article with all these function names. What all these functions have in common is that they can return a null pointer.

malloc

If the malloc function is unable to allocate the memory buffer, it returns NULL. Any normal program should check the pointers returned by the malloc function and appropriately handle the situation when memory could not be allocated.

Unfortunately, many programmers neglect to check pointers, and sometimes they deliberately do not check whether the memory was allocated or not. Their reasoning is following:

If the malloc function failed to allocate memory, it is unlikely that a program will continue to function properly. Most likely, memory will not be enough for other operations, so why bother with the memory allocation errors at all. The first addressing to the memory by a null pointer leads to Structured Exception generation in Windows. When it comes to Unix-like systems, the process receives the SIGSEGV [RU] signal. As a result, the program crashes, which is quite acceptable. No memory, no suffering. Alternatively, you can catch a structured exception/signal and handle the dereferencing of a null pointer in a more centralized way. This is more convenient than to write thousands of checks.

I'm not making this up. I've talked with people who consider this approach appropriate and consciously never check the result that the malloc function returns.

By the way, there is another excuse for developers, why they don't do checks. The malloc function only reserves memory but does not guarantee that there will be enough of physical memory, when we begin to use the allocated memory buffer. Therefore, if there are still no guarantees, why to perform a check? For example, Carsten Haitzler, one of the developers of EFL Core libraries, explained why I counted more than 500 fragments with no checks in the library code. Here is his comment to the article:

OK so this is a general acceptance that at least on Linux which was always our primary focus and for a long time was our only target, returns from malloc/calloc/realloc can't be trusted especially for small amounts. Linux overcommits memory by default. That means you get new memory but the kernel has not actually assigned real physical memory pages to it yet. Only virtual space. Not until you touch it. If the kernel cannot service this request your program crashes anyway trying to access memory in what looks like a valid pointer. So all in all the value of checking returns of allocs that are small at least on Linux is low. Sometimes we do it... sometimes not. But the returns cannot be trusted in general UNLESS its for very large amounts of memory and your alloc is never going to be serviced - e.g. your alloc cannot fit in virtual address space at all (happens sometimes on 32bit). Yes overcommit can be tuned but it comes at a cost that most people never want to pay or no one even knows they can tune. Secondly, if an alloc fails for a small chunk of memory - e.g. a linked list node... realistically if NULL is returned... crashing is about as good as anything you can do. Your memory is so low that you can crash, call abort() like glib does with g_malloc because if you can't allocate 20-40 bytes ... your system is going to fall over anyway as you have no working memory left anyway. I'm not talking about tiny embedded systems here, but large machines with virtual memory and a few megabytes of memory etc. which has been our target. I can see why PVS-Studio doesn't like this. Strictly it is actually correct, but in reality code spent on handling this stuff is kind of a waste of code given the reality of the situation. I'll get more into that later.

The given reasoning of developers is incorrect. Below, I will explain in detail why.

You need to perform checks

There are four reasons at once, each of which is sufficient to prove to write a check after you call the malloc function. If someone from your team doesn't write checks, make him read this article.

Before I start, here's a small theoretical reference why structural exceptions or signals happen if a dereferencing of a null pointer occurs. It is important for further storytelling.

At the beginning of the address space, one or more pages of memory are protected by the operating system. This allows to detect errors of addressing to memory by a null pointer, or to the pointer of value close to 0.

In various operating systems, different amounts of memory are reserved for these purposes. Besides, in some operating systems this value is configurable. Therefore, it makes no sense to call a specific number of bytes of memory reserved. Let me remind you that in Linux systems the standard value is 64Kb.

It is important that, when adding any sufficiently large number to a null pointer, you can "strike out" the control memory pages and accidentally get into pages that are not protected from writing. Thus, one may corrupt some data. The operating system won't notice that and won't generate any signal/exception.

Note. If we talk about embedded systems, there may not be any memory protection from writing by the null address. Some systems have low memory, and all the memory stores data. However, the systems with a small amount of RAM, most likely, won't have dynamic memory management and, accordingly, the malloc function.

Make your coffee, let's get started!

Null pointer dereference is undefined behavior

In terms of C and C++ languages, null pointer dereferencing causes undefined behavior. When undefined behavior is invoked, anything can happen. Don't assume that you know how the program will behave if nullptr dereference occurs. Modern compilers make use of serious optimizations. As a result, it is sometimes impossible to predict how a particular code error will manifest itself.

Undefined behavior of the program is very nasty. You should avoid undefined behavior in your code.

Don't think that you will be able to cope with a null pointer dereference, using structured exception handlers (SEH in Windows) or signals (in UNIX-like systems). If null pointer dereference took place, the program work has already broken and anything can happen. Let's look at an abstract example, why we can't rely on SEH-handlers, etc.

size_t *ptr = (size_t *)malloc(sizeof(size_t) * N * 2);
for (size_t i = 0; i != N; ++i)
{
  ptr[i] = i;
  ptr[N * 2 - i - 1] = i;
}

This code fills an array from the edges to the center. The element values increase towards the center. I came up with this example in 1 minute, so don't guess why anyone would need such an array. I don't even know myself. It was important for me that a record in the adjacent lines takes place in the beginning of the array and somewhere at its end. Sometimes you need something like this in practical tasks, and we'll look at the actual code when we get to the 4-th reason.

Let's take a closer look at these two lines again:

ptr[i] = i;
ptr[N * 2 - i - 1] = i;

From a programmer's perspective, at the beginning of the loop, a record occurs in the ptr[0] element — a structured exception/signal will appear. It will be handled, and everything will be fine.

However, the compiler may swap the assignments for some optimization purposes. It has all the rights to do that. According to the compiler, if the pointer is dereferenced, it cannot be equal to nullptr. If the pointer is null, it is undefined behavior, and the compiler is not required to think about the consequences of optimization.

So, the compiler might decide that for optimization purposes it is more profitable to perform assignments as follows:

ptr[N * 2 - i - 1] = i;
ptr[i] = i;

As a result, at the beginning, a recording will occur by the ((size_t *)nullptr)[N * 2 - 0 - 1] address. If the value N is great enough, the protected page at the beginning of the memory will be "jumped over" and the value of the i variable can be written in any cell that is available for writing. In overall, some data will be corrupted.

And only after that the assignment at the ((size_t *)nullptr)[0] address will be performed. The operating system will notice an attempt to write to the area it controls and will generate a signal/exception.

The program can handle this structural exception/signal. But it's already too late. Somewhere in memory, there is corrupted data. In addition, it is not clear what data is corrupted and what consequences it may have!

Is the compiler to blame for swapping assignment operations? No. The programmer let the dereferencing of a null pointer happen and thereby led the program in the undefined behavior state. In this particular case, the undefined behavior of a program will be that data is corrupted somewhere in the memory.

Conclusion

Adhere to the axiom: any null pointer dereference is undefined behavior of a program. There is no such thing as a "harmless" undefined behavior. Any undefined behavior is unacceptable.

Do not allow dereferencing of pointers, which the malloc function and its analogs returned, without their prior check. Do not rely on any other ways to intercept the dereferencing of a null pointer. Use the good old if operator only.

Null pointer dereference is a vulnerability

What some developers do not consider a bug at all, others perceive as a vulnerability. This is the exact situation that happens in case of null pointer dereference.

In a number of projects, it is acceptable if the program crashes due to dereference of the null pointer, or if the error is handled in some general way using signal interception/structural exception.

In other applications, null pointer dereference represents a kind of potential vulnerability that can be used for an application-layer DoS attack. Instead of normally handling the lack of memory, the program or one of the execution threads terminates its work. This can lead to loss of data, data integrity, and so on.

Here's an example. There is such a program as Ytnef made for decoding the TNEF threads, for example, created in Outlook. The absence of check after calling calloc was considered the CVE-2017-6298 vulnerability.

All the fixed fragments which could contain null pointer dereference were approximately the same:

vl->data = calloc(vl->size, sizeof(WORD));
temp_word = SwapWord((BYTE*)d, sizeof(WORD));
memcpy(vl->data, &temp_word, vl->size);

Conclusions

If you are developing not a very significant application for which a crash during its work is not a trouble, then yes — do not write checks.

However, if you are developing a real software project or a library, the absence of checks is unacceptable!

Therefore, I ideologically disagree with Carsten Haitzler's argument that the absence of checks in the EFL Core library is acceptable (more details — in the article). This approach does not let developers build reliable applications based on such libraries. If you are creating a library, please note that in some applications, dereferencing of a null pointer is a vulnerability. You must handle memory allocation errors and properly return the information about the failure.

Where are guarantees that dereferencing of exactly a null pointer will occur?

Those who feel lazy to write checks, for some reason think that dereferencing affects exactly null pointers. Yes, it often happens this way. But can a programmer vouch for the code of the entire application? I'm sure not.

I'm going to show what I mean with practical examples. For example, let's look at the code fragment of the LLVM-subzero library, which is used in Chromium.

void StringMapImpl::init(unsigned InitSize) {
  assert((InitSize & (InitSize-1)) == 0 &&
         "Init Size must be a power of 2 or zero!");
  NumBuckets = InitSize ? InitSize : 16;
  NumItems = 0;
  NumTombstones = 0;
  
  TheTable = (StringMapEntryBase **)
             calloc(NumBuckets+1,
                    sizeof(StringMapEntryBase **) + 
                    sizeof(unsigned));

  // Allocate one extra bucket, set it to look filled
  // so the iterators stop at end.
  TheTable[NumBuckets] = (StringMapEntryBase*)2;
}

Note. Here and further, I use old code fragments that I have left over from writing various articles. Therefore, the code or line numbers may no longer match to what they are now. However, this is not that important for the storytelling.

PVS-Studio warning: V522 CWE-690 There might be dereferencing of a potential null pointer 'TheTable'. Check lines: 65, 59. stringmap.cpp 65

Right after allocation of memory buffer, a record happens in the TheTable[NumBuckets] cell. If the value of the variable NumBuckets is great enough, we'll taint some data with unpredictable consequences. After such damage, it makes no sense to speculate how the program will run. There may be the most unexpected consequences.

I will continue the indirect discussion with Carsten Haitzler. He says that the library developers understand what they are doing when they don't check the result of calling the malloc function. I'm afraid they underestimate the danger of this approach. Let's take a look, for example, at the following code fragment from the EFL library:

static void
st_collections_group_parts_part_description_filter_data(void)
{
  ....
  filter->data_count++;
  array = realloc(filter->data,
    sizeof(Edje_Part_Description_Spec_Filter_Data) *
    filter->data_count);
  array[filter->data_count - 1].name = name;
  array[filter->data_count - 1].value = value;
  filter->data = array;
}

PVS-Studio warning: V522 [CWE-690] There might be dereferencing of a potential null pointer 'array'. edje_cc_handlers.c 14249

Here we have a typical situation: there is not enough space for data storing in a buffer, it should be increased. To increase the size of the buffer the realloc function is used, which may return NULL.

If this happens, a structured exception/signal will not necessarily occur due to null pointer dereferencing. Let's look at these lines:

array[filter->data_count - 1].name = name;
array[filter->data_count - 1].value = value;

If the value of the filter->data_count variable is large enough, the values will be written to a strange address.

In memory some data will be corrupted, but the program will run anyway. The consequences are unpredictable, and there will be no good for sure.

Conclusion

I'm asking the question once again: "Where's the guarantee that dereferencing of exactly a null pointer will occur?". No such guarantees. It is impossible, while developing or modifying code, to remember about a nuance considered lately. You can easily mess up something in memory, while the program will continue executing as nothing happened.

The only way to write reliable and correct code is to always check the result returned by the malloc function. Perform a check and live a peaceful life.

Where are the guarantees that memset fills the memory in a direct order?

There will be someone who will say something like this:

I know perfectly about realloc and everything else that is written in the article. But I am a professional and, when allocating memory, I just fill it with zeros using memset at once. Where it is necessary I use checks. But I'm not going to write the extra checks after each malloc.

Generally, to fill the memory immediately after buffer allocation is quite a strange idea. It's strange because there is a calloc function. However, people act like this very often. You don't need to look very far to find examples, here is the code from the WebRTC library:

int Resampler::Reset(int inFreq, int outFreq, size_t num_channels) {
  ....
  state1_ = malloc(8 * sizeof(int32_t));
  memset(state1_, 0, 8 * sizeof(int32_t));
  ....
}

Memory is allocated, then the buffer is filled with zeros. It's a very common practice, although, in fact, two lines can be reduced to one using calloc. It doesn't really matter though.

The main thing is that even such code is unsafe! The memset function is not required to start filling the memory from the beginning and thereby to cause a null pointer dereferencing.

The memset function has the right to start filling the buffer from the end. And if a large buffer was allocated, some useful data could be cleared. Yes, while filling the memory, the memset function will eventually reach the page protected from recording, and the operating system will generate a structural exception/signal. However, it no longer makes sense to handle them. By this time, a large fragment of memory will be corrupted — and the following work of the program will be unpredictable.

The reader might argue that all this is purely theoretical. Yes, the memset function could theoretically fill the buffer starting from the end of the buffer, but in practice, nobody will implement this function this way.

I would agree that this implementation of memset is really exotic, and I even asked a question on Stack Overflow about this topic. This is the reply:

The Linux kernel's memset for the SuperH architecture has this property: link.

Unfortunately, this is code in an assembler unfamiliar for me, so I do not undertake to talk about it. But still there is such an interesting implementation in the C language. Here's the beginning of the function:

void *memset(void *dest, int c, size_t n)
{
  unsigned char *s = dest;
  size_t k;
  if (!n) return dest;
  s[0] = c;
  s[n-1] = c;
  ....
}

Pay attention to these lines:

s[0] = c;
s[n-1] = c;

Here we come to the reason N1 "Dereferencing of a null pointer is undefined behavior". There is no guarantee that the compiler won't swap the assignments. If your compiler does it, and the n argument is of great value, a byte of memory will be corrupted in the beginning. Null pointer dereference will occur only after that.

Not convinced again? Well, how about this implementation?

void *memset(void *dest, int c, size_t n)
{
  size_t k;
  if (!n) return dest;
  s[0] = s[n-1] = c;
  if (n <= 2) return dest;
  ....
}

Conclusion

You cannot even trust the memset function. Yes, this may be an artificial and far-fetched issue. I just wanted to show how many nuances appear if one doesn't check the value of the pointer. It is simply impossible to take in account all of this. Therefore, you should carefully check every pointer returned by the malloc function and similar ones. That's the point when you will become a professional and write reliable code.

Notes based on the publication of the previous article

The previous article has given rise to several debates: 1, 2, 3. Let me respond to some comments.

1. If malloc returned NULL, it is better to terminate the program right away than to write a bunch of if-s and try to somehow handle the lack of memory, which makes the program's execution impossible anyway.

I did not call for fighting the consequences of lack of memory to the last, by throwing the error higher and higher. If it is acceptable for your application to terminate its work without a warning, then so be it. For this purpose, even a single check right after malloc or using xmalloc is enough (see the next point).

I was objecting and warning about the lack of checks, when a program continues to work "as if nothing had happened." This is completely different case. It is unsafe, as it leads to undefined behavior, data corruption, and so on.

2. There is no description of a solution that lies in writing wrapper functions for allocating memory with a check following it or using already existing functions, such as xmalloc.

I agree, this point slipped my mind. It was more important for me to convey to the reader the danger of the check absence. How to fix the code is a matter of taste and implementation details.

The xmalloc function is not part of the C standard library (check out "What is the difference between xmalloc and malloc?"). However, this function may be declared in other libraries, for example in GNU utils library (GNU libiberty).

The main point of the function is that the program crashes when it fails to allocate memory. Implementation of this function might look as follows:

void* xmalloc(size_t s)
{
  void* p = malloc(s);
  if (!p) {
    fprintf (stderr, "fatal: out of memory (xmalloc(%zu)).\n", s);
    exit(EXIT_FAILURE);
  }
  return p;
}

Accordingly, by calling the xmalloc function instead of malloc every time, you can be sure that there will be no undefined behavior in the program due to usage of a null pointer.

Unfortunately, xmalloc is not a panacea either. You need to remember that usage of xmalloc is unacceptable when it comes to writing library code. I will talk about it later.

3. Most comments were the following: "in practice, malloc never returns NULL."

This is usually claimed by Linux developers. They are not right. Fortunately, I am not the only one, who understands that this is the wrong approach. I really liked this comment:

From my experience of discussing this topic, I've got a feeling that there are two sects on the Internet. Members of the first sect are people who firmly convinced that in Linux malloc never returns NULL. Supporters of the second one are firmly convinced that if the memory in the program could not be allocated, nothing can be done in principle, you just let the app crash. There is no way to over-persuade them. Especially when these two sects intersect. You can only take it as a given. And it's even not important on which specialized resource a discussion takes place.

I thought for a while and decided to follow the advice, so I'll not try to persuade anyone :). Let's hope that these development teams write only non-critical software. If, for example, some data gets corrupted in the game or the game crashes, it's not a big deal.

The only thing that is important is that developers of libraries, databases, etc. would not think the same.

Appeal to the developers of highly dependable code and libraries

If you are developing a library or other highly dependable code, always check the value of the pointer returned by malloc/realloc function and return outwards an error code if memory couldn't be allocated.

In libraries, you cannot call the exit function, if memory allocation failed. For the same reason, you cannot use xmalloc. For many applications, it is unacceptable to simply abort them. Because of this, for example, a database or a project that a person has been working on for many hours can be corrupted. One can lose data that was evaluated for many hours. Because of this, the program may be subjected to "denial of service" vulnerabilities, when, instead of correct handling of the growing workload, a multithreaded application simply terminates.

You cannot assume in what projects the library will be used. Therefore, it should be assumed that the application may solve very critical tasks. That's why just killing it by calling exit is no good. Most likely, such a program is written taking into account the possibility of memory lack and it can do something in this case. For example, a CAD system cannot allocate an appropriate memory buffer that is enough for regular operation due to the strong fragmentation of memory. In this case, it is not the reason for it to crash in the emergency mode with data loss. The program can provide an opportunity to save the project and restart itself normally.

Never rely on the fact that malloc can always allocate memory. You don't know on which platform and how the library will be used. If low memory situation on one platform is exotic, it can be quite a common situation on another one.

You cannot expect that if malloc returns NULL, the program will crash. Anything can happen. The program may write data not by the zero address at all. As a result, some data may be corrupted, which leads to unpredictable consequences. Even memset is unsafe. If padding with data goes in reverse order, first some data gets corrupted, and then the program will crash. But the crash may occur too late. If corrupted data is used in parallel threads while the memset function is working, consequences can be fatal. You can get a corrupted transaction in a database or sending commands to the removal of "unnecessary" files. Anything has a chance to happen. I suggest a reader to dream up yourself, what might happen due to the usage of garbage in memory.

Thus, the library has only one correct way of working with the malloc functions. You need to IMMEDIATELY check what the function returned, and if it is NULL, return an error status.

Conclusion

Always check the pointer that is returned by the malloc function or its analogs at once.

As you can see, the PVS-Studio analyzer is right, warning that there is no check of the pointer after a malloc call. It's impossible to write reliable code without making checks. This is especially important and relevant to library developers.

I hope now you have a new look at the malloc function, check pointers and warnings of the PVS-Studio code analyzer. Don't forget to show this article to your teammates and start using PVS-Studio. Thank you for your attention. Wish you less bugs!