Code for some Erlang/OTP modules is older than most modern junior developers. These files are the true digital patriarchs. For decades, they have made sure that banking transactions, telephone networks, and messaging systems are all working properly. We decided to take a look under the hood of this long-lived language to see what exactly lies behind the lines that millions of users rely on today. In this article, let's see what we found.

Before we move on to discussing warnings, I'd like to say a few words about the project.
Erlang's journey began back in 1986 in one of Ericsson's laboratories. Joe Armstrong was experimenting with Prolog while trying to build a telephony program. Those experiments eventually gave birth to Erlang.
Erlang is a programming language designed for fault-tolerant systems where failure is unacceptable. Think of a telephone switch where a dropped call counts as an emergency, or a banking transaction that must go through no matter what. These are the tasks it's used for. It lets a system split into millions of isolated, independent processes, so when one fails, the rest keep running without missing a beat.
Erlang goes hand in hand with the Open Telecom Platform (OTP), a framework that provides powerful tools, patterns, and behaviors for building distributed, fault-tolerant applications. This built-in toolkit enables systems written with it run continuously for years without ever needing to stop.
We built the project following these instructions. We checked it at the maint-28 branch using PVS-Studio analyzer and the Visual Studio Code plugin. We faced no issues with building the project, so kudos to the developers.
The article includes only High- and Medium-level warnings.
That concludes our introduction. Now, let's move on to the warnings. We'll divide them into three categories:
Fragment N1
The PVS-Studio warning: V501 There are identical sub-expressions '(!esock_is_integer((env), (argv[3])))' to the left and to the right of the '||' operator. prim_socket_nif.c 6035
static ERL_NIF_TERM
nif_sendfile(ErlNifEnv* env,
int argc,
const ERL_NIF_TERM argv[])
{
....
BOOLEAN_T a2ok;
if ((! (a2ok = GET_INT64(env, argv[2], &offset64))) ||
(! GET_UINT64(env, argv[3], &count64u)))
{
if ((! IS_INTEGER(env, argv[3])) ||
(! IS_INTEGER(env, argv[3])))
return enif_make_badarg(env);
if (! a2ok)
return esock_make_error_integer_range(env, argv[2]);
else
return esock_make_error_integer_range(env, argv[3]);
}
....
}
In the code, the developers wanted to convert the third and fourth arguments to 64-bit types (the signed and unsigned ones, respectively). The function returns early if one of the arguments isn't a number. However, the same argument is being checked here.
The fixed code:
....
if ((! IS_INTEGER(env, argv[2])) ||
(! IS_INTEGER(env, argv[3])))
return enif_make_badarg(env);
....
Fragment N2
The PVS-Studio warning: V501 There are identical sub-expressions 'n->type == ycf_node_type_on_save_yield_state_code' to the left and to the right of the '||' operator. ycf_node.c 218
static void uniqify_local_vars_in_node(ycf_node* n)
{
if (n->type == ycf_node_type_code_scope)
{
uniqify_local_vars_in_scope(&n->u.code_scope);
}
else if(n->type == ycf_node_type_on_restore_yield_state_code)
{
uniqify_local_vars_in_scope(&n->u.special_code_block
.code.if_statement->u.code_scope);
}
else if (n->type == ycf_node_type_on_save_yield_state_code ||
n->type == ycf_node_type_on_save_yield_state_code ||
n->type == ycf_node_type_on_destroy_state_code ||
n->type == ycf_node_type_on_return_code ||
n->type == ycf_node_type_on_destroy_state_or_return_code)
{
uniqify_local_vars_in_scope(&n->u.special_code_block
.code.if_statement->u.code_scope);
}
....
}
Here's another interesting copy-paste case. We have the uniqify_local_vars_in_node function, which recursively handles tree nodes and finds visibility areas with local variables. It then calls the variable name unification function for each of these areas. However, in the last else if branch, the n->type == ycf_node_type_on_save_yield_state_code condition is duplicated after the || operator.
Unfortunately, I don't have a proper fix for this fragment. Judging by the comparison chain, the values of the following list are sorted in the order they're declared:
typedef enum
{
....
ycf_node_type_on_save_yield_state_code,
ycf_node_type_on_restore_yield_state_code,
ycf_node_type_on_destroy_state_code,
ycf_node_type_on_return_code,
ycf_node_type_on_destroy_state_or_return_code
} ycf_node_type;
However, the ycf_node_type_on_restore_yield_state_code constant is checked in the previous else if. So, I guess there's a duplicate check here after all, and we can delete it.
Fragment N3
The PVS-Studio warning: V1040 Possible typo in the spelling of a pre-defined macro name. The '__WIN_32__' macro is similar to '__WIN32__'. inet_drv.c 13506
static int tcp_send_error(tcp_descriptor* desc, int err)
{
/* EPIPE errors usually occur in one of three ways:
* 1. We write to a socket when we've already shutdown() the write side.
* On Windows the error returned for this is ESHUTDOWN
* rather than EPIPE.
* 2. The TCP peer sends us an RST through no fault of our own (perhaps
* by aborting the connection using SO_LINGER) and we then attempt
* to write to the socket. On Linux and Windows we would actually
* receive an ECONNRESET error for this, but on the BSDs, Darwin,
* Illumos and presumably Solaris, it's an EPIPE.
* 3. We cause the TCP peer to send us an RST by writing to a socket
* after we receive a FIN from them. Our first write will be
* successful, but if the they have closed the connection (rather
* than just shutting down the write side of it) this will cause their
* OS to send us an RST. Then, when we attempt to write to the socket
* a second time, we will get an EPIPE error. On Windows we get an
* ECONNABORTED.
*
* What we are going to do here is to treat all EPIPE messages that aren't
* of type 1 as ECONNRESET errors. This will allow users who have the
* show_econnreset socket option enabled to receive {error, econnreset} on
* both send and recv operations to indicate that an RST
* has been received.
*/
#ifdef __WIN_32__
if (err == ECONNABORTED)
err = ECONNRESET;
#endif
if (err == EPIPE && !(desc->tcp_add_flags & TCP_ADDF_SHUTDOWN_WR_DONE))
err = ECONNRESET;
return tcp_send_or_shutdown_error(desc, err);
}
This is a rather unusual case of checking whether a project compiles under Windows. Erlang has other checks on Windows ([1], [2], [3], etc.), but this one is unique. This looks very much like a simple typo (most likely, it should be __WIN32__).
But why is this the case? Out of curiosity, I searched for "__WIN_32__" to see what I could find. Here are the results:

As you can see, there are almost no results. It's funny that one of the links leads to the file we're looking at on Debian Sources. I grepped through the project to find no similar macro definitions. I also thought it might be built into some compiler, like MinGW. Unfortunately, the experiment failed. After trying to understand the macro's purpose, I believe this code is flawed.
As a fix, I suggest using the check that I found earlier in other parts of the project:
static int tcp_send_error(tcp_descriptor* desc, int err)
{
....
#if defined(__WIN32__) || defined(_WIN32) || defined(_WIN32_)
if (err == ECONNABORTED)
err = ECONNRESET;
#endif
....
}
Fragment N4
The PVS-Studio warning: V640 The code's operational logic does not correspond with its formatting. The second statement will always be executed. It is possible that curly brackets are missing. dialyzer.c 246
int main(int argc, char** argv)
{
....
if (argc > 2 && strcmp(argv[1], "+P") == 0)
{
PUSH2("+P", argv[2]);
argc--, argv++;
argc--, argv++;
} else PUSH2("+P", "1000000");
....
}
Here's an interesting example of a common coding error that can be difficult to spot later on. Take a look at the PUSH2 structure:
#define QUOTE(s) s
#define PUSH(s) eargv[eargc++] = QUOTE(s)
#define PUSH2(s, t) PUSH(s); PUSH(t)
Uh-huh, it's a macro that expands into two constructs. Let's expand the code as a preprocessor would:
int main(int argc, char** argv)
{
....
if (argc > 2 && strcmp(argv[1], "+P") == 0)
{
eargv[eargc++] = "+P"; eargv[eargc++] = argv[2];
argc--, argv++;
argc--, argv++;
} else eargv[eargc++] = "+P"; eargv[eargc++] = "1000000";
....
}
As a result, if the check evaluates to false, only the first assignment will be conditionally executed, while the second one will always occur after the branch. This is hardly what the developers wanted to achieve.
Writing and using macros attracts bugs. The perfect solution would be to get rid of them, but let's settle for a less radical fix. Developers usually use the do {...} while (0) construct when they need to combine several actions into one within a macro.
#define PUSH2(s, t) do { PUSH(s); PUSH(t); } while (0)
Fragment N5
The PVS-Studio warning: V654 The condition 'i >= 0' of loop is always true. wxe_return.cpp 236
ERL_NIF_TERM wxeReturn::make_array_objs(wxAuiPaneInfoArray& arr,
WxeApp *app,
const char *cname)
{
ERL_NIF_TERM head, tail;
ERL_NIF_TERM class_name = enif_make_atom(env, cname);
tail = enif_make_list(env, 0);
for (unsigned int i = arr.GetCount() - 1; i >= 0; i--)
{
head = make_ref(app->getRef((void *) &arr.Item(i),memenv), class_name);
tail = enif_make_list_cell(env, head, tail);
}
return tail;
}
Take a look at the for loop, paying special attention to its condition. Since the i induction variable is of the unsigned int type with a value range of [0; UINT_MAX], the loop becomes infinite. After the loop iterates with i == 0, the variable becomes equal to UINT_MAX as a result of the decrement, and the loop continues.
Let's fix the code as follows:
for (int64_t i = arr.GetCount() - 1; i >= 0; i--)
{
....
}
Similar warnings:
Fragment N6
The PVS-Studio warning: V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 2903, 3053. erl_bif_info.c 2903
BIF_RETTYPE system_info_1(BIF_ALIST_1)
{
....
if (is_tuple(BIF_ARG_1)) { // L2778
....
} else if (BIF_ARG_1 == am_scheduler_id) { // L2782
....
}
....
else if (BIF_ARG_1 == am_garbage_collection) { // L2903
....
} else if (BIF_ARG_1 == am_fullsweep_after) { // L2921
....
}
else if (BIF_ARG_1 == am_garbage_collection) { // L3053
....
} else if (BIF_ARG_1 == am_instruction_counts) { // L3056
....
}
....
else if (ERTS_IS_ATOM_STR("halt_flush_timeout", BIF_ARG_1)) { // L3552
....
}
}
The analyzer has detected several branches with identical checks in a function containing a huge number of if-else if statements—roughly 800 lines long. Each of them, however, has different logic: the first check and second check. Given the number of branches and the 150-line gap between the duplicates, it's hardly surprising that this could happen. Static analysis helps prevent such cases. Unfortunately, I can't think of a fix, so I'll leave that to the developers.
Fragment N7
The PVS-Studio warning: V560 A part of conditional expression is always false: (arity == 0). bif.c 3145
BIF_RETTYPE delete_element_2(BIF_ALIST_3)
{
....
ptr = tuple_val(BIF_ARG_2);
arity = arityval(*ptr);
ix = signed_val(BIF_ARG_1);
if ((ix < 1) || (ix > arity) || (arity == 0))
{
BIF_ERROR(BIF_P, BADARG);
}
....
}
Here, we have an example involving math. The analyzer informs us that the last arity == 0 sub-expression is always false. Why?
Let's dig into it. If the control flow reaches the point where this subexpression is evaluated, we will know that two previous subexpressions are false. We can also learn the following:
ix >= 1
ix <= arity
Let's rewrite these expressions mathematically:
1 <= ix <= arity
Next, we bring up a transitive relation rule, which states that arity >= 1. So, arity can never be 0 in this context.
I'll also leave fixing this code to the developers, since my only suggestion is to remove the condition.
Fragment N8
The PVS-Studio warnings:
V557 Array overrun is possible. The value of 'prefix_len' index could reach 262. erl_alloc_util.c 5375
V557 Array overrun is possible. The value of 'prefix_len' index could reach 262. erl_alloc_util.c 5378
V557 Array overrun is possible. The value of 'prefix_len' index could reach 262. erl_alloc_util.c 5381
#define MAX_ATOM_CHARACTERS 255
static void make_name_atoms(Allctr_t *allctr)
{
char alloc[] = "alloc";
char realloc[] = "realloc";
char free[] = "free";
char buf[MAX_ATOM_CHARACTERS];
size_t prefix_len = sys_strlen(allctr->name_prefix);
if (prefix_len > MAX_ATOM_CHARACTERS + sizeof(realloc) - 1)
erts_exit(ERTS_ERROR_EXIT,
"Too long allocator name: %salloc\n"
allctr->name_prefix);
sys_memcpy((void *) buf, (void *) allctr->name_prefix, prefix_len);
sys_memcpy((void *) &buf[prefix_len], (void *) alloc, sizeof(alloc) - 1);
allctr->name.alloc = am_atom_put(buf, prefix_len + sizeof(alloc) - 1);
sys_memcpy((void *) &buf[prefix_len], (void *) realloc, sizeof(realloc) - 1);
allctr->name.realloc = am_atom_put(buf, prefix_len + sizeof(realloc) - 1);
sys_memcpy((void *) &buf[prefix_len], (void *) free, sizeof(free) - 1);
allctr->name.free = am_atom_put(buf, prefix_len + sizeof(free) - 1);
}
We have a function that generates names for allocation and deallocation functions based on the prefix passed to it. A 255-byte buffer is used to generate them. Judging by the function contents, there will be no null terminator in the generated buffer, since am_atom_put passes the total size of the generated string.
The very first thing the developers did was calculate the prefix size. Then, they limited the prefix size: if it exceeds the sum of MAX_ATOM_CHARACTERS and the length of the realloc string (the longest postfix), the noreturn function is called.
Next, they copied the prefix to the buffer, added the postfix, and passed the string to the am_atom_put function. The devs performed all these actions sequentially for all postfixes: alloc, realloc, and free.
Can you spot the catch? The early return check was performed incorrectly. The buffer must have enough space to write the longest postfix, which is seven characters. If the prefix string is between 256 and 262 characters long, the early return doesn't happen, so the sys_memcpy function overflows the buffer. This operation has undefined behavior.
Assuming that the am_atom_put function works with strings no longer than MAX_ATOM_CHARACTERS, let's fix the code as follows:
if (prefix_len > MAX_ATOM_CHARACTERS - sizeof(realloc) + 1)
Now, the maximum prefix length is 248 characters. When writing the maximum postfix, overflow won't occur.
Similar warnings:
Fragment N9
I'd like to introduce this example in an unusual way. First, let's look at this function:
static const char* event_state_flag_to_str(EventStateFlags f,
const char *buff,
int len)
{
switch ((int)f)
{
case ERTS_EV_FLAG_CLEAR:
return "CLEAR";
case ERTS_EV_FLAG_USED:
return "USED";
/* other cases that return string literals */
default:
snprintf((char *)buff, len, "ERROR(%d)", f);
return buff;
}
}
The event_state_flag_to_str function converts the EventStateFlags enumeration to a string. For non-default branches, switch returns a string literal. Otherwise, the function creates a string in the provided buffer and returns a pointer to it. Note how exactly it does this: the buffer forcibly discards constness. According to the C standard, this is allowed only if the source buffer hasn't been declared as const (C11 6.7.3.6, C23 6.7.4.1.7).
Is this really the case? Let's take a look at the invocation point:
static ERTS_INLINE void
print_flags(erts_dsprintf_buf_t *dsbufp, EventStateFlags f)
{
const char buff[64];
if (f & ERTS_EV_FLAG_WANT_ERROR)
{
erts_dsprintf(dsbufp, "WANTERR|");
f &= ~ERTS_EV_FLAG_WANT_ERROR;
}
erts_dsprintf(dsbufp, "%s",
event_state_flag_to_str(f, buff, sizeof(buff)));
}
The function in question is called by passing the constant buffer as the second argument. Unfortunately, we get undefined behavior. We found this error using the analyzer, but it issued a confusing warning:
V614 Uninitialized buffer 'buff' used. Consider checking the second actual argument of the 'event_state_flag_to_str' function. erl_check_io.c 2833
Let's fix the code by removing the constness from the buffer:
static ERTS_INLINE void
print_flags(erts_dsprintf_buf_t *dsbufp, EventStateFlags f)
{
char buff[64];
if (f & ERTS_EV_FLAG_WANT_ERROR)
{
erts_dsprintf(dsbufp, "WANTERR|");
f &= ~ERTS_EV_FLAG_WANT_ERROR;
}
erts_dsprintf(dsbufp, "%s",
event_state_flag_to_str(f, buff, sizeof(buff)));
}
I also suggest changing the event_state_flag_to_str function signature, so the parameter can take a char* pointer. This makes it clear that the function can modify the buffer passed to it. We won't be breaking much code in the process since the function exists only in the compiled file and is marked as static.
Fragment N10
The PVS-Studio warning: V1050 The uninitialized class member 'a' is used when initializing the base class 'BeamAssemblerCommon'. beam_asm.hpp 87
class BeamAssemblerCommon : public ErrorHandler
{
BaseAssembler &assembler;
protected:
BeamAssemblerCommon(BaseAssembler &assembler);
....
};
struct BeamAssembler : public BeamAssemblerCommon
{
BeamAssembler() : BeamAssemblerCommon(a) { /* .... */ }
protected:
a64::Assembler a;
....
};
This fragment contains a small class hierarchy where a reference to a non-static member function of the derived class, BeamAssembler, is passed to the constructor of the base class, BeamAssemblerCommon. This operation won't cause any issues as long as the base class constructor doesn't interact with this member function. Why? The lifetime of an object from a derived class starts when the required initializer in the initialization list has been executed or, if there's no initializer, when the control flow enters the constructor body.
With this information in mind, let's take a look at the base class constructor. It doesn't use this object, right? Right?
BeamAssemblerCommon::BeamAssemblerCommon(BaseAssembler &assembler_)
: assembler(assembler_), ....
{
....
#ifdef DEBUG
assembler.addDiagnosticOptions(DiagnosticOptions::kValidateAssembler);
#endif
assembler.addEncodingOptions(EncodingOptions::kOptimizeForSize
| EncodingOptions::kOptimizedAlign);
....
}
Nope :) A non-static member function is called on an object of a derived class via a reference when its lifetime has not yet begun. Again, we get undefined behavior.
I also find it difficult to suggest a fix in this case, so I'll leave that to the developers.
Fragment N11
The PVS-Studio warning: V586 The 'free' function is called twice for deallocation of the same memory space. erl_call.c 668
int main(int argc, char *argv[])
{
int fd;
char *p = NULL;
ei_cnode ec;
....
int i = 0;
ei_x_buff reply;
....
if (ei_rpc(&ec, fd, "c", "c", p, i, &reply) < 0)
{
free(p);
ei_x_free(&reply);
fprintf(stderr,"erl_call: can't compile file %s\n", fname);
}
free(p);
/* FIXME complete this code
FIXME print out error message as term
if (!erl_match(erl_format("{ok,_}"), reply)) {
fprintf(stderr,"erl_call: compiler errors\n");
}
*/
ei_x_free(&reply);
....
}
Judging by the name, ei_rpc is a remote procedure call. A negative number is returned if the operation fails. If the remote call fails, some resources are cleaned up. The p pointer is passed to the free function and an error is logged to stderr. Next, regardless of the function call result, the pointer is passed to free again. Memory is released twice.
The fix is pretty simple: either stop the program execution or set the freed pointers to zero. To fix the issue, I would interrupt execution here:
if (ei_rpc(&ec, fd, "c", "c", p, i, &reply) < 0)
{
free(p);
ei_x_free(&reply);
fprintf(stderr,"erl_call: can't compile file %s\n", fname);
return 1;
}
free(p);
Fragment N12
The PVS-Studio warning: V595 The 'pkey' pointer was utilized before it was verified against nullptr. Check lines: 581, 582. pkey.c 581
static int get_pkey_public_key(ErlNifEnv *env,
const ERL_NIF_TERM argv[],
int algorithm_arg_num,
int key_arg_num,
EVP_PKEY **pkey,
ERL_NIF_TERM *err_return)
{
....
if (enif_is_map(env, argv[key_arg_num]))
{
password = get_key_password(env, argv[key_arg_num]);
*pkey = ENGINE_load_public_key(e, id, NULL, password);
if (!pkey)
assign_goto(*err_return, err,
EXCP_BADARG_N(env, key_arg_num,
"Couldn't get public key from engine"));
}
/* other branches */
ret = 1;
done:
....
return ret;
err:
....
*pkey = NULL;
ret = 0;
goto done;
}
We have quite an interesting code fragment here. The function attempts to extract the public key to the fifth out parameter. It returns 1 if successful and 0 if not. In the latter case, the public key is zeroed as well.
The analyzer warns that, in one of the function branches, the pkey pointer is dereferenced before being checked for validity. However, other branches don't seem to have such checks. We can conclude that the function has the following contract: "The pkey parameter must never be equal to a nullptr." We can confirm this by searching for calls to this function in the same file ([1], [2], [3])—the variable address is passed everywhere.
In fact, the developers intended to handle the incorrect result of the ENGINE_load_public_key function by jumping to the err mark. However, they made a typo by forgetting to put an asterisk before the ! operator. A modern compiler will optimize away this check on release. As a result, the function will return 1, and a null pointer will end up in the fifth parameter.
Let's fix the check:
....
*pkey = ENGINE_load_public_key(e, id, NULL, password);
if (!*pkey)
assign_goto(*err_return, err,
EXCP_BADARG_N(env,
key_arg_num,
"Couldn't get public key from engine"));
....
Fragment N13
We often receive messages saying that the analyzer warns about passing a null pointer to free. After all, the standard states that nothing terrible will happen. We even wrote an article about a similar case.
Yes, that's fine, but such code is probably suspicious. That helped us find an interesting memory leak in the code. Let's take a look at it.
The PVS-Studio warning: V575 The null pointer is passed into 'free' function. Inspect the first argument. erl_misc_utils.c 264
void erts_cpu_info_destroy(erts_cpu_info_t *cpuinfo)
{
if (cpuinfo)
{
cpuinfo->configured = 0;
cpuinfo->online = 0;
cpuinfo->available = 0;
#ifdef HAVE_PSET_INFO
if (cpuinfo->cpuids)
free(cpuinfo->cpuids);
#endif
cpuinfo->topology_size = 0;
if (cpuinfo->topology)
{
cpuinfo->topology = NULL;
free(cpuinfo->topology);
}
free(cpuinfo);
}
}
In this fragment, the cpuinfo->topology pointer is explicitly set to NULL before calling free. As a result, the memory-freeing operation is pointless, since free(NULL) does nothing. This results in memory leaks because the original memory block that cpuinfo->topology pointed to is never freed.
We can fix the issue by swapping the operations:
if (cpuinfo->topology)
{
free(cpuinfo->topology);
cpuinfo->topology = NULL;
}
Fragment N14
The PVS-Studio warning: V773 The function was exited without releasing the 'entry' pointer. A memory leak is possible. wxe_gl.cpp 88
typedef struct _wxe_glc
{
wxGLCanvas *canvas;
wxGLContext *context;
} wxe_glc;
void setActiveGL(wxeMemEnv *memenv,
ErlNifPid caller,
wxGLCanvas *canvas,
wxGLContext *context)
{
ErlNifUInt64 callId = wxe_make_hash(memenv->tmp_env, &caller);
wxe_glc * entry = glc[callId];
gl_active_index = callId;
gl_active_pid = caller;
if (!entry)
{
entry = (wxe_glc *) malloc(sizeof(wxe_glc));
entry->canvas = NULL;
entry->context = NULL;
}
if (entry->canvas == canvas && entry->context == context)
return;
entry->canvas = canvas;
entry->context = context;
glc[gl_active_index] = entry;
....
}
In the setActiveGL function, the entry pointer is declared and initialized via an array element. If this element is a null pointer, memory is allocated using malloc. Next, the pointers are compared. If it evaluates to true, an early return will occur. However, if dynamic allocation occurred, the memory won't be released.
Yes, such a situation is rather rare and unlikely to happen, because it requires a whole series of events:
glc[callId] contains a null pointer.However, we can ensure that a leak will never occur, not even theoretically. Let's modify the code a little:
ErlNifUInt64 callId = wxe_make_hash(memenv->tmp_env, &caller);
wxe_glc * entry = glc[callId];
gl_active_index = callId;
gl_active_pid = caller;
bool new_alloc = false;
if (!entry)
{
entry = (wxe_glc *) malloc(sizeof(wxe_glc));
entry->canvas = NULL;
entry->context = NULL;
new_alloc = true;
}
if (entry->canvas == canvas && entry->context == context)
{
if (new_alloc)
{
free(entry);
}
return;
}
....
Fragment N15
The PVS-Studio warning: V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'outbuf_base' is lost. Consider assigning realloc() to a temporary pointer. run_erl.c 1324
static void outbuf_append(const char* buf, int n)
{
....
/*
* Allocate a larger buffer if we still cannot fit the data.
*/
if (outbuf_base+outbuf_total < outbuf_in+n)
{
int size = outbuf_in - outbuf_out;
outbuf_total = size+n;
outbuf_base = realloc(outbuf_base, outbuf_total);
outbuf_out = outbuf_base;
outbuf_in = outbuf_base + size;
}
....
}
There's a tricky pattern with the realloc function, where the original pointer is immediately overwritten by the return value. Let's get to the bottom of this.
The function takes the buffer for reallocation as its first argument and the new buffer size as its second argument. If the operation is successful, the function will return a pointer to the newly allocated memory. The memory passed as the first argument becomes invalid, even if only a simple memory block expansion has occurred. If the operation fails, the function returns a null pointer. The memory passed as the first argument remains untouched.
So, if we immediately overwrite the original pointer with a call to realloc, we will get a memory leak.
Let's fix the code as follows:
static void outbuf_append(const char* buf, int n)
{
....
/*
* Allocate a larger buffer if we still cannot fit the data.
*/
if (outbuf_base + outbuf_total < outbuf_in + n)
{
int size = outbuf_in - outbuf_out;
outbuf_total = size+n;
char *tmp = realloc(outbuf_base, outbuf_total);
if (!tmp)
{
/* somehow handle this scenario
* the `outbuf_base` buffer here is still valid */
}
outbuf_base = tmp;
outbuf_out = outbuf_base;
outbuf_in = outbuf_base + size;
}
....
}
In the fixed example, we also need to handle the situation where realloc returns the null pointer. This may be an exception throwing, an attempt to allocate a buffer of a different size, and so on.
Fragment N16
The PVS-Studio warning: V769 The 'end' pointer in the 'end - mm->sua.top' expression equals nullptr. The resulting value is senseless and it should not be used. erl_mmap.c 2411
static void
add_free_desc_area(ErtsMemMapper* mm, char *start, char *end)
{
ERTS_MMAP_ASSERT(end == (void *) 0 || end > start);
if (sizeof(ErtsFreeSegDesc) <= ((UWord) end) - ((UWord) start))
{
....
}
....
}
void erts_mmap_init(ErtsMemMapper* mm, ErtsMMapInit *init)
{
....
if (end == (void *) 0)
{
/*
* Very unlikely, but we need a guarantee
* that `mm->sua.top` always will
* compare as larger than all segment pointers
* into the super carrier...
*/
mm->sua.top -= ERTS_PAGEALIGNED_SIZE;
mm->size.supercarrier.used.total += ERTS_PAGEALIGNED_SIZE;
#ifdef ERTS_HAVE_OS_PHYSICAL_MEMORY_RESERVATION
if (!virtual_map || os_reserve_physical(mm->sua.top, ERTS_PAGEALIGNED_SIZE))
#endif
add_free_desc_area(mm, mm->sua.top, end);
mm->desc.reserved += (end - mm->sua.top) / sizeof(ErtsFreeSegDesc);
}
....
}
The analyzer has detected suspicious address arithmetic. Let's see what's wrong here.
We enter the code branch where end is indeed a null pointer. The mm->sua.top data member is also a pointer, but we don't know its value yet. According to the C standard, the behavior of pointers that differ is defined only when they both point to elements of the same array (C11 6.5.6.8; C23 6.5.7.10). The null pointer doesn't point to any array element. So, the behavior of this operation is undefined, which isn't good.
I was wondering what happens in the add_free_desc_area function.
static void
add_free_desc_area(ErtsMemMapper* mm, char *start, char *end)
{
ERTS_MMAP_ASSERT(end == (void *) 0 || end > start);
if (sizeof(ErtsFreeSegDesc) <= ((UWord) end) - ((UWord) start)) {
....
}
Pointer subtraction is also used here, but with a difference: first, the pointers are converted to numbers. This is implementation-defined behavior (C11 6.3.2.3.6; C23 6.3.2.3.6), which is much better.
Unfortunately, I can't suggest a proper fix. The developers might want to take a look into this issue.
Well, that concludes the article. As expected with any large project that has had a long and eventful history, we found various errors. They're also part of the project, though. It's important to keep in mind that finding these kinds of issues in a code base that's over three decades old isn't a sign of weakness. Rather, it's proof of the code's incredible scale and longevity. There are millions of code lines written by hundreds of developers, and the system has been running for years without rebooting. This project deserves the deepest respect, not because it's perfect, but because it has stood the test of time and proven its real power.
Thanks to constantly evolving modern technologies, developers have powerful tools at their disposal to improve code reliability. For example, static analyzers help identify hidden errors and potential vulnerabilities. I invite you to see it for yourself :) You can get a trial version of PVS-Studio analyzer here.
0