A beautiful error in the implementation of the string concatenation function

Jul 22 2021

Author: Andrey Karpov

We, the developers of the PVS-Studio static code analyzer, have a peculiar view of beauty: the beauty of bugs. We like to find grace in errors, examine them, and try to guess how they appeared. Today we have an interesting case in which the concepts of length and size got mixed up in the code.

LFortran project error

When we heard about the new CppCast episode about LFortran, we decided to check LFortran itself. It is a small project, so we are not sure there will be enough material for a classic article about open-source project analysis. However, one small error immediately caught our attention, so we decided to write a short note about it. To our taste, this is a lovely error.

The LFortran project has a function that concatenates two strings into a new buffer:

void _lfortran_strcat(char** s1, char** s2, char** dest)
{
    int cntr = 0;
    char trmn = '\0';
    int s1_len = strlen(*s1);
    int s2_len = strlen(*s2);
    int trmn_size = strlen(&trmn);
    char* dest_char = (char*)malloc(s1_len+s2_len+trmn_size);
    for (int i = 0; i < s1_len; i++) {
        dest_char[cntr] = (*s1)[i];
        cntr++;
    }
    for (int i = 0; i < s2_len; i++) {
        dest_char[cntr] = (*s2)[i];
        cntr++;
    }
    dest_char[cntr] = trmn;
    *dest = &(dest_char[0]);
}

Before we analyze this code, you can try to find the error yourself. I will insert a long picture so you don't accidentally read the explanation. You have probably seen the "longcat" meme. We will have a "longunicorn" :)

The function is supposed to work as follows. It calculates a buffer size that can accommodate both strings and the terminal null, allocates the buffer, copies the strings into it, and appends the terminal null. However, the allocated buffer is too small: its size is 1 byte less than required. As a result, the terminal null is written outside the allocated buffer, which is undefined behavior.
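To make the off-by-one concrete, here is a minimal sketch that reproduces only the size arithmetic of the original function for two sample strings, without allocating or copying anything. The struct strcat_sizes and the function original_sizes are hypothetical names introduced for illustration; they are not part of LFortran, and the sketch uses size_t where the original uses int.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper types/functions, not part of LFortran. */
typedef struct {
    size_t allocated;     /* bytes the original code passes to malloc       */
    size_t required;      /* bytes actually needed: both strings plus '\0'  */
    size_t terminator_at; /* index where the original code writes trmn      */
} strcat_sizes;

/* Mirrors the size computation of the original _lfortran_strcat. */
static strcat_sizes original_sizes(const char *s1, const char *s2)
{
    char trmn = '\0';
    strcat_sizes r;
    size_t s1_len = strlen(s1);
    size_t s2_len = strlen(s2);

    r.allocated = s1_len + s2_len + strlen(&trmn); /* strlen(&trmn) is 0 */
    r.required = s1_len + s2_len + 1;              /* one byte for '\0'   */
    r.terminator_at = s1_len + s2_len;             /* cntr after both loops */
    return r;
}
```

For "Hello" and "World", the original code requests 10 bytes while 11 are needed, and the terminal null goes to index 10, one past the last valid index 9.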

The developer who wrote the code got carried away with the strlen function and even used it to determine the size of the terminal null. The size of an object (the terminal null character) got mixed up with the length of an empty string. This code is strange and incorrect, but for us it is a beautiful and unusual mistake.

Explanation:

char trmn = '\0';
int trmn_size = strlen(&trmn);

Here, the address of the trmn character is interpreted as a pointer to an empty string, whose length is zero. Accordingly, the trmn_size variable, whose name suggests the size of the terminal null, is always equal to 0.
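The difference between the two expressions can be checked directly. In the minimal sketch below, terminator_strlen and terminator_sizeof are hypothetical helper names; each one evaluates one of the two expressions on the same trmn variable.

```c
#include <string.h>

/* strlen(&trmn): &trmn is treated as the start of a string whose first
   byte is '\0', so strlen stops immediately and returns 0. */
static size_t terminator_strlen(void)
{
    char trmn = '\0';
    return strlen(&trmn);
}

/* sizeof(trmn): the number of bytes the char object occupies.
   sizeof(char) is 1 by definition in C. */
static size_t terminator_sizeof(void)
{
    char trmn = '\0';
    return sizeof(trmn);
}
```

Note that strlen(&trmn) is well defined here only because trmn holds '\0'. If the variable held any other character, strlen would read past the single char object, which is undefined behavior.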

Instead of counting the length of an empty string, the code should calculate how many bytes the terminal character occupies, using the sizeof operator. The corrected code:

void _lfortran_strcat(char** s1, char** s2, char** dest)
{
    int cntr = 0;
    char trmn = '\0';
    int s1_len = strlen(*s1);
    int s2_len = strlen(*s2);

    int trmn_size = sizeof(trmn);  // <=

    char* dest_char = (char*)malloc(s1_len+s2_len+trmn_size);
    for (int i = 0; i < s1_len; i++) {
        dest_char[cntr] = (*s1)[i];
        cntr++;
    }
    for (int i = 0; i < s2_len; i++) {
        dest_char[cntr] = (*s2)[i];
        cntr++;
    }
    dest_char[cntr] = trmn;
    *dest = &(dest_char[0]);
}
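A quick way to check the fix is to call the corrected function and compare its result with the expected string. The sketch below copies the corrected function verbatim and adds a hypothetical test helper, strcat_matches, which is not part of LFortran. It assumes the caller is responsible for freeing the buffer returned through dest, since the function allocates it with malloc.

```c
#include <stdlib.h>
#include <string.h>

/* The corrected function from the text, copied so the block compiles. */
void _lfortran_strcat(char** s1, char** s2, char** dest)
{
    int cntr = 0;
    char trmn = '\0';
    int s1_len = strlen(*s1);
    int s2_len = strlen(*s2);
    int trmn_size = sizeof(trmn);
    char* dest_char = (char*)malloc(s1_len+s2_len+trmn_size);
    for (int i = 0; i < s1_len; i++) {
        dest_char[cntr] = (*s1)[i];
        cntr++;
    }
    for (int i = 0; i < s2_len; i++) {
        dest_char[cntr] = (*s2)[i];
        cntr++;
    }
    dest_char[cntr] = trmn;
    *dest = &(dest_char[0]);
}

/* Hypothetical test helper: concatenates a and b and reports whether
   the result equals expected. The result buffer is freed afterwards. */
static int strcat_matches(char *a, char *b, const char *expected)
{
    char *result = NULL;
    _lfortran_strcat(&a, &b, &result);
    int ok = strcmp(result, expected) == 0;
    free(result);
    return ok;
}
```

For example, strcat_matches("Hello, ", "world", "Hello, world") allocates 7 + 5 + 1 = 13 bytes, so the terminal null now fits.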

Error detection

We found the error with the PVS-Studio static code analyzer. Unfortunately, the tool could not detect it as an array index out of bounds, which is rather difficult to do: the data flow analysis could not relate the size of the dest_char buffer to the value of the cntr variable, which is incremented in the loops. Instead, the error was detected indirectly.

PVS-Studio issued a warning: V742 [CWE-170, CERT-EXP37-C] Function receives an address of a 'char' type variable instead of pointer to a buffer. Inspect the first argument. lfortran_intrinsics.c 550

Passing a pointer to a single character to the strlen function is a weird way to calculate a string length. Indeed, when we examined this anomaly, we found a serious bug. Static analysis is cool!

Let's continue to improve the code

We have fixed the error. However, the code has other drawbacks that the analyzer has pointed out, so it would be useful to refactor it further.

First, the analyzer complains that the pointer returned by the malloc function is not checked for NULL. This is important. Warning: V522 [CWE-690, CERT-MEM52-CPP] There might be dereferencing of a potential null pointer 'dest_char'. Check lines: 553, 551. lfortran_intrinsics.c 553
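The point of V522 is that dest_char must not be dereferenced if malloc fails. How to react is a project decision. One possible policy, sketched here under the assumption that terminating the program is acceptable, is a small wrapper around malloc; the name checked_malloc is hypothetical and not part of LFortran.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates size bytes and, if malloc reports
   failure, prints a message and terminates the program instead of
   letting the caller dereference a null pointer. The C standard allows
   malloc(0) to return NULL, so that case is passed through unchanged. */
static void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        fprintf(stderr, "checked_malloc: failed to allocate %zu bytes\n", size);
        abort();
    }
    return p;
}
```

In the concatenation function this would replace the bare malloc call, for example checked_malloc(s1_len + s2_len + 1). Code that must recover from out-of-memory conditions would instead report an error to its caller.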

Second, the analyzer issues several warnings about 64-bit errors. The code isn't prepared for strings longer than INT_MAX characters, because strlen returns a size_t value and the code stores it in int variables. Such strings are clearly exotic, but writing code this way is still ugly and potentially dangerous. It is better to use the size_t type instead of int.
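Switching to size_t removes the truncation to int, but the buffer size s1_len + s2_len + 1 can still, in principle, wrap around in unsigned arithmetic. For lengths obtained from strlen on strings that already exist in memory, this is not a realistic concern on common platforms, so the following is a defensive sketch rather than a required fix. The name concat_buffer_size is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes s1_len + s2_len + 1 (room for the
   terminal null) and returns false instead of silently wrapping around
   when the result does not fit in size_t. */
static bool concat_buffer_size(size_t s1_len, size_t s2_len, size_t *out)
{
    /* s1_len + s2_len + 1 <= SIZE_MAX  <=>  s1_len <= SIZE_MAX - 1 - s2_len,
       where the right-hand side is only valid if s2_len <= SIZE_MAX - 1. */
    if (s2_len > SIZE_MAX - 1 || s1_len > SIZE_MAX - 1 - s2_len)
        return false;
    *out = s1_len + s2_len + 1;
    return true;
}
```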

The improved version of the function:

void _lfortran_strcat(const char** s1, const char** s2, char** dest)
{
    if (s1 == NULL || *s1 == NULL ||
        s2 == NULL || *s2 == NULL || dest == NULL)
    {
      // Some kind of error handling appropriate in the given project.
      ....
    }
    size_t cntr = 0;
    const char trmn = '\0';
    const size_t s1_len = strlen(*s1);
    const size_t s2_len = strlen(*s2);
    char* dest_char = (char*)malloc((s1_len+s2_len+1)*sizeof(char));
    if (dest_char == NULL)
    {
      // Some kind of error handling appropriate in the given project.
      ....
    }

    for (size_t i = 0; i < s1_len; i++) {
        dest_char[cntr] = (*s1)[i];
        cntr++;
    }
    for (size_t i = 0; i < s2_len; i++) {
        dest_char[cntr] = (*s2)[i];
        cntr++;
    }
    dest_char[cntr] = trmn;
    *dest = dest_char;
}
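The copying loops can also be replaced with memcpy. The sketch below is a hypothetical variant, not the LFortran code: the name concat_alloc is invented, and it returns the new buffer (or NULL on error) instead of writing through char** parameters, so it does not preserve the interface the LFortran runtime uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: returns a newly allocated concatenation of s1
   and s2, or NULL if an argument is NULL or the allocation fails.
   The caller must free() the result. */
static char *concat_alloc(const char *s1, const char *s2)
{
    if (s1 == NULL || s2 == NULL)
        return NULL;

    const size_t s1_len = strlen(s1);
    const size_t s2_len = strlen(s2);

    /* One extra byte for the terminal null. */
    char *dest = malloc(s1_len + s2_len + 1);
    if (dest == NULL)
        return NULL;

    memcpy(dest, s1, s1_len);
    memcpy(dest + s1_len, s2, s2_len);
    dest[s1_len + s2_len] = '\0';
    return dest;
}
```

Returning NULL is only one possible error convention; a project that needs the original signature would keep the double pointers and report errors the way the rest of its runtime does.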

The improved version isn't perfect either, but it is clearly better than the original. Thank you for your attention. Come and try PVS-Studio on your own projects.