Starting my collection of bugs found in copy functions

Apr 04 2020

Author: Andrey Karpov

The Baader-Meinhof phenomenon? I don't think so
Example 1
Example 2
Conclusion

I've already noticed a few times before that programmers seem to tend to make mistakes in simple copy functions. Writing a profound article on this topic is going to take quite a while since I'll have to do some thorough research and sample collecting, but for now I'd like to share a couple of examples I stumbled upon recently.

The Baader-Meinhof phenomenon? I don't think so

As a member of the PVS-Studio team, I come across numerous bugs found with our tool in various projects. And as a DevRel, I love telling people about that :). Today I'm going to talk about incorrectly implemented copy functions.

I saw such functions before, but I never wrote them down as I didn't think they were worth mentioning. But since I discovered the tendency, I can't but start collecting them. For a start, I'm going to show you two recently found specimens.

You may argue that two cases don't make a tendency yet; that I paid attention solely because they occurred too close in time and the Baader-Meinhof phenomenon kicked in.

The Baader-Meinhof phenomenon, also called Frequency Illusion, is a cognitive bias where a person stumbles upon a piece of information and soon afterwards encounters the same subject again, which makes them believe this subject appears exceptionally frequently.

I don't think that's the case. I already had a similar experience with poorly written comparison functions, and my observation was later proved by real examples: "The evil within the comparison functions".

Okay, let's get to the point. That introduction was a bit too long for a brief note on two examples :).

Example 1

In the article about the check of the Zephyr RTOS, I mentioned a failed attempt to make a function that should work like strdup:

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    ((u8_t *)mntpt)[strlen(mntpt)] = '\0';
    memcpy(cpy_mntpt, mntpt, strlen(mntpt));
  }
  return cpy_mntpt;
}

PVS-Studio diagnostic message: V575 [CWE-628] The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. shell.c 427

The analyzer says the memcpy function copies the string but fails to copy the terminating null character, which is a very strange behavior. You may think the copying of the terminating null takes place in the following line:

((u8_t *)mntpt)[strlen(mntpt)] = '\0';

But that's wrong – this is a typo that causes the terminating null to get copied into itself. Notice that the target array is mntpt, not cpy_mntpt. As a result, the mntpt_prepare function returns a non-terminated string.

This is what the code was actually meant to look like:

((u8_t *)cpy_mntpt)[strlen(mntpt)] = '\0';

I don't see any reason, though, for implementing this function in such a complicated and unconventional way. Because of this overcomplicating, what should have been a small and simple function ended up with a critical bug in it. This code can be reduced to the following:

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    strcpy(cpy_mntpt, mntpt);
  }
  return cpy_mntpt;
}

Example 2

void myMemCpy(void *dest, void *src, size_t n) 
{ 
   char *csrc = (char *)src; 
   char *cdest = (char *)dest; 
   for (int i=0; i<n; i++) 
     cdest[i] = csrc[i]; 
}

We didn't catch this one; I came across it on Stack Overflow: C and static Code analysis: Is this safer than memcpy?

Well, if you check this function with PVS-Studio, it will expectedly issue the following warnings:

V104 Implicit conversion of 'i' to memsize type in an arithmetic expression: i < n test.cpp 26
V108 Incorrect index type: cdest[not a memsize-type]. Use memsize type instead. test.cpp 27
V108 Incorrect index type: csrc[not a memsize-type]. Use memsize type instead. test.cpp 27

Indeed, this code has a flaw, and it was pointed out in the replies on Stack Overflow. You can't use a variable of type int as an index. In a 64-bit program, an int variable would most certainly (we don't talk about exotic architectures now) be 32 bits long and the function would be able to copy only as much as INT_MAX bytes, i.e. not more than 2 GB.

With a larger buffer to copy, a signed integer overflow will occur, which is undefined behavior in C and C++. By the way, don't try to guess how exactly the bug would manifest itself. Surprisingly, it's quite a complicated topic, which is elaborated on in the article "Undefined behavior is closer than you think".

The funniest thing is that the code shown above was written in an attempt to eliminate some warning of the Checkmarx analyzer triggered by a call of the memcpy function. The wisest thing the programmer could come up with was to reinvent the wheel. But the resulting copy function – however simple – ended up flawed. The programmer probably made the things even worse than they had already been. Rather than try and find the cause of the warning, he or she chose to conceal the issue by writing their own function (thus confusing the analyzer). Besides, they made a mistake of using an int variable as a counter. And yes, code like that may not be optimizable. Using a custom function instead of the existing efficient and optimized function memcpy is not an efficient decision. Don't do that :)

Conclusion

Well, it's only the start of the journey, and it might well take me a few years before I collect enough examples to write a profound article on this topic. Actually, it's only now that I'm starting to keep an eye for such cases. Thanks for reading, and be sure to try PVS-Studio on your C/C++/C#/Java code – you may find something interesting.

#Cpp