>
>
>
Errors that static code analysis does n…

George Gribkov
Articles: 20

Errors that static code analysis does not find because it is not used

Readers of our articles occasionally note that the PVS-Studio static code analyzer detects a large number of errors that are insignificant and don't affect the application. It is really so. For the most part, important bugs have already been fixed due to manual testing, user feedback, and other expensive methods. At the same time, many of these errors could have been found at the code writing stage and corrected with minimal loss of time, reputation and money. This article will provide several examples of real errors, which could have been immediately fixed, if project authors had used static code analysis.

The idea is very simple. We'll search for examples of pull requests on GitHub that specify that an issue is a bugfix. Then we'll try to find these bugs using the PVS-Studio static code analyzer. If an error could be found by the analyzer, then it is a bug which could have been found at the code writing stage. The earlier the bug is corrected, the cheaper it costs.

Unfortunately, GitHub let us down and we didn't manage to make a big posh article on the subject. GitHub itself has a glitch (or a feature) that doesn't allow you to search for comments of pull requests in projects written only in certain programming languages. Or I don't know how to cook it. Despite that I specify to search for comments in C, C++, C# projects, the results are given for all languages, including PHP, Python, JavaScript, and others. As a result, looking for suitable cases has proved to be extremely tedious, and I'll go for just a few examples. However, they are enough to demonstrate the usefulness of static code analysis tools when used regularly.

What if the bug had been caught at the earliest stage? The answer is simple: programmers wouldn't have to wait for it to show itself, then search and correct the defective code.

Let's look at the errors that PVS-Studio could have immediately detected:

The first example is taken from the SatisfactoryModLoader project. Before fixing the error, the code looked as follows:

// gets an API function from the mod handler
SML_API PVOID getAPIFunction(std::string name) {
  bool found = false;
  for (Registry reg : modHandler.APIRegistry) {
    if (reg.name == name) {
      found = true;
    }
  }
  if (!found) {
    std::string msg = ...;
    MessageBoxA(NULL, 
                msg.c_str(), 
                "SatisfactoryModLoader Fatal Error", 
                MB_ICONERROR);
    abort();
  }
}

This code contained an error, that PVS-Studio would immediately issue a warning to:

V591 Non-void function should return a value. ModFunctions.cpp 44

The above function has no return statement, so it will return a formally undefined value. The programmer didn't use the code analyzer, so he had to look for the bug on his own. The function after editing:

// gets an API function from the mod handler
SML_API PVOID getAPIFunction(std::string name) {
  bool found = false; 
  PVOID func = NULL;
  for (Registry reg : modHandler.APIRegistry) {
    if (reg.name == name) {
      func = reg.func;
      found = true;
    }
  }
  if (!found) {
    std::string msg = ...;
    MessageBoxA(NULL, 
                msg.c_str(), 
                "SatisfactoryModLoader Fatal Error", 
                MB_ICONERROR);
    abort();
  }
  return func;
}

Curiously, in the commit, the author marked the bug as critical: "fixed critical bug where API functions were not returned".

In the second commit from the mc6809 project history, edits were introduced in the following code:

void mc6809dis_direct(
  mc6809dis__t *const dis,
  mc6809__t    *const cpu,
  const char   *const op,
  const bool          b16
)
{
  assert(dis != NULL);
  assert(op != NULL);

  addr.b[MSB] = cpu->dp;
  addr.b[LSB] = (*dis->read)(dis, dis->next++);

  ...

  if (cpu != NULL)
  {
    ...
  }
}

The author corrected only one line. He replaced the expression

addr.b[MSB] = cpu->dp;

for the following one

addr.b[MSB] = cpu != NULL ? cpu->dp : 0;

In the old code version there was not any check for a null pointer. If it happens so that a null pointer is passed to the mc6809dis_direct function as the second argument, its dereference will occur in the body of the function. The result is deplorable and unpredictable.

Null pointer dereference is one of the most common patterns we are told about: "It's not a critical bug. Who cares that it is thriving in code? If dereference occurs, the program will quietly crash and that's it." It's strange and sad to hear this from C++ programmers, but life happens.

Anyway, in this project such dereference has turned into a bug, as the commit's subject tells us: "Bug fix---NULL dereference".

If the project developer had used PVS-Studio, he could have checked and found the warning two and a half months ago. This is when the bug was introduced. Here is the warning:

V595 The 'cpu' pointer was utilized before it was verified against nullptr. Check lines: 1814, 1821. mc6809dis.c 1814

Thus, the bug would have been fixed at the time of its appearance, which would have saved the developer's time and nerves :).

An example of another interesting fix was found in the libmorton project.

Code to be fixed:

template<typename morton>
inline bool findFirstSetBitZeroIdx(const morton x, 
                                   unsigned long* firstbit_location)
{
#if _MSC_VER && !_WIN64
  // 32 BIT on 32 BIT
  if (sizeof(morton) <= 4) {
    return _BitScanReverse(firstbit_location, x) != 0;
  }
  // 64 BIT on 32 BIT
  else {
    *firstbit_location = 0;
    if (_BitScanReverse(firstbit_location, (x >> 32))) { // check first part
      firstbit_location += 32;
      return true;
    }
    return _BitScanReverse(firstbit_location, (x & 0xFFFFFFFF)) != 0;
  }
#elif  _MSC_VER && _WIN64
  ....
#elif __GNUC__
  ....
#endif
}

In his edit, a programmer replaces the expression "firstbit_location += 32" with "*firstbit_location += 32". The programmer expected that 32 will be added to the value of the variable referred to by the firstbit_location pointer, but 32 was added to the pointer itself. The changed value of the pointer wasn't used anywhere any more and the expected variable value remained unchanged.

PVS-Studio would issue a warning to this code:

V1001 The 'firstbit_location' variable is assigned but is not used by the end of the function. morton_common.h 22

Well, what is so bad about the modified but further unused expression? The V1001 diagnostic doesn't look like it's meant for detecting particularly dangerous bugs. Despite this, it found an important error that influenced the program logic.

Moreover, it turned out that that error wasn't so easy to find! Not only has it been in the program since the file was created, but it has also experienced many edits in neighboring lines and existed in the project for as many as 3 (!) years! All this time the logic of the program was broken, and it didn't work in the way developers expected. If they had used PVS-Studio, the bug would have been detected much earlier.

In the end, let's look at another nice example. While I was collecting bug fixes on GitHub, I came across a fix with the following content several times. The fixed error was here:

int kvm_arch_prepare_memory_region(...)
{
  ...
  do {
    struct vm_area_struct *vma = find_vma(current->mm, hva);
    hva_t vm_start, vm_end;
    ...
    if (vma->vm_flags & VM_PFNMAP) {
      ...
      phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) +
        vm_start - vma->vm_start;
      ...
    }
    ...
  } while (hva < reg_end);
  ...
}

PVS-Studio issued a warning for this code snippet:

V629 Consider inspecting the 'vma->vm_pgoff << 12' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. mmu.c 1795

I checked out declarations of variables, used in the expression "phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + vm_start - vma->vm_start;" and found out that the code given above is equal to the following synthetic example:

void foo(unsigned long a, unsigned long b)
{
  unsigned long long x = (a << 12) + b;
}

If the value of the a 32-bit variable is greater than 0xFFFFF, 12 highest bits will have at least one nonnull value. After shifting this variable left, these significant bits will be lost, resulting in incorrect information written in x.

To eliminate loss of high bits, we need first to cast a to the unsigned long long type and only after this shift the variable:

pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
pa += vm_start - vma->vm_start;

This way, a correct value will always be written in pa.

That'd be okay but this bug, the same as the first example from the article, also turned out to be critical. It's author wrote about it in the comment. Moreover, this error found its way to an enormous number of projects. To fully appreciate the scale of the tragedy, I suggest looking at the number of results when searching for this bugfix on GitHub. Scary, isn't it?

So I've taken a new approach to demonstrate the benefits of a regular static code analyzer usage. I hope you enjoyed it. Download and try the PVS-Studio static code analyzer to check your own projects. At the time of writing, it has about 700 implemented diagnostic rules to detect a variety of error patterns. Supports C, C++, C# and Java.