Static analysis of source code by the example of WinMerge

Oct 30 2010

Author: Andrey Karpov

The today's post is devoted to the question why tools of static source code analysis are helpful regardless of programmer's knowledge and skill. I will demonstrate the benefit of static analysis by the example of the tool known to every programmer - WinMerge.

The earlier the developer finds an error in application code, the cheaper it is to fix it. From this we conclude that it's cheapest and easiest to eliminate an error while writing the code. The best way is certainly just to write without errors at all: imagine that you are only going to make an error but you slap your hand with the other and go on writing correct code. Still we don't manage to do that, do we? So, the approach "you should write without errors" doesn't work anyway.

Even a highly skilled programmer who takes his time makes errors from common misprints to errors in algorithms. It is the law of large numbers that works in this case. Does it seem to you that one can't make a mistake in every particular "if" operator? But I carried out such an experiment and wrote 200 comparisons - I did make an error once. Andrey Urazov discussed this thing in his interesting lecture "Quality-oriented programming" at the CodeFest 2010 conference. I would like to cite his thought that however skilled developers are, errors will appear in code all the same. You just can't stop making them. But you may successfully fight many of them at much earlier stages of development process than usually.

Usually the first level of error defense is creating unit-tests for the newly written code. Sometimes tests are written earlier than the code they are intended to check. However, unit-tests have some disadvantages that I'm not going to discuss in detail here because all the programmers are aware of them. It is not always easy to create a unit-test for a function that requires a complicated procedure of preliminarily preparing the data. Unit-tests become a burden if the project requirements change rapidly; tests consume a lot of time to write and support; it is not always easy to cover all the program branches with tests, etc. Moreover, you may get a solid project "as a present" that merely has no unit-tests and they were not intended at all. Without denying the great benefit of unit-tests, I still think that although it is a good defense level, we can and must improve it greatly.

Programmers usually neglect an even earlier defense level - static code analysis. Many developers utilize capabilities of static code analysis without leaving the scope of diagnostic warnings generated by compilers. However, there is a wide range of tools that allow you to detect a significant part of logical errors and common misprints at the coding stage already. These tools perform a higher-level code check relying on knowledge of some coding patterns, use heuristic algorithms and provide for a flexible settings system.

Of course, static analysis has its own disadvantages: it just cannot detect many types of errors; analyzers produce false alarms and make you modify code so that they like it and consider safe.

But there are huge advantages as well. Static analysis covers all the program branches regardless of how often they are used. It doesn't depend on execution stages. You may check even incomplete code or you may check a large amount of code you inherited from some developer. Static analysis is quick and well scalable unlike dynamic analysis tools.

So you have read a lot of words about static analysis of source code. Now it's time for practice. I want to take one application in C++ and try to find errors in it.

I wanted to choose something small and widely known. Since I don't use too many tools, I just looked through the "Programs" list in the "Start" menu and decided to take WinMerge. The WinMerge application is open-source and it is small (about 186000 lines). Its quality is rather high. I'm saying this relying on my experience - I have no complaints about it and I like that comments occupy 25% of its source code (it is a good sign). So, it is a good choice.

I downloaded the latest available version 2.13.20 (from 20.10.2010). I used the prototype of a general-purpose analyzer we are developing now. Let me tell you a bit more about it.

Currently, the PVS-Studio static analyzer includes two rule sets. One of them is intended to detect 64-bit defects and the other is intended to check OpenMP programs. Now we are developing a general-purpose set of rules. We haven't got even a beta-version yet but some code already works and I'm very eager to have a real war against errors. We intend to make the new rule set free, so please don't write that we are indulging in self-advertisement. The new tool will be presented to the community in 1-2 months as a part of PVS-Studio 4.00.

So, here are some interesting issues I detected in WinMerge-2.13.20's code during a half an hour (15 minutes for analysis, 15 minutes to review the results). There are also some other suspicious fragments but it demands some efforts to make it out if they are really errors or not. My current task is not to find as many defects in one project as possible; I just want to make a nice demonstration of benefits static analysis provides and show how to quickly detect some errors through even superficial examination.

The first sample. The analyzer pointed to several errors "V530 - The return value of function 'Foo' is required to be utilized". These warnings are usually generated for inappropriately used functions. Study this code fragment:

/**
* @brief Get the file names on both sides for specified item.
* @note Return empty strings if item is special item.
*/
void CDirView::GetItemFileNames(int sel,
  String& strLeft, String& strRight) const
{
  UINT_PTR diffpos = GetItemKey(sel);
  if (diffpos == (UINT_PTR)SPECIAL_ITEM_POS)
  {
    strLeft.empty();
    strRight.empty();
  }
  else
  {
     ...
  }
}

The function must return two empty strings in a particular case. But due to programmer's inattention, it it's the std::string::empty() functions which are called instead of std::string::clear(). By the way, this error is not so rare as it may seem - I encountered it in many other projects. This error is also present in another WinMerge's function:

/**
* @brief Clear variant's value (reset to defaults).
*/
void VariantValue::Clear()
{
  m_vtype = VT_NULL;
  m_bvalue = false;
  m_ivalue = 0;
  m_fvalue = 0;
  m_svalue.empty();
  m_tvalue = 0;
}

Again we don't get the expected clearing of the string.

And here we have the warning "V501 - There are identical sub-expressions to the left and to the right of the '||' operator":

BUFFERTYPE m_nBufferType[2];
...
// Handle unnamed buffers
if ((m_nBufferType[nBuffer] == BUFFER_UNNAMED) ||
    (m_nBufferType[nBuffer] == BUFFER_UNNAMED))
  nSaveErrorCode = SAVE_NO_FILENAME;

If we review the code nearby, we conclude by analogy that we must have the following lines in our fragment:

(m_nBufferType[0] == BUFFER_UNNAMED) ||
(m_nBufferType[1] == BUFFER_UNNAMED)

If it is not so, still there is some error here.

When various crashes occur, WinMerge tries to report about errors but fails in most cases. By the way, it is a good example of how a code analyzer can detect errors in rarely used program fragments. There are several errors in the code PVS-Studio reports about with the following warning: "V510 - The 'Format' function is not expected to receive class-type variable as 'N' actual argument". Study this code sample:

String GetSysError(int nerr);
...
CString msg;
msg.Format(
_T("Failed to open registry key HKCU/%s:\n\t%d : %s"),
f_RegDir, retVal, GetSysError(retVal));

Everything seems good at first. But the "String" type is actually "std::wstring" and therefore we will have some rubbish printed at best, or an access violation error at worst. It is an object of the "std::wstring" type which is put into the stack instead of a string-pointer. Read the post "Big Brother helps you" where I described this error in detail. The correct code must have a call with c_str():

msg.Format(
_T("Failed to open registry key HKCU/%s:\n\t%d : %s"),
f_RegDir, retVal, GetSysError(retVal).c_str());

Let's go further. Here we have a suspicious code fragment. I don't know if there is really an error, but it is strange that two branches of the "if" operator contain absolutely the same code. The analyzer warns about it with the diagnostic message "V532 - The 'then' statement is equivalent to the 'else' statement". Here is this suspicious code:

if (max < INT_MAX)
{
  for (i = min; i < max; i++)
  {
    if (eptr >= md->end_subject ||
        IS_NEWLINE(eptr))
      break;
    eptr++;
    while (eptr < md->end_subject &&
           (*eptr & 0xc0) == 0x80)
      eptr++;
    }
  }
else
{
  for (i = min; i < max; i++)
  {
    if (eptr >= md->end_subject ||
        IS_NEWLINE(eptr))
      break;
    eptr++;
    while (eptr < md->end_subject &&
           (*eptr & 0xc0) == 0x80)
      eptr++;
    }
  }
}

I feel that "this humming is no accident".

OK, let's study one more sample and get finished with the post. The analyzer found a suspicious loop: "V534 - It is likely that a wrong variable is being compared inside the 'for' operator. Consider reviewing 'i'." This is the source code:

// Get length of translated array of bytes from text.
int Text2BinTranslator::iLengthOfTransToBin(
  char* src, int srclen )
{
  ...
    for (k=i; i<srclen; k++)
    {
      if (src[k]=='>')
        break;
    }
  ...
}

This code is inclined to Access Violation. The loop must continue until the '>' character is found or the string with the length of 'srclen' characters ends. But the programmer used by accident the 'i' variable instead of 'k' for comparison. If the '>' character is not found, the consequences are likely to be bad.

Summary

Don't forget about static analysis. It may often help you find some peculiar issues even in good code. I also invite you to visit our site some time later to try our free general-purpose analyzer when it is ready.

#Cpp