>
>
>
Documenting Bugs in Doxygen

Igor Shtukarev
Articles: 3

Documenting Bugs in Doxygen

In this article, we will speak about the static analysis of the doxygen documentation generator tool. This popular and widely used project, which, as its authors claim, not without reason, has become "the de facto standard tool for generating documentation from annotated C++ sources", has never been scanned by PVS-Studio before. Doxygen scans the program source code and generates the documentation relying on it. Now it's time for us to peep into its source files and see if PVS-Studio can find any interesting bugs there.

Introduction

Doxygen is a crossplatform documentation generator tool for writing software reference documentation, supporting multiple programming languages: C++, C, Objective-C, Python, Java, C#, PHP, IDL, Fortran, VHDL, and to some extent D. Doxygen extracts documentation directly from annotated sources and can be also configured to extract the code structure from undocumented source files. The tool supports the HTML, LATEX, man, rtf, and xml formats as its output. Doxygen is used in the projects KDE, Mozilla, Drupal, Pidgin, AbiWorld, FOX toolkit, Torque Game Engine, and Crystal Space.

Preparing for and running the analysis

The latest doxygen source files can be downloaded from github.com/doxygen/doxygen. The repository doesn't originally contain the Visual Studio project files, but since the developers use cmake, you can easily generate them by yourself. I used the program's console version and the "cmake -G "Visual Studio 12"" command to generate a VS 2013 project file. To start the analysis, you just need to click on the Check Solution button in the PVS-Studio tab in Visual Studio.

Discussing diagnostic messages

Before we start talking about the diagnostic messages (warnings) themselves, I'd like to draw your attention to doxygen's coding style. For some reason, the programmer would very often try to fit the code in one line, neglecting spaces between variables and operators, which made the code much less comprehensible. Some fragments had really strange formatting. I had to format some of the code samples to fit them in the article. That's been said, let's go on to see what interesting bugs PVS-Studio has managed to find in doxygen.

PVS-Studio's diagnostic message: V519 The '* outListType1' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 8326, 8327. util.cpp 8327

void convertProtectionLevel(MemberListType inListType,
                            int *outListType1,
                            int *outListType2)
{
  static bool extractPrivate;
  ....
  switch (inListType)
  {
  ....
  case MemberListType_priSlots:
    if (extractPrivate)
    {
      *outListType1=MemberListType_pubSlots;
      *outListType1=MemberListType_proSlots;      <<<<====
    }
    else
    {
      *outListType1=-1;
      *outListType2=-1;
    }
    break;
  ....
  }
}

In the if statement body, one and the same variable is assigned two values on end. This is surely either a typo or an unfixed copy-pasted line. The else block suggests that the "MemberListType_proSlots" value must be written into "*outListType2". Another error of this kind can be found here: doxygen.cpp 5742 (see the variable 'da->type').

The next warning: V519 The 'pageTitle' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 970, 971. vhdldocgen.cpp 971

QCString VhdlDocGen::getClassTitle(const ClassDef *cd)
{
  QCString pageTitle;
  if (cd == 0) 
    return "";
  pageTitle += cd->displayName();
  pageTitle = VhdlDocGen::getClassName(cd);
  ....
}

Note the assignment operation. This is most likely a typo, and "+=" should be used instead of "=". Speaking about the coding style, there were no spaces between the operators and values in the source code, which made it much harder to read. And that, in its turn, left much more chances for an error to appear as you can't easily spot a missing "+" in an uninterrupted stream of characters. Adding the spaces makes the bug more visible. Another similar error is hidden in the following line:

V519 The 'nn' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 2166, 2167. vhdldocgen.cpp 2167

Passing on to the next message.

PVS-Studio's diagnostic message: V523 The 'then' statement is equivalent to the 'else' statement. docparser.cpp 521

static void checkUndocumentedParams()
{
  ....
  if (g_memberDef->inheritsDocsFrom())
  {
    warn_doc_error(g_memberDef->getDefFileName(),
                   g_memberDef->getDefLine(),
                   substitute(errMsg,"%","%%"));
   }
  else
  {
    warn_doc_error(g_memberDef->getDefFileName(),
                   g_memberDef->getDefLine(),
                   substitute(errMsg,"%","%%"));
  }
  ....
}

The copy-paste programming technique can not only help you save time on writing the code but bring some bugs into it as well. In the sample above, a code line was copied from the if block into the else block but wasn't fixed after the insertion. Every time you use copy-paste, please remember to stick to the rule "Copy once, check thrice".

PVS-Studio's diagnostic message: V523 The 'then' statement is equivalent to the 'else' statement. translator_tw.h 769

class TranslatorChinesetraditional : public Translator
{
public:
  ....
  virtual QCString trGeneratedFromFiles(bool single, ....)
  { 
  ....
  QCString result=(QCString)"?";
  ....
  if (single) result+=":"; else result+=":";
  ....
  }
....
}

Here's another issue similar to the previous one. In the if block, regardless of the condition, one and the same character is added to the result string. I strongly doubt that's what the programmer really intended because the condition itself would otherwise have been meaningless. Again, had this block been split into 4 lines, following the common style, it would not only have looked much neater but made the typo more prominent, too. Interestingly, this construct was copied two times more for further use in functions, the programmer never noticing the bug. So, we've got two more warnings of this kind:

  • V523 The 'then' statement is equivalent to the 'else' statement. translator_tw.h 1956
  • V523 The 'then' statement is equivalent to the 'else' statement. translator_tw.h 1965

PVS-Studio's diagnostic message: V530 The return value of function 'toupper' is required to be utilized. classdef.cpp 1963

void ClassDef::writeDocumentationContents(....)
{
  QCString pageType = " ";
  pageType += compoundTypeString();
  toupper(pageType.at(1));
  ....
}

In this sample, the programmer misunderstood the principle of the toupper function. Perhaps he or she was expecting the function to change the character passed into it to a capital letter. But the function doesn't actually change the character argument, it only returns its capital version. This is how the toupper function is declared in the "ctype.h" header:

int toupper (int __c);

As you can see from the declaration, the argument is received by value, therefore the character passed into the function can't be changed. To avoid errors like this, carefully read the description of the functions you use if you are not much sure about their behavior.

PVS-Studio's diagnostic message: V560 A part of conditional expression is always false: (flags() &!0x0008). qfile_win32.cpp 267

#define IO_Truncate    0x0008  

bool QFile::open(....)
{
  ....
  int length = INT_MAX;
  if ((flags() & !IO_Truncate) && length == 0 && isReadable())
  ....
}

This condition will always be false because inversion of a non-zero value always results in zero. The logical "AND" used after that makes no sense when one of its arguments is zero. As a result, the condition doesn't depend on other parameters. It would be more logical to use the bitwise inversion operator '~' here.

PVS-Studio's diagnostic message: V560 A part of conditional expression is always true: !found. util.cpp 4264

bool getDefs(....)
{
  ....
  bool found=FALSE;
  MemberListIterator mmli(*mn);
  MemberDef *mmd;
  for (mmli.toFirst();((mmd=mmli.current()) && !found);++mmli)
  {
    ....
  }
  ....
}

I'll tell you right off that the found variable doesn't change in the body of the for loop. Because of that, the loop termination condition depends solely on the mmli.current method's result. What's dangerous about this error is that the loop will run from beginning to end all the time regardless of whether or not the required value has been found.

PVS-Studio's diagnostic message: V595 The 'bfd' pointer was utilized before it was verified against nullptr. Check lines: 3371, 3384. dot.cpp 3371

void DotInclDepGraph::buildGraph(....)
{
  ....
  FileDef *bfd = ii->fileDef;
  QCString url="";
  ....
  url=bfd->getSourceFileBase();
  ....
  if (bfd)
  ....    
}

V595 is probably the most frequent warning among all the projects we check. It's just that you don't always think before using a pointer if it can be null, and only remember to make a check after using it a couple of times. But there may be a large bulk of code between the check and the first time the pointer is dereferenced, which makes the error pretty hard to detect. Other warnings of this kind:

  • V595 The 'cd' pointer was utilized before it was verified against nullptr. Check lines: 6123, 6131. doxygen.cpp 6123
  • V595 The 'p' pointer was utilized before it was verified against nullptr. Check lines: 1069, 1070. htmldocvisitor.cpp 1069
  • V595 The 'Doxygen::mainPage' pointer was utilized before it was verified against nullptr. Check lines: 3792, 3798. index.cpp 3792
  • V595 The 'firstMd' pointer was utilized before it was verified against nullptr. Check lines: 80, 93. membergroup.cpp 80
  • V595 The 'lastCompound' pointer was utilized before it was verified against nullptr. Check lines: 410, 420. vhdljjparser.cpp 410
  • V595 The 'len' pointer was utilized before it was verified against nullptr. Check lines: 11960, 11969. qstring.cpp 11960
  • V595 The 'len' pointer was utilized before it was verified against nullptr. Check lines: 11979, 11988. qstring.cpp 11979
  • V595 The 'fd' pointer was utilized before it was verified against nullptr. Check lines: 2077, 2085. doxygen.cpp 2077

PVS-Studio's diagnostic message: V595 The 'lne' pointer was utilized before it was verified against nullptr. Check lines: 4078, 4089. index.cpp 4078

static void writeIndexHierarchyEntries(OutputList &ol, ....)
{
  QListIterator<LayoutNavEntry> li(entries);
  LayoutNavEntry *lne;
  for (li.toFirst();(lne=li.current());++li)
  {
    LayoutNavEntry::Kind kind = lne->kind();
    ....
    bool addToIndex=lne==0 || lne->visible();
    ....
  }
}

I don't usually describe similar warnings because it feels boring. But today I want to discuss one more instance of the V595 message. This time, the loop is only entered if the returned value li.current() (assigned to the Ine pointer) is not equal to NULL. It means that the pointer is guaranteed to be non-null when used inside the loop, which makes the check just not necessary. I felt I should mention this example because the V595 warning generally deals with potential null pointer dereferencing operations, while in this particular case, it revealed an excessive check.

PVS-Studio's diagnostic message: V601 The bool type is implicitly cast to the class type. docsets.cpp 473

struct IncludeInfo
{
  ....
  bool local;
};

void DocSets::addIndexItem(Definition *context,MemberDef *md,
                           const char *,const char *)
{
  QCString decl;
  ....
  IncludeInfo *ii = cd->includeInfo();
  ....
  decl=ii->local;
  ....
}

The analyzer has noticed a strange conversion of bool to the class type. The QCString class lacks an overloaded assignment operator for a bool argument but it does have a constructor with the input parameter of the int type denoting the string length. It is this constructor which is called to create a temporary object when executing this assignment. The compiler will find the constructor with the int-argument and call it, the bool type cast to int in advance. The local variable can only have 2 values: true or false, which corresponds to 1 and 0. The constructor will create a one-character string in the first case and an empty string in the second. In the end, the assignment operator with the argument of the CQString type will be called. A similar yet less evident conversion takes place in the following fragments:

  • V601 The bool type is implicitly cast to the class type. Inspect the fifth argument. context.cpp 2315
  • V601 The bool type is implicitly cast to the class type. Inspect the fifth argument. context.cpp 2675
  • V601 The bool type is implicitly cast to the class type. Inspect the fifth argument. context.cpp 4456

PVS-Studio's diagnostic message: V614 Potentially uninitialized pointer 't' used. vhdlparser.cc 4127

QCString VhdlParser::extended_identifier()
{
  Token *t;
  if (!hasError)
    t = jj_consume_token(EXTENDED_CHARACTER);
  return t->image.c_str();
  assert(false);
}

In this code fragment, an uninitialized pointer may be dereferenced. The original code is poorly formatted, which only makes this bug less visible. I have formatted this code for the article, and it has become much more prominent. Two more bugs of this kind can be found in the following lines:

  • V614 Potentially uninitialized pointer 'tmpEntry' used. vhdlparser.cc 4451
  • V614 Potentially uninitialized pointer 't' used. vhdlparser.cc 5304

PVS-Studio's diagnostic message: V668 There is no sense in testing the 'file' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error. outputgen.cpp 47

void OutputGenerator::startPlainFile(const char *name)
{
  ....
  file = new QFile(fileName);
  if (!file)
  ....
}

It's no secret for anyone nowadays that the new operator throws an exception instead of returning nullptr when it fails to allocate memory. The code sample above is kind of a relic from the programming past. Checks like those don't make any sense for modern compilers anymore and can be removed. 3 more checks of this kind:

  • V668 There is no sense in testing the 'expr' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error. template.cpp 1981
  • V668 There is no sense in testing the 'n' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error. qglist.cpp 1005
  • V668 There is no sense in testing the 'nd' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error. qstring.cpp 12099

PVS-Studio's diagnostic message: V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'd' is lost. Consider assigning realloc() to a temporary pointer. qcstring.h 396

class BufStr 
{
public:
  ....
  void resize(uint newlen)
  {
    ....
    m_buf = (char *)realloc(m_buf,m_size);
    ....
  }
private:
  uint m_size;
  char *m_buf;
  ....
}

The analyzer has detected an incorrect use of the "realloc". When failed to allocate memory, "realloc" will return nullptr, rewriting the previous pointer value. To avoid this, we recommend storing the pointer value in a temporary variable before using "realloc". In addition to this one, the analyzer detected a total of 8 similar potential memory leaks:

  • V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'd' is lost. Consider assigning realloc() to a temporary pointer. qcstring.h 396
  • V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'str' is lost. Consider assigning realloc() to a temporary pointer. growbuf.h 16
  • V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'str' is lost. Consider assigning realloc() to a temporary pointer. growbuf.h 23
  • V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'str' is lost. Consider assigning realloc() to a temporary pointer. growbuf.h 33
  • V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'm_str' is lost. Consider assigning realloc() to a temporary pointer. vhdlstring.h 61
  • V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'shd->data' is lost. Consider assigning realloc() to a temporary pointer. qgarray.cpp 224
  • V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'm_data' is lost. Consider assigning realloc() to a temporary pointer. qgstring.cpp 114
  • V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'm_data' is lost. Consider assigning realloc() to a temporary pointer. qgstring.cpp 145

Conclusion

To sum it up, I'd say the analyzer has done very well. Despite doxygen being a popular and widely used (by both small and large companies) tool, PVS-Studio still has managed to find lots of suspicious fragments in it. I have only discussed the most basic warnings and skipped such dull defects as excessive checks, unused variables, and the like. As I already said in the beginning, I was surprised by the, as I believe, quite careless code formatting in certain fragments.

I wish you neat, clear code and as few bugs as possible. While the former depends solely on the programmer, the analyzer will help you with the latter. You can download and try PVS-Studio from here: http://www.viva64.com/en/pvs-studio/download/