Vladimir Korneychuk

Jan 23 2014

CppCat Static Analyzer Review

Jan 23 2014

Author: Vladimir Korneychuk

Notepad++
False positives (in my opinion)
Conclusions on Notepad++
ZeroMq
Conclusions on ZeroMq
My own project
False positives
- Array overrun
- Cases when the analyzer strangely kept silent
Conclusions on my project
General Conclusions
Final Conclusions

A new static analysis tool for C++ code CppCat was presented just recently. You probably heard a lot about the previous product (PVS-Studio) by the same authors. I was pretty doubtful about it then: on the one hand, static analysis is definitely a must-have methodology - things go better with than without it; on the other hand, PVS-Studio may scare users off with its hugeness, an enterprise-like character and the price, of course. I could imagine a project team of 50 developers buying it but wasn't sure about single developers or small teams of 5 developers. I remember suggesting to the PVS-Studio authors deploying "PVS as a cloud service" and sell access to it by time. But they chose to go their own way and created an abridged version at a relatively small price (which any company or even a single developer can afford).

Unfortunately, we are no longer developing or supporting the CppCat static code analyzer. Please read here for details.

Now let's find out if the game is worth the candle.

The first thing I didn't like right away when installing the tool is incompatibility with Visual Studio 2005\2008. Well, I do understand that the older versions of this IDE employed quite a different API for extensions and perhaps supporting it demanded extra efforts; but the C++ language is far from young and I regular stumble upon numerous projects still living under VS2008 and nobody is going to port them as long as the modification rate is about ten lines per quarter. So it appears that you still need the full-function PVS-Studio to be able to work with legacy code. Well, let it be.

I do like the minimalistic character about the way the tool integrates into Visual Studio: a menu with a couple of commands and one toolbar - no frills. The "Enable Analysis on Build" option is ticked by default. What for? I find it more rational not to slow down every build but run analysis of the whole project - for example, before committing to the repository. You can get a pre-commit hook in svn\git clients to remind you to check the freshly written code with a static analyzer.

For the tests to be fair, I picked three projects:

Notepad++ (which I find an example of poor code)
ZeroMQ (which I find an example of fine code)
One of my old work projects - I wrote it many years ago, and it must have a lot of silly mistakes

I chose Notepad++ and ZeroMQ because I have some experience developing both - just a couple of patches in each, but thanks to that I didn't need to waste time figuring out how to compile and check them (I knew for sure that they could be built under VS2010).

Notepad++

86 files, complete analysis ran in 2 minutes. CppCat generated 48 warnings about suspicious code (this makes an average of 0.55 warnings per file).

A common mistake - a logical 'and' is mixed up with a bitwise 'and' when trying to carry a byte

// V560. A part of conditional expression is always true/false.
ToAscii(wParam,(lParam >> 16) && 0xff,keys,&dwReturnedValue,0);

Checking an unsigned type for being negative

WPARAM wParam
...
if(wParam<0) // V547. Expression is always true/false.

Potential addressing of an array cell with the index '-1'

j=lstrlen(BGHS[SelfIndex].editstring);
BGHS[SelfIndex].editstring[j-1]=0x00; 
// V557. Array overrun is possible.

I'm not sure if this is a mistake at all - perhaps the check for an empty string is done before, but I guess we shouldn't rely on that assumption.

Double assignment. One of the two is obviously redundant

lpcs = &cs;
// V519. The 'x' variable is assigned values twice successively. 
// Perhaps this is a mistake.
lpcs = (LPCREATESTRUCT)lParam;

Mess in conditions

if(MATCH)
{
    return returnvalue+MAX_GRIDS;
}
// V560. A part of conditional expression is always true/false.
if(!MATCH)
{
    return -1;
}

Incorrect check of correctly allocated memory

V668. There is no sense in testing the pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error.

char *source = new char[docLength];
if (source == NULL) 
    return;

According to my personal statistics, this is the most common bug which can be found in any C++ text (in this particular project, there are 24 instances of it). This mistake dates back to C memory allocation functions that used to return NULL in case of an error. But in C++, when using the new operator to check if there is still available memory left, one should remember to catch the exception std::bad_alloc instead of checking the result for NULL.

Unnecessary paranoia

TCHAR intStr[5];
...
// V600. Consider inspecting the condition. 
// The 'Foo' pointer is always not equal to NULL.
if (!intStr)

If something in the program goes so wrong that the pointer to a locally declared array of five characters turns null - well, it is so much bad that you'd better let the program crash.

Double check of one and the same condition

do
{
...
} while(openFound.success && (styleAt == SCE_H_DOUBLESTRING 
|| styleAt == SCE_H_DOUBLESTRING) && searchStartPoint > 0);

The programmer intended to check for quotes (double and single ones) but actually made a check for double quotes twice. The copy-and-paste method is to blame: the programmer wanted to replace the second check with SCE_H_SINGLESTRING but forgot about that. This is one of the most crucial bugs found - a real bug in the XML parser, not just a "tip on improvement".

False positives (in my opinion)

Two if-blocks with identical conditions on end

//Setup GUI
// V581. The conditional expressions of the 'if' operators 
// situated alongside each other are identical.
if (!_beforeSpecialView.isPostIt) 
{
... 
}

//Set old style if not fullscreen
if (!_beforeSpecialView.isPostIt)
{
...
}

I can't agree with CppCat about this. How can it know why the programmer decided to write it this way? The code is valid and works well. This is definitely not a bug of the "a ! left by mistake" kind because there would have been "else" then. It's just a way of code formatting.

A few bugs "Priority of the '&&' operation is higher than that of the '||' operation"

printPage = (!(_pdlg.Flags & PD_PAGENUMS) ||
          (pageNum >= _pdlg.nFromPage) && (pageNum <= _pdlg.nToPage));

CppCat is very persistent in its belief that programmers are not good at remembering operation priorities. In this particular case, it is obvious from the code's meaning and formatting that the programmer was absolutely aware of what he was doing. "Explicit is better than implicit", of course, but there are no bugs here.

It doesn't occur to the analyzer that seemingly meaningless code may have sense with other compilation switches

if (unicodeSupported)
    ::DispatchMessageW(&msg);
else 
// V523. The 'then' statement is equivalent to the 'else' statement
    ::DispatchMessage(&msg);

We get this warning because there was the "#define DispatchMessage DispatchMessageW" directive a bit earlier in the code. But the matter is really the following: this replacement is enabled by conditional compilation macros. In this case, the code is meaningless indeed, but if you compile the project with other switches, DispatchMessage will point to DispatchMessageA thus making the code meaningful.

I'm just nagging at trifles, of course: it's not fair to demand that a static analyzer should be able to make guesses about other possible compilation options. But the programmer ought to.

Conclusions on Notepad++

Out of 48 bug-messages, I find about 38 really worth considering and making some fixes in the code. Out of these, 30 instances are obviously bugs, while 8 are useful but not crucial stylistic issues. The remaining 10 messages I find to be false positives from the viewpoint of the code logic. Not bad.

ZeroMq

72 files, analyzed in 1 minute.

In fact, only 1 suspicious fragment found - which really turned to be a false positive.

rc = pipe_->write (&probe_msg_);
// zmq_assert (rc) is not applicable here, since it is not a bug.
pipe_->flush ();
// V519. The 'x' variable is assigned values twice successively. 
// Perhaps this is a mistake.
rc = probe_msg_.close ();

The analyzer is upset with the fact that nobody needs the poor rc variable between its first and second assignments. Yes, that's true. But there are also other things to consider:

It may be useful for the debugging purpose to make a breakpoint and check the value.
There is a comment saying that there used to be (or could be) a check of rc. It also says that there is no need in such a check.

The analyzer, of course, is not obliged to read comments, so it cannot know why everything is alright here.

Conclusions on ZeroMq

Not a single useful message from the analyzer. Well, I told you in the beginning that I think highly of this library.

My own project

420 files (in 8 subprojects), analyzed in 9 minutes, 99 warnings generated (which makes an average of 0.23 warnings per file).

A silly mistake using the std::list type

void CHttpDownloaderBase::GetResponseHeader(
  const std::string& strHeaderName,
  std::list<std::string>& listValues) const
{
  // V530. The return value of function 'Foo' 
  // is required to be utilized.
  listValues.empty();
  ...

The mess here is caused by the fact that the empty method of a container type does clear containers in some frameworks. But it's just a check of emptiness in case of std::list. It should be replaced with clear().

Checking a pointer after the first use

const char *xml_attr(xml_t* xml, const char *attr)
{
    ...
    const char *name = xml->name;
// V595. The pointer was utilized before 
// it was verified against nullptr.
    if (! xml || ! xml->attr)
        return NULL;

Undefined behavior

// V567. Undefined behavior. 
// The variable is modified while being used twice 
// between sequence points.
while (*(n = ++s + strspn(s, XML_WS)))
{...}

Operation priorities once again

// V562. It's odd to compare a bool type value with a value of N.
if (!text[++r] == '\\')
{
    break;
}
// V562. It's odd to compare a bool type value with a value of N.
if (!text[++r] == 'u')
{
    break;
}

It's clear: the "!" operator has a higher priority than "=="

Pointer arithmetic

// V532. Consider inspecting the statement of '*pointer++' pattern.
// Probably meant: '(*pointer)++'.
TCHAR *ch = path + lstrlen(path) - 1;  
while (*ch && *ch != '\\')
    *ch--;

CppCat's assumption that I probably wanted to change the value by the pointer is incorrect; I actually wanted to change the pointer itself. However, this message is still helpful - it points to an unnecessary operation of taking a value by pointer,"*". It is not used here, so I could simply write "ch--"

Redundant condition

// V590. Consider inspecting this expression. 
// The expression is excessive or contains a misprint.
while (*p1 != 0 && *p1 == _T(' '))
    p1++;

It's clear: if you compare p1 to a space, the first check is not necessary.

Redundant operation

    m_dwInPartPos = 0;
    m_pcOutData = NULL;
    ...
// V519. The 'x' variable is assigned values twice successively.
    m_dwInPartPos = 0;

Wrong enum.

// V556. The values of different enum types are compared.
if (type == eRT_unk) 
    return false;

One of C++'s greatest troubles (well, at least it used to be until enum classes appeared in the new standard) is the fact that enum is actually a set of numbers, not a separate entity. With the value eRt_unk in one enum and the value eResourceUnknown in the other, I was just lucky they both were 0. The mistake was lurking in the code for many years and it all worked just by sheer luck.

And finally, an inevitable classic of the genre: = instead of == in a comparison

if (scheme == ZLIB_COMPRESSION)
    out.push(boost::iostreams::zlib_compressor());
else if (scheme = GZIP_COMPRESSION)
// V559. Suspicious assignment inside the condition expression 
// of 'if/while/for' operator.
    out.push(boost::iostreams::gzip_compressor());

Well, what can I say - a stupid mistake, no excuse for it.

False positives

Array overrun

int len = lstrlen(szLogDir);
TCHAR ch = szLogDir[len-1]; // V557. Array overrun is possible.

Right, the len variable is not checked for ">0", but the szLogDir string itself is checked for being non-empty a bit earlier in the code. The second check won't make it more secure.

Cases when the analyzer strangely kept silent

I found the following fragment in my code:

if (m_packets[i] != NULL)
    delete m_packets[i];

CppCat didn't react to it in any way, although, as I recall, PSV-Studio used to say in such cases that removing NULL is safe.

Conclusions on my project

Out of 99 warnings, about 65 were relevant; the ratio of bugs and simple tips on improvement is about 50/50.

General Conclusions

Pros

Intuitiveness
Price
Can detect genuine bugs

Cons

Absent integration with VS2005\VS2008
Some percent of false positives

What could be a con, but in fact is not!

Absent support for 64-bit checks

I could never understand why the PVS-Studio developers had always been so much focused on this feature. 32-bit versions of programs run well under Windows x64, so the question of creating 64-bit versions arises most often either when one needs to write a driver or when the program needs more than 3 Gbytes of RAM. According to my statistics, it's about 10-15% of all projects.

Absent integration with MSBuild and the like

I find it logical to check code manually before committing it to the repository - not on each build (too slow) and not on a build-server (lack of interactiveness) - but just in this particular way. It doesn't take much time and you don't risk feeling ashamed before your coworkers. Convenient, isn't it?

What could be a pro, but in fact is not!

Asceticism in the settings

CppCat looks too ascetic. It lacks some flexibility: you are allowed to exclude files, folders or a particular line in a file from analysis, but what about excluding a part of a file? What about disabling the check for a certain bug or a class of bugs? Especially if not on the global scale, but for some files exclusively? Either it is impossible or I was inattentive while reading the documentation.

Suggestions

Suggestion No. 1

It would be great to be able to store the information about warnings excluded from analysis in an external file instead of comments inside the code, so that it could be excluded from the commit. Not all developers working on the project can use the same static analysis tool, so stumbling upon comments like "//-V:is_test_ok&=:501" and wondering at their meaning may be irritating.

Suggestion No. 2

It would be nice to add the "Copy" command into the context menu of warnings and support the Ctrl+V hotkey. It would make it very convenient to copy-and-paste the texts of warnings into the comment to the commit. You can open the documentation and copy the heading from there, of course - but the text there is too general, while the analysis report contains particular code lines and variable names, which is more useful.

Final Conclusions

What is good about CppCat is how simple it is to evaluate if you need to buy it. Let's take an average salary of an average C++ programmer in vacuum from a recent review at Habrahabr - 80000 rubles ($2370). It means that 1 hour of his or her work costs about 14 dollars. I think you will agree that finding and fixing a bug like those described above (especially when you don't know for sure what you are looking for) takes at least one hour. CppCat costs $250. It's corresponds to 17 hours of work. If you are planning to spend more than 17 hours on bugfixing in your project (well, are there projects where they plan less?), then the purchase is worth its money.

So, thanks to the CppCat authors for taking care of small projects and single developers. I hope to see my suggestions and VS2008 integration implemented in the future versions.