Webinar: Parsing C++ - 10.10
A new static analysis tool for C++ code CppCat was presented just recently. You probably heard a lot about the previous product (PVS-Studio) by the same authors. I was pretty doubtful about it then: on the one hand, static analysis is definitely a must-have methodology - things go better with than without it; on the other hand, PVS-Studio may scare users off with its hugeness, an enterprise-like character and the price, of course. I could imagine a project team of 50 developers buying it but wasn't sure about single developers or small teams of 5 developers. I remember suggesting to the PVS-Studio authors deploying "PVS as a cloud service" and sell access to it by time. But they chose to go their own way and created an abridged version at a relatively small price (which any company or even a single developer can afford).
Unfortunately, we are no longer developing or supporting the CppCat static code analyzer. Please read here for details.
Now let's find out if the game is worth the candle.
The first thing I didn't like right away when installing the tool is incompatibility with Visual Studio 2005\2008. Well, I do understand that the older versions of this IDE employed quite a different API for extensions and perhaps supporting it demanded extra efforts; but the C++ language is far from young and I regular stumble upon numerous projects still living under VS2008 and nobody is going to port them as long as the modification rate is about ten lines per quarter. So it appears that you still need the full-function PVS-Studio to be able to work with legacy code. Well, let it be.
I do like the minimalistic character about the way the tool integrates into Visual Studio: a menu with a couple of commands and one toolbar - no frills. The "Enable Analysis on Build" option is ticked by default. What for? I find it more rational not to slow down every build but run analysis of the whole project - for example, before committing to the repository. You can get a pre-commit hook in svn\git clients to remind you to check the freshly written code with a static analyzer.
For the tests to be fair, I picked three projects:
I chose Notepad++ and ZeroMQ because I have some experience developing both - just a couple of patches in each, but thanks to that I didn't need to waste time figuring out how to compile and check them (I knew for sure that they could be built under VS2010).
86 files, complete analysis ran in 2 minutes. CppCat generated 48 warnings about suspicious code (this makes an average of 0.55 warnings per file).
// V560. A part of conditional expression is always true/false.
ToAscii(wParam,(lParam >> 16) && 0xff,keys,&dwReturnedValue,0);
WPARAM wParam
...
if(wParam<0) // V547. Expression is always true/false.
j=lstrlen(BGHS[SelfIndex].editstring);
BGHS[SelfIndex].editstring[j-1]=0x00;
// V557. Array overrun is possible.
I'm not sure if this is a mistake at all - perhaps the check for an empty string is done before, but I guess we shouldn't rely on that assumption.
lpcs = &cs;
// V519. The 'x' variable is assigned values twice successively.
// Perhaps this is a mistake.
lpcs = (LPCREATESTRUCT)lParam;
if(MATCH)
{
return returnvalue+MAX_GRIDS;
}
// V560. A part of conditional expression is always true/false.
if(!MATCH)
{
return -1;
}
V668. There is no sense in testing the pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error.
char *source = new char[docLength];
if (source == NULL)
return;
According to my personal statistics, this is the most common bug which can be found in any C++ text (in this particular project, there are 24 instances of it). This mistake dates back to C memory allocation functions that used to return NULL in case of an error. But in C++, when using the new operator to check if there is still available memory left, one should remember to catch the exception std::bad_alloc instead of checking the result for NULL.
TCHAR intStr[5];
...
// V600. Consider inspecting the condition.
// The 'Foo' pointer is always not equal to NULL.
if (!intStr)
If something in the program goes so wrong that the pointer to a locally declared array of five characters turns null - well, it is so much bad that you'd better let the program crash.
do
{
...
} while(openFound.success && (styleAt == SCE_H_DOUBLESTRING
|| styleAt == SCE_H_DOUBLESTRING) && searchStartPoint > 0);
The programmer intended to check for quotes (double and single ones) but actually made a check for double quotes twice. The copy-and-paste method is to blame: the programmer wanted to replace the second check with SCE_H_SINGLESTRING but forgot about that. This is one of the most crucial bugs found - a real bug in the XML parser, not just a "tip on improvement".
//Setup GUI
// V581. The conditional expressions of the 'if' operators
// situated alongside each other are identical.
if (!_beforeSpecialView.isPostIt)
{
...
}
//Set old style if not fullscreen
if (!_beforeSpecialView.isPostIt)
{
...
}
I can't agree with CppCat about this. How can it know why the programmer decided to write it this way? The code is valid and works well. This is definitely not a bug of the "a ! left by mistake" kind because there would have been "else" then. It's just a way of code formatting.
printPage = (!(_pdlg.Flags & PD_PAGENUMS) ||
(pageNum >= _pdlg.nFromPage) && (pageNum <= _pdlg.nToPage));
CppCat is very persistent in its belief that programmers are not good at remembering operation priorities. In this particular case, it is obvious from the code's meaning and formatting that the programmer was absolutely aware of what he was doing. "Explicit is better than implicit", of course, but there are no bugs here.
if (unicodeSupported)
::DispatchMessageW(&msg);
else
// V523. The 'then' statement is equivalent to the 'else' statement
::DispatchMessage(&msg);
We get this warning because there was the "#define DispatchMessage DispatchMessageW" directive a bit earlier in the code. But the matter is really the following: this replacement is enabled by conditional compilation macros. In this case, the code is meaningless indeed, but if you compile the project with other switches, DispatchMessage will point to DispatchMessageA thus making the code meaningful.
I'm just nagging at trifles, of course: it's not fair to demand that a static analyzer should be able to make guesses about other possible compilation options. But the programmer ought to.
Out of 48 bug-messages, I find about 38 really worth considering and making some fixes in the code. Out of these, 30 instances are obviously bugs, while 8 are useful but not crucial stylistic issues. The remaining 10 messages I find to be false positives from the viewpoint of the code logic. Not bad.
72 files, analyzed in 1 minute.
In fact, only 1 suspicious fragment found - which really turned to be a false positive.
rc = pipe_->write (&probe_msg_);
// zmq_assert (rc) is not applicable here, since it is not a bug.
pipe_->flush ();
// V519. The 'x' variable is assigned values twice successively.
// Perhaps this is a mistake.
rc = probe_msg_.close ();
The analyzer is upset with the fact that nobody needs the poor rc variable between its first and second assignments. Yes, that's true. But there are also other things to consider:
The analyzer, of course, is not obliged to read comments, so it cannot know why everything is alright here.
Not a single useful message from the analyzer. Well, I told you in the beginning that I think highly of this library.
420 files (in 8 subprojects), analyzed in 9 minutes, 99 warnings generated (which makes an average of 0.23 warnings per file).
void CHttpDownloaderBase::GetResponseHeader(
const std::string& strHeaderName,
std::list<std::string>& listValues) const
{
// V530. The return value of function 'Foo'
// is required to be utilized.
listValues.empty();
...
The mess here is caused by the fact that the empty method of a container type does clear containers in some frameworks. But it's just a check of emptiness in case of std::list. It should be replaced with clear().
const char *xml_attr(xml_t* xml, const char *attr)
{
...
const char *name = xml->name;
// V595. The pointer was utilized before
// it was verified against nullptr.
if (! xml || ! xml->attr)
return NULL;
// V567. Undefined behavior.
// The variable is modified while being used twice
// between sequence points.
while (*(n = ++s + strspn(s, XML_WS)))
{...}
// V562. It's odd to compare a bool type value with a value of N.
if (!text[++r] == '\\')
{
break;
}
// V562. It's odd to compare a bool type value with a value of N.
if (!text[++r] == 'u')
{
break;
}
It's clear: the "!" operator has a higher priority than "=="
// V532. Consider inspecting the statement of '*pointer++' pattern.
// Probably meant: '(*pointer)++'.
TCHAR *ch = path + lstrlen(path) - 1;
while (*ch && *ch != '\\')
*ch--;
CppCat's assumption that I probably wanted to change the value by the pointer is incorrect; I actually wanted to change the pointer itself. However, this message is still helpful - it points to an unnecessary operation of taking a value by pointer,"*". It is not used here, so I could simply write "ch--"
// V590. Consider inspecting this expression.
// The expression is excessive or contains a misprint.
while (*p1 != 0 && *p1 == _T(' '))
p1++;
It's clear: if you compare p1 to a space, the first check is not necessary.
m_dwInPartPos = 0;
m_pcOutData = NULL;
...
// V519. The 'x' variable is assigned values twice successively.
m_dwInPartPos = 0;
Wrong enum.
// V556. The values of different enum types are compared.
if (type == eRT_unk)
return false;
One of C++'s greatest troubles (well, at least it used to be until enum classes appeared in the new standard) is the fact that enum is actually a set of numbers, not a separate entity. With the value eRt_unk in one enum and the value eResourceUnknown in the other, I was just lucky they both were 0. The mistake was lurking in the code for many years and it all worked just by sheer luck.
if (scheme == ZLIB_COMPRESSION)
out.push(boost::iostreams::zlib_compressor());
else if (scheme = GZIP_COMPRESSION)
// V559. Suspicious assignment inside the condition expression
// of 'if/while/for' operator.
out.push(boost::iostreams::gzip_compressor());
Well, what can I say - a stupid mistake, no excuse for it.
int len = lstrlen(szLogDir);
TCHAR ch = szLogDir[len-1]; // V557. Array overrun is possible.
Right, the len variable is not checked for ">0", but the szLogDir string itself is checked for being non-empty a bit earlier in the code. The second check won't make it more secure.
I found the following fragment in my code:
if (m_packets[i] != NULL)
delete m_packets[i];
CppCat didn't react to it in any way, although, as I recall, PSV-Studio used to say in such cases that removing NULL is safe.
Out of 99 warnings, about 65 were relevant; the ratio of bugs and simple tips on improvement is about 50/50.
I could never understand why the PVS-Studio developers had always been so much focused on this feature. 32-bit versions of programs run well under Windows x64, so the question of creating 64-bit versions arises most often either when one needs to write a driver or when the program needs more than 3 Gbytes of RAM. According to my statistics, it's about 10-15% of all projects.
I find it logical to check code manually before committing it to the repository - not on each build (too slow) and not on a build-server (lack of interactiveness) - but just in this particular way. It doesn't take much time and you don't risk feeling ashamed before your coworkers. Convenient, isn't it?
CppCat looks too ascetic. It lacks some flexibility: you are allowed to exclude files, folders or a particular line in a file from analysis, but what about excluding a part of a file? What about disabling the check for a certain bug or a class of bugs? Especially if not on the global scale, but for some files exclusively? Either it is impossible or I was inattentive while reading the documentation.
It would be great to be able to store the information about warnings excluded from analysis in an external file instead of comments inside the code, so that it could be excluded from the commit. Not all developers working on the project can use the same static analysis tool, so stumbling upon comments like "//-V:is_test_ok&=:501" and wondering at their meaning may be irritating.
It would be nice to add the "Copy" command into the context menu of warnings and support the Ctrl+V hotkey. It would make it very convenient to copy-and-paste the texts of warnings into the comment to the commit. You can open the documentation and copy the heading from there, of course - but the text there is too general, while the analysis report contains particular code lines and variable names, which is more useful.
What is good about CppCat is how simple it is to evaluate if you need to buy it. Let's take an average salary of an average C++ programmer in vacuum from a recent review at Habrahabr - 80000 rubles ($2370). It means that 1 hour of his or her work costs about 14 dollars. I think you will agree that finding and fixing a bug like those described above (especially when you don't know for sure what you are looking for) takes at least one hour. CppCat costs $250. It's corresponds to 17 hours of work. If you are planning to spend more than 17 hours on bugfixing in your project (well, are there projects where they plan less?), then the purchase is worth its money.
So, thanks to the CppCat authors for taking care of small projects and single developers. I hope to see my suggestions and VS2008 integration implemented in the future versions.
0