Webinar: Evaluation - 05.12
We gave the TortoiseSVN developers a free registration key for some time so that they could check their project. While they haven't utilized it yet, I've decided to download the TortoiseSVN source codes and check it myself. My interest is obvious: I want to make another article to advertise PVS-Studio.
We already checked the TortoiseSVN project long ago. It was done at the same time as PVS-Studio 4.00 was released, which for the first time included diagnostic rules for general analysis.
We recheck some projects from time to time to demonstrate how useful it is to use the tool regularly. It's pointless to check a project just a couple of times: new bugs are constantly being added into a live code, and you then spend a lot of time and nerves to fix them. Consequently, you'll gain the largest benefit from static analysis when using PVS-Studio daily, or, what's even better, when using incremental analysis.
So, let's take a look at some interesting issues PVS-Studio 5.05 has managed to find in the project. The TortoiseSVN source codes were downloaded June 19, 2013, from tortoisesvn.googlecode.com/svn/trunk. The TortoiseSVN project is by the way a very high-quality one and has a huge base of users, programmers. That's why finding at least a few defects is a great achievement.
static void ColouriseA68kDoc (....)
{
if (((sc.state == SCE_A68K_NUMBER_DEC) && isdigit(sc.ch))
....
|| ((sc.state == SCE_A68K_MACRO_ARG) && isdigit(sc.ch))
|| ((sc.state == SCE_A68K_MACRO_ARG) && isdigit(sc.ch))
....
}
PVS-Studio's diagnostic message: V501 There are identical sub-expressions '((sc.state == 11) && isdigit(sc.ch))' to the left and to the right of the '||' operator. lexa68k.cxx 160
Two identical comparisons are present, which is perhaps caused by a misprint.
The following code also seems to have a misprint: the 'rv' variable's value is being checked twice.
struct hentry * AffixMgr::compound_check(
....
if (rv && forceucase && (rv) && ....)
....
}
PVS-Studio's diagnostic message: V501 There are identical sub-expressions to the left and to the right of the '&&' operator: rv && forceucase && (rv):
One more code fragment with an incorrect comparison:
int main(int argc, char **argv)
{
....
DWORD ticks;
....
if (run_timers(now, &next)) {
ticks = next - GETTICKCOUNT();
if (ticks < 0) ticks = 0;
} else {
ticks = INFINITE;
}
....
}
PVS-Studio's diagnostic message: V547 Expression 'ticks < 0' is always false. Unsigned type value is never < 0. winplink.c 635
The variable 'ticks' is unsigned, which means that the check "if (ticks < 0)" is pointless: the issue with an overflow won't be handled.
Consider a bug that causes the function 'strncmp' to compare strings only partially.
int AffixMgr::parse_convtable(...., const char * keyword)
{
char * piece;
....
if (strncmp(piece, keyword, sizeof(keyword)) != 0) {
....
}
PVS-Studio's diagnostic message: V579 The strncmp function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. affixmgr.cxx 3654
The 'sizeof' operator calculates the pointer size. This value is not related in any way to the string length.
Variable argument functions can be found everywhere and are always dangerous.
class CTSVNPath
{
....
private:
mutable CString m_sBackslashPath;
mutable CString m_sLongBackslashPath;
mutable CString m_sFwdslashPath;
....
};
const FileStatusCacheEntry * SVNFolderStatus::BuildCache(
const CTSVNPath& filepath, ....)
{
....
CTraceToOutputDebugString::Instance() (_T(__FUNCTION__)
_T(": building cache for %s\n"), filepath);
....
}
PVS-Studio's diagnostic message: V510 The 'operator()' function is not expected to receive class-type variable as second actual argument:
The "%s" specifier specifies that the function is waiting for a string to be passed to it as an actual argument. But the 'filepath' variable is not a string at all, but a complex object consisting of a number of strings. I cannot say for sure what will be printed and whether the code won't crash at all.
It's not safe to use such functions as 'printf()' in the following way: "printf(myStr);". If there are control specifiers inside 'myStr', the program may print what it shouldn't or crash.
Look at the following code fragment from TortoiseSVN:
BOOL CPOFile::ParseFile(....)
{
....
printf(File.getloc().name().c_str());
....
}
PVS-Studio's diagnostic message: V618 It's dangerous to call the 'printf' function in such a manner, as the line being passed could contain format specification. The example of the safe code: printf("%s", str); pofile.cpp 158
If the file name is "myfile%s%i%s.txt", that will be a fail.
Note. We have an interesting article telling how dangerous it is to use the printf() function.
I don't know for sure if it is dangerous to leave the buffers' content without clearing them in case of ToroiseSVN. Perhaps it's absolutely safe. But the code does contain a fragment intended to clear the buffers. Since it doesn't work, I suppose I should mention it. Here are the bugs:
static void sha_mpint(SHA_State * s, Bignum b)
{
unsigned char lenbuf[4];
....
memset(lenbuf, 0, sizeof(lenbuf));
}
PVS-Studio's diagnostic message: V597 The compiler could delete the 'memset' function call, which is used to flush 'lenbuf' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. sshdss.c 23
The array 'lenbuf' should be cleared before leaving the function. Since the array is not used after that any more, the compiler optimization will remove the call of the 'memset' function. To avoid this you need to use special functions.
Here are other fragments where the compiler will remove 'memset()' function calls:
BOOL InitInstance(HINSTANCE hResource, int nCmdShow)
{
....
app.hwndTT; // handle to the ToolTip control
....
}
PVS-Studio's diagnostic message: V607 Ownerless expression 'app.hwndTT'. tortoiseblame.cpp 1782
The 'hwndTT' member of the function 'InitInstance()' should probably be initialized with some value, but the code appeared to be incomplete due to a misprint.
My search for bugs is quite superficial. My attention is focused just to the extent necessary to notice enough examples of bugs to serve as the basis for an article. No, I'm not a baddie. It's just that an analysis of the product performed by its authors will undoubtedly be more thorough than mine.
Even more superficial is my examination of 64-bit bugs. It's very difficult to judge whether or not a certain error will occur without knowing the project structure.
So, I'll show you only a couple of unsafe fragments:
void LoginDialog::CreateModule(void)
{
....
DialogBoxParam(g_hmodThisDll, MAKEINTRESOURCE(IDD_LOGIN),
g_hwndMain, (DLGPROC)(LoginDialogProc),
(long)this);
....
}
PVS-Studio's diagnostic message: V220 Suspicious sequence of types castings: memsize -> 32-bit integer -> memsize. The value being casted: 'this'. logindialog.cpp 105
The pointer 'this' is explicitly cast to the 'long' type. It is then inexplicitly extended to the LPARAM (LONG_PTR) type. The important thing is that the pointer for some time turns into 'long', which is very bad when your program is 64-bit. The pointer size is 64 bits, while the 'long' type in Win64 is still a 32-bit type. This causes the high-order bits of a 64-bit variable to get lost.
If an object is created outside the low-order 4 Gbytes of memory, the program will start undefined behavior. Though the probability of this event is not high, the bug is very difficult to reproduce.
This is the fixed code: DialogBoxParam(...., (LPARAM)this);
Consider one more dangerous type conversion:
static int cmpforsearch(void *av, void *bv)
{
Actual_Socket b = (Actual_Socket) bv;
unsigned long as = (unsigned long) av,
bs = (unsigned long) b->s;
if (as < bs)
return -1;
if (as > bs)
return +1;
return 0;
}
PVS-Studio's diagnostic message: V205 Explicit conversion of pointer type to 32-bit integer type: (unsigned long) av:
The pointers are explicitly cast to 'unsigned long' and put into the variables 'as' and 'bs'. Since the high-order bits of the address can get lost during this operation, the comparison may become incorrect. After all, I don't see any reason at all why to cast pointers to integer types in this code; you could just compare them as they are.
Long gone are the times when the 'new' operator returned NULL when it failed to allocate memory. Now it throws a std::bad_alloc exception. We could of course have the 'new' operator return 0, but this is not the case.
Despite what is said above, programs are still inhabited by code fragments like this:
int _tmain(....)
{
....
pBuf = new char[maxlength];
if (pBuf == NULL)
{
_tprintf(_T("Could not allocate enough memory!\n"));
delete [] wc;
delete [] dst;
delete [] src;
return ERR_ALLOC;
}
....
}
PVS-Studio's diagnostic message: V668 There is no sense in testing the 'pBuf' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error.
In my articles I don't cite many of the bugs I find in code, for they don't prevent a program from working correctly. This time, however, I've decided to tell you about a couple of those. You see, it's just fun to watch a program run well from sheer luck, but not because it was written well.
void CBaseView::OnContextMenu(CPoint point, DiffStates state)
{
....
popup.AppendMenu(MF_STRING | oWhites.HasTrailWhiteChars ?
MF_ENABLED : (MF_DISABLED|MF_GRAYED),
POPUPCOMMAND_REMOVETRAILWHITES, temp);
....
}
PVS-Studio's diagnostic message: V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '|' operator. baseview.cpp 2246
Depending on the 'oWhites.HasTrailWhiteChars' variable's value, you need to get one of the following combinations of constants:
But the code works in quite a different way. The priority of the '|' operation is higher than that of the '?:' operation. Let's add parentheses to make it clearer:
(MF_STRING | oWhites.HasTrailWhiteChars) ? MF_ENABLED : MF_DISABLED | MF_GRAYED
The code runs correctly only because the constant 'MF_STRING' equals 0. It doesn't affect the result in any way, which causes the incorrect expression to work well.
Here's one more example of programmer's luck. The HWND type is often used as the 'unsigned' type in TortoiseSVN. To be able to do this, the programmer had to use explicit type conversions, as in the following functions:
HWND m_hWnd;
UINT_PTR uId;
INT_PTR CBaseView::OnToolHitTest(....) const
{
....
pTI->uId = (UINT)m_hWnd;
....
}
UINT_PTR idFrom;
HWND m_hWnd;
BOOL CBaseView::OnToolTipNotify(
UINT, NMHDR *pNMHDR, LRESULT *pResult)
{
if (pNMHDR->idFrom != (UINT)m_hWnd)
return FALSE;
....
}
Or, for instance, the value of a HWND variable is printed as if it were the 'long' type.
bool CCommonAppUtils::RunTortoiseProc(....)
{
....
CString sCmdLine;
sCmdLine.Format(L"%s /hwnd:%ld",
(LPCTSTR)sCommandLine, AfxGetMainWnd()->GetSafeHwnd());
....
}
From a formal viewpoint the code is incorrect: the 'HWND' type is a pointer, which means it cannot be cast to 32-bit integer types. The PVS-Studio analyzer is anxious about violations of that rule and generates the warnings.
But the funny thing is that this code will work as it should!
The HWND type is used to store descriptors which are used in Windows to handle various system objects. Such are also the types HANDLE, HMENU, HPALETTE, HBITMAP, etc.
Though descriptors are in fact 64-bit pointers, only the low-order 32 bits of these are used - to provide better compatibility (for instance, to enable interaction between 32-bit and 64-bit processes). See "Microsoft Interface Definition Language (MIDL): 64-Bit Porting Guide" (USER and GDI handles are sign extended 32b values) for details.
I doubt that the TortoiseSVN developers took these assumptions into account when putting the HWND type into 32-bit types. It must be just a carelessly written code which works well due to luck and Windows API developers' efforts.
Use static analysis regularly in your development process, and you'll find a whole lot of bugs at the very early stages of development. I naturally recommend you to consider trying the PVS-Studio code analyzer in the first place; but there also exist many other good analyzers: static code analysis tools.
Here are additional references to clarify some subtleties described in the article.
0