This is the fourth post in which I want to share with you some useful observations on error patterns and the ways of fighting them. This time I will touch upon the subject of handling rare and emergency conditions in programs. While examining a number of applications, I came to a conclusion that the error handling code is one of the most unreliable parts in C/C++ programs' sources. What are the consequences of such defects? An application must generate the message "file X is not found" but instead it crashes and forces the user to make guesses about what he/she is doing wrong. A program handling a data base produces an incomprehensible message instead of telling the user that there is just a field filled in incorrectly. Let's try to fight against this type of errors that haunt our users.
Firstly, here is the information for those readers who are not familiar with my previous posts. You can find them here:
As usual, I won't go into abstract speculations but instead will start with examples. This time I decided to take them from the open source Firefox project. I will try to show you that even in high-quality and popular application things are not very good in the code intended for error handling. All the defects have been found with the PVS-Studio 4.50 analyzer.
Example N1. Incomplete verification for table integrity
int AffixMgr::parse_convtable(..., const char * keyword)
{
...
if (strncmp(piece, keyword, sizeof(keyword)) != 0) {
HUNSPELL_WARNING(stderr,
"error: line %d: table is corrupt\n",
af->getlinenum());
delete *rl;
*rl = NULL;
return 1;
}
...
}
PVS-Studio diagnostic message: V579 The strncmp function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. affixmgr.cpp 3708
The programmer tried to verify the table integrity here. Unfortunately, this check may both work and fail. To calculate the length of the key word the sizeof() operator is used, which is certainly incorrect. As a result, whether or not the code works will depend on pure luck (at certain values of the key word and 'keyword' pointer's size in the current data model).
Example 2. Invalid verification for file reading operation
int PatchFile::LoadSourceFile(FILE* ofile)
{
...
size_t c = fread(rb, 1, r, ofile);
if (c < 0) {
LOG(("LoadSourceFile: "
"error reading destination file: " LOG_S "\n",
mFile));
return READ_ERROR;
}
...
}
PVS-Studio diagnostic message: V547 Expression 'c < 0' is always false. Unsigned type value is never < 0. updater.cpp 1179
This is an example when the code of error handling was written with the "just letting it to be" approach. The programmer didn't even bother to think over what he/she had written and how it would work. Such a verification is an incorrect one: the fread() function uses an unsigned type to return the number of bytes read. This is the function's prototype:
size_t fread(
void *buffer,
size_t size,
size_t count,
FILE *stream
);
The 'c' variable having the size_t type is naturally used to store the result. Consequently, the result of the (c < 0) check is always false.
This is a good example. It seems at the first glance that there is some checking here but we find out that it's absolutely useless.
The same error can be found in other places as well:
V547 Expression 'c < 0' is always false. Unsigned type value is never < 0. updater.cpp 2373
V547 Expression 'c < 0' is always false. Unsigned type value is never < 0. bspatch.cpp 107
Example 3. Checking a pointer for NULL only after it had been used
nsresult
nsFrameSelection::MoveCaret(...)
{
...
mShell->FlushPendingNotifications(Flush_Layout);
if (!mShell) {
return NS_OK;
}
...
}
PVS-Studio diagnostic message: V595 The 'mShell' pointer was utilized before it was verified against nullptr. Check lines: 1107, 1109. nsselection.cpp 1107
If the pointer is equal to null, we must handle this special occasion and return NS_OK from the function. What confuses me is that the mShell pointer has already been used before this moment.
Probably, this code must is operational only because the mShell pointer never equals to NULL. I cite this example to demonstrate that one can easily make a mistake even in the simplest of checks. We have it but still it is useless.
Example 4. Checking a pointer for NULL only after it had been used
CompileStatus
mjit::Compiler::performCompilation(JITScript **jitp)
{
...
JaegerSpew(JSpew_Scripts,
"successfully compiled (code \"%p\") (size \"%u\")\n",
(*jitp)->code.m_code.executableAddress(),
unsigned((*jitp)->code.m_size));
if (!*jitp)
return Compile_Abort;
...
}
PVS-Studio diagnostic message: V595 The '* jitp' pointer was utilized before it was verified against nullptr. Check lines: 547, 549. compiler.cpp 547
By the way, using a pointer before checking it is a wide-spread error. This was one more example of this kind.
Example 5. Incomplete checking of input values
PRBool
nsStyleAnimation::AddWeighted(...)
{
...
if (unit[0] == eCSSUnit_Null || unit[1] == eCSSUnit_Null ||
unit[0] == eCSSUnit_Null || unit[0] == eCSSUnit_URL) {
return PR_FALSE;
}
...
}
PVS-Studio diagnostic message: V501 There are identical sub-expressions 'unit [0] == eCSSUnit_Null' to the left and to the right of the '||' operator. nsstyleanimation.cpp 1767
It seems to me that this code fragment contains 2 misprints simultaneously. I cannot tell for sure how exactly the code should look, but the developers most likely intended it to be written as follows:
if (unit[0] == eCSSUnit_Null || unit[1] == eCSSUnit_Null ||
unit[0] == eCSSUnit_URL || unit[1] == eCSSUnit_URL) {
The misprints may cause the function to process incorrect input values.
Example 6. Incomplete checking of input values
nsresult PresShell::SetResolution(float aXResolution, float
aYResolution)
{
if (!(aXResolution > 0.0 && aXResolution > 0.0)) {
return NS_ERROR_ILLEGAL_VALUE;
}
...
}
PVS-Studio diagnostic message: V501 There are identical sub-expressions to the left and to the right of the '&&' operator: aXResolution > 0.0 && aXResolution > 0.0 nspresshell.cpp 5114
And here was one more example of invalid input parameters verification. This time, a misprint doesn't allow the program to check the aYResolution argument's value.
Example 7. A non-dereferenced pointer
nsresult
SVGNumberList::SetValueFromString(const nsAString& aValue)
{
...
const char *token = str.get();
if (token == '\0') {
return NS_ERROR_DOM_SYNTAX_ERR; // nothing between commas
}
...
}
PVS-Studio diagnostic message: V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: *token == '\0'. svgnumberlist.cpp 96
The code checking that there's nothing between the commas does not work. To find out whether or not the string is empty, we can compare the first character to '\0'. But it is the pointer which is compared to null instead of the first character. This pointer is never equal to zero. This is the correct check: (*token == '\0').
Example 8. Incorrect type for storing the index
PRBool
nsIEProfileMigrator::TestForIE7()
{
...
PRUint32 index = ieVersion.FindChar('.', 0);
if (index < 0)
return PR_FALSE;
...
}
PVS-Studio diagnostic message: V547 Expression 'index < 0' is always false. Unsigned type value is never < 0. nsieprofilemigrator.cpp 622
The function won't return PR_FALSE if there is no dot in the string and will continue handling incorrect data. The error here is that an unsigned data type was used for the 'index' variable. Checking that (index < 0) is meaningless.
Example 9. Forming a wrong error message
cairo_status_t
_cairo_win32_print_gdi_error (const char *context)
{
...
fwprintf(stderr, L"%s: %S", context, (wchar_t *)lpMsgBuf);
...
}
PVS-Studio diagnostic message: V576 Incorrect format. Consider checking the third actual argument of the 'fwprintf' function. The pointer to string of wchar_t type symbols is expected. cairo-win32-surface.c 129
Even if an error was successfully detected, it should be processed correctly. And since nobody tests error handlers either, we may find many interesting things there.
The _cairo_win32_print_gdi_error() function will print some trash. The fwprintf() function awaits a pointer to a unicode-string as the third argument, but instead it gets a string with the 'const char *' format.
Example 10. Dumping error
bool ExceptionHandler::WriteMinidumpForChild(...)
{
...
DWORD last_suspend_cnt = -1;
...
// this thread may have died already, so not opening
// the handle is a non-fatal error
if (NULL != child_thread_handle) {
if (0 <= (last_suspend_cnt =
SuspendThread(child_thread_handle))) {
...
}
PVS-Studio diagnostic message: V547 Expression is always true. Unsigned type value is always >= 0. exception_handler.cc 846
This is another example in the error handler. The result returned by the SuspendThread function is processed incorrectly here. The last_suspend_cnt variable has the DWORD type and therefore is always greater or equal to 0.
Let me digress from the central topic a bit and tell you about the results of checking Firefox in general. The project is of a very high quality, and PVS-Studio had found quite a few errors in it. However, since it's huge one, there is rather big number of errors in a quantitative relation. Unfortunately, I was not able to study the report generated by the PVS-Studio tool thoroughly. The project was analyzed with the console version of PVS-Studio called from the make-file. It is possible to review all the diagnostic messages with opening of the report in Visual Studio. But as there is no project for Visual Studio, it doesn't prompt you what variables and where are defined and doesn't allow you to navigate to fragments where macros are defined and so on. As a result, analysis of an unknown project is quite labor-intensive, and I managed to study only a fraction of the messages.
The errors are diverse. For example, there are array overruns:
class nsBaseStatis : public nsStatis {
public:
...
PRUint32 mLWordLen[10];
...
nsBaseStatis::nsBaseStatis(...)
{
...
for(PRUint32 i = 0; i < 20; i++)
mLWordLen[i] = 0;
...
}
...
};
PVS-Studio diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 19. detectcharset.cpp 89
Although this error and other similar errors are interesting, they are not related to the subject of our article. So, if you want to see some other errors, download this file: mozilla-test.txt.
I decided to cite 10 examples instead of just a couple to convince you that defects in error handlers are a wide-spread issue. Of course, error handlers are not the most crucial and important fragments of a program. But programmers write them, so they hope to improve the program behavior with their help. Unfortunately, my observations convince me that checks and error handlers often fail to work correctly. You see, I had just one project to show you those many errors of this kind.
What should we do with them, what recommendations can we give?
We must admit that one might make a mistake even in a simple check. This is the most difficult and important thing to understand. It is because error handlers are considered as simple code fragments that they contain so many misprints and other defects. Error handlers are not tested and checked. Nobody writes tests for them.
Of course, it's difficult and often unreasonable from the economic viewpoint to write tests for error handlers. But if programmers at least know about the danger, it's already a progress. When you are aware of something, you are already armed to deal with it. There is also an analogy to error handlers that we can refer to.
Statistics tell us that mountain climbers most often fall at the end of ascension. It happens not because of tiredness, but because the person thinks that he/she will soon finish the ascension - he/she relaxes, loses attention and therefore makes more mistakes. Something like that happens to a programmer when he is writing a program. He/she spends much effort and attention on creating an algorithm but does not concentrate much on writing various checks because he/she is sure that he/she in no way can make a mistake there.
So, now you are aware. And I'm sure this thing alone is already good.
If you say that only students and novice programmers make such silly mistakes, you are wrong. Everyone makes misprints. Please read a small post on this topic: "The second myth - expert developers do not make silly mistakes". I can confirm the idea by many examples from various projects. But I think the ones cited here will be enough to make you think it over.
The dumping mechanisms, logging functions and other similar auxiliary mechanisms do deserve creating unit-tests for them.
An inefficient dumping mechanism is not only useless; it only pretends to be able to help you in an emergency situation. If a user sends you a corrupted dump-file, it will not only be unable to help you but will also mislead you and you will spend much more time searching for errors than if the dump-file had never existed at all.
The recommendation looks simple and obvious. But do many of you reading this post have unit-tests to check the WriteMyDump class?
Use static code analyzers. The capability of finding defects in error handlers is one of the strong points of the static analysis methodology. Static analysis covers all the code branches regardless of how often they are used while an application is running. It can detect errors which reveal themselves quite rare.
In other words, code coverage with static analysis is 100%. It's almost impossible to reach the same code coverage using other types of testing. Code coverage with unit-tests and regression testing usually is less than 80%. The remaining 20% are very difficult to test. These 20% include most error handlers and rare conditions.
You may try to use the methodology of Fault injection. The point is that some functions start to return various error codes from time to time, and the program must handle them correctly. For instance, you can write your own function malloc() that will return NULL from time to time even when there is some memory left. It will allow you to know how the application will behave when memory really runs out. The same approach can be applied to such functions as fopen(), CoCreateInstance(), CreateDC(), etc.
There are special programs that allow you to automate this process and do it without manually writing your own functions for causing random failures. Unfortunately, I never dealt with such systems, so I cannot tell you about them in every detail.
Defects in error handlers are very frequent. Unfortunately, I'm not sure that the above given recommendations are enough to avoid them. But I do hope that now this issue is of interest for you and you will invent means to make defects in your programs fewer. I, as the other readers as well, will also appreciate if you could share your ideas and methods with us on how to avoid errors of the type we have discussed in this article.