Webinar: Parsing C++ - 10.10
Recently, while telling you about check of another project, I have constantly repeated that it is a very quality code and there are almost no errors in it. A good example is analysis of such projects as Apache, MySQL and Chromium. I think you understand why we choose such applications for analysis. They are known to all of us while nobody wants to hear about horrible things found in the diploma design of student Jack. But sometimes we check projects that just come to hand. Some of such projects leave painful impressions in my delicate and vulnerable soul. This time we checked Intel(R) Energy Checker SDK (IEC SDK).
Intel Energy Checker SDK is a small C project containing just 74500 lines of code. Compare this number to the WinMerge project's size of 186 000 lines or size of the Miranda IM project together with the plugins (about 950 000 lines).
By the way, while we are just in the beginning of the article, try to guess how many 'goto' operators there are in this project. Have you thought of a number? If yes, then we go on.
All in all, this is one of those small projects unknown to a wide audience. When I look at a project of that kind, the following analogy occurs to me. You may walk on good familiar streets for a long time without seeing any puddles. But as you make a turn and look into a yard, you just not only see a puddle but a puddle so big that you don't understand how you can pass it without getting your feet wet.
I wouldn't say that the code is horrible, there are cases much worse. But look at it yourself. There are only 247 functions in the whole project. You will say it's few, won't you? Of course, it's few. But the size of some of them embarrasses me. Four functions have size of more than 2000 lines each:
V553 The length of 'pl_open' function's body is more than 2000 lines long. You should consider refactoring the code. pl_csv_logger productivity_link.c 379
V553 The length of 'pl_attach' function's body is more than 2000 lines long. You should consider refactoring the code. pl_csv_logger productivity_link.c 9434
V553 The length of 'main' function's body is more than 2000 lines long. You should consider refactoring the code. cluster_energy_efficiency cee.c 97
V553 The length of 'main' function's body is more than 2000 lines long. You should consider refactoring the code. pl2ganglia pl2ganglia.c 105
The following method of getting the length of a directory's name is also significant:
#define PL_FOLDER_STRING "C:\\productivity_link"
#define PL_PATH_SEPARATOR_STRING "\\"
#define PL_APPLICATION_NAME_SEPARATOR_STRING "_"
...
pl_root_name_length = strlen(PL_FOLDER_STRING);
pl_root_name_length += strlen(PL_PATH_SEPARATOR_STRING);
pl_root_name_length += application_name_length;
pl_root_name_length += strlen(PL_APPLICATION_NAME_SEPARATOR_STRING);
pl_root_name_length += PL_UUID_MAX_CHARS;
pl_root_name_length += strlen(PL_PATH_SEPARATOR_STRING);
I understand that this fragment is not crucial to speed and there's no reason to optimize the string length calculation. But just from the mere love to art, the programmer could have created a macro "#define STRLEN(s) (sizeof(s) / sizeof(*s) - 1)". My sense of beauty suffers even more because of strings containing "C:\\". The following macros alert me:
#define PL_INI_WINDOWS_FOLDER "C:\\productivity_link"
#define PL_INI_WINDOWS_LC_FOLDER "c:\\productivity_link"
#define PLH_FOLDER_SEARCH _T("C:\\productivity_link\\*")
However, this code does what it should and we won't focus our attention on the programming style. It is the number of errors found by PVS-Studio in a project of such a small size - this is what scares me most. And keep in mind that far not all the 74000 code lines were checked. About one third of the code is intended for LINUX/SOLARIS/MACOSX and is stored in #ifdef/#endif code branches that were not checked. The impassible wood of #ifdef/#endif's is just a different story but I promised not to speak of code design anymore. If you wish, you may look through the source codes yourselves.
The code of IEC SDK has collected a diverse bunch of mistakes one can make when handling arrays at the low level. However, mistakes of this kind are very typical of the C language.
There is code addressing memory outside an array:
V557 Array overrun is possible. The '255' index is pointing beyond array bound. pl2ganglia pl2ganglia.c 1114
#define PL_MAX_PATH 255
#define PL2GANFLIA_COUNTER_MAX_LENGTH PL_MAX_PATH
char name[PL_MAX_PATH];
int main(int argc, char *argv[]) {
...
p->pl_counters_data[i].name[
PL2GANFLIA_COUNTER_MAX_LENGTH
] = '\0';
...
}
Here we deal with a typical defect of writing the terminal zero outside the array. The code must look this way:
p->pl_counters_data[i].name[
PL2GANFLIA_COUNTER_MAX_LENGTH - 1
] = '\0';
There are errors of structure clearing.
V568 It's odd that the argument of sizeof() operator is the '& file_data' expression. pl_csv_logger productivity_link_helper.c 1667
V568 It's odd that the argument of sizeof() operator is the '& file_data' expression. pl_csv_logger productivity_link_helper.c 1831
V512 A call of the 'memset' function will lead to underflow of the buffer 'pconfig'. pl_csv_logger productivity_link_helper.c 1806
Here is an example of such incorrect emptying:
int plh_read_pl_folder(PPLH_PL_FOLDER_INFO pconfig) {
...
WIN32_FIND_DATA file_data;
...
memset(
&file_data,
0,
sizeof(&file_data)
);
...
}
The code intended for file search will work badly when you use the WIN32_FIND_DATA structure with rubbish inside it. But I suspect that hardly anybody is interested in the Windows version of this work of programming art. For instance, the code compiles in the "Use Unicode Character Set" mode although it is not fully intended for this. It seems that nobody ever thought of this - they just created the project for Visual Studio and the "Character Set" setting by default defines use of UNICODE.
As a result, there are a dozen of fragments where strings are cleared only partially. I'll cite only one sample of such code:
V512 A call of the 'memset' function will lead to underflow of the buffer '(pl_cvt_buffer)'. pl_csv_logger productivity_link_helper.c 683
#define PL_MAX_PATH 255
typedef WCHAR TCHAR, *PTCHAR;
TCHAR pl_cvt_buffer[PL_MAX_PATH] = { '\0' };
int plh_read_pl_config_ini_file(...)
{
...
ZeroMemory(
pl_cvt_buffer,
PL_MAX_PATH
);
...
}
Well, there are, however, places where disabling UNICODE won't help. Something strange will be printed here instead of text:
V576 Incorrect format. Consider checking the second actual argument of the 'wprintf' function. The pointer to string of wchar_t type symbols is expected. producer producer.c 166
int main(void) {
...
char *p = NULL;
...
#ifdef __PL_WINDOWS__
wprintf(
_T("Using power link directory: %s\n"),
p
);
#endif // __PL_WINDOWS__
...
}
Let me explain this. The wprintf function waits for a string of the "wchar_t *" type while it is a string of the "char *" type that will be passed to it.
There are other errors and small defects similar to this one:
V571 Recurring check. The 'if (ret == PL_FAILURE)' condition was already verified in line 1008. pl_csv_logger pl_csv_logger.c 1009
if(ret == PL_FAILURE) {
if(ret == PL_FAILURE) {
pl_csv_logger_error(
PL_CSV_LOGGER_ERROR_UNABLE_TO_READ_PL
);
I see no reason in enumerating all the defects we have found. If somebody of you wants, I may give you a registration key to check the project and study the messages. I will also send error descriptions to SDK's authors.
Do you remember I asked you to think of a number of 'goto' operators found in the project? So, I think your number is far from the truth. There are 1198 goto operators altogether in the project, i.e. one goto operator for every 60 code lines. And I thought that style had been forgotten for a long time.
Well, and what did the author actually want to say by writing this article? Intel URGENTLY needs to pay the greatest attention to PVS-Studio. :-)
Dear Andrey and PVS-Studio team,
I first want to re-iterate my thank you for bringing to my attention my code's imperfections. Using your tool was really a great opportunity to improve the code's quality.
Error humanum est, does apply to all of use, and especially to me in this case! I wrote almost all of the Energy Checker code so I think that it could be interesting to share my experience with your tool with your potential customers. Beside the great insight it provides - I will come back to this later on - I really appreciated the good integration into Microsoft Visual Studio 2005. This is important to me because I could use the software as-is, "out of the box" and work to remove my bugs immediately. On the downside, I would essentially quote the speed of the analysis. I believe that your company is working on improving it, so I am confident that the use of PVS-Studio will be even smoother in the future.
Now let's get to the point. PVS-Studio slapped me on the hand for my mistakes. And this is what I expect from such software. Sure our code runs OK, with good performance and fulfilling its role (until the next bug report, of course). But in many cases, I was feeling guilty that that some code section just work because I was lucky. Applying the correction, makes them work because they are correct now. And when such mistake happens to be in a macro, it is a great pleasure to see the error count drop down drastically just after an incremental analysis. Another mistake of mines that struck me was that copy-and-past can be a false friend. It helps to speed up crafting a new section of code, but you need to be very vigilant doing so. Obviously in some cases, I was not enough vigilant. PVS-Studio woke me up.
Again, thank you for your feedback, your software's insight and for letting me try PVS-Studio.
Jamel Tayeb, IEC SDK
0