CppCat is a simple static code analyzer capable of detecting bugs in C/C++ programs. We started granting free academic licenses to anyone interested (students, teachers, and so on). To popularize CppCat among students, I decided to write this post about errors found in student lab work tasks posted at Pastebin.com.
Unfortunately, we are no longer developing or supporting the CppCat static code analyzer. Please read here for details.
CppCat is a static code analyzer that integrates into the Visual Studio environment and allows the user to detect a variety of typos and other errors as early as the coding stage. The analyzer can launch automatically after compilation and check freshly written code. The tool supports the languages C, C++, C++/CLI, and C++/CX.
To find out how to get a free CppCat license, see the following article: Free CppCat for Students. I'd just like to add that we grant licenses not only to students but also to postgraduates, teachers, and others.
Many are upset that CppCat cannot integrate into the free Visual Studio Express environment. Unfortunately, there's nothing we can do about it, because the Express editions of Visual Studio don't support plugins. But that's not a real problem: students have access to Microsoft DreamSpark and can therefore get Visual Studio Professional.
My original appeal to students was meant to sound something like this:
Even a student can benefit from static analysis. Why waste time and nerves hunting for a bug in your program when CppCat is there to help you? It will not only find the bug quicker but also teach you how not to write code. The CppCat documentation provides a detailed description of each diagnostic, with a wide variety of bug examples and tips on how to fix them.
Then I realized that this sounded somewhat forced and exaggerated. Students don't really face serious bugs, do they? After all, there's nothing wrong with manually debugging a loop of 10 iterations; it's even useful as practice. That's why I decided to rephrase my appeal to students to try CppCat as follows:
When you apply for a job, your employer will appreciate not only your programming skills and your ability to create tricky algorithms but also your command of the basic toolkit, which is no less important.
After all, your algorithm is worth nothing if you lose it because you don't know what a version control system is and the best you did was save a copy of your source code to a flash drive.
That's why it is crucial to study not only programming languages and development environments but auxiliary tools as well.
I don't presume to give you a complete list of all the must-have tools, but you certainly should know at least one version control system, why WinMerge may be needed, what a profiler is, how to create a distribution package and so on.
One of the technologies I recommend that you study and mention in your resume is static code analysis. CppCat is an excellent tool for getting started with this methodology. It will be a good addition to your knowledge and a sign to your employer that you do know something about code quality.
The boring part is over. I think you've already guessed that we are going to search for errors in students' lab work tasks.
This time, there's no need to point out that most of the code fragments we are going to discuss are poor: it goes without saying that students' lab work tasks contain piles of various bugs. So my goal was not to find as many bugs as possible, which just isn't interesting. Instead, I tried to identify the bug patterns most common among students. So far, I have managed to single out only three such patterns, but we'll speak of that a bit later.
You probably want to know where I got the lab work tasks from. Here's the answer.
Pastebin.com is a convenient service for developers to share code fragments. Students use it very actively: almost every code sample with the C++ tag is a lab work task or an excerpt from one.
We have written a program that monitors Pastebin.com and downloads fresh files tagged "C++ code". Having collected over two thousand files, I made a Visual Studio project from them and checked it. Of course, more than half of the project could not be checked: many files contained only code fragments or text that was not commented out, some headers were missing, and so on. But my goal was not to check as much of the code published at Pastebin.com as possible anyway. What I did manage to check is quite enough for this article. We keep collecting files, so there may be another article on this topic.
The number of errors I examined in the lab work tasks was not that large, so I can distinguish only three patterns so far.
I won't cite all the examples, as they are all alike and uninteresting; instead, I'll give just a few examples for each pattern. But please take my word for it: when I say a certain error is very common, it really is.
P.S. Many of the examples mentioned were posted with an expiration time limit and are no longer available. That's why I don't give links to them.
Many programming tasks involve checking a number of conditions, and students may easily get confused or make typos while implementing them. Here's a typical example:
int main()
{
  int n,a,b,c;
  cin >> n;
  for(int i=0;i<n;i++)
  {
    cin >> a >> b >> c;
    if((a % 2==0 && b % 2 ==0 && c % 2!=0)||
       (a % 2==0 && b % 2!=0 && c % 2==0)||
       (a % 2!=0 && b % 2==0 && c % 2==0)||
       (a % 2!=0 && b % 2 !=0 && c % 2==0)||
       (a % 2==0 && b % 2!=0 && c % 2!=0)|| // <=
       (a % 2==0 && b % 2!=0 && c % 2!=0))  // <=
    {
      cout << "1";
    }
    else
      cout << "2";
  }
  cout << endl;
  return 0;
}
CppCat's diagnostic message: V501 There are identical sub-expressions '(a % 2 == 0 && b % 2 != 0 && c % 2 != 0)' to the left and to the right of the '||' operator. jtzrihcg.cpp 14
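The programmer has to perform some tricky checks on the values of the three variables. It looks like the code was copy-pasted and some lines were not edited properly. As a result, the last two lines of the condition are identical. Judging by the other lines, the condition should be true when the three numbers are neither all even nor all odd, so the last line was most likely meant to be (a % 2!=0 && b % 2==0 && c % 2!=0): that odd-even-odd combination is the only one missing.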
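This interpretation is an inference from the five distinct lines. If it is correct, the whole condition can be collapsed into a single expression that leaves no room for copy-paste slips. Below is a minimal sketch; the function name hasMixedParity is hypothetical.

```cpp
// Hypothetical helper: true when a, b and c are neither all even nor all odd.
// Comparing "is odd" flags instead of raw remainders keeps the check correct
// for negative numbers, where x % 2 evaluates to -1 rather than 1.
bool hasMixedParity(int a, int b, int c)
{
    const bool oddA = a % 2 != 0;
    const bool oddB = b % 2 != 0;
    const bool oddC = c % 2 != 0;
    // Mixed parity means "not all three flags are equal".
    return !(oddA == oddB && oddB == oddC);
}
```

With this version, hasMixedParity(3, 4, 5) returns true. The student's condition prints "2" for the same input because the odd-even-odd case is missing.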
Another example:
int main() {
  ....
  } else if(gesucht < geraten) {
    puts("Ein bisschen zu klein");  // "A bit too small"
  } else if (gesucht < geraten) {
    puts("Ein bisschen zu gross");  // "A bit too big"
  }
  ....
}
CppCat's diagnostic message: V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 41, 43. wrgkuuzr.cpp 40
The (gesucht < geraten) check is performed twice, although the two branches print different strings. The second check was most likely meant to be (gesucht > geraten).
By the way, in both examples the error is in the last line of the construct. Once again, we have come across the "last line effect".
The fact that array items in C++ are numbered starting with 0 makes the language harder to learn. The idea itself is not difficult to understand, but you have to train yourself to keep it in mind all the time and never go beyond the array bounds. If you need the 10th item of an array, it's very tempting to write A[10]. For example:
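Here is a sketch of the usual shape of such an if/else-if chain, where each branch tests a different relation. The names secret, guess, and hint and the English messages are hypothetical. The excerpt does not show how the student's variables are used.

```cpp
#include <string>

// Hypothetical helper: each condition in the chain tests a different
// relation, so no branch can duplicate an earlier one.
std::string hint(int secret, int guess)
{
    if (guess < secret)
        return "too small";
    else if (guess > secret)
        return "too big";
    return "correct";
}
```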
int main()
{
  ....
  int rodnecs[10];
  ....
  VelPol1 = rodnecs[1] + rodnecs[3] + rodnecs[5] +
            rodnecs[8] + rodnecs[10];
  ....
}
CppCat's diagnostic message: V557 Array overrun is possible. The '10' index is pointing beyond array bound. 0z3x9b3i.cpp 38
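Suppose the student meant the 1st, 3rd, 5th, 8th and 10th items. This is an assumption, since the intent is not visible from the fragment. In that case the zero-based indices are 0, 2, 4, 7 and 9.

The minimal sketch below uses std::array. Its at() member checks the index at run time and throws std::out_of_range instead of silently reading past the end. The names sumSelected and index10Throws are hypothetical.

```cpp
#include <array>
#include <stdexcept>

// Assumption: 1-based positions 1, 3, 5, 8, 10 were intended,
// which are zero-based indices 0, 2, 4, 7, 9.
int sumSelected(const std::array<int, 10>& rodnecs)
{
    // at() validates each index and throws std::out_of_range on error.
    return rodnecs.at(0) + rodnecs.at(2) + rodnecs.at(4) +
           rodnecs.at(7) + rodnecs.at(9);
}

// Returns true if the original out-of-range index 10 is rejected by at().
bool index10Throws(const std::array<int, 10>& arr)
{
    try {
        (void)arr.at(10);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```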
Another one:
void main()
{
  ....
  double pop[3][3];
  ....
  for (int i = 0; i<3; i++)
  {
    calc_y[i] = F(pop[i][1], pop[i][2], pop[i][3], x[i]);
  }
  ....
}
CppCat's diagnostic message: V557 Array overrun is possible. The '3' index is pointing beyond array bound. 1uj9v9xs.cpp 48
There are plenty of incorrect comparisons in loop conditions:
int main()
{
  int i,pinakas[20],temp,temp2,max,min,sum=0;
  for (i=1;i<=20;i++)
  {
    pinakas[i]=rand();
    ....
  }
CppCat's diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 20. 287ep6c0.cpp 20
There are lots of them:
int main()
{
  const int arraySize = 10;
  int a[arraySize];
  int key,index,to_do = arraySize - 1;
  bool did_swap = true;
  srand(time(NULL));
  for (int i = 0; i <= arraySize; i++)
  {
    //generating random number between 1 - 100
    a[i] = rand() % 100 + 1;
  }
  ....
}
CppCat's diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 10. wgk1lx3u.cpp 18
All the other errors are similar to those cited above, so let's stop here.
Hey! I think I've finally figured out why everyone calls uninitialized variables the most common and dangerous error in C/C++ programming, although I don't see it that often when checking projects with PVS-Studio.
Why? Probably because people suffer so much from this mistake while learning the language that they learn to be careful and gradually stop making it. But the memory remains, so if you ask them what they are most afraid of, they'll say, "uninitialized variables".
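The fix for both loops is the same: start at 0 and use a strict "<" comparison with the array size. Below is a minimal sketch based on the second example. The names fillRandom and sumAll are hypothetical. The range-based for loop in sumAll avoids writing the bound by hand at all.

```cpp
#include <cstdlib>

const int arraySize = 10;

// Valid indices are 0 .. arraySize - 1, hence "i < arraySize", not "<=".
void fillRandom(int (&a)[arraySize])
{
    for (int i = 0; i < arraySize; i++)
        a[i] = std::rand() % 100 + 1;   // value between 1 and 100
}

// A range-based for loop visits every element without an explicit bound.
int sumAll(const int (&a)[arraySize])
{
    int total = 0;
    for (int value : a)
        total += value;
    return total;
}
```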
Here's a very simple case:
int main()
{
  ....
  int n,k=0, liczba=n, i=1;
  ....
}
CppCat's diagnostic message: V614 Uninitialized variable 'n' used. 1hvefw6r.cpp 92
Also, there's a risk of handling a list incorrectly:
void erase(List * Lista){
  List* pom;
  pom->next = Lista->next;
  Lista->next= pom;
  delete pom;
}
CppCat's diagnostic message: V614 Uninitialized pointer 'pom' used. 6gpsgjuy.cpp 54
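The intent of erase() is not visible from the fragment. A plausible reading, which is an assumption, is that it should remove the node following Lista. Under that assumption, pom must be taken from the list before it is dereferenced.

The List structure below is a hypothetical stand-in, since the student's definition is not shown. The name eraseNext is also hypothetical.

```cpp
// Hypothetical singly linked list node (the original List type is not shown).
struct List {
    int value;
    List* next;
};

// Assumption: the goal is to remove the node that follows Lista.
void eraseNext(List* Lista)
{
    if (Lista == nullptr || Lista->next == nullptr)
        return;                  // nothing to remove
    List* pom = Lista->next;     // pom now points at a real node
    Lista->next = pom->next;     // unlink it from the list
    delete pom;                  // and free it
}
```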
You may also accidentally write a loop with an unpredictable number of iterations:
void main()
{
  int i,n;
  imie* ime[20];
  string nazwa;
  string kobieta="Kobiece imina: ";
  wpr_dane();
  for (i = 1; i < n; i++)
  {
    ....
  }
CppCat's diagnostic message: V614 Uninitialized variable 'n' used. 8kns8hyn.cpp 63
Another way is to use a variable first and only then assign it a value:
int main() {
  int n1;
  int n2;
  std::vector<int> vec1(n1);
  std::vector<int> vec2(n2);
  std::cin >> n1;
  for (int i = 0; i < n1; i++) {
    std::cin >> vec1[i];
  }
  std::cin >> n2;
  for (int j = 0; j < n2; j++) {
    std::cin >> vec2[j];
  }
  ....
}
CppCat's diagnostic messages:
I don't think there's any point in citing more examples. But please believe me, students do tend to shoot themselves in the foot with uninitialized variables in a variety of ways.
Of course, I've come across a number of other, very diverse mistakes in students' lab work tasks, but I cannot distinguish any other groups of bugs as large as the ones described above. There are a few noticeable ones, though: incorrect array size calculation, misplaced semicolons, premature loop termination, incorrect array handling, and WTF code.
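For the vector example, the fix is to read the size first and create the vector afterwards. Below is a minimal sketch, assuming input in the form of a count followed by that many integers. readVector is a hypothetical helper.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: reads a count, then that many integers.
// The vector is constructed only after n holds a value read from input.
std::vector<int> readVector(std::istream& in)
{
    int n = 0;                          // initialized even if the read fails
    if (!(in >> n) || n < 0)
        return {};
    std::vector<int> vec(n);
    for (int i = 0; i < n; i++)
        in >> vec[i];
    return vec;
}

// Example: std::istringstream in("3 10 20 30"); readVector(in) yields {10, 20, 30}.
```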
Many novice programmers have a hard time learning to understand that a pointer and an array are two different entities in C/C++. As a result, you may often see code like this:
int arrayLen(int p[])
{
  return(sizeof(p)/sizeof(*p));
}
CppCat's diagnostic message: V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (p)' expression. seprcjvw.cpp 147
The arrayLen() function is not used anywhere though. Probably because it doesn't work. :)
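One way to keep the length is to take the array by reference, so that its size N becomes part of the parameter type. A plain int p[] parameter is adjusted to int* and carries no size at all. Below is a minimal sketch; note that C++17 also provides std::size in <iterator>, which works the same way for built-in arrays.

```cpp
#include <cstddef>

// The array is taken by reference, so its length N is deduced from the type.
template <typename T, std::size_t N>
constexpr std::size_t arrayLen(const T (&)[N])
{
    return N;
}
```

Passing a pointer to this arrayLen does not compile. That is arguably better than silently returning the wrong value.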
Another example:
bool compare_mas(int * mas, int * mas2){
  //calculating number of items of first array
  const auto mas_size = sizeof(mas) / sizeof(mas[0]);
  //calculating number of items of second array
  const auto mas2_size = sizeof(mas2) / sizeof(mas2[0]);
  ....
}
CppCat's diagnostic messages:
Mistakes of this kind are not as common as I expected. There are some, but I wouldn't call this a widespread mistake in students' lab work tasks.
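Since a pointer carries no size, the sizes have to come from somewhere else. The body of compare_mas is elided, so the sketch below assumes it checks two arrays for equality. It shows two options:

- passing std::vector, which knows its own size;
- passing the sizes explicitly alongside raw pointers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Option 1: std::vector knows its size; operator== compares sizes and elements.
bool compare_mas(const std::vector<int>& mas, const std::vector<int>& mas2)
{
    return mas == mas2;
}

// Option 2: with raw pointers, the caller must supply both sizes.
bool compare_mas(const int* mas, std::size_t mas_size,
                 const int* mas2, std::size_t mas2_size)
{
    return mas_size == mas2_size && std::equal(mas, mas + mas_size, mas2);
}
```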
A typical example of a misplaced semicolon:
vector sum(vector m[],int N){
  vector sum,tmp;
  for (int i=0;i<N;i++);
  {
    tmp.a=m[i].a;
    tmp.b=m[i].b;
    tmp.c=m[i].c;
    sum.a+=tmp.a;
    sum.b+=tmp.b;
    sum.c+=tmp.c;
  }
  return sum.a,sum.b,sum.c;
}
CppCat's diagnostic message: V529 Odd semicolon ';' after 'for' operator. knadcqde.cpp 122
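The stray semicolon is not the only issue here:

- sum may never be zeroed, depending on how the type is defined.
- return sum.a,sum.b,sum.c; uses the comma operator, which evaluates and discards sum.a and sum.b and yields only sum.c.

Below is a minimal sketch of what was probably intended. Vec3 is a hypothetical stand-in for the student's vector type, whose definition is not shown.

```cpp
// Hypothetical stand-in for the student's "vector" type (not std::vector).
struct Vec3 {
    double a = 0, b = 0, c = 0;   // start from zero so the sum is well defined
};

// No semicolon after the for header, so the braces really are the loop body,
// and the whole struct is returned instead of a comma expression.
Vec3 sum(const Vec3 m[], int n)
{
    Vec3 total;
    for (int i = 0; i < n; i++)
    {
        total.a += m[i].a;
        total.b += m[i].b;
        total.c += m[i].c;
    }
    return total;
}
```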
There are a few examples where a loop is accidentally terminated earlier than it should be:
int main()
{
  ....
  for (long long j = sled.size()-1; j > i; j --)
  {
    sled[j] = '0';
    des = 1;
    break;
  }
  ....
}
CppCat's diagnostic message: V612 An unconditional 'break' within a loop. XHPquVXs.cpp 31
In a few tasks, I came across Pascal-style array indexing, i.e. indices separated by a comma. The code still compiles, but it obviously executes incorrectly:
void build_maze(){
  // tablica przechowujaca informacje o odwiedzonych polach
  bool ** tablica = new bool *[n];
  ....
  if (tablica[aktualny.x - 1, aktualny.y] == false){
  ....
}
CppCat's diagnostic message: V520 The comma operator ',' in array index expression '[aktualny.x - 1, aktualny.y]'. qqxjufye.cpp 125
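Inside the brackets, the comma operator evaluates aktualny.x - 1, discards the result and yields aktualny.y. The expression therefore reads a whole row pointer, tablica[aktualny.y], instead of a single cell. C++20 also deprecates a comma expression used directly as a subscript.

Below is a minimal sketch contrasting the two forms. commaIndex and isVisited are hypothetical names. The (void) cast only silences the unused-value warning; it does not change what the comma operator does.

```cpp
// Same value as the index in tablica[x - 1, y]: the left operand is discarded.
int commaIndex(int x, int y)
{
    return ((void)(x - 1), y);
}

// Correct access to an array of rows: one subscript per dimension.
bool isVisited(bool** tablica, int x, int y)
{
    return tablica[x][y];
}
```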
Students also sometimes forget that an array returned from a function must outlive the call. Returning a pointer to a local array leaves the caller with a dangling pointer:
int *mul3(int *a)
{
  int mem = 0;
  int b[1001];
  for (int i = 100; i >= 0; i--)
  {
    int x = a[i] * 3 + mem;
    mem = x / 10;
    b[i] = x % 10;
  }
  return b;
}
CppCat's diagnostic message: V558 Function returns the pointer to temporary local object: b. hqvgtwvr.cpp 89
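One option is to return a std::vector<int>, which owns its storage and is moved or copied out to the caller. Below is a minimal sketch, based on these assumptions:

- The digits of the number are stored one per element.
- As in the original, the carry moves toward index 0, and a final carry is dropped.
- Unlike the original, the loop covers the whole input rather than indices 100 down to 0.

```cpp
#include <vector>

// Multiplies a number stored as decimal digits by 3 and returns the digits
// in a vector that the caller owns, so nothing points to a dead local array.
std::vector<int> mul3(const std::vector<int>& a)
{
    std::vector<int> b(a.size(), 0);
    int mem = 0;                                  // carry
    for (int i = static_cast<int>(a.size()) - 1; i >= 0; i--)
    {
        int x = a[i] * 3 + mem;
        mem = x / 10;
        b[i] = x % 10;
    }
    return b;
}
```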
There are code fragments I can only describe as WTF. Perhaps someone asked a classmate to explain where the mistake in his or her program was. But I find it more likely that the task was about studying that very array overrun issue, and the comments, written in Macedonian, point that way.
Here is one full example:
#include <iostream>
using namespace std;
int main()
{
  int a[10];
  for(int i=0; i<50; i++)
    cout << a[i] << endl;
  //this loop will either crash or print undefined values
  //(they don't have to be 0)
  //if you put 500000 instead of 50, it will almost certainly crash
  int b[10];
  for(int i=0; i<50; i++)
  {
    b[i] = i;
    cout << b[i] << endl;
  }
  //this loop will sometimes work, sometimes not. the problem with
  //out-of-bounds index errors is that they sometimes work as
  //expected, so the bug is hard to find
}
There are quite a lot of other errors! For instance, incorrect use of the printf() function. But these are so trivial that I don't even feel like discussing them.
However, there were some quite exotic kinds of errors:
void zmienne1()
{
  ....
  int a,b,c,d;
  cin >> a >> b >> c >> d;
  if(a == b == c == d)
  ....
}
CppCat's diagnostic message: V709 Suspicious comparison found: 'a == b == c'. Remember that 'a == b == c' is not equal to 'a == b && b == c'. b5lt64hj.cpp 284
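The expression a == b == c == d is parsed as ((a == b) == c) == d. Each comparison yields a bool, which is converted to 0 or 1 and compared with the next number. Two consequences:

- chained(2, 2, 2, 2) is false, because (2 == 2) becomes 1 and 1 == 2 fails.
- chained(3, 4, 0, 1) is true, although the values differ.

Below is a minimal sketch; the function names are hypothetical.

```cpp
// What the student wrote: ((a == b) == c) == d, comparing bools with ints.
bool chained(int a, int b, int c, int d)
{
    return a == b == c == d;
}

// What was intended: all four values are equal.
bool allEqual(int a, int b, int c, int d)
{
    return a == b && b == c && c == d;
}
```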
Here's one more example of a rare kind of error (if you don't count compiler warnings, of course):
const long AVG_PSYCHO = 0.8;
const long AVG_GRAD = 1.2;
CppCat's diagnostic messages:
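Both constants have type long, so the floating-point initializers are truncated toward zero: AVG_PSYCHO becomes 0 and AVG_GRAD becomes 1. Below is a minimal sketch comparing the two declarations. The _LONG names are hypothetical and are added only to show the truncated values side by side.

```cpp
// Integer constants initialized from floating-point literals: truncated.
const long AVG_PSYCHO_LONG = 0.8;   // value is 0
const long AVG_GRAD_LONG = 1.2;     // value is 1

// Declaring the constants as double keeps the fractional part.
const double AVG_PSYCHO = 0.8;
const double AVG_GRAD = 1.2;
```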
But we have to stop, I'm afraid. I hope you enjoyed reading this article and I managed to persuade some of you to try CppCat.
I can foresee a possible question: "Why don't you make an online code analysis system? There could be a form where you paste your code and click 'Analyze' to have it checked. Or, since you are monitoring the pastebin.com site, why not publish the analysis results somewhere?"
I'm sure there's no need for that. I can name three reasons why, so please don't start any debates on this topic.
The reasons are:
Dear students and teachers! We will be glad to see you among our users. I hope students will become highly skilled professionals and persuade their future co-workers to purchase PVS-Studio for team work.