Andrey Karpov

Why Students Need the CppCat Code Analyzer

CppCat is a simple static code analyzer capable of detecting bugs in C/C++ programs. We have started granting free academic licenses to anyone interested (students, teachers, and so on). To popularize CppCat among students, I decided to write this post about errors that can be found in student lab work posted on Pastebin.com.

Unfortunately, we are no longer developing or supporting the CppCat static code analyzer. Please read here for details.

Just a few words about CppCat

CppCat is a static code analyzer that integrates into the Visual Studio environment and lets the user detect a variety of typos and other errors as early as the coding stage. The analyzer can launch automatically after compilation and check freshly written code. The tool supports C, C++, C++/CLI, and C++/CX.

To find out how to get a free CppCat license, see the following article: Free CppCat for Students. I'd just like to add that we grant licenses not only to students but also to postgraduates, teachers, and so on.

Many are upset that CppCat cannot integrate into the free Visual Studio Express environment. Unfortunately, we can't help it because the Express editions of Visual Studio don't support plugins. But it's not a problem: students have access to Microsoft DreamSpark and can therefore get Visual Studio Professional.

Attracting students

My original appeal to students was meant to sound something like this:

Even a student can benefit from static analysis. Why spend time and nerves on hunting a bug in your program when there's CppCat out there to help you? It will not only find the bug quicker but also help you learn more about how you shouldn't write your code. The documentation on CppCat provides a detailed description for each diagnostic and a wide variety of bug examples with tips on how to fix them.

Then I saw it was somewhat forced and exaggerated. Students don't really face serious bugs, do they? After all, there's nothing bad about manually debugging a loop of 10 iterations. It's even useful as a way of training one's practical skills. That's why I decided to rephrase my appeal to students to try CppCat as follows:

When you apply for a job, your employer will appreciate not only your skill in programming and devising tricky algorithms but also your ability to handle the basic toolkit, which is no less important.

Your algorithm is worth nothing, after all, if you lose it because you don't know what a version control system is and the best thing you did was to save a copy of your source code to a flash drive.

That's why it is crucial to study not only programming languages and development environments but also the auxiliary tools.

I don't presume to give you a complete list of all the must-have tools, but you certainly should know at least one version control system, why WinMerge may be needed, what a profiler is, how to create a distribution package and so on.

One of the technologies I recommend you study and mention in your resume is static code analysis. CppCat is an excellent tool to get started with this methodology. It will be a good bonus to your knowledge and a sign to your employer that you know something about code quality.

Now, what we have gathered here for

The boring part is over. I think you've already guessed that we are going to search for errors in students' lab work tasks.

This time, there's no need to say that most of the code fragments we are going to discuss are poor. It goes without saying that students' lab work contains piles of various bugs. So my goal was not to find as many bugs as possible; that's just not interesting. Instead, I tried to identify the bug patterns most common among students. So far I have managed to single out only three such patterns, but we'll get to that a bit later.

You probably want to know where I took the lab work tasks from. Here's the answer.

There is the Pastebin.com site with a convenient service for developers to share their code fragments. Students use this site very actively. Almost every code sample with the C++ tag is a lab work task or an excerpt from it.

We wrote a program that monitors the Pastebin.com site and downloads fresh files with the "C++ code" tag. Having collected over two thousand files, I made a Visual Studio project out of them and checked it. Of course, more than half of the project could not be checked: many files contained only code fragments or text that was not commented out, some headers were missing, and so on. But my goal was not to check as much code published on Pastebin.com as possible anyway. What I did manage to check is quite enough for this article. We keep collecting files, so perhaps there will be another article on this topic.

C++ students' typical mistakes

The number of errors I examined in the lab work tasks was not that great, so there are only three patterns I can distinguish so far.

I won't cite all the examples since they are all alike and uninteresting; I will give just a few from each pattern. But please take my word for it: if I say a certain error is very common, it really is.

P.S. Many of the examples mentioned were posted with an expiration time limit and are no longer available. That's why I won't give links to such pages.

Pattern 1. Third place. Confusing similar conditions

Many programming tasks involve checking a number of conditions, and students may easily get confused or make typos while implementing them. Here's a typical example:

int main()
{
  int n,a,b,c;
  cin >> n;
  for(int i=0;i<n;i++)
  {
    cin >> a >> b >> c;
    if((a % 2==0 && b % 2 ==0 && c % 2!=0)||
       (a % 2==0 && b % 2!=0 && c % 2==0)||
       (a % 2!=0 && b % 2==0 && c % 2==0)||
       (a % 2!=0 && b % 2 !=0 && c % 2==0)||
       (a % 2==0 && b % 2!=0 && c % 2!=0)||   // <=
       (a % 2==0 && b % 2!=0 && c % 2!=0))    // <=
    {
      cout << "1";
    }
    else
      cout << "2";
  }
  cout << endl;
  return 0;
}

CppCat's diagnostic message: V501 There are identical sub-expressions '(a % 2 == 0 && b % 2 != 0 && c % 2 != 0)' to the left and to the right of the '||' operator. jtzrihcg.cpp 14

The programmer has to perform some tricky checks on the values of three variables. It looks like the code was copy-pasted and some lines were not edited properly afterwards. As a result, the last two lines of the condition are identical.
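Long chains of near-identical sub-expressions are easy to get wrong, so it may help to express the rule once. The sketch below assumes that the six branches were meant to cover every case where the three numbers are neither all even nor all odd (the five distinct branches in the sample are consistent with that reading, but the original task is unknown). The function name hasMixedParity is hypothetical.

```cpp
// Hypothetical helper: true when a, b and c are neither all even nor all odd.
// Assumption: this is what the six-branch condition was meant to express.
bool hasMixedParity(int a, int b, int c)
{
    // (x % 2 != 0) is true for odd x, including negative odd values;
    // each bool converts to 0 or 1 when added.
    const int oddCount = (a % 2 != 0) + (b % 2 != 0) + (c % 2 != 0);
    return oddCount == 1 || oddCount == 2;
}
```

With a helper like this, the loop body would reduce to a single call such as `cout << (hasMixedParity(a, b, c) ? "1" : "2");`, leaving no room for a duplicated line.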

Another example:

int main() {
  ....
  } else if(gesucht < geraten) {
    puts("Ein bisschen zu klein");
  } else if (gesucht < geraten) {
    puts("Ein bisschen zu gross");
  }
  ....
}

V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 41, 43. wrgkuuzr.cpp 40

The (gesucht < geraten) check is executed twice although different strings should be output.

By the way, the error is found in the last line in both examples. Again we have come across the "last line effect".

Pattern 2. Second place. Array overrun by 1 item

The fact that array items in C++ are numbered starting with 0 makes the language harder to learn. The idea itself is not difficult to understand, but you have to train yourself to keep it in mind all the time and never go outside the array boundaries. When you need the 10th item of the array, it is very tempting to write A[10]. For example:

int main()
{
  ....
  int rodnecs[10];
  ....
  VelPol1 = rodnecs[1] + rodnecs[3] + rodnecs[5] +
            rodnecs[8] + rodnecs[10];
  ....
}

CppCat's diagnostic message: V557 Array overrun is possible. The '10' index is pointing beyond array bound. 0z3x9b3i.cpp 38

Another one:

void main()
{
  ....
  double pop[3][3];
  ....
  for (int i = 0; i<3; i++)
  {
    calc_y[i] = F(pop[i][1], pop[i][2], pop[i][3], x[i]);
  }
  ....
}

CppCat's diagnostic message: V557 Array overrun is possible. The '3' index is pointing beyond array bound. 1uj9v9xs.cpp 48

There are plenty of incorrect comparisons in loop conditions:

int main()
{
  int i,pinakas[20],temp,temp2,max,min,sum=0;
  for (i=1;i<=20;i++)
  {
    pinakas[i]=rand();
  ......
}

CppCat's diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 20. 287ep6c0.cpp 20

And there are lots of them:

int main()
{
  const int arraySize = 10;
  int a[arraySize];
  int key,index,to_do = arraySize - 1;
  bool did_swap = true;

  srand(time(NULL));
  for (int i = 0; i <= arraySize; i++)
  {
    //generating random number between 1 - 100
    a[i] = rand() % 100 + 1;
  }
  ....
}

CppCat's diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 10. wgk1lx3u.cpp 18
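For contrast, here is a minimal sketch of a loop that stays inside the array. It is not taken from any of the lab tasks; the function name makeSequence is hypothetical, and std::array is used only because it carries its own size and offers the bounds-checked at() accessor.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: fills a 10-element array with 1..10 and returns it.
std::array<int, 10> makeSequence()
{
    std::array<int, 10> a{};
    // Valid indices run from 0 to a.size() - 1, so the condition uses '<', not '<='.
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // at() throws std::out_of_range on a bad index instead of corrupting memory.
        a.at(i) = static_cast<int>(i) + 1;
    }
    return a;
}
```

The tenth item is `a[9]`; an expression like `a.at(10)` would throw rather than silently read past the end.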

All the other errors are similar to those cited above, so let's stop here.

Pattern 3. First place. Uninitialized variables

Hey! I think I've finally understood why everyone calls "uninitialized variables" the most common and dangerous error in C/C++ programming, although I don't see it that often when checking projects with PVS-Studio.

Why? Probably because people suffer too much from this mistake when studying the language and therefore learn to be careful and gradually stop making it. But the memory still remains, so if you ask them what they are most afraid of, they'll say, "uninitialized variables".

Here's a very simple case:

int main()
{
  ....
  int n,k=0, liczba=n, i=1;
  ....
}

CppCat's diagnostic message: V614 Uninitialized variable 'n' used. 1hvefw6r.cpp 92

Also, there's a risk of handling a list incorrectly:

void erase(List * Lista){
  List* pom;
  pom->next = Lista->next;
  Lista->next= pom;
  delete pom;
}

CppCat's diagnostic message: V614 Uninitialized pointer 'pom' used. 6gpsgjuy.cpp 54

You may also make a loop with a random number of iterations by mistake:

void main()
{
  int i,n;
  imie* ime[20];
  string nazwa;
  string kobieta="Kobiece imina: ";
  wpr_dane();
  for (i = 1; i < n; i++)
  {
    ....
}

CppCat's diagnostic message: V614 Uninitialized variable 'n' used. 8kns8hyn.cpp 63

Another way is to use a variable first and only then set it:

int main() {
  int n1;
  int n2;
  std::vector<int> vec1(n1);
  std::vector<int> vec2(n2);
  std::cin >> n1;
  for (int i = 0; i < n1; i++) {
    std::cin >> vec1[i];
  }
  std::cin >> n2;
  for (int j = 0; j < n2; j++) {
    std::cin >> vec2[j];
  }
  ....
}

CppCat's diagnostic messages:

  • V614 Uninitialized variable 'n1' used. 9r9zdkp6.cpp 25
  • V614 Uninitialized variable 'n2' used. 9r9zdkp6.cpp 26
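The usual remedy is to give every variable a value where it is declared and to create a container only once its size is known. The following is a sketch under that assumption, not the student's intended program; the function name readValues is hypothetical, and it takes a std::istream so that it works with std::cin as well as with an in-memory stream.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // only needed for the in-memory usage shown below
#include <vector>

// Hypothetical helper: reads a count, then that many integers.
std::vector<int> readValues(std::istream& in)
{
    std::size_t n = 0;                 // initialized where it is declared
    if (!(in >> n))
    {
        return {};                     // no valid count available
    }
    std::vector<int> values(n);        // sized only after n has a real value
    for (std::size_t i = 0; i < n; ++i)
    {
        if (!(in >> values[i]))
        {
            values.resize(i);          // keep only what was actually read
            break;
        }
    }
    return values;
}
```

For example, `std::istringstream in("3 10 20 30"); auto v = readValues(in);` yields a vector of three items, and `readValues(std::cin)` reads the same format from the keyboard.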

I don't think it makes sense to cite any more examples. But please believe me, students do tend to shoot themselves in the foot with uninitialized variables in a variety of ways.

Other mistakes

Of course, I've come across a number of other very diverse mistakes in students' lab work tasks. But I cannot distinguish any other groups of bugs as large as the ones described above. There are a few quite noticeable though: incorrect array size calculation, an issue with a semicolon, pre-term loop termination, incorrect array handling, WTF.

Incorrect array size calculation

Many novice programmers have a hard time understanding that a pointer and an array are two different entities in C/C++. As a result, you may often see code like this:

int arrayLen(int p[])
{
   return(sizeof(p)/sizeof(*p));
}

CppCat's diagnostic message: V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (p)' expression. seprcjvw.cpp 147

The arrayLen() function is not used anywhere though. Probably because it doesn't work. :)

Another example:

bool compare_mas(int * mas, int * mas2){
  //calculating number of items of first array
  const auto mas_size = sizeof(mas) / sizeof(mas[0]);

  //calculating number of items of second array
  const auto mas2_size = sizeof(mas2) / sizeof(mas2[0]);
  ....
}

CppCat's diagnostic messages:

  • V514 Dividing sizeof a pointer 'sizeof (mas)' by another value. There is a probability of logical error presence. 0mxbjwbg.cpp 2
  • V514 Dividing sizeof a pointer 'sizeof (mas2)' by another value. There is a probability of logical error presence. 0mxbjwbg.cpp 3
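One way to obtain an array's length safely is to let the compiler deduce it from the array type. The sketch below reuses the name arrayLen from the sample above, but it is my own illustration rather than a fix taken from the student's program. It accepts only real arrays, so passing a pointer is a compile-time error instead of a silently wrong result.

```cpp
#include <cstddef>

// Deduces the element count N from the array type itself.
// A pointer argument does not match 'const T (&)[N]' and fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t arrayLen(const T (&)[N]) noexcept
{
    return N;
}
```

Given `int data[7];`, the call `arrayLen(data)` returns 7. Since C++17 the standard library offers std::size for the same purpose. Once an array has been passed to a function as a pointer, its length is lost and has to be passed as a separate argument.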

A semicolon ';' put in a wrong place

Mistakes of this kind are not as common as I expected. There are some, but I wouldn't call this a widespread mistake in students' lab work.

A typical example:

vector sum(vector m[],int N){
vector sum,tmp;
    for (int i=0;i<N;i++);
    {
    tmp.a=m[i].a;
    tmp.b=m[i].b;
    tmp.c=m[i].c;
    sum.a+=tmp.a;
    sum.b+=tmp.b;
    sum.c+=tmp.c;
    }
    return sum.a,sum.b,sum.c;
}

CppCat's diagnostic message: V529 Odd semicolon ';' after 'for' operator. knadcqde.cpp 122
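Because of the stray semicolon, the for statement has an empty body and the block in braces is no longer part of the loop. Here is a sketch of how such a function might look once that is fixed. The type Vec3 is a hypothetical stand-in for the student's vector type, which is not shown in the sample. The sketch also returns the whole object, because an expression like `sum.a,sum.b,sum.c` uses the comma operator and yields only its last operand.

```cpp
// Hypothetical stand-in for the student's 'vector' type.
struct Vec3
{
    double a = 0.0;
    double b = 0.0;
    double c = 0.0;
};

// Sums n elements component-wise.
Vec3 sum(const Vec3 m[], int n)
{
    Vec3 total;                       // members start at zero
    for (int i = 0; i < n; ++i)       // no ';' here, so the block below is the loop body
    {
        total.a += m[i].a;
        total.b += m[i].b;
        total.c += m[i].c;
    }
    return total;                     // returns the whole object, not a comma expression
}
```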

Pre-term loop termination

There are a few examples where a loop is accidentally terminated earlier than it should be:

int main()
{
  ....
  for (long long j = sled.size()-1; j > i; j --)
  {
    sled[j] = '0';
    des = 1;
    break;
  }
  ....
}

CppCat's diagnostic message: V612 An unconditional 'break' within a loop. XHPquVXs.cpp 31

Incorrect array handling

In a few tasks, I came across examples of Pascal-style array handling, i.e. indexing with a comma. The code still compiles, but the comma operator discards its left operand, so only the last index is actually used and the program works incorrectly:

void build_maze(){
  // tablica przechowujaca informacje o odwiedzonych polach
  bool ** tablica = new bool *[n];
  ....
  if (tablica[aktualny.x - 1, aktualny.y] == false){
  ....
}

CppCat's diagnostic message: V520 The comma operator ',' in array index expression '[aktualny.x - 1, aktualny.y]'. qqxjufye.cpp 125
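In C++, each dimension needs its own pair of brackets. A minimal sketch, assuming the visited-cell table is stored as a vector of vectors (the student's code uses raw pointers; the function name isUnvisited is hypothetical):

```cpp
#include <vector>

// Hypothetical helper: reports whether cell (x, y) has not been visited yet.
bool isUnvisited(const std::vector<std::vector<bool>>& visited, int x, int y)
{
    // 'visited[x, y]' would evaluate x, discard it, and index with y alone.
    // Writing one bracket pair per dimension selects row x, then column y.
    return !visited[x][y];
}
```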

Or, students sometimes forget that a local array ceases to exist when the function returns, so memory for a returned array has to be allocated in a different way:

int *mul3(int *a)
{
  int mem = 0;
  int b[1001];
  for (int i = 100; i >= 0; i--)
  {
    int x = a[i] * 3 + mem;
    mem = x / 10;
    b[i] = x % 10;
  }
  return b;
}

CppCat's diagnostic message: V558 Function returns the pointer to temporary local object: b. hqvgtwvr.cpp 89
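A container that owns its storage can be returned by value without this problem. The sketch below is a hypothetical rewrite of mul3 using std::vector. It assumes the digits are stored most significant first, as the loop direction in the sample suggests, and, unlike the sample, it keeps the final carry.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rewrite: multiplies a number stored as decimal digits
// (most significant digit first) by 3.
std::vector<int> mul3(const std::vector<int>& digits)
{
    std::vector<int> result(digits.size());
    int carry = 0;
    // Walk from the least significant digit (the last one) to the first.
    for (std::size_t i = digits.size(); i-- > 0; )
    {
        const int x = digits[i] * 3 + carry;
        carry = x / 10;
        result[i] = x % 10;
    }
    if (carry != 0)
    {
        result.insert(result.begin(), carry);   // the number grew by one digit
    }
    return result;   // the vector owns its storage, so returning it by value is safe
}
```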

WTF

There are code fragments I can't call anything other than WTF. Perhaps someone asked a classmate to explain where the mistake in his or her program was. But what I find more likely is that the task itself was about studying that very array overrun issue. Unfortunately, I don't know what the comments say.

Here is one full example:

#include <iostream>
using namespace std;
int main()
{
    int a[10];
    for(int i=0; i<50; i++)
        cout << a[i] << endl;
    //ovoj loop ili kje krashne ili kje ti nedefinirani vrednost
    //(ne mora da bidat 0)
    //ako namesto 50 stavis 500000, skoro sigurno kje krashne
    int b[10];
    for(int i=0; i<50; i++)
    {
        b[i] = i;
        cout << b[i] << endl;
    }
    //ovoj loop nekogas kje raboti, nekogas ne. problemot so
    //out-of-bounds index errori e sto nekogas rabotat kako
    //sto treba, pa greskata tesko se naogja
}

What else wasn't included in the article

Quite a lot! For instance, examples of incorrect use of the printf() function. But these are just so trivial I don't even feel like discussing them.

However, there were some quite exotic kinds of errors:

void zmienne1()
{
  ....
  int a,b,c,d;
  cin >> a >> b >> c >> d;
  if(a == b == c == d)
  ....
}

CppCat's diagnostic message: V709 Suspicious comparison found: 'a == b == c'. Remember that 'a == b == c' is not equal to 'a == b && b == c'. b5lt64hj.cpp 284
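The chained form is parsed from left to right, and each comparison produces a bool that is then compared with the next number. The sketch below puts the two versions side by side; both function names are hypothetical.

```cpp
// Hypothetical helper: true only when all four values are equal.
constexpr bool allEqual(int a, int b, int c, int d)
{
    return a == b && b == c && c == d;
}

// The chained form, kept only to show how 'a == b == c == d' is evaluated:
// ((a == b) == c) == d, where each comparison yields a bool (0 or 1).
constexpr bool chainedEqual(int a, int b, int c, int d)
{
    return ((a == b) == c) == d;
}
```

For four twos, allEqual returns true while chainedEqual returns false, because `2 == 2` gives 1 and 1 is not equal to 2. For the values 5, 5, 1, 1 it is the other way round.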

Here's one more example of a rare kind (if you don't look at the compiler warnings of course):

const long AVG_PSYCHO = 0.8;
const long AVG_GRAD = 1.2;

CppCat's diagnostic messages:

  • V674 The '0.8' literal of the 'double' type is assigned to a variable of the 'long' type. Consider inspecting the '= 0.8' expression. 2k2bmnpz.cpp 21
  • V674 The '1.2' literal of the 'double' type is assigned to a variable of the 'long' type. Consider inspecting the '= 1.2' expression. 2k2bmnpz.cpp 22
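Converting a floating-point value to an integer type discards the fractional part, so the two constants above end up holding 0 and 1. A minimal sketch of the difference follows. Declaring the averages as double is my assumption about what the student intended, and the TRUNCATED_ names are hypothetical.

```cpp
// The constants from the example, declared with a floating-point type.
constexpr double AVG_PSYCHO = 0.8;
constexpr double AVG_GRAD = 1.2;

// What the 'long' versions actually stored: the fractional part is discarded.
constexpr long TRUNCATED_PSYCHO = static_cast<long>(0.8);   // 0
constexpr long TRUNCATED_GRAD = static_cast<long>(1.2);     // 1
```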

But we have to stop, I'm afraid. I hope you enjoyed reading this article and I managed to persuade some of you to try CppCat.

Why we don't intend to make an online analyzer

I can foresee a possible question: "Why don't you make an online code analysis system, a form where you can paste your code and click 'Analyze' to have it checked? Or, since you are monitoring the pastebin.com site, why not upload the analysis results somewhere?"

I'm sure there's no need for that. I can name three reasons why, so please don't start any debates on this topic.

The reasons are:

  • Neither we nor users need this. For us, it means a lot of additional work, while users won't get anything new. They can simply download and install PVS-Studio or CppCat and carry out all the experiments they wish. A demo version will be more than enough for this purpose. "Paste and check your code" forms are usually offered by companies whose demo version you cannot easily download. Ours you can. Moreover, our demo version doesn't have any functionality limitations. Also, someone may complain they don't have Windows but really want to try our tool. But since they don't have Windows, they are not our customers anyway.
  • Such a system greatly distorts the estimate of the static analyzer's capabilities. Here's an article about that: Myths about static analysis. The fifth myth - a small test program is enough to evaluate a tool. We want people to try the analyzer on their real-life projects, not synthetic samples.
  • As I've already said, we don't feel like checking synthetic samples. But a full-blown analysis of a large project is too difficult to implement from the viewpoint of infrastructure. To learn more about it, see this interview. To put it briefly, we would have to create a complex system for uploading source files and libraries, setting up build parameters, and so on. Otherwise, you won't get a full-blown analysis. So it turns out that downloading and installing the analyzer and checking your project yourself is a much easier way.

Conclusion

Dear students and teachers! We will be glad to see you as our users. I hope students become highly skilled professionals and persuade their future co-workers to purchase PVS-Studio for teamwork.
