Not so long ago, one of our colleagues left the team and joined a company that develops software for embedded systems. There is nothing extraordinary about that: in every firm, people come and go all the time, and the choice is driven by the bonuses offered, convenience, and personal preferences. What we find interesting is something else. Our ex-colleague is sincerely worried about the quality of the code he deals with at his new job, and that concern has resulted in this joint article. You see, once you have figured out what static analysis is all about, you just don't feel like settling for "simply programming".
There's an interesting phenomenon I keep noticing nowadays. What happens when a software development department turns into a secondary entity, only loosely related to the company's main area of activity? A forest reserve appears. However significant and critical that main activity may be (say, medicine or military equipment), a small swamp forms anyway, where new ideas get stuck and ten-year-old technologies stay in use.
Here are a couple of extracts from the correspondence of a man working in the software development department at a nuclear power plant:
And then he says, "Why do we need git? Look here, I've got it all written down in my paper notebook."
...
And do you have any version control at all?
Two guys use git. The rest of the team use numbered zip archives at best, though it's only one person with the zips I'm sure about.
Don't be scared. Software developed at nuclear power plants serves many different purposes, and nobody has abolished hardware safety systems. In that particular department, people collect and process statistical data. Yet the tendency toward swamping is pretty obvious. I don't know why it happens, but the fact is certain. What's interesting is that the larger the company, the stronger the swamping effect.
I want to point out that stagnation in large companies is an international phenomenon; things are quite the same abroad. There is an article on the subject, but I don't remember its title. I spent quite some time trying to find it, but in vain; if somebody knows it, please send me the link so I can post it. In that article, a programmer tells how he worked at some military department. It was, naturally, terribly secret and bureaucratic, so secret and bureaucratic that it took several months to agree on which level of access permissions he could be granted on his own computer. As a result, he was writing a program in Notepad (without compiling it) and was soon fired for inefficiency.
Now let's get back to our ex-colleague. Arriving at his new office, he experienced something of a culture shock. You see, after spending so much time and effort studying and working with static analysis tools, it's very painful to watch people ignore even compiler warnings. It's like a separate world, where people program according to their own canons and even use their own terms. He told me some stories about it, and most of all I liked the phrase "grounded pointers", common among the local programmers. See how close they are to the hardware?
We are proud to have raised, inside our team, a skilled specialist who cares about code quality and reliability. He hasn't silently accepted the established state of affairs; he is trying to improve it.
As a start, he did the following: he studied the compiler warnings, then checked the project with Cppcheck, and, besides fixing what he found, thought about how to prevent the typical mistakes in the future.
One of his first steps was preparing a paper aimed at improving the quality of the code the team creates. Introducing and integrating a static code analyzer into the development process might be the next step. It will certainly not be PVS-Studio: first, they work under Linux; second, it's very difficult to sell a software product to such companies. So he has chosen Cppcheck for now, a very fine tool for getting started with the static analysis methodology.
I invite you to read the paper he has prepared, titled "The Way You Shouldn't Write Programs". Many of the items may look like they were written in Captain Obvious style; however, they are all real problems he is trying to address.
Ignoring compiler warnings. When there are many of them in the list, it's easy to miss genuine errors in freshly written code. That's why you should address them all.
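For instance, with GCC or Clang (a safe assumption, given the team's Linux environment), one can enable the main warning sets and make any warning fail the build:

g++ -Wall -Wextra -Werror file.cpp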
In the condition of an 'if' statement, a variable is assigned a value instead of being compared with that value:
if (numb_numbc[i] = -1) { }
The code compiles in this case, but the compiler produces a warning. The correct code is shown below:
if (numb_numbc[i] == -1) { }
The statement "using namespace std;" written in header files may cause using this namespace in all the files which include this header, which in turn may lead to calling wrong functions or occurrence of name collisions.
Comparing signed variables to unsigned ones:
unsigned int BufPos;
std::vector<int> ba;
....
if (BufPos * 2 < ba.size() - 1) { }
Keep in mind that mixing signed and unsigned variables forces implicit conversions to the unsigned type. The code sample above incorrectly handles the case of an empty 'ba' array: the expression "ba.size() - 1" is evaluated in the unsigned type size_t, so when the array contains no items, it wraps around to the maximum value (0xFFFFFFFFu on a 32-bit platform) instead of -1.
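One way to avoid the wrap-around, assuming the intent of the original comparison is kept, is to move the subtraction to the other side of the comparison, where it becomes an addition:

if (BufPos * 2 + 1 < ba.size()) { }

Now nothing is subtracted from an unsigned value, and an empty array simply makes the condition false.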
Neglecting to use constants may let hard-to-find bugs slip through. For example:
void foo(std::string &str)
{
if (str = "1234")
{
}
}
The '=' operator is mistakenly used instead of '=='. If 'str' were declared as a reference to const, the code would not compile at all.
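A sketch of the const-correct signature; with it, the erroneous assignment is rejected at compile time:

void foo(const std::string &str)
{
  if (str == "1234")  // 'str = "1234"' would no longer compile here
  {
  }
}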
Pointers to strings are compared instead of strings themselves:
char TypeValue [4];
...
if (TypeValue == "S") {}
Even if the string "S" is stored in the 'TypeValue' array, this comparison matches addresses, not contents, so it will always return 'false'. The correct way to compare the strings is to use the 'strcmp' or 'strncmp' function.
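The corrected comparison might look like this ('TypeValue' must hold a null-terminated string):

#include <cstring>
....
if (strcmp(TypeValue, "S") == 0) { }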
Buffer overflow:
memset(prot.ID, 0, sizeof(prot.ID) + 1);
Because of the "+ 1", this code also clears one byte of the memory area right after 'prot.ID'.
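The correct call clears exactly the field itself:

memset(prot.ID, 0, sizeof(prot.ID));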
Don't mix up sizeof() and strlen(). The sizeof operator returns the full size of an object in bytes; the strlen() function returns the length of a string in characters, not counting the terminating null.
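A small illustration of the difference (the buffer name is made up):

#include <cstring>
....
char buf[16] = "abc";
size_t a = sizeof(buf);  // 16: the size of the whole array in bytes
size_t b = strlen(buf);  // 3: the characters before the terminating '\0'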
Buffer underflow:
struct myStruct
{
float x, y, h;
};
myStruct *ptr;
....
memset(ptr, 0, sizeof(ptr));
In this case, only N bytes are cleared instead of the whole '*ptr' structure, where N is the pointer size on the current platform. The correct way is to use the following code:
myStruct *ptr;
....
memset(ptr, 0, sizeof(*ptr));
Incorrect expression:
if (0 < L < 2 * M_PI) { }
The compiler sees no error here, yet the expression is meaningless: the subexpression '0 < L' evaluates to 0 or 1, which is then compared with '2 * M_PI', so the whole condition is always true. The compiler generates a warning for such expressions. The correct version of this code is:
if (0 < L && L < 2 * M_PI) { }
unsigned int K;
....
if (K < 0) { }
...
if (K == -1) { }
Unsigned variables cannot be less than zero, so the first condition is always false. The second one is no better: '-1' is implicitly converted to the unsigned type, so 'K == -1' actually compares 'K' with UINT_MAX.
Comparing a variable to a value it can never reach. For example:
short s;
...
if (s == 0xaaaa) { }
On typical platforms, a 'short' is 16 bits and cannot hold the value 0xaaaa (43690), so the condition is never true. The compiler produces warnings against such things.
Memory is allocated with 'new' or 'malloc' but is never freed with the corresponding 'delete'/'free'. It may look something like this:
void foo()
{
std::vector<int> *v1 = new std::vector<int>;
std::vector<int> v2;
v2.push_back((*v1)[0]);
...
}
Perhaps it was a pointer to 'std::vector<int>' that used to be stored in 'v2'. Now, after modifications elsewhere in the code, only 'int' values are stored there, yet the memory allocated for 'v1' is still not freed, since that wasn't necessary in earlier times. To fix the code, we should add the statement 'delete v1' at the end of the function, or use smart pointers.
Even better is to finish the refactoring and make 'v1' a local object, since you no longer need to pass it anywhere:
void foo()
{
std::vector<int> v1;
std::vector<int> v2;
v2.push_back(v1[0]);
...
}
Memory is allocated with 'new[]' but freed with 'delete'; or, vice versa, memory is allocated with 'new' and freed with 'delete[]'.
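A minimal sketch of the mismatch and the correct pairing:

int *arr = new int[10];
....
// delete arr;  // wrong: scalar 'delete' on memory from 'new[]' is undefined behavior
delete[] arr;   // correct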
Using uninitialized variables:
int sum;
...
for (int i = 0; i < 10; i++)
{
sum++;
}
In C/C++, local variables are not initialized to zero by default. Sometimes code only seems to run well; in fact, it's merely luck.
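The fix is to initialize the accumulator explicitly:

int sum = 0;
...
for (int i = 0; i < 10; i++)
{
  sum++;
}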
A function returns a reference or pointer to local objects:
char* CreateName()
{
char FileName[100];
...
return FileName;
}
On leaving the function, 'FileName' will point to an already freed memory area, since all local objects are created on the stack, so the returned pointer cannot be used safely.
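One safe alternative, as a sketch, is to return an object that owns its memory, for example std::string:

#include <string>

std::string CreateName()
{
  std::string FileName;
  ...
  return FileName;
}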
Values returned by functions are not checked, even though a function may return an error code or '-1' on failure. The function reports an error while we keep working without noticing it, let alone reacting to it, and at some point the program suddenly crashes. Defects like that take a long time to debug.
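A sketch of the habit being asked for, using the standard 'fopen' as an example ('data.bin' is just a placeholder name):

#include <cstdio>

bool LoadData()
{
  FILE *f = fopen("data.bin", "rb");
  if (f == NULL)
    return false;  // report the failure instead of using 'f' blindly
  ....
  fclose(f);
  return true;
}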
Neglecting static and dynamic analysis tools, as well as the writing and use of unit tests.
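Since the team has settled on Cppcheck, even a single command run regularly is a start (the 'src' directory is an assumption about the project layout):

cppcheck --enable=all src/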
Being too stingy with parentheses in arithmetic expressions, which results in the following:
D = ns_vsk.bit.D_PN_ml + (int)(ns_vsk.bit.D_PN_st) << 16;
In this case, the addition is executed first and only then the left shift, because '+' has a higher precedence than '<<' (see "Operation priorities in C/C++"). Judging by the program logic, the operations must be executed in the reverse order: shift first, then addition.
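Assuming that intent, parentheses fix the expression:

D = ns_vsk.bit.D_PN_ml + ((int)(ns_vsk.bit.D_PN_st) << 16);

A similar error occurs in the following fragment: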
#define A 1
#define B 2
#define TYPE A | B
if (type & TYPE) { }
The error here is that the programmer forgot to enclose the TYPE macro in parentheses. The condition expands to 'type & A | B', which is evaluated as '(type & A) | B', and since B is nonzero, the condition is always true.
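The fixed macro, with the parentheses in place:

#define TYPE (A | B)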
Array index out of bounds:
int mas[3];
mas[0] = 1;
mas[1] = 2;
mas[2] = 3;
mas[3] = 4;
The expression 'mas[3] = 4;' addresses a non-existent array item: the declaration 'int mas[N]' means its items can be indexed only within the range [0 ... N-1].
The precedences of the logical operators '&&' and '||' are mixed up; '&&' has the higher precedence. Example of bad code:
if (A || B && C) { }
This may not match the intended logic. It's often assumed that logical expressions are evaluated strictly from left to right; the compiler generates warnings for such suspicious fragments.
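Explicit parentheses remove the ambiguity, whichever grouping was actually meant:

if ((A || B) && C) { }   // if left-to-right grouping was intended
if (A || (B && C)) { }   // the grouping the compiler applies to the original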
An assigned value will have no effect outside the function:
void foo(int *a, int b)
{
if (b == 10)
{
*a = 10;
}
else
{
a = new int;
}
}
Inside the function, 'a' is a local copy of the caller's pointer, so assigning it a new address (as in the 'else' branch) has no effect outside the function and leaks the allocated memory. To actually change the caller's pointer, declare the function in one of the following ways:
void foo(int *&a, int b) {....}
or:
void foo(int **a, int b) {....}
I haven't drawn any deep or significant conclusions. I'm only sure that in one particular place, the situation with software development is beginning to improve, and that's pleasant.
On the other hand, it makes me sad that many people haven't even heard of static analysis, and these are often people responsible for serious and important work. Programming is a fast-developing area, so those who are constantly "working at work" fail to keep track of the industry's current tendencies and tools, and eventually come to work much less efficiently than freelancers and programmers in startups and small companies.
Thus we get a strange situation: a young freelancer can do the work better (because he has the knowledge: TDD, continuous integration, static analysis, version control systems, and so on) than a programmer who has worked for ten years at Russian Railways, at a nuclear power plant, or at some other large enterprise (substitute your own variant). Thank God it's far from always like that, but still it happens.
Why does this make me sad? I wish we could sell PVS-Studio to them, but they don't even suspect that such tools exist and can be useful. :)