How to make fewer errors at the stage of code writing. Part N3

Introduction
1. Process variables in the same order as they are defined
2. Table-driven methods are good
3. Various interesting things
Instead of summary
References

This is the third article where I will tell you about a couple of new programming methods that can help you make your code simpler and safer.

You may read the previous two posts here [1] and here [2]. This time we will take samples from the Qt project.

Introduction

It was not accidentally that I got the Qt 4.7.3. project for investigation. PVS-Studio users noticed that analysis is a bit weak when it comes to checking projects based on the Qt library. It's no wonder. What enables static analysis to detect errors is studying the code at a higher level than a compiler. Consequently, it must know certain code patterns and what functions of various libraries do. Otherwise, it will overlook many nice defects. Let me explain this by an example:

if (strcmp(My_Str_A, My_Str_A) == 0)

It's unreasonable to compare a string to itself. But the compiler keeps silent, it doesn't think about the essence of strcmp() function; the compiler has its own business. But static analyzers might suspect there's something wrong here. Qt has its own type of a string comparison function - qstrcmp(). Therefore, the analyzer must be taught to pay attention to this line:

if (qstrcmp(My_Str_A, My_Str_A) == 0)

Studying the Qt library and creating specialized diagnostics is a large and regular work. Verification of the library itself has become the beginning of this work.

On finishing studying the warnings, several new ideas occurred to me on how to improve the source code and I hope you will find these ideas interesting and useful as well.

1. Process variables in the same order as they are defined

The code of the Qt library is of a very high quality and it's almost free of errors. But we found a lot of unnecessary initializations, comparisons and variable value copying.

Here are a couple of samples to make the point clearer:

QWidget *WidgetFactory::createWidget(...)
{
  ...
  } else if (widgetName == m_strings.m_qDockWidget) { <<<===
    w = new QDesignerDockWidget(parentWidget);            
  } else if (widgetName == m_strings.m_qMenuBar) {
    w = new QDesignerMenuBar(parentWidget);
  } else if (widgetName == m_strings.m_qMenu) {
    w = new QDesignerMenu(parentWidget);
  } else if (widgetName == m_strings.m_spacer) {
    w = new Spacer(parentWidget);
  } else if (widgetName == m_strings.m_qDockWidget) { <<<===
    w = new QDesignerDockWidget(parentWidget);
  ...
}

One and the same comparison is repeated twice here. This is not an error but an absolutely excessive code. This is another similar example:

void QXmlStreamReaderPrivate::init()
{
  tos = 0;  <<<===
  scanDtd = false;
  token = -1;
  token_char = 0;
  isEmptyElement = false;
  isWhitespace = true;
  isCDATA = false;
  standalone = false;
  tos = 0;  <<<===
  ...
}

Again it is not an error but an absolutely unnecessary duplicated variable initialization. I've found a lot of such duplicated operations in the code. They occur because of long lists of comparisons, assignments and initializations. The programmer just doesn't see that a variable is already being processed and introduces excessive operations. I can name three unpleasant consequences of such duplicated actions:

1. Duplicates lengthen the code. The longer is the code, the more probable it is that you will add one more duplicate.

2. If we want to change the program's logic and remove one check or one assignment, a duplicate of this operation will present us with several hours of captivating debugging. Imagine you write 'tos = 1' (see the first sample) and then wonder why 'tos' still equals zero in a different part of the program.

3. Operation slowdown. You can usually ignore it in such cases, but it is still there.

I hope I have managed to persuade you that there must be no duplicates in your code. How to fight them? Usually such initializations/comparisons go in a block. There is also a similar block of variables. It is reasonable to write code so that the order of defining variables and the order of handling them coincides. Below is an example of not so good source code:

struct T {
  int x, y, z;
  float m;
  int q, w, e, r, t;
} A;
...
A.m = 0.0;
A.q = 0;
A.x = 0;
A.y = 0;
A.z = 0;
A.q = 0;
A.w = 0;
A.r = 1;
A.e = 1;
A.t = 1;

This is just a conceptual sample, of course. The point is that when initialization is not sequential, you are more inclined to write two identical lines. In the code above, the 'q' variable is initialized twice. And the error is not clearly visible when you are just glancing through the code. Now, if you initialize the variables in the same sequence as they are defined, such an error will simply have no chance of occurring. Here is the improved version of the source code:

struct T {
  int x, y, z;
  float m;
  int q, w, e, r, t;
} A;
...
A.x = 0;
A.y = 0;
A.z = 0;
A.m = 0.0;
A.q = 0;
A.w = 0;
A.e = 1;
A.r = 1;
A.t = 1;

Of course, I know that sometimes you cannot do so (utilize variables in the same order as they are defined). But it is often possible and useful. One more advantage of this method is that the code navigation is much simpler.

Recommendation. While adding a new variable, try to initialize and handle it in correspondence to its position in relation to other variables.

2. Table-driven methods are good

S. McConnell wrote very well on table-driven methods in the book "Code Complete", in chapter N18 [3]:

A table-driven method is a scheme that allows you to look up information in a table rather than using logic statements ( if and case ) to figure it out. Virtually anything you can select with logic statements, you can select with tables instead. In simple cases, logic statements are easier and more direct. As the logic chain becomes more complex, tables become increasingly attractive.

Well, it's a pity that programmers still prefer huge switch()'s or thick forests of if-else constructs. It is very hard to overcome this habit. You are thinking: "well, one more case" or "this small 'if' won't do any harm". But it will. Sometimes even skillful programmers poorly add new conditions. Here are a couple of examples of defects found in Qt.

int QCleanlooksStyle::pixelMetric(...)
{
  int ret = -1;
  switch (metric) {
    ...
    case PM_SpinBoxFrameWidth:
      ret = 3;
      break;
    case PM_MenuBarItemSpacing:
      ret = 6;
    case PM_MenuBarHMargin:
      ret = 0;
      break;
    ...
}

A very-very long switch() it was. And, naturally, there is a lost 'break' operator. The analyzer found this error by finding out that the 'ret' variable is assigned different values one after another twice.

It would probably be much better if the programmer defined a std::map<PixelMetric, int> and used a table to explicitly define the correspondence between metrics and numbers. You can also work out some other versions of table-driven methods for this function's implementation.

One more example:

QStringList ProFileEvaluator::Private::values(...)
{
  ...
  else if (ver == QSysInfo::WV_NT)
    ret = QLatin1String("WinNT");
  else if (ver == QSysInfo::WV_2000)
    ret = QLatin1String("Win2000");
  else if (ver == QSysInfo::WV_2000)  <<<=== 2003
    ret = QLatin1String("Win2003");
  else if (ver == QSysInfo::WV_XP)
    ret = QLatin1String("WinXP");
  ...
}

The 'ver' variable is compared to the WV_2000 constant twice. It is a good example where the table-driven method would work quite well. For instance, this method could look like that:

struct {
  QSysInfo::WinVersion m_ver;
  const char *m_str;
} Table_WinVersionToString[] = {
  { WV_Me,   "WinMe" },
  { WV_95,   "Win95" },
  { WV_98,   "Win98" },
  { WV_NT,   "WinNT" },
  { WV_2000, "Win2000" },
  { WV_2003, "Win2003" },
  { WV_XP,   "WinXP" },
  { WV_VISTA,"WinVista" }
};

ret = QLatin1String("Unknown");
for (size_t i = 0; i != count_of(Table_WinVersionToString); ++i)
  if (Table_WinVersionToString[i].m_ver == ver)
    ret = QLatin1String(Table_WinVersionToString[i].m_str);

This is just conceptual, of course, but it demonstrates the idea of table-driven methods very well. You agree that it's much easier to find an error in this table, don't you?

Recommendation. Do not be lazy to write a function using table-driven methods. Yes, it will take you some time but it will be repaid later. Adding new conditions will be easier and faster while errors will be much less probable.

3. Various interesting things

Since Qt is a large library, you may come across various errors in it despite the high quality. That's the law of large numbers which starts working here. The size of *.cpp, *.h and other similar files of the Qt project is about 250 Mbytes. No matter how unlikely an error is, you may well come across it in a large source code. I can't give you any recommendations on the basis of other errors I have found in Qt. So I will just describe some errors I liked.

QString decodeMSG(const MSG& msg)
{
  ...
  int repCount     = (lKeyData & 0xffff);        // Bit 0-15
  int scanCode     = (lKeyData & 0xf0000) >> 16; // Bit 16-23
  bool contextCode = (lKeyData && 0x20000000);   // Bit 29
  bool prevState   = (lKeyData && 0x40000000);   // Bit 30
  bool transState  = (lKeyData && 0x80000000);   // Bit 31
  ...
}

The && operator is used accidentally instead of &. Note how useful it is to have comments in code: you can see clearly that it's an error and how bits must be actually processed.

The next example is to the issue of long expressions:

static ShiftResult shift(...)
{
  ...
  qreal l = (orig->x1 - orig->x2)*(orig->x1 - orig->x2) +
            (orig->y1 - orig->y2)*(orig->y1 - orig->y1) *
            (orig->x3 - orig->x4)*(orig->x3 - orig->x4) +
            (orig->y3 - orig->y4)*(orig->y3 - orig->y4);
  ...
}

Can you see an error? Right, you can't see it right away. Ok, I will prompt you. The problem is here: "orig->y1 - orig->y1". I'm also confused by the third multiplication, but perhaps it should be so.

Yes, one more question. You have such blocks of calculations in your programs too, don't you? Isn't it time to try the PVS-Studio static code analyzer? Well, a bit of advertisement that was. Ok, let's go on.

Using of uninitialized variables. You may find them in any large application:

PassRefPtr<Structure> 
Structure::getterSetterTransition(Structure* structure)
{
  ...
  RefPtr<Structure> transition = create(
    structure->storedPrototype(), structure->typeInfo());
  transition->m_propertyStorageCapacity = 
    structure->m_propertyStorageCapacity;
  transition->m_hasGetterSetterProperties = 
    transition->m_hasGetterSetterProperties;
  transition->m_hasNonEnumerableProperties = 
    structure->m_hasNonEnumerableProperties;
  transition->m_specificFunctionThrashCount = 
    structure->m_specificFunctionThrashCount;
  ...
}

Again I should prompt you not to make you strain your eyes. You should look at variable initialization 'transition->m_hasGetterSetterProperties'.

I'm sure that virtually each of you, when only starting your way in programming, made a mistake like this:

const char *p = ...;
if (p == "12345")

And only then you got aware what for you needed such functions (strange at first sight) as strcmp(). Unfortunately, the C++ language is so much stern that you might make this kind of mistake even many years later being an expert developer:

const TCHAR* getQueryName() const;
...
Query* MultiFieldQueryParser::parse(...)
{
  ...
  if (q && (q->getQueryName() != _T("BooleanQuery") ...
  ...
}

Well, what else can I show you? Here is, for instance, an incorrectly written swap of variables' values.

bool qt_testCollision(...)
{
  ...
  t=x1; x1=x2; x2=t;
  t=y1; x1=y2; y2=t;
  ...
}

This is an example of how you may make a mistake even in a very simple code. Well, I haven't shown you samples on array overrun. Here you are:

bool equals( class1* val1, class2* val2 ) const
{
  ...
  size_t size = val1->size();
  ...
  while ( --size >= 0 ){
    if ( !comp(*itr1,*itr2) )
      return false;
    itr1++;
    itr2++;
  }
  ...
}

The condition "‑‑size >= 0" is always true since the size variable is of the unsigned type. If identical sequences are compared, an array overrun will occur.

I could go on. I hope that you, as programmers, understand that we cannot describe all the errors from a project of the size like that in one article. So, the last one for dessert:

STDMETHODIMP QEnumPins::QueryInterface(const IID &iid,void **out)
{
  ...
  if (S_OK)
    AddRef();
  return hr;
}

There must be something like "if (hr == S_OK)" or "if (SUCCEEDED(hr))". The S_OK macro is nothing more than 0. That's why the bug with incorrect calculation of number of references is inevitable.

Instead of summary

Thank you for your attention. Use static code analysis to save a lot of time for more useful things than code debugging and maintenance.

I will also appreciate if you, the readers, will send me samples of interesting errors you found in your own code or somebody else's code, for which we could implement diagnostic rules.

References

Andrey Karpov. How to make fewer errors at the stage of code writing. Part N1. http://www.viva64.com/en/a/0070/
Andrey Karpov. How to make fewer errors at the stage of code writing. Part N2. http://www.viva64.com/en/a/0072/
3.Steve McConnell, "Code Complete, 2nd Edition" Microsoft Press, Paperback, 2nd edition, Published June 2004, 914 pages, ISBN: 0-7356-1967-0.