>
>
Ineffectiveness of last() in the real w…

Andrey Karpov
Articles: 673

Ineffectiveness of last() in the real world

While studying at the institute and learning different data processing algorithms, I already knew that the necessity of using such a function as last() for one-way list can indicate an unfortunate choice of the data structure and lead to ineffective algorithm. Although I knew it long ago, I haven't faced this in practice until recently.

It began with a complaint that the static analyzer Viva64 hangs when analyzing one project, or rather one file of the project. Analysis showed that the analyzer doesn't hang at all and on the contrary works very hard. Very hard and very long. I think even this analyzer can analyze it successfully in approximately 5 or 10 hours :). But this became clear later and of course such behavior should be considered an error.

Frankly speaking, at the beginning I was astonished by this behavior. Viva64 analyzer operates rather quickly, and average time of analyzing one file takes just a few seconds. And its operating time is much less than time needed for preprocessing files. Preprocessing is executed by Visual C++ compiler and we cannot speed up this process yet. That's why, roughly speaking, the time of testing a project is close to the time of preprocessing all the files. And suddenly we discover such a great ineffectiveness in the analysis algorithm!

Investigations showed that hanging occurs when processing the file which contains some resources from Qt library, more exactly array whose size... well, I don't know its size, but I know that this is the largest array I've ever seen. It looks as follows:

static const unsigned char qt_resource_data[] = {
  0x0,0x0,0x0,0xc5,
  0x51,
  0x46,0x72,0x61,0x6d,0x65,0x20,0x7b,
  0xd,0xa,0x20,0x20,0x20,0x20,0x6d,0x69,0x6e,
  0x2d,0x77,0x69,0x64,0x74,0x68,0x3a,0x20,
  0x36,0x70,0x78,0x3b,0xd,0xa,0x20,0x20,
  0x20,0x20,0x6d,0x69,0x6e,0x2d,0x68,0x65,
  0x69,0x67,0x68,0x74,0x3a,0x20,0x36,0x30,
  . . .

This array occupies about 9 Mb in the file. This is a lot but it's not reason for the analyzer for such a bad behavior.

The problem was the already mentioned last() function. Viva64 analyzer is built on VivaCore library. And VivaCore is built in its turn on OpenC++ library from which it inherited data representation as trees and lists. By the way, data processing in OpenC++ (and VivaCore as well) reminds a lisp program very much. There are a lot of functions of the type like Eq, Car, Cdr, First, Rest, Second, Nth, List etc.

And there is also last() function which is used when creating a list of items for initializing the array. In the pseudocode it looks like this:

while (some items still remain)
{
  Ptree *e = to take another item;
  Ptree *last = Last(eList);
  last->SetCdr(Cons(e, 0));
}

In general the point is to take the following array's item and add it to the end of the list. To simplify search of the end we use last() function. OpenC++'s author obviously didn't suggest that the algorithm will be forced to process the array of millions of items. On small arrays it's OK, but to the time of such an algorithm increases in proportion to the squared number of items (to be more exact - the triangular number 0.5 * n * (n + 1), where n is the number of items).

The simplest optimization consisting in keeping the pointer to the last item of the list (what allows you not to call last() function) literally worked a miracle. Processing of the file which earlier took more than 3 hours (I didn't manage to wait more than 3 hours :), now takes several seconds.

Well, all we can say - be careful before you use functions like last(). You can never tell what your algorithm will be sooner or later palmed off. I wish you luck in optimization!