>
>
>
A 64-bit horse that can count

Andrey Karpov
Articles: 671

A 64-bit horse that can count

The article concerns the peculiarities of Visual C++ compiler's behavior when generating 64-bit code and possible errors relating to it.

Introduction

The phenomenon of "The Clever Hans", Mr. von Osten's horse, was described in 1911 [1]. The Clever Hans was famous because of his ability to read and solve mathematical problems by tapping with his front hoof. Of course, there were a lot of skeptics. That's why a team of experts tested Hans' abilities and proved that the horse was showing them without any help of Mr. von Osten. But how could a common horse possess such an intellectual level - a human one?! The psychologist O. Pfungst carried out some very thorough experiments and discovered that Hans received very faint unintentional hints from those who were asking him questions. For example, when people asked Hans about anything they started to stare at his front hoof with the help of which the horse "answered". But as soon as Hans had tapped the right number, they raised their eyes or head just a little waiting for him to finish his answer. And the horse, that had been trained to note and use these very subtle motions considered them as signals to stop his action. From aside it looked as if the horse had given the right answer to the question.

Such a wonderful horse it was that counted and solved arithmetic problems although he was unable to do it. 64-bit programs turned out to be such digital horses of the beginning of the 21st century, many of which cannot count either although are successful in pretending to do so. Let's consider this phenomenon in detail.

1. Potential errors

I am the author and co-author of some articles devoted to the problems of developing 64-bit applications. You can see the articles on our site. In these articles, I try to use the term "a potential error" or "a hidden error" rather than just "an error" [2, 3, 4].

This is explained by that one and the same code can be viewed upon as both correct and incorrect depending on its purpose. A simple example - using a variable of int type for indexing an array's items. If we address an array of graphics windows with the help of this variable, everything is alright. We never need, and moreover it is impossible, to operate billions of windows. But when we use a variable of int type for indexing an array's items in 64-bit mathematical programs or data bases, it can well be a problem when the number of items excesses 0..INT_MAX range.

But there is one more much subtler reason to call errors "potential". The point is that it depends not only on the input data but on the mood of the compiler's optimizer if an error occurs or not. I have been avoiding this topic for a long time for most of such errors occur explicitly in the debug-version and only in release-versions they are "potential". But not every program built as debug can be debugged at large data sizes. There is a situation when the debug-version is tested only at very small sizes of data. And overload testing and testing by end users at actual data is performed only in release-versions where errors can be temporarily hidden. That's why I decided to tell you what I know about it. I hope that I will manage to persuade you that it is dangerous to rely only on the checks of the execution stage (unit-tests, dynamic analysis, manual testing) when porting a program on a different platform. You will say that all this is meant for promoting Viva64 tool. Yes, you are right, but still read the horror stories I'm going to tell you. I am fond of telling them.

2. How it all begun

- Why do you have two identical JMPs in a row in your code?

- What if the first one wouldn't work?

I faced the peculiarities of Visual C++ 2005 compiler's optimization for the first time when developing PortSample program. This is a project included into Viva64 distribution kit and is intended for demonstrating all the errors which Viva64 analyzer diagnoses. The examples included into this project must work correctly in 32-bit mode and cause errors in 64-bit one. Everything was alright in the debug-version but I faced difficulties in the release-version. The code which was to lead to a hang or crash in 64-bit mode worked successfully! The cause lay in optimization. The solution consisted in additional redundant complication of the examples' code and adding "volatile" key words which you can see in PortSample project in a great number.

The same relates to Visual C++ 2008. The code differs a bit but everything written in this article can be applied both to Visual C++ 2005 and Visual C++ 2008. We won't make any difference between them further.

If you think that it is good that some errors don't occur, refuse this thought. Code with such errors becomes very unstable and a smallest change of it not relating directly to an error can cause change of the code's behavior. To make sure, I would like to point out that this is not the fault of the compiler but of the hidden defects of the code. Further, we will show sample phantom errors which disappear and occur in release-versions when smallest alterations of the code are introduced and which you have to hunt for a long time.

3. Phantoms

The section will be long and boring, so I will begin with a funny story which is an abstract of the section:

Once Heracles was walking by a lake and there he saw Hydra. He ran up to her and cut her single head off. But instead of one head two more grew. Heracles cut them off too but 4 more appeared. He cut the 4 heads off - and there were 8 ones... So passed one hour, two hours, three hours... And then Heracles cut Hydra's 32768 heads off and Hydra died for she was 16-bit.

Like in this funny story errors lie in types' overflow which can occur or fail to occur depending on the code the compiler will generate when optimization is enabled. Let's consider the first example of the code which works in release mode although it shouldn't be so:

int index = 0;
size_t arraySize = ...;
for (size_t i = 0; i != arraySize; i++)
  array[index++] = BYTE(i);

This code fills the whole array with values correctly even if the array's size is much larger than INT_MAX. Theoretically it is impossible because index variable has int type. Some time later, because of the overflow access to items by a negative index must occur. But optimization leads to generating the following code:

0000000140001040  mov         byte ptr [rcx+rax],cl 
0000000140001043  add         rcx,1 
0000000140001047  cmp         rcx,rbx 
000000014000104A  jne         wmain+40h (140001040h)

As you can see, 64-bit registers are used and there is no overflow. But let's alter the code a bit:

int index = 0;
for (size_t i = 0; i != arraySize; i++)
{
  array[index] = BYTE(index);
  ++index;
}

Let's consider that the code look more beautiful this way. I think you will agree that functionally it remains the same. But the result will be quite different - a program crash will occur. Let's examine the code generated by the compiler:

0000000140001040  movsxd      rcx,r8d 
0000000140001043  mov         byte ptr [rcx+rbx],r8b 
0000000140001047  add         r8d,1 
000000014000104B  sub         rax,1 
000000014000104F  jne         wmain+40h (140001040h)

That very overflow occurs that must occur in the previous example as well. r8d = 0x80000000 register's value extends into rcx as 0xffffffff80000000. The consequence is writing outside the limits of the array.

Let's consider another example of optimization and see how easy it is to spoil everything:

unsigned index = 0;
for (size_t i = 0; i != arraySize; ++i) {
  array[index++] = 1;
  if (array[i] != 1) {
    printf("Error\n");
    break;
  }
}

Assembler code:

0000000140001040  mov         byte ptr [rdx],1 
0000000140001043  add         rdx,1 
0000000140001047  cmp         byte ptr [rcx+rax],1 
000000014000104B  jne         wmain+58h (140001058h) 
000000014000104D  add         rcx,1 
0000000140001051  cmp         rcx,rdi 
0000000140001054  jne         wmain+40h (140001040h)

The compiler decided to use 64-bit register rdx for storing index variable. As a result the code may correctly process arrays with the size more than UINT_MAX.

But the world is fragile. It is enough just to complicate the code a bit and it becomes incorrect:

volatile unsigned volatileVar = 1;
...
unsigned index = 0;
for (size_t i = 0; i != arraySize; ++i) {
  array[index] = 1;
  index += volatileVar;
  if (array[i] != 1) {
    printf("Error\n");
    break;
  }
}

Using "index += volatileVar;" expression instead of index++ leads to participation of 32-bit registers in the code and therefore occurrence of overflows:

0000000140001040  mov        ecx,r8d 
0000000140001043  add        r8d,dword ptr [volatileVar (140003020h)] 
000000014000104A  mov        byte ptr [rcx+rax],1 
000000014000104E  cmp        byte ptr [rdx+rax],1 
0000000140001052  jne        wmain+5Fh (14000105Fh) 
0000000140001054  add        rdx,1 
0000000140001058  cmp        rdx,rdi 
000000014000105B  jne        wmain+40h (140001040h)

In conclusion I will give an interesting but large example. Unfortunately, I didn't manage to abridge it because it was necessary to show the behavior. It is this why such errors are dangerous for you cannot foresee the consequence of a smallest alteration of the code.

ptrdiff_t UnsafeCalcIndex(int x, int y, int width) {
  int result = x + y * width;
  return result;
}
...
int domainWidth = 50000;
int domainHeght = 50000;
for (int x = 0; x != domainWidth; ++x)
  for (int y = 0; y != domainHeght; ++y)
    array[UnsafeCalcIndex(x, y, domainWidth)] = 1;

This code cannot fill correctly the array consisting of 50000*50000 items. It is impossible because when calculating "int result = x + y * width;" an overflow must occur.

Miraculously the array is filled correctly in the release-version. UnsafeCalcIndex function integrates inside the loop and 64-bit registers are used:

0000000140001052  test        rsi,rsi 
0000000140001055  je          wmain+6Ch (14000106Ch) 
0000000140001057  lea         rcx,[r9+rax] 
000000014000105B  mov         rdx,rsi 
000000014000105E  xchg        ax,ax 
0000000140001060  mov         byte ptr [rcx],1 
0000000140001063  add         rcx,rbx 
0000000140001066  sub         rdx,1 
000000014000106A  jne         wmain+60h (140001060h) 
000000014000106C  add         r9,1 
0000000140001070  cmp         r9,rbx 
0000000140001073  jne         wmain+52h (140001052h)

All this takes place because UnsafeCalcIndex function is simple and can be integrate easily. But once you make it a bit more complicated or the compiler considers that it shouldn't be integrated, an error occurs at large data sizes.

Let's modify (complicate) UnsafeCalcIndex function a bit. Pay attention that the function's logic has not been changed at all:

ptrdiff_t UnsafeCalcIndex(int x, int y, int width) {
  int result = 0;
  if (width != 0)
    result = y * width;
  return result + x;
}

The result is a program crash when the array's limits are exceeded:

0000000140001050  test        esi,esi 
0000000140001052  je          wmain+7Ah (14000107Ah) 
0000000140001054  mov         r8d,ecx 
0000000140001057  mov         r9d,esi 
000000014000105A  xchg        ax,ax 
000000014000105D  xchg        ax,ax 
0000000140001060  mov         eax,ecx 
0000000140001062  test        ebx,ebx 
0000000140001064  cmovne      eax,r8d 
0000000140001068  add         r8d,ebx 
000000014000106B  cdqe             
000000014000106D  add         rax,rdx 
0000000140001070  sub         r9,1 
0000000140001074  mov         byte ptr [rax+rdi],1 
0000000140001078  jne         wmain+60h (140001060h) 
000000014000107A  add         rdx,1 
000000014000107E  cmp         rdx,r12 
0000000140001081  jne         wmain+50h (140001050h)

I think you have become bored by this moment. I am sorry. I just wanted to show you how simply an efficient 64-bit program may fail after introducing most harmless alterations into it or building it by another version of the compiler.

4. Diagnosis of potential errors

A program is a sequence of processing errors. (c) An unknown author

I suppose that many already existing 64-bit applications or those which will be soon ported on 64-bit systems, can suddenly spring more and more unpleasant surprises. A lot of defects may be found in them when increasing the size of input data which was unavailable for processing in 32-bit systems. Hidden defects can suddenly occur during further modification of the program code or change of libraries or a compiler.

Like in the story about the horse, the first impression can be deceptive. It can only seem to you that your program processes large data sizes successfully. You need to perform a more thorough check to see exactly if your 64-bit horse can actually count.

To make sure that a 64-bit program is correct, the minimum thing you can do is to use not only the release-version but the debug-version as well at all stages of testing. Keep in mind that it is a necessary but far not sufficient condition. If your tests use data sets which, for example, don't cover a large main memory size, an error can fail to occur both in release- and debug-versions [5]. It is necessary to extend unit-tests and data sets for overload and manual testing. It is necessary to make algorithms process new data combinations which are available only in 64-bit systems [6].

An alternative way of diagnosing 64-bit errors lies in using static analysis tools. It is much more radical and safe than guessing if you have added enough tests or not. It is convenient for it doesn't demand using the debug-version for crunching gigabytes of data.

The point of the method is to perform a full analysis of a project for a single time when porting the program and look through all the diagnostic messages on suspicious sections in the code. Many are frightened off by the list of thousands and tens of thousands of warnings. But the total time spent at once on analyzing them will be much less than the time spent on correcting various bug-reports appearing literally from nowhere for many years. It will be those very phantoms described above. Besides, when you start working with the list of warnings you will soon find out that most of them can be filtered and there will be much less work than you have expected. Further, you will have only to use static analysis for a new code and it doesn't take much time.

Of course, when speaking about a toolkit for searching 64-bit phantoms, I offer the tool that we develop - Viva64. By the way, this tool will soon be included into PVS-Studio which will unite all our static analysis tools.

To be more objective and avoid constantly being driven out from sites with this article as an advertising one, I will mention other tools as well. We should list Gimpel PC-Lint and Parasoft C++test. Rules for testing 64-bit errors are implemented in them too, but they possess less diagnostic abilities than a highly tailored Viva64 [7]. There is also Abraxas CodeCheck in the new version of which (14.5) functions of diagnosing 64-bit errors are also implemented but I don't possess more detailed information about it.

Conclusion

I will be glad if this article helps you master new platforms easier, for you will know what hidden problems can occur. Thank you for attention.

References