Webinar: C++ semantics - 06.11
The article concerns the peculiarities of Visual C++ compiler's behavior when generating 64-bit code and possible errors relating to it.
The phenomenon of "The Clever Hans", Mr. von Osten's horse, was described in 1911 [1]. The Clever Hans was famous because of his ability to read and solve mathematical problems by tapping with his front hoof. Of course, there were a lot of skeptics. That's why a team of experts tested Hans' abilities and proved that the horse was showing them without any help of Mr. von Osten. But how could a common horse possess such an intellectual level - a human one?! The psychologist O. Pfungst carried out some very thorough experiments and discovered that Hans received very faint unintentional hints from those who were asking him questions. For example, when people asked Hans about anything they started to stare at his front hoof with the help of which the horse "answered". But as soon as Hans had tapped the right number, they raised their eyes or head just a little waiting for him to finish his answer. And the horse, that had been trained to note and use these very subtle motions considered them as signals to stop his action. From aside it looked as if the horse had given the right answer to the question.
Such a wonderful horse it was that counted and solved arithmetic problems although he was unable to do it. 64-bit programs turned out to be such digital horses of the beginning of the 21st century, many of which cannot count either although are successful in pretending to do so. Let's consider this phenomenon in detail.
I am the author and co-author of some articles devoted to the problems of developing 64-bit applications. You can see the articles on our site. In these articles, I try to use the term "a potential error" or "a hidden error" rather than just "an error" [2, 3, 4].
This is explained by that one and the same code can be viewed upon as both correct and incorrect depending on its purpose. A simple example - using a variable of int type for indexing an array's items. If we address an array of graphics windows with the help of this variable, everything is alright. We never need, and moreover it is impossible, to operate billions of windows. But when we use a variable of int type for indexing an array's items in 64-bit mathematical programs or data bases, it can well be a problem when the number of items excesses 0..INT_MAX range.
But there is one more much subtler reason to call errors "potential". The point is that it depends not only on the input data but on the mood of the compiler's optimizer if an error occurs or not. I have been avoiding this topic for a long time for most of such errors occur explicitly in the debug-version and only in release-versions they are "potential". But not every program built as debug can be debugged at large data sizes. There is a situation when the debug-version is tested only at very small sizes of data. And overload testing and testing by end users at actual data is performed only in release-versions where errors can be temporarily hidden. That's why I decided to tell you what I know about it. I hope that I will manage to persuade you that it is dangerous to rely only on the checks of the execution stage (unit-tests, dynamic analysis, manual testing) when porting a program on a different platform. You will say that all this is meant for promoting Viva64 tool. Yes, you are right, but still read the horror stories I'm going to tell you. I am fond of telling them.
- Why do you have two identical JMPs in a row in your code?
- What if the first one wouldn't work?
I faced the peculiarities of Visual C++ 2005 compiler's optimization for the first time when developing PortSample program. This is a project included into Viva64 distribution kit and is intended for demonstrating all the errors which Viva64 analyzer diagnoses. The examples included into this project must work correctly in 32-bit mode and cause errors in 64-bit one. Everything was alright in the debug-version but I faced difficulties in the release-version. The code which was to lead to a hang or crash in 64-bit mode worked successfully! The cause lay in optimization. The solution consisted in additional redundant complication of the examples' code and adding "volatile" key words which you can see in PortSample project in a great number.
The same relates to Visual C++ 2008. The code differs a bit but everything written in this article can be applied both to Visual C++ 2005 and Visual C++ 2008. We won't make any difference between them further.
If you think that it is good that some errors don't occur, refuse this thought. Code with such errors becomes very unstable and a smallest change of it not relating directly to an error can cause change of the code's behavior. To make sure, I would like to point out that this is not the fault of the compiler but of the hidden defects of the code. Further, we will show sample phantom errors which disappear and occur in release-versions when smallest alterations of the code are introduced and which you have to hunt for a long time.
The section will be long and boring, so I will begin with a funny story which is an abstract of the section:
Once Heracles was walking by a lake and there he saw Hydra. He ran up to her and cut her single head off. But instead of one head two more grew. Heracles cut them off too but 4 more appeared. He cut the 4 heads off - and there were 8 ones... So passed one hour, two hours, three hours... And then Heracles cut Hydra's 32768 heads off and Hydra died for she was 16-bit.
Like in this funny story errors lie in types' overflow which can occur or fail to occur depending on the code the compiler will generate when optimization is enabled. Let's consider the first example of the code which works in release mode although it shouldn't be so:
int index = 0;
size_t arraySize = ...;
for (size_t i = 0; i != arraySize; i++)
array[index++] = BYTE(i);
This code fills the whole array with values correctly even if the array's size is much larger than INT_MAX. Theoretically it is impossible because index variable has int type. Some time later, because of the overflow access to items by a negative index must occur. But optimization leads to generating the following code:
0000000140001040 mov byte ptr [rcx+rax],cl
0000000140001043 add rcx,1
0000000140001047 cmp rcx,rbx
000000014000104A jne wmain+40h (140001040h)
As you can see, 64-bit registers are used and there is no overflow. But let's alter the code a bit:
int index = 0;
for (size_t i = 0; i != arraySize; i++)
{
array[index] = BYTE(index);
++index;
}
Let's consider that the code look more beautiful this way. I think you will agree that functionally it remains the same. But the result will be quite different - a program crash will occur. Let's examine the code generated by the compiler:
0000000140001040 movsxd rcx,r8d
0000000140001043 mov byte ptr [rcx+rbx],r8b
0000000140001047 add r8d,1
000000014000104B sub rax,1
000000014000104F jne wmain+40h (140001040h)
That very overflow occurs that must occur in the previous example as well. r8d = 0x80000000 register's value extends into rcx as 0xffffffff80000000. The consequence is writing outside the limits of the array.
Let's consider another example of optimization and see how easy it is to spoil everything:
unsigned index = 0;
for (size_t i = 0; i != arraySize; ++i) {
array[index++] = 1;
if (array[i] != 1) {
printf("Error\n");
break;
}
}
Assembler code:
0000000140001040 mov byte ptr [rdx],1
0000000140001043 add rdx,1
0000000140001047 cmp byte ptr [rcx+rax],1
000000014000104B jne wmain+58h (140001058h)
000000014000104D add rcx,1
0000000140001051 cmp rcx,rdi
0000000140001054 jne wmain+40h (140001040h)
The compiler decided to use 64-bit register rdx for storing index variable. As a result the code may correctly process arrays with the size more than UINT_MAX.
But the world is fragile. It is enough just to complicate the code a bit and it becomes incorrect:
volatile unsigned volatileVar = 1;
...
unsigned index = 0;
for (size_t i = 0; i != arraySize; ++i) {
array[index] = 1;
index += volatileVar;
if (array[i] != 1) {
printf("Error\n");
break;
}
}
Using "index += volatileVar;" expression instead of index++ leads to participation of 32-bit registers in the code and therefore occurrence of overflows:
0000000140001040 mov ecx,r8d
0000000140001043 add r8d,dword ptr [volatileVar (140003020h)]
000000014000104A mov byte ptr [rcx+rax],1
000000014000104E cmp byte ptr [rdx+rax],1
0000000140001052 jne wmain+5Fh (14000105Fh)
0000000140001054 add rdx,1
0000000140001058 cmp rdx,rdi
000000014000105B jne wmain+40h (140001040h)
In conclusion I will give an interesting but large example. Unfortunately, I didn't manage to abridge it because it was necessary to show the behavior. It is this why such errors are dangerous for you cannot foresee the consequence of a smallest alteration of the code.
ptrdiff_t UnsafeCalcIndex(int x, int y, int width) {
int result = x + y * width;
return result;
}
...
int domainWidth = 50000;
int domainHeght = 50000;
for (int x = 0; x != domainWidth; ++x)
for (int y = 0; y != domainHeght; ++y)
array[UnsafeCalcIndex(x, y, domainWidth)] = 1;
This code cannot fill correctly the array consisting of 50000*50000 items. It is impossible because when calculating "int result = x + y * width;" an overflow must occur.
Miraculously the array is filled correctly in the release-version. UnsafeCalcIndex function integrates inside the loop and 64-bit registers are used:
0000000140001052 test rsi,rsi
0000000140001055 je wmain+6Ch (14000106Ch)
0000000140001057 lea rcx,[r9+rax]
000000014000105B mov rdx,rsi
000000014000105E xchg ax,ax
0000000140001060 mov byte ptr [rcx],1
0000000140001063 add rcx,rbx
0000000140001066 sub rdx,1
000000014000106A jne wmain+60h (140001060h)
000000014000106C add r9,1
0000000140001070 cmp r9,rbx
0000000140001073 jne wmain+52h (140001052h)
All this takes place because UnsafeCalcIndex function is simple and can be integrate easily. But once you make it a bit more complicated or the compiler considers that it shouldn't be integrated, an error occurs at large data sizes.
Let's modify (complicate) UnsafeCalcIndex function a bit. Pay attention that the function's logic has not been changed at all:
ptrdiff_t UnsafeCalcIndex(int x, int y, int width) {
int result = 0;
if (width != 0)
result = y * width;
return result + x;
}
The result is a program crash when the array's limits are exceeded:
0000000140001050 test esi,esi
0000000140001052 je wmain+7Ah (14000107Ah)
0000000140001054 mov r8d,ecx
0000000140001057 mov r9d,esi
000000014000105A xchg ax,ax
000000014000105D xchg ax,ax
0000000140001060 mov eax,ecx
0000000140001062 test ebx,ebx
0000000140001064 cmovne eax,r8d
0000000140001068 add r8d,ebx
000000014000106B cdqe
000000014000106D add rax,rdx
0000000140001070 sub r9,1
0000000140001074 mov byte ptr [rax+rdi],1
0000000140001078 jne wmain+60h (140001060h)
000000014000107A add rdx,1
000000014000107E cmp rdx,r12
0000000140001081 jne wmain+50h (140001050h)
I think you have become bored by this moment. I am sorry. I just wanted to show you how simply an efficient 64-bit program may fail after introducing most harmless alterations into it or building it by another version of the compiler.
A program is a sequence of processing errors. (c) An unknown author
I suppose that many already existing 64-bit applications or those which will be soon ported on 64-bit systems, can suddenly spring more and more unpleasant surprises. A lot of defects may be found in them when increasing the size of input data which was unavailable for processing in 32-bit systems. Hidden defects can suddenly occur during further modification of the program code or change of libraries or a compiler.
Like in the story about the horse, the first impression can be deceptive. It can only seem to you that your program processes large data sizes successfully. You need to perform a more thorough check to see exactly if your 64-bit horse can actually count.
To make sure that a 64-bit program is correct, the minimum thing you can do is to use not only the release-version but the debug-version as well at all stages of testing. Keep in mind that it is a necessary but far not sufficient condition. If your tests use data sets which, for example, don't cover a large main memory size, an error can fail to occur both in release- and debug-versions [5]. It is necessary to extend unit-tests and data sets for overload and manual testing. It is necessary to make algorithms process new data combinations which are available only in 64-bit systems [6].
An alternative way of diagnosing 64-bit errors lies in using static analysis tools. It is much more radical and safe than guessing if you have added enough tests or not. It is convenient for it doesn't demand using the debug-version for crunching gigabytes of data.
The point of the method is to perform a full analysis of a project for a single time when porting the program and look through all the diagnostic messages on suspicious sections in the code. Many are frightened off by the list of thousands and tens of thousands of warnings. But the total time spent at once on analyzing them will be much less than the time spent on correcting various bug-reports appearing literally from nowhere for many years. It will be those very phantoms described above. Besides, when you start working with the list of warnings you will soon find out that most of them can be filtered and there will be much less work than you have expected. Further, you will have only to use static analysis for a new code and it doesn't take much time.
Of course, when speaking about a toolkit for searching 64-bit phantoms, I offer the tool that we develop - Viva64. By the way, this tool will soon be included into PVS-Studio which will unite all our static analysis tools.
To be more objective and avoid constantly being driven out from sites with this article as an advertising one, I will mention other tools as well. We should list Gimpel PC-Lint and Parasoft C++test. Rules for testing 64-bit errors are implemented in them too, but they possess less diagnostic abilities than a highly tailored Viva64 [7]. There is also Abraxas CodeCheck in the new version of which (14.5) functions of diagnosing 64-bit errors are also implemented but I don't possess more detailed information about it.
I will be glad if this article helps you master new platforms easier, for you will know what hidden problems can occur. Thank you for attention.
0