To get a trial key
fill out the form below
Team License (a basic version)
Enterprise License (extended version)
* By clicking this button you agree to our Privacy Policy statement

Request our prices
New License
License Renewal
--Select currency--
USD
EUR
GBP
RUB
* By clicking this button you agree to our Privacy Policy statement

Free PVS-Studio license for Microsoft MVP specialists
* By clicking this button you agree to our Privacy Policy statement

To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
Undefined behavior is closer than you t…

Undefined behavior is closer than you think

Feb 05 2016
Author:

Some people think that undefined behavior is caused only by gross errors (accessing outside the bounds of the array, for instance) or inadequate constructions (i = i++ + ++i, for example). That's why it is quite surprising when a programmer sees undefined behavior in the code that used to work correctly, without arousing any suspicion. One should never let his guard down, programming in C/C++. Because hell is closer than you may think.

0374_Undefined_behavior_is_closer_than_you_think/image1.png

Error description

It's been a while since I've written anything on the topic of 64-bit errors. It's time to relive a bit of the past. In this case undefined will appear in a 64-bit program.

Consider incorrect synthetic code fragment.

size_t Count = 1024*1024*1024; // 1 Gb
if (is64bit)
  Count *= 5; // 5 Gb
char *array = (char *)malloc(Count);
memset(array, 0, Count);

int index = 0;
for (size_t i = 0; i != Count; i++)
  array[index++] = char(i) | 1;

if (array[Count - 1] == 0)
  printf("The last array element contains 0.\n");

free(array);

This code works correctly if the programmer compiles a 32-bit version of the program. But if you compile the 64-bit program, everything is way more interesting.

64-bit program allocates a 5 gigabyte array/buffer and initially fills it with zeros. The loop then modifies it, filling it with non-zero values we use "| 1" to ensure this.

And now try to guess how the code will run if it is compiled in x64 mode using Visual Studio 2015? Have you got the answer? If yes, then let's continue.

If you run a debug version of this program, it'll crash because it'll index out of bounds At some point the index variable will overflow and its value will become −2147483648 (INT_MIN).

Sounds logical, right? Nothing of the kind! This is an undefined behavior and really anything can happen.

Additional links:

When I or somebody else says that this is an example of undefined behavior, people start grumbling. I don't know why, but it feels like they assume that they know absolutely everything about C++ and how compilers work.

Usually we get comments like:

"This is some theoretical nonsense. Well, yes, formally the 'int' overflow leads to an undefined behavior. But it's nothing more but some jabbering. In practice, we can always tell what we will get. If you add 1 to INT_MAX then we'll have INT_MIN. Maybe somewhere in the universe there are some exotic architectures, but my Visual C++ / GCC compiler gives a correct result."

And now without any magic, I will give a demonstration of UB using a simple example and not on some fairy architecture either, but a Win64-program.

It would be enough to build the example given above in the Release x64 mode and run it. The program will cease crashing and the warning "the last array element contains 0" won't be issued.

The undefined behavior reveals itself in the following way. The array will be completely filled, in spite of the fact that index variable isn't wide enough to index all the array elements. Those who still don't believe me should have a look at the assembly code:

  int index = 0;
  for (size_t i = 0; i != Count; i++)
000000013F6D102D  xor         ecx,ecx  
000000013F6D102F  nop  
    array[index++] = char(i) | 1;
000000013F6D1030  movzx       edx,cl  
000000013F6D1033  or          dl,1  
000000013F6D1036  mov         byte ptr [rcx+rbx],dl  
000000013F6D1039  inc         rcx  
000000013F6D103C  cmp         rcx,rdi  
000000013F6D103F  jne         main+30h (013F6D1030h)

Here is the UB! And no exotic compilers were used, it's just VS2015.

If you replace 'int' with 'unsigned' the undefined behavior will disappear. The array will be only partially filled and at the end we will have a message - "the last array element contains 0".

Assembly code with the 'unsigned':

  unsigned index = 0;
000000013F07102D  xor         r9d,r9d  
  for (size_t i = 0; i != Count; i++)
000000013F071030  mov         ecx,r9d  
000000013F071033  nop         dword ptr [rax]  
000000013F071037  nop         word ptr [rax+rax]  
    array[index++] = char(i)| 1;
000000013F071040  movzx       r8d,cl  
000000013F071044  mov         edx,r9d  
000000013F071047  or          r8b,1  
000000013F07104B  inc         r9d  
000000013F07104E  inc         rcx  
000000013F071051  mov         byte ptr [rdx+rbx],r8b  
000000013F071055  cmp         rcx,rdi  
000000013F071058  jne         main+40h (013F071040h)

Notes about PVS-Studio

PVS-Studio analyzer doesn't detect character variable overflow straight away. This is not a very rewarding job. It's almost impossible to predict what value one or another variable will have or if the overflow will or will not occur. However, the analyzer can track down some erroneous patterns that it refers to as "64-bit errors".

To tell the truth, there are no 64-bit errors. There are just those that are connected with undefined behavior, that have been hibernating in 32-bit code, and these then show up when we recompile in x64 mode. But if we speak about UB, it wouldn't sound very interesting and people won't buy it (PVS-Studio that is). On top of it all, they won't believe that there are any issues with it. But if the analyzer says that this variable can overflow in the loop, and it is a "64-bit error", then it's a different story. Profit.

The code we gave above is considered incorrect by the analyzer, and it issues a warning related to 64-bit portability issues. The idea is the following: In Win32 mode size_t is 32-bits and we cannot allocate a 5 gigabyte array, so everything works fine. In Win64, where size_t is 64-bits, we work with a bigger array, which requires more memory. So the 64-bit code won't work. 32-bit code works, but the 64-bit code fails. PVS-Studio calls it a 64-bit error.

Here are diagnostic messages that PVS-Studio will issue for the code given in th ebeginning:

  • V127 An overflow of the 32-bit 'index' variable is possible inside a long cycle which utilizes a memsize-type loop counter. consoleapplication1.cpp 16
  • V108 Incorrect index type: array[not a memsize-type]. Use memsize type instead. consoleapplication1.cpp 16

More details about 64-bit traps:

Correct code

You must use proper data types for your programs to run properly. If you are going to work with large-size arrays, forget about int and unsigned. The proper types are ptrdiff_t, intptr_t, size_t, DWORD_PTR, std::vector::size_type and so on. In this case it is size_t:

size_t index = 0;
for (size_t i = 0; i != Count; i++)
  array[index++] = char(i) | 1;

Conclusion

If the C++ language rules result in undefined behavior, don't argue with them or try to predict the way they'll behave in the future. Just don't write such dangerous code.

There are a whole lot of stubborn programmers who don't want to see anything suspicious in shifting negative numbers, comparing 'this' with null or signed types overflowing.

Don't be like that. The fact that the program is working now doesn't mean that everything is fine. The way UB will reveal itself is impossible to predict. Expected program behavior is one of the variants of UB.

Popular related articles
Appreciate Static Code Analysis!

Date: Oct 16 2017

Author: Andrey Karpov

I am really astonished by the capabilities of static code analysis even though I am one of the developers of PVS-Studio analyzer myself. The tool surprised me the other day as it turned out to be sma…
The Ultimate Question of Programming, Refactoring, and Everything

Date: Apr 14 2016

Author: Andrey Karpov

Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors, save time and effort. The au…
PVS-Studio for Java

Date: Jan 17 2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
The way static analyzers fight against false positives, and why they do it

Date: Mar 20 2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Date: Nov 21 2018

Author: Andrey Karpov

A brief description of technologies used in the PVS-Studio tool, which let us effectively detect a large number of error patterns and potential vulnerabilities. The article describes the implementati…
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: Oct 22 2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…
PVS-Studio ROI

Date: Jan 30 2019

Author: Andrey Karpov

Occasionally, we're asked a question, what monetary value the company will receive from using PVS-Studio. We decided to draw up a response in the form of an article and provide tables, which will sho…

Comments (0)

Next comments
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →
Accept