Pour obtenir une clé
d'essai remplissez le formulaire ci-dessous
Demandez des tariffs
Nouvelle licence
Renouvellement de licence
--Sélectionnez la devise--
* En cliquant sur ce bouton, vous acceptez notre politique de confidentialité

Free PVS-Studio license for Microsoft MVP specialists
To get the licence for your open-source project, please fill out this form
** En cliquant sur ce bouton, vous acceptez notre politique de confidentialité.

I am interested to try it on the platforms:
** En cliquant sur ce bouton, vous acceptez notre politique de confidentialité.

Votre message a été envoyé.

Nous vous répondrons à

Si vous n'avez toujours pas reçu de réponse, vérifiez votre dossier
Spam/Junk et cliquez sur le bouton "Not Spam".
De cette façon, vous ne manquerez la réponse de notre équipe.

64-bit programs and floating-point calc…

64-bit programs and floating-point calculations

18 Aoû 2010

A developer who is porting his Windows-application to the 64-bit platform sent a letter to our support service with a question about using floating-point calculations. By his permission we publish the answer to this question in the blog since this topic might be interesting for other developers as well.

The text of the letter

I want to ask you one particular question concerning 32 -> 64 bits migration. I studied articles and materials on your site and was very astonished at the discrepancy between 32-bit and 64-bit code I had encountered.

The problem is the following one: I get different results when calculating floating-point expressions. Below is a code fragment that corresponds to this issue.

float fConst = 1.4318620f; 
float fValue1 = 40.598053f * (1.f - 1.4318620f / 100.f); 
float fValue2 = 40.598053f * (1.f - fConst / 100.f);

MSVC 32, SSE and SSE2 are disabled

/fp:precise: fValue1 = 40.016743, fValue2 = 40.016747

MSVC 64, SSE and SSE2 are disabled

/fp:precise: fValue1 = 40.016743, fValue2 = 40.016743

The problem is that the resulting values of fValue2 are different. Because of this discrepancy the code compiled for 32 bits and 64 bits produces different results what is invalid in my case (or perhaps invalid in any case).

Does your product detect anything related to this issue? Could you please tip me in what way 32/64 can impact the results of real arithmetic?

Our answer

The Viva64 product does not detect such variations in a program's behavior after it's recompilation for the 64-bit system. Such changes cannot be called errors. Let's study this situation in detail.

Simple explanation

Let's see first what the 32-bit compiler generates: fValue1 = 40.016743, fValue2 = 40.016747.

Be reminded that the float type has 7 significant digits. Proceeding from that we see that actually we get a value that is a bit larger than 40.01674 (7 significant digits). It does not matter if it is actually 40.016743 or 40.016747 because this subtle difference is out of the float type's accuracy limits.

When compiling in 64-bit mode, the compiler generates the same correct code whose result is the same "a bit larger than 40.01674" value. In this case, it is always 40.016743. But it does not matter. Within the limits of float type's accuracy we get the same result as in the 32-bit program.

So, once again the results of calculations on 32-bit and 64-bit systems are equal within the limitations of the float type.

Stricter explanation

Accuracy of the float type is the value FLT_EPSILON that equals 0.0000001192092896.

If we add a value smaller than FLT_EPSILON to 1.0f, we will again get 1.0f. Only addition of a value equal to or larger than FLT_EPSILON to 1.0f will increase the value of the variable: 1.0f + FLT_EPSILON !=1.0f.

In our case, we handle not 1 but values 40.016743 and 40.016747. Let's take the largest of these two and multiple it by FLT_EPSILON. The result number will be the accuracy value for our calculations:

Epsilon = 40.016743*FLT_EPSILON = 40.016743*0.0000001192092896 = 0,0000047703675051357728

Let's see how much different numbers 40.016747 and 40.016743 are:

Delta = 40.016747 - 40.016743 = 0.000004

It turns out that the difference is smaller than the deviation value:

Delta < Epsilon

0.000004 < 0,00000477

Consequently, 40.016743 == 40.016747 within the limits of the float type.

What to do?

Although everything is correct, unfortunately, it does not make you feel easier. If you want to make the system more deterministic, you may use the /fp:strict switch.

In this case the result will be the following:

MSVC x86:

/fp:strict: fValue1 = 40.016747, fValue2 = 40.016747

MSVC x86-64:

/fp:strict: fValue1 = 40.016743, fValue2 = 40.016743

The result is more stable but we still did not manage to get an identical behavior of 32-bit and 64-bit code. What to do? The only thing you can do is to put up with it and change the methodology of result comparison.

I do not know how much the following situation I want to describe corresponds to yours, but I suppose it is rather close.

Once I developed a computational modeling package. The task was to develop a system of regression tests. There is a set of projects whose results are looked through by physicists and estimated as correct. Code revisions brought into the project must not cause a change of output data. If pressure is at some moment t in some point is 5 atmospheres, the same pressure value must remain after adding a new button to the dialogue or optimizing the mechanism of initial filling of the area. If something changes, it means that there were revisions in the model and physicists must once again estimate all the changes. Of course it is supposed that such revisions of the model are quite rare. In normal development state of a project there must always be identical output data. However, it is in theory. In practice everything is more complicated. We could not get identical results every time even when working with one compiler with the same optimization switches. Results easily started to diffuse all the same. But since the project was even built with different compilers, the task of getting absolutely identical results was admitted as unsolvable. To be exact, perhaps the task could be solved but it would require a lot of efforts and lead to an inadmissible slow-down of calculations because of the impossibility to optimize the code. The solution appeared in the form of a special result comparison system. What is more, values in different points were compared not merely with the Epsilon accuracy but in a special way. I do not remember now all the specifics of its implementation but the idea was the following. If in some point processes run that make the maximum pressure of 10 atmospheres, the difference of 0.001 atmosphere in some other point is considered an error. But if a process is running in areas with pressure of 1000 atmospheres, the difference of 0.001 is considered an admissible error. Thus, we managed to build a rather secure system of regression testing that, as I believe, has been working successfully to this day.

The last thing: why do we get different results in 32-bit and 64-bit code at all?

It seems that the reason lies in using different sets of instructions. In 64-bit mode, these are SSE2 instructions which are always used nowadays and which are implemented in all the processors of the AMD64 (Intel 64) family. By the way, because of this, the phrase in your question "MSVC 64, SSE and SSE2 are disabled" is incorrect. SSE2 are used by the 64-bit compiler anyway.


Popular related articles
"Our legacy of the past" or why we divided the V512

Date: 12 Aoû 2022

Author: Mikhail Gelvih

As the saying goes, the first step is always the hardest. That's exactly what happened in our case – after delaying it for so long, we have finally split the V512 diagnostic rule. You can read more a…
Why do arrays have to be deleted via delete[] in C++

Date: 27 Jul 2022

Author: Mikhail Gelvih

This note is for C++ beginner programmers who are wondering why everyone keeps telling them to use delete[] for arrays. But, instead of a clear explanation, senior developers just keep hiding behind …
Intermodular analysis of C and C++ projects in detail. Part 2

Date: 14 Jul 2022

Author: Oleg Lisiy

In part 1 we discussed the basics of C and C++ projects compiling. We also talked over linking and optimizations. In part 2 we are going to delve deeper into intermodular analysis and discuss its ano…
Intermodular analysis of C and C++ projects in detail. Part 1

Date: 08 Jul 2022

Author: Oleg Lisiy

Starting from PVS-Studio 7.14, the C and C++ analyzer has been supporting intermodular analysis. In this two-part article, we'll describe how similar mechanisms are arranged in compilers and reveal s…
Four reasons to check what the malloc function returned

Date: 20 Avr 2022

Author: Andrey Karpov

Some developers may be dismissive of checks: they deliberately do not check whether the malloc function allocated memory or not. Their reasoning is simple — they think that there will be enough memor…

Comments (0)

Next comments
Unicorn with delicious cookie
Nous utilisons des cookies pour améliorer votre expérience de navigation. En savoir plus