To get a trial key
fill out the form below
Team License (a basic version)
Enterprise License (extended version)
* By clicking this button you agree to our Privacy Policy statement

Request our prices
New License
License Renewal
--Select currency--
* By clicking this button you agree to our Privacy Policy statement

Free PVS-Studio license for Microsoft MVP specialists
* By clicking this button you agree to our Privacy Policy statement

To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

Message submitted.

Your message has been sent. We will email you at

If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

Parallel notes N5 - continuing to study…

Parallel notes N5 - continuing to study OpenMP constructs

Mar 26 2010

Here is the next post devoted to studying the parallel programming technology OpenMP. Here we will consider two directives: atomic, reduction.

atomic directive

Consider this code that sums up array items:

intptr_t A[1000], sum = 0;
for (intptr_t i = 0; i < 1000; i++)
  A[i] = i;
for (intptr_t i = 0; i < 1000; i++)
  sum += A[i];
printf("Sum=%Ii\n", sum);

The result is:

Press any key to continue . . .

Let us try to parallelize this code using the directives "omp" and "parallel":

#pragma omp parallel for
for (intptr_t i = 0; i < 1000; i++)
  sum += A[i];

Unfortunately, this way of parallelization is incorrect because a race condition occurs in this code. Several threads try to access the variable "sum" for reading and writing simultaneously. The access order might be the following:

Value of "sum" variable = 500;
Value of "i" in the first thread = 1;
Value of "i" in the second thread = 501;
Thread 1: processor register = sum
Thread 2: processor register = sum
Thread 1: processor register += i
Thread 2: processor register += i
Thread 2: sum = processor register
Thread 1: sum = processor register
The value of "sum" variable is 501, not 1002.

You may also make sure that this kind of parallelization is incorrect by launching the demo code. What I got was, in particular, this:

Press any key to continue . . .

To avoid the errors of refreshing shared variables, you may use critical sections. However, if the variable "sum" is shared while the operator has the form sum=sum+expr, it is more convenient to use "atomic" directive. "atomic" directive is faster than critical sections because some atomic operations can be directly replaced by processor commands.

This directive refers to the assignment operator that follows it straight and guarantees correct processing of the shared variable standing on its left side. While this operator is being executed, accesses from all the threads being launched at the moment are locked except for the thread that executes the operation.

The directive atomic applies only to operations of this type:

  • X++
  • ++X
  • X−−
  • −−X

Here X is scalar variable, EXPR is an expression with scalar types that lacks the variable X, BINOP is a non-overloaded operator +, *, -, /, &, ^, |, <<, >>. You must not use the directive "atomic" in any other case.

The corrected version of the code looks as follows:

#pragma omp parallel for
for (intptr_t i = 0; i < 1000; i++)
  #pragma omp atomic
  sum += A[i];

This solution gives a correct result but is quite ineffective. The speed of this code is lower than that of the serial version. There will constantly appear locks when the algorithm is running, and as a result, the work of the cores will be nothing but mere waiting. "atomic" directive is used in this sample only to demonstrate how it works. In practice, it is reasonable to use this directive when accesses to shared variables are rare. For example:

unsigned count = 0;
#pragma omp parallel for
for (intptr_t i = 0; i < N; i++)
  if (SlowFunction())
    #pragma omp atomic

Remember that in an expression the directive "atomic" is applied to, only processing of the variable on the left side of the assignment operator is atomic while the calculations on its right side are not necessarily such. Let us study this by an example where the directive "atomic" in no way influences the call of the functions participating in the expression:

class Example
  unsigned m_value;
  Example() : m_value(0) {}
  unsigned GetValue()
    return ++m_value;
  unsigned GetSum()
    unsigned sum = 0;
    #pragma omp parallel for
    for (ptrdiff_t i = 0; i < 100; i++)
      #pragma omp atomic
      sum += GetValue();
    return sum;

This sample contains a race condition and the result it returns may vary from run to run. The directive "atomic" in this code protects increment of the variable "sum". But "atomic" directive does not influence the call of the function GetValue(). The calls are performed in concurrent threads and it leads to errors when trying to perform the operation "++m_value" inside the function GetValue.

reduction directive

It would be logical to ask how one can quickly sum up the array items. The answer is - with the help of reduction directive.

The directive's format: reduction(operator:list)

Possible operators - "+", "*", "-", "&", "|", "^", "&&", "||".

List serves to list the names of the shared variables. These must have a scalar type (for example, float, int or long, but not std::vector, int [], etc.).

The working principle:

  • Local copies are created in each thread for each variable.
  • The local copies are initialized according to the operator's type. For additive operations it is 0 or its analogs, for multiplicative ones it is 1 or its analogs. See also Table N1.
  • When all the parallel region's operators are executed, the defined operator is executed to process the local copies of the variables. The order of operator execution is not defined.

Now the effective code that uses reduction looks as follows:

#pragma omp parallel for reduction(+:sum)
for (intptr_t i = 0; i < 1000; i++)
  sum += A[i];

To be continued in the next issue of "Parallel Notes"...

Popular related articles
Appreciate Static Code Analysis!

Date: Oct 16 2017

Author: Andrey Karpov

I am really astonished by the capabilities of static code analysis even though I am one of the developers of PVS-Studio analyzer myself. The tool surprised me the other day as it turned out to be sma…
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
PVS-Studio for Java

Date: Jan 17 2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…
PVS-Studio ROI

Date: Jan 30 2019

Author: Andrey Karpov

Occasionally, we're asked a question, what monetary value the company will receive from using PVS-Studio. We decided to draw up a response in the form of an article and provide tables, which will sho…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: Oct 22 2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…
Static analysis as part of the development process in Unreal Engine

Date: Jun 27 2017

Author: Andrey Karpov

Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in th…
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
The Last Line Effect

Date: May 31 2014

Author: Andrey Karpov

I have studied many errors caused by the use of the Copy-Paste method, and can assure you that programmers most often tend to make mistakes in the last fragment of a homogeneous code block. I have ne…
The Ultimate Question of Programming, Refactoring, and Everything

Date: Apr 14 2016

Author: Andrey Karpov

Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors, save time and effort. The au…

Comments (0)

Next comments
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →