Parallel notes N3 - base OpenMP constructs

Mar 02 2010
With this post we begin introducing OpenMP technology and showing ways to use it. Here we will discuss some basic constructs.

When using OpenMP, we add two kinds of constructs to the program: OpenMP execution environment functions and special "#pragma" directives.

Functions

The role of OpenMP functions is mostly auxiliary, because parallelization is implemented through directives, but in some cases these functions are very useful and even necessary. They fall into three categories: execution environment functions, lock/synchronization functions, and timer functions. All of them have names beginning with "omp_" and are declared in the header file omp.h. We will discuss the functions in the next posts.

Directives

The C/C++ #pragma construct is used to pass additional instructions to the compiler. With such constructs you may specify how data in structures must be aligned, suppress particular warnings, and so on. A #pragma construct has the following format:

#pragma directive

The special keyword "omp" indicates that the directive is related to OpenMP. Thus, #pragma directives intended for OpenMP have the following format:

#pragma omp <directive> [clause [ [,] clause]...]

Like any other pragmas, OpenMP directives are ignored by compilers that do not support this technology: the program is then compiled as a serial one, without any errors. This makes code based on OpenMP highly portable. Code containing OpenMP directives can be compiled by a C/C++ compiler that knows nothing about the technology; it will execute serially, which is still better than splitting the code into two branches or adding a lot of #ifdef blocks.

OpenMP provides the directives parallel, for, sections, section, single, master, critical, flush, ordered, and atomic, among others, together with clauses such as private and shared; they define work distribution mechanisms and synchronization constructs.

parallel directive

The "parallel" directive may be called the most important one. It creates a parallel region for the structured block that follows it, for example:

#pragma omp parallel [other directives]
  structured block

The "parallel" directive specifies that the structured code block must be executed concurrently in several threads. Each created thread executes the same code of the block, but not necessarily the same instruction sequence: different branches may be taken or different data processed in different threads, depending on "if-else" operators or work distribution directives.

To demonstrate execution of code in several threads, let us print some text in the block being parallelized:

#pragma omp parallel
{
  cout << "OpenMP Test" << endl;
}

On a 4-core computer, we may expect the following result to be printed:

OpenMP Test
OpenMP Test
OpenMP Test
OpenMP Test

But in practice I got this one:

OpenMP TestOpenMP Test
OpenMP Test
OpenMP Test

This is explained by several threads sharing a single resource. Here, four threads print text to one console without agreeing with one another on the output order. We are observing a race condition.

A race condition is an error in the design or implementation of a multitasking system where the system's behavior depends on the order in which code fragments execute. This kind of error is the most common in parallel programming, and it is a very tricky one: it is hard to reproduce and localize, because it is not permanent and shows up only from time to time (see also the term "heisenbug").

for directive

The example above demonstrates how parallelization is implemented but is useless by itself. Now let us get real benefit from parallelization. Suppose we need to extract the square root of each item of an array and write the results into another array:

void VSqrt(double *src, double *dst, ptrdiff_t n)
{
  for (ptrdiff_t i = 0; i < n; i++)
    dst[i] = sqrt(src[i]);
}

If we write so:

#pragma omp parallel
{
  for (ptrdiff_t i = 0; i < n; i++)
    dst[i] = sqrt(src[i]);
}

we will just do a lot of unnecessary work instead of speeding up the code: every thread will extract the roots of all the array items. To parallelize the loop we need the work distribution directive "for". The "#pragma omp for" directive specifies that, while the for loop executes, its iterations must be distributed among the threads of the team in the parallel region:

#pragma omp parallel
{
  #pragma omp for
  for (ptrdiff_t i = 0; i < n; i++)
    dst[i] = sqrt(src[i]);
}

Now each created thread will process only the part of the array assigned to it. For example, if there are 8000 array items and we have a four-core computer, the work may be distributed as follows: the variable "i" takes values from 0 to 1999 in the first thread, from 2000 to 3999 in the second, from 4000 to 5999 in the third, and from 6000 to 7999 in the fourth. In theory the work speeds up 4 times; in practice it is a bit less, because we need to create the threads and wait for them to terminate. At the end of the parallel region barrier synchronization takes place: when reaching the end of the region, every thread is blocked until the last thread arrives.

You may shorten the code by combining several directives into one control line. The code above is equivalent to:

#pragma omp parallel for
for (ptrdiff_t i = 0; i < n; i++)
  dst[i] = sqrt(src[i]);

private and shared clauses

Data may be shared or private with respect to a region. Private data belong to one thread only and can be initialized only by this thread. Shared data are available to all the threads. In the example above, the array was shared. If a variable is defined outside a parallel region, it is shared by default; if it is defined inside the region, it is private. Suppose we need an intermediate variable "value" to calculate the square root:

double value;
#pragma omp parallel for
for (ptrdiff_t i = 0; i < n; i++)
{
  value = sqrt(src[i]);
  dst[i] = value;
}

In this code, the variable "value" is defined outside the parallel region created by the "#pragma omp parallel for" directive and is therefore shared. As a result, the "value" variable will be used by all the threads simultaneously; this causes a race condition, and we will get garbage in the end.

To make the variable private to each thread, we may use one of two methods. The first is to define the variable inside the parallel region:

#pragma omp parallel for
for (ptrdiff_t i = 0; i < n; i++)
{
  double value;
  value = sqrt(src[i]);
  dst[i] = value;
}

The second is to use the "private" clause. Now each thread will work with its own "value" variable:

double value;
#pragma omp parallel for private(value)
for (ptrdiff_t i = 0; i < n; i++)
{
  value = sqrt(src[i]);
  dst[i] = value;
}

Besides the "private" clause there is also a "shared" clause. It is rarely needed, because all variables defined outside the parallel region are shared by default; still, you may use it to make the code more comprehensible.

We have discussed only a few OpenMP constructs and will continue studying them in the following lessons.
