Levels of Paralleling

Jan. 19, 2010
Author:

The solution of a task can be parallelized at several levels. There are no definite boundaries between these levels, and it is difficult to assign a particular parallelization technology to any one of them. The division given here is quite relative and serves to demonstrate the diversity of approaches to parallelization.

Parallelization at the task level

Quite often, parallelization at this level is the easiest and the most efficient. It is possible when the task being solved naturally consists of independent subtasks, each of which can be solved separately. A good example is compressing an audio album: each recording can be processed separately, as it does not depend on the others.
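
The album example can be sketched roughly as follows. This is only an illustration under stated assumptions: encode_track is a hypothetical stand-in (a toy run-length encoder, not a real audio codec), and the OpenMP directive takes effect only when the compiler has OpenMP enabled (for example, -fopenmp in GCC or Clang); otherwise the loop simply runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a real encoder: run-length encodes one track.
// It reads only its own input, so tracks can be encoded in any order.
std::vector<std::uint8_t> encode_track(const std::vector<std::uint8_t>& pcm) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < pcm.size();) {
        std::size_t run = 1;
        while (i + run < pcm.size() && pcm[i + run] == pcm[i] && run < 255) ++run;
        assert(run <= 255);  // the run length must fit into one byte
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(pcm[i]);
        i += run;
    }
    return out;
}

// Encodes every track of an album; each loop iteration is an independent task.
std::vector<std::vector<std::uint8_t>> encode_album(
        const std::vector<std::vector<std::uint8_t>>& album) {
    std::vector<std::vector<std::uint8_t>> encoded(album.size());
    const long n = static_cast<long>(album.size());
    // Tracks differ in length, so iterations are handed out dynamically.
    #pragma omp parallel for schedule(dynamic)
    for (long t = 0; t < n; ++t) {
        const std::size_t slot = static_cast<std::size_t>(t);
        encoded[slot] = encode_track(album[slot]);  // writes only to its own slot
    }
    return encoded;
}
```

Because no iteration touches another iteration's data, no synchronization is needed beyond waiting for all tracks to finish.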

The operating system demonstrates task-level parallelization when it runs programs on different cores of a multi-core machine. If one program plays a movie and another is a P2P client, the operating system can easily arrange for them to run in parallel.

Other examples of parallelization at this level of abstraction are parallel compilation of files in Visual Studio 2008 and data processing in batch mode.

As mentioned above, this type of parallelization is quite easy and, in a number of cases, rather efficient. But it is not applicable to a uniform task. The operating system cannot speed up a program that uses only one processor, however many cores are available. A program that splits the encoding of the sound and the picture of a video into two tasks will gain nothing from a third or fourth core. To parallelize uniform tasks, it is necessary to go one level down.

Level of data parallelism

The name of the "data parallelism" model comes from the fact that the parallelism consists in applying the same operation to many data elements. An archiver that uses several processor cores demonstrates data parallelism: the data is divided into blocks, which are processed (archived) in the same way on different cores.
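
A minimal sketch of this block-wise scheme is shown below. The per-block operation is an assumption made for brevity: a byte sum stands in for "archive this block", and process_blocks is a hypothetical name. As before, the directive is honored only when OpenMP is enabled in the compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Applies the same operation to every fixed-size block of the input.
// The operation here is a byte sum, a stand-in for real archiving.
std::vector<std::uint32_t> process_blocks(const std::vector<std::uint8_t>& data,
                                          std::size_t block_size) {
    assert(block_size > 0);
    const std::size_t n_blocks = (data.size() + block_size - 1) / block_size;
    std::vector<std::uint32_t> result(n_blocks, 0);
    const long n = static_cast<long>(n_blocks);
    #pragma omp parallel for
    for (long b = 0; b < n; ++b) {
        const std::size_t block = static_cast<std::size_t>(b);
        const std::size_t begin = block * block_size;
        const std::size_t end = std::min(begin + block_size, data.size());
        std::uint32_t sum = 0;
        for (std::size_t i = begin; i < end; ++i) sum += data[i];
        result[block] = sum;  // each block writes only its own result
    }
    return result;
}
```

The last block may be shorter than the others, which is why its end is clamped to the size of the data.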

This type of parallelization is widely used in computational modeling problems. The computational domain is represented as cells that describe the state of the medium at the corresponding points in space: pressure, density, gas composition percentages, temperature, and so on. The number of such cells can be enormous: millions or even billions. Each of these cells must be processed in the same way. The data parallelism model fits very well here, since each core can be loaded by assigning it a certain set of cells. The computational domain is divided into geometric objects, e.g., parallelepipeds, and the cells inside each object are given to a particular core for processing. In mathematical physics, this type of parallelism is called geometric parallelism.

Although geometric parallelism may look similar to task-level parallelization, it is more complex to implement. In modeling tasks, data obtained at the boundaries of geometric regions must be transferred to other cores. Quite often, special methods are used to speed up the calculation by balancing the load between computing units.
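
The boundary transfer can be illustrated on a one-dimensional grid. The sketch below is a simplified assumption, not a real solver: Subdomain, exchange_boundaries and step are hypothetical names, the "physics" is a plain averaging of neighbours, and the transfer between cores is imitated by copying values into ghost cells in shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One subdomain of a 1D grid: interior cells plus one ghost cell on each side.
struct Subdomain {
    std::vector<double> u;  // u.front() and u.back() are ghost cells
};

// Copies boundary values between neighbouring subdomains (the transfer step).
// The outer ghost cells of the first and last subdomain keep their fixed values.
void exchange_boundaries(std::vector<Subdomain>& parts) {
    for (std::size_t p = 0; p + 1 < parts.size(); ++p) {
        std::vector<double>& left = parts[p].u;
        std::vector<double>& right = parts[p + 1].u;
        left.back() = right[1];                 // first interior cell of the right part
        right.front() = left[left.size() - 2];  // last interior cell of the left part
    }
}

// One step: every interior cell becomes the mean of its two neighbours.
void step(std::vector<Subdomain>& parts) {
    exchange_boundaries(parts);
    const long n = static_cast<long>(parts.size());
    #pragma omp parallel for
    for (long p = 0; p < n; ++p) {
        std::vector<double>& u = parts[static_cast<std::size_t>(p)].u;
        assert(u.size() >= 3);  // at least one interior cell
        std::vector<double> next(u);
        for (std::size_t i = 1; i + 1 < u.size(); ++i)
            next[i] = 0.5 * (u[i - 1] + u[i + 1]);
        u.swap(next);
    }
}
```

Without the exchange, the cells next to a subdomain border would be updated from stale neighbour values, and the result would differ from that of the undivided grid.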

[Figure: the computational domain divided into unequal parts among four cores]

In a number of algorithms, regions where active processes take place require more computation time than regions where the medium is undisturbed. As the figure shows, dividing the computational domain into unequal parts makes it possible to load the cores more evenly. Cores 1, 2, and 3 process the small areas in which the body moves, while core 4 processes the large area that has not yet been disturbed. All this requires additional analysis and the development of a balancing algorithm.
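
One very simple balancing scheme, given here only as an assumption for illustration, is to estimate a cost for each cell and cut the domain so that every part carries about the same total cost. The function name balanced_split and the cost estimates are hypothetical; real balancing algorithms are usually more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Splits cells [0, cost.size()) into `parts` contiguous ranges of roughly
// equal total cost. Returns parts + 1 boundaries: range p covers the cells
// [bounds[p], bounds[p + 1]).
std::vector<std::size_t> balanced_split(const std::vector<double>& cost,
                                        std::size_t parts) {
    assert(parts > 0);
    const double total = std::accumulate(cost.begin(), cost.end(), 0.0);
    std::vector<std::size_t> bounds{0};
    double running = 0.0;
    for (std::size_t i = 0; i < cost.size() && bounds.size() < parts; ++i) {
        running += cost[i];
        // Close the current range once the accumulated cost reaches its share.
        const double share = total * static_cast<double>(bounds.size()) /
                             static_cast<double>(parts);
        if (running >= share) bounds.push_back(i + 1);
    }
    // The last range always ends at the end of the domain.
    while (bounds.size() < parts + 1) bounds.push_back(cost.size());
    return bounds;
}
```

With six "active" cells of cost 3 followed by six "quiet" cells of cost 1 and four cores, the split gives three cores two active cells each and the fourth core all six quiet cells, which resembles the situation in the figure.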

The reward for this added complexity is the ability to simulate prolonged motion in a reasonable computation time. Consider the launch of a rocket.

[Figure: a rocket launch]

Level of algorithm parallelization

The next level is the parallelization of individual procedures and algorithms. Parallel sorting, matrix multiplication, and the solution of systems of linear equations belong here. At this level of abstraction, a parallel programming technology such as OpenMP is a good fit.

OpenMP (Open Multi-Processing) is a set of compiler directives, library routines, and environment variables intended for programming multithreaded applications on shared-memory multiprocessor systems. OpenMP uses the "fork-join" model of parallel execution. An OpenMP program starts as a single thread of execution, called the initial thread. When a thread encounters a parallel construct, it creates a new team of threads consisting of itself and a number of additional threads, and becomes the master of the new team. All members of the team (including the master) execute the code inside the parallel construct. There is an implicit barrier at the end of the parallel construct. After it, only the master thread continues executing the user code. Parallel regions can be nested inside other parallel regions.
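
The fork-join behaviour can be seen in a small sketch such as the one below. The function dot is a hypothetical example; it assumes the compiler has OpenMP enabled (for example, -fopenmp in GCC or Clang), and without that option the directive is ignored and the loop runs in the initial thread only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two vectors of equal length.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    // Fork: a team of threads shares the iterations of the loop.
    // reduction(+:sum) gives every thread a private copy of sum.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += a[k] * b[k];
    }
    // Join: after the implicit barrier the private copies are combined
    // and only the master thread continues from here.
    return sum;
}
```

Note that with floating-point data the order of additions depends on how iterations are distributed, so the last digits of the result may vary from run to run.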

Thanks to the idea of "incremental parallelization", OpenMP best suits developers who want to quickly parallelize computational programs that contain large parallelizable loops. The developer does not write a new parallel program but simply adds OpenMP directives, one after another, to the text of the sequential program.
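
As a hedged illustration of the incremental approach, here is an ordinary sequential matrix multiplication to which a single directive has been added. The function name multiply and the flat row-major storage are assumptions of this sketch. Removing the directive, or compiling without OpenMP support, gives back the original sequential program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiplies two n x n matrices stored row by row in flat vectors.
std::vector<double> multiply(const std::vector<double>& a,
                             const std::vector<double>& b,
                             std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    const long rows = static_cast<long>(n);
    // The only line added to the sequential version:
    #pragma omp parallel for
    for (long r = 0; r < rows; ++r) {
        const std::size_t i = static_cast<std::size_t>(r);
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    }
    return c;
}
```

Each thread writes only to its own rows of the result, so the outer loop can be divided among threads without further changes.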

Implementing parallel algorithms is rather complicated, which is why quite a large number of parallel libraries exist. They let you build programs out of ready-made bricks, without going into the details of how the parallel data processing is implemented.

Parallelism at the instruction level

This is the lowest level of parallelism: the processor handles several instructions in parallel. This level also includes batch processing of several data elements by a single processor instruction, which is what technologies such as MMX, SSE, and SSE2 provide. This kind of processing is sometimes singled out as a separate, deeper level of parallelization: the bit level.
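
The following sketch shows what processing several data elements with one instruction looks like with SSE intrinsics. It is an illustration under assumptions: add_floats and the macro HAVE_SSE_SKETCH are hypothetical names, and on platforms without SSE the code falls back to an ordinary scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // SSE intrinsics
#define HAVE_SSE_SKETCH 1
#endif

// Element-wise sum of two float arrays of equal length.
std::vector<float> add_floats(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#ifdef HAVE_SSE_SKETCH
    // One addition instruction processes four floats at a time.
    for (; i + 4 <= a.size(); i += 4) {
        const __m128 va = _mm_loadu_ps(&a[i]);  // unaligned load of 4 floats
        const __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop: handles the tail, or everything when SSE is unavailable.
    for (; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}
```

In practice, optimizing compilers often generate such instructions on their own for simple loops like this one.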

A program is a stream of instructions executed by the processor. The order of these instructions can be changed, and they can be arranged into groups that are executed in parallel, without altering the result of the program. This is called instruction-level parallelism. To implement it, microprocessors use several instruction pipelines along with technologies such as branch prediction and register renaming.
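
The idea of independent groups of instructions can be made visible in source code, although this is normally left to the compiler and the processor. In the sketch below (sum_chain and sum_ilp are hypothetical names), the first function forms one long chain in which every addition depends on the previous one, while the second keeps four independent chains that a processor with several pipelines is free to execute in parallel. Integer arithmetic is used so that both functions give exactly the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One dependency chain: every addition waits for the previous one.
std::uint64_t sum_chain(const std::vector<std::uint32_t>& v) {
    std::uint64_t s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Four independent chains: the additions inside one iteration do not
// depend on each other, so they may be executed in parallel.
std::uint64_t sum_ilp(const std::vector<std::uint32_t>& v) {
    std::uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < v.size(); ++i) s0 += v[i];  // leftover elements
    assert(i == v.size());                 // every element was consumed
    return (s0 + s1) + (s2 + s3);
}
```

Whether the second form is actually faster depends on the processor and on what the compiler already does with the first form.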

Developers rarely deal with this level, and there is seldom any need to: arranging instructions in the order most suitable for the processor is the compiler's job. This level of parallelization is of interest only to a small group of experts who squeeze every possibility out of SSEx, or to compiler developers.

Instead of a conclusion

This text does not claim to be a complete paper on the levels of parallelization; it simply shows how diverse the question of using multi-core systems is. For those who are interested in software development, here are some links to sources on parallel programming:

  • http://software.intel.com/en-us/ - Software developer community. I do not work for Intel, but as a member of this community I highly recommend it. There are many interesting articles, blog posts, and discussions devoted to parallel programming.
  • http://www.viva64.com/links/parallel-programming/ - Reviews of articles on parallel programming with OpenMP technology.
  • multicore.ning.com - Everything concerning the world of supercomputers and parallel computing: technologies, conferences, blogs, and a discussion forum.