Andrey Karpov

Parallel notes N4 - continuing to study OpenMP constructs

In this post we will continue introducing you to OpenMP technology and tell you about some of its functions and new constructs.

OpenMP provides a number of auxiliary functions. Do not forget to include the header file <omp.h> to be able to use them.

Execution environment functions

These functions allow you to query and set various parameters of the OpenMP environment:

  • omp_get_num_procs returns the number of computational nodes (processors/cores) in the computer.
  • omp_in_parallel allows a thread to know if it is processing a parallel region at the moment.
  • omp_get_num_threads returns the number of threads included into the current thread team.
  • omp_set_num_threads sets the number of threads that will execute the next parallel region encountered by the current thread. The function may help you distribute resources. For example, if you are simultaneously processing sound and video on one processor with four cores, you may create one thread to process the sound and three threads to process the video.
  • omp_get_max_threads returns the maximum possible number of threads to be used in the next parallel region.
  • omp_set_nested permits or forbids nested parallelism. If nested parallelism is permitted, each thread that encounters a parallel region will spawn a new thread team to execute this region and will become the master thread of that team.
  • omp_get_nested tells you if nested parallelism is enabled or disabled.

If the name of a function begins with omp_set_, it may be called only outside parallel regions. All the other functions may be used both inside and outside parallel regions.
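
As a small illustration, here is a minimal sketch (plain C with printf; the numbers printed will of course depend on your machine) that queries and sets some of these parameters:

#include <stdio.h>
#include <omp.h>

int main(void)
{
  // How many processors/cores does OpenMP see?
  printf("Processors available: %d\n", omp_get_num_procs());

  // Ask for two threads in the next parallel region.
  omp_set_num_threads(2);
  printf("Max threads for the next region: %d\n", omp_get_max_threads());

  // Outside a parallel region omp_in_parallel() returns 0.
  printf("In parallel now? %d\n", omp_in_parallel());

  #pragma omp parallel
  {
    #pragma omp single
    printf("Team size: %d, in parallel now? %d\n",
           omp_get_num_threads(), omp_in_parallel());
  }
  return 0;
}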

Synchronization/lock functions

OpenMP allows you to build parallel code without these functions because there are directives that implement some synchronization types. But in some cases these functions are convenient and even necessary.

OpenMP has two types of locks: simple and nested. The functions and types for the latter contain "nest" in their names. A lock may be in one of three states: uninitialized, locked and unlocked.

  • omp_init_lock/omp_init_nest_lock serves to initialize a variable of the type omp_lock_t/omp_nest_lock_t. It is equivalent to InitializeCriticalSection.
  • omp_destroy_lock/omp_destroy_nest_lock serves to uninitialize (destroy) a variable of the type omp_lock_t/omp_nest_lock_t. It is equivalent to DeleteCriticalSection.
  • omp_set_lock/omp_set_nest_lock sets the lock: the calling thread acquires it, while the other threads wait until the owner releases it with the function omp_unset_lock(). It is equivalent to EnterCriticalSection.
  • omp_unset_lock/omp_unset_nest_lock is used to release the lock. It is equivalent to LeaveCriticalSection.
  • omp_test_lock/omp_test_nest_lock is a non-blocking attempt to acquire a lock. The function tries to acquire the specified lock. If it succeeds, it returns a non-zero value (1 for a simple lock); if it fails, it returns 0. It is equivalent to TryEnterCriticalSection.

Simple locks cannot be set more than once, even by the same thread. Nested locks are nearly the same as simple ones, with the exception that a thread is not blocked when trying to set a nested lock it already owns.
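
To illustrate the difference, here is a minimal sketch (with a hypothetical recursive function work()) in which every call sets the same nested lock; with a simple omp_lock_t the inner omp_set_lock() call would deadlock:

#include <stdio.h>
#include <omp.h>

omp_nest_lock_t nlock;

// Hypothetical recursive routine: each call sets a lock the thread
// may already own, which is allowed for a nested lock.
void work(int depth)
{
  omp_set_nest_lock(&nlock);
  printf("Thread %d, depth %d\n", omp_get_thread_num(), depth);
  if (depth > 0)
    work(depth - 1);
  omp_unset_nest_lock(&nlock);  // every set is matched by an unset
}

int main(void)
{
  omp_init_nest_lock(&nlock);
  #pragma omp parallel
  work(2);
  omp_destroy_nest_lock(&nlock);
  return 0;
}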

Here is an example of code where these functions are used. All the created threads will in turn print the messages "Begin work" and "End work". Between these two messages printed by one thread there may appear messages from the other threads, printed when they fail to enter the locked section.

omp_lock_t lock;
int n;
omp_init_lock(&lock);        // create the lock in the unlocked state
#pragma omp parallel private(n)
{
  n = omp_get_thread_num();
  // Spin until this thread manages to acquire the lock.
  while (!omp_test_lock(&lock))
  {
    printf("Wait..., thread %d\n", n);
    Sleep(3);
  }
  printf("Begin work, thread %d\n", n);
  Sleep(5); // Work...
  printf("End work, thread %d\n", n);
  omp_unset_lock(&lock);     // release the lock for the next thread
}
omp_destroy_lock(&lock);

You may expect the following result on a computer with four cores:

Begin work, thread 0
Wait..., thread 1
Wait..., thread 2
Wait..., thread 3
Wait..., thread 2
Wait..., thread 3
Wait..., thread 1
End work, thread 0
Begin work, thread 2
Wait..., thread 3
Wait..., thread 1
Wait..., thread 3
Wait..., thread 1
End work, thread 2
Begin work, thread 3
Wait..., thread 1
Wait..., thread 1
End work, thread 3
Begin work, thread 1
End work, thread 1

Timer functions

  • omp_get_wtime returns the wall-clock time, in seconds, that has passed since some moment in the past, for the thread that has called it (as a double-precision real number, double). If you surround some code fragment with calls of this function, the difference between the returned values gives the execution time of that fragment.
  • omp_get_wtick() returns the resolution of the timer in seconds for the thread that has called it, i.e. it shows the timer's precision.
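
For example, a fragment can be timed like this (a minimal sketch; the loop is just a placeholder for real work):

#include <stdio.h>
#include <omp.h>

int main(void)
{
  // Timer precision on this system.
  printf("Timer resolution: %g s\n", omp_get_wtick());

  double start = omp_get_wtime();
  double sum = 0.0;
  for (int i = 0; i < 10000000; i++)  // the "work" being measured
    sum += i * 0.5;
  double end = omp_get_wtime();

  printf("Elapsed: %f s (sum = %f)\n", end - start, sum);
  return 0;
}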

Let us finish with the functions here and consider a couple of new clauses of the parallel directive. These clauses may be thought of as options for the parallel regions being created.

if (condition)

Conditional execution of a parallel region. Several threads are created only if the specified condition is fulfilled; otherwise the code continues to execute in serial mode.

For example:

#include <stdio.h>
#include <omp.h>
#include <tchar.h>

void test(bool x)
{
  #pragma omp parallel if (x)
  if (omp_in_parallel())
  {
    #pragma omp single
    printf_s("parallelized with %d threads\n",
             omp_get_num_threads());
  }
  else
  {
    printf_s("single thread\n");
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  test(false);
  test(true);
  return 0;
}

The result:

single thread
parallelized with 4 threads

num_threads

It serves to explicitly define the number of threads that will execute a parallel region. By default, the value last set with the function omp_set_num_threads() is used.

If we modify the example above in the following way:

...
#pragma omp parallel if (x) num_threads(3)
...

the result will be the following:

single thread
parallelized with 3 threads

To be continued in the next issue of "Parallel Notes"...