Intel® Advisor User Guide

ID 766448
Date 3/22/2024
Public

Add OpenMP Code to Synchronize the Shared Resources

OpenMP provides several forms of synchronization:

  • A critical section prevents multiple threads from executing its code at the same time, so only one thread at a time can update the data that the code references. A critical section may consist of one or more statements. To implement a critical section:

    • With C/C++: #pragma omp critical

    • With Fortran: !$omp critical and !$omp end critical

    Use the optional named form for a non-nested mutex, such as (C/C++) #pragma omp critical(name) or (Fortran) !$omp critical(name) and !$omp end critical(name). If you omit the optional (name), the critical section locks a single unnamed global mutex that all unnamed critical sections in the program share. The easiest approach is to use the unnamed form unless performance measurement shows that this shared mutex causes unacceptable delays.

  • An atomic operation allows multiple threads to safely update a shared numeric variable on hardware platforms that support its use. An atomic operation applies only to the single assignment statement that immediately follows it. To implement an atomic operation:

    • With C/C++: insert a #pragma omp atomic before the statement to be protected.

    • With Fortran: insert a !$omp atomic before the statement to be protected.

    The statement to be protected must meet certain criteria (see your compiler or OpenMP documentation).

  • Locks provide a low-level means of general-purpose locking. To implement a lock, use the OpenMP lock types, variables, and functions, which allow more flexible and powerful locking than the directive-based forms. For example, use the omp_lock_t type in C/C++ or an integer of kind omp_lock_kind in Fortran. These types and functions are easy to use and usually directly replace Intel Advisor lock annotations.

  • Reduction operations can be used for simple cases, such as incrementing a shared numeric variable or summing an array into a shared numeric variable. To implement a reduction operation, add the reduction clause to a parallel or worksharing directive. The clause instructs the compiler to give each thread a private copy of the variable and to combine the copies with the specified operator when the construct ends.

  • OpenMP provides other synchronization techniques, including specifying a barrier construct where threads will wait for each other, an ordered construct that ensures sequential execution of a structured block within a parallel loop, and master regions that can only be executed by the master thread. For more information, see your compiler or OpenMP documentation.
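
The critical section, atomic operation, and reduction forms listed above can be sketched side by side in C. This is a minimal illustration rather than a recommended implementation: the function names and the critical section name best_update are hypothetical, and the sketch assumes the input array has at least one element.

```c
/* Hypothetical helper: finds the largest value. The compare-and-update
   spans more than one statement, so it uses a (named) critical section. */
int max_with_critical(const int *data, int n)
{
    int best = data[0];               /* assumes n >= 1 */
    #pragma omp parallel for
    for (int i = 1; i < n; i++) {
        /* Only one thread at a time may execute this block. */
        #pragma omp critical(best_update)
        {
            if (data[i] > best)
                best = data[i];
        }
    }
    return best;
}

/* Hypothetical helper: counts even values. The update is a single
   increment of a shared numeric variable, so an atomic is enough. */
long count_even_with_atomic(const int *data, int n)
{
    long count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            /* Protects only the one statement that follows. */
            #pragma omp atomic
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: sums the array. Each thread accumulates into a
   private copy of sum; the copies are added together after the loop. */
long sum_with_reduction(const int *data, int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

The pragmas take effect only when OpenMP is enabled at compile time, typically with an option such as -fopenmp (GCC, Clang) or -qopenmp (Intel compilers). Without it, the functions run serially and return the same results.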
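
The lock routines mentioned in the list above might be used as in the following sketch, which guards each histogram bin with its own omp_lock_t. The function name, NUM_BINS, and the serial fallback definitions are assumptions made for illustration. The fallback exists only so that the sketch still builds when OpenMP is not enabled; normal OpenMP builds use the routines declared in omp.h.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption of this sketch, not part of OpenMP):
   stand-ins so the code compiles when OpenMP is disabled. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

#define NUM_BINS 4

/* Hypothetical helper: builds a histogram of data[i] % NUM_BINS,
   with one lock per bin so that updates to different bins do not
   wait on each other. Assumes non-negative input values. */
void histogram_with_locks(const int *data, int n, long bins[NUM_BINS])
{
    omp_lock_t locks[NUM_BINS];

    for (int b = 0; b < NUM_BINS; b++) {
        bins[b] = 0;
        omp_init_lock(&locks[b]);     /* a lock must be initialized before use */
    }

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int b = data[i] % NUM_BINS;
        omp_set_lock(&locks[b]);      /* blocks until the lock for bin b is free */
        bins[b]++;
        omp_unset_lock(&locks[b]);    /* release so other threads can update bin b */
    }

    for (int b = 0; b < NUM_BINS; b++)
        omp_destroy_lock(&locks[b]);  /* release lock resources when done */
}
```

Choosing the lock at run time, as done here by indexing with b, is one case where locks are more flexible than critical sections, whose names are fixed at compile time.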
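
The barrier and master constructs mentioned in the last item above could be combined roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the nowait clause is added only to make the explicit barrier necessary for the purpose of the illustration.

```c
/* Hypothetical helper: all threads fill buf[] in parallel, then the
   master thread alone totals it once every element has been written. */
int fill_then_total(int *buf, int n)
{
    int total = 0;                    /* shared: declared outside the region */

    #pragma omp parallel
    {
        /* nowait removes the implicit barrier at the end of the loop. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            buf[i] = 2 * i;

        /* Every thread waits here until all threads have arrived,
           so buf[] is completely written before it is read below. */
        #pragma omp barrier

        /* Executed only by the master thread; other threads skip it
           without waiting. */
        #pragma omp master
        {
            for (int i = 0; i < n; i++)
                total += buf[i];
        }
    }   /* implicit barrier at the end of the parallel region */

    return total;
}
```

Because the master construct has no implied barrier, the result in total is safe to read only after the parallel region ends, which is why the function returns it outside the region.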

TIP:
After you rewrite your code to use the OpenMP* parallel framework, you can analyze its performance with Intel® Advisor perspectives. Use the Vectorization and Code Insights perspective to analyze how well your OpenMP code is vectorized, or use the Offload Modeling perspective to model its performance on a GPU.

The following topics briefly describe these forms of synchronization. Check your compiler documentation for details.