Module 3: Shared-Memory Programming with OpenMP¶
Characteristics¶
A shared-memory programming model.
Modifies existing serial program
Include library header
omp.hPreprocessor directive
PragmaRequires compiler level support
Thread based
Compile and run¶
gcc¶
include omp.h
compile with
-fopenmpflag
Concepts¶
- pragmas¶
The preprocessor directive employed by OpenMP. Employ clauses to fine control the parallelized behavior. It works like a domain specific language.
- Structured block¶
The code block to be parallelized by OpenMP. It can be a for/while/do-while statement, a function call or a code block enclosed by
{}- Threads (OpenMP)¶
OpenMP will fork threads to execute the structured block. The collection of threads is called Team. There is a master thread to execute the codes that are not parallelized and the rest of threads are known as slave, which execute the structured block only.
- Synchronization¶
A mechanism to constraint the ordering of execution of instructions to preclude potential problems such as racing condition, and other inconsistency problems.
- Variable Scope (OpenMP)¶
As OpenMP is not part of C standard, the scope of variables in the structured block needs special handling. The can be defined as
privateorsharedin OpenMPpragmadirectives. The default scope of variables declared outside of the block isshared, while the default scope of variables declared inside the block isprivate.
Clauses¶
parallel
Create a team
num_threads
Control number of threads
private
Register private variable
shared
Register shared variable
default
Override how the default scope is inferred
reduction
Register how reduction operation is done
for
Register parallelization of a for loop
schedule
Set the scheduling method
static
dynamic
guided
critical
Define a critical region, which can be accessed only one thread
atomic
Define an atomic operation
barrier
Force the thread to join
single
Only run by one thread (not necessarily the master thread)
Synchronization¶
critical section
atomic
barrier
mutual exclusion (locks)
omp_set_lock
omp_unset_lock
Nested Loop¶
Parallelize outer loop if possible
Separate
parallelandforconstructs to reduce fork and join operations if parallelization of inner loop is desirableConsider cache friendship
change scheduling method
Examples¶
Integration calculation (trapezoids)
Estimation of \(\pi\)
Sorting
Matrix-vector multiplication