Module 3: Shared-Memory Programming with OpenMP¶
Characteristics¶
A shared-memory programming model.
Modifies existing serial program
Include library header
omp.h
Preprocessor directive
Pragma
Requires compiler level support
Thread based
Compile and run¶
gcc¶
include omp.h
compile with
-fopenmp
flag
Concepts¶
- pragmas¶
The preprocessor directive employed by OpenMP. Employ clauses to fine control the parallelized behavior. It works like a domain specific language.
- Structured block¶
The code block to be parallelized by OpenMP. It can be a for/while/do-while statement, a function call or a code block enclosed by
{}
- Threads (OpenMP)¶
OpenMP will fork threads to execute the structured block. The collection of threads is called Team. There is a master thread to execute the codes that are not parallelized and the rest of threads are known as slave, which execute the structured block only.
- Synchronization¶
A mechanism to constraint the ordering of execution of instructions to preclude potential problems such as racing condition, and other inconsistency problems.
- Variable Scope (OpenMP)¶
As OpenMP is not part of C standard, the scope of variables in the structured block needs special handling. The can be defined as
private
orshared
in OpenMPpragma
directives. The default scope of variables declared outside of the block isshared
, while the default scope of variables declared inside the block isprivate
.
Clauses¶
parallel
Create a team
num_threads
Control number of threads
private
Register private variable
shared
Register shared variable
default
Override how the default scope is inferred
reduction
Register how reduction operation is done
for
Register parallelization of a for loop
schedule
Set the scheduling method
static
dynamic
guided
critical
Define a critical region, which can be accessed only one thread
atomic
Define an atomic operation
barrier
Force the thread to join
single
Only run by one thread (not necessarily the master thread)
Synchronization¶
critical section
atomic
barrier
mutual exclusion (locks)
omp_set_lock
omp_unset_lock
Nested Loop¶
Parallelize outer loop if possible
Separate
parallel
andfor
constructs to reduce fork and join operations if parallelization of inner loop is desirableConsider cache friendship
change scheduling method
Examples¶
Integration calculation (trapezoids)
Estimation of \(\pi\)
Sorting
Matrix-vector multiplication