Module 3: Shared-Memory Programming with OpenMP

Characteristics

  • A shared-memory programming model.

  • Parallelizes an existing serial program by modifying it incrementally

  • Include the library header omp.h

  • Uses the preprocessor directive #pragma

  • Requires compiler-level support

  • Thread-based

Compile and run

gcc

  • include omp.h

  • compile with -fopenmp flag

Concepts

pragmas

The preprocessor directives (#pragma omp ...) through which OpenMP is invoked. Clauses attached to a directive give fine-grained control over the parallel behavior. Together they work like a domain-specific language embedded in C.

Structured block

The code block to be parallelized by OpenMP. It can be a for/while/do-while statement, a function call, or a code block enclosed by {}. It must have a single point of entry and a single point of exit.

Threads (OpenMP)

OpenMP forks threads to execute the structured block. The collection of threads is called a team. The master thread executes the code that is not parallelized and also takes part in executing the structured block; the remaining threads, known as slaves, execute the structured block only.

Synchronization

A mechanism that constrains the order in which instructions execute, in order to prevent problems such as race conditions and other inconsistencies.

Variable Scope (OpenMP)

As OpenMP is not part of the C standard, the scope of variables in the structured block needs special handling. They can be declared private or shared in OpenMP pragma directives. By default, variables declared outside the block are shared, while variables declared inside the block are private.
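A minimal sketch of these scoping rules, assuming a hypothetical function sum_of_squares. The clause default(none) is used here only to force every variable's scope to be spelled out; it is not required.

```c
/* Adds 1*1 + 2*2 + ... + n*n using explicitly scoped variables. */
int sum_of_squares(int n) {
    int total = 0;   /* declared outside the block: shared by default */
    int tmp;         /* declared outside, but made private by the clause */

    #pragma omp parallel for default(none) shared(total, n) private(tmp)
    for (int i = 1; i <= n; i++) {   /* loop variable i is private */
        tmp = i * i;                 /* each thread writes its own copy */

        /* total is shared, so the update must be protected */
        #pragma omp critical
        total += tmp;
    }
    return total;
}
```

If tmp were left shared, two threads could overwrite each other's value between the assignment and the update, which is the kind of race condition the private clause avoids.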

Directives and clauses

  • parallel

    Create a team

  • num_threads

    Control number of threads

  • private

    Register private variable

  • shared

    Register shared variable

  • default

    Override how the default scope is inferred

  • reduction

    Specify the reduction operator and the variable it is applied to

  • for

    Divide the iterations of a for loop among the threads of the team

  • schedule

    Set the scheduling method

    • static

    • dynamic

    • guided

  • critical

    Define a critical region, which can be executed by only one thread at a time

  • atomic

    Define an atomic operation

  • barrier

    Make each thread wait until all threads in the team have reached this point

  • single

    Run by only one thread (not necessarily the master thread)
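Several of the items above can be combined on one line. The following is a sketch only: the functions is_prime and count_primes are hypothetical, and the thread count 4 and chunk size 16 are arbitrary illustrative values.

```c
/* Trial-division primality test; the cost grows with n,
 * so loop iterations are unevenly expensive. */
static int is_prime(int n) {
    if (n < 2) return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* Counts the primes in [2, limit]. */
int count_primes(int limit) {
    int count = 0;

    /* parallel for      : create a team and split the loop
     * num_threads(4)    : request four threads
     * schedule(dynamic) : hand out chunks of 16 iterations on demand
     * reduction(+:count): each thread sums privately, results are added */
    #pragma omp parallel for num_threads(4) schedule(dynamic, 16) reduction(+:count)
    for (int n = 2; n <= limit; n++)
        count += is_prime(n);

    return count;
}
```

The dynamic schedule is chosen here because later iterations take longer; with evenly sized iterations the static schedule usually has less overhead.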

Synchronization

  • critical section

  • atomic

  • barrier

  • mutual exclusion (locks)

    • omp_set_lock

    • omp_unset_lock

Nested Loop

  • Parallelize outer loop if possible

  • Separate the parallel and for constructs to reduce fork and join operations when parallelizing the inner loop is desirable

  • Consider cache friendliness

    • change scheduling method
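The advice on separating the parallel and for constructs can be sketched with odd-even transposition sort, chosen here purely as an illustration. The outer loop over phases must run in order, so only the inner loop is divided among threads. The function name odd_even_sort is hypothetical.

```c
/* Sorts a[0..n-1] in ascending order. */
void odd_even_sort(int a[], int n) {
    /* The team is forked once, outside the phase loop. */
    #pragma omp parallel default(none) shared(a, n)
    {
        for (int phase = 0; phase < n; phase++) {
            /* even phase compares pairs (0,1), (2,3), ...
             * odd phase compares pairs  (1,2), (3,4), ... */
            int start = (phase % 2 == 0) ? 1 : 2;

            /* The existing team shares the inner loop; no new fork. */
            #pragma omp for
            for (int i = start; i < n; i += 2) {
                if (a[i - 1] > a[i]) {
                    int tmp = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = tmp;
                }
            }
            /* implicit barrier here: a phase finishes before the next begins */
        }
    }
}
```

Placing "#pragma omp parallel for" on the inner loop instead would give the same result, but it would fork and join a team in every one of the n phases.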

Examples

  • Integration calculation (trapezoids)

  • Estimation of \(\pi\)

  • Sorting

  • Matrix-vector multiplication
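As a sketch of the first two examples, both can be written as a loop with a reduction clause. The function names, the integrand f(x) = x*x, and the choice of the alternating series 4 * (1 - 1/3 + 1/5 - ...) for \(\pi\) are assumptions made for illustration.

```c
/* Hypothetical integrand used for the trapezoid example. */
static double f(double x) { return x * x; }

/* Approximates the integral of f over [a, b] with n trapezoids. */
double trapezoid(double a, double b, int n) {
    double h = (b - a) / n;
    double sum = (f(a) + f(b)) / 2.0;   /* the two end points count half */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i < n; i++)
        sum += f(a + i * h);

    return sum * h;
}

/* Estimates pi from the first n terms of the alternating series. */
double estimate_pi(int n) {
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int k = 0; k < n; k++) {
        /* The sign is computed from k rather than flipped each iteration,
         * so no iteration depends on the previous one. */
        double sign = (k % 2 == 0) ? 1.0 : -1.0;
        sum += sign / (2.0 * k + 1.0);
    }
    return 4.0 * sum;
}
```

Because floating-point addition is not associative, the parallel results may differ from the serial ones in the last few digits.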