Chapter 5. OpenMP Fortran API Multiprocessing Directives

This chapter provides an overview of the supported multiprocessing directives. These directives are based on the OpenMP Fortran application program interface (API) standard. Programs that use these directives are portable and can be compiled by other compilers that support the OpenMP standard.

The complete OpenMP standard is available at http://www.openmp.org/specs. See that documentation for complete examples, rules of usage, and restrictions. This chapter provides only an overview of the supported directives and does not give complete details about usage or restrictions.

To enable recognition of the OpenMP directives, specify -mp on the f77(1) command line. The -mp option must be specified in order for the compiler to honor any -MP:... options that may also be specified on the command line. The -MP:open_mp=ON option is on by default and must be in effect during compilation.

The following example command line can compile program ompprg.f, which contains OpenMP Fortran API directives:

f77 -mp ompprg.f

In addition to directives, the OpenMP Fortran API describes several library routines and environment variables. See the standard for complete details.

Using Directives

All multiprocessing directives are case-insensitive and are of the following form:

prefix directive [clause[[,] clause]...]

Directives cannot be embedded within continued statements, and statements cannot be embedded within directives.

Comments are allowed inside directives. Comments can appear on the same line as a directive. The comment extends to the end of the source line and is ignored. If the first nonblank character after the initial prefix (or after a continuation directive line in fixed source form) is an exclamation point, the line is ignored.
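
The directive form above can be sketched in fixed source form as follows. This is a minimal illustration, not taken from the original manual: it assumes the C$OMP prefix that the OpenMP standard defines for fixed source form, and the program name DIRFRM is hypothetical. It shows a directive continued onto a second line and a directive written in lowercase.

```fortran
      PROGRAM DIRFRM
      IMPLICIT NONE
      INTEGER I, N
      PARAMETER (N = 8)
      REAL A(N)
C     The prefix occupies columns 1-5. Column 6 is blank on the
C     initial directive line and nonblank on a continuation line.
C$OMP PARALLEL DO
C$OMP+SHARED(A) PRIVATE(I)
      DO I = 1, N
         A(I) = 2.0 * REAL(I)
      END DO
C     Directives are case-insensitive, so lowercase is accepted.
c$omp end parallel do
      PRINT *, A
      END
```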

Conditional Compilation

Fortran statements can be compiled conditionally as long as they are preceded by one of the following conditional compilation prefixes: C$ or *$. The prefix must be followed by a Fortran statement on the same line. During compilation, the prefix is replaced by two spaces, and the rest of the line is treated as a normal Fortran statement.

The prefixes must start in column one and appear as a single word with no intervening white space. Fortran fixed form line length, case sensitivity, white space, continuation, and column rules apply to the line. Initial lines must have a space or zero in column six, and continuation lines must have a character other than a space or zero in column six.

Your program must be compiled with the -mp option in order for the compiler to honor statements preceded by conditional compilation prefixes; without the -mp command line option, statements preceded by conditional compilation prefixes are treated as comments.

The _OPENMP symbol can be used for conditional compilation. You do not define it yourself: it is defined during OpenMP compilation to have the decimal value YYYYMM, where YYYY and MM are the year and month designators of the version of the OpenMP Fortran API that is supported.
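
As a rough sketch of the C$ prefix in use, the following program compiles both with and without -mp. The program name CONDCM and the variable NTHR are hypothetical; OMP_GET_MAX_THREADS is one of the library routines described by the OpenMP Fortran API. Without -mp, the C$ lines are treated as comments and NTHR keeps its serial default.

```fortran
      PROGRAM CONDCM
      IMPLICIT NONE
      INTEGER NTHR
C     The C$ prefix starts in column 1. Once it is replaced by two
C     spaces, column 6 is blank and the statement starts in column 7.
C$    INTEGER OMP_GET_MAX_THREADS
C     Serial default, kept when the C$ lines are treated as comments.
      NTHR = 1
C$    NTHR = OMP_GET_MAX_THREADS()
      PRINT *, 'Threads available: ', NTHR
      END
```

Guarding the declaration as well as the call keeps the serial build free of references to the OpenMP run-time library.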

Parallel Region Constructs

The PARALLEL and END PARALLEL directives define a parallel region. A parallel region is a block of code that is to be executed by multiple threads in parallel. This is the fundamental OpenMP parallel construct that starts parallel execution.

The END PARALLEL directive denotes the end of the parallel region. There is an implied barrier at this point. Only the master thread of the team continues execution past the end of a parallel region.
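
A parallel region might look like the following minimal sketch. It assumes the C$OMP fixed-form prefix from the OpenMP standard; the program name PARREG and the variable ID are hypothetical, and OMP_GET_THREAD_NUM is a library routine from the OpenMP Fortran API.

```fortran
      PROGRAM PARREG
      IMPLICIT NONE
      INTEGER ID
C$    INTEGER OMP_GET_THREAD_NUM
C     Every thread in the team executes the enclosed block.
C$OMP PARALLEL PRIVATE(ID)
      ID = 0
C$    ID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello from thread ', ID
C$OMP END PARALLEL
C     Implied barrier above; only the master thread continues here.
      PRINT *, 'Back in serial code'
      END
```

The order of the greeting lines is not defined, because the threads run concurrently inside the region.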

Work-sharing Constructs

A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it. A work-sharing construct must be enclosed within a parallel region in order for the directive to execute in parallel. When a work-sharing construct is not enclosed dynamically within a parallel region, it is treated as though the thread that encounters it were a team of size one. The work-sharing directives do not launch new threads, and there is no implied barrier on entry to a work-sharing construct.

The following restrictions apply to the work-sharing directives:

Work-sharing constructs and BARRIER directives must be encountered by all threads in a team or by none at all.
Work-sharing constructs and BARRIER directives must be encountered in the same order by all threads in a team.

If NOWAIT is specified on the END DO, END SECTIONS, END SINGLE, or END WORKSHARE directive, an implementation may omit any code to synchronize the threads at the end of the work-sharing construct. In this case, threads that finish early may proceed straight to the instructions following the work-sharing construct without waiting for the other members of the team to finish the work-sharing construct.

The following list summarizes the work-sharing constructs:

The DO directive specifies that the iterations of the immediately following DO loop must be divided among the threads in the parallel region. If there is no enclosing parallel region, the DO loop is executed serially.

The loop that follows a DO directive cannot be a DO WHILE or a DO loop without loop control. If an END DO directive is not specified, it is assumed at the end of the DO loop.

The SECTIONS directive specifies that the enclosed sections of code are to be divided among threads in the team. It is a noniterative work-sharing construct. Each section is executed once by a thread in the team.

Each section must be preceded by a SECTION directive, though the SECTION directive is optional for the first section. The SECTION directives must appear within the lexical extent of the SECTIONS/END SECTIONS directive pair. The last section ends at the END SECTIONS directive. Threads that complete execution of their sections wait at a barrier at the END SECTIONS directive unless a NOWAIT is specified.
The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team. Threads in the team that are not executing the SINGLE directive wait at the END SINGLE directive unless NOWAIT is specified.
The WORKSHARE directive divides the work of executing the enclosed code into separate units of work, and causes the threads of the team to share the work of executing the enclosed code such that each unit is executed only once. The units of work may be assigned to threads in any manner as long as each unit is executed exactly once.
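
The DO and SINGLE constructs summarized above, together with NOWAIT, can be sketched as follows. This is an illustrative example under stated assumptions rather than text from the original manual: the C$OMP prefix is the fixed-form prefix from the OpenMP standard, and the program name WSHARE and the arrays A and B are hypothetical.

```fortran
      PROGRAM WSHARE
      IMPLICIT NONE
      INTEGER I, N
      PARAMETER (N = 1000)
      REAL A(N), B(N)
C$OMP PARALLEL SHARED(A, B) PRIVATE(I)
C     Iterations of the loop are divided among the team.
C$OMP DO
      DO I = 1, N
         A(I) = REAL(I)
      END DO
C$OMP END DO NOWAIT
C     NOWAIT is safe here because the next loop does not read A.
C$OMP DO
      DO I = 1, N
         B(I) = 2.0 * REAL(I)
      END DO
C$OMP END DO
C     The barrier at END DO means both arrays are complete here.
C     One thread prints; the others wait at END SINGLE.
C$OMP SINGLE
      PRINT *, 'A(N) =', A(N), ' B(N) =', B(N)
C$OMP END SINGLE
C$OMP END PARALLEL
      END
```

If the second loop read elements of A written by other threads, the NOWAIT would have to be removed so that the implied barrier orders the two loops.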

Combined Parallel Work-sharing Constructs

The combined parallel work-sharing constructs are shortcuts for specifying a parallel region that contains only one work-sharing construct. The semantics of these directives are identical to those of explicitly specifying a PARALLEL directive followed by a single work-sharing construct.

The following list describes the combined parallel work-sharing directives:

The PARALLEL DO directive provides a shortcut form for specifying a parallel region that contains a single DO directive.

If the END PARALLEL DO directive is not specified, the PARALLEL DO is assumed to end with the DO loop that immediately follows the PARALLEL DO directive. If used, the END PARALLEL DO directive must appear immediately after the end of the DO loop.

The semantics are identical to explicitly specifying a PARALLEL directive immediately followed by a DO directive.
The PARALLEL SECTIONS/END PARALLEL SECTIONS directives provide a shortcut form for specifying a parallel region that contains a single SECTIONS directive. The semantics are identical to explicitly specifying a PARALLEL directive immediately followed by a SECTIONS directive.
The PARALLEL WORKSHARE directive provides a shortcut form for specifying a parallel region that contains a single WORKSHARE directive. The semantics are identical to explicitly specifying a PARALLEL directive immediately followed by a WORKSHARE directive.
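
The following sketch shows two of the combined forms. It assumes the C$OMP fixed-form prefix and the END PARALLEL SECTIONS spelling from the OpenMP standard; the program name COMBND and the variables SUMA and MAXA are hypothetical.

```fortran
      PROGRAM COMBND
      IMPLICIT NONE
      INTEGER I, N
      PARAMETER (N = 100)
      REAL A(N), SUMA, MAXA
C     Shortcut for PARALLEL immediately followed by DO.
C$OMP PARALLEL DO SHARED(A) PRIVATE(I)
      DO I = 1, N
         A(I) = REAL(I)
      END DO
C$OMP END PARALLEL DO
C     Two independent sections, each executed once by some thread.
C$OMP PARALLEL SECTIONS SHARED(A, SUMA, MAXA) PRIVATE(I)
C$OMP SECTION
      SUMA = 0.0
      DO I = 1, N
         SUMA = SUMA + A(I)
      END DO
C$OMP SECTION
      MAXA = A(1)
      DO I = 2, N
         MAXA = MAX(MAXA, A(I))
      END DO
C$OMP END PARALLEL SECTIONS
      PRINT *, 'Sum =', SUMA, ' Max =', MAXA
      END
```

Each section writes a different shared variable, so the two sections need no synchronization between them.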

Synchronization Constructs

The following list describes the synchronization constructs:

The code enclosed within MASTER and END MASTER directives is executed by the master thread.
The CRITICAL and END CRITICAL directives restrict access to the enclosed code to one thread at a time.

A thread waits at the beginning of a critical section until no other thread is executing a critical section with the same name. All unnamed CRITICAL directives map to the same name. Critical section names are global entities of the program. If a name conflicts with any other entity, the behavior of the program is unspecified.
The BARRIER directive synchronizes all the threads in a team. When it encounters a barrier, a thread waits until all other threads in that team have reached the same point.
The ATOMIC directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
The FLUSH directive identifies synchronization points at which thread-visible variables are written back to memory. This directive must appear at the precise point in the code at which the synchronization is required.

Thread-visible variables include the following data items:
- Globally visible variables (common blocks and modules)
- Local variables that do not have the SAVE attribute but have had their address taken and saved or have had their address passed to another subprogram
- Local variables that do not have the SAVE attribute that are declared shared in a parallel region within the subprogram
- Dummy arguments
- All pointer dereferences
The code enclosed within ORDERED and END ORDERED directives is executed in the order in which it would be executed in a sequential execution of an enclosing parallel loop.

An ORDERED directive can appear only in the dynamic extent of a DO or PARALLEL DO directive. This DO directive must have the ORDERED clause specified. For information on directive binding, see “Directive Binding”.

Only one thread is allowed in an ordered section at a time. Threads are allowed to enter in the order of the loop iterations. No thread can enter an ordered section until it is guaranteed that all previous iterations have completed or will never execute an ordered section. This sequentializes and orders code within ordered sections while allowing code outside the section to run in parallel. ORDERED sections that bind to different DO directives are independent of each other.
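
Several of the synchronization constructs above can be combined in one loop, as in this minimal sketch. The C$OMP prefix is assumed from the OpenMP standard; the program name SYNCEX, the critical section name ADDTOT, and the variables NDONE and TOTAL are hypothetical.

```fortran
      PROGRAM SYNCEX
      IMPLICIT NONE
      INTEGER I, N, NDONE, TOTAL
      PARAMETER (N = 8)
      NDONE = 0
      TOTAL = 0
C$OMP PARALLEL SHARED(NDONE, TOTAL) PRIVATE(I)
C     The ORDERED clause is required for the ORDERED directive below.
C$OMP DO ORDERED
      DO I = 1, N
C        Update of a single memory location, protected by ATOMIC.
C$OMP ATOMIC
         NDONE = NDONE + 1
C        Named critical section: one thread at a time.
C$OMP CRITICAL (ADDTOT)
         TOTAL = TOTAL + I
C$OMP END CRITICAL (ADDTOT)
C        Executed in loop order, as in a sequential run.
C$OMP ORDERED
         PRINT *, 'Iteration ', I
C$OMP END ORDERED
      END DO
C$OMP END DO
C     The barrier implied by END DO makes the totals final here.
C$OMP MASTER
      PRINT *, 'NDONE =', NDONE, ' TOTAL =', TOTAL
C$OMP END MASTER
C$OMP END PARALLEL
      END
```

With N = 8, the master thread should report NDONE as 8 and TOTAL as 36, and the iteration lines appear in ascending order regardless of which thread runs each iteration.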

Data Environment Constructs

The THREADPRIVATE directive makes named common blocks and named variables private to a thread but global within the thread.

In addition to the THREADPRIVATE directive, several directives accept clauses that allow a user to control the scope attributes of variables for the duration of the construct. Not all of the clauses are allowed on all directives; usually, if no data scope clauses are specified for a directive, the default scope for variables affected by the directive is SHARED.

The following list describes the data scope attribute clauses:

The PRIVATE clause declares variables to be private to each thread in a team.
The SHARED clause makes variables shared among all the threads in a team. All threads within a team access the same storage area for SHARED data.
The DEFAULT clause allows the user to specify a PRIVATE, SHARED, or NONE default scope attribute for all variables in the lexical extent of any parallel region. Variables in THREADPRIVATE common blocks are not affected by this clause.
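The FIRSTPRIVATE clause provides a superset of the functionality provided by the PRIVATE clause. In addition to being private, each thread's copy is initialized from the value of the original object as it exists before the construct.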
The LASTPRIVATE clause provides a superset of the functionality provided by the PRIVATE clause.

When the LASTPRIVATE clause appears on a DO directive, the thread that executes the sequentially last iteration updates the version of the object it had before the construct. When the LASTPRIVATE clause appears in a SECTIONS directive, the thread that executes the lexically last SECTION updates the version of the object it had before the construct. Subobjects that are not assigned a value by the last iteration of the DO or the lexically last SECTION of the SECTIONS directive are undefined after the construct.
The REDUCTION clause performs a reduction on the variables specified, with the operator or the intrinsic specified.

At the end of the REDUCTION, the shared variable is updated to reflect the result of combining the original value of the (shared) reduction variable with the final value of each of the private copies using the operator specified. The reduction operators are all associative (except for subtraction), and the compiler can freely reassociate the computation of the final value (the partial results of a subtraction reduction are added to form the final value).

The value of the shared variable becomes undefined when the first thread reaches the containing clause, and it remains so until the reduction computation is complete. Normally, the computation is complete at the end of the REDUCTION construct; however, if the REDUCTION clause is used on a construct to which NOWAIT is also applied, the shared variable remains undefined until a barrier synchronization has been performed to ensure that all the threads have completed the REDUCTION clause.
The COPYIN clause applies only to common blocks that are declared THREADPRIVATE. A COPYIN clause on a parallel region specifies that the data in the master thread of the team be copied to the thread private copies of the common block at the beginning of the parallel region.
The COPYPRIVATE clause uses a private variable to broadcast a value, or a pointer to a shared object, from one member of a team to the other members.
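
The data scope clauses above can be sketched together on a single loop. This example is illustrative only: it assumes the C$OMP fixed-form prefix from the OpenMP standard, and the program name DSCOPE and the variables FACTOR, T, TOTAL, and LAST are hypothetical.

```fortran
      PROGRAM DSCOPE
      IMPLICIT NONE
      INTEGER I, N, LAST
      PARAMETER (N = 10)
      REAL FACTOR, T, TOTAL
      FACTOR = 0.5
      TOTAL = 0.0
      LAST = 0
C     DEFAULT(NONE) forces every variable to be scoped explicitly.
C$OMP PARALLEL DO DEFAULT(NONE)
C$OMP+PRIVATE(I, T) FIRSTPRIVATE(FACTOR) LASTPRIVATE(LAST)
C$OMP+REDUCTION(+:TOTAL)
      DO I = 1, N
C        T is uninitialized scratch space private to each thread.
C        FACTOR is private but starts with the value 0.5.
         T = FACTOR * REAL(I)
C        Each thread accumulates into its own copy of TOTAL.
         TOTAL = TOTAL + T
C        The value from iteration N is copied out after the loop.
         LAST = I
      END DO
C$OMP END PARALLEL DO
      PRINT *, 'TOTAL =', TOTAL, ' LAST =', LAST
      END
```

After the loop, TOTAL should hold 27.5 (0.5 times the sum of 1 through 10) and LAST should hold 10, independent of the number of threads.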

There are several rules and restrictions that apply with respect to data scope. See the OpenMP specification at http://www.openmp.org/specs for complete details.

Directive Binding

Some directives are bound to other directives. A binding specifies the way in which one directive is related to another. For instance, a directive is bound to a second directive if it can appear in the dynamic extent of that second directive. The following rules apply with respect to the dynamic binding of directives:

A parallel region is available for binding purposes, whether it is serialized or executed in parallel.
The DO, SECTIONS, SINGLE, MASTER, BARRIER, and WORKSHARE directives bind to the dynamically enclosing PARALLEL directive, if one exists. The dynamically enclosing PARALLEL directive is the closest enclosing PARALLEL directive regardless of the value of the expression in the IF clause, should the clause be present.
The ORDERED directive binds to the dynamically enclosing DO directive.
The ATOMIC directive enforces exclusive access with respect to ATOMIC directives in all threads, not just the current team.
The CRITICAL directive enforces exclusive access with respect to CRITICAL directives in all threads, not just the current team.
A directive can never bind to any directive outside the closest enclosing PARALLEL.
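
The binding rules can be illustrated with a work-sharing directive that sits in a subroutine rather than lexically inside a parallel region. This is a sketch under stated assumptions: the C$OMP prefix comes from the OpenMP standard, and the names BINDEX and FILL are hypothetical.

```fortran
      PROGRAM BINDEX
      IMPLICIT NONE
      INTEGER N
      PARAMETER (N = 6)
      REAL A(N)
C     Called inside a parallel region: the DO directive in FILL
C     binds to this PARALLEL, and the team shares the iterations.
C$OMP PARALLEL SHARED(A)
      CALL FILL(A, N)
C$OMP END PARALLEL
C     Called outside any parallel region: the same directive is
C     executed by a team of one, that is, serially.
      CALL FILL(A, N)
      PRINT *, A
      END

      SUBROUTINE FILL(A, N)
      IMPLICIT NONE
      INTEGER N, I
      REAL A(N)
C     This DO directive is in the dynamic extent of the caller's
C     parallel region, not its lexical extent.
C$OMP DO
      DO I = 1, N
         A(I) = REAL(I) ** 2
      END DO
C$OMP END DO
      END
```

All threads of the team call FILL, which satisfies the restriction that a work-sharing construct be encountered by all threads in a team or by none.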

Directive Nesting

The following rules apply to the dynamic nesting of directives:

A PARALLEL directive dynamically inside another PARALLEL directive logically establishes a new team, which is composed of only the current thread unless nested parallelism is enabled.
DO, SECTIONS, SINGLE, and WORKSHARE directives that bind to the same PARALLEL directive cannot be nested one inside the other.
DO, SECTIONS, SINGLE, and WORKSHARE directives are not permitted in the dynamic extent of CRITICAL and MASTER directives.
BARRIER directives are not permitted in the dynamic extent of DO, SECTIONS, SINGLE, WORKSHARE, MASTER, CRITICAL, and ORDERED directives.
MASTER directives are not permitted in the dynamic extent of DO, SECTIONS, SINGLE, WORKSHARE, MASTER, CRITICAL, and ORDERED directives.
ORDERED directives must appear in the dynamic extent of a DO or PARALLEL DO directive which has an ORDERED clause.
ORDERED directives are not allowed in the dynamic extent of SECTIONS, SINGLE, WORKSHARE, CRITICAL, and MASTER directives.
CRITICAL directives with the same name are not allowed to be nested one inside the other.
Any directive set that is legal when executed dynamically inside a PARALLEL region is also legal when executed outside a parallel region. When executed dynamically outside a user-specified parallel region, the directive is executed with respect to a team composed of only the master thread.
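
The first nesting rule can be observed with a small experiment such as the following sketch. It assumes the C$OMP prefix and the OMP_GET_NUM_THREADS library routine from the OpenMP standard; the program name NESTEX and the variable NIN are hypothetical.

```fortran
      PROGRAM NESTEX
      IMPLICIT NONE
      INTEGER NIN
C$    INTEGER OMP_GET_NUM_THREADS
C$OMP PARALLEL PRIVATE(NIN)
      NIN = 1
C     The inner PARALLEL logically establishes a new team.
C$OMP PARALLEL SHARED(NIN)
C$OMP MASTER
C$    NIN = OMP_GET_NUM_THREADS()
C$OMP END MASTER
C$OMP END PARALLEL
C     An unnamed critical section keeps the output lines intact.
C$OMP CRITICAL
      PRINT *, 'Inner team size: ', NIN
C$OMP END CRITICAL
C$OMP END PARALLEL
      END
```

Each thread of the outer team prints the size of the inner team it created, which is 1 unless nested parallelism is enabled.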
