If your system includes multiple central processing units (CPUs), your program may be able to make use of multitasking, that is, running simultaneously on more than one CPU. Multitasking can speed up program execution by reducing elapsed (wall-clock) time. You can determine the number of CPUs on your system by entering the hinv(1) command.
The compiler automatically recognizes many parallel coding constructs, and it compiles them for multitasking without requiring additional user input; this capability is called Autotasking.
Autotasking directives let you specify the level of parallelism desired. You can start and end parallel processing at any number of suitable points within a subprogram. These directives are useful when the compiler fails to recognize parallelism that you know exists. This can occur, for example, when you have subroutine calls that can be executed in parallel.
Note: The directives in this section are outmoded, but they are still supported for older codes that require this functionality. Silicon Graphics encourages you to write new codes using the OpenMP directives described in Chapter 4, “OpenMP Fortran API Multiprocessing Directives”.
This section provides an overview of the Autotasking directives recognized by the compiler.
Caution: Using Autotasking directives in a subprogram that accesses a variable through host association can result in undefined behavior. This applies only to Autotasking directives; it does not apply to parallelism detected by the compiler.
A branch out of a parallel region is not permitted and can produce incorrect results.
Autotasking directives control the way the compiler multitasks your program. You can insert tasking directive lines directly into your source code. The compiler supports the following Autotasking directives:
CASE, ENDCASE
CNCALL
DOALL
DOPARALLEL, ENDDO
GUARD, ENDGUARD
NUMCPUS
PARALLEL, ENDPARALLEL
PERMUTATION
The following sections describe how to use the CF90 Autotasking directives and the effects they have on programs.
For additional general information on using directives, see “Using Directives” in Chapter 3.
In the following example, an asterisk (*) appears in column 6 to indicate that the second line is a continuation of the preceding line:
!MIC$ GU
!MIC$*ARD
If you want to specify more than one directive on a line, separate each directive with a comma. Some directives require that you specify one or more arguments; when specifying a directive of this type, no other directive can appear on the line.
Spaces can precede, follow, or be embedded within a directive, regardless of source form.
Do not use source preprocessor (#) directives within multiline compiler directives (CMIC$ or !MIC$).
The range and placement of directives is as follows:
The Autotasking directives must appear within a program unit.
The ENDDO directive must appear after the loop body of a DOPARALLEL loop, if it appears. The corresponding DOPARALLEL directive must be present.
The following directives apply only to the next loop encountered lexically:
DOALL
DOPARALLEL
The following Autotasking directives must appear as pairs within a program unit:
CASE, ENDCASE
GUARD, ENDGUARD
PARALLEL, ENDPARALLEL
The -x option on the f90(1) command accepts one or more directives as arguments. When your input is compiled, the compiler ignores directives named as arguments to the -x option. If you specify -x mipspro, all directives are ignored. If you specify -x dirname, a particular directive is ignored. For more information on this command line option, see “-xdirlist” in Chapter 2.
The !MIC$ CASE directive serves as a separator between adjacent code blocks that can be executed concurrently. It marks the beginning of a control structure and signals that the code following it will be executed on a single processor.
!MIC$ ENDCASE serves as the terminator for a group of one or more parallel CASE directives. All work within the control structure must complete before execution continues with the code below the ENDCASE. The compiler does not automatically generate CASE directives.
The formats for these directives are as follows:
!MIC$ CASE
!MIC$ ENDCASE
Example. A single CASE/ENDCASE directive pair can also be used within a parallel region to allow only one processor to execute a code block, as follows:
!MIC$ PARALLEL
!MIC$ CASE
      CALL XYZ
!MIC$ ENDCASE
      :
!MIC$ DOPARALLEL
      DO I = 1, IMAX
        :
      END DO
!MIC$ ENDPARALLEL
In the preceding code, only one processor calls XYZ, and then all available processors execute the code following the ENDCASE.
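The separator role of CASE, with two adjacent blocks that may run concurrently, can be sketched as follows. This is a minimal illustration and not an example from the original manual: the module name and the subroutines INIT_A, INIT_B, and INIT_BOTH are hypothetical. A compiler that does not recognize !MIC$ lines treats them as comments and runs the two blocks one after the other, with the same result.

```fortran
module case_demo
  implicit none
contains
  ! Hypothetical, independent initializers: each writes only its own array.
  subroutine init_a(a)
    real, intent(out) :: a(:)
    a = 1.0
  end subroutine init_a

  subroutine init_b(b)
    real, intent(out) :: b(:)
    b = 2.0
  end subroutine init_b

  subroutine init_both(a, b)
    real, intent(out) :: a(:), b(:)
!MIC$ PARALLEL SHARED(A, B)
!MIC$ CASE
    call init_a(a)      ! first block: executed by a single processor
!MIC$ CASE
    call init_b(b)      ! second block: may run concurrently with the first
!MIC$ ENDCASE
!MIC$ ENDPARALLEL
  end subroutine init_both
end module case_demo
```

The two blocks are safe to separate with CASE only because neither reads or writes data used by the other.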
The !MIC$ CNCALL directive allows a loop to be Autotasked by asserting that subroutines called from the loop have no loop-related side effects (that is, they do not modify data referenced in other iterations of the loop) and therefore can be called concurrently by separate iterations of the loop. CNCALL is inserted immediately preceding the loop.
The format for this directive is as follows:
!MIC$ CNCALL
Example:
!MIC$ CNCALL
      DO I = 1, N
        CALL CRUNCH(A(I), B(I))
      END DO
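To make the "no loop-related side effects" condition concrete, the following sketch supplies a body for CRUNCH. The body is an assumption made for illustration (the manual does not show it), as are the module name and the wrapper CRUNCH_ALL. The point is that each call reads and writes only its own arguments, so iteration I touches only A(I) and B(I).

```fortran
module cncall_demo
  implicit none
contains
  ! Hypothetical CRUNCH: uses only its arguments, no shared or saved state,
  ! so calls from different iterations cannot interfere with each other.
  subroutine crunch(a, b)
    real, intent(inout) :: a
    real, intent(in)    :: b
    a = a + 2.0 * b
  end subroutine crunch

  subroutine crunch_all(a, b, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in)    :: b(n)
    integer :: i
!MIC$ CNCALL
    do i = 1, n
      call crunch(a(i), b(i))   ! iteration I touches only A(I) and B(I)
    end do
  end subroutine crunch_all
end module cncall_demo
```

A version of CRUNCH that updated a module variable or a SAVE variable would break the assertion made by CNCALL, and the directive should not be used with it.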
The !MIC$ DOALL directive indicates that the DO loop beginning on the next line may be executed in parallel by multiple processors. No directive is needed to end a DOALL loop (that is, the DOALL initiates a parallel region that contains only a DO loop with independent iterations). The loop index variable for a DOALL must be specified as a PRIVATE variable.
For a !MIC$ DOALL directive, all the variables and arrays in the region must be defined in a SHARED or PRIVATE parameter.
The format of this directive is as follows:
!MIC$ DOALL parameter[[,]parameter] ... [[,]work_distribution]
parameter | Table C-1 describes the parameters for the DOALL directive. More than one parameter can appear on the directive, but they must be separated by commas or blanks. |
work_distribution | Parameters that specify the work distribution policy for iterations of the parallel DO loop. Only one can be used for a given DO loop. By default, iterations are distributed one at a time. Table C-2 describes the work distribution parameters. |
The default scheduling for a DOALL directive is STATIC. In addition, CHUNKSIZE = CEILING(n/p), where n is the number of trips and p is the number of processors.
The DOALL directive does not accept the MAXCPUS or AUTOSCOPE clauses; their presence generates a fatal error.
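The default chunk size CEILING(n/p) can be worked out with integer arithmetic, as in the following sketch. The module and function names are hypothetical, and the function only reproduces the formula stated above; it says nothing about how the run-time library assigns chunks to processors.

```fortran
module chunk_demo
  implicit none
contains
  ! CEILING(n/p) in integer arithmetic, for n >= 0 trips and p >= 1 processors.
  integer function default_chunksize(n, p)
    integer, intent(in) :: n, p
    default_chunksize = (n + p - 1) / p
  end function default_chunksize
end module chunk_demo
```

For example, 100 trips on 8 processors give a chunk size of 13, so the 100 iterations split into seven chunks of 13 and a final chunk of 9.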
Table C-1. Autotasking directive parameter values
parameter | Description |
---|---|
IF(expr) | Performs a run-time test to choose between uniprocessing and multiprocessing. When not specified, multiprocessing is chosen if the loop is not in a routine that was called from within a parallel region. The logical expression (expr) determines (at run time) whether multiprocessing will occur. When expr is true, multiprocessing is enabled. |
PRIVATE(var[,var] ...) | Specifies that the variables listed will have private scope; that is, each task (original or helper) will have its own private copy of these variables. The PRIVATE clause identifies those variables that are not shared between parallel processes. One variable cannot be declared both PRIVATE and SHARED. The loop control variable of the DOALL loop cannot be specified as SHARED; it must be specified as PRIVATE. Variables cannot be subobjects (that is, array elements or components of derived types). |
SAVELAST | Specifies that the values of private variables, from the final iteration of a DOALL directive, will continue in the original task after execution of the iterations of the DOALL. By default, private variables are not guaranteed to retain the last iteration values. SAVELAST can be used only with DOALL, and if the full iteration set is not completed (for example, if the loop is exited early), the values of private variables are indeterminate. |
SHARED(var[,var] ...) | Specifies that the variables listed will have shared scope; that is, they are accessible to both the original task and all helper tasks. The SHARED clause identifies those variables that are shared between parallel processes. One variable cannot be declared both PRIVATE and SHARED. The loop control variable of the DOALL loop cannot be specified as SHARED; it must be specified as PRIVATE. Variables cannot be subobjects (that is, array elements or components of derived types). |
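As an illustration of how the scope parameters and SAVELAST combine on a DOALL directive, consider the following sketch. It is not taken from the original manual: the module name, the function LAST_TEMP, and its arguments are hypothetical. Every variable referenced in the loop appears in either PRIVATE or SHARED, and the private temporary T is read after the loop, which is only safe because SAVELAST is specified.

```fortran
module savelast_demo
  implicit none
contains
  ! Returns the value of the private temporary T after the loop completes.
  real function last_temp(v, n)
    integer, intent(in) :: n
    real, intent(in)    :: v(n)
    integer :: i
    real :: t
    t = 0.0
!MIC$ DOALL PRIVATE(I, T) SHARED(N, V) SAVELAST
    do i = 1, n
      t = 2.0 * v(i)
    end do
    last_temp = t      ! with SAVELAST, T holds the value from iteration N
  end function last_temp
end module savelast_demo
```

Without SAVELAST, the value of T after the loop would not be guaranteed, so the final assignment would be unreliable under multiprocessing.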
Table C-2. Autotasking directive work_distribution values
work_distribution | Description |
---|---|
CHUNKSIZE(n) | Specifies the number of iterations to distribute to an available processor. n is an integer expression. For best performance, n should be an integer constant. For example, given 100 iterations and CHUNKSIZE(4), 4 iterations at a time are distributed to each available processor until the 100 iterations are complete. By default, n is the number of loop iterations divided by the number of processors. |
GUIDED[(vl)] | Specifies the use of guided self-scheduling to distribute the iterations to available processors. This mechanism minimizes synchronization overhead while providing acceptable dynamic load balancing. The vl argument is the vector length. vl must be of type integer and can be either a constant or a variable. The default vl is 1. |
The !MIC$ DOPARALLEL directive indicates that the DO loop beginning on the next line may be executed in parallel by multiple processors. No directive is needed to end a DOPARALLEL loop.
The !MIC$ ENDDO directive extends a control structure beyond the DO loop. Without a !MIC$ ENDDO directive, all CPUs synchronize immediately after the loop, so that no processors can continue executing until all of the iterations are done. A !MIC$ ENDDO directive moves this point of synchronization from the end of the loop to the line of the !MIC$ ENDDO directive.
This lets the compiler use parallelism in loops containing some forms of reduction computations. These directives can be used only within a parallel region bounded by the PARALLEL and ENDPARALLEL directives.
All variables and arrays in a parallel region must be declared as PRIVATE or SHARED.
The formats for these directives are as follows:
!MIC$ DOPARALLEL [work_distribution]
!MIC$ ENDDO
The work_distribution arguments are described in Table C-2. Only one work_distribution can be used for a given DO loop.
In the following example, a parallel region is defined by PARALLEL and ENDPARALLEL. A reduction computation is implemented by a DOPARALLEL/ENDDO pair, which ensures that all contributions to SUM and BIG are included, and GUARD/ENDGUARD, which protects the updating of shared variables SUM and BIG.
      SUM = 0.0
      BIG = -1.0
!MIC$ PARALLEL PRIVATE(XSUM,XBIG,I)
!MIC$* SHARED(SUM,BIG,AA,BB,CC)
      XSUM = 0.0
      XBIG = -1.0
!MIC$ DOPARALLEL
      DO I = 1, 2000
        :
        XSUM = XSUM + (AA(I)*(BB(I)-CC(AA(I))))
        XBIG = MAX(ABS(AA(I)*BB(I)), XBIG)
        :
      END DO
!MIC$ GUARD
      SUM = SUM + XSUM
      BIG = MAX(XBIG,BIG)
!MIC$ ENDGUARD
!MIC$ ENDDO
!MIC$ ENDPARALLEL
The !MIC$ GUARD and !MIC$ ENDGUARD directives delimit a critical region, providing the necessary synchronization to protect or guard the code inside the critical region. A critical region is a code block that is to be executed by only one processor at a time, although all processors that enter a parallel region will execute it.
The formats for these directives are as follows:
!MIC$ GUARD [n]
!MIC$ ENDGUARD [n]
n | Mutual exclusion flag; two regions with the same flag cannot be active concurrently. n must be of type integer and can be a variable or an expression, from which the low-order 6 bits are used. For example, GUARD 1 and GUARD 2 can be active concurrently, but two GUARD 7 directives cannot. |
For optimal performance, no n should be specified. Otherwise, n should be an integer constant; a general expression can be used for the unusual case that the critical region number must be passed to a lower-level routine. When n is not provided, the critical region blocks only other instances of itself, but no other critical regions. Critical regions may appear anywhere in a program. That is, they are not limited to parallel regions.
Numbered GUARD directives are not supported as such; they are implemented as unnamed GUARD directives. This can lead to deadlock if GUARD directives are nested.
The !MIC$ NUMCPUS directive globally indicates the maximum number of CPUs that a section of code can use effectively. It does not guarantee that this number of processors will actually be assigned. The directive remains in effect, across program units and in all subsequently called subroutines, until a subsequent NUMCPUS directive is encountered. Without this directive, CPUs are allocated based on the MP_SET_NUMTHREADS environment variable and workload.
The format for this directive is as follows:
!MIC$ NUMCPUS (ncpus)
ncpus | Globally specifies the maximum number of CPUs that a code can use effectively. ncpus must be of type integer and can be a constant, variable, or expression. |
The number of CPUs specified with this directive should be equal to or less than the number of CPUs specified by the MP_SET_NUMTHREADS environment variable. If the number requested with the NUMCPUS directive is greater than the number specified by the MP_SET_NUMTHREADS environment variable, no error is issued, but the directive has no effect.
The !MIC$ PARALLEL and !MIC$ ENDPARALLEL directives mark, respectively, the beginning and end of a parallel region. Parallel regions are combinations of redundant code blocks and partitioned code blocks. The formats for these directives are as follows:
!MIC$ PARALLEL [parameter[[,]parameter] ...]
!MIC$ ENDPARALLEL
The parameters are described in Table C-1.
The PARALLEL directive indicates where multiple processors enter execution. The portion of code that all processors execute until reaching a DOPARALLEL directive is called a redundant code block. Because the iterations of the DO loop within a DOPARALLEL directive are distributed across available processors, this portion of code is called the partitioned code block. The scope of a variable in a parallel region is either shared or private. Shared variables are used by all processors; private variables are unique to a processor.
When the compiler generates code for a !MIC$ PARALLEL directive, all the variables and arrays in the region must be defined in a SHARED or PRIVATE parameter.
The !MIC$ PERMUTATION directive declares that an integer array has no repeated values. This is useful when the integer array is used as a subscript for another array (vector-valued subscript). The format for this directive is as follows:
!MIC$ PERMUTATION (ia[, ia] ...)
ia | Integer array that has no repeated values for the entire routine. |
When an array with a vector-valued subscript appears on both sides of the equal sign in a loop, many-to-one assignment is possible even when the subscript is identical. Many-to-one assignment occurs if any repeated elements exist in the subscripting array. If it is known that the integer array is used merely to permute the elements of the subscripted array, it can often be determined that many-to-one assignment does not exist with that array reference.
Sometimes a vector-valued subscript is used as a means of indirect addressing because the elements of interest in an array are sparsely distributed; in this case, an integer array is used to select only the desired elements, and no repeated elements exist in the integer array, as in the following example:
!MIC$ PERMUTATION(IPNT)   ! IPNT has no repeated values
      ...
      DO I = 1, N
        A(IPNT(I)) = A(IPNT(I)) + B(I)
      END DO
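Because PERMUTATION is an assertion that the compiler does not verify, it can be useful to check the index array during testing. The following sketch is one way to do that; it is an assumption added for illustration, not part of the directive set. The module name, the function HAS_NO_REPEATS, and the bound M (the size of the subscripted array) are hypothetical.

```fortran
module permutation_demo
  implicit none
contains
  ! True if every value in IPNT lies in 1..M and no value occurs twice.
  logical function has_no_repeats(ipnt, m)
    integer, intent(in) :: m
    integer, intent(in) :: ipnt(:)
    logical :: seen(m)
    integer :: i
    seen = .false.
    has_no_repeats = .false.
    do i = 1, size(ipnt)
      if (ipnt(i) < 1 .or. ipnt(i) > m) return   ! out of range
      if (seen(ipnt(i))) return                  ! repeated value
      seen(ipnt(i)) = .true.
    end do
    has_no_repeats = .true.
  end function has_no_repeats
end module permutation_demo
```

If the check fails for any input the program can produce, many-to-one assignment is possible and the PERMUTATION directive should be removed.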
The following examples show shared and private variables and arrays.
The following examples show read-only variables:
!MIC$ DOALL PRIVATE(I) SHARED(N1,N2,A)
      DO I = N1, N2
        ... = A
      END DO
A is a shared variable because it is a read-only variable. All processors share the same location for A.
!MIC$ DOALL SHARED(N1,N2,M1,M2,V) PRIVATE(I,J)
      DO I = N1, N2
        DO J = M1, M2
          ... = V(J)
        END DO
      END DO
V is shared because it is a read-only array. N1, N2, M1, and M2 are also shared because they are read-only variables. I and J are written and then read, so they are private variables.
The following example shows an array indexed by the loop index:
!MIC$ DOALL SHARED(N1,N2,V,U,J) PRIVATE(I,T)
      DO I = N1, N2
        T = V(I)
        U(I,J) = T
      END DO
U and V are shared arrays because they are indexed by the loop index. All processors share the same location for V and U. T is written and then read, so it is a private variable. J is shared because it is a read-only variable.
The following example shows read-then-write variables:
      SUM = 0.0
!MIC$ DOALL SHARED(N1,N2,V,SUM) PRIVATE(I,T)
      DO I = N1, N2
        T = V(I)
!MIC$ GUARD
        SUM = SUM + T
!MIC$ ENDGUARD
      END DO
SUM is a shared variable because it is read before it is written. Special care is needed in writing into a shared variable that is not indexed by the loop control variable.