A directive is a line inserted into Fortran source code that specifies actions to be performed by the compiler. Directive lines are not Fortran statements.
Many MIPSpro 7 Fortran 90 compiler features are implemented as either command line options or directives. The features implemented as command line options are set at compile time and applied to all files in the compilation. The features implemented through directives are set within your Fortran source code, and they apply to portions of your source code.
This chapter introduces the MIPSpro 7 Fortran 90 directive set and describes the general directives.
The sections in this chapter are as follows:
“Using Directives”, describes using directives.
“LNO Directives”, describes the loop nest optimization (LNO) directives.
“Argument Aliasing Directives (ASSERT ARGUMENTALIASING and ASSERT NOARGUMENTALIASING)”, describes the argument aliasing directives.
“Symbol Storage Directives”, describes the symbol storage directives.
“Inlining and IPA Directives (INLINE, NOINLINE, IPA, and NOIPA)”, describes the inlining and IPA directives.
All directives are of the following form:
prefix directive
prefix: Each directive begins with a prefix. The prefix needed for each directive is shown in the directive's description. The following directive prefixes are used by the MIPSpro 7 Fortran 90 compiler:
The prefix used also depends on which Fortran source form you are using, as follows:
Because both fixed source form and free source form accept directives that start with the exclamation point (!), that is the initial character used in all directive syntax descriptions in this manual.
directive: This is the specific directive's syntax. The syntax usually consists of the directive name. Some directives accept arguments; a directive's arguments, if any, are shown in the description for the directive itself.
The following sections describe the general format for directives and explain how directives are continued across source code lines.
Note: The multiprocessing directives supported in previous MIPSpro 7 Fortran 90 releases are outmoded, as are the !$PAR, C$PAR, !$, and C$ directive prefixes. They are still supported for older code that requires this functionality, but Silicon Graphics and Cray Research encourage you to modify your code to use the OpenMP directives described in Chapter 4, “OpenMP Fortran API Multiprocessing Directives”.
Some compiler features can be activated both on the command line and through compiler directives. The difference is that a command line setting applies to all files in the compilation, but a directive applies only to a program unit or to another specific part of a source file.
Generally, and by default, directives override command line options. There are exceptions to this rule, however. The exceptions, if any, are noted in the introductory text to each directive group.
The range of a particular directive depends on the directive itself, as follows:
If a directive appears within a program unit, it applies only to that program unit. Within a program unit, many directives apply only to the loops that they immediately precede.
If a directive appears outside a program unit (for example, prior to program code in a file) it applies to the entire file.
The descriptions for the individual directives indicate the range of the directive.
It is sometimes necessary to continue a directive across one or more source code lines. The continuation character used and its placement within the directive line depends on the type of directive you are using. The introductory text for each directive group indicates the continuation character that is appropriate for that group.
For all directives in this chapter, the prefix for a directive line that is a continuation line is !*$*&.
Do not use source preprocessor (#) directives within multiline compiler directives.
The loop nest optimization (LNO) directives control loop nest optimizations. By default, directives override command line options. To reverse this, and have command line options override the LNO directives, specify -LNO:ignore_pragmas. For information on the -LNO:ignore_pragmas option, see “-LNO:ignore_pragmas=setting” in Chapter 2.
To continue a directive, the continuation line must begin with !*$*&.
The following directives control loop nest optimizations:
AGGRESSIVEINNERLOOPFISSION
BLOCKABLE
BLOCKINGSIZE, NOBLOCKING
FISSION, FISSIONABLE, NOFISSION
FUSE, FUSEABLE, NOFUSION
INTERCHANGE, NOINTERCHANGE
PREFETCH
PREFETCH_MANUAL
PREFETCH_REF
PREFETCH_REF_DISABLE
UNROLL
The following sections describe the LNO directives.
The AGGRESSIVEINNERLOOPFISSION directive specifies that the following loop should be split into as many loops as possible. In a loop nest, this directive must precede an inner loop.
The format of this directive is as follows:
!*$* AGGRESSIVEINNERLOOPFISSION
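The following is a minimal sketch of where the directive goes. It places AGGRESSIVEINNERLOOPFISSION immediately before the inner loop of a two-level nest whose body holds two independent statements. Compilers other than MIPSpro read the !*$* line as an ordinary comment, so the code builds with any Fortran 90 compiler. Whether the inner loop is actually split is up to the MIPSpro compiler. The names FISSION_DEMO and SCALE_AND_SHIFT are hypothetical.

```fortran
MODULE FISSION_DEMO
  IMPLICIT NONE
CONTAINS
  ! Hypothetical routine: the inner I loop holds two independent
  ! statements, so it is a candidate for splitting into two I loops.
  SUBROUTINE SCALE_AND_SHIFT(A, B, N, M)
    INTEGER, INTENT(IN) :: N, M
    REAL(KIND=8), INTENT(INOUT) :: A(N,M), B(N,M)
    INTEGER :: I, J
    DO J = 1, M
!*$* AGGRESSIVEINNERLOOPFISSION
      DO I = 1, N
        A(I,J) = 2.0D0 * A(I,J)
        B(I,J) = B(I,J) + 1.0D0
      END DO
    END DO
  END SUBROUTINE SCALE_AND_SHIFT
END MODULE FISSION_DEMO
```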
The BLOCKABLE directive specifies that it is legal to cache block the subsequent loops. For more information on controlling cache blocking, see the -LNO:blocking option in “-LNO:blocking=setting” in Chapter 2, and the -LNO:blocking_size option in “-LNO:blocking_size=n1,n2” in Chapter 2.
The format of this directive is as follows:
!*$* BLOCKABLE (do_variable,do_variable[,do_variable]...)
do_variable: Specify the do_variable names of two or more loops. The loops identified by the do_variable names must be adjacent and nested within each other, although they need not be perfectly nested.
This directive informs the compiler that these loops can be involved in a blocking situation with each other, even if the compiler would consider such a transformation illegal. The loops must also be interchangeable and unrollable. This directive does not instruct the compiler on which of these transformations to apply.
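As a hedged illustration, the sketch below names the J and I loops of a matrix transpose in a BLOCKABLE directive. A transpose is a classic case where cache blocking can help. The nest is also interchangeable and unrollable, as the directive requires. Other compilers treat the directive as a comment. TRANSPOSE_COPY and BLOCKABLE_DEMO are hypothetical names.

```fortran
MODULE BLOCKABLE_DEMO
  IMPLICIT NONE
CONTAINS
  ! Hypothetical transpose routine. BLOCKABLE (J,I) names the
  ! do_variables of the two adjacent, nested loops and states that
  ! the compiler may cache block them together.
  SUBROUTINE TRANSPOSE_COPY(A, B, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN)  :: A(N,N)
    REAL(KIND=8), INTENT(OUT) :: B(N,N)
    INTEGER :: I, J
!*$* BLOCKABLE (J,I)
    DO J = 1, N
      DO I = 1, N
        B(J,I) = A(I,J)
      END DO
    END DO
  END SUBROUTINE TRANSPOSE_COPY
END MODULE BLOCKABLE_DEMO
```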
The BLOCKINGSIZE directive asserts that the loop following it is involved in a cache blocking for the primary or secondary cache; the NOBLOCKING directive asserts that the loop following it is not.
The formats of these directives are as follows:
n1,n2: Integers that indicate the block size. If the loop is involved in a blocking, it will have a block size of n1 for the primary cache and n2 for the secondary cache. The compiler attempts to include this loop within such a block, but it cannot guarantee this. If n1 or n2 is 0, the loop is not blocked, but the entire loop is inside the block.
Example:
      SUBROUTINE AMAT(X,Y,Z,N,M,MM)
      REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
      DO K = 1, N
!*$* BLOCKING SIZE (20)
        DO J = 1, M
!*$* BLOCKING SIZE (20)
          DO I = 1, MM
            Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
          END DO
        END DO
      END DO
      END
For the preceding code, the compiler makes 20 x 20 blocks when blocking, but it might block the loop nest in a way that leaves loop K out of the tile. To make sure loop K is included, add a BLOCKINGSIZE(0) directive just before loop K. Loop K is then not blocked itself but lies entirely inside the block, and the compiler generates a loop nest such as the following:
      SUBROUTINE AMAT(X,Y,Z,N,M,MM)
      REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
      DO JJ = 1, M, 20
        DO II = 1, MM, 20
          DO K = 1, N
            DO J = JJ, MIN(M, JJ+19)
              DO I = II, MIN(MM, II+19)
                Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
              END DO
            END DO
          END DO
        END DO
      END DO
      END
Note that an INTERCHANGE directive can be applied to the same loop nest as a BLOCKINGSIZE directive. The BLOCKINGSIZE directive applies to the loop it directly precedes; it moves with that loop when an interchange is applied.
The NOBLOCKING directive prevents the compiler from involving the subsequent loop in a cache blocking situation.
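The format lines for these directives are not reproduced in this section. The following sketch therefore assumes that NOBLOCKING takes no arguments and is written as !*$* NOBLOCKING, spelled as the directive name appears in the text. The directive sits before the outer loop of a matrix-vector product so that this loop is kept out of any cache blocking. MATVEC and NOBLOCKING_DEMO are hypothetical names; other compilers see the directive as a comment.

```fortran
MODULE NOBLOCKING_DEMO
  IMPLICIT NONE
CONTAINS
  ! Hypothetical matrix-vector product Y = A*X. NOBLOCKING (assumed
  ! spelling, no arguments) before the J loop asks the compiler not to
  ! involve that loop in a cache blocking.
  SUBROUTINE MATVEC(A, X, Y, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN)  :: A(N,N), X(N)
    REAL(KIND=8), INTENT(OUT) :: Y(N)
    INTEGER :: I, J
    Y = 0.0D0
!*$* NOBLOCKING
    DO J = 1, N
      DO I = 1, N
        Y(I) = Y(I) + A(I,J) * X(J)
      END DO
    END DO
  END SUBROUTINE MATVEC
END MODULE NOBLOCKING_DEMO
```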
The fission control directives specify whether the compiler should perform loop fission on the loops that immediately follow these directives.
The formats of these directives are as follows:
level: Specify an integer that indicates the number of loop levels that should undergo loop fission.
The FISSION directive specifies that loop fission should be attempted. The compiler performs a validity test on the subsequent loops unless you have also specified a FISSIONABLE directive. The NOFISSION directive specifies that the following loop should not undergo fission, but its inner loops, if any, may undergo fission.
These directives do not cause statements to be reordered.
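The format lines for the fission directives are not reproduced in this section. This sketch assumes that FISSION takes its level argument in parentheses, as in FISSION (1). It pairs a loop that carries the directive with a hand-written version of what one-level fission would produce: the same two statements, in the same order, each in its own loop. Both routines compute the same result, which is why the split is legal here: C(I) reads only A(I), and A(I) is already complete when the second loop runs. All names are hypothetical, and other compilers ignore the directive line as a comment.

```fortran
MODULE FISSION_LEVEL_DEMO
  IMPLICIT NONE
CONTAINS
  ! Loop as written, with a FISSION directive (assumed form FISSION (level)).
  SUBROUTINE WITH_DIRECTIVE(A, B, C, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN)  :: B(N)
    REAL(KIND=8), INTENT(OUT) :: A(N), C(N)
    INTEGER :: I
!*$* FISSION (1)
    DO I = 1, N
      A(I) = B(I) + 1.0D0
      C(I) = A(I) * 2.0D0
    END DO
  END SUBROUTINE WITH_DIRECTIVE

  ! Hand-written equivalent of one-level fission: statements keep their
  ! original order, but each now has a loop of its own.
  SUBROUTINE HAND_SPLIT(A, B, C, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN)  :: B(N)
    REAL(KIND=8), INTENT(OUT) :: A(N), C(N)
    INTEGER :: I
    DO I = 1, N
      A(I) = B(I) + 1.0D0
    END DO
    DO I = 1, N
      C(I) = A(I) * 2.0D0
    END DO
  END SUBROUTINE HAND_SPLIT
END MODULE FISSION_LEVEL_DEMO
```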
The fusion control directives specify whether the compiler should perform loop fusion on the loops that immediately follow these directives.
The formats of these directives are as follows:
n: Specify an integer that indicates the number of subsequent loops that should undergo loop fusion. The default is 2.
level: Specify an integer that indicates how deeply the loops should be fused. The level of loop fusion is determined by the maximum perfectly nested loop levels of the fused loops, although partial fusion is allowed.
Loop iterations may be peeled as needed during loop fusion. The limit of this peeling is 5, or the number specified by the -LNO:fusion_peeling_limit command line option. For more information on the -LNO:fusion_peeling_limit command line option, see “-LNO:fusion_peeling_limit=n” in Chapter 2.
The FUSE directive specifies that loop fusion should be attempted. The compiler performs a validity test on the subsequent loops unless you have also specified a FUSEABLE directive. When the FUSEABLE directive is specified, the fusion is done for loops with identical iteration counts. The NOFUSION directive specifies that the following loop should not be fused with any other loop.
Example. Consider the following code:
      DO I = 1,N
        DO J = 1,N
          ...
        END DO
      END DO
      DO I = 1,N
        DO J = 1,N
          ...
        END DO
      END DO
Fusing the loops with a level of 1 results in the following loop nest:
      DO I = 1,N
        DO J = 1,N
          ...
        END DO
        DO J = 1,N
          ...
        END DO
      END DO
Fusing the loops with a level of 2 results in the following loop nest:
      DO I = 1,N
        DO J = 1,N
          ...
          ...
        END DO
      END DO
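The format lines for the fusion directives are not reproduced in this section. The following sketch assumes the form FUSE (n,level) and requests fusion of the next 2 loops, 1 level deep. It puts the directive before two adjacent loops with identical iteration counts and compares them with a hand-fused version. Fusion is valid here because B(I) depends only on A(I), which the fused body computes earlier in the same iteration. All names are hypothetical, and other compilers read the directive as a comment.

```fortran
MODULE FUSE_DEMO
  IMPLICIT NONE
CONTAINS
  ! Two adjacent loops with identical iteration counts. The directive
  ! (assumed form FUSE (n,level)) asks for the next 2 loops to be
  ! fused 1 level deep.
  SUBROUTINE UNFUSED(A, B, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(OUT) :: A(N), B(N)
    INTEGER :: I
!*$* FUSE (2,1)
    DO I = 1, N
      A(I) = REAL(I, KIND=8)
    END DO
    DO I = 1, N
      B(I) = A(I) + 1.0D0
    END DO
  END SUBROUTINE UNFUSED

  ! Hand-written result of that fusion: one loop whose body holds both
  ! original bodies, in their original order.
  SUBROUTINE HAND_FUSED(A, B, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(OUT) :: A(N), B(N)
    INTEGER :: I
    DO I = 1, N
      A(I) = REAL(I, KIND=8)
      B(I) = A(I) + 1.0D0
    END DO
  END SUBROUTINE HAND_FUSED
END MODULE FUSE_DEMO
```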
The loop interchange control directives specify whether or not the order of the following two or more loops should be interchanged. These directives apply to the loops that they immediately precede.
The formats of these directives are as follows:
do_variable: Specifies two or more do_variable names. The do_variable names can be specified in any order, and the compiler reorders the loops. The loops must be perfectly nested. If the loops are not perfectly nested, you may receive unexpected results.
The compiler reorders the loops such that the loop with do_variable1 is outermost, then loop do_variable2, then loop do_variable3.
The NOINTERCHANGE directive inhibits loop interchange on the loop that immediately follows the directive.
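The following sketch shows the usual motivation for interchange. It assumes the directive lists the do_variables in the desired outermost-to-innermost order, as described above. As written, the I loop is outermost, so the inner J loop strides across the columns of a column-major array. INTERCHANGE (J,I) asks for J to become the outer loop and I the inner one. The two loops are perfectly nested, as the directive requires. ADD_ONE and INTERCHANGE_DEMO are hypothetical names, and other compilers ignore the directive line.

```fortran
MODULE INTERCHANGE_DEMO
  IMPLICIT NONE
CONTAINS
  ! As written, consecutive inner-loop iterations touch A(I,J) and
  ! A(I,J+1), which are N elements apart in memory. Making J the
  ! outermost loop lets the inner I loop walk A in storage order.
  SUBROUTINE ADD_ONE(A, N, M)
    INTEGER, INTENT(IN) :: N, M
    REAL(KIND=8), INTENT(INOUT) :: A(N,M)
    INTEGER :: I, J
!*$* INTERCHANGE (J,I)
    DO I = 1, N
      DO J = 1, M
        A(I,J) = A(I,J) + 1.0D0
      END DO
    END DO
  END SUBROUTINE ADD_ONE
END MODULE INTERCHANGE_DEMO
```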
The PREFETCH directive controls the MIPS IV prefetch instruction. Using this directive can increase performance in program units that are likely to encounter cache misses during execution. This directive applies only to the program unit in which it appears.
When the directive is specified, the compiler estimates the memory references that will be cache misses, inserts prefetches for the misses, and schedules the prefetches ahead of their corresponding references. You can specify different levels of prefetching aggressiveness for the primary and secondary cache.
The format of this directive is as follows:
!*$* PREFETCH (primary_cache[,secondary_cache])
primary_cache, secondary_cache: For each of these, specify 0, 1, or 2. The number specified indicates the level of prefetching requested for the primary and secondary cache levels, respectively. A 0 disables all prefetching. 1 requests conservative prefetching. 2 requests aggressive prefetching. By default, primary_cache and secondary_cache are both set to 1 when the -r10000 command line option is in effect, and they are set to 0 for all other processor settings.
This directive is recognized only if the -mips4 and -r10000 command line options are in effect.
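As a minimal sketch, the function below requests aggressive prefetching for the primary cache and conservative prefetching for the secondary cache in a streaming dot product. A long streaming loop is the kind of loop that is likely to miss in cache. The directive affects only this program unit and, per the text above, only when -mips4 and -r10000 are in effect. Other compilers treat it as a comment. DOT and PREFETCH_DEMO are hypothetical names.

```fortran
MODULE PREFETCH_DEMO
  IMPLICIT NONE
CONTAINS
  ! PREFETCH (2,1): aggressive prefetching for the primary cache,
  ! conservative prefetching for the secondary cache, in this unit only.
  FUNCTION DOT(X, Y, N) RESULT(S)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN) :: X(N), Y(N)
    REAL(KIND=8) :: S
    INTEGER :: I
!*$* PREFETCH (2,1)
    S = 0.0D0
    DO I = 1, N
      S = S + X(I) * Y(I)
    END DO
  END FUNCTION DOT
END MODULE PREFETCH_DEMO
```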
The PREFETCH_MANUAL directive specifies whether the PREFETCH_REF and the PREFETCH_REF_DISABLE directives, which perform manual prefetches, should be respected or ignored within a subprogram. This directive applies only to the program unit in which it appears.
The format of this directive is as follows:
!*$* PREFETCH_MANUAL (n)
n: Specify either 0 or 1 for n. 0 indicates that the compiler should ignore all prefetch directives. 1 indicates that all prefetch directives should be recognized. By default, all prefetch directives are recognized.
This directive is recognized only if the -mips4 and -r10000 command line options are in effect. For more information on the -mips4 option, see “-mipsn” in Chapter 2. For more information on the -r10000 option, see “-rprocessor” in Chapter 2.
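The sketch below shows one plausible use of PREFETCH_MANUAL: switching off a manual PREFETCH_REF in one subprogram, for example to compare timings with and without it, without deleting the directive. The PREFETCH_REF line follows the format given for that directive; its stride value is arbitrary. SCALE_ARRAY and PREFETCH_MANUAL_DEMO are hypothetical names, and other compilers see both directive lines as comments.

```fortran
MODULE PREFETCH_MANUAL_DEMO
  IMPLICIT NONE
CONTAINS
  ! PREFETCH_MANUAL (0) tells the compiler to ignore the manual
  ! PREFETCH_REF directive that follows it in this program unit.
  SUBROUTINE SCALE_ARRAY(A, N, F)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(INOUT) :: A(N)
    REAL(KIND=8), INTENT(IN) :: F
    INTEGER :: I
!*$* PREFETCH_MANUAL (0)
!*$* PREFETCH_REF=A(I),stride=4,kind=wr
    DO I = 1, N
      A(I) = F * A(I)
    END DO
  END SUBROUTINE SCALE_ARRAY
END MODULE PREFETCH_MANUAL_DEMO
```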
The PREFETCH_REF directive requests prefetching for a specific memory reference. This directive applies only to the loop nest that includes references to array, and the directive must immediately precede the loop nest.
When this directive is specified, all references to array in the subsequent loop nest are ignored by the automatic prefetcher (if enabled).
The format of this directive is as follows:
!*$* PREFETCH_REF=array[,stride=stride[,stride]][,level=level[,level]][,kind=rw][,size=size]
array: For array, specify identification information for the array, for example, A(I,J).
stride: Specify prefetching for every stride iterations of the loop. The default is 1. The compiler tries to issue one prefetch per stride iterations, but this cannot be guaranteed.
level: Specify the level in the memory hierarchy to prefetch, either 1 or 2. The default is 2. 1 specifies a prefetch from secondary cache to primary cache. 2 specifies a prefetch from memory to primary cache.
rw: Specify rd or wr. rd indicates that the location is read; wr indicates that the location is written. The default is wr.
size: Specify the size, in KB, of array. Must be a constant. If size is specified, the automatic prefetcher (if enabled) reduces the effective cache size by that amount in its calculations.
This directive generates a single prefetch instruction to a specified memory reference. It searches for array references that match the supplied reference in the current loop nest and takes the following actions:
If the reference is found, the reference is scheduled relative to the prefetch node, based on the miss latency for the specified level of the cache.
If no such reference is found, the prefetch is generated at the start of the loop body.
This directive is recognized only if the -mips4 and -r10000 command line options are in effect. For more information on the -mips4 option, see “-mipsn” in Chapter 2. For more information on the -r10000 option, see “-rprocessor” in Chapter 2.
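As a hedged illustration of the syntax above, the following sketch places a PREFETCH_REF directive immediately before the loop nest that references B. It asks for a read prefetch (kind=rd) from the secondary to the primary cache (level=1), issued about once every 4 iterations (stride=4). The stride value is chosen only for illustration and is not a recommendation tied to any particular cache line size. ADD_INTO and PREFETCH_REF_DEMO are hypothetical names; other compilers ignore the directive line.

```fortran
MODULE PREFETCH_REF_DEMO
  IMPLICIT NONE
CONTAINS
  ! The directive immediately precedes the loop nest that references B.
  ! References to B in this nest are then left to this manual prefetch
  ! rather than to the automatic prefetcher.
  SUBROUTINE ADD_INTO(A, B, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(INOUT) :: A(N)
    REAL(KIND=8), INTENT(IN) :: B(N)
    INTEGER :: I
!*$* PREFETCH_REF=B(I),stride=4,level=1,kind=rd
    DO I = 1, N
      A(I) = A(I) + B(I)
    END DO
  END SUBROUTINE ADD_INTO
END MODULE PREFETCH_REF_DEMO
```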
The PREFETCH_REF_DISABLE directive disables prefetching for all references to an array. This directive applies to all array references within the program unit.
The format of this directive is as follows:
!*$* PREFETCH_REF_DISABLE=array[, size=size]
array: For array, specify identification information for the array, for example, A(I,J). If the automatic prefetcher is enabled, it ignores array.
size: Specify the size, in KB, of array. Must be a constant. The size is used for volume analysis, which is performed as part of prefetching analysis. In volume analysis, the compiler tries to determine the amount of data referenced by each loop or loop nest. This information is used when determining whether or not to prefetch memory references.
This directive is recognized only if the -mips4 and -r10000 command line options are in effect.
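As a minimal sketch, assume a small lookup table that is expected to stay resident in cache, so prefetching it would only waste instructions. The table holds 256 REAL(KIND=8) values, which is 256 x 8 = 2048 bytes, or 2 KB, so size=2 is passed for volume analysis. The array reference is written as it appears in the loop, following the A(I,J) style shown above. LOOKUP, T, and IDX are hypothetical names, and other compilers treat the directive as a comment.

```fortran
MODULE PREFETCH_DISABLE_DEMO
  IMPLICIT NONE
CONTAINS
  ! T is a 2 KB table (256 x 8 bytes) assumed to stay in cache, so
  ! prefetching is disabled for all references to it in this unit.
  SUBROUTINE LOOKUP(T, IDX, RES, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN) :: T(256)
    INTEGER, INTENT(IN) :: IDX(N)
    REAL(KIND=8), INTENT(OUT) :: RES(N)
    INTEGER :: I
!*$* PREFETCH_REF_DISABLE=T(IDX(I)), size=2
    DO I = 1, N
      RES(I) = T(IDX(I))
    END DO
  END SUBROUTINE LOOKUP
END MODULE PREFETCH_DISABLE_DEMO
```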
The UNROLL directive specifies loop unrolling. This directive applies to the loop that immediately follows the directive.
Inner loop unrolling occurs automatically when -O2 or -O3 is in effect. Non-inner loop unrolling (and jam) occurs when -O3 is in effect.
The format of this directive is as follows:
!*$* UNROLL (n)
n: Specifies the number of copies of the loop body to be generated. The value of n must be at least 2 for unrolling to occur; if n = 1, no unrolling is performed.
Even with this directive specified, unrolling is not performed if the compiler determines that unrolling would be unsafe. To specify that the compiler unroll the loop regardless of its analysis, you must also specify a BLOCKABLE directive. For information on the BLOCKABLE directive, see “Permit Cache Blocking: BLOCKABLE Directive”.
Example. Assume that -O3 is specified and that the outer loop of the following nest will be unrolled by two:
!*$* UNROLL (2)
      DO I = 1, 10
        DO J = 1,100
          A(J,I) = B(J,I) + 1
        END DO
      END DO
With outer loop unrolling, the compiler produces the following nest, in which the two bodies of the inner loop are adjacent to each other:
      DO I = 1, 10, 2
        DO J = 1,100
          A(J,I) = B(J,I) + 1
        END DO
        DO J = 1,100
          A(J,I+1) = B(J,I+1) + 1
        END DO
      END DO
The compiler then jams, or fuses, the inner two loop bodies together, producing the following nest:
      DO I = 1, 10, 2
        DO J = 1,100
          A(J,I) = B(J,I) + 1
          A(J,I+1) = B(J,I+1) + 1
        END DO
      END DO
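The example above divides evenly: 10 iterations unrolled by two. When the trip count is not a multiple of n, some leftover iterations must still run. The following sketch pairs an inner loop that carries UNROLL (4) with a hand-unrolled version that uses a cleanup loop for the N MOD 4 remaining iterations. It is only a rough picture of what unrolling by 4 can look like; the code the compiler actually generates may differ. AXPY, AXPY_UNROLLED, and UNROLL_DEMO are hypothetical names, and other compilers ignore the directive line.

```fortran
MODULE UNROLL_DEMO
  IMPLICIT NONE
CONTAINS
  ! Loop as written: UNROLL (4) requests four copies of the body.
  SUBROUTINE AXPY(A, X, Y, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN) :: A, X(N)
    REAL(KIND=8), INTENT(INOUT) :: Y(N)
    INTEGER :: I
!*$* UNROLL (4)
    DO I = 1, N
      Y(I) = Y(I) + A * X(I)
    END DO
  END SUBROUTINE AXPY

  ! Hand-unrolled sketch: the main loop does four iterations per trip,
  ! and a cleanup loop handles the N MOD 4 leftover iterations.
  SUBROUTINE AXPY_UNROLLED(A, X, Y, N)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN) :: A, X(N)
    REAL(KIND=8), INTENT(INOUT) :: Y(N)
    INTEGER :: I, NMAIN
    NMAIN = N - MOD(N, 4)
    DO I = 1, NMAIN, 4
      Y(I)   = Y(I)   + A * X(I)
      Y(I+1) = Y(I+1) + A * X(I+1)
      Y(I+2) = Y(I+2) + A * X(I+2)
      Y(I+3) = Y(I+3) + A * X(I+3)
    END DO
    DO I = NMAIN + 1, N
      Y(I) = Y(I) + A * X(I)
    END DO
  END SUBROUTINE AXPY_UNROLLED
END MODULE UNROLL_DEMO
```

With N = 10, the main loop covers iterations 1 through 8 in two trips, and the cleanup loop covers 9 and 10.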
The ASSERT ARGUMENTALIASING and ASSERT NOARGUMENTALIASING directives allow the compiler to make assumptions about procedure dummy arguments when performing optimizations.
It is possible to call a procedure and specify the same variable or array element in two or more positions of the actual argument list. Within the procedure, two or more dummy argument names, which appear to refer to different memory locations, actually refer to the same location. This practice violates the Fortran standard. You can use the ASSERT ARGUMENTALIASING directive to force the compiler to be more conservative.
By default, ASSERT NOARGUMENTALIASING is in effect.
The formats for these directives are as follows:
If these directives appear prior to Fortran source code in a file, they are applied to all program units in the file. If they appear in a program unit, they are applied to that program unit only. If one of these directives is encountered, it remains in effect until reset by the opposing directive.
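The format lines are not reproduced in this section. The following sketch assumes the directive is written exactly as its name, !*$* ASSERT ARGUMENTALIASING. It places the directive inside a single program unit, so it applies to that unit only. The scenario is hypothetical: a legacy routine that some old callers invoke with the same array in both positions, for example CALL ACCUMULATE(X, X, N), which the Fortran standard does not permit. The test below uses distinct arrays, as conforming code must. Other compilers read the directive as a comment.

```fortran
! Hypothetical legacy routine. Some old callers pass the same array
! for A and B, so this unit asks the compiler to assume that dummy
! arguments may refer to the same storage.
SUBROUTINE ACCUMULATE(A, B, N)
  IMPLICIT NONE
  INTEGER :: N
  REAL(KIND=8) :: A(N), B(N)
  INTEGER :: I
!*$* ASSERT ARGUMENTALIASING
  DO I = 1, N
    A(I) = A(I) + B(I)
  END DO
END SUBROUTINE ACCUMULATE
```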
The following directives control symbol storage:
ALIGN_SYMBOL
FILL_SYMBOL
FLUSH
SECTION_GP
SECTION_NON_GP
The ALIGN_SYMBOL and FILL_SYMBOL directives control the way symbols are stored.
The ALIGN_SYMBOL directive aligns the start of symbol at a specified alignment boundary.
The FILL_SYMBOL directive pads symbol with additional storage so that the symbol is assured not to overlap (even partially) with any other data item within the storage of the specified size. The additional padding required is divided between each end of the specified variable. For example, a FILL_SYMBOL(X,L1CACHELINE) directive guarantees that X does not suffer from false sharing for the primary cache line.
The formats for these directives are as follows:
For common block variables, these directives are required at each declaration of the common block. Because the directives modify the allocated storage and its alignment for the named symbol, inconsistent directives can lead to undefined results.
The ALIGN_SYMBOL directive has no effect on fixed-size local symbols, such as simple scalars or arrays of known size (for example, symbols declared as REAL N or REAL A(3)). The directive continues to be effective for automatic arrays (stack-allocated arrays of dynamically determined size).
You cannot specify an ALIGN_SYMBOL directive and a FILL_SYMBOL directive for the same symbol.
Example:
! X IS A COMMON BLOCK VARIABLE
      COMMON X
      INTEGER(KIND=4) X
!*$* ALIGN_SYMBOL (X, 32)
! X WILL START AT A 32-BYTE BOUNDARY.
! WARNING: THE LAYOUT OF THE COMMON BLOCK WILL BE AFFECTED
!*$* ALIGN_SYMBOL (X, 2)
! ERROR: CANNOT REQUEST AN ALIGNMENT LOWER THAN THE NATURAL
! ALIGNMENT OF THE SYMBOL.
      REAL(KIND=8) Y
! Y IS A COMMON BLOCK OR LOCAL VARIABLE
!*$* FILL_SYMBOL (Y, L2CACHELINE)
! ALLOCATE EXTRA STORAGE BOTH BEFORE AND AFTER Y SO THAT
! Y IS WITHIN AN L2CACHELINE (128 BYTES) ALL BY ITSELF.
! THIS CAN BE USEFUL TO AVOID FALSE-SHARING BETWEEN MULTIPLE
! PROCESSORS FOR THE CACHE LINE CONTAINING Y.
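The case not covered by the example above is a local automatic array, where ALIGN_SYMBOL still takes effect. The following sketch requests a 64-byte boundary for a scratch array whose size N is known only at run time. It follows the (symbol, alignment) form used above and places the directive after the declaration. SUM_SQUARES, WORK, and ALIGN_DEMO are hypothetical names, and other compilers treat the directive line as a comment.

```fortran
MODULE ALIGN_DEMO
  IMPLICIT NONE
CONTAINS
  ! WORK is an automatic array: stack-allocated, with a size N that is
  ! determined at run time. ALIGN_SYMBOL requests a 64-byte boundary.
  FUNCTION SUM_SQUARES(X, N) RESULT(S)
    INTEGER, INTENT(IN) :: N
    REAL(KIND=8), INTENT(IN) :: X(N)
    REAL(KIND=8) :: S
    REAL(KIND=8) :: WORK(N)
!*$* ALIGN_SYMBOL (WORK, 64)
    WORK = X * X
    S = SUM(WORK)
  END FUNCTION SUM_SQUARES
END MODULE ALIGN_DEMO
```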
The FLUSH directive identifies synchronization points at which thread-visible variables are written back to memory. This directive must appear at the precise point in the code at which the synchronization is required.
Note: This directive has the same effect as the FLUSH directive described in the OpenMP Fortran API. For more information on the OpenMP FLUSH directive, see “Read and Write Variables to Memory: FLUSH Directive” in Chapter 4.
Thread-visible variables include the following data items:
Globally visible variables (common blocks and modules).
Local variables that do not have the SAVE attribute but have had their address taken and saved or have had their address passed to another subprogram.
Local variables that do not have the SAVE attribute and that are declared shared in a parallel region within the subprogram.
Dummy arguments.
All pointer dereferences.
This directive has the following format:
!*$* FLUSH [(var[, var] ...)]
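The following is a minimal sketch of the producer side of a common hand-off pattern. A module variable, which is globally visible and therefore thread-visible, is written and flushed before a ready flag is set and flushed. This makes the value reach memory before the flag does. A consumer thread would also need its own FLUSH before reading; that side is not shown, and the sketch runs single-threaded. PUBLISH, RESULT_VALUE, READY, and FLUSH_DEMO are hypothetical names, and other compilers see the directive lines as comments.

```fortran
MODULE FLUSH_DEMO
  IMPLICIT NONE
  ! Module variables are globally visible, so they are thread-visible.
  REAL(KIND=8) :: RESULT_VALUE = 0.0D0
  LOGICAL :: READY = .FALSE.
CONTAINS
  ! Producer side: write the value, flush it to memory, then set the
  ! flag and flush the flag at the synchronization point.
  SUBROUTINE PUBLISH(V)
    REAL(KIND=8), INTENT(IN) :: V
    RESULT_VALUE = V
!*$* FLUSH (RESULT_VALUE)
    READY = .TRUE.
!*$* FLUSH (READY)
  END SUBROUTINE PUBLISH
END MODULE FLUSH_DEMO
```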
The MIPSpro 7 Fortran 90 compiler can reference global data by using the global pointer and an offset value. Using the global pointer (gp) is more efficient than constructing the address at each occurrence, but because the offset size is limited to 16 bits, only a limited set of elements can be referenced using the global pointer.
The compiler places global data in gp-relative or non-gp-relative sections, but you can use the SECTION_GP and SECTION_NON_GP directives to specify the variables to go within the gp-relative section and the variables that need to be addressed explicitly.
The formats for these directives are as follows:
symbol: Enter one or more symbols. Separate multiple symbols with commas. Valid symbols are common block names, variables specified on SAVE statements, and module names. If a module name is specified, all storage in the module is affected. If a common block name is specified, it must be of the following form: /name/.
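The format lines are not reproduced in this section. This sketch assumes the form SECTION_GP (symbol[,symbol]...), and likewise for SECTION_NON_GP, using the /name/ spelling for common blocks described above. It places a small, frequently used common block in the gp-relative section. It marks a large table for explicit addressing, because 100000 REAL(KIND=8) values (800,000 bytes) are far more than a 16-bit offset can span. EVAL_POLY and both common block names are hypothetical, and other compilers treat the directive lines as comments.

```fortran
! Evaluates C0 + C1*X + C2*X**2 using coefficients held in common.
SUBROUTINE EVAL_POLY(X, Y)
  IMPLICIT NONE
  REAL(KIND=8), INTENT(IN)  :: X
  REAL(KIND=8), INTENT(OUT) :: Y
  REAL(KIND=8) :: C0, C1, C2
  REAL(KIND=8) :: TABLE(100000)
  COMMON /COEFS/ C0, C1, C2
  COMMON /BIGTAB/ TABLE
! Small, hot common block: address it through the global pointer.
!*$* SECTION_GP (/COEFS/)
! Large table: too big for gp-relative addressing.
!*$* SECTION_NON_GP (/BIGTAB/)
  Y = C0 + X * (C1 + X * C2)
END SUBROUTINE EVAL_POLY
```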
The following are the inlining and interprocedural analysis (IPA) directives:
INLINE, NOINLINE
IPA, NOIPA
Note: Neither inlining nor IPA is enabled by default, and by default the directives in this section, if present in your source code, are ignored. To enable the directives, turn on inlining or IPA by specifying the -INLINE: option or the -IPA: option on your f90(1) command line. For more information on the command line interaction with these features, see Chapter 2, “Invoking MIPSpro 7 Fortran 90”, or see one of the following man pages: f90(1) or ipa(5).
Inlining is the process of replacing a procedure reference with a copy of the procedure's code. This eliminates procedure call overhead and exposes the relationships between the procedure code, the return value, and the surrounding code. The INLINE and NOINLINE directives allow you to specify which procedures should or should not be inlined.
Interprocedural analysis (IPA) is a MIPSpro compiler feature that includes inlining, common block array padding, constant propagation, dead procedure elimination, dead variable elimination, and global name optimizations. For detailed information on the IPA feature, see the ipa(5) man page. The IPA and NOIPA directives allow you to control IPA.
The formats of these directives are as follows:
location: Specify one of the following for location:
name: For the inlining directives, each name specification represents one or more routines to be inlined. If no routines are named, all routines in the program are inlined. For the IPA directives, each name specification represents one or more routines to undergo IPA. If no routines are named, all routines in the program undergo IPA.
Example. Consider the following code fragment:
      DO I = 1,N
!*$* INLINE (BETA) HERE
        CALL BETA(I,1)
      ENDDO
      CALL BETA(N,2)
Using the specifier ROUTINE rather than HERE in this example would inline both calls to BETA. Note that -INLINE:=ON must be specified on the f90(1) command line when this code is compiled in order for the inlining directive to be recognized.
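The following sketch shows the ROUTINE variant. It follows the token order of the HERE example above and assumes the directive is placed at the start of the executable part, so that it covers both calls to BETA in the routine. BETA's body and its third argument TOTAL, and the routine name DRIVER, are invented for illustration. As noted above, the directive is recognized only when inlining is enabled on the f90(1) command line; otherwise, and with other compilers, it is just a comment.

```fortran
! Hypothetical BETA: adds I*K to a running total.
SUBROUTINE BETA(I, K, TOTAL)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: I, K
  INTEGER, INTENT(INOUT) :: TOTAL
  TOTAL = TOTAL + I * K
END SUBROUTINE BETA

! With ROUTINE instead of HERE, both calls to BETA in this routine,
! the one inside the loop and the one after it, are candidates for inlining.
SUBROUTINE DRIVER(N, TOTAL)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: N
  INTEGER, INTENT(OUT) :: TOTAL
  INTEGER :: I
!*$* INLINE (BETA) ROUTINE
  TOTAL = 0
  DO I = 1, N
    CALL BETA(I, 1, TOTAL)
  END DO
  CALL BETA(N, 2, TOTAL)
END SUBROUTINE DRIVER
```

For N = 4, the loop adds 1 + 2 + 3 + 4 = 10 and the final call adds 4 x 2 = 8, so TOTAL is 18.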