This chapter provides information on how to view and analyze experiment results. It consists of the following sections:
Performance data is examined using prof, a text-based report generator that prints to stdout.
The prof command can be used in two modes:
To generate a report from performance data gathered during experiments recorded by ssrun:
prof <options> <perf-data-file> <perf-data-file> ... |
To generate a report from data files produced by running a program that has been instrumented by pixie:
prof executable_name [options] [pixie counts file] |
This chapter focuses on the use of prof to generate reports from ssrun experiments. For information on prof for a pixie experiment, see Chapter 8, "Using SpeedShop in Expert Mode: pixie."
The syntax for prof when using it with data files from ssrun is:
prof options data_file data_file ... |
options | Zero or more of the options described in Table 7-2. | |
data_file | One or more names of performance data files generated by ssrun. These files are usually of the format prog_name.exp_type.id. The experiment ID, id, consists of one or two letters (designating the process type) and the process ID number. See Table 7-1 for letter codes and descriptions. |
Table 7-1 lists the letter codes for id.
Table 7-1. Letter Codes in Experiment ID Numbers
Letter Code | Description |
---|---|
m | Master process created by ssrun |
p | Process created by a call to sproc() |
f | Process created by a call to fork() |
s | Process created by a call to system() |
e | Process created by a call to exec() |
fe | Process created by a call to fork() and exec() |
Table 7-2 lists prof options. For more information, see the prof reference page.
When you call prof with one or more SpeedShop performance data files, it collects the data from all the output files and produces a listing depending on the experiment type. The prof command is able to detect which experiment was run and generate an appropriate report. It provides reports for all experiment types.
In cases where prof accepts more than one data file as input, it sums up the results. The multiple input data files must be generated from the same executable, using the same experiment type.
prof may report times for procedures named with a prefix of *DF*, for example *DF*_hello.init_2. DF stands for "Dummy Function," and indicates cycles spent in parts of text which are not in any function: init and fini sections, and MIPS.stubs sections, for example.
The types of reports that prof generates are described in the following sections:
For usertime experiments, prof generates a list of callers and callees of each function, with information on how much time was spent in the function, its callers and its callees.
The report shows information for each function, its callers and its callees. The function names are show in the right-hand column of the report. The function that is being reported is shown outdented from its caller and callee(s). For example, the first function shown in this report is __start(), which has no callers and two callees. The remaining columns are described below:
The index column provides an index number for reference.
The %samples column shows the cumulative percentage of time spent in each function.
The self column shows how much time, in seconds, was spent in the function.
The descendents columns shows how much time, in seconds, was spent in callees of the function.
The total column provides information on the number of samples of the function.
This example is a truncated version of the full report. For a complete report see "Generating a Report".
----------------------------------------------------------------------- Profile listing generated Mon Nov 18 11:43:45 1996 with: prof generic.usertime.m24479 ----------------------------------------------------------------------- Total Time (secs) : 43.98 Total Samples : 1466 Stack backtrace failed: 1 Sample interval (ms) : 30 CPU : R4600 FPU : R4600 Clock : 100.0MHz Number of CPUs : 1 ----------------------------------------------------------------------- index %Samples self descendents total name [1] 99.9% 0.00 43.95 1465 __start |
For [f]pcsamp[x] experiments, prof generates a function list annotated with the number of samples taken for the function, and the estimated time spent in the function.
The samples column shows how many samples of the function were taken.
The time(%) column shows the amount of time, and the percentage of that time over the total time that was spent in the function.
The cum time(%) column shows how much time has been spent up to and including the procedure being examined.
The procedure (dso:file) column lists the procedure, its DSO name and file name. For example, the first line reports statistics for the procedure anneal in the file generic.c in the generic executable.
------------------------------------------------------------------------------- Profile listing generated Sun May 19 17:21:27 1996 with: prof generic.fpcsamp.m14480 ------------------------------------------------------------------------------- samples time CPU FPU Clock N-cpu S-interval Countsize 19077 19s R4000 R4010 150.0MHz 1 1.0ms 2(bytes) Each sample covers 4 bytes for every 1.0ms ( 0.01% of 19.0770s) ------------------------------------------------------------------------------- -p[rocedures] using pc-sampling. Sorted in descending order by the number of samples in each procedure. Unexecuted procedures are excluded. ------------------------------------------------------------------------------- samples time(%) cum time(%) procedure (dso:file) 17794 18s( 93.3) 18s( 93.3) anneal (/usr/demos/SpeedShop/ generic/generic:/usr/demos/Speedshop/generic/generic.c) |
For the various hwc experiments, prof generates a function list annotated with the number of overflows generated by the function.
The overflows(%) column shows the number of overflows caused by the function, and the percentage of that number over the total number of overflows in the program.
The cum overflows(%) column shows a cumulative number and percentage of overflows.
The procedure (dso:file) column shows the procedure name and the DSO and filename that contain the procedure.
------------------------------------------------------------------------------- Profile listing generated Sun May 19 17:35:21 1996 with: prof generic.dsc_hwc.m5999 ------------------------------------------------------------------------------- Counter : Sec cache D misses Counter overflow value: 131 Total numer of ovfls : 10 CPU : R10000 FPU : R10010 Clock : 196.0MHz Number of CPUs : 1 ------------------------------------------------------------------------------- -p[rocedures] using counter overflow. Sorted in descending order by the number of overflows in each procedure. Unexecuted procedures are excluded. ------------------------------------------------------------------------------- overflows(%) cum overflows(%) procedure (dso:file) 4( 40.0) 4( 40.0) memcpy (/usr/lib64/libc.so.1: /work/irix/lib/libc/libc_64_M4/strings/bcopy.s) |
For ideal experiments, prof generates a function list annotated with the number of cycles and instructions attributed to the function, and the estimated time spent in the function.
prof does not take into account interactions between basic blocks. Within a single basic block, prof computes cycles for one execution and multiplies it with the number of times that basic block is executed.
If any of the object files linked into the application have been stripped of line-number information (with ld -x for example), prof warns about the affected procedures. The instruction counts for such procedures are shown as a procedure total, not on a per-basic-block basis. Where a line number would normally appear in a report on a function without line numbers, question marks appear instead.
The cycles(%) column reports the number and percentage of machine cycles used for the procedure. For example, 2524610038 cycles, or 94.81% of cycles were spent in the anneal() procedure.
The cum% column shows the cumulative percentage of calls. For example, 99.88% of all calls were spent between the top two functions in the listing: anneal() and slaveusrtime().
The secs column shows the number of seconds spent in the procedure. For example, 16.83 seconds were spent in the anneal() procedure. The time represents an idealized computation based on modelling the machine. Potential floating-point interlocks and memory latency time (cache misses and memory bus contention) are ignored.
The instrns column shows the number of instructions executed for a procedure. For example, there were 1797940023 instructions devoted to the anneal() procedure.
The calls column reports the number of calls to the procedure. For example, there was just one call to the anneal() procedure.
The procedure (dso:file) column lists the procedure, its DSO name and filename. For example, the first line reports statistics for the procedure anneal() in the file generic.c in the generic executable.
Prof run at: Sun May 19 17:49:10 1996 Command line: prof generic.ideal.m14517 2662778531: Total number of cycles 17.75186s: Total execution time 1875323907: Total number of instructions executed 1.420: Ratio of cycles / instruction 150: Clock rate in MHz R4000: Target processor modelled --------------------------------------------------------- Procedures sorted in descending order of cycles executed. Unexecuted procedures are not listed. Procedures beginning with *DF* are dummy functions and represent init, fini and stub sections. --------------------------------------------------------- cycles(%) cum % secs instrns calls procedure(dso:file) 2524610038(94.81) 94.81 16.83 1797940023 1 anneal(generic:/usr/demos/ SpeedShop/generic/generic.c) |
If the -gprof flag is added to prof, a list of callers and callees of each function is provided:
self kids called/total parents index cycles(%) self(%) kids(%) called+self name index self kids called/total children [1] 2661528037(99.95%) 71( 0.00%) 2661527966(100.00%) 0 __start [1] 44 2661527913 1/1 main [2] 5 0 1/1 __istart [107] 4 0 1/1 __readenv_sigfpe [108] -------------------------------------------------------------------------------- 44 2661527913 1/1 __start [1][2] 2661527957(99.95%) 44( 0.00%) 2661527913(100.00%) 1 main [2] 2152 2661524760 1/1 Scriptstring[3] 67 934 1/1 exit [55] -------------------------------------------------------------------------------- 2152 2661524760 1/1 main [2] [3] 2661526912(99.95%) 2152( 0.00%) 2661524760(100.00%) 1 Scriptstring [3] 40 2525080081 1/1 usrtime [4] 82 135044460 1/1 libdso [6] 68058 1148856 1/2 iofile [10] 124 52933 2/8 genLog [16] 7211 45001 1/1 dirstat [27] 1438 32051 1/1 linklist [31] 632 32051 1/1 fpetraps [32] 124 10922 2/19 fprintf [20] 696 0 45/45 strcmp [61] -------------------------------------------------------------------------------- 40 2525080081 1/1 Scriptstring[3] [4] 2525080121(94.83%) 40( 0.00%) 2525080081(100.00%) 1 usrtime [4] 2524610038 437992 1/1 anneal [5] 62 26466 1/8 genLog [16] 62 5461 1/19 fprintf [20] |
The report shows information for each function. The function name is show in the right column of the report. The remaining columns are described below.
The index column provides an index number for reference.
The %FPEs column shows the percentage of the total number of floating point exceptions that were found in the function.
The self column shows how many floating point exceptions were found in the function. For example, 0 floating point exceptions were found in __start().
The descendents columns shows how many floating point exceptions were found in the descendents of the function.
The totals column provides information on the number of floating point exceptions out of the total that were found.
----------------------------------------------------------------------- Profile listing generated Mon Nov 18 11:46:33 1996 with: prof generic.fpe.m18823 ----------------------------------------------------------------------- Total FPEs : 4 Stack backtrace failed: 0 CPU : R4600 FPU : R4600 Clock : 100.0MHz Number of CPUs : 1 ----------------------------------------------------------------------- index %FPEs self descendents total name [1] 100.0% 0 4 4 __start |
This section shows the output from calling prof with some of the options available for prof.
For pcsamp and ideal experiments, the -dis option to prof can be used to obtain machine instruction information. prof provides the standard report and then appends the machine instruction information to the end of the report. The example below shows partial output from prof, for a pcsamp experiment.
------------------------------------------------------------------------------- Profile listing generated Tue May 27 18:04:10 1997 with: prof -dis generic.pcsamp.m875 ------------------------------------------------------------------------------- samples time CPU FPU Clock N-cpu S-interval Countsize 4142 41s R4600 R4600 100.0MHz 1 10.0ms 2(bytes) Each sample covers 4 bytes for every 10.0ms ( 0.02% of 41.4200s) ------------------------------------------------------------------------------- -p[rocedures] using pc-sampling. Sorted in descending order by the number of samples in each procedure. Unexecuted or inlined procedures are excluded. ------------------------------------------------------------------------------- samples time(%) cum time(%) procedure (dso:file) 3975 40s( 96.0) 40s( 96.0) anneal (generic:generic.c) 124 1.2s( 3.0) 41s( 99.0) slaveusrtime (./dlslave.so:dlslave.c) 32 0.32s( 0.8) 41s( 99.7) _read (/usr/lib32/libc.so.1:/xlv1/ bonsai-sep09/work/irix/lib/libc/libc_n32_M3/sys/read.s) 4 0.04s( 0.1) 41s( 99.8) _xstat (/usr/lib32/libc.so.1:/xlv1/ bonsai-sep09/work/irix/lib/libc/libc_n32_M3/sys/xstat.s) 2 0.02s( 0.0) 41s( 99.9) fread (/usr/lib32/libc.so.1:/xlv1/ bonsai-sep09/work/irix/lib/libc/libc_n32_M3/stdio/fread.c) 1 0.01s( 0.0) 41s( 99.9) iofile (generic:generic.c) 1 0.01s( 0.0) 41s( 99.9) usrtime (generic:generic.c) 1 0.01s( 0.0) 41s(100.0) _write (/usr/lib32/libc.so.1:/xlv1/ bonsai-sep09/work/irix/lib/libc/libc_n32_M3/sys/write.s 1 0.01s( 0.0) 41s(100.0) _morecore (/usr/lib32/libc.so.1:/xlv1/ bonsai-sep09/work/irix/lib/libc/libc_n32_M3/gen/malloc.c) 1 0.01s( 0.0) 41s(100.0) next (/usr/lib32/libc.so.1:/xlv1 /bonsai-sep09/work/irix/lib/libc/libc_n32_M3/math/drand48.c) 4142 41s(100.0) 41s(100.0) TOTAL ------------------------------------------------------------------------------- * -dis[assemble] listing annotated pc-samples * * Procedures with zero samples are excluded. * ------------------------------------------------------------------------------- ... generic.c anneal: <0x100065b8-0x100068c4> 3975 total samples(95.97%) [1514] 0x100065b8 0x27bdffd0 addiu sp,sp,-48 # 1 [1514] 0x100065bc 0xffbf0018 sd ra,24(sp) # 2 [1514] 0x100065c0 0xffbc0020 sd gp,32(sp) # 3 [1514] 0x100065c4 0x3c030002 lui v1,0x2 # 4 [1514] 0x100065c8 0x246399e0 addiu v1,v1,-26144 # 5 [1514] 0x100065cc 0x0323e021 addu gp,t9,v1 # 6 [1516] 0x100065d0 0xd7808040 ldc1 $f0,-32704(gp) # 7 <2 cycle stall for following instruction> [1516] 0x100065d4 0xf7a00000 sdc1 $f0,0(sp) # 10 [1518] 0x100065d8 0x24010001 li at,1 # 11 [1518] 0x100065dc 0x8f8281c0 lw v0,-32320(gp) # 12 <2 cycle stall for following instruction> [1518] 0x100065e0 0xac410000 sw at,0(v0) # 15 [1519] 0x100065e4 0x8f99819c lw t9,-32356(gp) # 16 <2 cycle stall for following instruction> [1519] 0x100065e8 0x0320f809 jalr ra,t9 # 19 [1519] 0x100065ec 0000000000 nop # 20 <2 cycle stall for following instruction> [1527] 0x100065f0 0xafa00008 sw zero,8(sp) # 23 [1527] 0x100065f4 0x8fa40008 lw a0,8(sp) # 24 <2 cycle stall for following instruction> [1527] 0x100065f8 0x28842710 slti a0,a0,10000 # 27 [1527] 0x100065fc 0x108000ac beq a0,zero,0x100068b0 # 28 [1527] 0x10006600 0000000000 nop # 29 <2 cycle stall for following instruction> [1529] 0x10006604 0x24070001 li a3,1 # 32 ^------ 1 samples(0.02%)------^ [1529] 0x10006608 0xafa7000c sw a3,12(sp) # 33 [1529] 0x1000660c 0x8f8681b8 lw a2,-32328(gp) # 34 <2 cycle stall for following instruction> [1529] 0x10006610 0x8cc60000 lw a2,0(a2) # 37 <2 cycle stall for following instruction> [1529] 0x10006614 0x24c6ffff addiu a2,a2,-1 # 40 [1529] 0x10006618 0x8fa5000c lw a1,12(sp) # 41 <2 cycle stall for following instruction> [1529] 0x1000661c 0x00a6282a slt a1,a1,a2 # 44 ^------ 1 samples(0.02%)------^ [1529] 0x10006620 0x10a0009c beq a1,zero,0x10006894 # 45 [1529] 0x10006624 0000000000 nop # 46 <2 cycle stall for following instruction> [1530] 0x10006628 0x240a0001 li t2,1 # 49 ^------ 2 samples(0.05%)------^ [1530] 0x1000662c 0xafaa0010 sw t2,16(sp) # 50 [1530] 0x10006630 0x8f8981b8 lw t1,-32328(gp) # 51 ^------ 1 samples(0.02%)------^ <2 cycle stall for following instruction> [1530] 0x10006634 0x8d290000 lw t1,0(t1) # 54 <2 cycle stall for following instruction> [1530] 0x10006638 0x2529ffff addiu t1,t1,-1 # 57 [1530] 0x1000663c 0x8fa80010 lw t0,16(sp) # 58 <2 cycle stall for following instruction> [1530] 0x10006640 0x0109402a slt t0,t0,t1 # 61 ^------ 3 samples(0.07%)------^ [1530] 0x10006644 0x11000089 beq t0,zero,0x1000686c # 62 ^------ 1 samples(0.02%)------^ [1530] 0x10006648 0000000000 nop # 63 <2 cycle stall for following instruction> [1531] 0x1000664c 0x8fa90010 lw t1,16(sp) # 66 ^------ 17 samples(0.41%)------^ <2 cycle stall for following instruction> [1531] 0x10006650 0x25290001 addiu t1,t1,1 # 69 ^------ 18 samples(0.43%)------^ [1531] 0x10006654 0x8fab000c lw t3,12(sp) # 70 ^------ 15 samples(0.36%)------^ <2 cycle stall for following instruction> [1531] 0x10006658 0x256b0001 addiu t3,t3,1 # 73 ^------ 33 samples(0.80%)------^ [1531] 0x1000665c 0x000b5080 sll t2,t3,2 # 74 ^------ 21 samples(0.51%)------^ [1531] 0x10006660 0x014b5021 addu t2,t2,t3 # 75 ^------ 9 samples(0.22%)------^ [1531] 0x10006664 0x000a50c0 sll t2,t2,3 # 76 ^------ 15 samples(0.36%)------^ |
The listing shows statistics about the procedure anneal() in the file generic.c and lists the beginning and ending addresses of anneal(): <0x100065b8-0x100068c4>. The five columns display the following information:
Column... | Displays... |
1 | Line number of the instruction: [1514]. |
2 | Beginning address of the instruction: 0x100065b8. |
3 | Instruction in hexadecimal: 0x27bdffd0. |
4 | Assembler form (mnemonic) of the instruction: addiu sp,sp,-48. |
5 | Cycle in which the instruction executed: # 1. |
Other information includes:
The number of times the immediately preceding branch was executed and taken (ideal only).
The total number of cycles in a basic block and the percentage of the total cycles for that basic block, the number of times the branch terminating that basic block was executed, and the number of cycles for one execution of that basic block (ideal only).
The total number of samples in an instruction (pcsamp only).
Any cycle stalls, that is, cycles that were wasted.
For ideal experiments, the -S option to prof can be used to obtain source line information. prof provides the standard report and then appends the source line information to the end of the report.
This example shows output from calling prof for an ideal experiment:
Prof run at: Tue May 27 18:04:51 1997 Command line: prof -S -dis generic.ideal.m876 3900085040: Total number of cycles 39.00085s: Total execution time 2046668286: Total number of instructions executed 1.906: Ratio of cycles / instruction 100: Clock rate in MHz R4600: Target processor modelled --------------------------------------------------------- Procedures sorted in descending order of cycles executed. Unexecuted or inlined procedures are not listed. Procedures beginning with *DF* are dummy functions and represent init, fini and stub sections. --------------------------------------------------------- cycles(%) cum % secs instrns calls procedure(dso:file) 3754320037(96.26) 96.26 37.54 1971220024 1 anneal(generic.pixie:generic.c) 145001146( 3.72) 99.98 1.45 75000728 1 slaveusrtime(./dlslave.so.pixn32: dlslave.c) 187200( 0.00) 99.99 0.00 124800 1600 next(./libc.so.1.pixn32: /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/math/drand48.c) 101504( 0.00) 99.99 0.00 58124 1 init2da(generic.pixie:generic.c) 91200( 0.00) 99.99 0.00 62400 1600 _drand48(./libc.so.1.pixn32: /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/math/drand48.c) 78574( 0.00) 99.99 0.00 30063 628 __sinf(./libm.so.pixn32: ../../libm/fsin.c) 64442( 0.00) 99.99 0.00 45661 48 _doprnt(./libc.so.1.pixn32: /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/print/doprnt.c) 57888( 0.00) 100.00 0.00 9648 16 offtime(./libc.so.1.pixn32: /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/gen/time_comm.c) 43767( 0.00) 100.00 0.00 29215 263 fread(./libc.so.1.pixn32: /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/stdio/fread.c) 16484( 0.00) 100.00 0.00 12285 299 _readdir(./libc.so.1.pixn32: /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/gen/readdir.c) 12376( 0.00) 100.00 0.00 7224 281 memcpy(./libc.so.1.pixn32: /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/strings/bcopy.s) 10526( 0.00) 100.00 0.00 6321 1 dirstat(generic.pixie:generic.c) 9545( 0.00) 100.00 0.00 6103 1 iofile(generic.pixie:generic.c) 6258( 0.00) 100.00 0.00 5066 298 _stat(./libc.so.1.pixn32: /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/sys/stat.c) ----------------------- disassembly listing ----------------------- *DF*_generic.MIPS.stubs_1 *DF*_generic.MIPS.stubs_1: <0x10001a90-0x10001db4> 154 total cycles(0.00%) invoked 0 times, average ? cycles/invocation [1] 0x10001a90 0x0006000d break 0x6 # 1 ^--- 0 total cycles(0.00%) executed 0 times, average 1 cycles.---^ [1] 0x10001a94 0x8f998050 lw t9,-32688(gp) # 1 [1] 0x10001a98 0x03e07825 move t7,ra # 2 <1 cycle stall for following instruction> [1] 0x10001a9c 0x0320f809 jalr ra,t9 # 4 [1] 0x10001aa0 0x34180029 ori t8,zero,0x29 # 5 <2 cycle stall for following instruction> ^--- 7 total cycles(0.00%) executed 1 times, average 7 cycles.---^ [1] 0x10001aa4 0x8f998050 lw t9,-32688(gp) # 1 [1] 0x10001aa8 0x03e07825 move t7,ra # 2 <1 cycle stall for following instruction> [1] 0x10001aac 0x0320f809 jalr ra,t9 # 4 [1] 0x10001ab0 0x3418002a ori t8,zero,0x2a # 5 ... generic.c main: <0x10001ecc-0x10002000> 44 total cycles(0.00%) invoked 1 times, average 44 cycles/invocation File `generic.c': Skipping source listing to line 87 88: void sproctestgrandchild(void *); /* sproc grandchild code */ 89: 90: static struct timeval starttime; /* starting time--first timestamp */ 91: static struct timeval ttime; /* last-recorded timestamp */ 92: static struct timeval deltatime; 93: 94: int pagesize; 95: 96: main(unsigned argc, char **argv) 97: { [97] 0x10001ecc 0x27bdffd0 addiu sp,sp,-48 # 1 [97] 0x10001ed0 0xffbf0008 sd ra,8(sp) # 2 [97] 0x10001ed4 0xffbc0010 sd gp,16(sp) # 3 [97] 0x10001ed8 0x3c010002 lui at,0x2 # 4 [97] 0x10001edc 0x2421e0cc addiu at,at,-7988 # 5 [97] 0x10001ee0 0x0321e021 addu gp,t9,at # 6 [97] 0x10001ee4 0xafa40024 sw a0,36(sp) # 7 [97] 0x10001ee8 0xafa5002c sw a1,44(sp) # 8 98: int i; 99: 100: /* initialize the timestamp */ 101: (void) gettimeofday(&starttime, NULL); [101] 0x10001eec 0x27848360 addiu a0,gp,-31904 # 9 [101] 0x10001ef0 0x00002825 move a1,zero # 10 [101] 0x10001ef4 0x8f99807c lw t9,-32644(gp) # 11 ^--- 11 total cycles(0.00%) executed 1 times, average 11 cycles.---^ [101] 0x10001ef8 0x0320f809 jalr ra,t9 # 1 [101] 0x10001efc 0000000000 nop # 2 <2 cycle stall for following instruction> ^--- 4 total cycles(0.00%) executed 1 times, average 4 cycles.---^ 102: 103: /* set up to reap any children */ 104: (void) sigset(SIGCHLD, (SIG_PF)reapSig); [104] 0x10001f00 0x24040012 li a0,18 # 1 [104] 0x10001f04 0x8f858144 lw a1,-32444(gp) # 2 [104] 0x10001f08 0x8f998080 lw t9,-32640(gp) # 3 ^--- 3 total cycles(0.00%) executed 1 times, average 3 cycles.---^ [104] 0x10001f0c 0x0320f809 jalr ra,t9 # 1 [104] 0x10001f10 0000000000 nop # 2 <2 cycle stall for following instruction> ^--- 4 total cycles(0.00%) executed 1 times, average 4 cycles.---^ 105: 106: if(argc == 1) { [106] 0x10001f14 0x8fa20024 lw v0,36(sp) # 1 [106] 0x10001f18 0x24030001 li v1,1 # 2 <1 cycle stall for following instruction> [106] 0x10001f1c 0x1443000c bne v0,v1,0x10001f50 # 4 [106] 0x10001f20 0000000000 nop # 5 <2 cycle stall for following instruction> Preceding branch executed 1 times, taken 0 times. ^--- 7 total cycles(0.00%) executed 1 times, average 7 cycles.---^ 107: Scriptstring(DEFAULT_SCRIPT); [107] 0x10001f24 0x8f84805c lw a0,-32676(gp) # 1 <2 cycle stall for following instruction> [107] 0x10001f28 0x24847038 addiu a0,a0,28728 # 4 [107] 0x10001f2c 0x8f99814c lw t9,-32436(gp) # 5 ^--- 5 total cycles(0.00%) executed 1 times, average 5 cycles.---^ [107] 0x10001f30 0x0320f809 jalr ra,t9 # 1 |
When you run prof on the output of an experiment in which you have recorded caliper points, you can use the -calipers option to specify the area of the program for which you want to generate a performance report. For example, if you set just one caliper point in the middle of your program, prof can provide a report from the beginning of the program up to the first caliper point using the following command:
prof -calipers 0 1 |
prof can also provide a report from the caliper point to the end of the program using the following command:
prof -calipers 1 2 |
If you set two caliper points, prof can generate a report from the first to the second caliper point:
prof -calipers 1 2 |
For ideal, usertime, and fpe experiments, the -gprof option to prof can be used to obtain inclusive basic block counting information. prof provides the standard report and then appends the inclusive function counts information to the end of the report. The example below shows partial output from prof, showing just the inclusive function counts report.
With inclusive cycle counting, prof prints a list of functions at the end, which are called but not defined. This list includes functions starting with _rld because rld is not instrumented. It also includes functions from libss; they are instrumented, but their data is normally excluded.
prof fails to list cycles of a procedure in the inclusive listing for the following reasons:
init & fini sections and MIPS stubs are not part of any procedure.
Calls to procedures that don't use a "jump and link" are not recognized as procedure calls.
When global procedures with the same name are executed in different DSOs, only one of them is listed.
These exceptions are listed at the end of the report.
This example shows output from calling prof for a usertime experiment:
------------------------------------------------------------------------------- Profile listing generated Tue May 27 18:18:21 1997 with: prof -gprof generic.usertime.m1019 ------------------------------------------------------------------------------- Total Time (secs) : 41.01 Total Samples : 1367 Stack backtrace failed: 1 Sample interval (ms) : 30 CPU : R4600 FPU : R4600 Clock : 100.0MHz Number of CPUs : 1 ------------------------------------------------------------------------------- index %Samples self descendents total name [1] 99.9% 0.00 40.98 1366 __start [2] 99.9% 0.00 40.98 1366 main [3] 99.9% 0.00 40.98 1366 Scriptstring [4] 96.0% 0.00 39.36 1312 usrtime [5] 96.0% 39.36 0.00 1312 anneal [6] 3.1% 0.00 1.26 42 libdso [7] 3.1% 0.00 1.26 42 dlslave_routine [8] 3.1% 1.26 0.00 42 slaveusrtime [9] 0.8% 0.00 0.33 11 iofile [10] 0.7% 0.00 0.30 10 fread [11] 0.7% 0.30 0.00 10 _read [12] 0.1% 0.00 0.03 1 dirstat [13] 0.1% 0.00 0.03 1 _stat [14] 0.1% 0.03 0.00 1 _xstat [15] 0.1% 0.00 0.03 1 genLog [16] 0.1% 0.00 0.03 1 fprintf [17] 0.1% 0.00 0.03 1 _doprnt [18] 0.1% 0.00 0.03 1 _dowrite [19] 0.1% 0.00 0.03 1 fwrite [20] 0.1% 0.03 0.00 1 _write ------------------------------------------------------------------------------- Gprof Listing ------------------------------------------------------------------------------- caller/total parents index %time self descendents total (self) name callee/descend children ------------------------------------------------------------------------------- [1] 99.9% 0.00 40.98 1366 (0) __start [1] 0.00 40.98 1366/1366 0x10001e9c main [2] ------------------------------------------------------------------------------- 0.00 40.98 1366/1366 0x10001e9c __start [1] [2] 99.9% 0.00 40.98 1366 (0) main [2] 0.00 40.98 1366/1366 0x10001f30 Scriptstring [3] ------------------------------------------------------------------------------- 0.00 40.98 1366/1366 0x10001f30 main [2] [3] 99.9% 0.00 40.98 1366 (0) Scriptstring [3] 0.00 39.36 1312/1366 0x10002378 usrtime [4] 0.00 1.26 42/1366 0x10002378 libdso [6] 0.00 0.33 11/1366 0x10002378 iofile [9] 0.00 0.03 1/1366 0x10002378 dirstat [12] ------------------------------------------------------------------------------- 0.00 39.36 1312/1312 0x10002378 Scriptstring [3] [4] 96.0% 0.00 39.36 1312 (0) usrtime [4] 39.36 0.00 1312/1312 0x100059b8 anneal [5] ------------------------------------------------------------------------------- 39.36 0.00 1312/1312 0x100059b8 usrtime [4] [5] 96.0% 39.36 0.00 1312 (1312) anneal [5] ------------------------------------------------------------------------------- 0.00 1.26 42/42 0x10002378 Scriptstring [3] [6] 3.1% 0.00 1.26 42 (0) libdso [6] 0.00 1.26 42/42 0x10003028 dlslave_routine [7] ------------------------------------------------------------------------------- 0.00 1.26 42/42 0x10003028 libdso [6] [7] 3.1% 0.00 1.26 42 (0) dlslave_routine [7] 1.26 0.00 42/42 0x5ffe0650 slaveusrtime [8] ------------------------------------------------------------------------------- 1.26 0.00 42/42 0x5ffe0650 dlslave_routine [7] [8] 3.1% 1.26 0.00 42 (42) slaveusrtime [8] ------------------------------------------------------------------------------- 0.00 0.33 11/11 0x10002378 Scriptstring [3] [9] 0.8% 0.00 0.33 11 (0) iofile [9] 0.00 0.30 10/11 0x10002ab8 fread [10] 0.00 0.03 1/11 0x10002a5c genLog [15] ------------------------------------------------------------------------------- 0.00 0.30 10/10 0x10002ab8 iofile [9] [10] 0.7% 0.00 0.30 10 (0) fread [10] 0.30 0.00 10/10 0xfad26e0 _read [11] ------------------------------------------------------------------------------- 0.30 0.00 10/10 0xfad26e0 fread [10] [11] 0.7% 0.30 0.00 10 (10) _read [11] ------------------------------------------------------------------------------- 0.00 0.03 1/1 0x10002378 Scriptstring [3] [12] 0.1% 0.00 0.03 1 (0) dirstat [12] 0.00 0.03 1/1 0x10002820 _stat [13] ------------------------------------------------------------------------------- 0.00 0.03 1/1 0x10002820 dirstat [12] [13] 0.1% 0.00 0.03 1 (0) _stat [13] 0.03 0.00 1/1 0xfaf8c10 _xstat [14] ------------------------------------------------------------------------------- 0.03 0.00 1/1 0xfaf8c10 _stat [13] [14] 0.1% 0.03 0.00 1 (1) _xstat [14] ------------------------------------------------------------------------------- 0.00 0.03 1/1 0x10002a5c iofile [9] [15] 0.1% 0.00 0.03 1 (0) genLog [15] 0.00 0.03 1/1 0x10006bc4 fprintf [16] ------------------------------------------------------------------------------- 0.00 0.03 1/1 0x10006bc4 genLog [15] [16] 0.1% 0.00 0.03 1 (0) fprintf [16] 0.00 0.03 1/1 0xfab55ec _doprnt [17] ------------------------------------------------------------------------------- 0.00 0.03 1/1 0xfab55ec fprintf [16] [17] 0.1% 0.00 0.03 1 (0) _doprnt [17] 0.00 0.03 1/1 0xfab215c _dowrite [18] ------------------------------------------------------------------------------- 0.00 0.03 1/1 0xfab215c _doprnt [17] [18] 0.1% 0.00 0.03 1 (0) _dowrite [18] 0.00 0.03 1/1 0xfab1ddc fwrite [19] ------------------------------------------------------------------------------- 0.00 0.03 1/1 0xfab1ddc _dowrite [18] [19] 0.1% 0.00 0.03 1 (0) fwrite [19] 0.03 0.00 1/1 0xfad30f8 _write [20] ------------------------------------------------------------------------------- 0.03 0.00 1/1 0xfad30f8 fwrite [19] [20] 0.1% 0.03 0.00 1 (1) _write [20] |
If you need to generate a report for a machine model that is different from the one on which the experiment was performed, you can use several of the prof options to specify a machine model.
For example, if you record an ideal experiment on an R4000™ processor with a clock frequency of 100 megahertz, but you want to generate a report for an R10000 processor, the prof command would be
prof -r10000 -clock 196 generic.ideal.m4561 |
You can gather data from executables that use the sproc() and sprocsp() system calls, such as those executables generated by POWER Fortran and POWER C. Prepare and run the job using the same method as for uniprocessed executables. For multiprocessed executables, each thread of execution writes its own separate data file. View these data files with prof like any other data files.
The only difference between multiprocessed and regular executables is how the data files are named. The data files are named prog_name.experiment_type.id.
The experiment ID, id, consists of one or two letters (designating the process type) and the process ID number. See Table 7-3 for letter codes and descriptions.This naming convention avoids the potential conflict of multiple threads attempting to write simultaneously to the same file.
Table 7-3. Letter Codes in Experiment ID Numbers
Letter Code | Description |
---|---|
m | Master process created by ssrun |
p | Process created by a call to sproc() |
f | Process created by a call to fork() |
s | Process created by a call to system() |
e | Process created by a call to exec() |
fe | Process created by a call to fork() and exec() |
If you run an ideal experiment, run prof with the -feedback option to generate a feedback file that can be used to arrange procedures more efficiently on the next recompile. You can rearrange procedures using the -fb flag to cc, or using the cord command. For more information, view the cc or cord reference page.
If the target process was blocked for a long time as a result of an instruction, that instruction will show up as having a low or zero CPU time. On the other hand, CPU-intensive instructions will show up as having a high CPU time.
One way to sanity-check inclusive cycle counts is to look at the percentage cycles for __start(). If the value is anything less than 98 -99%, the inclusive report is suspect. Look for other warnings that prof didn't take into account certain procedures.