High-Level Optimization (HLO) Report

High-level Optimization (HLO) performs specific optimizations based on the usefulness and applicability of each optimization. The HLO report can provide information on all relevant areas, plus structure splitting and loop-carried scalar replacement. It can also report loop interchanges that were not performed for the following reasons:

Function calls are inside the loop.
The loop nest is imperfect.
Data dependencies prevent the interchange; the dependencies preventing interchange are also reported.
The original loop order was already proper, so the compiler might have considered the interchange inefficient.

For example, the report can provide clues to why the compiler was unable to apply loop interchange to a loop nest that might have been considered a candidate for optimization. If the reported problems (bottlenecks) can be removed by changing the source code, the report suggests the possible loop interchanges.

Depending on the operating system, you must specify the following options to enable HLO and generate the reports:

Linux* and Mac OS* X: -x, -O2 or -O3, -opt-report 3, -opt-report-phase=hlo
Windows*: /Qx, /O2 or /O3, /Qopt-report:3, /Qopt-report-phase:hlo

See High-level Optimization Overview for information about enabling HLO.

The following command examples illustrate the general command needed to create an HLO report with combined options.

Operating System	Example Command
Linux and Mac OS X	`icpc -c -xSSE3 -O3 -opt-report 3 -opt-report-phase=hlo sample.cpp`
Windows	`icl /c /QxSSE3 /O3 /Qopt-report:3 /Qopt-report-phase:hlo sample.cpp`

You can use -opt-report-file (Linux and Mac OS X) or /Qopt-report-file (Windows) to specify an output file to capture the report results. Specifying a file to capture the results can help to reduce the time you spend analyzing the results and can provide a baseline for future testing.

Reading the report results

The report provides information using a specific format. The report format for Windows* is different from the format on Linux* and Mac OS* X. While there are some common elements in the report output, the best way to understand what kinds of advice the report can provide is to show example code and the corresponding report output.

Example 1: This example illustrates the condition where a function call is inside a loop.

Example 1

void bar (int *A, int **B);
int foo (int *A, int **B, int N)
{
  int i, j;

  for (j=0; j<N; j++) {
    for (i=0; i<N; i++) {
      B[i][j] += A[j];
      bar(A,B);
    }
  }
  return 1;
}

Regardless of the operating system, the reports list optimization results for specific functions by presenting a header line above each reported action. The line format and description are included below.

The following table summarizes the common report elements and provides a general description to help interpret the results.

Report Element

Description

String listing information about the function being reported on. The string uses the following format.

<source name>;<start line>;<end line>;<optimization>; <function name>;<element type>

For example, the reports listed below report the following information:

Linux and Mac OS X:

<sample1.c;-1:-1;hlo;foo;0>

Windows:

<sample1.c;-1:-1;hlo;_foo;0>

The compact string contains the following information:

<source name>: Name of the source file being examined.
<start line>: Indicates the starting line number for the function being examined. A value of -1 means that the report applies to the entire function.
<end line>: Indicates the ending line number for the function being examined.
<optimization>: Indicates the optimization phase; for this report the indicated phase should be hlo.
<function name>: Name of the function being examined.
<element type>: Indicates the type of the report element; 0 indicates the element is a comment.

Several report elements grouped together.

QLOOPS 2/2 ENODE LOOPS 2

unknown 0 multi_exit_do 0 do 2

linear_do 2

LINEAR HLO EXPRESSIONS: 17 / 18

Windows only: This section of the report lists the following information:

QLOOPS: Indicates the number of well-formed loops found out of the loops discovered.
ENODE LOOPS: Indicates the number of loops in the preferred (canonical) form generated by HLO.
unknown: Indicates the number of loops that could not be counted.
multi_exit_do: Indicates the countable loops containing multiple exits.
do: Indicates the total number of loops with trip counts that can be counted.
linear_do: Indicates the number of loops with bounds that can be represented in a linear form.
LINEAR HLO EXPRESSIONS: Indicates how many expressions (first number), out of all expressions in the intermediate (ENODE) form (second number), can be represented in a linear form.

The code sample listed above will result in a report output similar to the following.

Operating System

Example 1 Report Output

Linux and Mac OS X

<sample1.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample1.c;7:7;hlo_unroll;foo;0>

Loop at line 7 unrolled with remainder by 2

Windows

<sample1.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 17 / 18

------------------------------------------------------------------------------

<sample1.c;6:6;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: User Function Inside Loop Nest

Advice: Loop Interchange, if possible, might help Loopnest at lines: 6 7

: Suggested Permutation: (1 2 ) --> ( 2 1 )
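The header lines in the output above follow the compact string format described earlier. As a minimal sketch of how such a line breaks down into its fields, the following C helper splits a header with `sscanf`. The names `hlo_header` and `parse_hlo_header` are hypothetical and not part of the compiler, and the sketch assumes the layout seen in the sample output, where the start and end lines are separated by a colon.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one report header line, for example
   <sample1.c;6:6;hlo_linear_trans;_foo;0>.
   Hypothetical type, used only for this illustration. */
struct hlo_header {
    char source[64];     /* source file name                      */
    int  start_line;     /* -1 means "applies to whole function"  */
    int  end_line;
    char phase[64];      /* optimization phase, e.g. hlo          */
    char function[64];   /* function name, e.g. foo or _foo       */
    int  element_type;   /* 0 indicates a comment element         */
};

/* Returns 1 if the line matches the assumed header layout, 0 otherwise. */
int parse_hlo_header(const char *line, struct hlo_header *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "<%63[^;];%d:%d;%63[^;];%63[^;];%d>",
                   out->source, &out->start_line, &out->end_line,
                   out->phase, out->function, &out->element_type);
    return n == 6;
}
```

Passing the string `<sample1.c;6:6;hlo_linear_trans;_foo;0>` fills in `sample1.c`, 6, 6, `hlo_linear_trans`, `_foo`, and 0. Lines that are not headers, such as `High Level Optimizer Report (foo)`, make the function return 0.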

Example 2: This example illustrates the condition where the loop nesting prohibits interchange.

Example 2

int foo (int *A, int **B, int N)
{
  int i, j;

  for (j=0; j<N; j++) {
    A[j] = i + B[i][1];
    for (i=0; i<N; i++) {
      B[i][j] += A[j];
    }
  }
  return 1;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 2 Report Output

Linux and Mac OS X

<sample2.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

<sample2.c;7:7;hlo_scalar_replacement;in foo;0>

#of Array Refs Scalar Replaced in foo at line 7=2

#of Array Refs Scalar Replaced in foo at line 7=1

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample2.c;7:7;hlo_unroll;foo;0>

Loop at line 7 unrolled with remainder by 2

Windows

<sample2.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 22 / 27

------------------------------------------------------------------------------

<sample2.c;7:7;hlo_scalar_replacement;in _foo;0>

#of Array Refs Scalar Replaced in _foo at line 7=1

<sample2.c;5:5;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: Imperfect Loop Nest (Either at Source or due to other Compiler Transformations)

Advice: Loop Interchange, if possible, might help Loopnest at lines: 5 7

: Suggested Permutation: (1 2 ) --> ( 2 1 )

Example 3: This example illustrates the condition where data dependence prohibits loop interchange.

Example 3

int foo (int **A, int **B, int **C, int N)
{
  int i, j;

  for (j=0; j<N; j++) {
    for (i=0; i<N; i++) {
      A[i][j] = C[i][j] * 2;
      B[i][j] += A[i][j] * C[i][j];
    }
  }
  return 1;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 3 Report Output

Linux and Mac OS X

<sample3.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample3.c;6:6;hlo_unroll;foo;0>

Loop at line 6 unrolled with remainder by 2

Windows

<sample3.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 36 / 36

------------------------------------------------------------------------------

<sample3.c;5:5;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: Data Dependencies

Dependencies found between following statements:

[From_Line# -> (Dependency Type) To_Line#]

[7 ->(Anti) 8] [7 ->(Flow) 8] [7 ->(Output) 8]

[7 ->(Flow) 7] [7 ->(Anti) 7] [7 ->(Output) 7]

Advice: Loop Interchange, if possible, might help Loopnest at lines: 5 6

: Suggested Permutation: (1 2 ) --> ( 2 1 )
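The suggested permutation above swaps the two loops of Example 3. The following is a hand-written sketch of that interchange, not compiler output. It assumes that A, B, and C never overlap in memory, which is what the restrict qualifier (also used in a later example) is meant to express; under that assumption each iteration touches only its own elements, so the loop order does not change the result. The names `foo_original` and `foo_interchanged` are hypothetical.

```c
#include <assert.h>

/* Loop order as in Example 3: the inner loop varies the row index i,
   so consecutive iterations jump from one row to the next. */
void foo_original(int **A, int **B, int **C, int N)
{
    assert(N >= 0);
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
            A[i][j] = C[i][j] * 2;
            B[i][j] += A[i][j] * C[i][j];
        }
    }
}

/* Hand-interchanged version: the inner loop now walks along one row.
   Assumption: the three arrays do not alias each other. */
void foo_interchanged(int **restrict A, int **restrict B,
                      int **restrict C, int N)
{
    assert(N >= 0);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            A[i][j] = C[i][j] * 2;
            B[i][j] += A[i][j] * C[i][j];
        }
    }
}
```

Whether the compiler performs the interchange on its own after such a change depends on the compiler version and options, so it is worth regenerating the report to check.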

Example 4: This example illustrates the condition where the loop order was determined to be proper, but loop interchange might offer only marginal relative improvement. To compile this code add the -restrict (Linux and Mac OS X) or /Qrestrict (Windows) option to the other options when generating the report.

Example 4

int foo (int ** restrict A, int ** restrict B, int N)
{
  int i, j, value;

  for (j=0; j<N; j++) {
    for (i=0; i<N; i++) {
      A[j][i] += B[i][j];
    }
  }
  value = A[1][1];
  return value;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 4 Report Output

Linux and Mac OS X

<sample4.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample4.c;6:6;hlo_unroll;foo;0>

Loop at line 6 unrolled with remainder by 2

Windows

<sample4.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 18 / 18
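The Linux and Mac OS X output above states that a loop was "unrolled with remainder by 2". As an illustration of what that phrase usually describes, the sketch below unrolls a simple summation loop by hand with a factor of 2 and adds a remainder loop for odd trip counts. This is an assumed reading of the message shown on a made-up loop, not the code the compiler actually generates; `sum_simple` and `sum_unrolled_by_2` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one element per iteration. */
long sum_simple(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Same computation unrolled by a factor of 2. */
long sum_unrolled_by_2(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    size_t i = 0;

    /* Main loop: two elements per iteration, half as many iterations. */
    for (; i + 1 < n; i += 2) {
        total += v[i];
        total += v[i + 1];
    }
    /* Remainder loop: runs at most once, when n is odd. */
    for (; i < n; i++)
        total += v[i];
    return total;
}
```

Both functions return the same value for any input; the unrolled form simply trades a larger loop body for fewer loop-control steps.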

Example 5: This example illustrates the conditions where the loop nesting was imperfect and the loop order was good, but loop interchange would offer marginal relative improvements. To compile this code add the -restrict (Linux and Mac OS X) or /Qrestrict (Windows) option to the other options when generating the report.

Example 5

int foo (int ** restrict A, int ** restrict B, int ** restrict C, int N)
{
  int i, j, sum;

  for (j=0; j<N; j++) {
    sum += A[1][1];
    for (i=0; i<N; i++) {
      sum = B[j][i] + C[i][j];
    }
  }
  return sum;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 5 Report Output

Linux and Mac OS X

<sample5.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Windows

<sample5.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 16 / 19

------------------------------------------------------------------------------

<sample5.c;5:5;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: Imperfect Loop Nest (Either at Source or due to other Compiler Transformations)

Advice: Loop Interchange, if possible, might help Loopnest at lines: 5 7

: Suggested Permutation: (1 2 ) --> ( 2 1 )

Example 6: This example illustrates the condition where both perfect and imperfect loop nesting exist; however, the correctly nested loop contains a data dependency.

Example 6

int foo (int ***A, int ***B, int **C, int N)
{
  int q, i, j, k;
  q = 0;
  while ( A[q][0][0] != 0) {
    for (j=0; j<N; j++) {
      A[j][0][0] = j + B[j][0][0];
      for (i=0; i<N; i++) {
        for (k=0; k<N; k++) {
          B[k][i][j] += A[j][0][0] + C[i][j];
        }
      }
    }
    A[q][0][0] = B[0][q][0] + 5;
  }
  return 1;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 6 Report Output

Linux and Mac OS X

<sample6.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample6.c;9:9;hlo_unroll;foo;0>

Loop at line 9 unrolled with remainder by 2

Windows

<sample6.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/4 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 34 / 34

------------------------------------------------------------------------------

<sample6.c;8:8;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: Data Dependencies

Dependencies found between following statements:

[From_Line# -> (Dependency Type) To_Line#]

[10 ->(Flow) 10] [10 ->(Anti) 10] [10 ->(Output) 10]

Advice: Loop Interchange, if possible, might help Loopnest at lines: 8 9

: Suggested Permutation: (1 2 ) --> ( 2 1 )

Changing Code Based on the Report Results

While the HLO report tells you which loop transformations the compiler performed and provides some advice, the omission of a given loop transformation might indicate a transformation that you can attempt yourself. The following list suggests some transformations you might want to apply. (Manual optimization techniques, like manual cache blocking, should be avoided or used only as a last resort.)

Loop Interchanging - Swap the execution order of two nested loops to gain a cache locality or unit-stride access performance advantage.
Distributing - Distribute or split up one large loop into two smaller loops. This strategy might provide an advantage when too many registers are being consumed in a large loop.
Fusing - Fuse two smaller loops with the same trip count together to improve data locality.
Loop Blocking - Use cache blocking to arrange a loop so it will perform as many computations as possible on data already residing in cache. (The next block of data is not read into cache until computations using the first block are finished.)
Unrolling - Partially disassemble a loop structure so that fewer iterations of the loop are required; however, each resulting loop iteration is larger. Unrolling can be used to hide instruction and data latencies, to take advantage of floating-point load-pair instructions, and to increase the ratio of real work done per memory operation.
Prefetching - Request the compiler to bring data in from relatively slow memory to a faster cache several loop iterations ahead of when the data is actually needed.
Load Pairing - Use an instruction to bring two floating point data elements in from memory in a single step.
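As a sketch of how two of the transformations above can work together, the following C code first distributes an imperfect loop nest into two loops and then interchanges the resulting perfect nest so that the inner loop has unit-stride access. The functions `update_original` and `update_transformed`, and the row-major layout `b[i * n + j]`, are assumptions made for this illustration; the rewrite is valid only because b, s, and c are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the assignment to s[j] sits between the two loops
   (imperfect nest), and the inner loop walks down a column of b,
   so consecutive accesses are n elements apart. */
void update_original(size_t n, double *b, double *s, const double *c)
{
    assert(b != NULL && s != NULL && c != NULL);
    for (size_t j = 0; j < n; j++) {
        s[j] = 2.0 * c[j];
        for (size_t i = 0; i < n; i++)
            b[i * n + j] += s[j];
    }
}

/* Transformed form. Assumption: b, s and c do not alias. */
void update_transformed(size_t n, double *restrict b,
                        double *restrict s, const double *restrict c)
{
    assert(b != NULL && s != NULL && c != NULL);

    /* Distribution: the s[j] computation gets its own loop. */
    for (size_t j = 0; j < n; j++)
        s[j] = 2.0 * c[j];

    /* Interchange: the remaining nest is perfect, and the inner loop
       now runs along a row of b with unit stride. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            b[i * n + j] += s[j];
}
```

After a manual change of this kind, regenerating the HLO report shows whether the compiler still finds something to report for the loop nest.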
