High-level Optimization (HLO) Report

High-level Optimization (HLO) performs specific optimizations based on the usefulness and applicability of each optimization. The HLO report can provide information on all relevant areas plus structure splitting and loop-carried scalar replacement, and it can report loop interchanges that were not performed, along with the reason. The examples in this topic cover a function call inside the loop, imperfect loop nesting, data dependencies, and a loop order that was already judged to be proper.

For example, the report can provide clues as to why the compiler was unable to apply loop interchange to a loop nest that might have been considered a candidate for optimization. If the reported problems (bottlenecks) can be removed by changing the source code, the report suggests the possible loop interchanges.

Depending on the operating system, you must specify options to enable HLO and to generate the reports; the command examples later in this topic show a typical combination.

See High-level Optimization Overview for information about enabling HLO.

The following command examples illustrate the general command needed to create an HLO report with combined options.

Operating System      Example Command
Linux and Mac OS X    icpc -c -xSSE3 -O3 -opt-report 3 -opt-report-phase=hlo sample.cpp
Windows               icl /c /QxSSE3 /O3 /Qopt-report:3 /Qopt-report-phase:hlo sample.cpp

You can use -opt-report-file (Linux and Mac OS X) or /Qopt-report-file (Windows) to specify an output file to capture the report results. Specifying a file to capture the results can help to reduce the time you spend analyzing the results and can provide a baseline for future testing.

Reading the report results

The report provides information using a specific format. The report format for Windows* is different from the format on Linux* and Mac OS* X. While there are some common elements in the report output, the best way to understand what kinds of advice the report can provide is to show example code and the corresponding report output.

Example 1: This example illustrates the condition where a function call is inside a loop.

Example 1

void bar (int *A, int **B);

int foo (int *A, int **B, int N)
{
    int i, j;
    for (j=0; j<N; j++) {
        for (i=0; i<N; i++) {
            B[i][j] += A[j];
            bar(A,B);
        }
    }
    return 1;
}

Regardless of the operating system, the reports list optimization results on specific functions by presenting a line above the reported action. The line format and description are included below.

The following table summarizes the common report elements and provides a general description to help interpret the results.

Report Element

Description

String listing information about the function being reported on. The string uses the following format.

<source name>;<start line>:<end line>;<optimization>;<function name>;<element type>

For example, the reports listed below report the following information:

Linux and Mac OS X:

<sample1.c;-1:-1;hlo;foo;0>

Windows:

<sample1.c;-1:-1;hlo;_foo;0>

The compact string contains the following information:

  • <source name>: Name of the source file being examined.

  • <start line>: Indicates the starting line number for the function being examined. A value of -1 means that the report applies to the entire function.

  • <end line>: Indicates the ending line number for the function being examined.

  • <optimization>: Indicates the optimization phase; for this report the indicated phase should be hlo.

  • <function name>: Name of the function being examined.

  • <element type>: Indicates the type of the report element; 0 indicates the element is a comment.

Several report elements grouped together.

QLOOPS 2/2 ENODE LOOPS 2

unknown 0 multi_exit_do 0 do 2

linear_do 2

LINEAR HLO EXPRESSIONS: 17 / 18

Windows only: This section of the report lists the following information:

  • QLOOPS: Indicates the number of well-formed loops found out of the loops discovered.

  • ENODE LOOPS: Indicates the number of loops generated by HLO in their preferred (canonical) form.

  • unknown: Indicates the number of loops that could not be counted.

  • multi_exit_do: Indicates the countable loops containing multiple exits.

  • do: Indicates the total number of loops with trip counts that can be counted.

  • linear_do: Indicates the number of loops with bounds that can be represented in a linear form.

  • LINEAR HLO EXPRESSIONS: Indicates how many expressions (first number), out of all expressions in the intermediate (ENODE) form (second number), can be represented in a linear form.
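As a rough illustration of the two layouts described above, the following C sketch extracts the fields of a header line and of a Windows summary line with sscanf. The struct and function names (hlo_header, hlo_loop_summary, parse_hlo_header, parse_hlo_loop_summary) are hypothetical and are not part of any compiler tool. The format strings assume the layout seen in this topic's sample output, including the colon between the start and end line; other compiler versions might print something different.

```c
#include <stdio.h>

/* Hypothetical container for the fields of one report header line. */
struct hlo_header {
    char source[64];
    int  start_line;      /* -1 means the entry covers the entire function */
    int  end_line;
    char phase[64];       /* for example "hlo" or "hlo_unroll" */
    char function[64];
    int  element_type;    /* 0 indicates a comment element */
};

/* Hypothetical container for the Windows-only loop summary line. */
struct hlo_loop_summary {
    int qloops_well_formed, qloops_found;
    int enode_loops;
    int unknown, multi_exit_do, do_loops, linear_do;
};

/* Parse a header such as "<sample1.c;-1:-1;hlo;foo;0>".
   Returns 1 on success, 0 if the line does not match the assumed layout. */
int parse_hlo_header(const char *line, struct hlo_header *out)
{
    int n = sscanf(line, "<%63[^;];%d:%d;%63[^;];%63[^;];%d>",
                   out->source, &out->start_line, &out->end_line,
                   out->phase, out->function, &out->element_type);
    return n == 6;
}

/* Parse a summary such as
   "QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2".
   Returns 1 on success, 0 otherwise. */
int parse_hlo_loop_summary(const char *line, struct hlo_loop_summary *s)
{
    int n = sscanf(line,
        "QLOOPS %d/%d ENODE LOOPS %d unknown %d multi_exit_do %d do %d linear_do %d",
        &s->qloops_well_formed, &s->qloops_found, &s->enode_loops,
        &s->unknown, &s->multi_exit_do, &s->do_loops, &s->linear_do);
    return n == 7;
}
```

A script that scans a captured report file could call parse_hlo_header on each line and treat a return value of 0 as "not a header line".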

The Example 1 code sample listed above will result in a report output similar to the following.

Operating System

Example 1 Report Output

Linux and Mac OS X

<sample1.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample1.c;7:7;hlo_unroll;foo;0>

Loop at line 7 unrolled with remainder by 2

Windows

<sample1.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 17 / 18

------------------------------------------------------------------------------

<sample1.c;6:6;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: User Function Inside Loop Nest

Advice: Loop Interchange, if possible, might help Loopnest at lines: 6 7

: Suggested Permutation: (1 2 ) --> ( 2 1 )
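The advice in the Example 1 report implies that the nest could be interchanged if the call were not inside it. The following is a minimal sketch of one way to restructure the code, not a recommendation from the compiler. It assumes that bar neither reads nor writes A or B in a way the array update depends on, which the compiler cannot assume on its own. The body of bar shown here, the bar_calls counter, and the name foo_split are hypothetical.

```c
/* Hypothetical stand-in for bar(): it only counts its calls and does not
   touch A or B. The restructuring below is valid only under that assumption. */
static int bar_calls = 0;

void bar(int *A, int **B)
{
    (void)A;
    (void)B;
    bar_calls++;
}

/* Original nest from Example 1: the call sits inside the inner loop. */
int foo(int *A, int **B, int N)
{
    int i, j;
    for (j = 0; j < N; j++) {
        for (i = 0; i < N; i++) {
            B[i][j] += A[j];
            bar(A, B);
        }
    }
    return 1;
}

/* Sketch: split the nest so that the array update forms a call-free nest,
   written here with the loops in the suggested (2 1) order. The calls are
   kept in a second nest so that bar still runs N*N times. */
int foo_split(int *A, int **B, int N)
{
    int i, j;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            B[i][j] += A[j];
        }
    }
    for (j = 0; j < N; j++) {
        for (i = 0; i < N; i++) {
            bar(A, B);
        }
    }
    return 1;
}
```

In practice it may be enough to move the call out of the nest and leave the loop order unchanged, so that the compiler can decide whether to interchange.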

Example 2: This example illustrates the condition where the loop nesting prohibits interchange.

Example 2

int foo (int *A, int **B, int N)
{
    int i, j;

    for (j=0; j<N; j++) {
        A[j] = i + B[i][1];    /* as published: i is read before the inner loop assigns it */
        for (i=0; i<N; i++) {
            B[i][j] += A[j];
        }
    }
    return 1;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 2 Report Output

Linux and Mac OS X

<sample2.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

<sample2.c;7:7;hlo_scalar_replacement;in foo;0>

#of Array Refs Scalar Replaced in foo at line 7=2

#of Array Refs Scalar Replaced in foo at line 7=1

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample2.c;7:7;hlo_unroll;foo;0>

Loop at line 7 unrolled with remainder by 2

Windows

<sample2.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 22 / 27

------------------------------------------------------------------------------

<sample2.c;7:7;hlo_scalar_replacement;in _foo;0>

#of Array Refs Scalar Replaced in _foo at line 7=1

<sample2.c;5:5;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: Imperfect Loop Nest (Either at Source or due

to other Compiler Transformations)

Advice: Loop Interchange, if possible, might help Loopnest at lines: 5 7

: Suggested Permutation: (1 2 ) --> ( 2 1 )
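An imperfect nest like the one reported for Example 2 can sometimes be made perfect by distributing the outer loop. The sketch below shows the idea on a variant of the example, not on the sample itself: to keep the code well defined, the statement between the loops reads a separate array S instead of B. The array S and the function names imperfect and distributed are hypothetical, and the transformation is only valid if A, S, and the rows of B do not overlap.

```c
/* Imperfect nest modeled on Example 2: a statement sits between the
   outer and the inner loop. S is a hypothetical input array. */
void imperfect(int *A, int **B, const int *S, int N)
{
    int i, j;
    for (j = 0; j < N; j++) {
        A[j] = j + S[j];
        for (i = 0; i < N; i++) {
            B[i][j] += A[j];
        }
    }
}

/* Sketch: distribute the outer loop. The first loop computes every A[j];
   the second nest is now perfect and is written in the interchanged order. */
void distributed(int *A, int **B, const int *S, int N)
{
    int i, j;
    for (j = 0; j < N; j++) {
        A[j] = j + S[j];
    }
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            B[i][j] += A[j];
        }
    }
}
```

Distribution is safe here because each A[j] depends only on j and S[j], so all of A can be computed before any element of B is updated.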

Example 3: This example illustrates the condition where data dependence prohibits loop interchange.

Example 3

int foo (int **A, int **B, int **C, int N)
{
    int i, j;

    for (j=0; j<N; j++) {
        for (i=0; i<N; i++) {
            A[i][j] = C[i][j] * 2;
            B[i][j] += A[i][j] * C[i][j];
        }
    }
    return 1;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 3 Report Output

Linux and Mac OS X

<sample3.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample3.c;6:6;hlo_unroll;foo;0>

Loop at line 6 unrolled with remainder by 2

Windows

<sample3.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 36 / 36

------------------------------------------------------------------------------

<sample3.c;5:5;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: Data Dependencies

Dependencies found between following statements:

[From_Line# -> (Dependency Type) To_Line#]

[7 ->(Anti) 8] [7 ->(Flow) 8] [7 ->(Output) 8]

[7 ->(Flow) 7] [7 ->(Anti) 7] [7 ->(Output) 7]

Advice: Loop Interchange, if possible, might help Loopnest at lines: 5 6

: Suggested Permutation: (1 2 ) --> ( 2 1 )
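The dependency types in the report follow the usual terminology: a flow dependence is a read after a write, an anti dependence is a write after a read, and an output dependence is a write after a write. In Example 3 the first statement writes A[i][j] and the second reads it, which is a flow dependence; the remaining entries presumably arise because the compiler cannot prove that A, B, and C point to separate storage. The sketch below shows a manually interchanged version that gives the same result under the assumption that the three arrays do not overlap. The function names example3_original and example3_interchanged are hypothetical.

```c
/* Example 3 as written: j is the outer loop, so the inner loop walks
   down one column of each array. */
void example3_original(int **A, int **B, int **C, int N)
{
    int i, j;
    for (j = 0; j < N; j++) {
        for (i = 0; i < N; i++) {
            A[i][j] = C[i][j] * 2;
            B[i][j] += A[i][j] * C[i][j];
        }
    }
}

/* Sketch: loops interchanged by hand. Each iteration touches only element
   [i][j] of each array, so the order of iterations does not matter as long
   as A, B, and C refer to distinct, non-overlapping storage (assumed). */
void example3_interchanged(int **A, int **B, int **C, int N)
{
    int i, j;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            A[i][j] = C[i][j] * 2;
            B[i][j] += A[i][j] * C[i][j];
        }
    }
}
```

If the arrays can overlap, the two versions may produce different results, which is why the compiler declines to interchange on its own.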

Example 4: This example illustrates the condition where the loop order was determined to be proper, but loop interchange might offer only marginal relative improvement. To compile this code add the -restrict (Linux and Mac OS X) or /Qrestrict (Windows) option to the other options when generating the report.

Example 4

int foo (int ** restrict A, int ** restrict B, int N)
{
    int i, j, value;

    for (j=0; j<N; j++) {
        for (i=0; i<N; i++) {
            A[j][i] += B[i][j];
        }
    }
    value = A[1][1];
    return value;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 4 Report Output

Linux and Mac OS X

<sample4.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample4.c;6:6;hlo_unroll;foo;0>

Loop at line 6 unrolled with remainder by 2

Windows

<sample4.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 18 / 18

Example 5: This example illustrates the conditions where the loop nesting was imperfect and the loop order was good, but loop interchange would offer marginal relative improvements. To compile this code add the -restrict (Linux and Mac OS X) or /Qrestrict (Windows) option to the other options when generating the report.

Example 5

int foo (int ** restrict A, int ** restrict B, int ** restrict C, int N)
{
    int i, j, sum;

    for (j=0; j<N; j++) {
        sum += A[1][1];    /* as published: sum is not initialized before this first use */
        for (i=0; i<N; i++) {
            sum = B[j][i] + C[i][j];
        }
    }
    return sum;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 5 Report Output

Linux and Mac OS X

<sample5.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Windows

<sample5.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/2 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 16 / 19

------------------------------------------------------------------------------

<sample5.c;5:5;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: Imperfect Loop Nest (Either at Source or due

to other Compiler Transformations)

Advice: Loop Interchange, if possible, might help Loopnest at lines: 5 7

: Suggested Permutation: (1 2 ) --> ( 2 1 )

Example 6: This example illustrates the condition where both perfect and imperfect loop nesting exist; however, the correctly nested loop contains a data dependency.

Example 6

int foo (int ***A, int ***B, int **C, int N)
{
    int q, i, j, k;
    q = 0;
    while ( A[q][0][0] != 0) {
        for (j=0; j<N; j++) {
            A[j][0][0] = j + B[j][0][0];
            for (i=0; i<N; i++) {
                for (k=0; k<N; k++) {
                    B[k][i][j] += A[j][0][0] + C[i][j];
                }
            }
        }
        A[q][0][0] = B[0][q][0] + 5;
    }
    return 1;
}

The code sample listed above will result in a report output similar to the following.

Operating System

Example 6 Report Output

Linux and Mac OS X

<sample6.c;-1:-1;hlo;foo;0>

High Level Optimizer Report (foo)

Block, Unroll, Jam Report:

(loop line numbers, unroll factors and type of transformation)

<sample6.c;9:9;hlo_unroll;foo;0>

Loop at line 9 unrolled with remainder by 2

Windows

<sample6.c;-1:-1;hlo;_foo;0>

High Level Optimizer Report (_foo)

QLOOPS 2/4 ENODE LOOPS 2 unknown 0 multi_exit_do 0 do 2 linear_do 2

LINEAR HLO EXPRESSIONS: 34 / 34

------------------------------------------------------------------------------

<sample6.c;8:8;hlo_linear_trans;_foo;0>

Loop Interchange not done due to: Data Dependencies

Dependencies found between following statements:

[From_Line# -> (Dependency Type) To_Line#]

[10 ->(Flow) 10] [10 ->(Anti) 10] [10 ->(Output) 10]

Advice: Loop Interchange, if possible, might help Loopnest at lines: 8 9

: Suggested Permutation: (1 2 ) --> ( 2 1 )

Changing Code Based on the Report Results

The HLO report tells you which loop transformations the compiler performed and provides some advice. When a given loop transformation is missing from the report, the compiler might still be able to apply it after the source code is changed. The following list suggests some transformations you might want to apply. (Manual optimization techniques, such as manual cache blocking, should be avoided or used only as a last resort.)