High-Level Optimizations (HLO) Overview

HLO exploits the properties of source code constructs (for example, loops and arrays) in applications developed in high-level programming languages. Within HLO, loop transformation techniques include:

The default optimization level, -O2 (Linux* OS and Mac OS* X) or /O2 (Windows* OS), performs some high-level optimizations (for example, prefetching and complete unrolling). Specifying -O3 (Linux and Mac OS X) or /O3 (Windows) provides the best chance for performing loop transformations that optimize memory accesses. The scope of the optimizations enabled by these options differs among the IA-32, Intel® 64, and IA-64 architectures.
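As a rough illustration of what a loop transformation that optimizes memory accesses looks like, the sketch below applies loop interchange by hand. The function names are hypothetical and are not part of the compiler or its documentation. Whether the compiler performs this particular transformation at a given optimization level depends on its own analysis of the loop.

```c
#include <stddef.h>

/* Hypothetical example: sum a rows x cols matrix stored in row-major order.
   The inner loop steps through memory with a stride of `cols` elements,
   so consecutive iterations touch widely separated addresses. */
double sum_column_order(const double *a, size_t rows, size_t cols) {
    double total = 0.0;
    for (size_t j = 0; j < cols; j++) {
        for (size_t i = 0; i < rows; i++) {
            total += a[i * cols + j];
        }
    }
    return total;
}

/* The same computation after interchanging the two loops by hand.
   The inner loop now walks contiguous memory (stride of one element),
   which is generally friendlier to caches and hardware prefetchers. */
double sum_row_order(const double *a, size_t rows, size_t cols) {
    double total = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            total += a[i * cols + j];
        }
    }
    return total;
}
```

Both functions add the same elements, only in a different order. Because floating-point addition is not associative, the two results can differ in the last bits for general data, which is one reason a compiler may need more aggressive analysis or relaxed floating-point settings before reordering such loops itself.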

Applications for the IA-32 and Intel® 64 architectures

In conjunction with the vectorization options, -ax and -x (Linux and Mac OS X) or /Qax and /Qx (Windows), the -O3 (Linux and Mac OS X) or /O3 (Windows) option causes the compiler to perform more aggressive data dependency analysis than the default -O2 (Linux and Mac OS X) or /O2 (Windows).

Compiler prefetching is disabled in favor of the prefetching support available in the processors.

Applications for the IA-32 and IA-64 architectures

The -O3 (Linux and Mac OS X) or /O3 (Windows) option enables the -O2 (Linux and Mac OS X) or /O2 (Windows) option and adds more aggressive optimizations (like loop transformations). The -O3 or /O3 option optimizes for maximum speed, but may not improve performance for some programs.

Applications for the IA-64 architecture

The -ivdep-parallel (Linux) or /Qivdep-parallel (Windows) option tells the compiler to assume that there is no loop-carried dependency in any loop where an ivdep pragma is specified. (This strategy is useful for sparse matrix applications.)
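The sketch below shows the kind of indirectly indexed loop, common in sparse matrix code, where an ivdep pragma might be used. The function name scatter_add and its parameters are hypothetical. The pragma is guarded so that compilers that do not recognize it still build the code unchanged. The pragma is an assertion made by the programmer, not something the compiler verifies: the example assumes that idx contains no repeated entries, and results may be wrong if that assumption does not hold and the compiler reorders the iterations.

```c
#include <stddef.h>

/* Hypothetical helper: y[idx[i]] += x[i] for each i in [0, n).
   The compiler cannot tell from the code alone whether two iterations
   write to the same element of y, so it must assume a possible
   dependency between iterations.
   ASSUMPTION (made by the caller): idx holds no duplicate values, so
   every iteration updates a distinct element of y. */
void scatter_add(double *y, const int *idx, const double *x, size_t n) {
#if defined(__INTEL_COMPILER)
#pragma ivdep
#endif
    for (size_t i = 0; i < n; i++) {
        y[idx[i]] += x[i];
    }
}
```

Compiled without the pragma, or with a compiler that ignores it, the loop runs in its original order and produces the same result.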

Tune applications for IA-64 architecture by following these general steps:


  1. Compile your program with -O3 (Linux) or /O3 (Windows) and -ipo (Linux) or /Qipo (Windows). Use profile-guided optimization whenever possible.

  2. Identify hot spots in your code.

  3. Generate a high-level optimization report.

  4. Check why loops are not software pipelined.

  5. Make the changes indicated by the results of the previous steps.

  6. Repeat these steps until you achieve the desired performance.

General Application Tuning

In general, you can use the following strategies to tune applications for multiple architectures: