Profile-Guided Optimization (PGO) Quick Reference

Profile-Guided Optimization consists of three phases (or steps):


  1. Generating instrumented code by compiling with the -prof-gen (Linux* OS and Mac OS* X) or /Qprof-gen (Windows* OS) option to create the instrumented executable.

  2. Running the instrumented executable, which produces dynamic-information (.dyn) files.

  3. Compiling the application with the profile information by specifying the -prof-use (Linux and Mac OS X) or /Qprof-use (Windows) option.
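The three phases can be sketched as a short shell script. This is a minimal sketch, not a definitive build recipe: it assumes the Linux icc driver, and the file names app.c, app.inst, and train.dat are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to actually run them. On Windows, the corresponding options are /Qprof-gen and /Qprof-use.

```sh
#!/bin/sh
# Dry-run helper: prints each command; executes it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

pgo_build() {
    src="$1"      # hypothetical source file, for example app.c
    input="$2"    # hypothetical representative input for the training run

    # Phase 1: compile with instrumentation.
    run icc -prof-gen -o app.inst "$src"
    # Phase 2: run the instrumented executable; this produces .dyn files.
    run ./app.inst "$input"
    # Phase 3: recompile using the collected profile information.
    run icc -prof-use -o app "$src"
}

pgo_build app.c train.dat
```

The input used in phase 2 should resemble a typical workload, because the phase 3 optimizations are driven by the execution counts gathered in that run.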

The figure illustrates the phases and the results of each phase.



See Profile an Application for details about using each phase.

The following table lists the compiler options used in PGO:

Linux* and Mac OS* X

Windows*

Effect

-prof-gen

/Qprof-gen

Instruments a program for profiling to get the execution counts of each basic block. The option is used in phase 1 (instrumenting the code) to instruct the compiler to produce instrumented code for your object files in preparation for instrumented execution. By default, each instrumented execution creates one dynamic-information (dyn) file for each executable and (on Windows OS) one for each DLL invoked by the application. You can specify keywords, such as -prof-gen=default (Linux and Mac OS X) or /Qprof-gen:default (Windows).

The keywords control the amount of source information gathered during phase 2 (run the instrumented executable). The prof-gen keywords are:

  • Specify default (or omit the keyword) to request profiling information for use with the prof-use option and optimization when the instrumented application is run (phase 2).

  • Specify srcpos or globdata to request additional profiling information for the code coverage and test prioritization tools when the instrumented application is run (phase 2). The phase 1 compilation creates an spi file.

  • Specify globdata to request additional profiling information for data ordering optimization when the instrumented application is run (phase 2). The phase 1 compilation creates an spi file.

If you are performing a parallel make, this option will not affect it.

-prof-use

/Qprof-use

Instructs the compiler to produce a profile-optimized executable and merges available dynamic-information (dyn) files into a pgopti.dpi file. This option implicitly invokes the profmerge tool.

The dynamic-information files are produced in phase 2 when you run the instrumented executable.

If you perform multiple executions of the instrumented program to create additional dynamic-information files that are newer than the current summary pgopti.dpi file, this option merges the dynamic-information files again and overwrites the previous pgopti.dpi file (you can set the environment variable PROF_NO_CLOBBER to prevent the previous dpi file from being overwritten).

When you compile with prof-use, all dynamic information and summary information files should be in the same directory (current directory or the directory specified by the prof-dir option). If you need to use certain profmerge options not available with compiler options (such as specifying multiple directories), use the profmerge tool. For example, you can use profmerge to create a new summary dpi file before you compile with the prof-use option to create the optimized application.
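Running profmerge explicitly before phase 3 might look like the following. This is a minimal sketch under stated assumptions: the directory ./profdata and the source file app.c are hypothetical, and the -prof_dir spelling of the profmerge option should be checked against the documentation for your compiler version. The script prints the commands by default; set DRY_RUN=0 to run them.

```sh
#!/bin/sh
# Dry-run helper: prints each command; executes it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

merge_then_use() {
    profdir="$1"   # hypothetical directory holding the .dyn files
    src="$2"       # hypothetical source file

    # Assumption: profmerge accepts "-prof_dir <dir>" to locate the .dyn files.
    # Merge them into a summary .dpi file before compiling.
    run profmerge -prof_dir "$profdir"
    # Phase 3: point the compiler at the same directory with -prof-dir.
    run icc -prof-use -prof-dir "$profdir" -o app "$src"
}

merge_then_use ./profdata app.c
```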

You can specify keywords, such as -prof-use=weighted (Linux and Mac OS X) or /Qprof-use:weighted (Windows). If you omit the weighted keyword, the merged dynamic-information (dyn) files are weighted proportionally to the length of time each application execution runs. If you specify the weighted keyword, the profiler applies an equal weighting (regardless of execution times) to the dyn file values to normalize the data counts. This keyword is useful when the execution runs have different time durations and you want them to be treated equally.

When you use prof-use, you can also specify the prof-file option to rename the summary dpi file and the prof-dir option to specify the directory for dynamic-information (dyn) and summary (dpi) files.

Linux:

  • Using this option with -prof-func-groups allows you to control function grouping behavior.

-no-fnsplit

/Qfnsplit-

Disables function splitting. Function splitting is enabled by the prof-use option in phase 3 to improve code locality by splitting routines into different sections: one section to contain the very infrequently executed (cold) code, and one section to contain the rest of the frequently executed (hot) code. You may want to disable function splitting for the following reasons:

  • Improve debugging capability. In the debug symbol table, it is difficult to represent a split routine, that is, a routine with some of its code in the hot code section and some of its code in the cold code section.

  • Account for the cases when the profile data does not represent the actual program behavior, that is, when the routine is actually used frequently rather than infrequently.

This option is supported on IA-32 architecture for Windows OS and on IA-64 architecture for Windows and Linux OS. It is not supported on other platforms (Intel® 64 architecture, Mac OS X, and Linux on IA-32 architecture).

Windows: This option behaves differently on systems based on IA-32 architecture than it does on systems based on IA-64 architecture.

IA-32 architecture, Windows OS:

  • The option completely disables function splitting, placing all the code in one section.

IA-64 architecture, Linux and Windows OS:

  • The option disables the splitting within a routine but enables function grouping, an optimization in which entire routines are placed either in the cold code section or the hot code section. Function grouping does not degrade debugging capability.

-prof-func-order

/Qprof-func-order

Enables ordering of program routines using profile information when specified with prof-use (phase 3). The instrumented program (phase 1) must have been compiled with the prof-gen option srcpos keyword. Not valid for multi-file compilation with the ipo options.

Mac OS X: Not supported.

IA-64 architecture: Not supported.

For more information, see Using Function Ordering, Function Order Lists, Function Grouping, and Data Ordering Optimizations.

-prof-data-order

/Qprof-data-order

Enables ordering of static program data items based on profiling information when specified with prof-use. The instrumented program (phase 1) must have been compiled with the prof-gen option globdata keyword. Not valid for multi-file compilation with the ipo options.

Mac OS X: Not supported.

For more information, see Using Function Ordering, Function Order Lists, Function Grouping, and Data Ordering Optimizations.
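A data ordering build could be sketched as follows, using the globdata keyword that the prof-gen description associates with data ordering optimization. This is a minimal sketch assuming the Linux icc driver; the file names app.c, app.inst, and train.dat are hypothetical. The script prints the commands by default; set DRY_RUN=0 to run them.

```sh
#!/bin/sh
# Dry-run helper: prints each command; executes it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

data_order_build() {
    src="$1"      # hypothetical source file
    input="$2"    # hypothetical training input

    # Phase 1: instrument and request the extra global-data profiling
    # information (this compilation also creates an spi file).
    run icc -prof-gen=globdata -o app.inst "$src"
    # Phase 2: run the instrumented executable to produce .dyn files.
    run ./app.inst "$input"
    # Phase 3: use the profile and enable ordering of static data items.
    run icc -prof-use -prof-data-order -o app "$src"
}

data_order_build app.c train.dat
```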

-prof-src-dir

/Qprof-src-dir

Controls whether full directory information for the source directory path is stored in or read from dynamic-information (dyn) files. When used during phase 1 compilation (prof-gen), this option determines whether the full path is added to the dyn file created during instrumented application execution. When used during profmerge or phase 3 compilation (prof-use), this option determines whether the full path for source file names is used or ignored when reading the dyn or dpi files.

The default, -prof-src-dir (Linux and Mac OS X) or /Qprof-src-dir (Windows), uses the full directory information and also enables the use of the prof-src-root and prof-src-cwd options.

If you specify -no-prof-src-dir (Linux and Mac OS X) or /Qprof-src-dir- (Windows), only the file name (and not the full path) is stored or used. If you do this, all dyn or dpi files must be in the current directory and the prof-src-root and prof-src-cwd options are ignored.

-prof-dir

/Qprof-dir

Specifies the directory in which dynamic-information (dyn) files are created, stored, and read. If you omit this option, the dyn files are created in or read from the current directory used during compilation. For example, you can use this option when compiling in phase 1 (prof-gen option) to define where dynamic-information files will be created when running the instrumented executable in phase 2. You can also use this option when compiling in phase 3 (prof-use option) to define where the dynamic-information files will be read from and where the summary (dpi) file will be created.
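The use of prof-dir in phases 1 and 3, combined with the prof-file option to name the summary file, can be sketched as follows. This is a minimal sketch assuming the Linux icc driver; the directory ./profdata, the summary name app.dpi, and the source file app.c are hypothetical. The script prints the commands by default; set DRY_RUN=0 to run them.

```sh
#!/bin/sh
# Dry-run helper: prints each command; executes it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

profdir_build() {
    profdir="$1"   # hypothetical directory for .dyn and .dpi files
    summary="$2"   # hypothetical summary file name (default is pgopti.dpi)
    src="$3"       # hypothetical source file

    # Phase 1: the instrumented executable will write its .dyn files
    # to $profdir when it runs.
    run icc -prof-gen -prof-dir "$profdir" -o app.inst "$src"
    # Phase 2: run the instrumented executable.
    run ./app.inst
    # Phase 3: read the .dyn files from $profdir and name the summary file.
    run icc -prof-use -prof-dir "$profdir" -prof-file "$summary" -o app "$src"
}

profdir_build ./profdata app.dpi app.c
```

Using the same directory in phases 1 and 3 keeps the dynamic-information and summary files together, as prof-use expects.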

-prof-src-root or

-prof-src-cwd

/Qprof-src-root

or

/Qprof-src-cwd

Specifies a directory path prefix for the root directory where the user's application files are stored:

  • To specify the directory prefix root where source files are stored, specify the -prof-src-root (Linux and Mac OS X) or /Qprof-src-root (Windows) option.

  • To use the current working directory, specify the -prof-src-cwd (Linux and Mac OS X) or /Qprof-src-cwd (Windows) option.

This option is ignored if you specify -no-prof-src-dir (Linux and Mac OS X) or /Qprof-src-dir- (Windows).

-prof-file

/Qprof-file

Specifies the file name for the profiling summary file. If this option is not specified, the summary information is written to a file named pgopti.dpi.

-prof-gen-sampling

/Qprof-gen-sampling

IA-32 architecture. Prepares application executables for hardware profiling (sampling) and causes the compiler to generate source code mapping information.

Mac OS X: This option is not supported.

-ssp

/Qssp

IA-32 architecture. Enables Software-based Speculative Pre-computation (SSP) optimization.

Mac OS X: This option is not supported.

Refer to Quick Reference Lists for a complete listing of the quick reference topics.