Inline function expansion is one of the primary optimizations used in Interprocedural Optimization (IPO) because it does not require that applications meet the whole-program-analysis criteria that IPO normally requires. For function calls that the compiler believes are frequently executed, the Intel® compiler often replaces the call instructions with the code of the function itself.
In the compiler, inline function expansion typically favors relatively small user functions over relatively large ones. This optimization improves application performance by doing the following:
Removing the need to set up parameters for a function call
Eliminating the function call branch
Propagating constants
Function inlining can improve execution time by removing the runtime overhead of function calls; however, function inlining can increase code size, code complexity, and compile times. In general, when you instruct the compiler to perform function inlining, the compiler can examine the source code in a much larger context, and the compiler can find more opportunities to apply optimizations.
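The effect described above can be illustrated with a minimal sketch. The function names (scale, sum_scaled_with_call, sum_scaled_inlined) are hypothetical and chosen only for illustration; the second loop is a hand-written approximation of what inline expansion might produce, not actual compiler output.

```c
#include <stddef.h>

/* Small helper: the kind of function inline expansion tends to favor. */
static int scale(int value, int factor)
{
    return value * factor;
}

/* Each iteration sets up two arguments and branches to scale(). */
int sum_scaled_with_call(const int *data, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += scale(data[i], 4);
    }
    return total;
}

/* Hand-written equivalent of the loop after inline expansion:
   no argument setup, no call branch, and the constant 4 is now
   visible inside the loop body for later optimizations to use. */
int sum_scaled_inlined(const int *data, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += data[i] * 4;
    }
    return total;
}
```

Both functions return the same value for the same input; only the amount of call overhead differs. Whether the compiler actually expands a given call depends on its heuristics.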
Specifying -ip (Linux* and Mac OS* X) or /Qip (Windows*), single-file IP, causes the compiler to perform inline function expansion for calls to procedures defined within the current source file; in contrast, specifying -ipo (Linux and Mac OS X) or /Qipo (Windows), multi-file IPO, causes the compiler to perform inline function expansion for calls to procedures defined in other files.
Using the -ip and -ipo (Linux and Mac OS X) or /Qip and /Qipo (Windows) options can in some cases significantly increase compile time and code size.
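The difference between the single-file and multi-file options can be sketched as follows. All function names and the file name util.c are hypothetical. To keep the example self-contained, both callees appear in one listing; the comments mark which one is assumed to live in a separate source file in a real project.

```c
/* Defined in the same source file as its caller: a call to square()
   is a candidate for expansion with -ip (/Qip) as well as -ipo (/Qipo). */
static int square(int x)
{
    return x * x;
}

/* Assumption: in a real project this definition would live in a
   separate file (a hypothetical util.c). Calls to it from another
   file could then be expanded only with multi-file IPO, -ipo (/Qipo). */
int clamp_to_byte(int x)
{
    if (x < 0) {
        return 0;
    }
    if (x > 255) {
        return 255;
    }
    return x;
}

/* Caller: contains one same-file call and one (notionally) cross-file call. */
int brightness(int x)
{
    return clamp_to_byte(square(x));
}
```

With the two-file layout assumed in the comments, single-file IP would see only the body of square() when compiling the caller, while multi-file IPO would also see clamp_to_byte().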
The compiler attempts to select the routines whose inline expansions will provide the greatest benefit to program performance. The selection is done using the default heuristics. The inlining heuristics used by the compiler differ based on whether or not you use Profile-Guided Optimizations (PGO): -prof-use (Linux and Mac OS X) or /Qprof-use (Windows).
When you use PGO with -ip or -ipo (Linux and Mac OS X) or /Qip or /Qipo (Windows), the compiler uses the following guidelines for applying heuristics:
The default heuristic focuses on the most frequently executed call sites, based on the profile information gathered for the program.
The default heuristic always inlines very small functions that meet the minimum inline criteria.
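The two guidelines above can be modeled with a toy decision function. This is only an illustrative sketch: the struct call_site type, the should_inline function, and both threshold values are assumptions made for this example and do not reflect the compiler's actual internal heuristics or limits.

```c
#include <stdbool.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define MIN_INLINE_SIZE 10u    /* callees this small are always inlined */
#define HOT_CALL_COUNT  1000ul /* call sites executed this often are "hot" */

/* Hypothetical summary of one call site. */
struct call_site {
    unsigned callee_size;     /* assumed size measure of the callee */
    unsigned long exec_count; /* execution count from a profiling run */
};

/* Toy model of the guidelines: very small callees are always inlined;
   otherwise only frequently executed call sites are inlined. */
bool should_inline(const struct call_site *site)
{
    if (site->callee_size <= MIN_INLINE_SIZE) {
        return true;
    }
    return site->exec_count >= HOT_CALL_COUNT;
}
```

In this model a tiny callee is expanded even at a rarely executed call site, while a large callee is expanded only where the profile shows the call site is hot.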
Combining IPO and PGO produces better results than using IPO alone. PGO produces dynamic profiling information that can usually provide better optimization opportunities than the static profiling information used in IPO.
When you do not use PGO, the compiler uses characteristics of the source code to estimate which function calls are executed most frequently. It applies these estimates to the PGO-based guidelines described above. Because the frequency estimate is based on static characteristics of the source, it is not always accurate.
Avoid using static profile information when combining PGO and IPO; with static profile information, the compiler can only estimate the application performance for the source files being used. Using dynamically generated profile information allows the compiler to accurately determine the real performance characteristics of the application.