Methods to Optimize Code Size

This section provides guidance on how to achieve smaller object and executable sizes when using the optimizing features of Intel compilers.

There are two compiler options that are designed to prioritize code size over performance:

Option Os: Favors size over speed

This option enables optimizations that do not increase code size; it produces smaller code size than option O2.

Option Os disables some optimizations that may increase code size for a small speed benefit.

Option O1: Minimizes code size

Compared to option Os, option O1 disables even more optimizations that are generally known to increase code size. Specifying option O1 implies option Os.

As an intermediate step in reducing code size, you can replace option O3 with option O2 before specifying option O1.

Option O1 may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.

For more information about compiler options mentioned in this topic, see their full descriptions in the Compiler Reference.

The rest of this topic briefly discusses other methods that may help you further improve code size even when compared to the default behaviors of options Os and O1.

Things to remember:

Some of these methods may already be applied by default when options Os and O1 are specified. All the methods mentioned in this topic can be applied at higher optimization levels.
Some of the options referred to in this topic will not necessarily cause code size reduction, and they may provide varying results (good, bad, or neutral) based on the characteristics of the target code. Still, these are the recommended things to try to see if they cause your binaries to become smaller while maintaining acceptable performance.
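
Because results vary with the characteristics of the code, it helps to measure each change rather than assume it. The following Python sketch compares the on-disk size of two builds. The helper name size_change and the stand-in files are hypothetical; in practice you would pass the paths of your own object files or executables, built with and without the option under test.

```python
import os
import tempfile


def size_change(baseline_path, candidate_path):
    """Return (bytes_saved, percent_saved) of candidate relative to baseline."""
    baseline = os.path.getsize(baseline_path)
    candidate = os.path.getsize(candidate_path)
    saved = baseline - candidate
    percent = 100.0 * saved / baseline if baseline else 0.0
    return saved, percent


# Stand-in files so the sketch runs anywhere; replace them with real binaries.
with tempfile.TemporaryDirectory() as workdir:
    baseline_file = os.path.join(workdir, "app_O2")
    candidate_file = os.path.join(workdir, "app_Os")
    with open(baseline_file, "wb") as f:
        f.write(b"\0" * 4000)
    with open(candidate_file, "wb") as f:
        f.write(b"\0" * 3000)
    saved, percent = size_change(baseline_file, candidate_file)
    print(f"saved {saved} bytes ({percent:.1f}%)")
```

File size includes symbol and debugging information as well as code, so compare builds that differ only in the option being evaluated.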

Disable or Decrease the Amount of Inlining

Inlining replaces a call to a function with the body of the function. This lets the compiler optimize the code for the inlined function in the context of its caller, usually yielding more specialized and better performing code. This also removes the overhead of calling the function at run time.

However, replacing a call to a function with the code for that function usually increases code size, and the increase can be substantial. To eliminate this increase, at the cost of the potential performance improvement, you can disable inlining.

As an alternative to completely disabling inlining, you can decrease the default amount of inlining by specifying an inline factor n that is less than the default value of 100. The factor scales the main inlining parameters to n% of their default values.
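
The scaling performed by the inline factor can be sketched as simple arithmetic. The baseline values below are hypothetical placeholders, not the compiler's documented defaults, and rounding down to a whole number is an assumption made for this illustration.

```python
# Hypothetical baseline values for three of the main inlining parameters.
# The real defaults are listed in the Compiler Reference and may differ.
ASSUMED_DEFAULTS = {
    "inline-max-size": 200,
    "inline-min-size": 20,
    "inline-max-total-size": 2000,
}


def scale_inlining_parameters(defaults, factor_percent):
    """Scale every parameter to factor_percent % of its baseline value."""
    if factor_percent < 0:
        raise ValueError("the inline factor must not be negative")
    # Integer division rounds down; this rounding rule is an assumption.
    return {name: value * factor_percent // 100
            for name, value in defaults.items()}


halved = scale_inlining_parameters(ASSUMED_DEFAULTS, 50)
print(halved)
```

With a factor of 100 the parameters are unchanged; smaller factors tighten every limit by the same proportion.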

Options to specify:

Linux* and macOS*: fno-inline

Windows*: Ob0

Disables inlining

Linux* and macOS*: inline-factor=n

Windows*: Qinline-factor:n

ifort only - Reduces inlining and factors the main inlining parameters

Linux* and macOS*:

inline-factor
inline-max-per-compile
inline-max-per-routine
inline-max-size
inline-max-total-size
inline-min-size

Windows*:

Qinline-factor
Qinline-max-per-compile
Qinline-max-per-routine
Qinline-max-size
Qinline-max-total-size
Qinline-min-size

ifort only - Fine tunes the main inlining parameters

Advantages:	Disabling or reducing this optimization can reduce code size.
Disadvantages:	Performance is likely to be sacrificed by disabling or reducing inlining, especially for applications with many small functions.

Strip Symbols from Your Binaries

You can specify a compiler option to omit debugging and symbol information from the executable without sacrificing its operability.

Options to specify:

Linux* and macOS*:	Wl,--strip-all
Windows*:	None

Advantages:	This method noticeably reduces the size of the binary.
Disadvantages:	It may be very difficult to debug a stripped application.

Dynamically Link Intel-Provided Libraries

By default, some of the Intel support and performance libraries are linked statically into an executable. As a result, the library code is linked into every executable that is built, so the same code is duplicated across executables.

It may be more economical to link these libraries dynamically.

Options to specify:

Linux* and macOS*:	shared-intel
Windows*:	MD or libs:dll

Note

Option MD affects all libraries, not only the Intel-provided ones.

Advantages:

Performance of the resulting executable is normally not significantly affected.

Library code that would otherwise be linked statically into every executable does not contribute to the size of each executable with this option. The code is shared between all executables that use it, and it is available independently of those executables.

Disadvantages:

The libraries on which the resulting executable depends must be re-distributed with the executable in order for it to work properly.

When libraries are linked statically, only library content that is actually used is linked into the executable. Dynamic libraries, on the other hand, contain all the library content. Therefore, it may not be beneficial to use this option if you only need to build and/or distribute a single executable.

The executable itself may be much smaller when linked dynamically, compared to a statically linked executable. However, the total size of the executable plus shared libraries or DLLs may be much larger than the size of the statically linked executable.
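
The trade-off described above can be made concrete with a small calculation. The following sketch assumes that every executable has the same application code size and uses the same portion of the library; all sizes are hypothetical figures in KiB.

```python
def static_total(n_executables, app_size, used_library_size):
    """Each executable carries its own copy of the library code it uses."""
    return n_executables * (app_size + used_library_size)


def dynamic_total(n_executables, app_size, full_library_size):
    """Executables carry no library code; one complete shared library ships."""
    return n_executables * app_size + full_library_size


APP_KIB = 300        # hypothetical application code per executable
USED_LIB_KIB = 200   # hypothetical library code actually used by each one
FULL_LIB_KIB = 1500  # hypothetical size of the complete shared library

for n in (1, 7, 8):
    s = static_total(n, APP_KIB, USED_LIB_KIB)
    d = dynamic_total(n, APP_KIB, FULL_LIB_KIB)
    verdict = "dynamic smaller" if d < s else "static smaller"
    print(f"{n} executable(s): static {s} KiB, dynamic {d} KiB -> {verdict}")
```

With these figures, dynamic linking only pays off in total size once eight or more executables share the library.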

Disable Inline Expansion of Standard Library or Intrinsic Functions

In some cases, disabling the inline expansion of standard library or intrinsic functions may noticeably reduce the size of the produced object or binary.

Options to specify:

Linux* and macOS*:	nolib-inline
Windows*:	None

Disable Passing Arguments in Registers Instead of On the Stack

This content is specific to ifort; it does not apply to ifx.

You can specify an option that causes the compiler to pass arguments in registers rather than on the stack. This can yield faster code.

However, doing this may require the compiler to create an additional entry point for any function that can be called outside the code being compiled.

In many cases, this will lead to an increase in code size. To prevent this increase in code size, you can disable this optimization.

Options to specify:

Linux* and macOS*:	qopt-args-in-regs=none
Windows*:	Qopt-args-in-regs:none

Advantages:	Disabling this optimization can reduce code size.
Disadvantages:	The amount of code size saved may be small when compared to the corresponding performance loss of disabling the optimization.

Additional information:

If you do not specify "none" for option [q or Q]opt-args-in-regs, the default behavior for the option is that parameters are passed in registers when they are passed to routines whose definition is seen in the same compilation unit.
Depending on code characteristics, this option can sometimes increase binary size.

Disable Loop Unrolling

Unrolling a loop increases the size of the loop proportionally to the unroll factor.

Disabling (or limiting) this optimization may help reduce code size at the expense of performance.
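
To make the size cost visible, the following sketch writes out by hand what unrolling by a factor of 4 does to a simple summation loop. The compiler performs this transformation on generated machine code; Python is used here only to show the shape of the result, and the function names are hypothetical.

```python
def sum_rolled(values):
    """Original loop: one copy of the loop body."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same loop unrolled by 4: four copies of the body plus a remainder loop."""
    total = 0
    n = len(values)
    main_end = n - n % 4
    for i in range(0, main_end, 4):
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
    # Remainder loop handles the last n % 4 elements.
    for i in range(main_end, n):
        total += values[i]
    return total


data = list(range(1, 11))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

Both functions return the same result, but the unrolled version contains four copies of the loop body and an additional remainder loop.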

Options to specify:

Linux* and macOS*:	unroll=0
Windows*:	Qunroll:0

Advantages:	Code size is reduced.
Disadvantages:	Performance of otherwise unrolled loops may noticeably degrade because this limits other possible loop optimizations.

Additional information:

This option is already the default if you specify option Os or option O1.

Disable Automatic Vectorization

The compiler finds possibilities to use SIMD (SSE/AVX) instructions to improve performance of applications. This optimization is called automatic vectorization.

In most cases, this optimization involves transformation of loops and increases code size, in some cases significantly.

Disabling this optimization may help reduce code size at the expense of performance.
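
The loop transformation can be pictured as follows. This sketch models a vectorized element-wise addition as a main loop that handles several elements per step, followed by a scalar loop for the leftover elements. The lane count of 4 and the function names are assumptions for illustration; real vectorized code may also include additional loop versions that are not shown here.

```python
LANES = 4  # assumed number of elements processed by one SIMD operation


def add_scalar(a, b):
    """Original loop: one element per iteration."""
    return [a[i] + b[i] for i in range(len(a))]


def add_strip_mined(a, b):
    """Model of the vectorized form: a main vector loop plus a scalar remainder."""
    n = len(a)
    out = [0] * n
    main_end = n - n % LANES
    for i in range(0, main_end, LANES):
        # Stands in for a single SIMD add over LANES elements.
        out[i:i + LANES] = [x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES])]
    # Scalar remainder loop for the last n % LANES elements.
    for i in range(main_end, n):
        out[i] = a[i] + b[i]
    return out


a = list(range(1, 11))
b = [10 * x for x in a]
print(add_strip_mined(a, b) == add_scalar(a, b))
```

The single source loop becomes at least two loops in the generated code, which is one reason vectorization tends to increase code size.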

Options to specify:

Linux* and macOS*:	no-vec
Windows*:	Qvec-

Advantages:	In addition to the code size reduction, compile time is improved significantly.
Disadvantages:	Performance of otherwise vectorized loops may suffer significantly. If you care about the performance of your application, you should use this option selectively to suppress vectorization on everything except performance-critical parts.

Additional information:

Depending on code characteristics, this option can sometimes increase binary size.

Avoid Unnecessary 16-Byte Alignment

This topic only applies to Linux systems on IA-32 architecture.

This method should only be used in certain situations that are well understood. It can potentially cause correctness issues when linking with other objects or libraries that aren't built with this option.

The 32-bit Linux ABI states that stacks need only maintain 4-byte alignment. However, for performance reasons on modern architectures, GCC and ICC maintain an alignment of 16 bytes on the stack. Maintaining 16-byte alignment may require additional instructions to adjust the stack on function entries where no stack adjustment would otherwise be needed. This can impact code size, especially in code that consists of many small routines.
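
The extra adjustment can be illustrated with a simplified model. The sketch below assumes that the stack is aligned at the call site before the call instruction pushes a 4-byte return address, and it ignores saved registers; the function name is hypothetical.

```python
RETURN_ADDRESS_BYTES = 4  # pushed by the call instruction on IA-32


def entry_padding(local_bytes, alignment):
    """Padding needed so return address + locals + padding is a multiple of alignment."""
    occupied = RETURN_ADDRESS_BYTES + local_bytes
    return (-occupied) % alignment


# A small routine with no local variables:
print(entry_padding(0, 16))  # 16-byte alignment requires an adjustment
print(entry_padding(0, 4))   # 4-byte alignment requires none
```

In this model, a routine with no locals needs 12 bytes of padding under 16-byte alignment and none under 4-byte alignment, which corresponds to the extra stack-adjust instructions mentioned above.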

You can specify a compiler option that reverts ICC to maintaining 4-byte alignment, which can eliminate the need for extra stack-adjust instructions in some cases.

Use this option only if one of the following is true:

Your code does not call any object or library that might be built without this option and that may therefore rely on the stack being aligned to 16 bytes when called.
Your code is targeted for architectures that do not have or support SSE instructions; therefore, it would never need 16-byte alignment for correctness reasons.

Options to specify:

Linux*:	falign-stack=assume-4-byte
macOS*:	None
Windows*:	None

Advantages:

Code size can be smaller because no extra instructions are generated to maintain 16-byte alignment where it is not needed.

This method can improve performance in some cases because of this reduction of instructions.

Disadvantages:

This method can cause incompatibility when linked with other objects or libraries that rely on the stack being 16-byte aligned across the calls.

Additional information:

Depending on code characteristics, this option can sometimes increase binary size.

Use Interprocedural Optimization

Using interprocedural optimization (IPO) may reduce code size because it enables dead code elimination and suppresses generation of code for functions always inlined or proven never to be called during execution.
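
The idea of removing functions that are proven never to be called can be modeled as a reachability search over a whole-program call graph. The following sketch is a simplified illustration under that assumption, not a description of how the compiler implements IPO; the call graph and function names are hypothetical.

```python
from collections import deque

# Hypothetical whole-program call graph: caller -> list of callees.
CALL_GRAPH = {
    "main": ["read_input", "solve"],
    "read_input": [],
    "solve": ["dot_product"],
    "dot_product": [],
    "legacy_solver": ["dot_product"],
    "debug_dump": [],
}


def reachable_from(entry, call_graph):
    """Breadth-first search for every function reachable from the entry point."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


live = reachable_from("main", CALL_GRAPH)
dead = sorted(set(CALL_GRAPH) - live)
print(dead)
```

Functions that are not reachable from the entry point are candidates for removal. This analysis is only possible when the whole program is visible, which is why it depends on interprocedural optimization.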

Options to specify:

Linux* and macOS*:	ipo
Windows*:	Qipo

Advantages:	Depending on the code characteristics, this optimization can reduce executable size and improve performance.
Disadvantages:	Binary size can increase, depending on the characteristics of the code or application.

Note

This method is not recommended if you plan to ship object files as part of a final product.