Loop Unrolling

The benefits of loop unrolling are as follows:

Unrolling eliminates branches and some of the loop-overhead code, such as counter updates and exit tests.
Unrolling enables you to aggressively schedule (or pipeline) the loop to hide latencies, provided you have enough free registers to keep variables live.
On processors based on IA-32 architecture, the processor can correctly predict the exit branch of an inner loop that has 16 or fewer iterations, provided that the number of iterations is predictable and the loop contains no conditional branches. Therefore, if the loop body is not excessively large and the probable number of iterations is known, unroll inner loops for these processors until they have at most 16 iterations.

A potential limitation is that excessive unrolling, or unrolling of very large loops, can lead to increased code size.
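
To make the points above concrete, the following is a minimal hand-written sketch of unrolling by a factor of four. It is an illustration only, not the code the compiler generates. The function names `sum_rolled` and `sum_unrolled4` are hypothetical, and integer data is assumed so that both versions return identical results.

```cpp
#include <cstddef>
#include <vector>

// Baseline: one addition and one loop branch per element.
long long sum_rolled(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Hand-unrolled by 4 (hypothetical helper, for illustration only).
// The main loop executes one branch per four elements, and the four
// independent accumulators give the scheduler more freedom, at the
// cost of keeping more values live in registers.
long long sum_unrolled4(const std::vector<int>& v) {
    const std::size_t n = v.size();
    const std::size_t main_end = n - (n % 4);  // largest multiple of 4 <= n
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i < main_end; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    long long total = s0 + s1 + s2 + s3;
    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

For a 64-element input, the main loop of `sum_unrolled4` runs 16 times instead of 64, which matches the 16-iteration guideline above. The body is roughly four times larger, which is the code-size cost noted as a limitation.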

The -unroll[n] (Linux* and Mac OS* X) or /Qunroll:[n] (Windows*) option controls how the Intel® compiler handles loop unrolling.

Refer to Applying Optimization Strategies for more information.

Linux and Mac OS X	Windows	Description
-unroll`n`	/Qunroll:`n`	Specifies the maximum number of times to unroll a loop. The following examples unroll a loop four times: `icpc -unroll4 a.cpp` (Linux and Mac OS X); `icl /Qunroll:4 a.cpp` (Windows). Note: The compilers for IA-64 architecture recognize only `n` = 0; any other value is ignored. Omitting a value for `n` lets the compiler decide whether to perform unrolling. This is the default; the compiler uses its default heuristics to choose `n`. Passing 0 as `n` disables loop unrolling, as in the following examples: `icpc -unroll0 a.cpp` (Linux and Mac OS X); `icl /Qunroll:0 a.cpp` (Windows).
-funroll-all-loops	No equivalent	Instructs the compiler to unroll all loops even if the number of iterations is uncertain when the loop is entered.
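
As an illustration of a loop whose iteration count is uncertain when the loop is entered, the sketch below counts elements up to a zero sentinel and then shows a version unrolled by hand by a factor of two. This is an assumption-laden illustration, not a description of the code the compiler emits for -funroll-all-loops. The function names are hypothetical, and the input is assumed to be terminated by a zero element.

```cpp
#include <cstddef>

// Trip count is unknown at loop entry: it depends on where the
// zero sentinel sits in the data.
std::size_t length_rolled(const int* p) {
    std::size_t n = 0;
    while (p[n] != 0) {
        ++n;
    }
    return n;
}

// Hand-unrolled by 2 (hypothetical helper, for illustration only).
// Because the trip count is unknown, each copy of the body keeps its
// own exit test; only the backward branch is shared.
std::size_t length_unrolled2(const int* p) {
    std::size_t n = 0;
    for (;;) {
        if (p[n] == 0) return n;
        // p[n + 1] is read only after p[n] was found to be nonzero,
        // so the read stays within the sentinel-terminated array.
        if (p[n + 1] == 0) return n + 1;
        n += 2;
    }
}
```

Because every copy of the body retains its exit test, the saving is smaller than for a loop with a known iteration count, while the code-size growth is the same.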
