Vectorization and Loops

This topic provides more information on the interaction between the auto-vectorizer and loops.

See Programming Guidelines for Vectorization.

In some rare cases, a successful loop parallelization (either automatically or by means of OpenMP* directives) may affect the messages reported by the compiler for a non-vectorizable loop in a non-intuitive way.

Types of Vectorized Loops

For integer loops, the 128-bit Intel® Streaming SIMD Extensions (Intel® SSE) and the Intel® Advanced Vector Extensions (Intel® AVX) provide SIMD instructions for most arithmetic and logical operators on 32-bit, 16-bit, and 8-bit integer data types, with limited support for the 64-bit integer data type.

Vectorization may proceed if the final precision of integer wrap-around arithmetic is preserved. A 32-bit shift-right operator, for instance, is not vectorized in 16-bit mode if the final stored value is a 16-bit integer. Also, note that because the Intel® SSE and the Intel® AVX instruction sets are not fully orthogonal (shifts on byte operands, for instance, are not supported), not all integer operations can actually be vectorized.

For loops that operate on 32-bit single-precision and 64-bit double-precision floating-point numbers, Intel® SSE provides SIMD instructions for the following arithmetic operators:

Additionally, Intel® SSE provide SIMD instructions for the binary MIN and MAX and unary SQRT operators. SIMD versions of several other mathematical operators (like the trigonometric functions SIN, COS, and TAN) are supported in software in a vector mathematical run-time library that is provided with the Intel® oneAPI DPC++/C++ Compiler.

To be vectorizable, loops must be:

Intrinsic math functions are allowed, because the compiler runtime library contains vectorized versions of these functions. See the table below for a list of these functions; most exist in both float and double versions.

acos ceil fabs round
acosh cos floor sin
asin cosh fmax sinh
asinh erf fmin sqrt
atan erfc log tan
atan2 erfinv log10 tanh
atanh exp log2 trunc
cbrt exp2 pow  

Statements in the Loop Body

The vectorizable operations are different for floating-point and integer data.

Integer Array Operations

The statements within the loop body may contain char, unsigned char, short, unsigned short, int, and unsigned int. Calls to functions such as sqrt and fabs are also supported. Arithmetic operations are limited to addition, subtraction, bitwise AND, OR, and XOR operators, division (via run-time library call), multiplication, min, and max. You can mix data types but this may potentially cost you in terms of lowering efficiency. Some example operators where you can mix data types are multiplication, shift, or unary operators.

Other Operations

No statements other than the preceding floating-point and integer operations are allowed. In particular, note that the special __m64__m128, and __m256 data types are not vectorizable. The loop body cannot contain any function calls. Use of Intel® SSE intrinsics ( for example, _mm_add_ps) or Intel® AVX intrinsics (for example, _mm256_add_ps) are not allowed.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Notice revision #20201201

See Also