Overview: Intrinsics for Intel® Advanced Vector Extensions Instructions

Intel® Advanced Vector Extensions 2 (Intel® AVX2) extends Intel® Advanced Vector Extensions (Intel® AVX) by promoting most of the 128-bit SIMD integer instructions with 256-bit numeric processing capabilities. The Intel® AVX2 instructions follow the same programming model as the Intel® AVX instructions.

Intel® AVX2 also provides enhanced functionality for broadcast/permute operations on data elements, vector shift instructions with variable-shift count per data element, and instructions to fetch non-contiguous data elements from memory.

Intel(R) AVX2 intrinsics have vector variants that use __m128, __m128i, __m256, and __m256i data types. The prototypes for the Intel® AVX2 intrinsics are available in the immintrin.h file.

The Intel® AVX2 intrinsics are supported on the IA-32 and Intel® 64 architectures built from 32nm process technology. They map directly to the Intel® AVX2 new instructions and other enhanced 128-bit SIMD instructions.

Functional Overview

Intel® AVX2 instructions promote the vast majority of 128-bit integer SIMD instruction sets to operate with 256-bit wide YMM registers. Intel® AVX2 instructions are encoded using the VEX prefix and require the same operating system support as Intel® AVX. Generally, most of the promoted 256-bit vector integer instructions follow the 128-bit lane operation, similar to the promoted 256-bit floating-point SIMD instructions in Intel® AVX.

The Intel® AVX2 instructions may be broadly categorized as follows:

Fused Multiply Add instructions: Intel® AVX2 instructions include instructions to perform fused multiply add operations on packed and scalar floating-point data elements.
Intel® AVX complementary integer instructions: Intel® AVX2 instructions complement the Intel® AVX instructions that are typed for integer operations with a full complement of equivalent instruction set for operating with integer data elements.
BROADCAST and PERMUTE instructions: These instructions provide cross-lane functionality for floating-point and integer operations. In addition, some of the Intel® AVX2 256-bit vector integer instructions promoted from legacy SSE instruction sets also exhibiting cross-lane behavior fall into this category; for example, instructions of the VPMOVZ/VPMOVS family.
SHIFT instructions: Intel® AVX2 vector SHIFT instructions operate with per-element shift count and support data element sizes of 32- and 64-bits.
GATHER instructions: The Intel® AVX2 vector GATHER instructions are used for fetching non-contiguous data elements from memory using vector-index memory addressing. They introduce a new memory addressing form consisting of a base register and multiple indices specified by a vector register (XMM or YMM). Data element sizes of 32- and 64-bits are supported as well as data types for floating-point and integer elements.

Overview: Intrinsics for Intel® Advanced Vector Extensions Instructions

Functional Overview

See Also