Targeting IA-32 and Intel 64 Architecture Processors Manually

With manual processor dispatch, your code detects the IA-32 or Intel® 64 architecture processor at run time through the cpu_specific and cpu_dispatch keywords. This lets you write one code path that runs only on the targeted processor and other code paths that run on other processors. Manual processor dispatch does not recognize processors based on IA-64 architecture.

Use the __declspec(cpu_specific) and __declspec(cpu_dispatch) syntax in your code to create code specific to a targeted Intel® processor and allow the other code paths to execute correctly on other IA-32 or Intel 64 architecture processors.

Refer to the Programming with Mixed Languages section in Building Applications for information on using these C++ keywords.

The general syntax for these keywords modifies a function declaration and takes the following arguments:

__declspec(cpu_specific(cpuid))
__declspec(cpu_dispatch(cpuid-list))

The following table lists the values for cpuid:

Argument for cpuid    Processors
core_i7_sse4_2        Intel® Core™ i7 processors with Intel® Streaming SIMD Extensions 4.2 (SSE4.2)
core_2_duo_sse4_1     Intel® 45nm Hi-k next generation Intel® Core™ microarchitecture processors with Streaming SIMD Extensions 4 (SSE4) Vectorizing Compiler and Media Accelerators instructions
core_2_duo_ssse3      Intel® Core™2 Duo processors and Intel® Xeon® processors with Intel® Supplemental Streaming SIMD Extensions 3 (SSSE3)
pentium_4_sse3        Intel® Pentium 4 processors with Intel® Streaming SIMD Extensions 3 (Intel® SSE3), Intel® Core™ Duo processors, Intel® Core™ Solo processors
pentium_4             Intel® Pentium 4 processors
pentium_m             Intel® Pentium M processors
pentium_iii           Intel® Pentium III processors
generic               x86 processors not provided by Intel Corporation

The following table lists the syntax for cpuid-list. A list is either a single cpuid or an existing list followed by a comma and another cpuid:

Syntax for cpuid-list
cpuid
cpuid-list, cpuid

The attributes are not case sensitive. The body of a function declared with __declspec(cpu_dispatch) must be empty, and is referred to as a stub (an empty-bodied function).
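As an illustration of the cpuid-list grammar and the case-insensitivity rule, the following sketch checks a comma-separated list against the cpuid values from the table in this section. It is plain C written for this illustration, not part of the compiler; the helper names token_equals, is_valid_cpuid, and is_valid_cpuid_list are hypothetical, and ignoring spaces around each cpuid is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* cpuid values copied from the table in this section. */
static const char *const known_cpuids[] = {
    "core_i7_sse4_2", "core_2_duo_sse4_1", "core_2_duo_ssse3",
    "pentium_4_sse3", "pentium_4", "pentium_m", "pentium_iii", "generic"
};

/* Hypothetical helper: case-insensitive comparison of a token of
   length n against a NUL-terminated name. */
static int token_equals(const char *tok, size_t n, const char *name)
{
    size_t i;
    if (strlen(name) != n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)tok[i]) != tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: returns 1 if the token names a cpuid from the table. */
static int is_valid_cpuid(const char *tok, size_t n)
{
    size_t i;
    for (i = 0; i < sizeof known_cpuids / sizeof known_cpuids[0]; i++)
        if (token_equals(tok, n, known_cpuids[i]))
            return 1;
    return 0;
}

/* Returns 1 if list matches  cpuid-list := cpuid | cpuid-list, cpuid
   Spaces around each cpuid are ignored (an assumption of this sketch). */
int is_valid_cpuid_list(const char *list)
{
    assert(list != NULL);
    for (;;) {
        const char *end;
        size_t n;
        while (*list == ' ')
            list++;
        end = strchr(list, ',');
        n = end ? (size_t)(end - list) : strlen(list);
        while (n > 0 && list[n - 1] == ' ')
            n--;
        if (!is_valid_cpuid(list, n))
            return 0;           /* empty or unknown cpuid */
        if (end == NULL)
            return 1;           /* last cpuid in the list */
        list = end + 1;
    }
}
```

For example, is_valid_cpuid_list("Pentium_4, generic") accepts the list regardless of case, while a trailing comma or an unknown name is rejected.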

Manual processor dispatch can disable some types of inlining, almost always results in larger code and executable sizes, and can introduce additional performance overhead because of the additional function calls. Test your application on all of the targeted platforms before release. Before using manual dispatch, consider whether the benefits outweigh the additional effort and possible performance issues.
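To make the source of this overhead concrete, the sketch below shows, in portable C, the kind of indirection that run-time dispatch implies: a resolver picks an implementation once, and every later call goes through a function pointer that the compiler generally cannot inline. This is a conceptual illustration only, not the code the Intel compiler generates. The names cpu_kind, detect_cpu, resolve_sum, dispatched_sum, and the two sum_* functions are hypothetical, and detect_cpu is a placeholder for a real processor query.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical processor classes for this sketch. */
typedef enum { CPU_GENERIC, CPU_TARGETED } cpu_kind;

typedef long (*sum_fn)(const int *v, size_t n);

/* Generic path: straightforward loop. */
static long sum_generic(const int *v, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* "Targeted" path: unrolled by four; stands in for a tuned version. */
static long sum_targeted(const int *v, size_t n)
{
    long t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        t0 += v[i];
        t1 += v[i + 1];
        t2 += v[i + 2];
        t3 += v[i + 3];
    }
    for (; i < n; i++)          /* excess elements */
        t0 += v[i];
    return t0 + t1 + t2 + t3;
}

/* Placeholder: a real implementation would query the processor. */
static cpu_kind detect_cpu(void)
{
    return CPU_GENERIC;
}

/* Picks the implementation for a given processor class. */
sum_fn resolve_sum(cpu_kind cpu)
{
    return cpu == CPU_TARGETED ? sum_targeted : sum_generic;
}

/* Plays the role of the dispatch stub: resolves once, then forwards
   every call through the cached function pointer. */
long dispatched_sum(const int *v, size_t n)
{
    static sum_fn impl = NULL;
    assert(v != NULL || n == 0);
    if (impl == NULL)
        impl = resolve_sum(detect_cpu());
    return impl(v, n);
}
```

Each call to dispatched_sum pays for the pointer check and the indirect call, and both implementations are linked into the executable, which is the size and speed cost the paragraph above describes.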

Use the following guidelines to implement processor dispatch support:

The following example demonstrates using manual dispatch with both cpu_specific and cpu_dispatch.

Example

#include <stdio.h>
#include <mmintrin.h>

/* Pentium processor version: does not use intrinsics
   to add two arrays. */
__declspec(cpu_specific(pentium))
void array_sum(unsigned short *result, unsigned short const *a,
               unsigned short const *b, size_t len)
{
    for (; len > 0; len--)
        *result++ = *a++ + *b++;
}

/* Version for a Pentium processor with MMX technology: uses an MMX
   instruction intrinsic to add four 16-bit elements simultaneously. */
__declspec(cpu_specific(pentium_MMX))
void array_sum(unsigned short *result, unsigned short const *a,
               unsigned short const *b, size_t len)
{
    __m64 *mmx_result = (__m64 *)result;
    __m64 const *mmx_a = (__m64 const *)a;
    __m64 const *mmx_b = (__m64 const *)b;

    for (; len > 3; len -= 4)
        *mmx_result++ = _mm_add_pi16(*mmx_a++, *mmx_b++);

    /* Clear the MMX state so that later floating-point code
       works correctly. */
    _mm_empty();

    /* The following code, which takes care of excess elements, is not
       needed if the array sizes passed are known to be multiples of four. */
    result = (unsigned short *)mmx_result;
    a = (unsigned short const *)mmx_a;
    b = (unsigned short const *)mmx_b;
    for (; len > 0; len--)
        *result++ = *a++ + *b++;
}

/* Dispatch stub: it has the same name and signature as the
   cpu_specific versions above. */
__declspec(cpu_dispatch(pentium, pentium_MMX))
void array_sum(unsigned short *result, unsigned short const *a,
               unsigned short const *b, size_t len)
{
    /* Empty function body informs the compiler to generate the
       CPU-dispatch function listed in the cpu_dispatch clause. */
}
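Because every processor-specific version must produce the same results, one way to follow the advice to test on all targeted platforms is to compare each version against a scalar reference, including lengths that are not multiples of four so that the excess-element loop is exercised. The sketch below is plain C with hypothetical names (array_sum_fn, array_sum_ref, array_sum_no_tail, check_array_sum). It assumes the arrays hold 16-bit unsigned short elements with wrap-around addition, which matches how _mm_add_pi16 adds four 16-bit elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*array_sum_fn)(unsigned short *result, unsigned short const *a,
                             unsigned short const *b, size_t len);

/* Scalar reference: 16-bit addition with wrap-around. */
void array_sum_ref(unsigned short *result, unsigned short const *a,
                   unsigned short const *b, size_t len)
{
    for (; len > 0; len--)
        *result++ = (unsigned short)(*a++ + *b++);
}

/* Deliberately incomplete version: handles groups of four and drops
   the excess elements, to show what the check catches. */
void array_sum_no_tail(unsigned short *result, unsigned short const *a,
                       unsigned short const *b, size_t len)
{
    for (; len > 3; len -= 4) {
        size_t k;
        for (k = 0; k < 4; k++)
            *result++ = (unsigned short)(*a++ + *b++);
    }
}

#define MAX_LEN 19

/* Returns 1 if fn matches the reference for every length 0..MAX_LEN,
   else 0. Comparing the whole buffers also detects writes past len. */
int check_array_sum(array_sum_fn fn)
{
    unsigned short a[MAX_LEN], b[MAX_LEN], want[MAX_LEN], got[MAX_LEN];
    size_t i, len;

    assert(fn != NULL);
    for (i = 0; i < MAX_LEN; i++) {
        a[i] = (unsigned short)(65530u + i);  /* chosen so some sums wrap */
        b[i] = (unsigned short)(3u * i + 1u);
    }
    for (len = 0; len <= MAX_LEN; len++) {
        memset(want, 0, sizeof want);
        memset(got, 0, sizeof got);
        array_sum_ref(want, a, b, len);
        fn(got, a, b, len);
        if (memcmp(want, got, sizeof want) != 0)
            return 0;
    }
    return 1;
}
```

Running check_array_sum on each targeted platform with the dispatched function as the argument gives a quick signal that the path selected there agrees with the reference.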