Using manual processor dispatch, your code can detect the IA-32 or Intel® 64 architecture processor at run time through the cpu_specific and cpu_dispatch keywords, enabling you to write one code path that runs only on the targeted processor, and other code path(s) that are taken on other processors. Manual processor dispatch will not recognize processors based on IA-64 architecture.
Use the __declspec(cpu_specific) and __declspec(cpu_dispatch) syntax in your code to create code specific to a targeted Intel® processor and allow the other code paths to execute correctly on other IA-32 or Intel 64 architecture processors.
Refer to the Programming with Mixed Languages section in Building Applications for information on using these C++ keywords.
The general syntax for these keywords changes a function declaration by using the following arguments:
cpu_specific(cpuid)
cpu_dispatch(cpuid-list)
The following table lists the values for cpuid:
Argument for cpuid | Processors |
---|---|
core_i7_sse4_2 | Intel® Core™ i7 processors with Intel® Streaming SIMD Extensions 4.2 (SSE4.2) |
core_2_duo_sse4_1 | Intel® 45nm Hi-k next generation Intel® Core™ microarchitecture processors with Streaming SIMD Extensions 4 (SSE4) Vectorizing Compiler and Media Accelerators instructions |
core_2_duo_ssse3 | Intel® Core™2 Duo processors and Intel® Xeon® processors with Intel® Supplemental Streaming SIMD Extensions 3 (SSSE3) |
pentium_4_sse3 | Intel® Pentium 4 processors with Intel® Streaming SIMD Extensions 3 (Intel® SSE3), Intel® Core™ Duo processors, Intel® Core™ Solo processors |
pentium_4 | Intel® Pentium 4 processors |
pentium_m | Intel® Pentium M processors |
pentium_iii | Intel® Pentium III processors |
generic | x86 processors not provided by Intel Corporation |
The following table lists the syntax for cpuid-list:
Syntax for cpuid-list |
---|
cpuid |
cpuid-list, cpuid |
The attributes are not case sensitive. The body of a function declared with __declspec(cpu_dispatch) must be empty, and is referred to as a stub (an empty-bodied function).
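As a rough illustration of the syntax above, the sketch below declares two versions of one function using cpuid values from the table, plus a stub whose cpuid-list names both. The function name scale and its parameters are hypothetical. The __INTEL_COMPILER guard and the plain fallback definition are assumptions added so that the file still builds with compilers that do not provide these keywords.

```c
#include <stddef.h>

#if defined(__INTEL_COMPILER)

/* Version taken on processors that match the generic entry. */
__declspec(cpu_specific(generic))
void scale(float *v, size_t n, float k)
{
    size_t i;
    for (i = 0; i < n; i++)
        v[i] *= k;
}

/* Version compiled for processors with SSSE3 support. */
__declspec(cpu_specific(core_2_duo_ssse3))
void scale(float *v, size_t n, float k)
{
    size_t i;
    for (i = 0; i < n; i++)
        v[i] *= k;
}

/* Stub: the body stays empty; the cpuid-list names both versions. */
__declspec(cpu_dispatch(generic, core_2_duo_ssse3))
void scale(float *v, size_t n, float k)
{
}

#else

/* Fallback (an assumption of this sketch): a single ordinary
   definition for compilers without manual processor dispatch. */
void scale(float *v, size_t n, float k)
{
    size_t i;
    for (i = 0; i < n; i++)
        v[i] *= k;
}

#endif
```

Callers simply call scale(); with manual dispatch enabled, the call reaches the stub, which selects one of the cpu_specific versions.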
Manual processor dispatch can disable some types of inlining, almost always results in larger code and executable sizes, and can introduce additional performance overhead because of the additional function calls. Test your application on all of the targeted platforms before release. Before using manual dispatch, consider whether the benefits outweigh the additional effort and possible performance issues.
Use the following guidelines to implement processor dispatch support:
Every cpuid in a cpu_dispatch stub must have a matching cpu_specific definition: if the cpu_dispatch stub for a function f contains the cpuid p, then a cpu_specific definition of f with cpuid p must appear somewhere in the program; otherwise, an unresolved external error is reported.
A cpu_specific function definition need not appear in the same translation unit as the corresponding cpu_dispatch stub, unless the cpu_specific function is declared static. The inline attribute is disabled for all cpu_specific and cpu_dispatch functions.
Every cpu_specific function must have a stub: if a function f is defined as __declspec(cpu_specific(p)), then a cpu_dispatch stub must also appear for f within the program, and p must be in the cpuid-list of that stub; otherwise, that cpu_specific definition can never be called, and no error is reported.
Manual dispatch overrides command-line settings: when a cpu_dispatch stub is compiled, its body is replaced with code that determines the processor on which the program is running, then dispatches the best cpu_specific implementation available as defined by the cpuid-list.
A cpu_specific function optimizes to the specified Intel processor regardless of command-line option settings.
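The run-time selection that a compiled stub performs can be pictured with a portable sketch. This is not the code the compiler generates; it only mirrors the idea of detecting the processor once and then calling the best listed implementation. All names here (cpu_level, sum_generic, sum_ssse3, select_impl, detect_level) are hypothetical, the linear ordering of feature levels is a simplifying assumption, and detection relies on the GCC/Clang built-in __builtin_cpu_supports where it is available.

```c
#include <stddef.h>

typedef void (*sum_fn)(int *result, const int *a, const int *b, size_t len);

/* Hypothetical feature levels, ordered from least to most capable. */
enum cpu_level { LEVEL_GENERIC = 0, LEVEL_SSSE3 = 1, LEVEL_SSE4_2 = 2 };

static void sum_generic(int *result, const int *a, const int *b, size_t len)
{
    for (; len > 0; len--)
        *result++ = *a++ + *b++;
}

/* Stands in for a version tuned for SSSE3; same logic in this sketch. */
static void sum_ssse3(int *result, const int *a, const int *b, size_t len)
{
    for (; len > 0; len--)
        *result++ = *a++ + *b++;
}

struct impl { enum cpu_level required; sum_fn fn; };

/* Plays the role of the cpuid-list, ordered best first. */
static const struct impl impls[] = {
    { LEVEL_SSSE3,   sum_ssse3   },
    { LEVEL_GENERIC, sum_generic },
};

/* Return the first (best) implementation the detected level supports. */
static sum_fn select_impl(enum cpu_level detected)
{
    size_t i;
    for (i = 0; i < sizeof impls / sizeof impls[0]; i++)
        if (detected >= impls[i].required)
            return impls[i].fn;
    return sum_generic;
}

/* Assumption: use the GCC/Clang built-ins on x86, else report generic. */
static enum cpu_level detect_level(void)
{
#if defined(__GNUC__) && !defined(__INTEL_COMPILER) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2"))
        return LEVEL_SSE4_2;
    if (__builtin_cpu_supports("ssse3"))
        return LEVEL_SSSE3;
#endif
    return LEVEL_GENERIC;
}

/* Plays the role of the stub: detect once, then forward every call. */
void array_sum(int *result, const int *a, const int *b, size_t len)
{
    static sum_fn chosen = NULL;
    if (chosen == NULL)
        chosen = select_impl(detect_level());
    chosen(result, a, b, len);
}
```

Each call goes through one extra indirect call, which is the kind of overhead and lost inlining opportunity that the cautions above describe.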
The following example demonstrates using manual dispatch with both cpu_specific and cpu_dispatch.

    #include <stdio.h>
    #include <mmintrin.h>

    /* Pentium processor function does not use intrinsics to add two arrays. */
    __declspec(cpu_specific(pentium))
    void array_sum(unsigned short *result, unsigned short const *a,
                   unsigned short const *b, size_t len)
    {
        for (; len > 0; len--)
            *result++ = *a++ + *b++;
    }

    /* Implementation for a Pentium processor with MMX technology uses an MMX
       instruction intrinsic to add four elements simultaneously. */
    __declspec(cpu_specific(pentium_MMX))
    void array_sum(unsigned short *result, unsigned short const *a,
                   unsigned short const *b, size_t len)
    {
        __m64 *mmx_result = (__m64 *)result;
        __m64 const *mmx_a = (__m64 const *)a;
        __m64 const *mmx_b = (__m64 const *)b;

        for (; len > 3; len -= 4)
            *mmx_result++ = _mm_add_pi16(*mmx_a++, *mmx_b++);

        /* Clear the MMX state so later floating-point code is not affected. */
        _mm_empty();

        /* The following code, which takes care of excess elements, is not
           needed if the array sizes passed are known to be multiples of four. */
        result = (unsigned short *)mmx_result;
        a = (unsigned short const *)mmx_a;
        b = (unsigned short const *)mmx_b;
        for (; len > 0; len--)
            *result++ = *a++ + *b++;
    }

    /* All three declarations share one name and signature so that the stub
       can dispatch to the cpu_specific versions. */
    __declspec(cpu_dispatch(pentium, pentium_MMX))
    void array_sum(unsigned short *result, unsigned short const *a,
                   unsigned short const *b, size_t len)
    {
        /* Empty function body informs the compiler to generate the
           CPU-dispatch function listed in the cpu_dispatch clause. */
    }