Targeting Multiple IA-32 and Intel® 64 Architecture Processors for Run-time Performance

The -ax (Linux* and Mac OS* X) or /Qax (Windows*) option instructs the compiler to look for opportunities to generate multiple specialized code paths that take advantage of the performance features available on newer Intel® processors based on IA-32 and Intel® 64 architectures. The option also instructs the compiler to generate a more generic (baseline) code path that should allow the same application to run on a wider range of processors; however, the baseline code path is usually slower than the specialized code.

The compiler inserts run-time checking code to help determine which version of the code to execute. The size of the compiled binary increases because it contains both a processor-specific version of some of the code and a generic baseline version of all code. Application performance is affected slightly due to the run-time checks needed to determine which code to use. The code path executed depends strictly on the processor detected at run time.
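Conceptually, the run-time check behaves like the hand-written dispatcher sketched below. This is an illustration only, not the code the compiler actually inserts: the names HostFeatures, CodePath, select_path, and the sum_* functions are hypothetical, and the feature flags are passed in as a parameter instead of being read from the processor (real dispatch code queries the CPUID instruction).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record of what the host processor supports. Real dispatch
// code fills this in from CPUID; here it is a parameter so the selection
// logic can be exercised on any machine.
struct HostFeatures {
    bool sse3 = false;
    bool ssse3 = false;
};

// Stand-ins for the code paths a compiler might emit for one function.
// Every path must compute the same result; only the instructions differ.
long sum_baseline(const int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
long sum_sse3(const int* a, std::size_t n)  { return sum_baseline(a, n); }
long sum_ssse3(const int* a, std::size_t n) { return sum_baseline(a, n); }

using SumFn = long (*)(const int*, std::size_t);

struct CodePath {
    std::string name;
    SumFn fn;
};

// Try the most specialized path first and fall back to the baseline, so the
// path chosen depends only on what the host reports at run time.
CodePath select_path(const HostFeatures& host) {
    if (host.ssse3) return {"ssse3", sum_ssse3};
    if (host.sse3)  return {"sse3", sum_sse3};
    return {"baseline", sum_baseline};
}
```

Because the baseline is the final fallback, it is the only path that runs on a processor that supports none of the specialized instruction sets.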

Processor support for the baseline code path is determined by the processor family or instruction set specified in the -m or -x (Linux and Mac OS X) or /arch or /Qx (Windows) option, which has default values for each architecture.

Specifying one of these options lets you impose a stricter processor or instruction set requirement on the baseline code path; however, the resulting baseline code will not run correctly on processors that do not meet that minimum requirement. For the IA-32 architecture, you can specify a baseline code path that works on all IA-32 compatible processors by using the -mia32 (Linux) or /arch:IA32 (Windows) option. Always specify the processor or instruction set requirements for the baseline code path explicitly rather than relying on the defaults for the architecture.

Optimizations in the specialized code paths can include generating and using Intel® Streaming SIMD Extensions 4 (SSE4), Supplemental Streaming SIMD Extensions 3 (SSSE3), Streaming SIMD Extensions 3 (SSE3), or Streaming SIMD Extensions 2 (SSE2) instructions for supported Intel processors; however, a specialized code path is executed only after a run-time check confirms that the host processor supports it.

Unless indicated otherwise, the following processor values are valid for both IA-32 and Intel® 64 architectures.

In each entry below, the first option form applies to Linux OS and Mac OS X and the second to Windows OS.

-axSSE4.2 or /QaxSSE4.2

Can generate Intel® SSE4 Efficient Accelerated String and Text Processing instructions supported by Intel® Core™ i7 processors. Can also generate Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSSE3, SSE3, SSE2, and SSE instructions, and can optimize for the Intel® Core™ processor family.

-axSSE4.1 or /QaxSSE4.1

Can generate Intel® SSE4 Vectorizing Compiler and Media Accelerator instructions for Intel processors. Can also generate Intel® SSSE3, SSE3, SSE2, and SSE instructions, and can optimize for the Intel® 45nm Hi-k next generation Intel® Core™ microarchitecture. This value replaces the deprecated value S.

Mac OS X: IA-32 and Intel® 64 architectures.

-axSSSE3 or /QaxSSSE3

Can generate Intel® SSSE3, SSE3, SSE2, and SSE instructions for Intel processors, and can optimize for the Intel® Core™2 Duo processor family. This value replaces the deprecated value T.

Mac OS X: IA-32 architecture.

-axSSE3_ATOM or /QaxSSE3_ATOM

Optimizes for the Intel® Atom™ processor and Intel® Centrino® Atom™ Processor Technology. Can generate MOVBE instructions, depending on the setting of the -minstruction (Linux and Mac OS X) or /Qinstruction (Windows) option.

Mac OS X: IA-32 architecture.

-axSSE3 or /QaxSSE3

Can generate Intel® SSE3, SSE2, and SSE instructions for Intel processors, and can optimize for processors based on Intel® Core™ microarchitecture and Intel NetBurst® microarchitecture. This value replaces the deprecated value P.

Mac OS X: IA-32 architecture.

-axSSE2 or /QaxSSE2

Can generate Intel® SSE2 and SSE instructions for Intel processors, and can optimize for Intel® Pentium® 4 processors, Intel® Pentium® M processors, and Intel® Xeon® processors with Intel® SSE2.

Linux and Windows: IA-32 architecture.
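Each value in the table can also generate every older instruction set listed in its description, so the values form a simple cumulative ladder. The sketch below models that ladder in C++ as a reading aid. The names AxValue, implied_sets, and may_generate are hypothetical and are not a compiler API; SSE3_ATOM is left out because its entry does not list the instruction sets it implies, and the labels "SSE4.1" and "SSE4.2" stand for the Vectorizing Compiler and Media Accelerator and the Efficient Accelerated String and Text Processing instructions, respectively.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Option values from the table, ordered from oldest to newest instruction set.
// SSE3_ATOM is omitted: its description does not list the sets it implies.
enum class AxValue { SSE2, SSE3, SSSE3, SSE4_1, SSE4_2 };

// Instruction sets in the same order. Plain SSE comes first because every
// listed value can also generate SSE; value i adds set i + 1.
const std::vector<std::string> kInstructionSets = {
    "SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2"};

// Every instruction set a specialized path for `value` may use, newest first.
std::vector<std::string> implied_sets(AxValue value) {
    const int newest = static_cast<int>(value) + 1;  // +1 skips plain SSE
    assert(newest < static_cast<int>(kInstructionSets.size()));
    std::vector<std::string> sets;
    for (int i = newest; i >= 0; --i) sets.push_back(kInstructionSets[i]);
    return sets;
}

// True if a path built for `value` may contain instructions from `set`.
bool may_generate(AxValue value, const std::string& set) {
    for (const auto& s : implied_sets(value))
        if (s == set) return true;
    return false;
}
```

For example, implied_sets(AxValue::SSE4_1) yields SSE4.1, SSSE3, SSE3, SSE2, and SSE, matching the -axSSE4.1 entry.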

Note

You can specify -diag-disable cpu-dispatch (Linux and Mac OS X) or /Qdiag-disable:cpu-dispatch (Windows) to disable the display of remarks about multiple code paths for CPU dispatch.

If your application for IA-32 or Intel® 64 architectures does not need to run on several different processor types, consider using the -x (Linux and Mac OS X) or /Qx (Windows) option instead of this option.

The following compilation examples generate an IA-32 architecture executable that contains an optimized version for Intel® Core™2 Duo processors and an optimized version for Intel® Core™ Duo processors, each generated only where it yields a performance gain, plus a generic baseline version that runs on any IA-32 architecture processor.

Note

When you specify more than one value, you must separate the values with a comma (",").

Linux:

icpc -axSSSE3,SSE3 -mia32 sample.cpp

Windows:

icl /QaxSSSE3,SSE3 /arch:IA32 sample.cpp
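The commands above compile a file named sample.cpp whose contents this document does not show. As a hypothetical illustration, sample.cpp might contain a loop like the one below; the function name saxpy and its body are assumptions, and you need to add your own main function to link an executable. Whether the compiler actually emits SSSE3 or SSE3 paths for this loop depends on its own estimate of the performance gain; the CPU-dispatch remarks mentioned in the first note are one way to see which functions received multiple code paths.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A simple loop of the kind the vectorizer may turn into SSE-family code.
// Each iteration is independent of the others, which makes it a good
// candidate for a specialized code path; the compiler makes the final call.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];
}
```

Because every code path must produce the same results, the output of saxpy should not depend on which path the run-time check selects.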