Overview of Parallelism Method

The three major features of parallel programming supported by the Intel® compiler are:

  • OpenMP* API support
  • Auto-parallelization
  • Auto-vectorization

Each of these features contributes to application performance, depending on the number of processors, the target architecture (IA-32, Intel® 64, or IA-64 architecture), and the nature of the application. The features can also be combined for additional gains in application performance.

Parallelism defined with the OpenMP* API is based on thread-level and task-level parallelism. Parallelism defined with auto-parallelization techniques is based on thread-level parallelism (TLP). Parallelism defined with auto-vectorization techniques is based on instruction-level parallelism (ILP).
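To make the distinction concrete, the following sketch (with placeholder function and array names) shows a simple loop nest: the outer loop is the kind of loop the auto-parallelizer may distribute across threads when auto-parallelization is enabled (for example, with -parallel on Linux* or /Qparallel on Windows*), while the innermost loop is the kind of loop the auto-vectorizer may turn into SIMD instructions. Whether either transformation actually happens depends on the compiler version, the options used, and the compiler's dependence analysis.

    // Hypothetical loop nest (placeholder names) illustrating where the two
    // kinds of parallelism can come from.
    void scale_rows(float* a, const float* b, int rows, int cols)
    {
        // Outer loop: iterations are independent, so it is a candidate for
        // auto-parallelization (thread-level parallelism).
        for (int i = 0; i < rows; ++i) {
            // Inner loop: unit-stride accesses, so it is a candidate for
            // auto-vectorization (instruction-level parallelism / SIMD).
            for (int j = 0; j < cols; ++j) {
                a[i * cols + j] = 2.0f * b[i * cols + j];
            }
        }
    }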

Parallel programming can be explicit, that is, defined by a programmer using the OpenMP* API and its associated compiler options. It can also be implicit, that is, detected and generated automatically by the compiler. Implicit parallelism takes the form of auto-parallelization of outermost loops, auto-vectorization of innermost loops, or both.
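As a minimal sketch of the explicit approach (placeholder names; the program must be compiled with the compiler's OpenMP option, such as -openmp on Linux* or /Qopenmp on Windows*, for the directive to take effect), a loop can be threaded with a single OpenMP* directive:

    // Hypothetical example: explicit thread-level parallelism with OpenMP*.
    void scale_array(float* a, const float* b, int n)
    {
        // The directive asks the OpenMP* runtime to divide the loop
        // iterations among a team of threads.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            a[i] = 2.0f * b[i];
        }
    }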

To enhance the compilation of the code with auto-vectorization, users can also add vectorization compiler directives (pragmas) to their program.
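For example, the ivdep pragma asserts that a loop has no loop-carried dependences that would prevent vectorization; it should be used only when the programmer knows that assertion holds. The fragment below is an illustrative sketch with placeholder names, not a recommendation for any particular loop:

    // Hypothetical example: a vectorization hint for the auto-vectorizer.
    void add_arrays(float* c, const float* a, const float* b, int n)
    {
        // The pragma tells the compiler to ignore assumed (unproven)
        // dependences in the following loop, which can enable vectorization.
        #pragma ivdep
        for (int i = 0; i < n; ++i) {
            c[i] = a[i] + b[i];
        }
    }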

Note

Software pipelining (SWP), a technique closely related to auto-vectorization, is available on systems based on IA-64 architecture.

The following table summarizes the different ways in which parallelism can be exploited with the Intel® Compiler.

Intel provides performance libraries that contain highly optimized, extensively threaded routines, including the Intel® Math Kernel Library (Intel® MKL) and the Intel® Integrated Performance Primitives (Intel® IPP).
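As a brief, hedged illustration, the sketch below calls the CBLAS matrix-multiply routine cblas_dgemm through Intel® MKL, which a threaded MKL build may run across multiple cores; the matrix size and values are arbitrary, the header name can vary between MKL versions, and the program has to be linked against MKL with the appropriate link line for the chosen threading layer.

    // Hypothetical example: calling a threaded Intel MKL routine.
    #include <vector>
    #include <mkl_cblas.h>   // CBLAS interface shipped with MKL

    int main()
    {
        const int n = 512;                      // arbitrary square-matrix size
        std::vector<double> a(n * n, 1.0);
        std::vector<double> b(n * n, 2.0);
        std::vector<double> c(n * n, 0.0);

        // C = 1.0 * A * B + 0.0 * C; a threaded MKL build may execute
        // this call on multiple threads.
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n,
                    1.0, &a[0], n,
                         &b[0], n,
                    0.0, &c[0], n);
        return 0;
    }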

In addition to these major features supported by the Intel compiler, certain operating systems support application program interface (API) function calls that provide explicit threading controls. For example, Windows* operating systems support API calls such as CreateThread, and multiple operating systems support POSIX* threading APIs. Intel also provides the Intel® Threading Building Blocks (Intel® TBB), a C++ run-time library that helps simplify threading for scalable, multi-core performance.
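As a minimal sketch of that last option (placeholder names and sizes; it requires compiling and linking against TBB, and the lambda form shown assumes C++ lambda support in both the compiler and the TBB version), tbb::parallel_for splits a loop's iteration range among TBB worker threads:

    // Hypothetical example: threading a loop with Intel TBB's parallel_for.
    #include <cstddef>
    #include <vector>
    #include <tbb/blocked_range.h>
    #include <tbb/parallel_for.h>

    int main()
    {
        const std::size_t n = 1000000;          // arbitrary problem size
        std::vector<float> a(n, 0.0f), b(n, 1.0f);

        // TBB divides [0, n) into chunks and processes them on its worker threads.
        tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
            [&](const tbb::blocked_range<std::size_t>& r) {
                for (std::size_t i = r.begin(); i != r.end(); ++i)
                    a[i] = 2.0f * b[i];
            });
        return 0;
    }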

Parallelism Method and Supported Systems

Implicit (parallelism generated by the compiler and by user-supplied hints)

  Auto-parallelization (Thread-Level Parallelism), supported on:

  • IA-32 architecture, Intel® 64 architecture, and IA-64 architecture-based multiprocessor systems, and multi-core processors
  • Hyper-Threading Technology-enabled systems

  Auto-vectorization (Instruction-Level Parallelism), supported on:

  • Pentium®, Pentium with MMX™ Technology, Pentium II, Pentium III, and Pentium 4 processors, Intel® Core™ processor, and Intel® Core™ 2 processor

Explicit (parallelism programmed by the user)

  OpenMP* (Thread-Level and Task-Level Parallelism), supported on:

  • IA-32 architecture, Intel® 64 architecture, and IA-64 architecture-based multiprocessor systems, and multi-core processors
  • Hyper-Threading Technology-enabled systems

Threading Resources

For general information about threading an existing serial application or design considerations for creating new threaded applications, see Other Resources and the web site http://go-parallel.com.

To display diagnostic messages about the use of global variables, use the Intel C++ Compiler option -diag-enable thread (Linux* and Mac OS* X) or /Qdiag-enable thread (Windows*). For example, when threading an existing serial application, the diagnostic messages can help you identify places where you need to protect access to global variables. For more information about this option, see -diag in Compiler Options.
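For instance, a fragment like the hypothetical one below (the names are illustrative, not taken from the compiler documentation) updates a file-scope global from multiple threads without synchronization; this is the kind of shared-variable access that needs to be protected, for example with an OpenMP* atomic, critical, or reduction construct, before the loop is threaded.

    // Hypothetical fragment: an unprotected global becomes a data race
    // once the loop runs on multiple threads.
    int g_hits = 0;   // file-scope global shared by all threads

    void count_hits(const int* data, int n)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            if (data[i] > 0)
                ++g_hits;   // unsynchronized update; needs atomic, critical, or a reduction
        }
    }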