Math Kernel Library Developer Guide
The following Intel® oneAPI Math Kernel Library function domains are threaded with the OpenMP* technology:
Direct sparse solver.
LAPACK.
For a list of threaded routines, see LAPACK Routines.
Level1 and Level2 BLAS.
For a list of threaded routines, see BLAS Level1 and Level2 Routines.
All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.
All Vector Mathematics functions (except service functions).
FFT.
For a list of FFT transforms that can be threaded, see Threaded FFT Problems.
LAPACK Routines
In this section, ? stands for the precision prefix of each flavor of the respective routine and can be s, d, c, or z.
The following LAPACK routines are threaded with OpenMP*:
A number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of OpenMP* parallelism:
?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx, and so on.
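The ? notation used above can be illustrated with a short sketch that expands a routine pattern into its concrete names. The helper `expand_flavors` and the `PRECISION_PREFIXES` table are hypothetical names introduced only for this illustration and are not part of the library. The prefix meanings follow the usual BLAS/LAPACK naming convention.

```python
# Illustrative only: expand_flavors is a hypothetical helper, not a library function.

# Conventional BLAS/LAPACK meaning of each precision prefix.
PRECISION_PREFIXES = {
    "s": "single-precision real",
    "d": "double-precision real",
    "c": "single-precision complex",
    "z": "double-precision complex",
}


def expand_flavors(pattern, prefixes="sdcz"):
    """Replace a leading '?' in a routine pattern with each given prefix.

    A pattern without a leading '?' (for example 'cgegs') is returned unchanged.
    """
    for p in prefixes:
        if p not in PRECISION_PREFIXES:
            raise ValueError(f"unknown precision prefix: {p!r}")
    if not pattern.startswith("?"):
        return [pattern]
    return [p + pattern[1:] for p in prefixes]


print(expand_flavors("?gesv"))        # all four flavors
print(expand_flavors("?syev", "sd"))  # real symmetric case
print(expand_flavors("?heev", "cz"))  # complex Hermitian case
```

Not every routine is provided for all four prefixes. For example, ?syev applies to real symmetric matrices (s and d), while ?heev applies to complex Hermitian matrices (c and z), which is why the sketch lets the caller restrict the prefix set.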
BLAS Level1 and Level2 Routines
In the following list, ? stands for the precision prefix of each flavor of the respective routine and can be s, d, c, or z.
The following routines are threaded with OpenMP*:
Threaded FFT Problems
The following characteristics of a specific problem determine whether your FFT computation may be threaded with OpenMP*:
Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow.
One-dimensional (1D) transforms
1D transforms are threaded in many cases.
1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture:
Architecture | Conditions |
---|---|
Intel® 64 | N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1. |
IA-32 | N is a power of 2, log2(N) > 13, and the transform is single-precision. |
IA-32 | N is a power of 2, log2(N) > 14, and the transform is double-precision. |
Any | N is composite, log2(N) > 16, and input/output strides equal 1. |
1D complex-to-complex transforms using split-complex layout are not threaded.
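The 1D complex-to-complex conditions above can be read as a simple predicate. The following is a minimal sketch under stated assumptions, not library code: the function name `c2c_1d_may_be_threaded` and its parameters are hypothetical, the table rows are treated as independent alternatives, and "composite" is taken in its plain mathematical sense (any non-prime N > 1, which includes powers of 2). It covers a single transform only and does not query the library.

```python
# Illustrative only: these helpers are hypothetical and do not call the library.

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def is_composite(n):
    """True if n has a divisor other than 1 and itself (trial division)."""
    if n < 4:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return True
        i += 1
    return False


def c2c_1d_may_be_threaded(n, arch, precision, in_place,
                           in_stride=1, out_stride=1, interleaved=True):
    """Check one 1D c2c transform against the documented conditions.

    arch: "intel64" or "ia32"; precision: "single" or "double".
    Assumption: a transform qualifies if ANY table row matches.
    """
    if not interleaved:
        return False  # split-complex layout is documented as not threaded
    unit_strides = in_stride == 1 and out_stride == 1

    if is_power_of_two(n):
        # log2(N) > k is equivalent to N > 2**k for powers of 2.
        if arch == "intel64":
            if (n > 2**9 and precision == "double"
                    and not in_place and unit_strides):
                return True
        elif arch == "ia32":
            threshold = 13 if precision == "single" else 14
            if n > 2**threshold:
                return True

    # "Any" architecture row: composite N with log2(N) > 16 and unit strides.
    return is_composite(n) and n > 2**16 and unit_strides


print(c2c_1d_may_be_threaded(1024, "intel64", "double", in_place=False))
print(c2c_1d_may_be_threaded(1024, "intel64", "single", in_place=False))
```

Integer comparisons such as `n > 2**9` are used instead of a floating-point logarithm so that boundary sizes (for example N = 512 or N = 65536) are classified exactly.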
Multidimensional transforms
All multidimensional transforms on large-volume data are threaded.