The following sample program shows how to employ internal threading in Intel MKL for FFT computation (see case "a" in “Number of user threads”).
To specify the number of threads inside Intel MKL, set the MKL_NUM_THREADS environment variable as follows:
set MKL_NUM_THREADS = 1 for single-threaded mode;
set MKL_NUM_THREADS = 4 for multi-threaded mode.
Note that the configuration parameter DFTI_NUMBER_OF_USER_THREADS must be equal to its default value 1.
#include "mkl_dfti.h"

int main ()
{
    float x[200][100];
    DFTI_DESCRIPTOR_HANDLE fft;
    MKL_LONG len[2] = {200, 100};

    // initialize x
    DftiCreateDescriptor ( &fft, DFTI_SINGLE, DFTI_REAL, 2, len );
    DftiCommitDescriptor ( fft );
    DftiComputeForward ( fft, x );
    DftiFreeDescriptor ( &fft );
    return 0;
}
Example “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region” and Example “Using Parallel Mode with Multiple Descriptors Initialized in One Thread” below illustrate a parallel user program in which each descriptor instance is used in only one thread (see cases "b" and "c" in “Number of user threads”).
Specify the number of threads for Example “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region” like this:
set MKL_NUM_THREADS = 1 for Intel MKL to work in single-threaded mode (recommended);
set OMP_NUM_THREADS = 4 for the user program to work in multi-threaded mode.
The configuration parameter DFTI_NUMBER_OF_USER_THREADS must have its default value of 1.
Note that this example can be converted into a program that is single-threaded at the user level but uses parallel mode within Intel MKL (case "a"). To do this, set the parameter DFTI_NUMBER_OF_TRANSFORMS = 4 and set the corresponding parameter DFTI_INPUT_DISTANCE = 5000, which is the number of elements in one 50x100 transform.
C code for the example is as follows:
#include "mkl_dfti.h"
#include <omp.h>

// Number of elements in the outermost dimension of an array
#define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

int main ()
{
    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 x[4][50][100];
    int nth = ARRAY_LEN(x);
    MKL_LONG len[2] = {ARRAY_LEN(x[0]), ARRAY_LEN(x[0][0])};
    int th;

    // assume x is initialized and do 2D FFTs
    #pragma omp parallel for shared(len, x)
    for (th = 0; th < nth; th++)
    {
        // each thread creates, commits, uses, and frees its own descriptor
        DFTI_DESCRIPTOR_HANDLE myFFT;

        DftiCreateDescriptor (&myFFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len);
        DftiCommitDescriptor (myFFT);
        DftiComputeForward (myFFT, x[th]);
        DftiFreeDescriptor (&myFFT);
    }
    return 0;
}
Fortran code for the example is as follows:
program fft2d_private_descr_main
  use mkl_dfti

  integer nth, len(2)
  ! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)

  type(dfti_descriptor), pointer :: myFFT
  integer th, myStatus

  ! assume x is initialized and do 2D FFTs
!$OMP PARALLEL DO SHARED(len, x) PRIVATE(myFFT, myStatus)
  do th = 1, nth
    myStatus = DftiCreateDescriptor (myFFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
    myStatus = DftiCommitDescriptor (myFFT)
    myStatus = DftiComputeForward (myFFT, x(:, th))
    myStatus = DftiFreeDescriptor (myFFT)
  end do
!$OMP END PARALLEL DO
end
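As a rough illustration of the note above on converting this example to case "a", the following C sketch shows where the values 4 and 5000 come from. It deliberately does not call Intel MKL, so that it builds on its own: complex8_t is a hypothetical stand-in for MKL_Complex8, and transform_distance, transform_offset, and array_distance are illustrative helper names, not Intel MKL functions. It assumes the data sets are stored contiguously, one after another, as in the array x of the example.

```c
#include <assert.h>

/* Hypothetical stand-in for MKL_Complex8 (a pair of floats), used only
   so that this sketch compiles without the Intel MKL headers. */
typedef struct { float real, imag; } complex8_t;

/* Same shape as x in the example: 4 data sets of 50x100 points. */
enum { N_TRANSFORMS = 4, ROWS = 50, COLS = 100 };

/* Number of elements in one rows x cols data set. When the data sets
   are stored back to back, this is also the distance, in elements,
   between the first elements of two consecutive data sets. */
long transform_distance(long rows, long cols)
{
    return rows * cols;
}

/* Offset, in elements, of data set k from the start of the array. */
long transform_offset(long k, long rows, long cols)
{
    assert(k >= 0);
    return k * transform_distance(rows, cols);
}

/* The same distance, derived from the array type itself rather than
   from the dimensions, as a cross-check on the arithmetic. */
long array_distance(void)
{
    static complex8_t x[N_TRANSFORMS][ROWS][COLS];
    return (long)(sizeof x[0] / sizeof x[0][0][0]);
}
```

Under these assumptions, transform_distance(50, 100) evaluates to 5000, the value the note assigns to DFTI_INPUT_DISTANCE, and the outermost dimension of x gives the 4 used for DFTI_NUMBER_OF_TRANSFORMS. In a real program both values would be passed to DftiSetValue() before the descriptor is committed.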
Specify the number of threads for Example “Using Parallel Mode with Multiple Descriptors Initialized in One Thread” like this:
set MKL_NUM_THREADS = 1 for Intel MKL to work in single-threaded mode (required);
set OMP_NUM_THREADS = 4 for the user program to work in multi-threaded mode.
The configuration parameter DFTI_NUMBER_OF_USER_THREADS must have the default value of 1.
C code for the example is as follows:
#include "mkl_dfti.h"
#include <omp.h>

// Number of elements in the outermost dimension of an array
#define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

int main ()
{
    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 x[4][50][100];
    int nth = ARRAY_LEN(x);
    MKL_LONG len[2] = {ARRAY_LEN(x[0]), ARRAY_LEN(x[0][0])};
    DFTI_DESCRIPTOR_HANDLE FFT[ARRAY_LEN(x)];
    int th;

    // one thread creates and commits a separate descriptor for each data set
    for (th = 0; th < nth; th++)
        DftiCreateDescriptor (&FFT[th], DFTI_SINGLE, DFTI_COMPLEX, 2, len);
    for (th = 0; th < nth; th++)
        DftiCommitDescriptor (FFT[th]);

    // assume x is initialized and do 2D FFTs
    #pragma omp parallel for shared(FFT, x)
    for (th = 0; th < nth; th++)
        DftiComputeForward (FFT[th], x[th]);

    for (th = 0; th < nth; th++)
        DftiFreeDescriptor (&FFT[th]);
    return 0;
}
Fortran code for the example is as follows:
program fft2d_array_descr_main
  use mkl_dfti

  integer nth, len(2)
  ! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)

  type thread_data
    type(dfti_descriptor), pointer :: FFT
  end type thread_data
  type(thread_data) :: workload(nth)

  integer th, status, myStatus

  do th = 1, nth
    status = DftiCreateDescriptor (workload(th)%FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
    status = DftiCommitDescriptor (workload(th)%FFT)
  end do

  ! assume x is initialized and do 2D FFTs
!$OMP PARALLEL DO SHARED(len, x, workload) PRIVATE(myStatus)
  do th = 1, nth
    myStatus = DftiComputeForward (workload(th)%FFT, x(:, th))
  end do
!$OMP END PARALLEL DO

  do th = 1, nth
    status = DftiFreeDescriptor (workload(th)%FFT)
  end do
end
The following Example “Using Parallel Mode with a Common Descriptor” illustrates a parallel user program in which several threads share a common descriptor (see case "d" in “Number of user threads”).
In this case, the number of threads, like any other configuration parameter, must not be changed after the DftiCommitDescriptor() function has initialized the FFT.
C code for the example is as follows:
#include "mkl_dfti.h"
#include <omp.h>

// Number of elements in the outermost dimension of an array
#define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

int main ()
{
    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 x[4][50][100];
    int nth = ARRAY_LEN(x);
    MKL_LONG len[2] = {ARRAY_LEN(x[0]), ARRAY_LEN(x[0][0])};
    DFTI_DESCRIPTOR_HANDLE FFT;
    int th;

    DftiCreateDescriptor (&FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len);
    // DftiSetValue is variadic, so pass the integer value as MKL_LONG explicitly
    DftiSetValue (FFT, DFTI_NUMBER_OF_USER_THREADS, (MKL_LONG)nth);
    DftiCommitDescriptor (FFT);

    // assume x is initialized and do 2D FFTs
    #pragma omp parallel for shared(FFT, x)
    for (th = 0; th < nth; th++)
        DftiComputeForward (FFT, x[th]);

    DftiFreeDescriptor (&FFT);
    return 0;
}
Fortran code for the example is as follows:
program fft2d_shared_descr_main
  use mkl_dfti

  integer nth, len(2)
  ! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)

  type(dfti_descriptor), pointer :: FFT
  integer th, status, myStatus

  status = DftiCreateDescriptor (FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
  status = DftiSetValue (FFT, DFTI_NUMBER_OF_USER_THREADS, nth)
  status = DftiCommitDescriptor (FFT)

  ! assume x is initialized and do 2D FFTs
!$OMP PARALLEL DO SHARED(len, x, FFT) PRIVATE(myStatus)
  do th = 1, nth
    myStatus = DftiComputeForward (FFT, x(:, th))
  end do
!$OMP END PARALLEL DO

  status = DftiFreeDescriptor (FFT)
end