Intel® oneAPI Math Kernel Library Developer Reference - C
Computes scalar-matrix-matrix products and adds the results to scalar matrix products for groups of general matrices.
void cblas_sgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const float* alpha_array, const float **a_array, const MKL_INT* lda_array, const float **b_array, const MKL_INT* ldb_array, const float* beta_array, float **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);
void cblas_dgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const double* alpha_array, const double **a_array, const MKL_INT* lda_array, const double **b_array, const MKL_INT* ldb_array, const double* beta_array, double **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);
void cblas_cgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);
void cblas_zgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);
The ?gemm_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to the ?gemm routine counterparts, but the ?gemm_batch routines perform matrix-matrix operations with groups of matrices, processing a number of groups at once. The groups contain matrices with the same parameters.
The operation is defined as
idx = 0
for i = 0..group_count - 1
    alpha and beta in alpha_array[i] and beta_array[i]
    for j = 0..group_size[i] - 1
        A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
        C := alpha*op(A)*op(B) + beta*C,
        idx = idx + 1
    end for
end for
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalar elements of alpha_array and beta_array,
A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array, respectively. The number of entries in a_array, b_array, and c_array is total_batch_count = the sum of all of the group_size entries.
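The grouped loop above can be sketched as a naive reference routine. This is an illustration of the semantics only, not how oneMKL implements the operation: `ref_dgemm_batch`, `ref_op_elem`, `ref_layout`, and `ref_trans` are hypothetical names introduced here so that the sketch compiles without the oneMKL headers, plain `int` stands in for `MKL_INT`, and only the real double-precision case without conjugation is covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE. */
typedef enum { REF_ROW_MAJOR, REF_COL_MAJOR } ref_layout;
typedef enum { REF_NO_TRANS, REF_TRANS } ref_trans;

/* Element (r, c) of op(X), where X is stored with leading dimension ld. */
static double ref_op_elem(ref_layout layout, ref_trans trans,
                          const double *x, int ld, int r, int c)
{
    if (trans == REF_TRANS) { int t = r; r = c; c = t; }  /* op(X) = X^T */
    return layout == REF_COL_MAJOR ? x[r + (size_t)c * ld]
                                   : x[(size_t)r * ld + c];
}

/* Naive reference: one parameter set per group, one (A, B, C) pointer
   triple per matrix, with idx running over all matrices of all groups. */
void ref_dgemm_batch(ref_layout layout,
                     const ref_trans *transa_array, const ref_trans *transb_array,
                     const int *m_array, const int *n_array, const int *k_array,
                     const double *alpha_array,
                     const double **a_array, const int *lda_array,
                     const double **b_array, const int *ldb_array,
                     const double *beta_array,
                     double **c_array, const int *ldc_array,
                     int group_count, const int *group_size)
{
    assert(group_count >= 0);
    int idx = 0;
    for (int i = 0; i < group_count; ++i) {
        const int m = m_array[i], n = n_array[i], k = k_array[i];
        const double alpha = alpha_array[i], beta = beta_array[i];
        for (int j = 0; j < group_size[i]; ++j, ++idx) {
            const double *a = a_array[idx];
            const double *b = b_array[idx];
            double *c = c_array[idx];
            for (int r = 0; r < m; ++r) {
                for (int col = 0; col < n; ++col) {
                    double sum = 0.0;
                    for (int p = 0; p < k; ++p)
                        sum += ref_op_elem(layout, transa_array[i], a, lda_array[i], r, p)
                             * ref_op_elem(layout, transb_array[i], b, ldb_array[i], p, col);
                    double *dst = layout == REF_COL_MAJOR
                                ? &c[r + (size_t)col * ldc_array[i]]
                                : &c[(size_t)r * ldc_array[i] + col];
                    /* When beta is zero, C is not read on input. */
                    *dst = (beta == 0.0) ? alpha * sum : alpha * sum + beta * *dst;
                }
            }
        }
    }
}
```

Every per-group array is indexed by the group number i, while the pointer arrays are indexed by the running matrix counter idx, which is why the pointer arrays hold total_batch_count entries.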
See also ?gemm for a detailed description of multiplication for general matrices, and ?gemm3m_batch, a BLAS-like extension routine, for similar matrix-matrix operations.
Error checking is not performed for oneMKL Windows* single dynamic libraries for the ?gemm_batch routines.
Layout specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
Array of size group_count. For the group i, transa_i = transa_array[i] specifies the form of op(A) used in the matrix multiplication:
if transa_i = CblasNoTrans, then op(A) = A;
if transa_i = CblasTrans, then op(A) = A^T;
if transa_i = CblasConjTrans, then op(A) = A^H.
Array of size group_count. For the group i, transb_i = transb_array[i] specifies the form of op(B) used in the matrix multiplication:
if transb_i = CblasNoTrans, then op(B) = B;
if transb_i = CblasTrans, then op(B) = B^T;
if transb_i = CblasConjTrans, then op(B) = B^H.
Array of size group_count. For the group i, m_i = m_array[i] specifies the number of rows of the matrix op(A) and of the matrix C.
The value of each element of m_array must be at least zero.
Array of size group_count. For the group i, n_i = n_array[i] specifies the number of columns of the matrix op(B) and the number of columns of the matrix C.
The value of each element of n_array must be at least zero.
Array of size group_count. For the group i, k_i = k_array[i] specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).
The value of each element of k_array must be at least zero.
Array of size group_count. For the group i, alpha_array[i] specifies the scalar alpha_i.
Array, size total_batch_count, of pointers to arrays used to store A matrices.
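As an illustration of how such a pointer array might be populated, the sketch below assumes the matrices of one group are packed back to back in a single contiguous buffer with a fixed stride. That packing is an assumption made for the example; the routine itself only requires one valid pointer per matrix. `fill_matrix_pointers` is a hypothetical helper, not part of oneMKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: point ptrs[0..count-1] at consecutive matrices
   packed back to back in base, each occupying stride elements. */
void fill_matrix_pointers(const double **ptrs, const double *base,
                          size_t stride, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; ++i)
        ptrs[i] = base + (size_t)i * stride;
}
```

For several groups with different matrix sizes, a caller could invoke the helper once per group, advancing `ptrs` by that group's size each time so that the entries stay in group order.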
Array of size group_count. For the group i, lda_i = lda_array[i] specifies the leading dimension of the array storing matrix A as declared in the calling (sub)program.
| | transa_i = CblasNoTrans | transa_i = CblasTrans or transa_i = CblasConjTrans |
|---|---|---|
| Layout = CblasColMajor | lda_i must be at least max(1, m_i). | lda_i must be at least max(1, k_i). |
| Layout = CblasRowMajor | lda_i must be at least max(1, k_i). | lda_i must be at least max(1, m_i). |
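The four cases in the table reduce to one rule: the leading dimension must cover the number of rows of the stored matrix in column-major layout, or the number of columns in row-major layout. A minimal sketch of that rule follows; `min_leading_dim`, `ld_layout`, and `ld_trans` are hypothetical names used only for this example, with local enums standing in for the CBLAS ones.

```c
#include <assert.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE. */
typedef enum { LD_ROW_MAJOR, LD_COL_MAJOR } ld_layout;
typedef enum { LD_NO_TRANS, LD_TRANS, LD_CONJ_TRANS } ld_trans;

/* Smallest valid leading dimension for a matrix X such that op(X) is
   rows-by-cols. For A, pass rows = m and cols = k. */
int min_leading_dim(ld_layout layout, ld_trans trans, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    /* Dimensions of X as stored, before op() is applied. */
    int stored_rows = (trans == LD_NO_TRANS) ? rows : cols;
    int stored_cols = (trans == LD_NO_TRANS) ? cols : rows;
    int extent = (layout == LD_COL_MAJOR) ? stored_rows : stored_cols;
    return extent > 1 ? extent : 1;
}
```

Because the helper is written in terms of the shape of op(X), the same function can be used for B by passing rows = k and cols = n.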
Array, size total_batch_count, of pointers to arrays used to store B matrices.
Array of size group_count. For the group i, ldb_i = ldb_array[i] specifies the leading dimension of the array storing matrix B as declared in the calling (sub)program.
| | transb_i = CblasNoTrans | transb_i = CblasTrans or transb_i = CblasConjTrans |
|---|---|---|
| Layout = CblasColMajor | ldb_i must be at least max(1, k_i). | ldb_i must be at least max(1, n_i). |
| Layout = CblasRowMajor | ldb_i must be at least max(1, n_i). | ldb_i must be at least max(1, k_i). |
Array of size group_count. For the group i, beta_array[i] specifies the scalar beta_i.
When beta_i is equal to zero, the C matrices in group i need not be set on input.
Array, size total_batch_count, of pointers to arrays used to store C matrices.
Array of size group_count. For the group i, ldc_i = ldc_array[i] specifies the leading dimension of all arrays storing matrix C in group i as declared in the calling (sub)program.
When Layout = CblasColMajor, ldc_i must be at least max(1, m_i).
When Layout = CblasRowMajor, ldc_i must be at least max(1, n_i).
group_count specifies the number of groups. Must be at least 0.
Array of size group_count. The element group_size[i] specifies the number of matrices in group i. Each element in group_size must be at least 0.
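Since the pointer arrays are sized by the sum of the group_size entries, it can help to compute that sum, and the index of the first matrix of each group, before allocating them. The following is a small sketch under that assumption; `total_batch_count` here is a hypothetical helper function named after the quantity used in this reference, and plain `int` stands in for `MKL_INT`.

```c
#include <assert.h>

/* Hypothetical helper: returns the total number of matrices across all
   groups. If group_start is non-null, group_start[i] receives the index
   of the first matrix of group i in a_array, b_array, and c_array. */
int total_batch_count(int group_count, const int *group_size, int *group_start)
{
    assert(group_count >= 0);
    int total = 0;
    for (int i = 0; i < group_count; ++i) {
        assert(group_size[i] >= 0);
        if (group_start)
            group_start[i] = total;
        total += group_size[i];
    }
    return total;
}
```

An empty group contributes no entries, so its start index coincides with that of the next group.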
On output, the buffers pointed to by c_array are overwritten by the results of the total_batch_count matrix multiply operations of the form alpha*op(A)*op(B) + beta*C.