Intel® oneAPI Math Kernel Library Developer Reference - C
Pack the matrix into the buffer allocated previously.
void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void *src, const MKL_INT ld, void *dest);
void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);
void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);
The cblas_gemm_*_pack routine is one of a set of related routines that enable the use of internal packed storage. Call cblas_gemm_*_pack after you allocate a buffer whose size is given by cblas_gemm_*_pack_get_size. The routine packs the identified matrix into that buffer.
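The cblas_gemm_*_pack routine performs this operation:
dest := op(src)
as part of the computation
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset for integer types,
C := alpha*op(A)*op(B) + beta*C for the bfloat16 type,
where:
op(src) is either src or src^T, as selected by trans,
src is the A or B matrix, as selected by identifier,
op(A) is an m-by-k matrix, op(B) is a k-by-n matrix, and C is an m-by-n matrix,
dest is the buffer allocated previously.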
You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
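To make the integer formula in the operation description concrete, the following is a naive reference loop, not the library's implementation. It assumes column-major storage, no transposition, scalar offsets, signed 8-bit A, unsigned 8-bit B, and truncation when the float result is converted to a 32-bit integer. The name demo_ref_gemm_s8u8s32 and its parameter list are hypothetical. How the library applies offsets and rounding is defined by the cblas_gemm_*_compute documentation, not by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive reference for
 *   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
 * Assumptions (not taken from the library): column-major storage, no
 * transposition, scalar offsets, and truncation toward zero when the
 * float result is converted to int32_t. */
void demo_ref_gemm_s8u8s32(size_t m, size_t n, size_t k, float alpha,
                           const int8_t *a, size_t lda, int8_t a_offset,
                           const uint8_t *b, size_t ldb, int8_t b_offset,
                           float beta, int32_t *c, size_t ldc,
                           int32_t c_offset)
{
    /* Leading dimensions must cover the rows of each column-major array. */
    assert(lda >= m && ldb >= k && ldc >= m);

    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p) {
                /* Offsets are added before the multiplication. */
                int32_t av = (int32_t)a[i + p * lda] + a_offset;
                int32_t bv = (int32_t)b[p + j * ldb] + b_offset;
                acc += av * bv;
            }
            float r = alpha * (float)acc + beta * (float)c[i + j * ldc];
            c[i + j * ldc] = (int32_t)r + c_offset;
        }
    }
}
```

Packing does not change this arithmetic. It only stores op(src) in an internal layout so that a later compute call can reuse it.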
Layout
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
identifier
Specifies which matrix is to be packed:
If identifier = CblasAMatrix, the A matrix is packed.
If identifier = CblasBMatrix, the B matrix is packed.
trans
Specifies the form of op(src) used in the packing:
If trans = CblasNoTrans, op(src) = src.
If trans = CblasTrans, op(src) = src^T.
m
Specifies the number of rows of matrix op(A) and of matrix C. The value of m must be at least zero.
n
Specifies the number of columns of matrix op(B) and of matrix C. The value of n must be at least zero.
k
Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). The value of k must be at least zero.
src
MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, and MKL_INT16* for cblas_gemm_s16s16s32_pack. The required array size and contents depend on Layout, identifier, and trans:

| | identifier = CblasAMatrix, trans = CblasNoTrans | identifier = CblasAMatrix, trans = CblasTrans | identifier = CblasBMatrix, trans = CblasNoTrans | identifier = CblasBMatrix, trans = CblasTrans |
|---|---|---|---|---|
| Layout = CblasColMajor | Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack, each element of src must be an 8-bit signed integer. | Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack, each element of src must be an 8-bit signed integer. | Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack, each element of src must be an 8-bit unsigned integer. | Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack, each element of src must be an 8-bit unsigned integer. |
| Layout = CblasRowMajor | Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack, each element of src must be an 8-bit unsigned integer. | Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack, each element of src must be an 8-bit unsigned integer. | Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack, each element of src must be an 8-bit signed integer. | Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack, each element of src must be an 8-bit signed integer. |
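The size rules for src above can be folded into one small helper. This is a minimal sketch: the enums and the function demo_src_elems are hypothetical stand-ins, defined locally so the snippet compiles without mkl.h, and long long stands in for MKL_INT. Real code would use CBLAS_LAYOUT, CBLAS_IDENTIFIER, and CBLAS_TRANSPOSE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enums (not the MKL definitions). */
typedef enum { DEMO_COL_MAJOR, DEMO_ROW_MAJOR } demo_layout;
typedef enum { DEMO_A_MATRIX, DEMO_B_MATRIX } demo_identifier;
typedef enum { DEMO_NO_TRANS, DEMO_TRANS } demo_trans;

/* Number of elements the src array must hold, per the size table. */
size_t demo_src_elems(demo_layout layout, demo_identifier id,
                      demo_trans trans, long long m, long long n,
                      long long k, long long ld)
{
    assert(m >= 0 && n >= 0 && k >= 0 && ld >= 1);

    /* Row-major storage and transposition each swap which dimension
     * multiplies ld; applying both cancels out. */
    int flipped = (layout == DEMO_ROW_MAJOR) != (trans == DEMO_TRANS);

    long long count;
    if (id == DEMO_A_MATRIX)
        count = flipped ? m : k;
    else
        count = flipped ? k : n;

    return (size_t)ld * (size_t)count;
}
```

For example, with m = 2, n = 3, k = 4, and ld = 5, a column-major, non-transposed A needs 5*4 = 20 elements, while the same A in row-major layout needs 5*2 = 10.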
ld
MKL_INT. Specifies the leading dimension of src as declared in the calling (sub)program.

| | identifier = CblasAMatrix, trans = CblasNoTrans | identifier = CblasAMatrix, trans = CblasTrans | identifier = CblasBMatrix, trans = CblasNoTrans | identifier = CblasBMatrix, trans = CblasTrans |
|---|---|---|---|---|
| Layout = CblasColMajor | ld must be at least max(1, m). | ld must be at least max(1, k). | ld must be at least max(1, k). | ld must be at least max(1, n). |
| Layout = CblasRowMajor | ld must be at least max(1, k). | ld must be at least max(1, m). | ld must be at least max(1, n). | ld must be at least max(1, k). |
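The lower bounds on ld above follow a regular pattern that a caller can check before packing. The sketch below encodes that table. The enums and demo_min_ld are hypothetical names defined locally so the snippet builds without mkl.h, and long long stands in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical stand-ins for the CBLAS enums (not the MKL definitions). */
typedef enum { DEMO_COL_MAJOR, DEMO_ROW_MAJOR } demo_layout;
typedef enum { DEMO_A_MATRIX, DEMO_B_MATRIX } demo_identifier;
typedef enum { DEMO_NO_TRANS, DEMO_TRANS } demo_trans;

/* Smallest leading dimension allowed for src, per the ld table. */
long long demo_min_ld(demo_layout layout, demo_identifier id,
                      demo_trans trans, long long m, long long n,
                      long long k)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    /* Row-major storage and transposition each swap the dimension that
     * bounds ld; applying both cancels out. */
    int flipped = (layout == DEMO_ROW_MAJOR) != (trans == DEMO_TRANS);

    long long dim;
    if (id == DEMO_A_MATRIX)
        dim = flipped ? k : m;
    else
        dim = flipped ? n : k;

    return dim > 1 ? dim : 1; /* max(1, dim) */
}
```

A caller might compare its ld against demo_min_ld(...) and reject the call early when ld is smaller, rather than relying on the library to report the error.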
dest
MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack, or MKL_INT16* for cblas_gemm_s16s16s32_pack. Buffer for the packed matrix. Overwritten by the matrix op(src) stored in a format internal to Intel® oneAPI Math Kernel Library.
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_pack: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_pack: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
cblas_gemm_bf16bf16f32_pack: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c
When using cblas_gemm_s8u8s32_pack with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
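The type swap described above can be written as a lookup that a caller consults before filling src. This is an illustrative sketch only: the enums and demo_s8u8s32_src_elem are hypothetical names defined locally, not part of the library.

```c
#include <assert.h>

/* Hypothetical stand-ins for the CBLAS enums (not the MKL definitions). */
typedef enum { DEMO_COL_MAJOR, DEMO_ROW_MAJOR } demo_layout;
typedef enum { DEMO_A_MATRIX, DEMO_B_MATRIX } demo_identifier;
typedef enum { DEMO_ELEM_S8, DEMO_ELEM_U8 } demo_elem;

/* Element type that src must hold for cblas_gemm_s8u8s32_pack. */
demo_elem demo_s8u8s32_src_elem(demo_layout layout, demo_identifier id)
{
    assert(layout == DEMO_COL_MAJOR || layout == DEMO_ROW_MAJOR);
    assert(id == DEMO_A_MATRIX || id == DEMO_B_MATRIX);

    /* Column-major: A is signed and B is unsigned.
     * Row-major: the two types are swapped. */
    int is_signed = (id == DEMO_A_MATRIX) != (layout == DEMO_ROW_MAJOR);
    return is_signed ? DEMO_ELEM_S8 : DEMO_ELEM_U8;
}
```

With row-major layout, demo_s8u8s32_src_elem returns DEMO_ELEM_U8 for the A matrix and DEMO_ELEM_S8 for the B matrix, matching the note.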