gemm_*_pack

Pack the matrix into the buffer allocated previously.

Syntax

call gemm_s8u8s32_pack (identifier, trans, m, n, k, src, ld, dest)

call gemm_s16s16s32_pack (identifier, trans, m, n, k, src, ld, dest)

Include Files

mkl.fi

Description

The gemm_*_pack routine is one of a set of related routines that enable the use of an internal packed storage format. Call gemm_*_pack after allocating a buffer whose size is returned by gemm_*_pack_get_size. The routine packs the matrix selected by identifier into that buffer.

The gemm_*_pack routine performs this operation:

dest := op(src)

as part of the computation

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset for integer types,

C := alpha*op(A)*op(B) + beta*C for the bfloat16 type,

where:

op(X) is one of the operations op(X) = X or op(X) = X^T,
alpha and beta are scalars,
src is a matrix,
A, A_offset, B, B_offset, C, and C_offset are matrices,
op(src) is an m-by-k matrix if identifier = 'A' or 'a',
op(src) is a k-by-n matrix if identifier = 'B' or 'b',
dest is the buffer previously allocated to store the matrix packed into an internal format,
A_offset is an m-by-k matrix,
B_offset is a k-by-n matrix,
C_offset is an m-by-n matrix.
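
To make the integer formula above concrete, the following is a minimal reference sketch of the unpacked computation for the op(X) = X case. It is not the library's implementation: the module and subroutine names (gemm_offset_reference, gemm_offset_ref) are hypothetical, alpha and beta are taken as default INTEGER scalars for simplicity, and all matrices use default INTEGER elements, so the narrow element types, any rounding or saturation behavior, and the packed format itself are not modeled.

```fortran
module gemm_offset_reference
  implicit none
contains
  ! Reference (unpacked) evaluation of
  !   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
  ! for the op(X) = X case only.
  ! Assumption: everything is default INTEGER; this illustrates the
  ! formula, not the types or numerics used by the library routines.
  subroutine gemm_offset_ref(m, n, k, alpha, a, a_offset, b, b_offset, &
                             beta, c, c_offset)
    integer, intent(in)    :: m, n, k, alpha, beta
    integer, intent(in)    :: a(m, k), a_offset(m, k)
    integer, intent(in)    :: b(k, n), b_offset(k, n)
    integer, intent(in)    :: c_offset(m, n)
    integer, intent(inout) :: c(m, n)

    ! Offsets are added element-wise before the matrix product.
    c = alpha * matmul(a + a_offset, b + b_offset) + beta * c + c_offset
  end subroutine gemm_offset_ref
end module gemm_offset_reference
```

A caller would `use gemm_offset_reference` and invoke `call gemm_offset_ref(m, n, k, alpha, a, a_offset, b, b_offset, beta, c, c_offset)` with conforming arrays. Such a routine can serve as a slow cross-check on small inputs; the packing routines only change how op(src) is stored, not the mathematical result.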

Note

For best performance, use the same number of threads for packing and for computing.

If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.

Input Parameters

identifier

CHARACTER*1.

Specifies which matrix is to be packed:

If identifier = 'A' or 'a', the A matrix is packed.

If identifier = 'B' or 'b', the B matrix is packed.

trans

CHARACTER*1.

Specifies the form of op(src) used in the packing:

If trans = 'N' or 'n', op(src) = src.

If trans = 'T' or 't', op(src) = src^T.

m

INTEGER.

Specifies the number of rows of matrix op(A) and of the matrix C. The value of m must be at least zero.

n

INTEGER.

Specifies the number of columns of matrix op(B) and the number of columns of matrix C. The value of n must be at least zero.

k

INTEGER.

Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). The value of k must be at least zero.

src

INTEGER*1 for gemm_s8u8s32_pack and INTEGER*2 for gemm_s16s16s32_pack.

	trans = 'N' or 'n'	trans = 'T' or 't'
identifier = 'A' or 'a'	Size `ld*k`. Before entry, the leading m-by-k part of the array src must contain the matrix `A`.	Size `ld*m`. Before entry, the leading k-by-m part of the array src must contain the matrix `A`.
identifier = 'B' or 'b'	Size `ld*n`. Before entry, the leading k-by-n part of the array src must contain the matrix `B`.	Size `ld*k`. Before entry, the leading n-by-k part of the array src must contain the matrix `B`.

ld

INTEGER. Specifies the leading dimension of src as declared in the calling (sub)program.

	trans = `'N'` or `'n'`	trans = `'T'` or `'t'`
identifier = `'A'` or `'a'`	ld must be at least `max(1, m)`.	ld must be at least `max(1, k)`.
identifier = `'B'` or `'b'`	ld must be at least `max(1, k)`.	ld must be at least `max(1, n)`.
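
The size and leading-dimension rules in the two tables above can be captured in a small helper, which may be handy for argument checking before a call. This is only a sketch: the module and function names (pack_src_layout, min_ld, src_size) are hypothetical and are not part of the library, and the helper does not validate identifier or trans (any identifier other than 'A' or 'a' is treated as B, and any trans other than 'N' or 'n' is treated as transposed).

```fortran
module pack_src_layout
  implicit none
contains
  ! Smallest leading dimension allowed for src, per the ld table.
  integer function min_ld(identifier, trans, m, n, k)
    character(len=1), intent(in) :: identifier, trans
    integer, intent(in) :: m, n, k
    logical :: is_a, is_n
    is_a = (identifier == 'A' .or. identifier == 'a')
    is_n = (trans == 'N' .or. trans == 'n')
    if (is_a) then
      min_ld = max(1, merge(m, k, is_n))   ! A: rows are m ('N') or k ('T')
    else
      min_ld = max(1, merge(k, n, is_n))   ! B: rows are k ('N') or n ('T')
    end if
  end function min_ld

  ! Number of elements src must hold: ld times the stored column count,
  ! per the src table.
  integer function src_size(identifier, trans, m, n, k, ld)
    character(len=1), intent(in) :: identifier, trans
    integer, intent(in) :: m, n, k, ld
    logical :: is_a, is_n
    is_a = (identifier == 'A' .or. identifier == 'a')
    is_n = (trans == 'N' .or. trans == 'n')
    if (is_a) then
      src_size = ld * merge(k, m, is_n)    ! A: columns are k ('N') or m ('T')
    else
      src_size = ld * merge(n, k, is_n)    ! B: columns are n ('N') or k ('T')
    end if
  end function src_size
end module pack_src_layout
```

For example, with m = 4, n = 5, k = 6, packing B without transposition requires ld of at least 6, and with ld = 6 the src array holds 6*5 = 30 elements.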

dest

INTEGER*1 for gemm_s8u8s32_pack or INTEGER*2 for gemm_s16s16s32_pack.

Buffer for the packed matrix.

Output Parameters

dest

INTEGER*1 for gemm_s8u8s32_pack or INTEGER*2 for gemm_s16s16s32_pack.

Overwritten by the matrix op(src) stored in a format internal to Intel® oneAPI Math Kernel Library.

Example

See the following examples in the Intel® oneAPI Math Kernel Library installation directory to understand the use of these routines:

gemm_s8u8s32_pack: examples\blas\source\gemm_s8u8s32_computex.f

gemm_s16s16s32_pack: examples\blas\source\gemm_s16s16s32_computex.f
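
Separately from the shipped examples, the call sequence described in this topic (query the size, allocate the buffer, pack) might be sketched as follows for the A matrix. This is an illustration under stated assumptions, not a verified program: the argument list of gemm_s8u8s32_pack_get_size (identifier, m, n, k) and its return value as a size in bytes are taken from that routine's own reference page rather than from this topic, mkl.fi is assumed to declare both routines, and the program name, matrix sizes, and fill value are arbitrary. It needs to be compiled and linked against the library to run.

```fortran
program pack_a_sketch
  implicit none
  include 'mkl.fi'   ! assumed to declare the gemm_s8u8s32_* routines

  integer, parameter :: m = 4, n = 3, k = 5
  integer*1 :: a(m, k)                    ! unpacked A, stored m-by-k
  integer*1, allocatable :: a_packed(:)   ! destination buffer
  integer :: nbytes

  a = 1   ! arbitrary fill value for the sketch

  ! Assumption: get_size takes (identifier, m, n, k) and returns bytes.
  nbytes = gemm_s8u8s32_pack_get_size('A', m, n, k)
  allocate(a_packed(nbytes))

  ! trans = 'N', so op(src) = src and ld must be at least max(1, m).
  call gemm_s8u8s32_pack('A', 'N', m, n, k, a, m, a_packed)

  ! a_packed now holds A in the internal format and would be passed to
  ! the matching compute routine (not shown here).
  deallocate(a_packed)
end program pack_a_sketch
```

As the Note in this topic states, the number of threads used for packing should match the number used for the later computation.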
