dgmm_batch

Computes a group of diagonal matrix-matrix product (dgmm) operations.

Description

The dgmm_batch routines perform multiple diagonal matrix-matrix product (dgmm) operations in a single call. The diagonal matrices are stored as dense vectors and the operations are performed with groups of matrices and vectors.

dgmm_batch supports the following precisions:

T is one of:

  • float

  • double

  • std::complex<float>

  • std::complex<double>

dgmm_batch (Buffer Version)

The buffer version of dgmm_batch supports only the strided API.

Strided API

Strided API operation is defined as:

for i = 0 … batch_size – 1
    A and C are matrices at offset i * stridea in a, i * stridec in c.
    X is a vector at offset i * stridex in x
    if (left_right == side::left)
        C = diag(X) * A
    else
        C = A * diag(X)
end for

where:

  • A is a matrix

  • X is a diagonal matrix stored as a vector

For strided API, all matrices A and C and vector X have the same parameters (size, increments) and are stored at a constant stride given by stridea, stridec and stridex from each other.

The a and x buffers contain all the input matrices. The total number of matrices in a and x is given by the batch_size parameter.
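The strided loop above can be sketched as a host-only reference routine. This is an illustrative sketch, not the library implementation: the names Side and dgmm_batch_strided_ref are hypothetical stand-ins, the layout is assumed to be column major, and incx is assumed to be positive.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for oneapi::mkl::side, used only in this sketch.
enum class Side { left, right };

// Host-only reference for the column-major strided operation.
// Assumes incx > 0 and that a, x and c are large enough for all batch entries.
template <typename T>
void dgmm_batch_strided_ref(Side left_right, std::int64_t m, std::int64_t n,
                            const std::vector<T>& a, std::int64_t lda, std::int64_t stridea,
                            const std::vector<T>& x, std::int64_t incx, std::int64_t stridex,
                            std::vector<T>& c, std::int64_t ldc, std::int64_t stridec,
                            std::int64_t batch_size) {
    assert(incx > 0 && lda >= m && ldc >= m);
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const T* A = a.data() + i * stridea;  // i-th input matrix
        const T* X = x.data() + i * stridex;  // i-th diagonal, stored as a vector
        T* C = c.data() + i * stridec;        // i-th output matrix
        for (std::int64_t col = 0; col < n; ++col) {
            for (std::int64_t row = 0; row < m; ++row) {
                // diag(X) * A scales each row by X[row];
                // A * diag(X) scales each column by X[col].
                const std::int64_t k = (left_right == Side::left) ? row : col;
                C[row + col * ldc] = X[k * incx] * A[row + col * lda];
            }
        }
    }
}
```

With side::left the diagonal has m entries, and with side::right it has n entries, which is why the required length of each X vector depends on left_right.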

Syntax

namespace oneapi::mkl::blas::column_major {
    void dgmm_batch(sycl::queue &queue,
                    oneapi::mkl::side left_right,
                    std::int64_t m,
                    std::int64_t n,
                    sycl::buffer<T,1> &a,
                    std::int64_t lda,
                    std::int64_t stridea,
                    sycl::buffer<T,1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    sycl::buffer<T,1> &c,
                    std::int64_t ldc,
                    std::int64_t stridec,
                    std::int64_t batch_size);
}
namespace oneapi::mkl::blas::row_major {
    void dgmm_batch(sycl::queue &queue,
                    oneapi::mkl::side left_right,
                    std::int64_t m,
                    std::int64_t n,
                    sycl::buffer<T,1> &a,
                    std::int64_t lda,
                    std::int64_t stridea,
                    sycl::buffer<T,1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    sycl::buffer<T,1> &c,
                    std::int64_t ldc,
                    std::int64_t stridec,
                    std::int64_t batch_size);
}

Input Parameters

queue

The queue where the routine should be executed.

left_right

Specifies the position of the diagonal matrix in the product. See Data Types for more details.

m

Number of rows of matrix A and matrix C. Must be at least zero.

n

Number of columns of matrix A and matrix C. Must be at least zero.

a

Buffer holding input matrices A. Size of the buffer must be at least lda * k + stridea * (batch_size - 1) where k is n if column major layout or m if row major layout is used.

lda

Leading dimension of matrices A. Must be at least m if column major layout or n if row major layout is used. Must be positive.

stridea

Stride between two consecutive A matrices. Must be at least zero. See Matrix Storage for more details.

x

Buffer holding input matrices X. Size of the buffer must be at least (1 + (len - 1)*abs(incx)) + stridex * (batch_size - 1) where len is n if the diagonal matrix is on the right of the product or m otherwise.

incx

Stride between two consecutive elements of the X vectors.

stridex

Stride between two consecutive X vectors. Must be at least zero. See Matrix Storage for more details.

c

Buffer holding input/output matrices C. Size of the buffer must be at least batch_size * stridec.

ldc

Leading dimension of matrices C. Must be at least m if column major layout or n if row major layout is used. Must be positive.

stridec

Stride between two consecutive C matrices. Must be at least ldc * n if column major layout or ldc * m if row major layout is used. See Matrix Storage for more details.

batch_size

Number of dgmm computations to perform. Must be at least zero.
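The minimum buffer sizes stated in the parameter descriptions above can be computed with a small helper. This is a sketch under stated assumptions: Layout, Side, StridedSizes and min_strided_sizes are hypothetical names introduced for illustration, and the batch is assumed to be non-empty (m, n and batch_size at least one).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper types for this sketch only.
enum class Layout { column_major, row_major };
enum class Side { left, right };

struct StridedSizes {
    std::int64_t a;  // minimum number of elements in a
    std::int64_t x;  // minimum number of elements in x
    std::int64_t c;  // minimum number of elements in c
};

// Evaluates the size formulas from the parameter descriptions.
inline StridedSizes min_strided_sizes(Layout layout, Side left_right,
                                      std::int64_t m, std::int64_t n,
                                      std::int64_t lda, std::int64_t stridea,
                                      std::int64_t incx, std::int64_t stridex,
                                      std::int64_t stridec, std::int64_t batch_size) {
    assert(m >= 1 && n >= 1 && batch_size >= 1);
    // k is n for column major layout and m for row major layout.
    const std::int64_t k = (layout == Layout::column_major) ? n : m;
    // len is n when the diagonal matrix is on the right, m otherwise.
    const std::int64_t len = (left_right == Side::right) ? n : m;
    StridedSizes s{};
    s.a = lda * k + stridea * (batch_size - 1);
    s.x = (1 + (len - 1) * std::abs(incx)) + stridex * (batch_size - 1);
    s.c = batch_size * stridec;
    return s;
}
```

For example, a column-major call with m = 3, n = 2, lda = 4, stridea = 10, incx = 2, stridex = 8, stridec = 8 and batch_size = 5 needs at least 48 elements in a, 37 in x and 40 in c.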

Output Parameters

c

Buffer holding output matrices C overwritten by batch_size dgmm operations.

dgmm_batch (USM Version)

The USM version of dgmm_batch supports the group API and the strided API.

Group API

Group API operation is defined as:

idx = 0
for i = 0 … group_count – 1
     for j = 0 … group_size[i] – 1
         A and C are matrices at a[idx] and c[idx]
         X is a vector at x[idx]
         if (left_right[i] == side::left)
             C = diag(X) * A
         else
             C = A * diag(X)
         idx = idx + 1
     end for
end for

where:

  • A is a matrix

  • X is a diagonal matrix stored as a vector
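The group loop above can be sketched as a host-only reference routine. This is an illustrative sketch rather than the library implementation: Side and dgmm_batch_group_ref are hypothetical names, the layout is assumed to be column major, and per-group parameters (left_right, m, n, lda, incx, ldc) are indexed by the group number i, as the parameter descriptions specify, while the pointer arrays are indexed by the running counter idx.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for oneapi::mkl::side, used only in this sketch.
enum class Side { left, right };

// Host-only reference for the column-major group operation.
// a, x and c each hold one pointer per matrix (total_batch_count entries).
template <typename T>
void dgmm_batch_group_ref(const Side* left_right, const std::int64_t* m, const std::int64_t* n,
                          const T* const* a, const std::int64_t* lda,
                          const T* const* x, const std::int64_t* incx,
                          T* const* c, const std::int64_t* ldc,
                          std::int64_t group_count, const std::int64_t* group_size) {
    std::int64_t idx = 0;  // running index over all matrices in all groups
    for (std::int64_t i = 0; i < group_count; ++i) {
        assert(group_size[i] >= 0 && incx[i] > 0);
        for (std::int64_t j = 0; j < group_size[i]; ++j, ++idx) {
            const T* A = a[idx];
            const T* X = x[idx];
            T* C = c[idx];
            for (std::int64_t col = 0; col < n[i]; ++col) {
                for (std::int64_t row = 0; row < m[i]; ++row) {
                    // Left: scale row by X[row]. Right: scale column by X[col].
                    const std::int64_t k = (left_right[i] == Side::left) ? row : col;
                    C[row + col * ldc[i]] = X[k * incx[i]] * A[row + col * lda[i]];
                }
            }
        }
    }
}
```

Every matrix in a group shares the same shape and side, so only the pointers differ between the operations of one group.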

For the group API, each group contains matrices and vectors with the same parameters (size, increment). The a and x arrays contain the pointers for all the input matrices. The total number of matrices in a and x is given by:

total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]
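As a small illustration of the sum above, the following sketch computes total_batch_count on the host. The function name and the use of std::vector are assumptions made for this example; the API itself takes a plain array of group_count integers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sums the group sizes to obtain the number of pointers expected in a, x and c.
inline std::int64_t total_batch_count(const std::vector<std::int64_t>& group_size) {
    std::int64_t total = 0;
    for (std::int64_t size : group_size) {
        assert(size >= 0);  // all entries must be at least zero
        total += size;
    }
    return total;
}
```

For groups of sizes 1, 2 and 3, the pointer arrays a, x and c must each hold 6 entries.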

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event dgmm_batch(sycl::queue &queue,
                           oneapi::mkl::side *left_right,
                           std::int64_t *m,
                           std::int64_t *n,
                           const T **a,
                           std::int64_t *lda,
                           const T **x,
                           std::int64_t *incx,
                           T **c,
                           std::int64_t *ldc,
                           std::int64_t group_count,
                           std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
    sycl::event dgmm_batch(sycl::queue &queue,
                           oneapi::mkl::side *left_right,
                           std::int64_t *m,
                           std::int64_t *n,
                           const T **a,
                           std::int64_t *lda,
                           const T **x,
                           std::int64_t *incx,
                           T **c,
                           std::int64_t *ldc,
                           std::int64_t group_count,
                           std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {});
}

Input Parameters

queue

The queue where the routine should be executed.

left_right

Array of group_count parameters. left_right[i] specifies the position of the diagonal matrix in group i. See Data Types for more details.

m

Array of group_count integers. m[i] specifies the number of rows of A for every matrix in group i. All entries must be at least zero.

n

Array of group_count integers. n[i] specifies the number of columns of A for every matrix in group i. All entries must be at least zero.

a

Array of total_batch_count pointers to input matrices A. The array allocated for each matrix in group i must have size at least lda[i] * n[i] if column major layout is used, or at least lda[i] * m[i] if row major layout is used. See Matrix Storage for more details.

lda

Array of group_count integers. lda[i] specifies the leading dimension of A for every matrix in group i. All entries must be positive and at least m[i] if column major layout or at least n[i] if row major layout is used.

x

Array of total_batch_count pointers to input vectors X. The array allocated for each vector in group i must have size at least (1 + (len[i] - 1)*abs(incx[i])), where len[i] is n[i] if the diagonal matrix is on the right of the product or m[i] otherwise. See Matrix Storage for more details.

incx

Array of group_count integers. incx[i] specifies the stride of X for every vector in group i. All entries must be positive.

c

Array of total_batch_count pointers to input/output matrices C. The array allocated for each matrix in group i must have size at least ldc[i] * n[i] if column major layout is used, or at least ldc[i] * m[i] if row major layout is used. See Matrix Storage for more details.

ldc

Array of group_count integers. ldc[i] specifies the leading dimension of C for every matrix in group i. All entries must be positive and at least m[i] if column major layout or at least n[i] if row major layout is used.

group_count

Number of groups. Must be at least zero.

group_size

Array of group_count integers. group_size[i] specifies the number of diagonal matrix-matrix product operations in group i. All entries must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
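The per-group constraints listed above can be checked on the host before the call. The sketch below is a hypothetical helper (Layout and group_params_valid are not part of the API) that tests the constraints for a single group i, given that group's m, n, lda, incx, ldc and group_size values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout tag for this sketch only.
enum class Layout { column_major, row_major };

// Returns true if one group's parameters satisfy the documented constraints.
// Intended for debug checks, e.g. assert(group_params_valid(...)) before the call.
inline bool group_params_valid(Layout layout, std::int64_t m, std::int64_t n,
                               std::int64_t lda, std::int64_t incx,
                               std::int64_t ldc, std::int64_t group_size) {
    // Leading dimensions are bounded by m (column major) or n (row major).
    const std::int64_t min_ld = (layout == Layout::column_major) ? m : n;
    return m >= 0 && n >= 0 && group_size >= 0
        && incx > 0
        && lda > 0 && lda >= min_ld
        && ldc > 0 && ldc >= min_ld;
}
```

Calling this once per group, in a loop over group_count, covers every scalar constraint in the list above. It does not check that the pointers in a, x and c reference large enough allocations.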

Output Parameters

c

Array of pointers to output matrices C overwritten by total_batch_count dgmm operations.

Return Values

Output event to wait on to ensure computation is complete.

Strided API

Strided API operation is defined as:

for i = 0 … batch_size – 1
    A and C are matrices at offset i * stridea in a, i * stridec in c.
    X is a vector at offset i * stridex in x
    if (left_right == side::left)
        C = diag(X) * A
    else
        C = A * diag(X)
end for

where:

  • A is a matrix

  • X is a diagonal matrix stored as a vector

For strided API, all matrices A and C and vector X have the same parameters (size, increments) and are stored at a constant stride given by stridea, stridec and stridex from each other.

The a and x arrays contain all the input matrices. The total number of matrices in a and x is given by the batch_size parameter.
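To illustrate how the same strided loop reads with row major layout and raw pointers, here is a host-only reference sketch. It is not the library implementation: Side and dgmm_batch_strided_row_major_ref are hypothetical names, the pointers are ordinary host pointers rather than USM allocations, and incx is assumed to be positive.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for oneapi::mkl::side, used only in this sketch.
enum class Side { left, right };

// Host-only reference for the row-major strided operation on raw pointers.
// In row major layout, element (row, col) of A is stored at A[row * lda + col].
template <typename T>
void dgmm_batch_strided_row_major_ref(Side left_right, std::int64_t m, std::int64_t n,
                                      const T* a, std::int64_t lda, std::int64_t stridea,
                                      const T* x, std::int64_t incx, std::int64_t stridex,
                                      T* c, std::int64_t ldc, std::int64_t stridec,
                                      std::int64_t batch_size) {
    assert(incx > 0 && lda >= n && ldc >= n);
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const T* A = a + i * stridea;
        const T* X = x + i * stridex;
        T* C = c + i * stridec;
        for (std::int64_t row = 0; row < m; ++row) {
            for (std::int64_t col = 0; col < n; ++col) {
                // Left: scale row by X[row]. Right: scale column by X[col].
                const std::int64_t k = (left_right == Side::left) ? row : col;
                C[row * ldc + col] = X[k * incx] * A[row * lda + col];
            }
        }
    }
}
```

Only the element addressing changes between layouts: the leading dimension multiplies the row index here, which is why lda and ldc must be at least n for row major layout.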

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event dgmm_batch(sycl::queue &queue,
                           oneapi::mkl::side left_right,
                           std::int64_t m,
                           std::int64_t n,
                           const T *a,
                           std::int64_t lda,
                           std::int64_t stridea,
                           const T *x,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T *c,
                           std::int64_t ldc,
                           std::int64_t stridec,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
    sycl::event dgmm_batch(sycl::queue &queue,
                           oneapi::mkl::side left_right,
                           std::int64_t m,
                           std::int64_t n,
                           const T *a,
                           std::int64_t lda,
                           std::int64_t stridea,
                           const T *x,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T *c,
                           std::int64_t ldc,
                           std::int64_t stridec,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {});
}

Input Parameters

queue

The queue where the routine should be executed.

left_right

Specifies the position of the diagonal matrix in the product. See Data Types for more details.

m

Number of rows of matrix A and matrix C. Must be at least zero.

n

Number of columns of matrix A and matrix C. Must be at least zero.

a

Pointer to input matrices A. Size of the array must be at least lda * k + stridea * (batch_size - 1) where k is n if column major layout or m if row major layout is used.

lda

Leading dimension of matrices A. Must be at least m if column major layout or n if row major layout is used. Must be positive.

stridea

Stride between two consecutive A matrices. Must be at least zero. See Matrix Storage for more details.

x

Pointer to input matrices X. Size of the array must be at least (1 + (len - 1)*abs(incx)) + stridex * (batch_size - 1) where len is n if the diagonal matrix is on the right of the product or m otherwise.

incx

Stride between two consecutive elements of the X vectors.

stridex

Stride between two consecutive X vectors. Must be at least zero. See Matrix Storage for more details.

c

Pointer to input/output matrices C. Size of the array must be at least batch_size * stridec.

ldc

Leading dimension of matrices C. Must be at least m if column major layout or n if row major layout is used. Must be positive.

stridec

Stride between two consecutive C matrices. Must be at least ldc * n if column major layout or ldc * m if row major layout is used. See Matrix Storage for more details.

batch_size

Number of dgmm computations to perform. Must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
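The scalar constraints listed above for the strided API can be gathered into a single host-side check. The helper below is a hypothetical sketch (Layout and strided_params_valid are not part of the API). It covers only the documented bounds on m, n, the leading dimensions, the strides and batch_size, not the sizes of the allocations behind a, x and c.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout tag for this sketch only.
enum class Layout { column_major, row_major };

// Returns true if the strided-API scalar parameters satisfy the documented constraints.
// Intended for debug checks, e.g. assert(strided_params_valid(...)) before the call.
inline bool strided_params_valid(Layout layout, std::int64_t m, std::int64_t n,
                                 std::int64_t lda, std::int64_t stridea,
                                 std::int64_t stridex,
                                 std::int64_t ldc, std::int64_t stridec,
                                 std::int64_t batch_size) {
    const bool col = (layout == Layout::column_major);
    const std::int64_t min_ld = col ? m : n;  // lower bound on lda and ldc
    const std::int64_t k = col ? n : m;       // stridec must be at least ldc * k
    return m >= 0 && n >= 0 && batch_size >= 0
        && lda > 0 && lda >= min_ld
        && ldc > 0 && ldc >= min_ld
        && stridea >= 0 && stridex >= 0
        && stridec >= ldc * k;
}
```

Note the asymmetry in the documented bounds: stridea and stridex only need to be non-negative, while stridec must be large enough that consecutive output matrices do not overlap.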

Output Parameters

c

Pointer to output matrices C overwritten by batch_size dgmm operations.

Return Values

Output event to wait on to ensure computation is complete.