omatadd_batch

Computes a group of out-of-place scaled matrix additions using general matrices.

Description

The omatadd_batch routines perform a series of out-of-place scaled matrix additions. They are similar to the omatadd routines, but the omatadd_batch routines perform matrix operations with a group of matrices.

The matrices are always in a strided format for this API. The operation is defined as:

for i = 0 … batch_size – 1
    A is a matrix at offset i * stridea in a
    B is a matrix at offset i * strideb in b
    C is a matrix at offset i * stridec in c
    C = alpha * op(A) + beta * op(B)
end for

where:

  • op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')

  • alpha and beta are scalars

  • A, B and C are matrices

The API is available with USM pointers or buffer arguments for the input and output arrays.

The input buffers or arrays a and b contain all the input matrices, and the single output buffer or array c contains all the output matrices. The locations of the individual matrices within the buffer or array are given by stride lengths, while the number of matrices is given by the batch_size parameter.

API

Syntax

USM arrays:

event omatadd_batch(queue &queue,
    transpose transa,
    transpose transb,
    std::int64_t m,
    std::int64_t n,
    T alpha,
    const T *a,
    std::int64_t lda,
    std::int64_t stride_a,
    T beta,
    T *b,
    std::int64_t ldb,
    std::int64_t stride_b,
    T *c,
    std::int64_t ldc,
    std::int64_t stride_c,
    std::int64_t batch_size,
    const std::vector<event> &dependencies = {});

Buffer arrays:

void omatadd_batch(queue &queue, transpose transa,
                   transpose transb,
                   std::int64_t m, std::int64_t n,
                   T alpha, cl::sycl::buffer<T, 1> &a,
                   std::int64_t lda, std::int64_t stride_a,
                   T beta, cl::sycl::buffer<T, 1> &b,
                   std::int64_t ldb, std::int64_t stride_b,
                   cl::sycl::buffer<T, 1> &c, std::int64_t ldc,
                   std::int64_t stride_c,
                   std::int64_t batch_size);

omatadd_batch supports the following precisions and devices:

T

Devices Supported

float

Host, CPU, and GPU

double

Host, CPU, and GPU

std::complex<float>

Host, CPU, and GPU

std::complex<double>

Host, CPU, and GPU

Input Parameters

transa

Specifies op(A), the transposition operation applied to the matrices A.

transb

Specifies op(B), the transposition operation applied to the matrices B.

m

Number of rows for the result matrix C. Must be at least zero.

n

Number of columns for the result matrix C. Must be at least zero.

alpha

Scaling factor for the matrices A.

a

Buffer or array holding the input matrices A. Must have size at least stride_a*batch_size.

lda

Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least m if A is not transposed or n if A is transposed. If matrices are stored using row major layout, lda must be at least n if A is not transposed or at least m if A is transposed. Must be positive.

stride_a

Stride between the different A matrices. If matrices are stored using column major layout, stride_a must be at least lda*n if A is not transposed or at least lda*m if A is transposed. If matrices are stored using row major layout, stride_a must be at least lda*m if B is not transposed or at least lda*n if A is transposed.

beta

Scaling factor for the matrices B.

b

Buffer or array holding the input matrices B. Must have size at least stride_b*batch_size.

ldb

Leading dimension of the B matrices. If matrices are stored using column major layout, ldb must be at least m if B is not transposed or n if B is transposed. If matrices are stored using row major layout, ldb must be at least n if B is not transposed or at least m if B is transposed. Must be positive.

stride_b

Stride between the different B matrices. If matrices are stored using column major layout, stride_b must be at least ldb*n if B is not transposed or at least ldb*m if B is transposed. If matrices are stored using row major layout, stride_b must be at least ldb*m if B is not transposed or at least ldb*n if B is transposed.

ldc

Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.

stride_c

Stride between the different C matrices. If matrices are stored using column major layout, stride_c must be at least ldc*n. If matrices are stored using row major layout, stride_c must be at least ldc*m.

batch_size

Specifies the number of input and output matrices to add.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

c

Output buffer or array, overwritten by batch_size matrix addition operations of the form alpha*op(A) + beta*op(B). Must have size at least stride_c*batch_size.