omatadd_batch¶

Computes a group of out-of-place scaled matrix additions using general matrices.

Description¶

The omatadd_batch routines perform a series of out-of-place scaled matrix additions. They are similar to the omatadd routines, but the omatadd_batch routines perform matrix operations with a group of matrices.

The matrices are always in a strided format for this API. The operation is defined as:

for i = 0 … batch_size – 1
    A is a matrix at offset i * stridea in a
    B is a matrix at offset i * strideb in b
    C is a matrix at offset i * stridec in c
    C = alpha * op(A) + beta * op(B)
end for

where:

op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
alpha and beta are scalars
A, B and C are matrices

The API is available with USM pointers or buffer arguments for the input and output arrays.

The input buffers or arrays a and b contain all the input matrices, and the single output buffer or array c contains all the output matrices. The locations of the individual matrices within the buffer or array are given by stride lengths, while the number of matrices is given by the batch_size parameter.

API¶

Syntax¶

USM arrays:

event omatadd_batch(queue &queue,
    transpose transa,
    transpose transb,
    std::int64_t m,
    std::int64_t n,
    T alpha,
    const T *a,
    std::int64_t lda,
    std::int64_t stride_a,
    T beta,
    T *b,
    std::int64_t ldb,
    std::int64_t stride_b,
    T *c,
    std::int64_t ldc,
    std::int64_t stride_c,
    std::int64_t batch_size,
    const std::vector<event> &dependencies = {});

Buffer arrays:

void omatadd_batch(queue &queue, transpose transa,
                   transpose transb,
                   std::int64_t m, std::int64_t n,
                   T alpha, cl::sycl::buffer<T, 1> &a,
                   std::int64_t lda, std::int64_t stride_a,
                   T beta, cl::sycl::buffer<T, 1> &b,
                   std::int64_t ldb, std::int64_t stride_b,
                   cl::sycl::buffer<T, 1> &c, std::int64_t ldc,
                   std::int64_t stride_c,
                   std::int64_t batch_size);

omatadd_batch supports the following precisions and devices:

T	Devices Supported
`float`	Host, CPU, and GPU
`double`	Host, CPU, and GPU
`std::complex<float>`	Host, CPU, and GPU
`std::complex<double>`	Host, CPU, and GPU

Input Parameters¶

transa: Specifies op(A), the transposition operation applied to the matrices A.
transb: Specifies op(B), the transposition operation applied to the matrices B.
m: Number of rows for the result matrix C. Must be at least zero.
n: Number of columns for the result matrix C. Must be at least zero.
alpha: Scaling factor for the matrices A.
a: Buffer or array holding the input matrices A. Must have size at least stride_a*batch_size.
lda: Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least m if A is not transposed or n if A is transposed. If matrices are stored using row major layout, lda must be at least n if A is not transposed or at least m if A is transposed. Must be positive.
stride_a: Stride between the different A matrices. If matrices are stored using column major layout, stride_a must be at least lda*n if A is not transposed or at least lda*m if A is transposed. If matrices are stored using row major layout, stride_a must be at least lda*m if B is not transposed or at least lda*n if A is transposed.
beta: Scaling factor for the matrices B.
b: Buffer or array holding the input matrices B. Must have size at least stride_b*batch_size.
ldb: Leading dimension of the B matrices. If matrices are stored using column major layout, ldb must be at least m if B is not transposed or n if B is transposed. If matrices are stored using row major layout, ldb must be at least n if B is not transposed or at least m if B is transposed. Must be positive.
stride_b: Stride between the different B matrices. If matrices are stored using column major layout, stride_b must be at least ldb*n if B is not transposed or at least ldb*m if B is transposed. If matrices are stored using row major layout, stride_b must be at least ldb*m if B is not transposed or at least ldb*n if B is transposed.
ldc: Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.
stride_c: Stride between the different C matrices. If matrices are stored using column major layout, stride_c must be at least ldc*n. If matrices are stored using row major layout, stride_c must be at least ldc*m.
batch_size: Specifies the number of input and output matrices to add.
dependencies: List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters¶

c: Output buffer or array, overwritten by batch_size matrix addition operations of the form alpha*op(A) + beta*op(B). Must have size at least stride_c*batch_size.

imatcopy_batch Sparse BLAS Routines

oneAPI Math Kernel Library (oneMKL) documentation