omatcopy_batch

Computes a group of out-of-place scaled matrix transpose or copy operations using general matrices.

Description

The omatcopy_batch routines perform a series of out-of-place scaled matrix copies or transpositions. They are similar to the omatcopy routines, but operate on a batch of matrices instead of a single matrix.

The operation for the strided API is defined as:

for i = 0 … batch_size – 1
    A and B are matrices at offset i * stride_a in a and i * stride_b in b
    B = alpha * op(A)
end for

The operation for the group API is defined as:

idx = 0
for i = 0 … group_count – 1
    trans, m, n, alpha, lda, ldb and group_size at position i in their respective arrays
    for j = 0 … group_size – 1
        A and B are matrices at position idx in their respective arrays
        B = alpha * op(A)
        idx := idx + 1
    end for
end for

where:

  • op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X'), where X' is the transpose of X and conjg(X') is its conjugate transpose

  • alpha is a scalar

  • A and B are matrices
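To make the strided semantics concrete, the following host-only C++ sketch evaluates the strided-API pseudocode above for column-major storage. It is not the oneMKL implementation: the function name omatcopy_batch_ref and the enum Op (standing in for transpose::nontrans, transpose::trans, and transpose::conjtrans) are hypothetical, and row-major layout is not handled.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for transpose::nontrans, transpose::trans, transpose::conjtrans.
enum class Op { NoTrans, Trans, ConjTrans };

// Conjugation is the identity for real types and std::conj for complex types.
inline float conj_if(float x) { return x; }
inline double conj_if(double x) { return x; }
template <typename R>
std::complex<R> conj_if(const std::complex<R>& x) { return std::conj(x); }

// Host-only reference for the strided API, column-major storage only:
// for i = 0 .. batch_size-1, B_i = alpha * op(A_i).
template <typename T>
void omatcopy_batch_ref(Op trans, std::int64_t m, std::int64_t n, T alpha,
                        const T* a, std::int64_t lda, std::int64_t stride_a,
                        T* b, std::int64_t ldb, std::int64_t stride_b,
                        std::int64_t batch_size) {
    assert(m >= 0 && n >= 0 && batch_size >= 0);
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const T* A = a + i * stride_a;  // matrix i starts at offset i * stride_a
        T* B = b + i * stride_b;        // matrix i starts at offset i * stride_b
        for (std::int64_t col = 0; col < n; ++col) {
            for (std::int64_t row = 0; row < m; ++row) {
                T v = A[row + col * lda];  // element A(row, col)
                if (trans == Op::ConjTrans) v = conj_if(v);
                if (trans == Op::NoTrans)
                    B[row + col * ldb] = alpha * v;  // B(row, col); B is m x n
                else
                    B[col + row * ldb] = alpha * v;  // B(col, row); B is n x m
            }
        }
    }
}
```

Indexing column-major storage as A[row + col * lda] is what lets lda, stride_a, ldb, and stride_b be chosen independently: any padding between columns or between consecutive matrices is simply skipped.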

The strided API is available with USM pointers or buffer arguments for the input and output arrays, while the group API is available only with USM pointers.

For the strided API, the single input buffer or array contains all the input matrices, and the single output buffer or array contains all the output matrices. The locations of the individual matrices within the buffer or array are given by stride lengths, while the number of matrices is given by the batch_size parameter.

For the group API, the matrices are given by arrays of pointers. A and B represent matrices stored at addresses pointed to by a_array and b_array respectively. The number of entries in a_array and b_array is total_batch_count, the sum of all group_size entries.
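The group-API pseudocode can be sketched the same way. This host-only C++ version, restricted to double precision and column-major storage, walks the groups in order and advances a single idx across a_array and b_array, so group i consumes the next group_size[i] pointers. The names omatcopy_batch_group_ref and Op are hypothetical, and because the data is real, conjugate transposition is treated like plain transposition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the transpose values in trans_array.
enum class Op { NoTrans, Trans, ConjTrans };

// Host-only reference for the group API: double precision, column-major storage.
// Conjugation is a no-op for real data, so ConjTrans behaves like Trans here.
void omatcopy_batch_group_ref(const Op* trans_array,
                              const std::int64_t* m_array,
                              const std::int64_t* n_array,
                              const double* alpha_array,
                              const double* const* a_array,
                              const std::int64_t* lda_array,
                              double* const* b_array,
                              const std::int64_t* ldb_array,
                              std::int64_t group_count,
                              const std::int64_t* group_size) {
    assert(group_count >= 0);
    std::int64_t idx = 0;  // flat position in a_array and b_array
    for (std::int64_t i = 0; i < group_count; ++i) {
        // Parameters shared by every matrix in group i.
        const std::int64_t m = m_array[i], n = n_array[i];
        const std::int64_t lda = lda_array[i], ldb = ldb_array[i];
        const double alpha = alpha_array[i];
        const bool transposed = trans_array[i] != Op::NoTrans;
        for (std::int64_t j = 0; j < group_size[i]; ++j, ++idx) {
            const double* A = a_array[idx];
            double* B = b_array[idx];
            for (std::int64_t col = 0; col < n; ++col)
                for (std::int64_t row = 0; row < m; ++row) {
                    const double v = alpha * A[row + col * lda];
                    if (transposed)
                        B[col + row * ldb] = v;  // B is n x m
                    else
                        B[row + col * ldb] = v;  // B is m x n
                }
        }
    }
}
```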

API

Syntax

Strided API

USM arrays:

event omatcopy_batch(queue &queue,
    transpose trans,
    std::int64_t m,
    std::int64_t n,
    T alpha,
    const T *a,
    std::int64_t lda,
    std::int64_t stride_a,
    T *b,
    std::int64_t ldb,
    std::int64_t stride_b,
    std::int64_t batch_size,
    const vector_class<event> &dependencies = {});

Buffer arrays:

void omatcopy_batch(queue &queue, transpose trans,
                    std::int64_t m, std::int64_t n,
                    T alpha, cl::sycl::buffer<T, 1> &a,
                    std::int64_t lda, std::int64_t stride_a,
                    cl::sycl::buffer<T, 1> &b, std::int64_t ldb,
                    std::int64_t stride_b, std::int64_t batch_size);

Group API

event omatcopy_batch(queue &queue, const transpose *trans_array,
                     const std::int64_t *m_array,
                     const std::int64_t *n_array,
                     const T *alpha_array, const T **a_array,
                     const std::int64_t *lda_array, T **b_array,
                     const std::int64_t *ldb_array,
                     std::int64_t group_count,
                     const std::int64_t *group_size,
                     const vector_class<event> &dependencies = {});

omatcopy_batch supports the following precisions and devices:

T                       Devices Supported
float                   Host, CPU, and GPU
double                  Host, CPU, and GPU
std::complex<float>     Host, CPU, and GPU
std::complex<double>    Host, CPU, and GPU

Input Parameters

Strided API

trans

Specifies op(A), the transposition operation applied to the matrices A.

m

Number of rows for each matrix A. Must be at least zero.

n

Number of columns for each matrix A. Must be at least zero.

alpha

Scaling factor for the matrix transposition or copy.

a

Buffer or array holding the input matrices A. Must have size at least stride_a*batch_size.

lda

Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.

stride_a

Stride between the different A matrices. If matrices are stored using column major layout, stride_a must be at least lda*n. If matrices are stored using row major layout, stride_a must be at least lda*m.
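The size rules for A can be collected into a small helper. The sketch below uses a hypothetical Layout enum and a hypothetical function min_a_dims to return the smallest lda and stride_a allowed by the text above, together with the resulting minimum size of a (stride_a*batch_size). It clamps lda to 1 so that it stays positive when m or n is zero; with a larger lda, stride_a must grow accordingly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical selector for column major vs. row major storage.
enum class Layout { ColMajor, RowMajor };

struct ADims {
    std::int64_t min_lda;       // smallest legal lda
    std::int64_t min_stride_a;  // smallest legal stride_a for that lda
    std::int64_t min_size;      // minimum length of a: stride_a * batch_size
};

ADims min_a_dims(Layout layout, std::int64_t m, std::int64_t n, std::int64_t batch_size) {
    assert(m >= 0 && n >= 0 && batch_size >= 0);
    const bool col = (layout == Layout::ColMajor);
    std::int64_t lda = col ? m : n;  // rows (column major) or columns (row major)
    if (lda < 1) lda = 1;            // lda must be positive even if m or n is 0
    const std::int64_t stride_a = lda * (col ? n : m);
    return {lda, stride_a, stride_a * batch_size};
}
```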

b

Buffer or array holding the output matrices B. Must have size at least stride_b*batch_size.

ldb

Leading dimension of the B matrices. If matrices are stored using column major layout, ldb must be at least m if op(A) = A, or at least n otherwise. If matrices are stored using row major layout, ldb must be at least n if op(A) = A, or at least m otherwise. Must be positive.

stride_b

Stride between the different B matrices. If matrices are stored using column major layout, stride_b must be at least ldb*n if op(A) = A, or at least ldb*m otherwise. If matrices are stored using row major layout, stride_b must be at least ldb*m if op(A) = A, or at least ldb*n otherwise.
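The B requirements depend on trans because B = alpha*op(A) is m x n when A is not transposed and n x m otherwise. A minimal sketch of that rule, with hypothetical Layout and Op enums and a hypothetical function min_b_dims:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enums mirroring the layout and trans choices in the text.
enum class Layout { ColMajor, RowMajor };
enum class Op { NoTrans, Trans, ConjTrans };

struct BDims {
    std::int64_t min_ldb;       // smallest legal ldb
    std::int64_t min_stride_b;  // smallest legal stride_b for that ldb
};

BDims min_b_dims(Layout layout, Op trans, std::int64_t m, std::int64_t n) {
    assert(m >= 0 && n >= 0);
    // B = alpha * op(A) is m x n without transposition and n x m otherwise.
    const bool t = (trans != Op::NoTrans);
    const std::int64_t rows = t ? n : m;
    const std::int64_t cols = t ? m : n;
    const bool col = (layout == Layout::ColMajor);
    std::int64_t ldb = col ? rows : cols;
    if (ldb < 1) ldb = 1;  // ldb must be positive
    return {ldb, ldb * (col ? cols : rows)};
}
```

Deriving rows and cols of B first keeps the four layout/transpose cases in the text down to one expression each.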

batch_size

Specifies the number of matrices to transpose or copy.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Group API

trans_array

Array of size group_count. Element i specifies op(A), the transposition operation applied to the A matrices in group i.

m_array

Array of size group_count. Element i is the number of rows of each A matrix in group i. Each must be at least zero.

n_array

Array of size group_count. Element i is the number of columns of each A matrix in group i. Each must be at least zero.

alpha_array

Array of size group_count. Element i is the scaling factor alpha applied to the matrices in group i.

a_array

Array of size total_batch_count of pointers to the input A matrices. If matrices are stored using column major layout, the array allocated for each A matrix in group i must be of size at least lda_array[i]*n_array[i]. If matrices are stored using row major layout, the array allocated for each A matrix in group i must be of size at least lda_array[i]*m_array[i].
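One way to satisfy these size requirements is to carve a single contiguous allocation into per-matrix pointers. The hypothetical helper carve_a_pointers below does this for the A matrices in column major layout, giving each matrix of group i exactly lda_array[i]*n_array[i] elements and emitting the pointers in group order, which is the order the group API reads a_array. It is a host-only sketch; for the group API the storage would have to be USM memory rather than a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, column-major layout, host memory only: carve one
// contiguous allocation into total_batch_count pointers for a_array, in group
// order, giving each matrix of group i lda_array[i] * n_array[i] elements.
std::vector<double*> carve_a_pointers(std::vector<double>& storage,
                                      const std::vector<std::int64_t>& n_array,
                                      const std::vector<std::int64_t>& lda_array,
                                      const std::vector<std::int64_t>& group_size) {
    assert(n_array.size() == group_size.size() && lda_array.size() == group_size.size());
    std::int64_t total_elems = 0;
    for (std::size_t i = 0; i < group_size.size(); ++i)
        total_elems += group_size[i] * lda_array[i] * n_array[i];
    // Allocate once, before taking any pointers, so none are invalidated later.
    storage.assign(static_cast<std::size_t>(total_elems), 0.0);

    std::vector<double*> ptrs;  // one entry per matrix
    double* next = storage.data();
    for (std::size_t i = 0; i < group_size.size(); ++i)
        for (std::int64_t j = 0; j < group_size[i]; ++j) {
            ptrs.push_back(next);
            next += lda_array[i] * n_array[i];
        }
    return ptrs;
}
```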

lda_array

Array of size group_count. Element i is the leading dimension of the A matrices in group i. If matrices are stored using column major layout, lda_array[i] must be at least m_array[i]. If matrices are stored using row major layout, lda_array[i] must be at least n_array[i]. Each must be positive.

b_array

Array of size total_batch_count of pointers to the output B matrices. If matrices are stored using column major layout, the array allocated for each B matrix in group i must be of size at least ldb_array[i]*n_array[i] if op(A) = A for that group, or at least ldb_array[i]*m_array[i] otherwise. If matrices are stored using row major layout, the array allocated for each B matrix in group i must be of size at least ldb_array[i]*m_array[i] if op(A) = A for that group, or at least ldb_array[i]*n_array[i] otherwise.

ldb_array

Array of size group_count. Element i is the leading dimension of the B matrices in group i. If matrices are stored using column major layout, ldb_array[i] must be at least m_array[i] if op(A) = A for group i, or at least n_array[i] otherwise. If matrices are stored using row major layout, ldb_array[i] must be at least n_array[i] if op(A) = A for group i, or at least m_array[i] otherwise. Each must be positive.

group_count

Number of groups. Must be at least 0.

group_size

Array of size group_count. The element group_size[i] is the number of matrices in group i. Each element in group_size must be at least 0.
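Because a_array and b_array are flat, the position of each group's first pointer is the running sum of the preceding group_size entries, and total_batch_count is the sum of all of them; an empty group (group_size[i] = 0) contributes no pointers. A small sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// total_batch_count: the number of entries required in a_array and b_array.
std::int64_t total_batch_count(const std::vector<std::int64_t>& group_size) {
    return std::accumulate(group_size.begin(), group_size.end(), std::int64_t{0});
}

// Hypothetical helper: index in a_array/b_array of the first matrix of group g,
// i.e. the sum of group_size[0] .. group_size[g-1].
std::int64_t first_index_of_group(const std::vector<std::int64_t>& group_size, std::size_t g) {
    assert(g <= group_size.size());
    return std::accumulate(group_size.begin(),
                           group_size.begin() + static_cast<std::ptrdiff_t>(g),
                           std::int64_t{0});
}
```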

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

Strided API

b

Output buffer or array, overwritten by batch_size matrix transpose or copy operations of the form alpha*op(A).

Group API

b_array

Output array of pointers to B matrices, overwritten by total_batch_count matrix transpose or copy operations of the form alpha*op(A).