omatcopy_batch
Computes a group of out-of-place scaled matrix transpose or copy operations using general matrices.
Description
The omatcopy_batch routines perform a series of out-of-place scaled
matrix copies or transpositions. They are similar to the omatcopy
routines, but the omatcopy_batch routines perform matrix operations with
a group of matrices.
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
    A and B are matrices at offset i * stride_a in a and i * stride_b in b
    B = alpha * op(A)
end for
The operation for the group API is defined as:
idx = 0
for i = 0 … group_count – 1
    trans, m, n, alpha, lda, ldb and group_size at position i in their respective arrays
    for j = 0 … group_size – 1
        A and B are matrices at position idx in their respective arrays
        B = alpha * op(A)
        idx := idx + 1
    end for
end for
where:
- op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
- alpha is a scalar
- A and B are matrices
The strided API is available with USM pointers or buffer arguments for the input and output arrays, while the group API is available only with USM pointers.
For the strided API, the single input buffer or array contains all the input
matrices, and the single output buffer or array contains all the output
matrices. The locations of the individual matrices within the buffer or
array are given by stride lengths, while the number of matrices is given by
the batch_size parameter.
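The strided layout described above can be sketched as a plain host loop. This is only an illustration of the semantics, not the library's implementation: it assumes column major storage, and the `Op` enum, the `conj_if_complex` helper, and the function name `omatcopy_batch_strided_ref` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library's transpose enum.
enum class Op { nontrans, trans, conjtrans };

// Conjugation helper: identity for real types, std::conj for complex types.
template <typename T> T conj_if_complex(T x) { return x; }
template <typename T> std::complex<T> conj_if_complex(std::complex<T> x) { return std::conj(x); }

// Reference loop for the strided operation, assuming column major layout.
// Each A is m x n; each B is m x n for Op::nontrans and n x m otherwise.
template <typename T>
void omatcopy_batch_strided_ref(Op op, std::int64_t m, std::int64_t n, T alpha,
                                const T* a, std::int64_t lda, std::int64_t stride_a,
                                T* b, std::int64_t ldb, std::int64_t stride_b,
                                std::int64_t batch_size) {
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const T* A = a + i * stride_a;  // i-th input matrix
        T* B = b + i * stride_b;        // i-th output matrix
        for (std::int64_t col = 0; col < n; ++col) {
            for (std::int64_t row = 0; row < m; ++row) {
                const T x = A[row + col * lda];
                if (op == Op::nontrans) {
                    B[row + col * ldb] = alpha * x;
                } else if (op == Op::trans) {
                    B[col + row * ldb] = alpha * x;
                } else {
                    B[col + row * ldb] = alpha * conj_if_complex(x);
                }
            }
        }
    }
}
```

For two 2 x 3 matrices stored back to back with lda = 2 and stride_a = 6, a transposed copy writes two 3 x 2 matrices, so ldb = 3 and stride_b = 6 are the smallest valid choices in this layout.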
For the group API, the matrices are given by arrays of pointers. A and B
represent matrices stored at addresses pointed to by a_array and b_array
respectively. The number of entries in a_array and b_array is
total_batch_count, the sum of all of the group_size entries.
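The group indexing can be illustrated the same way. The sketch below walks the pointer arrays with a running idx, as in the group pseudocode. It assumes column major storage and, for brevity, handles only the plain copy and transpose cases (conjugate transpose is omitted); the `Op` enum and the function name `omatcopy_batch_group_ref` are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the transpose enum; conjugate transpose omitted.
enum class Op { nontrans, trans };

// Reference loop for the group operation, assuming column major layout.
// Per-group parameters are read at position i; matrices at running position idx.
template <typename T>
void omatcopy_batch_group_ref(const Op* trans_array,
                              const std::int64_t* m_array, const std::int64_t* n_array,
                              const T* alpha_array, const T* const* a_array,
                              const std::int64_t* lda_array, T* const* b_array,
                              const std::int64_t* ldb_array,
                              std::int64_t group_count, const std::int64_t* group_size) {
    std::int64_t idx = 0;
    for (std::int64_t i = 0; i < group_count; ++i) {
        const std::int64_t m = m_array[i], n = n_array[i];
        const std::int64_t lda = lda_array[i], ldb = ldb_array[i];
        const T alpha = alpha_array[i];
        for (std::int64_t j = 0; j < group_size[i]; ++j, ++idx) {
            const T* A = a_array[idx];
            T* B = b_array[idx];
            for (std::int64_t col = 0; col < n; ++col) {
                for (std::int64_t row = 0; row < m; ++row) {
                    const std::int64_t dst = (trans_array[i] == Op::nontrans)
                                                 ? row + col * ldb
                                                 : col + row * ldb;
                    B[dst] = alpha * A[row + col * lda];
                }
            }
        }
    }
}
```

With group_size = {1, 2}, a_array and b_array each hold three pointers: entry 0 belongs to group 0 and entries 1 and 2 belong to group 1.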
API
Syntax
Strided API
USM arrays:
event omatcopy_batch(queue &queue,
transpose trans,
std::int64_t m,
std::int64_t n,
T alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
T *b,
std::int64_t ldb,
std::int64_t stride_b,
std::int64_t batch_size,
const vector_class<event> &dependencies = {});
Buffer arrays:
void omatcopy_batch(queue &queue, transpose trans,
std::int64_t m, std::int64_t n,
T alpha, cl::sycl::buffer<T, 1> &a,
std::int64_t lda, std::int64_t stride_a,
cl::sycl::buffer<T, 1> &b, std::int64_t ldb,
std::int64_t stride_b, std::int64_t batch_size);
Group API
event omatcopy_batch(queue &queue, const transpose *trans_array,
const std::int64_t *m_array,
const std::int64_t *n_array,
const T *alpha_array, const T **a_array,
const std::int64_t *lda_array, T **b_array,
const std::int64_t *ldb_array,
std::int64_t group_count,
const std::int64_t *group_size,
const vector_class<event> &dependencies = {});
omatcopy_batch supports the following precisions and devices:
T | Devices Supported |
---|---|
float | Host, CPU, and GPU |
double | Host, CPU, and GPU |
std::complex<float> | Host, CPU, and GPU |
std::complex<double> | Host, CPU, and GPU |
Input Parameters
Strided API
- trans
Specifies op(A), the transposition operation applied to the matrices A.
- m
Number of rows for each matrix A. Must be at least zero.
- n
Number of columns for each matrix A. Must be at least zero.
- alpha
Scaling factor for the matrix transposition or copy.
- a
Buffer or array holding the input matrices A. Must have size at least stride_a*batch_size.
- lda
Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least m. If matrices are stored using row major layout, lda must be at least n. Must be positive.
- stride_a
Stride between the different A matrices. If matrices are stored using column major layout, stride_a must be at least lda*n. If matrices are stored using row major layout, stride_a must be at least lda*m.
- b
Buffer or array holding the output matrices B. Must have size at least stride_b*batch_size.
- ldb
Leading dimension of the B matrices. If matrices are stored using column major layout, ldb must be at least m if B is not transposed or at least n if B is transposed. If matrices are stored using row major layout, ldb must be at least n if B is not transposed or at least m if B is transposed. Must be positive.
- stride_b
Stride between the different B matrices. If matrices are stored using column major layout, stride_b must be at least ldb*n if B is not transposed or at least ldb*m if B is transposed. If matrices are stored using row major layout, stride_b must be at least ldb*m if B is not transposed or at least ldb*n if B is transposed.
- batch_size
Specifies the number of matrices to transpose or copy.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
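The leading-dimension and stride constraints listed above can be collected into a small helper. This is a sketch for checking arguments on the host before a call; the `Layout` enum, the `StridedMinimums` struct, and the function `strided_minimums` are hypothetical names, and the `transposed` flag is assumed to be true whenever op(A) is a transpose or conjugate transpose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical layout selector for this example.
enum class Layout { col_major, row_major };

// Smallest values that satisfy the documented constraints.
struct StridedMinimums {
    std::int64_t lda, stride_a, ldb, stride_b;
};

inline StridedMinimums strided_minimums(Layout layout, bool transposed,
                                        std::int64_t m, std::int64_t n) {
    const bool col = (layout == Layout::col_major);
    // B is m x n when not transposed and n x m when transposed.
    const std::int64_t b_rows = transposed ? n : m;
    const std::int64_t b_cols = transposed ? m : n;
    StridedMinimums r;
    // Leading dimensions must be positive even when m or n is zero.
    r.lda = std::max<std::int64_t>(1, col ? m : n);
    r.stride_a = r.lda * (col ? n : m);
    r.ldb = std::max<std::int64_t>(1, col ? b_rows : b_cols);
    r.stride_b = r.ldb * (col ? b_cols : b_rows);
    return r;
}
```

The strides returned here are computed from the minimum leading dimensions; if a larger lda or ldb is used, the corresponding stride bound grows accordingly. The a and b arrays then need at least stride_a*batch_size and stride_b*batch_size elements.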
Group API
- trans_array
Array of size group_count. Each element i in the array specifies op(A), the transposition operation applied to the matrices A.
- m_array
Array of size group_count of number of rows of A. Each must be at least zero.
- n_array
Array of size group_count of number of columns of A. Each must be at least zero.
- alpha_array
Array of size group_count containing scaling factors for the operation.
- a_array
Array of size total_batch_count of pointers to A matrices. If matrices are stored in column major layout, the array allocated for each A matrix of the group i must be of size at least lda_array[i]*n_array[i]. If matrices are stored in row major layout, the array allocated for each A matrix of the group i must be of size at least lda_array[i]*m_array[i].
- lda_array
Array of size group_count of leading dimension of the A matrices. If matrices are stored using column major layout, lda_array[i] must be at least m_array[i]. If matrices are stored using row major layout, lda_array[i] must be at least n_array[i]. Each must be positive.
- b_array
Array of size total_batch_count of pointers used to store B matrices. If matrices are stored using column major layout, the array allocated for each B matrix of the group i must be of size at least ldb_array[i]*n_array[i] if B is not transposed or ldb_array[i]*m_array[i] if B is transposed. If matrices are stored using row major layout, the array allocated for each B matrix of the group i must be of size at least ldb_array[i]*m_array[i] if B is not transposed or ldb_array[i]*n_array[i] if B is transposed.
- ldb_array
Array of size group_count of leading dimension of the B matrices. If matrices are stored using column major layout, ldb_array[i] must be at least m_array[i] if B is not transposed or at least n_array[i] if B is transposed. If matrices are stored using row major layout, ldb_array[i] must be at least n_array[i] if B is not transposed or at least m_array[i] if B is transposed. Each must be positive.
- group_count
Number of groups. Must be at least 0.
- group_size
Array of size group_count. The element group_size[i] is the number of matrices in the group i. Each element in group_size must be at least 0.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
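Because a_array and b_array are indexed by a running position across all groups, it can help to precompute where each group starts. The following is a minimal sketch of that bookkeeping; the function name `group_offsets` is hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns offsets such that group i occupies entries
// [offsets[i], offsets[i + 1]) of a_array and b_array.
// The last element equals total_batch_count.
inline std::vector<std::int64_t> group_offsets(const std::vector<std::int64_t>& group_size) {
    std::vector<std::int64_t> offsets(group_size.size() + 1, 0);
    for (std::size_t i = 0; i < group_size.size(); ++i) {
        offsets[i + 1] = offsets[i] + group_size[i];
    }
    return offsets;
}
```

For group_size = {1, 2, 0, 3} this gives offsets {0, 1, 3, 3, 6}, so total_batch_count is 6 and the empty third group contributes no pointers.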
Output Parameters
Strided API
- b
Output buffer, overwritten by batch_size matrix transpose or copy operations of the form alpha*op(A).
Group API
- b_array
Output array of pointers to B matrices, overwritten by total_batch_count matrix transpose or copy operations of the form alpha*op(A).