omatadd_batch¶
Computes a group of out-of-place scaled matrix additions using general matrices.
Description¶
The omatadd_batch routines perform a series of out-of-place scaled matrix additions. They are similar to the omatadd routines, but the omatadd_batch routines operate on a group of matrices.
The matrices are always in a strided format for this API. The operation is defined as:
for i = 0 … batch_size - 1
    A is a matrix at offset i * stride_a in a
    B is a matrix at offset i * stride_b in b
    C is a matrix at offset i * stride_c in c
    C = alpha * op(A) + beta * op(B)
end for
where:
- op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
- alpha and beta are scalars
- A, B and C are matrices
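The loop above can be sketched as a plain C++ reference routine. This is an illustrative sketch only, not the library's implementation: it assumes column major storage, runs on the host without SYCL, and the names Op, conj_if_complex, op_elem and omatadd_batch_ref are hypothetical helpers introduced here.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <vector>

// Hypothetical enum mirroring the three op(X) choices; not the library's transpose type.
enum class Op { NoTrans, Trans, ConjTrans };

// Conjugation that is the identity for real types.
template <typename T> T conj_if_complex(T x) { return x; }
template <typename T> std::complex<T> conj_if_complex(std::complex<T> x) { return std::conj(x); }

// Element (i, j) of op(X), where X is stored column major with leading dimension ld.
template <typename T>
T op_elem(Op op, const T *x, std::int64_t ld, std::int64_t i, std::int64_t j) {
    switch (op) {
        case Op::NoTrans: return x[i + j * ld];            // X(i, j)
        case Op::Trans:   return x[j + i * ld];            // X(j, i)
        default:          return conj_if_complex(x[j + i * ld]);  // conjg(X(j, i))
    }
}

// Reference loop: C_k = alpha * op(A_k) + beta * op(B_k) for k = 0 .. batch_size - 1.
template <typename T>
void omatadd_batch_ref(Op transa, Op transb, std::int64_t m, std::int64_t n,
                       T alpha, const T *a, std::int64_t lda, std::int64_t stride_a,
                       T beta, const T *b, std::int64_t ldb, std::int64_t stride_b,
                       T *c, std::int64_t ldc, std::int64_t stride_c,
                       std::int64_t batch_size) {
    for (std::int64_t k = 0; k < batch_size; ++k) {
        const T *A = a + k * stride_a;  // matrix k of the a array
        const T *B = b + k * stride_b;  // matrix k of the b array
        T *C = c + k * stride_c;        // matrix k of the c array
        for (std::int64_t j = 0; j < n; ++j)
            for (std::int64_t i = 0; i < m; ++i)
                C[i + j * ldc] = alpha * op_elem(transa, A, lda, i, j)
                               + beta * op_elem(transb, B, ldb, i, j);
    }
}
```

A routine like this can serve as a correctness check for small inputs; it makes no attempt at performance.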
The API is available with USM pointers or buffer arguments for the input and output arrays.
The input buffers or arrays a and b contain all the input matrices, and the single output buffer or array c contains all the output matrices. The locations of the individual matrices within the buffer or array are given by stride lengths, while the number of matrices is given by the batch_size parameter.
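To make the strided layout concrete, the following sketch shows how an element of matrix k could be addressed and how separate matrices could be packed into one strided array. It assumes column major storage; batch_index and pack_strided are hypothetical helpers written for this illustration, not part of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of element (i, j) of matrix k in a column major strided batch.
inline std::int64_t batch_index(std::int64_t k, std::int64_t i, std::int64_t j,
                                std::int64_t ld, std::int64_t stride) {
    return k * stride + i + j * ld;
}

// Copy tightly packed rows x cols column major matrices into a single strided
// array with leading dimension ld >= rows and stride >= ld * cols.
// Padding elements are value-initialized.
template <typename T>
std::vector<T> pack_strided(const std::vector<std::vector<T>> &mats,
                            std::int64_t rows, std::int64_t cols,
                            std::int64_t ld, std::int64_t stride) {
    assert(ld >= rows && stride >= ld * cols);
    std::vector<T> out(static_cast<std::size_t>(stride) * mats.size(), T{});
    for (std::size_t k = 0; k < mats.size(); ++k)
        for (std::int64_t j = 0; j < cols; ++j)
            for (std::int64_t i = 0; i < rows; ++i)
                out[static_cast<std::size_t>(
                    batch_index(static_cast<std::int64_t>(k), i, j, ld, stride))] =
                    mats[k][static_cast<std::size_t>(i + j * rows)];
    return out;
}
```

The resulting array has stride * batch_size elements, which matches the minimum size required for the a, b and c arguments.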
API¶
Syntax¶
USM arrays:
event omatadd_batch(queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
T alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
T beta,
T *b,
std::int64_t ldb,
std::int64_t stride_b,
T *c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size,
const std::vector<event> &dependencies = {});
Buffer arrays:
void omatadd_batch(queue &queue, transpose transa,
transpose transb,
std::int64_t m, std::int64_t n,
T alpha, cl::sycl::buffer<T, 1> &a,
std::int64_t lda, std::int64_t stride_a,
T beta, cl::sycl::buffer<T, 1> &b,
std::int64_t ldb, std::int64_t stride_b,
cl::sycl::buffer<T, 1> &c, std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size);
omatadd_batch supports the following precisions and devices:
| T | Devices Supported |
|---|---|
| float | Host, CPU, and GPU |
| double | Host, CPU, and GPU |
| std::complex<float> | Host, CPU, and GPU |
| std::complex<double> | Host, CPU, and GPU |
Input Parameters¶
- transa
Specifies op(A), the transposition operation applied to the matrices A.
- transb
Specifies op(B), the transposition operation applied to the matrices B.
- m
Number of rows for the result matrix C. Must be at least zero.
- n
Number of columns for the result matrix C. Must be at least zero.
- alpha
Scaling factor for the matrices A.
- a
Buffer or array holding the input matrices A. Must have size at least stride_a*batch_size.
- lda
Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least m if A is not transposed or at least n if A is transposed. If matrices are stored using row major layout, lda must be at least n if A is not transposed or at least m if A is transposed. Must be positive.
- stride_a
Stride between the different A matrices. If matrices are stored using column major layout, stride_a must be at least lda*n if A is not transposed or at least lda*m if A is transposed. If matrices are stored using row major layout, stride_a must be at least lda*m if A is not transposed or at least lda*n if A is transposed.
- beta
Scaling factor for the matrices B.
- b
Buffer or array holding the input matrices B. Must have size at least stride_b*batch_size.
- ldb
Leading dimension of the B matrices. If matrices are stored using column major layout, ldb must be at least m if B is not transposed or at least n if B is transposed. If matrices are stored using row major layout, ldb must be at least n if B is not transposed or at least m if B is transposed. Must be positive.
- stride_b
Stride between the different B matrices. If matrices are stored using column major layout, stride_b must be at least ldb*n if B is not transposed or at least ldb*m if B is transposed. If matrices are stored using row major layout, stride_b must be at least ldb*m if B is not transposed or at least ldb*n if B is transposed.
- ldc
Leading dimension of the C matrices. If matrices are stored using column major layout, ldc must be at least m. If matrices are stored using row major layout, ldc must be at least n. Must be positive.
- stride_c
Stride between the different C matrices. If matrices are stored using column major layout, stride_c must be at least ldc*n. If matrices are stored using row major layout, stride_c must be at least ldc*m.
- batch_size
Specifies the number of input and output matrices to add.
- dependencies
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
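The leading dimension and stride constraints listed above can be collected into a small argument check. The sketch below is one possible way to encode them; Layout, min_ld, min_stride and valid_operand are hypothetical names introduced for illustration, and for the output matrices C the transposed flag would be passed as false.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout tag; the library selects the layout by other means.
enum class Layout { ColMajor, RowMajor };

// Smallest valid leading dimension for an operand whose op(X) is m x n.
// The stored matrix is m x n if not transposed and n x m if transposed.
inline std::int64_t min_ld(Layout layout, bool transposed,
                           std::int64_t m, std::int64_t n) {
    std::int64_t rows = transposed ? n : m;
    std::int64_t cols = transposed ? m : n;
    std::int64_t ld = (layout == Layout::ColMajor) ? rows : cols;
    return ld > 1 ? ld : 1;  // leading dimensions must be positive
}

// Smallest valid stride between consecutive matrices for a given leading dimension.
inline std::int64_t min_stride(Layout layout, bool transposed,
                               std::int64_t m, std::int64_t n, std::int64_t ld) {
    std::int64_t rows = transposed ? n : m;
    std::int64_t cols = transposed ? m : n;
    return ld * ((layout == Layout::ColMajor) ? cols : rows);
}

// True if m, n, ld and stride satisfy the documented constraints for one operand.
inline bool valid_operand(Layout layout, bool transposed, std::int64_t m,
                          std::int64_t n, std::int64_t ld, std::int64_t stride) {
    return m >= 0 && n >= 0 && ld >= min_ld(layout, transposed, m, n) &&
           stride >= min_stride(layout, transposed, m, n, ld);
}
```

Running such a check on the host before submitting the call can catch layout mistakes early, for example swapping m and n for a transposed input.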
Output Parameters¶
- c
Output buffer or array, overwritten by batch_size matrix addition operations of the form alpha*op(A) + beta*op(B). Must have size at least stride_c*batch_size.