axpy_batch

Computes a group of axpy operations.

Description

The axpy_batch routines are batched versions of axpy, performing multiple axpy operations in a single call. Each axpy operation adds a scalar-vector product to a vector.

axpy_batch supports the following precisions for the type T:

  • float

  • double

  • std::complex<float>

  • std::complex<double>

axpy_batch (Buffer Version)

The buffer version of axpy_batch supports only the strided API.

Strided API

The strided API operation is defined as:

for i = 0 … batch_size – 1
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y = alpha * X + Y
end for

where:

  • alpha is a scalar

  • X and Y are vectors

For the strided API, all vectors X and Y have the same parameters (size, increments) and are stored at a constant stride from each other, given by stridex and stridey. The x and y arrays contain all the input vectors. The total number of vectors in x and in y is given by the batch_size parameter.
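As a point of reference, the strided semantics described above can be sketched as a plain host-side loop. This is not the oneMKL implementation: `axpy_batch_strided_ref` is a hypothetical name, and the treatment of negative increments follows the usual reference BLAS convention (the vector is traversed starting from its far end), which is an assumption rather than something stated on this page.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical host-side reference for the strided semantics; not a oneMKL function.
// For a negative increment, the start offset follows the reference BLAS
// convention (an assumption, not stated on this page).
template <typename T>
void axpy_batch_strided_ref(std::int64_t n, T alpha,
                            const std::vector<T> &x, std::int64_t incx, std::int64_t stridex,
                            std::vector<T> &y, std::int64_t incy, std::int64_t stridey,
                            std::int64_t batch_size) {
    for (std::int64_t i = 0; i < batch_size; ++i) {
        // Index of logical element 0 of the i-th X and Y vectors.
        std::int64_t ix = i * stridex + (incx >= 0 ? 0 : (n - 1) * (-incx));
        std::int64_t iy = i * stridey + (incy >= 0 ? 0 : (n - 1) * (-incy));
        for (std::int64_t k = 0; k < n; ++k) {
            y[iy] = alpha * x[ix] + y[iy];  // Y = alpha * X + Y, element by element
            ix += incx;
            iy += incy;
        }
    }
}
```

With n = 3, unit increments, and both strides equal to 4, each vector occupies three of every four elements; the fourth element is padding that the loop never touches.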

Syntax

namespace oneapi::mkl::blas::column_major {
    void axpy_batch(sycl::queue &queue,
                    std::int64_t n,
                    T alpha,
                    sycl::buffer<T, 1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    sycl::buffer<T, 1> &y,
                    std::int64_t incy,
                    std::int64_t stridey,
                    std::int64_t batch_size)
}
namespace oneapi::mkl::blas::row_major {
    void axpy_batch(sycl::queue &queue,
                    std::int64_t n,
                    T alpha,
                    sycl::buffer<T, 1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    sycl::buffer<T, 1> &y,
                    std::int64_t incy,
                    std::int64_t stridey,
                    std::int64_t batch_size)
}

Input Parameters

queue

The queue where the routine should be executed.

n

Number of elements in vectors X and Y.

alpha

Specifies the scalar alpha.

x

Buffer holding input vectors X. Size of the buffer must be at least batch_size * stridex.

incx

Stride between two consecutive elements of X vectors.

stridex

Stride between two consecutive X vectors. Must be at least (1 + (n-1)*abs(incx)). See Matrix Storage for more details.

y

Buffer holding input/output vectors Y. Size of the buffer must be at least batch_size * stridey.

incy

Stride between two consecutive elements of Y vectors.

stridey

Stride between two consecutive Y vectors. Must be at least (1 + (n-1)*abs(incy)). See Matrix Storage for more details.

batch_size

Number of axpy computations to perform. Must be at least zero.

Output Parameters

y

Output buffer overwritten by batch_size axpy operations of the form alpha * X + Y.
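The size constraints listed for the strided API can be checked on the host before a call. The sketch below only restates the bounds given in the parameter descriptions; the helper names `min_stride` and `strided_operand_ok` are hypothetical and are not part of oneMKL.

```cpp
#include <cstdint>

// Smallest permitted stride between consecutive vectors:
// 1 + (n - 1) * abs(inc), as stated for stridex and stridey.
inline std::int64_t min_stride(std::int64_t n, std::int64_t inc) {
    const std::int64_t abs_inc = inc < 0 ? -inc : inc;
    return 1 + (n - 1) * abs_inc;
}

// Checks one operand (x or y) of a strided call against the documented bounds.
// array_size is the number of elements of type T available in the buffer.
inline bool strided_operand_ok(std::int64_t n, std::int64_t inc, std::int64_t stride,
                               std::int64_t batch_size, std::int64_t array_size) {
    if (batch_size < 0) return false;               // batch_size must be at least zero
    if (stride < min_stride(n, inc)) return false;  // stride is below the documented minimum
    return array_size >= batch_size * stride;       // buffer must hold batch_size * stride elements
}
```

The same check applies to x with (incx, stridex) and to y with (incy, stridey).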

axpy_batch (USM Version)

The USM version of axpy_batch supports the group API and the strided API.

Group API

The group API operation is defined as:

idx = 0
for i = 0 … group_count – 1
     for j = 0 … group_size[i] – 1
         X and Y are vectors at x[idx] and y[idx]
         Y = alpha[i] * X + Y
         idx = idx + 1
     end for
end for

where:

  • alpha[i] is a scalar

  • X and Y are vectors

For the group API, each group contains vectors with the same parameters (size and increment). The x and y arrays contain the pointers to all the input vectors. The total number of vectors in x and in y is given by:

total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]
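The group semantics and the total_batch_count sum can be sketched together as a host-side loop. Both function names below are hypothetical and neither is part of oneMKL; to keep the sketch short it assumes non-negative increments.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Total number of vectors across all groups: the sum of group_size[i].
inline std::int64_t total_batch_count(const std::vector<std::int64_t> &group_size) {
    return std::accumulate(group_size.begin(), group_size.end(), std::int64_t{0});
}

// Hypothetical host-side reference for the group semantics; not a oneMKL function.
// Assumes incx[i] >= 0 and incy[i] >= 0.
template <typename T>
void axpy_batch_group_ref(const std::int64_t *n, const T *alpha,
                          const T **x, const std::int64_t *incx,
                          T **y, const std::int64_t *incy,
                          std::int64_t group_count, const std::int64_t *group_size) {
    std::int64_t idx = 0;  // running index into the pointer arrays x and y
    for (std::int64_t i = 0; i < group_count; ++i) {
        for (std::int64_t j = 0; j < group_size[i]; ++j, ++idx) {
            // Every vector in group i shares n[i], alpha[i], incx[i], and incy[i].
            for (std::int64_t k = 0; k < n[i]; ++k) {
                y[idx][k * incy[i]] = alpha[i] * x[idx][k * incx[i]] + y[idx][k * incy[i]];
            }
        }
    }
}
```

Note that the pointer arrays are indexed by the running counter idx, while the per-group parameters are indexed by the group number i.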

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event axpy_batch(sycl::queue &queue,
                           std::int64_t *n,
                           T *alpha,
                           const T **x,
                           std::int64_t *incx,
                           T **y,
                           std::int64_t *incy,
                           std::int64_t group_count,
                           std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event axpy_batch(sycl::queue &queue,
                           std::int64_t *n,
                           T *alpha,
                           const T **x,
                           std::int64_t *incx,
                           T **y,
                           std::int64_t *incy,
                           std::int64_t group_count,
                           std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

n

Array of group_count integers. n[i] specifies the number of elements in vectors X and Y for every vector in group i.

alpha

Array of group_count scalar elements. alpha[i] specifies the scaling factor for vector X in group i.

x

Array of pointers to input vectors X with size total_batch_count. The array allocated for each X vector in group i must hold at least (1 + (n[i] – 1)*abs(incx[i])) elements. See Matrix Storage for more details.

incx

Array of group_count integers. incx[i] specifies the stride of vector X in group i.

y

Array of pointers to input/output vectors Y with size total_batch_count. The array allocated for each Y vector in group i must hold at least (1 + (n[i] – 1)*abs(incy[i])) elements. See Matrix Storage for more details.

incy

Array of group_count integers. incy[i] specifies the stride of vector Y in group i.

group_count

Number of groups. Must be at least zero.

group_size

Array of group_count integers. group_size[i] specifies the number of axpy operations in group i. Each element in group_size must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

y

Array of pointers holding Y vectors, overwritten by total_batch_count axpy operations of the form alpha * X + Y.
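Because the per-vector size requirement depends on the group a pointer belongs to, it can help to expand it into one entry per pointer slot. The helper below is a hypothetical illustration (not a oneMKL function) that applies the bound 1 + (n[i] – 1)*abs(inc[i]) from the x and y descriptions; it assumes n[i] is at least 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// For each pointer slot in x (or y), the minimum number of elements its
// allocation must hold. Hypothetical helper; assumes n[i] >= 1.
inline std::vector<std::int64_t> required_lengths(const std::vector<std::int64_t> &n,
                                                  const std::vector<std::int64_t> &inc,
                                                  const std::vector<std::int64_t> &group_size) {
    std::vector<std::int64_t> lengths;
    for (std::size_t i = 0; i < group_size.size(); ++i) {
        const std::int64_t abs_inc = inc[i] < 0 ? -inc[i] : inc[i];
        const std::int64_t needed = 1 + (n[i] - 1) * abs_inc;
        // Every vector in group i shares the same n[i] and inc[i].
        lengths.insert(lengths.end(), static_cast<std::size_t>(group_size[i]), needed);
    }
    return lengths;
}
```

The returned vector has total_batch_count entries, in the same order as the pointers in x or y.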

Return Values

Output event to wait on to ensure computation is complete.

Strided API

The strided API operation is defined as:

for i = 0 … batch_size – 1
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y = alpha * X + Y
end for

where:

  • alpha is a scalar

  • X and Y are vectors

For the strided API, all vectors X and Y have the same parameters (size, increments) and are stored at a constant stride from each other, given by stridex and stridey. The x and y arrays contain all the input vectors. The total number of vectors in x and in y is given by the batch_size parameter.
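To illustrate the layout this implies, the sketch below copies a set of equally sized vectors into a single strided array, so that element k of vector i lands at index i * stride + k * inc. `pack_strided` is a hypothetical helper, not part of oneMKL, and it assumes a positive increment and a stride of at least 1 + (n - 1) * inc.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs equally sized vectors into one strided array. Hypothetical helper.
// Assumes inc > 0 and stride >= 1 + (n - 1) * inc, where n is the vector length.
template <typename T>
std::vector<T> pack_strided(const std::vector<std::vector<T>> &vectors,
                            std::int64_t inc, std::int64_t stride) {
    const std::size_t s = static_cast<std::size_t>(stride);
    const std::size_t step = static_cast<std::size_t>(inc);
    // batch_size * stride elements, with unused positions value-initialized.
    std::vector<T> packed(vectors.size() * s, T{});
    for (std::size_t i = 0; i < vectors.size(); ++i) {
        for (std::size_t k = 0; k < vectors[i].size(); ++k) {
            packed[i * s + k * step] = vectors[i][k];
        }
    }
    return packed;
}
```

The resulting array has batch_size * stride elements, which matches the minimum size required for x and y in the strided API.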

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event axpy_batch(sycl::queue &queue,
                           std::int64_t n,
                           T alpha,
                           const T *x,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T *y,
                           std::int64_t incy,
                           std::int64_t stridey,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event axpy_batch(sycl::queue &queue,
                           std::int64_t n,
                           T alpha,
                           const T *x,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T *y,
                           std::int64_t incy,
                           std::int64_t stridey,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

n

Number of elements in vectors X and Y.

alpha

Specifies the scalar alpha.

x

Pointer to input vectors X. Size of the array must be at least batch_size * stridex.

incx

Stride between two consecutive elements of X vectors.

stridex

Stride between two consecutive X vectors. Must be at least (1 + (n-1)*abs(incx)). See Matrix Storage for more details.

y

Pointer to input/output vectors Y. Size of the array must be at least batch_size * stridey.

incy

Stride between two consecutive elements of Y vectors.

stridey

Stride between two consecutive Y vectors. Must be at least (1 + (n-1)*abs(incy)). See Matrix Storage for more details.

batch_size

Number of axpy computations to perform. Must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

y

Pointer to output vectors Y overwritten by batch_size axpy operations of the form alpha * X + Y.

Return Values

Output event to wait on to ensure computation is complete.