copy_batch

Computes a group of copy operations.

Description

The copy_batch routines are batched versions of copy, performing multiple copy operations in a single call. Each copy operation copies one vector to another.

copy_batch supports the following precisions for T:

  • float
  • double
  • std::complex<float>
  • std::complex<double>

copy_batch (Buffer Version)

The buffer version of copy_batch supports only the strided API.

Strided API

The strided API operation is defined as:

for i = 0 … batch_size – 1
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y = X
end for

where:

  • X and Y are vectors

For the strided API, all X vectors (and likewise all Y vectors) share the same parameters (size and increment) and are stored at a constant stride, stridex (stridey), from each other. The x array contains all the input vectors and the y array contains all the output vectors. The total number of vectors in each of x and y is given by the batch_size parameter.
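The loop above can be mirrored on the host in plain C++, which may be handy for checking results. This is a minimal sketch rather than the oneMKL implementation: the name copy_batch_strided_ref is hypothetical, it operates on std::vector instead of sycl::buffer, and it assumes positive incx and incy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference for the strided semantics (not a oneMKL API).
// Assumption: incx > 0 and incy > 0, which keeps the index arithmetic simple.
template <typename T>
void copy_batch_strided_ref(std::int64_t n,
                            const std::vector<T> &x, std::int64_t incx, std::int64_t stridex,
                            std::vector<T> &y, std::int64_t incy, std::int64_t stridey,
                            std::int64_t batch_size) {
    assert(n >= 0 && batch_size >= 0 && incx > 0 && incy > 0);
    for (std::int64_t i = 0; i < batch_size; ++i) {
        // X and Y start at offsets i * stridex and i * stridey in x and y.
        const std::int64_t x0 = i * stridex;
        const std::int64_t y0 = i * stridey;
        for (std::int64_t k = 0; k < n; ++k) {
            // Y = X, one element at a time.
            y[static_cast<std::size_t>(y0 + k * incy)] =
                x[static_cast<std::size_t>(x0 + k * incx)];
        }
    }
}
```

With n = 3, stridex = 3, stridey = 4, and batch_size = 2, the two input vectors land at offsets 0 and 4 of y, and the padding elements at y[3] and y[7] are left untouched.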

Syntax

namespace oneapi::mkl::blas::column_major {
    void copy_batch(sycl::queue &queue,
                    std::int64_t n,
                    sycl::buffer<T, 1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    sycl::buffer<T, 1> &y,
                    std::int64_t incy,
                    std::int64_t stridey,
                    std::int64_t batch_size)
}
namespace oneapi::mkl::blas::row_major {
    void copy_batch(sycl::queue &queue,
                    std::int64_t n,
                    sycl::buffer<T, 1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    sycl::buffer<T, 1> &y,
                    std::int64_t incy,
                    std::int64_t stridey,
                    std::int64_t batch_size)
}

Input Parameters

queue

The queue where the routine should be executed.

n

Number of elements in vectors X and Y.

x

Buffer holding input vectors X. Size of the buffer must be at least batch_size * stridex.

incx

Stride between two consecutive elements of X vectors.

stridex

Stride between two consecutive X vectors. Must be at least (1 + (n-1)*abs(incx)). See Matrix Storage for more details.

y

Buffer holding input/output vectors Y. Size of the buffer must be at least batch_size * stridey.

incy

Stride between two consecutive elements of Y vectors.

stridey

Stride between two consecutive Y vectors. Must be at least (1 + (n-1)*abs(incy)). See Matrix Storage for more details.

batch_size

Number of copy computations to perform. Must be at least zero.

Output Parameters

y

Output buffer overwritten by batch_size copy operations.
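The lower bounds on stridex, stridey, and the buffer sizes can be checked before the call. The helpers below are a sketch of that arithmetic only; their names (min_batch_stride, min_buffer_elements, strided_args_ok) are hypothetical and not part of oneMKL, and the sketch assumes n is at least 1.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Smallest permitted stride between consecutive vectors:
// 1 + (n - 1) * abs(inc), as stated for stridex and stridey.
// Assumption of this sketch: n >= 1.
inline std::int64_t min_batch_stride(std::int64_t n, std::int64_t inc) {
    assert(n >= 1);
    return 1 + (n - 1) * std::abs(inc);
}

// Minimum number of elements the x (or y) buffer must hold: batch_size * stride.
inline std::int64_t min_buffer_elements(std::int64_t batch_size, std::int64_t stride) {
    assert(batch_size >= 0 && stride >= 0);
    return batch_size * stride;
}

// True when one side (x or y) of a strided call meets the documented bounds.
inline bool strided_args_ok(std::int64_t n, std::int64_t inc, std::int64_t stride,
                            std::int64_t batch_size, std::int64_t buffer_elements) {
    return batch_size >= 0 &&
           stride >= min_batch_stride(n, inc) &&
           buffer_elements >= min_buffer_elements(batch_size, stride);
}
```

For example, n = 4 with incx = -2 requires stridex of at least 7, so a batch of 3 needs an x buffer of at least 21 elements.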

copy_batch (USM Version)

The USM version of copy_batch supports both the group API and the strided API.

Group API

The group API operation is defined as:

idx = 0
for i = 0 … group_count – 1
     for j = 0 … group_size – 1
         X and Y are vectors at x[idx] and y[idx]
         Y = X
         idx = idx + 1
     end for
end for

where:

  • X and Y are vectors

For the group API, each group contains vectors with the same parameters (size and increment). The x and y arrays contain the pointers to all the input and output vectors, respectively. The total number of vectors in each of x and y is given by:

total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]
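The sum and the nested loop above can be written out on the host as follows. This is an illustrative sketch, not the library's implementation: total_batch_count and copy_batch_group_ref are hypothetical helper names, the pointers are ordinary host pointers rather than USM allocations, and positive increments are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Sum of group_size[i] over all groups.
inline std::int64_t total_batch_count(const std::vector<std::int64_t> &group_size) {
    return std::accumulate(group_size.begin(), group_size.end(), std::int64_t{0});
}

// Hypothetical host reference for the group API loop (not a oneMKL API).
// Assumption: incx[i] > 0 and incy[i] > 0 for every group.
template <typename T>
void copy_batch_group_ref(const std::int64_t *n, const T **x, const std::int64_t *incx,
                          T **y, const std::int64_t *incy,
                          std::int64_t group_count, const std::int64_t *group_size) {
    std::int64_t idx = 0;  // running index into the pointer arrays x and y
    for (std::int64_t i = 0; i < group_count; ++i) {
        assert(incx[i] > 0 && incy[i] > 0);
        for (std::int64_t j = 0; j < group_size[i]; ++j) {
            // X and Y are the vectors at x[idx] and y[idx]; all vectors in
            // group i share n[i], incx[i], and incy[i].
            for (std::int64_t k = 0; k < n[i]; ++k) {
                y[idx][k * incy[i]] = x[idx][k * incx[i]];
            }
            ++idx;
        }
    }
}
```

Note that n, incx, and incy are indexed by group, while x and y are indexed by the running vector index idx.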

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event copy_batch(sycl::queue &queue,
                           std::int64_t *n,
                           const T **x,
                           std::int64_t *incx,
                           T **y,
                           std::int64_t *incy,
                           std::int64_t group_count,
                           std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event copy_batch(sycl::queue &queue,
                           std::int64_t *n,
                           const T **x,
                           std::int64_t *incx,
                           T **y,
                           std::int64_t *incy,
                           std::int64_t group_count,
                           std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

n

Array of group_count integers. n[i] specifies the number of elements in each X and Y vector in group i.

x

Array of total_batch_count pointers to the input vectors X. The array allocated for each X vector in group i must hold at least (1 + (n[i] – 1)*abs(incx[i])) elements. See Matrix Storage for more details.

incx

Array of group_count integers. incx[i] specifies the stride of vector X in group i.

y

Array of total_batch_count pointers to the input/output vectors Y. The array allocated for each Y vector in group i must hold at least (1 + (n[i] – 1)*abs(incy[i])) elements. See Matrix Storage for more details.

incy

Array of group_count integers. incy[i] specifies the stride of vector Y in group i.

group_count

Number of groups. Must be at least zero.

group_size

Array of group_count integers. group_size[i] specifies the number of copy operations in group i. Each element in group_size must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

y

Array of pointers holding Y vectors, overwritten by total_batch_count copy operations.

Return Values

Output event to wait on to ensure computation is complete.
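Because n, incx, and incy are given per group while x and y hold one pointer per vector, it can help to expand the per-group size requirement into a per-vector list before allocating. The helper below is a hypothetical sketch of that bookkeeping (required_vector_lengths is not a oneMKL function) and assumes each n[i] is at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// For each of the total_batch_count vectors, returns the minimum number of
// elements its allocation must hold: 1 + (n[i] - 1) * abs(inc[i]) for group i.
// Assumption of this sketch: n[i] >= 1 for every group.
inline std::vector<std::int64_t> required_vector_lengths(
        const std::vector<std::int64_t> &n,
        const std::vector<std::int64_t> &inc,
        const std::vector<std::int64_t> &group_size) {
    assert(n.size() == inc.size() && n.size() == group_size.size());
    std::vector<std::int64_t> lengths;
    for (std::size_t i = 0; i < n.size(); ++i) {
        assert(n[i] >= 1 && group_size[i] >= 0);
        const std::int64_t len = 1 + (n[i] - 1) * std::abs(inc[i]);
        // Every vector in group i has the same requirement.
        lengths.insert(lengths.end(), static_cast<std::size_t>(group_size[i]), len);
    }
    return lengths;
}
```

Entry k of the result is the minimum allocation size, in elements, for the vector pointed to by x[k] (when called with incx) or y[k] (when called with incy).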

Examples

An example of how to use the USM version of copy_batch can be found in the oneMKL installation directory, under:

examples/dpcpp/blas/source/copy_batch_usm.cpp

Strided API

The strided API operation is defined as:

for i = 0 … batch_size – 1
    X and Y are vectors at offset i * stridex and i * stridey in x and y
    Y = X
end for

where:

  • X and Y are vectors

For the strided API, all X vectors (and likewise all Y vectors) share the same parameters (size and increment) and are stored at a constant stride, stridex (stridey), from each other. The x array contains all the input vectors and the y array contains all the output vectors. The total number of vectors in each of x and y is given by the batch_size parameter.
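To illustrate the layout described above, the sketch below packs separate equal-length host vectors into one contiguous array in which element k of vector i sits at index i * stride + k * inc. The helper pack_strided is hypothetical and not part of oneMKL; it assumes a positive increment, and the packed data would still need to be placed in USM-accessible memory before being passed as x.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: lays out equal-length vectors at a constant stride.
// Assumptions: inc > 0 and stride >= 1 + (n - 1) * inc.
// Gaps between elements and between vectors are value-initialized.
template <typename T>
std::vector<T> pack_strided(const std::vector<std::vector<T>> &vectors,
                            std::int64_t inc, std::int64_t stride) {
    if (vectors.empty()) return {};
    const std::int64_t n = static_cast<std::int64_t>(vectors.front().size());
    assert(n >= 1 && inc > 0 && stride >= 1 + (n - 1) * inc);
    const std::size_t s = static_cast<std::size_t>(stride);
    std::vector<T> packed(vectors.size() * s, T{});  // batch_size * stride elements
    for (std::size_t i = 0; i < vectors.size(); ++i) {
        assert(static_cast<std::int64_t>(vectors[i].size()) == n);
        for (std::int64_t k = 0; k < n; ++k) {
            // Element k of vector i lives at offset i * stride + k * inc.
            packed[i * s + static_cast<std::size_t>(k * inc)] =
                vectors[i][static_cast<std::size_t>(k)];
        }
    }
    return packed;
}
```

The resulting array has batch_size * stride elements, matching the minimum size stated for x, with incx = inc and stridex = stride.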

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event copy_batch(sycl::queue &queue,
                           std::int64_t n,
                           const T *x,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T *y,
                           std::int64_t incy,
                           std::int64_t stridey,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event copy_batch(sycl::queue &queue,
                           std::int64_t n,
                           const T *x,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T *y,
                           std::int64_t incy,
                           std::int64_t stridey,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

n

Number of elements in vectors X and Y.

x

Pointer to input vectors X. Size of the array must be at least batch_size * stridex.

incx

Stride between two consecutive elements of X vectors.

stridex

Stride between two consecutive X vectors. Must be at least (1 + (n-1)*abs(incx)). See Matrix Storage for more details.

y

Pointer to input/output vectors Y. Size of the array must be at least batch_size * stridey.

incy

Stride between two consecutive elements of Y vectors.

stridey

Stride between two consecutive Y vectors. Must be at least (1 + (n-1)*abs(incy)). See Matrix Storage for more details.

batch_size

Number of copy computations to perform. Must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

y

Output vectors overwritten by batch_size copy operations.

Return Values

Output event to wait on to ensure computation is complete.