.. _onemkl_blas_gemm_batch: gemm_batch ========== Computes groups of matrix-matrix product with general matrices. Description *********** The ``gemm_batch`` routines are batched versions of :ref:`onemkl_blas_gemm`, performing multiple ``gemm`` operations in a single call. Each ``gemm`` operation performs a matrix-matrix product with general matrices. ``gemm_batch`` supports the following precisions: .. list-table:: :header-rows: 1 * - T * - ``sycl::half`` * - ``float`` * - ``double`` * - ``std::complex`` * - ``std::complex`` gemm_batch (Buffer Version) --------------------------- Buffer version of ``gemm_batch`` supports only strided API. **Strided API** --------------- Strided API operation is defined as: .. code-block:: for i = 0 … batch_size – 1 A, B and C are matrices at offset i * stridea, i * strideb, i * stridec in a, b and c. C = alpha * op(A) * op(B) + beta * C end for where: - op(``X``) is one of op(``X``) = ``X``, or op(``X``) = ``X``\ :sup:`T`, or op(``X``) = ``X``\ :sup:`H` - ``alpha`` and ``beta`` are scalars - ``A``, ``B``, and ``C`` are matrices - op(``A``) is ``m`` x ``k``, op(``B``) is ``k`` x ``n``, and ``C`` is ``m`` x ``n`` For strided API, ``a``, ``b`` and ``c`` buffers contains all the input matrices. The stride between matrices is given by the stride parameters. Total number of matrices in ``a``, ``b`` and ``c`` buffers is given by ``batch_size`` parameter. Syntax ------ .. code-block:: cpp namespace oneapi::mkl::blas::column_major { void gemm_batch(sycl::queue &queue, oneapi::mkl::transpose transa, oneapi::mkl::transpose transb, std::int64_t m, std::int64_t n, std::int64_t k, T alpha, sycl::buffer &a, std::int64_t lda, std::int64_t stridea, sycl::buffer &b, std::int64_t ldb, std::int64_t strideb, T beta, sycl::buffer &c, std::int64_t ldc, std::int64_t stridec, std::int64_t batch_size) } .. code-block:: cpp namespace oneapi::mkl::blas::row_major { void gemm_batch(sycl::queue &queue, oneapi::mkl::transpose transa, oneapi::mkl::transpose transb, std::int64_t m, std::int64_t n, std::int64_t k, T alpha, sycl::buffer &a, std::int64_t lda, std::int64_t stridea, sycl::buffer &b, std::int64_t ldb, std::int64_t strideb, T beta, sycl::buffer &c, std::int64_t ldc, std::int64_t stridec, std::int64_t batch_size) } Input Parameters ---------------- queue The queue where the routine should be executed. transa Specifies op(``A``), transposition operation applied to matrices ``A``. See :ref:`data-types` for more details. transb Specifies op(``B``), transposition operation applied to matrices ``B``. See :ref:`data-types` for more details. m Number of rows of matrices op(``A``) and matrices ``C``. Must be at least zero. n Number of columns of matrices op(``B``) and matrices ``C``. Must be at least zero. k Number of columns of matrices op(``A``) and rows of matrices op(``B``). Must be at least zero. alpha Scaling factor for matrix-matrix products. a Buffer holding input matrices ``A``. Size of the buffer must be at least ``stridea`` * ``batch_size``. lda Leading dimension of matrices ``A``. Must be positive. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``m`` - Must be at least ``k`` * - Row major - Must be at least ``k`` - Must be at least ``m`` stridea Stride between two consecutive ``A`` matrices. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``lda`` * ``k`` - Must be at least ``lda`` * ``m`` * - Row major - Must be at least ``lda`` * ``m`` - Must be at least ``lda`` * ``k`` b Buffer holding input matrices ``B``. Size of the buffer must be at least ``strideb`` * ``batch_size``. ldb Leading dimension of matrices ``B``. Must be positive. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``k`` - Must be at least ``n`` * - Row major - Must be at least ``n`` - Must be at least ``k`` strideb Stride between two consecutive ``B`` matrices. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``ldb`` * ``n`` - Must be at least ``ldb`` * ``k`` * - Row major - Must be at least ``ldb`` * ``k`` - Must be at least ``ldb`` * ``n`` beta Scaling factor for matrices ``C``. c Buffer holding input/output matrices ``C``. Size of the buffer must be at least ``stridec`` * ``batch_size``. ldc Leading dimension of matrices ``C``. Must be positive. .. list-table:: * - Column major - Must be at least ``m`` * - Row major - Must be at least ``n`` stridec Stride between two consecutive ``C`` matrices. .. list-table:: * - Column major - Must be at least ``ldc`` * ``n`` * - Row major - Must be at least ``ldc`` * ``m`` batch_size Specifies the number of matrix multiply operations to perform. Must be at least zero. Output Parameters ----------------- c Output buffer overwritten by ``batch_size`` ``gemm`` operations of the form ``alpha`` * op(``A``) * op(``B``) + ``beta`` * ``C``. .. note:: If ``beta`` = 0, matrices ``C`` do not need to be initialized before calling ``gemm_batch``. gemm_batch (USM Version) ------------------------ USM version of ``gemm_batch`` supports group API and strided API. **Group API** ------------- Group API operation is defined as: .. code-block:: idx = 0 for i = 0 … group_count – 1 for j = 0 … group_size – 1 A, B, and C are matrices in a[idx], b[idx] and c[idx] C = alpha[i] * op(A) * op(B) + beta[i] * C idx = idx + 1 end for end for where: - op(``X``) is one of op(``X``) = ``X``, or op(``X``) = ``X``\ :sup:`T`, or op(``X``) = ``X``\ :sup:`H` - ``alpha`` and ``beta`` are scalars - ``A``, ``B``, and ``C`` are matrices - op(``A``) is ``m`` x ``k``, op(``B``) is ``k`` x ``n``, and ``C`` is ``m`` x ``n`` For group API, ``a``, ``b`` and ``c`` arrays contain the pointers for all the input matrices. The total number of matrices in ``a``, ``b`` and ``c`` are given by: .. math:: total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i] Group API supports both pointer and span inputs. The advantage of using span instead of pointer is that the sizes of the array can vary and the size of the span can be queried at runtime. For each ``gemm`` parameter, except the output matrices, span can be of size 1, the number of groups or the total batch size. For output matrices, to ensure all computation are independent, size of the span must be the total batch size. Depending on the size of spans, each parameter for the ``gemm`` computation is used as follows: .. list-table:: * - span size = 1 - Parameter is reused for all ``gemm`` computations * - span size = ``group_count`` - Parameter is reused for all ``gemm`` within a group. Each group will have a different value for this parameter. This is same as ``gemm_batch`` group API with pointers. * - span size = ``total_batch_count`` - Each ``gemm`` computation use a different value for this parameter. Syntax ------ .. code-block:: cpp namespace oneapi::mkl::blas::column_major { sycl::event gemm_batch(sycl::queue &queue, oneapi::mkl::transpose *transa, oneapi::mkl::transpose *transb, std::int64_t *m, std::int64_t *n, std::int64_t *k, T *alpha, const T **a, std::int64_t *lda, const T **b, std::int64_t *ldb, T *beta, T **c, std::int64_t *ldc, std::int64_t group_count, std::int64_t *group_size, const std::vector &dependencies = {}) sycl::event gemm_batch(sycl::queue &queue, const sycl::span &transa, const sycl::span &transb, const sycl::span &m, const sycl::span &n, const sycl::span &k, const sycl::span &alpha, const sycl::span &a, const sycl::span &lda, const sycl::span &b, const sycl::span &ldb, const sycl::span &beta, sycl::span &c, const sycl::span &ldc, size_t group_count, const sycl::span &group_sizes, const std::vector &dependencies = {}) } .. code-block:: cpp namespace oneapi::mkl::blas::row_major { sycl::event gemm_batch(sycl::queue &queue, oneapi::mkl::transpose *transa, oneapi::mkl::transpose *transb, std::int64_t *m, std::int64_t *n, std::int64_t *k, T *alpha, const T **a, std::int64_t *lda, const T **b, std::int64_t *ldb, T *beta, T **c, std::int64_t *ldc, std::int64_t group_count, std::int64_t *group_size, const std::vector &dependencies = {}) sycl::event gemm_batch(sycl::queue &queue, const sycl::span &transa, const sycl::span &transb, const sycl::span &m, const sycl::span &n, const sycl::span &k, const sycl::span &alpha, const sycl::span &a, const sycl::span &lda, const sycl::span &b, const sycl::span &ldb, const sycl::span &beta, sycl::span &c, const sycl::span &ldc, size_t group_count, const sycl::span &group_sizes, const std::vector &dependencies = {}) } Input Parameters ---------------- queue The queue where the routine should be executed. transa Array or span of ``group_count`` ``oneapi::mkl::transpose`` values. ``transa[i]`` specifies op(``A``), the transposition operation applied to matrices ``A`` in group ``i``. See :ref:`data-types` for more details. transb Array or span of ``group_count`` ``oneapi::mkl::transpose`` values. ``transb[i]`` specifies op(``B``), the transposition operation applied to matrices ``B`` in group ``i``. See :ref:`data-types` for more details. m Array or span of ``group_count`` integers. ``m[i]`` specifies number of rows of matrices op(``A``) and matrices ``C`` in group ``i``. All entries must be at least zero. n Array or span of ``group_count`` integers. ``n[i]`` specifies number of columns of matrices op(``B``) and matrices ``C`` in group ``i``. All entries must be at least zero. k Array or span of ``group_count`` integers. ``k[i]`` specifies number of columns of matrices op(``A``) and rows of matrices op(``B``) in group ``i``. All entries must be at least zero. alpha Array or span of ``group_count`` scalar elements. ``alpha[i]`` specifies scaling factor for matrix-matrix products in group ``i``. a Array of ``total_batch_count`` pointers for input matrices ``A`` or span of ``total_batch_count`` input matrices ``A``. See :ref:`matrix-storage` for more details. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Size of array ``A[i]`` must be at least ``lda[i]`` * ``k[i]`` - Size of array ``A[i]`` must be at least ``lda[i]`` * ``m[i]`` * - Row major - Size of array ``A[i]`` must be at least ``lda[i]`` * ``m[i]`` - Size of array ``A[i]`` must be at least ``lda[i]`` * ``k[i]`` lda Array or span of ``group_count`` integers. ``lda[i]`` specifies leading dimension of matrices ``A`` in group ``i``. Must be positive. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``m[i]`` - Must be at least ``k[i]`` * - Row major - Must be at least ``k[i]`` - Must be at least ``m[i]`` b Array of ``total_batch_count`` pointers for input matrices ``B`` or span of ``total_batch_count`` input matrices ``B``. See :ref:`matrix-storage` for more details. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Size of array ``B[i]`` must be at least ``ldb[i]`` * ``n[i]`` - Size of array ``B[i]`` must be at least ``ldb[i]`` * ``k[i]`` * - Row major - Size of array ``B[i]`` must be at least ``ldb[i]`` * ``k[i]`` - Size of array ``B[i]`` must be at least ``ldb[i]`` * ``n[i]`` ldb Array or span of ``group_count`` integers. ``ldb[i]`` specifies leading dimension of matrices ``B`` in group ``i``. Must be positive. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``k[i]`` - Must be at least ``n[i]`` * - Row major - Must be at least ``n[i]`` - Must be at least ``k[i]`` beta Array or span of ``group_count`` scalar elements. ``beta[i]`` specifies scaling factor for matrices ``C`` in group ``i``. c Array of ``total_batch_count`` pointers for input/output matrices ``C`` or span of ``total_batch_count`` input/output matrices ``C``. See :ref:`matrix-storage` for more details. .. list-table:: * - Column major - Size of array ``C[i]`` must be at least ``ldc[i]`` * ``n[i]`` * - Row major - Size of array ``C[i]`` must be at least ``ldc[i]`` * ``m[i]`` ldc Array or span of ``group_count`` integers. ``ldc[i]`` specifies leading dimension of matrices ``C`` in group ``i``. Must be positive. .. list-table:: * - Column major - Must be at least ``m[i]`` * - Row major - Must be at least ``n[i]`` group_count Number of groups. Must be at least zero. group_size Array or span of ``group_count`` integers. ``group_size[i]`` specifies the number of ``gemm`` operations in group ``i``. Each element in ``group_size`` must be at least zero. dependencies List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies. Output Parameters ----------------- c Array of pointers to output matrices ``C`` overwritten by ``total_batch_count`` ``gemm`` operations of the form ``alpha`` * op(``A``) * op(``B``) + ``beta`` * ``C``. .. note:: If ``beta`` = 0, matrices ``C`` do not need to be initialized before calling ``gemm_batch``. Return Values ------------- Output event to wait on to ensure computation is complete. Examples ******** An example of how to use USM version of ``gemm_batch`` can be found in |O-MKL| installation directory, under: .. code-block:: examples/dpcpp/blas/source/gemm_batch.cpp **Strided API** --------------- Strided API operation is defined as: .. code-block:: for i = 0 … batch_size – 1 A, B and C are matrices at offset i * stridea, i * strideb, i * stridec in a, b and c. C = alpha * op(A) * op(B) + beta * C end for where: - op(``X``) is one of op(``X``) = ``X``, or op(``X``) = ``X``\ :sup:`T`, or op(``X``) = ``X``\ :sup:`H` - ``alpha`` and ``beta`` are scalars - ``A``, ``B``, and ``C`` are matrices - op(``A``) is ``m`` x ``k``, op(``B``) is ``k`` x ``n``, and ``C`` is ``m`` x ``n`` For strided API, ``a``, ``b`` and ``c`` arrays contains all the input matrices. The stride between matrices is either given by the stride parameters. Total number of matrices in ``a``, ``b`` and ``c`` arrays is given by ``batch_size`` parameter. Syntax ------ .. code-block:: cpp namespace oneapi::mkl::blas::column_major { sycl::event gemm_batch(sycl::queue &queue, oneapi::mkl::transpose transa, oneapi::mkl::transpose transb, std::int64_t m, std::int64_t n, std::int64_t k, T alpha, const T *a, std::int64_t lda, std::int64_t stridea, const T *b, std::int64_t ldb, std::int64_t strideb, T beta, T *c, std::int64_t ldc, std::int64_t stridec, std::int64_t batch_size, const std::vector &dependencies = {}) } .. code-block:: cpp namespace oneapi::mkl::blas::row_major { sycl::event gemm_batch(sycl::queue &queue, oneapi::mkl::transpose transa, oneapi::mkl::transpose transb, std::int64_t m, std::int64_t n, std::int64_t k, T alpha, const T *a, std::int64_t lda, std::int64_t stridea, const T *b, std::int64_t ldb, std::int64_t strideb, T beta, T *c, std::int64_t ldc, std::int64_t stridec, std::int64_t batch_size, const std::vector &dependencies = {}) } Input Parameters ---------------- queue The queue where the routine should be executed. transa Specifies op(``A``), transposition operation applied to matrices ``A``. See :ref:`data-types` for more details. transb Specifies op(``B``), transposition operation applied to matrices ``B``. See :ref:`data-types` for more details. m Number of rows of matrices op(``A``) and matrices ``C``. Must be at least zero. n Number of columns of matrices op(``B``) and matrices ``C``. Must be at least zero. k Number of columns of matrices op(``A``) and rows of matrices op(``B``). Must be at least zero. alpha Scaling factor for matrix-matrix products. a Pointer to input matrices ``A``. Size of the array must be at least ``stridea`` * ``batch_size``. lda Leading dimension of matrices ``A``. Must be positive. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``m`` - Must be at least ``k`` * - Row major - Must be at least ``k`` - Must be at least ``m`` stridea Stride between two consecutive ``A`` matrices. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``lda`` * ``k`` - Must be at least ``lda`` * ``m`` * - Row major - Must be at least ``lda`` * ``m`` - Must be at least ``lda`` * ``k`` b Pointer to input matrices ``B``. Size of the array must be at least ``strideb`` * ``batch_size``. ldb Leading dimension of matrices ``B``. Must be positive. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``k`` - Must be at least ``n`` * - Row major - Must be at least ``n`` - Must be at least ``k`` strideb Stride between two consecutive ``B`` matrices. .. list-table:: :header-rows: 1 * - - ``transa`` = ``transpose::nontrans`` - ``transa`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans`` * - Column major - Must be at least ``ldb`` * ``n`` - Must be at least ``ldb`` * ``k`` * - Row major - Must be at least ``ldb`` * ``k`` - Must be at least ``ldb`` * ``n`` beta Scaling factor for matrices ``C``. c Pointer to input/output matrices ``C``. Size of the array must be at least ``stridec`` * ``batch_size``. ldc Leading dimension of matrices ``C``. Must be positive. .. list-table:: * - Column major - Must be at least ``m`` * - Row major - Must be at least ``n`` stridec Stride between two consecutive ``C`` matrices. .. list-table:: * - Column major - Must be at least ``ldc`` * ``n`` * - Row major - Must be at least ``ldc`` * ``m`` batch_size Specifies the number of matrix multiply operations to perform. Must be at least zero. dependencies List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies. Output Parameters ----------------- c Pointer to output matrices ``C`` overwritten by ``batch_size`` ``gemm`` operations of the form ``alpha`` * op(``A``) * op(``B``) + ``beta`` * ``C``. .. note:: If ``beta`` = 0, matrices ``C`` do not need to be initialized before calling ``gemm_batch``. Return Values ------------- Output event to wait on to ensure computation is complete.